Detecting DNS Tunneling

May 5

DNS tunneling encodes data into DNS queries and responses to move information past controls that inspect HTTP, SMB, or direct TCP. The client encodes outbound data into subdomain labels of queries to an attacker-controlled domain, and the authoritative server decodes those labels and packs return data into response records. Tools like iodine, dnscat2, and dns2tcp implement this pattern, as do bespoke implants written for the same purpose.

The detection scores behavior, not artifacts: how much a client talks, how varied that talk is, and over what stretch of time.

What it looks like

A constructed slice of tunneling queries from one client to one parent domain, spanning 14 seconds.

Row	Time	Source host	Process	Query name
1	14:32:01	10.50.4.27	nettrace.exe	mxq3jk7p9zr4w2nv8f5t1ybd6h.data-relay.example
2	14:32:03	10.50.4.27	nettrace.exe	7k2pq8wnz4mj9rxv5ft3bdh1y6c.data-relay.example
3	14:32:05	10.50.4.27	nettrace.exe	r9j4xm7nzp2qkw5vt8fy1bd6h3c.data-relay.example
4	14:32:07	10.50.4.27	nettrace.exe	ay8wm3kjy6r1nv4qz9xt2bf5dh7.data-relay.example
5	14:32:09	10.50.4.27	nettrace.exe	zh2c6bd4yt9fv5xkw7m3rp1qbn8.data-relay.example
6	14:32:11	10.50.4.27	nettrace.exe	j8nzaqpr3mw7kx5vt4fy9bd2hc6.data-relay.example
7	14:32:13	10.50.4.27	nettrace.exe	ck6h2bd9ft4yv5wxk3m7rp1qzn8.data-relay.example
8	14:32:15	10.50.4.27	nettrace.exe	yd2hc6p3r1qzn8jmw7kx5vt4fb9.data-relay.example

One source host, one process, one parent domain, eight distinct leftmost labels, roughly two seconds between queries. Nothing about data-relay.example itself signals tunneling; the pattern is in the volume, the uniqueness, and the cadence.
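
To make "volume, uniqueness, and cadence" concrete, here is a minimal Python sketch that summarizes the eight sample rows. It assumes each query carries a single encoded label, which is true of the sample but not of every tool. It uses only the standard library, and the function name `summarize` is illustrative.

```python
from datetime import datetime
from statistics import median

# The eight sample rows from the table above: (time, source host, process, query name).
ROWS = [
    ("14:32:01", "10.50.4.27", "nettrace.exe", "mxq3jk7p9zr4w2nv8f5t1ybd6h.data-relay.example"),
    ("14:32:03", "10.50.4.27", "nettrace.exe", "7k2pq8wnz4mj9rxv5ft3bdh1y6c.data-relay.example"),
    ("14:32:05", "10.50.4.27", "nettrace.exe", "r9j4xm7nzp2qkw5vt8fy1bd6h3c.data-relay.example"),
    ("14:32:07", "10.50.4.27", "nettrace.exe", "ay8wm3kjy6r1nv4qz9xt2bf5dh7.data-relay.example"),
    ("14:32:09", "10.50.4.27", "nettrace.exe", "zh2c6bd4yt9fv5xkw7m3rp1qbn8.data-relay.example"),
    ("14:32:11", "10.50.4.27", "nettrace.exe", "j8nzaqpr3mw7kx5vt4fy9bd2hc6.data-relay.example"),
    ("14:32:13", "10.50.4.27", "nettrace.exe", "ck6h2bd9ft4yv5wxk3m7rp1qzn8.data-relay.example"),
    ("14:32:15", "10.50.4.27", "nettrace.exe", "yd2hc6p3r1qzn8jmw7kx5vt4fb9.data-relay.example"),
]


def summarize(rows):
    """Volume, uniqueness, and cadence for one (host, process, parent domain) slice."""
    # Leftmost label only; assumes one encoded label per query, as in the sample.
    labels = [qname.split(".", 1)[0] for _, _, _, qname in rows]
    times = [datetime.strptime(t, "%H:%M:%S") for t, _, _, _ in rows]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "total": len(labels),
        "distinct": len(set(labels)),
        "distinct_ratio": len(set(labels)) / len(labels),
        "mean_label_len": sum(map(len, labels)) / len(labels),
        "median_gap_s": median(gaps),
        "span_s": (times[-1] - times[0]).total_seconds(),
    }


print(summarize(ROWS))
```

On the sample it reports 8 total queries, 8 distinct labels (a ratio of 1.0), and a steady 2-second gap across a 14-second span. None of those numbers is a verdict on its own.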

What makes it abnormal

Property	Tunneling	Normal DNS
Queries to one parent domain per hour	Many	Few to moderate
Distinct subdomain count per hour	Very high; near equal to total	Low; same names recur
Distinct-to-total ratio	Approaches 1	Well below 1
Subdomain length	Long encoded labels	Short, recognizable names (www, mail, api)
Resolved IPs	External attacker infrastructure	Often mixed; internal services resolve to private space
Session shape	Active across many hours	Short; ends

Any one of these in isolation matches too many legitimate cases. The combination is what isolates tunneling.

Gauging exfiltration capacity

Tunneling is fundamentally about moving data. To translate query counts into bytes, work from the per-query payload range.

Under RFC 1035, a DNS name is capped at 255 octets in wire format, and each individual label is capped at 63 octets. Assume a controlled parent suffix of about 17 wire octets (ns1.example.com encodes to exactly 17), encoded data spread across as many labels as fit, and base32 encoding at 5 bits per character. The theoretical payload capacity then lands around 146 to 150 bytes per query. That value is an optimistic upper bound. Real DNS tunneling tools usually carry less, and around 60 bytes per query is a reasonable working estimate, because they reserve space for sequence numbers, session identifiers, framing, reliability logic, and other protocol overhead.
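
The capacity arithmetic can be checked with a short sketch. It assumes plain ASCII labels and fills the space in front of the suffix with maximum-length labels. It charges one length octet per label plus the root octet, as the RFC 1035 wire format does. Names like `max_payload_chars` are illustrative and do not come from any tunneling tool.

```python
MAX_NAME_OCTETS = 255   # RFC 1035 limit on a full name in wire format
MAX_LABEL_OCTETS = 63   # RFC 1035 limit on a single label


def wire_length(name: str) -> int:
    """Wire-format length: one length octet plus the bytes of each label, plus the root octet."""
    labels = [label for label in name.strip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def max_payload_chars(suffix: str) -> int:
    """Encoded characters that fit in front of `suffix` without breaking either limit."""
    room = MAX_NAME_OCTETS - wire_length(suffix)
    chars = 0
    while room >= 2:  # a new label needs one length octet plus at least one character
        take = min(MAX_LABEL_OCTETS, room - 1)
        chars += take
        room -= take + 1
    return chars


def payload_bytes(suffix: str, bits_per_char: int = 5) -> float:
    """Theoretical payload per query; 5 bits per character corresponds to base32."""
    return max_payload_chars(suffix) * bits_per_char / 8


for suffix in ("ns1.example.com", "data-relay.example"):
    print(suffix, wire_length(suffix), max_payload_chars(suffix), payload_bytes(suffix))
```

For ns1.example.com the sketch gives 17 wire octets, 234 encoded characters, and 146.25 bytes, which is the low end of the range above. The longer data-relay.example suffix (20 octets) leaves 231 characters, about 144 bytes. Neither figure accounts for the framing overhead that pulls real tools down toward ~60 bytes.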

Data exfiltrated	Queries needed, best case (~150 B/query)	Queries needed, realistic (~60 B/query)
1 MB	~7,000	~17,500
10 MB	~70,000	~175,000
50 MB	~350,000	~875,000
100 MB	~700,000	~1.75 million
500 MB	~3.5 million	~8.75 million
1 GB	~7 million	~18 million
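
The table can be regenerated with integer ceiling division. This sketch assumes binary units (1 MB = 2^20 bytes), which appears to be what the table's rounding uses. With decimal megabytes, every count drops by about 5%.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def queries_needed(n_bytes: int, bytes_per_query: int) -> int:
    """Queries required to move n_bytes, rounded up (integer ceiling division)."""
    return -(-n_bytes // bytes_per_query)


SIZES = [("1 MB", MiB), ("10 MB", 10 * MiB), ("50 MB", 50 * MiB),
         ("100 MB", 100 * MiB), ("500 MB", 500 * MiB), ("1 GB", GiB)]

for label, size in SIZES:
    best, realistic = queries_needed(size, 150), queries_needed(size, 60)
    print(f"{label:>7}  {best:>12,}  {realistic:>12,}")
```

For 1 GB it prints 7,158,279 and 17,895,698 queries, which the table rounds to ~7 million and ~18 million.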

Detect it

Without any prior knowledge of a client’s environment, let the math tell you what is weird.

Step	What	Why
1. Pre-filter	Drop queries with leftmost label below ~10 characters. If logs include resolved IPs, also drop queries that resolve only to private/internal addresses (RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).	Removes ordinary recurring DNS and internal lookups so the population stats stay clean.
2. Group	(host, parent domain, process name, hour bucket). Use the process image name or hash, never the process ID.	Tunneling is one client, one process, one parent domain, in one window. Without the bucket, totals balloon across history. PIDs change on restart and split a session across keys.
3. Aggregate	Distinct subdomain count per group per bucket. Total query count alongside as a sanity check.	Each tunnel query encodes different bytes, so the distinct count rises with the volume of data encoded. In tunneling, distinct and total counts are nearly equal.
4. Score	For each group, subtract the bucket's population mean from the group's distinct count, divide by the bucket's population standard deviation. The result is a z-score.	The bucket's mean and standard deviation come from the data the query just read. No preset reference needed.
5. Alert	Distinct count >= ~100 AND z >= 2.	100 is the floor below which a channel carries too little data to matter: at the per-query estimates above, 100 queries move roughly 6 to 15 KB. z >= 2 is the statistical anomaly cutoff against the bucket's population.
6. Rank	Sort by z-score descending, then distinct count descending.	Most anomalous and most impactful surface first.
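
The six steps fit in one pass over parsed log records. This is a minimal Python sketch, not a production query. The `DnsEvent` record and its field names are hypothetical. Parent-domain extraction naively takes the last two labels, where real data needs the Public Suffix List. Population statistics use `statistics.pstdev`, the population standard deviation, computed separately for each hour bucket.

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean, pstdev

MIN_LABEL_LEN = 10   # step 1: drop leftmost labels shorter than ~10 characters
MIN_DISTINCT = 100   # step 5: count floor
MIN_Z = 2.0          # step 5: anomaly cutoff
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


@dataclass
class DnsEvent:
    """Hypothetical normalized log record; field names are illustrative."""
    ts: datetime
    host: str
    process: str                                      # image name or hash, never the PID
    qname: str
    answers: list[str] = field(default_factory=list)  # resolved IPs, if logged


def split_query(qname: str) -> tuple[str, str, str]:
    """Return (leftmost label, subdomain, parent domain).

    Naive assumption: the parent is the last two labels. Real data needs the
    Public Suffix List (example.co.uk) and knowledge of delegated zones.
    """
    labels = qname.lower().rstrip(".").split(".")
    return labels[0], ".".join(labels[:-2]), ".".join(labels[-2:])


def private_only(answers: list[str]) -> bool:
    """True when every resolved address is in RFC 1918 space; no answers means keep."""
    return bool(answers) and all(
        any(ipaddress.ip_address(a) in net for net in RFC1918) for a in answers
    )


def detect(events: list[DnsEvent]) -> list[dict]:
    # Steps 1-3: pre-filter, group by (host, parent, process, hour), aggregate.
    groups = defaultdict(lambda: {"subs": set(), "total": 0})
    for e in events:
        leftmost, sub, parent = split_query(e.qname)
        if len(leftmost) < MIN_LABEL_LEN or private_only(e.answers):
            continue
        bucket = e.ts.replace(minute=0, second=0, microsecond=0)
        g = groups[(e.host, parent, e.process, bucket)]
        g["subs"].add(sub)
        g["total"] += 1

    by_bucket = defaultdict(list)
    for key, g in groups.items():
        by_bucket[key[3]].append((key, len(g["subs"]), g["total"]))

    # Steps 4-5: z-score each group against its own bucket's population, then alert.
    alerts = []
    for rows in by_bucket.values():
        counts = [distinct for _, distinct, _ in rows]
        mu, sigma = mean(counts), pstdev(counts)
        for key, distinct, total in rows:
            z = (distinct - mu) / sigma if sigma > 0 else 0.0
            if distinct >= MIN_DISTINCT and z >= MIN_Z:
                alerts.append({"key": key, "distinct": distinct, "total": total, "z": z})

    # Step 6: most anomalous first, then largest.
    alerts.sort(key=lambda a: (-a["z"], -a["distinct"]))
    return alerts
```

One property worth knowing concerns bucket size. With n groups in a bucket, the largest population z-score any single group can reach is sqrt(n - 1), which happens when one outlier stands against n - 1 identical groups. A bucket therefore needs at least five groups before z >= 2 is reachable at all.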

Method choices

Confidence	Method	Operational shape	Why this rank
Primary	Standard z-score against the population in the same hour bucket	Single-pass live query. Population stats computed once per bucket and applied to each group.	Self-contained. No maintenance pipeline. With the count floor in place, mean and standard deviation are stable enough for production. Scales because the work happens inside one query.
Stronger but expensive	Median and median absolute deviation (MAD) z-score against same-host historical buckets	Maintained reference dataset (scheduled job + storage). Cheap live query because per-host medians and MADs are precomputed.	Higher signal because the host is its own control. Median and MAD resist distortion from extreme values in the host's history. Costs the maintenance pipeline; only worth it if the primary method is producing too much noise.
Avoid	Static distinct-count thresholds (e.g., "alert if count > 5000")	Cheapest computationally.	Ages out as soon as a chatty new endpoint agent ships. Cannot account for host diversity. A volume that is merely high on a CI server would be catastrophic on a workstation.
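
For the stronger method, here is a minimal sketch of the median/MAD score against a host's own history. It assumes the scheduled job exports past hour-bucket distinct counts per host as a plain dictionary, which is a hypothetical shape. It applies the common 1.4826 scale factor so the result reads like a standard z-score on normally distributed data. This is the same idea as the Iglewicz-Hoaglin modified z-score.

```python
import math
from statistics import median

MAD_SCALE = 1.4826   # scales MAD to match a standard deviation on normally distributed data
MIN_DISTINCT = 100   # keep the same count floor as the primary method


def build_reference(history: dict[str, list[int]]) -> dict[str, tuple[float, float]]:
    """Precompute (median, MAD) per host from past hour-bucket distinct counts.

    `history` is a hypothetical export of the scheduled job: {host: [counts...]}.
    """
    reference = {}
    for host, counts in history.items():
        med = median(counts)
        reference[host] = (med, median(abs(c - med) for c in counts))
    return reference


def robust_z(current: int, med: float, mad: float) -> float:
    """Robust z-score of the current bucket against the host's own history."""
    if mad == 0:
        # Flat history: treat any departure from the median as unbounded (a choice, not a standard).
        return 0.0 if current == med else math.copysign(math.inf, current - med)
    return (current - med) / (MAD_SCALE * mad)


history = {"10.50.4.27": [20, 22, 18, 21, 19, 20, 23, 17]}
med, mad = build_reference(history)["10.50.4.27"]
current = 300
z = robust_z(current, med, mad)
print(med, mad, round(z, 1), current >= MIN_DISTINCT and z >= 2.0)
```

The sample history has a median of 20 and a MAD of 1.5. Replacing one historical count with 5,000 moves the median only to 20.5 and leaves the MAD at 1.5. A mean and standard deviation would both be dragged toward that outlier.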

Enrichments for triage

Once an alert fires, attach context. None of these change the score. Analysts charged with reading your outputs will love you for it.

Triage value	Enrichment	Source	Why
High	Sample of the actual queries (first N domain names)	Live query	Direct visual confirmation that the labels look like encoded data.
High	Duration and velocity (first/last timestamp, queries per second/minute/hour)	Live query	Whether activity filled the hour or burst inside it, plus channel cadence.
High	Estimated exfil capacity	Live query	Per-alert version of the gauging table earlier. Distinct count multiplied by the per-query estimate. Drives response priority.
High	Persistence (consecutive anomalous buckets for the same group)	Live query	Tunneling sessions usually run across many hours. Legitimate bursts produce one anomaly and stop.
Moderate	Process metadata (full image path, signing status)	Live query	A signed Microsoft binary, an unsigned native binary, and a known LOLBIN are three different triage paths.
Moderate	Subdomain entropy (Shannon entropy of leftmost labels, averaged across the bucket)	Live query	Shannon entropy measures how unpredictable the characters in a string are. Encoded data scores high; pronounceable names score lower.
Moderate	Domain age (WHOIS or passive DNS first-seen)	External enrichment	Recently registered domains shift the prior toward tunneling.
Lower	Resolved IP geography and Autonomous System Number	External enrichment	Color for triage.
Lower	Host metadata (OS, owner, organizational unit, location)	Maintained reference dataset	Routine triage info. Gets the alert in front of the right team faster.
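
Several of the live-query enrichments reduce to short functions. This sketch covers Shannon entropy, persistence, and the exfil estimate. Function names are illustrative, and the 60-byte default is the realistic per-query figure from earlier. One caveat applies to entropy: a string of length n scores at most log2(n) bits per character, and base32 output at most 5. Compare labels of similar length rather than reading the raw number as a verdict.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def shannon_entropy(s: str) -> float:
    """Bits per character, estimated from the string's own character frequencies."""
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values()) if n else 0.0


def mean_label_entropy(labels: list[str]) -> float:
    """Average entropy of leftmost labels across one group's bucket."""
    return sum(map(shannon_entropy, labels)) / len(labels) if labels else 0.0


def persistence(alert_buckets: list[datetime]) -> int:
    """Longest run of consecutive hour buckets in which the same group alerted."""
    best = run = 0
    prev = None
    for bucket in sorted(set(alert_buckets)):
        run = run + 1 if prev is not None and bucket - prev == timedelta(hours=1) else 1
        best, prev = max(best, run), bucket
    return best


def estimated_exfil_bytes(distinct: int, bytes_per_query: int = 60) -> int:
    """Distinct count times a per-query estimate (60 B realistic, 150 B best case)."""
    return distinct * bytes_per_query
```

The first sample label, mxq3jk7p9zr4w2nv8f5t1ybd6h, has 26 distinct characters and scores log2(26), about 4.7 bits per character. A name like mail scores 2.0.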

Rerbt Tenite