Detecting DNS Tunneling

DNS tunneling encodes data into DNS queries and responses to move information past controls that inspect HTTP, SMB, or direct TCP. The client encodes outbound data into subdomain labels of queries to an attacker-controlled domain, and the authoritative server decodes those labels and packs return data into response records. Tools like iodine, dnscat2, and dns2tcp implement this pattern, as do bespoke implants written for the same purpose.
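The sketch below shows why tunnel queries look the way they do. It is a minimal illustration of the client-side encoding step, assuming base32 and a hypothetical `encode_to_qname` helper. It is not how iodine, dnscat2, or dns2tcp frame their traffic, since each has its own encoding and session headers. It only shows that arbitrary bytes turn into long, random-looking labels under a fixed parent domain.

```python
import base64

MAX_LABEL = 63   # RFC 1035: maximum octets per label
MAX_NAME = 255   # RFC 1035: maximum octets per name in wire format


def wire_length(labels):
    # Each label costs a length byte plus its characters; +1 for the root terminator.
    return sum(len(label) + 1 for label in labels) + 1


def encode_to_qname(chunk: bytes, parent: str) -> str:
    """Illustrative only (hypothetical helper): pack bytes into base32 labels under `parent`."""
    # Base32 (A-Z, 2-7) survives DNS case-insensitivity. The '=' padding is not a valid
    # hostname character, so it is stripped here and must be restored when decoding.
    text = base64.b32encode(chunk).decode("ascii").rstrip("=").lower()
    data_labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    labels = data_labels + parent.rstrip(".").split(".")
    if wire_length(labels) > MAX_NAME:
        raise ValueError("chunk too large for one query name")
    return ".".join(labels)


print(encode_to_qname(b"hello world", "data-relay.example"))
```

On the server side, the authoritative resolver reverses the steps: split off the parent, join the labels, restore padding, and base32-decode.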

The detection scores behavior, not artifacts: how much the client talks, how varied that talk is, and how long it keeps talking.

What it looks like

A constructed slice of tunneling queries from one client to one parent domain over 14 seconds.

| Row | Time | Source host | Process | Query name |
|---|---|---|---|---|
| 1 | 14:32:01 | 10.50.4.27 | nettrace.exe | mxq3jk7p9zr4w2nv8f5t1ybd6h.data-relay.example |
| 2 | 14:32:03 | 10.50.4.27 | nettrace.exe | 7k2pq8wnz4mj9rxv5ft3bdh1y6c.data-relay.example |
| 3 | 14:32:05 | 10.50.4.27 | nettrace.exe | r9j4xm7nzp2qkw5vt8fy1bd6h3c.data-relay.example |
| 4 | 14:32:07 | 10.50.4.27 | nettrace.exe | ay8wm3kjy6r1nv4qz9xt2bf5dh7.data-relay.example |
| 5 | 14:32:09 | 10.50.4.27 | nettrace.exe | zh2c6bd4yt9fv5xkw7m3rp1qbn8.data-relay.example |
| 6 | 14:32:11 | 10.50.4.27 | nettrace.exe | j8nzaqpr3mw7kx5vt4fy9bd2hc6.data-relay.example |
| 7 | 14:32:13 | 10.50.4.27 | nettrace.exe | ck6h2bd9ft4yv5wxk3m7rp1qzn8.data-relay.example |
| 8 | 14:32:15 | 10.50.4.27 | nettrace.exe | yd2hc6p3r1qzn8jmw7kx5vt4fb9.data-relay.example |

One source host, one process, one parent domain, eight distinct leftmost labels, roughly two seconds between queries. Nothing about data-relay.example itself signals tunneling; the pattern is in the volume, the uniqueness, and the cadence.
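Those signals can be recomputed directly from the sample. Below is a short sketch with the eight rows hard-coded from the table. `profile` is an illustrative helper name, not part of any tool. It reports volume, uniqueness (distinct-to-total ratio), and cadence (gap between queries) for one host/process/parent slice.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# The eight constructed sample rows: (time, query name). Host and process are constant.
SAMPLE: List[Tuple[str, str]] = [
    ("14:32:01", "mxq3jk7p9zr4w2nv8f5t1ybd6h.data-relay.example"),
    ("14:32:03", "7k2pq8wnz4mj9rxv5ft3bdh1y6c.data-relay.example"),
    ("14:32:05", "r9j4xm7nzp2qkw5vt8fy1bd6h3c.data-relay.example"),
    ("14:32:07", "ay8wm3kjy6r1nv4qz9xt2bf5dh7.data-relay.example"),
    ("14:32:09", "zh2c6bd4yt9fv5xkw7m3rp1qbn8.data-relay.example"),
    ("14:32:11", "j8nzaqpr3mw7kx5vt4fy9bd2hc6.data-relay.example"),
    ("14:32:13", "ck6h2bd9ft4yv5wxk3m7rp1qzn8.data-relay.example"),
    ("14:32:15", "yd2hc6p3r1qzn8jmw7kx5vt4fb9.data-relay.example"),
]


def profile(rows: List[Tuple[str, str]]) -> Dict[str, float]:
    """Volume, uniqueness, and cadence for one (host, process, parent domain) slice."""
    times = [datetime.strptime(t, "%H:%M:%S") for t, _ in rows]
    labels = [qname.split(".", 1)[0] for _, qname in rows]  # leftmost label only
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    distinct = len(set(labels))
    return {
        "total": len(rows),
        "distinct": distinct,
        "distinct_to_total": distinct / len(rows),
        "mean_gap_s": sum(gaps) / len(gaps),
        "span_s": (times[-1] - times[0]).total_seconds(),
        "min_label_len": min(len(label) for label in labels),
    }


print(profile(SAMPLE))
```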

What makes it abnormal

| Property | Tunneling | Normal DNS |
|---|---|---|
| Queries to one parent domain per hour | Many | Few to moderate |
| Distinct subdomain count per hour | Very high; nearly equal to total queries | Low; the same names recur |
| Distinct-to-total ratio | Approaches 1 | Well below 1 |
| Subdomain length | Long encoded labels | Short, recognizable names (www, mail, api) |
| Resolved IPs | External attacker infrastructure | Often mixed; internal services resolve to private space |
| Session shape | Active across many hours | Short; ends quickly |

Any one of these in isolation matches too many legitimate cases. The combination is what isolates tunneling.

Gauging exfiltration capacity

Tunneling is fundamentally about moving data. To translate query counts into bytes, work from the per-query payload range.

Under RFC 1035, a DNS name is capped at 255 octets in wire format, and each individual label is capped at 63 octets. Assume a controlled parent suffix of about 17 wire octets (ns1.example.com is exactly 17, counting each label's length byte and the terminating root byte), encoded data packed into as many labels as fit, and base32 encoding at 5 bits per character. That leaves 238 octets for data labels: three full 63-character labels plus one 45-character label, 234 characters in all, or 234 × 5 = 1,170 bits, about 146 bytes. The theoretical payload capacity therefore lands around 146 to 150 bytes per query. That value is an optimistic upper bound. Real DNS tunneling tools usually carry less, around 60 bytes per query as a working estimate, because they reserve space for sequence numbers, session identifiers, framing, reliability logic, and other protocol overhead.
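The label-packing arithmetic is easy to get wrong by hand, so here is a small sketch that reproduces it. The function names are illustrative. It assumes base32 (5 bits per character) and counts only raw payload, with no tool framing.

```python
MAX_LABEL = 63   # RFC 1035 per-label limit (octets)
MAX_NAME = 255   # RFC 1035 whole-name limit (octets, wire format)


def wire_length(name: str) -> int:
    """Wire-format length: a length byte per label, the label itself, and the root byte."""
    return sum(len(label) + 1 for label in name.rstrip(".").split(".")) + 1


def payload_capacity(suffix: str, bits_per_char: int = 5) -> float:
    """Best-case payload bytes per query name under `suffix` (no tool overhead)."""
    budget = MAX_NAME - wire_length(suffix)  # octets left for data labels
    chars = 0
    while budget > 1:                        # need a length byte plus at least one char
        label_len = min(MAX_LABEL, budget - 1)
        chars += label_len
        budget -= label_len + 1
    return chars * bits_per_char / 8


print(wire_length("ns1.example.com"))       # 17
print(payload_capacity("ns1.example.com"))  # 146.25
```

A shorter parent suffix nudges the figure toward 150 bytes. Any per-query header a real tool adds comes straight out of this budget, which is why working estimates sit well below it.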

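| Data exfiltrated | Queries needed, lower bound (best case, ~150 B/query) | Queries needed, upper bound (realistic, ~60 B/query) |
|---|---|---|
| 1 MB | ~7,000 | ~17,500 |
| 10 MB | ~70,000 | ~175,000 |
| 50 MB | ~350,000 | ~875,000 |
| 100 MB | ~700,000 | ~1.75 million |
| 500 MB | ~3.5 million | ~8.75 million |
| 1 GB | ~7 million | ~18 million |

Counts are rounded and use binary units (1 MB = 2^20 bytes, 1 GB = 2^30 bytes).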
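The table is a single division per cell. The sketch below reproduces it before rounding. It assumes binary units (1 MB = 2^20 bytes), which is what the table's figures line up with, and rounds partial queries up because a partial payload still costs a whole query.

```python
import math

MIB = 2 ** 20  # binary megabyte; the table's figures line up with this unit


def queries_needed(n_bytes: int, bytes_per_query: float) -> int:
    """Queries required to move n_bytes at a given per-query payload."""
    return math.ceil(n_bytes / bytes_per_query)


if __name__ == "__main__":
    sizes = [("1 MB", MIB), ("10 MB", 10 * MIB), ("50 MB", 50 * MIB),
             ("100 MB", 100 * MIB), ("500 MB", 500 * MIB), ("1 GB", 1024 * MIB)]
    for label, size in sizes:
        best = queries_needed(size, 150)  # lower bound on query count
        real = queries_needed(size, 60)   # upper bound on query count
        print(f"{label:>7}  {best:>12,}  {real:>12,}")
```

For 1 MB this gives 6,991 and 17,477 queries, which round to the table's ~7,000 and ~17,500.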

Detect it

Without any prior knowledge of a client’s environment, let the math tell you what is weird.

| Step | What | Why |
|---|---|---|
| 1. Pre-filter | Drop queries whose leftmost label is shorter than ~10 characters. If logs include resolved IPs, also drop queries that resolve only to private/internal addresses (RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). | Removes ordinary recurring DNS and internal lookups so the population statistics stay clean. |
| 2. Group | Group by (host, parent domain, process name, hour bucket). Use the process image name or hash, never the process ID. | Tunneling is one client, one process, one parent domain, in one window. Without the bucket, totals balloon across history. PIDs change on restart and would split one session across several keys. |
| 3. Aggregate | Distinct subdomain count per group per bucket, with the total query count alongside as a sanity check. | Each tunnel query encodes different bytes, so the distinct count grows with the amount of data moved. In tunneling, distinct and total counts are nearly equal. |
| 4. Score | For each group, compute z = (distinct count - bucket mean) / bucket standard deviation, where the mean and population standard deviation are taken over every group in the same hour bucket. | The bucket's mean and standard deviation come from the data the query just read. No preset reference is needed. |
| 5. Alert | Distinct count >= ~100 AND z >= 2. | Below ~100 distinct names per hour, a channel moves too little data to matter (under 15 KB even at the optimistic ~150 B/query). z >= 2 is the statistical anomaly cutoff against the bucket's population. |
| 6. Rank | Sort by z-score descending, then distinct count descending. | The most anomalous and most impactful groups surface first. |
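Steps 1 to 6 fit in one pass over the logs. What follows is a minimal Python sketch, not a production query. It assumes a hypothetical `DnsEvent` record (timestamp, host, process image name, query name, optional resolved IPs) and a naive rule that the parent domain is the last two labels. Real data needs a public-suffix list for names like example.co.uk. The thresholds mirror the table.

```python
import ipaddress
import statistics
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

MIN_LABEL_LEN = 10    # step 1: drop short leftmost labels
DISTINCT_FLOOR = 100  # step 5: minimum distinct subdomains per group per hour
Z_CUTOFF = 2.0        # step 5: anomaly cutoff
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


@dataclass(frozen=True)
class DnsEvent:
    """Hypothetical log record; map your DNS/EDR fields onto it."""
    ts: datetime
    host: str
    process: str                    # image name or hash, never the PID
    qname: str
    resolved: Tuple[str, ...] = ()  # resolved IPs, if the log carries them


def split_name(qname: str) -> Optional[Tuple[str, str, str]]:
    """(leftmost label, parent, subdomain); assumes parent = last two labels."""
    labels = qname.rstrip(".").lower().split(".")
    if len(labels) < 3:
        return None
    return labels[0], ".".join(labels[-2:]), ".".join(labels[:-2])


def is_rfc1918(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)  # IPv6 addresses never match


def detect(events: Iterable[DnsEvent]) -> List[dict]:
    # Steps 2-3: group by (host, parent, process, hour) and track distinct subdomains.
    groups = defaultdict(lambda: [set(), 0])
    for ev in events:
        parts = split_name(ev.qname)
        # Step 1: pre-filter short labels and lookups that resolve only to RFC 1918 space.
        if parts is None or len(parts[0]) < MIN_LABEL_LEN:
            continue
        if ev.resolved and all(is_rfc1918(ip) for ip in ev.resolved):
            continue
        _, parent, sub = parts
        bucket = ev.ts.replace(minute=0, second=0, microsecond=0)
        entry = groups[(ev.host, parent, ev.process, bucket)]
        entry[0].add(sub)
        entry[1] += 1

    by_bucket = defaultdict(list)
    for key, (subs, total) in groups.items():
        by_bucket[key[3]].append((key, len(subs), total))

    alerts = []
    for rows in by_bucket.values():
        counts = [distinct for _, distinct, _ in rows]
        # Step 4: population mean and standard deviation for this hour bucket.
        mean = statistics.fmean(counts)
        sd = statistics.pstdev(counts)
        if sd == 0:
            continue  # every group identical: nothing stands out
        for (host, parent, process, bucket), distinct, total in rows:
            z = (distinct - mean) / sd
            # Step 5: both the count floor and the z cutoff must hold.
            if distinct >= DISTINCT_FLOOR and z >= Z_CUTOFF:
                alerts.append({"host": host, "parent": parent, "process": process,
                               "bucket": bucket, "distinct": distinct, "total": total, "z": z})
    # Step 6: most anomalous first, then largest.
    alerts.sort(key=lambda a: (-a["z"], -a["distinct"]))
    return alerts
```

Computing the population statistics per bucket, rather than across all history, keeps each hour's score self-contained. This matches the "no preset reference" property of the primary method.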

Method choices

| Confidence | Method | Operational shape | Why this rank |
|---|---|---|---|
| Primary | Standard z-score against the population in the same hour bucket | Single-pass live query. Population statistics are computed once per bucket and applied to each group. | Self-contained, with no maintenance pipeline. With the count floor in place, the mean and standard deviation are stable enough for production. Scales because all the work happens inside one query. |
| Stronger but expensive | Median and median absolute deviation (MAD) z-score against the same host's historical buckets | Maintained reference dataset (scheduled job + storage). The live query stays cheap because per-host medians and MADs are precomputed. | Higher signal because the host is its own control. Median and MAD are robust to extreme values in the host's history. Costs the maintenance pipeline; only worth it if the primary method produces too much noise. |
| Avoid | Static distinct-count thresholds (e.g., "alert if count > 5000") | Cheapest computationally. | Goes stale as soon as a chatty new endpoint agent ships, and cannot account for host diversity: a count that is routine for a CI server would be alarming on a workstation. |
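Here is a sketch of the median/MAD variant. It assumes a scheduled job already stores each host's past hourly distinct counts (the hypothetical `history` dict below), and that only the scoring runs live. The 0.6745 factor is an assumption the source does not specify. It is the widely used modified z-score convention, which puts MAD-based scores on roughly the same scale as a standard z-score for normally distributed data.

```python
import statistics
from typing import Dict, List, Optional, Tuple

MAD_SCALE = 0.6745  # modified z-score convention; an assumption, not from the source


def baseline(history: Dict[str, List[int]]) -> Dict[str, Tuple[float, float]]:
    """Scheduled job: precompute (median, MAD) per host from past hourly distinct counts."""
    result = {}
    for host, counts in history.items():
        med = statistics.median(counts)
        mad = statistics.median([abs(c - med) for c in counts])
        result[host] = (med, mad)
    return result


def robust_z(current: int, med: float, mad: float) -> Optional[float]:
    """Live query: median/MAD z-score; None when MAD is 0 (history too flat to scale)."""
    if mad == 0:
        return None
    return MAD_SCALE * (current - med) / mad


if __name__ == "__main__":
    # Hypothetical history for one host, including one earlier spike of 400.
    ref = baseline({"ws-042": [10, 12, 11, 13, 12, 9, 11, 400]})
    med, mad = ref["ws-042"]
    print(med, mad, robust_z(300, med, mad))
```

The earlier spike of 400 barely moves the baseline (median 11.5, MAD 1.0). A mean and standard deviation over the same history would be dragged far upward by it.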

Enrichments for triage

Once an alert fires, attach context. None of these change the score. Analysts charged with reading your outputs will love you for it.

| Triage value | Enrichment | Source | Why |
|---|---|---|---|
| High | Sample of the actual queries (first N domain names) | Live query | Direct visual confirmation that the labels look like encoded data. |
| High | Duration and velocity (first/last timestamp, queries per second/minute/hour) | Live query | Shows whether activity filled the hour or burst inside it, plus the channel's cadence. |
| High | Estimated exfil capacity | Live query | Per-alert version of the capacity table above: distinct count multiplied by the per-query estimate. Drives response priority. |
| High | Persistence (consecutive anomalous buckets for the same group) | Live query | Tunneling sessions usually run across many hours. Legitimate bursts produce one anomaly and stop. |
| Moderate | Process metadata (full image path, signing status) | Live query | A signed Microsoft binary, an unsigned native binary, and a known LOLBIN are three different triage paths. |
| Moderate | Subdomain entropy (Shannon entropy of leftmost labels, averaged across the bucket) | Live query | Shannon entropy measures how unpredictable the characters in a string are. Encoded data scores high; pronounceable names score lower. |
| Moderate | Domain age (WHOIS or passive DNS first-seen) | External enrichment | A recently registered domain shifts the prior toward tunneling. |
| Lower | Resolved IP geography and Autonomous System Number (ASN) | External enrichment | Color for triage. |
| Lower | Host metadata (OS, owner, organizational unit, location) | Maintained reference dataset | Routine triage info that gets the alert in front of the right team faster. |
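Most of the live-query enrichments reduce to small functions over one alerting group's rows. Here is a sketch with hypothetical helper names. It assumes the group's timestamps, query names, and anomalous hour buckets are already in hand. The 60 and 150 bytes-per-query defaults come from the capacity section.

```python
import math
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def shannon_entropy(s: str) -> float:
    """Bits per character: sum of -p * log2(p) over the string's character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values())


def mean_label_entropy(qnames: Iterable[str]) -> float:
    """Average entropy of the leftmost label across one group's bucket."""
    values = [shannon_entropy(q.split(".", 1)[0]) for q in qnames]
    return sum(values) / len(values) if values else 0.0


def longest_anomalous_run(buckets: Iterable[datetime]) -> int:
    """Persistence: longest run of consecutive anomalous hour buckets for one group."""
    best = run = 0
    prev = None
    for hour in sorted(set(buckets)):
        run = run + 1 if prev is not None and hour - prev == timedelta(hours=1) else 1
        best = max(best, run)
        prev = hour
    return best


def exfil_range(distinct: int, low_bpq: int = 60, high_bpq: int = 150) -> Tuple[int, int]:
    """Estimated bytes moved: distinct count times the realistic and best-case payloads."""
    return distinct * low_bpq, distinct * high_bpq


def velocity_per_minute(timestamps: List[datetime]) -> float:
    """Average query rate between the first and last timestamp."""
    span = (max(timestamps) - min(timestamps)).total_seconds()
    if span == 0:
        return 0.0 if len(timestamps) < 2 else float("inf")
    return (len(timestamps) - 1) / span * 60
```

For the eight sample rows, the velocity works out to 30 queries per minute: seven intervals over 14 seconds.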
