Detecting DNS Tunneling
DNS tunneling encodes data into DNS queries and responses to move information past controls that inspect HTTP, SMB, or direct TCP. The client encodes outbound data into subdomain labels of queries to an attacker-controlled domain, and the authoritative server decodes those labels and packs return data into response records. Tools like iodine, dnscat2, and dns2tcp implement this pattern, as do bespoke implants written for the same purpose.
The detection scores behavior, not artifacts. Behavior is how much the client talks, how varied that talk is, and over what stretch of time.
What it looks like
A constructed slice of tunneling queries from one client to one parent domain over 14 seconds.
| Row | Time | Source host | Process | Query name |
|---|---|---|---|---|
| 1 | 14:32:01 | 10.50.4.27 | nettrace.exe | mxq3jk7p9zr4w2nv8f5t1ybd6h.data-relay.example |
| 2 | 14:32:03 | 10.50.4.27 | nettrace.exe | 7k2pq8wnz4mj9rxv5ft3bdh1y6c.data-relay.example |
| 3 | 14:32:05 | 10.50.4.27 | nettrace.exe | r9j4xm7nzp2qkw5vt8fy1bd6h3c.data-relay.example |
| 4 | 14:32:07 | 10.50.4.27 | nettrace.exe | ay8wm3kjy6r1nv4qz9xt2bf5dh7.data-relay.example |
| 5 | 14:32:09 | 10.50.4.27 | nettrace.exe | zh2c6bd4yt9fv5xkw7m3rp1qbn8.data-relay.example |
| 6 | 14:32:11 | 10.50.4.27 | nettrace.exe | j8nzaqpr3mw7kx5vt4fy9bd2hc6.data-relay.example |
| 7 | 14:32:13 | 10.50.4.27 | nettrace.exe | ck6h2bd9ft4yv5wxk3m7rp1qzn8.data-relay.example |
| 8 | 14:32:15 | 10.50.4.27 | nettrace.exe | yd2hc6p3r1qzn8jmw7kx5vt4fb9.data-relay.example |
One source host, one process, one parent domain, eight distinct leftmost labels, roughly two seconds between queries. Nothing about data-relay.example itself signals tunneling; the pattern is in the volume, the uniqueness, and the cadence.
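As a rough illustration, the three signals can be computed directly from the eight rows above. This is a minimal sketch: the `summarize` helper and its field names are hypothetical, and it assumes the parent domain is already known.

```python
from datetime import datetime

# (time, query name) pairs copied from the eight table rows.
ROWS = [
    ("14:32:01", "mxq3jk7p9zr4w2nv8f5t1ybd6h.data-relay.example"),
    ("14:32:03", "7k2pq8wnz4mj9rxv5ft3bdh1y6c.data-relay.example"),
    ("14:32:05", "r9j4xm7nzp2qkw5vt8fy1bd6h3c.data-relay.example"),
    ("14:32:07", "ay8wm3kjy6r1nv4qz9xt2bf5dh7.data-relay.example"),
    ("14:32:09", "zh2c6bd4yt9fv5xkw7m3rp1qbn8.data-relay.example"),
    ("14:32:11", "j8nzaqpr3mw7kx5vt4fy9bd2hc6.data-relay.example"),
    ("14:32:13", "ck6h2bd9ft4yv5wxk3m7rp1qzn8.data-relay.example"),
    ("14:32:15", "yd2hc6p3r1qzn8jmw7kx5vt4fb9.data-relay.example"),
]


def summarize(rows, parent):
    """Volume, uniqueness, and cadence for queries under one parent domain."""
    suffix = "." + parent
    hits = [
        (datetime.strptime(t, "%H:%M:%S"), name[: -len(suffix)])
        for t, name in rows
        if name.endswith(suffix)
    ]
    times = [t for t, _ in hits]
    labels = [label for _, label in hits]
    # Seconds between each query and the next one.
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "total": len(labels),                                   # volume
        "distinct": len(set(labels)),                           # uniqueness
        "distinct_ratio": len(set(labels)) / len(labels) if labels else 0.0,
        "mean_gap_s": sum(gaps) / len(gaps) if gaps else None,  # cadence
    }


summary = summarize(ROWS, "data-relay.example")
print(summary)
```

Every label is unique, so the distinct-to-total ratio is 1.0, and the mean gap between queries is 2 seconds.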
What makes it abnormal
| Property | Tunneling | Normal DNS |
|---|---|---|
| Queries to one parent domain per hour | Many | Few to moderate |
| Distinct subdomain count per hour | Very high; nearly equal to total | Low; same names recur |
| Distinct-to-total ratio | Approaches 1 | Well below 1 |
| Subdomain length | Long encoded labels | Short, recognizable names (www, mail, api) |
| Resolved IPs | External attacker infrastructure | Often mixed; internal services resolve to private space |
| Session shape | Sustained; active across many hours | Short bursts that end |
Any one of these in isolation matches too many legitimate cases. The combination is what isolates tunneling.
Gauging exfiltration capacity
Tunneling is fundamentally about moving data. To translate query counts into bytes, work from the per-query payload range.
Under RFC 1035, a DNS name is capped at 255 octets in wire format, and each individual label is capped at 63 octets. With a controlled parent suffix around 17 wire octets, such as ns1.example.com, encoded data spread across labels, and base32 encoding at 5 bits per character, the theoretical payload capacity lands around 146 to 150 bytes per query. That value is an optimistic upper bound. Real DNS tunneling tools usually carry less, around 60 bytes per query in production estimates, because they reserve space for sequence numbers, session identifiers, framing, reliability logic, and other protocol overhead.
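The upper bound can be reproduced with a few lines of arithmetic. The sketch below assumes every data label is filled to the 63-octet limit and that the encoding carries a fixed number of bits per character; the function name `max_payload_bytes` is illustrative, not taken from any tool.

```python
MAX_NAME_OCTETS = 255   # RFC 1035 limit on a full name in wire format
MAX_LABEL_OCTETS = 63   # RFC 1035 limit on one label


def max_payload_bytes(suffix: str, bits_per_char: int = 5) -> float:
    """Best-case payload per query under a given parent suffix.

    bits_per_char is 5 for base32 and 4 for hex.
    """
    # Wire format: each label costs one length octet plus its characters,
    # and the name ends with a single zero octet for the root.
    suffix_wire = sum(1 + len(label) for label in suffix.strip(".").split(".")) + 1
    budget = MAX_NAME_OCTETS - suffix_wire

    # Fill as many full 63-character labels as fit (64 octets each on the wire),
    # then spend what is left on one shorter label.
    full_labels, remainder = divmod(budget, MAX_LABEL_OCTETS + 1)
    chars = full_labels * MAX_LABEL_OCTETS + max(remainder - 1, 0)
    return chars * bits_per_char / 8


print(max_payload_bytes("ns1.example.com"))      # base32
print(max_payload_bytes("ns1.example.com", 4))   # hex
```

For `ns1.example.com` the suffix takes 17 octets, leaving room for 234 encoded characters, which is 146.25 bytes in base32 and 117 bytes in hex.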
| Data exfiltrated | Lower bound queries (best-case ~150 B/query) | Upper bound queries (realistic ~60 B/query) |
|---|---|---|
| 1 MB | ~7,000 | ~17,500 |
| 10 MB | ~70,000 | ~175,000 |
| 50 MB | ~350,000 | ~875,000 |
| 100 MB | ~700,000 | ~1.75 million |
| 500 MB | ~3.5 million | ~8.75 million |
| 1 GB | ~7 million | ~18 million |
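
The table values are straightforward division. The sketch below assumes 1 MB means 1,048,576 bytes, which appears to match the rounded figures above, and adds a hypothetical `hours_at_rate` helper to turn a query count into elapsed time at a steady cadence.

```python
import math

MIB = 1024 * 1024  # ASSUMPTION: the table's "MB" is 2**20 bytes


def queries_needed(data_bytes: int, payload_per_query: int) -> int:
    """Queries required to move data_bytes at a fixed payload per query."""
    return math.ceil(data_bytes / payload_per_query)


def hours_at_rate(queries: int, queries_per_second: float) -> float:
    """Elapsed hours to send that many queries at a steady rate."""
    return queries / queries_per_second / 3600


for label, size in [("1 MB", MIB), ("100 MB", 100 * MIB), ("1 GB", 1024 * MIB)]:
    best = queries_needed(size, 150)       # best-case ~150 B/query
    realistic = queries_needed(size, 60)   # realistic ~60 B/query
    print(f"{label}: {best:,} to {realistic:,} queries")

# One query every two seconds, as in the sample slice earlier.
print(round(hours_at_rate(queries_needed(MIB, 60), 0.5), 1), "hours for 1 MB")
```

At the sample cadence of one query every two seconds, a single megabyte at ~60 B/query takes close to ten hours, which is consistent with tunneling sessions that stay active across many hours.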
Detect it
Without any prior knowledge of a client’s environment, let the math tell you what is weird.
| Step | What | Why |
|---|---|---|
| 1. Pre-filter | Drop queries with leftmost label below ~10 characters. If logs include resolved IPs, also drop queries that resolve only to private/internal addresses (RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). | Removes ordinary recurring DNS and internal lookups so the population stats stay clean. |
| 2. Group | (host, parent domain, process name, hour bucket). Use the process image name or hash, never the process ID. | Tunneling is one client, one process, one parent domain, in one window. Without the bucket, totals balloon across history. PIDs change on restart and split a session across keys. |
| 3. Aggregate | Distinct subdomain count per group per bucket. Total query count alongside as a sanity check. | Each tunnel query encodes different bytes, so distinct count rises with the encoding rate. The two are nearly equal in tunneling. |
| 4. Score | For each group, subtract the bucket's population mean from the group's distinct count, divide by the bucket's population standard deviation. The result is a z-score. | The bucket's mean and standard deviation come from the data the query just read. No preset reference needed. |
| 5. Alert | Distinct count >= ~100 AND z >= 2. | The floor of ~100 distinct names per hour screens out groups too small to carry a meaningful channel (about 6 KB per hour at ~60 B/query). z >= 2 is the statistical anomaly cutoff against the bucket's population. |
| 6. Rank | Sort by z-score descending, then distinct count descending. | Most anomalous and most impactful surface first. |
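
The six steps can be sketched in plain Python. Several things here are assumptions for illustration: the record fields (`ts`, `host`, `process`, `qname`, `resolved_ips`), the `detect` function itself, and the parent-domain rule, which naively takes the last two labels and would need a public suffix list for names like `example.co.uk`. A production version would normally run as a single query in the log platform.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network
from statistics import mean, pstdev

RFC1918 = [ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
MIN_LABEL_LEN = 10   # step 1: floor on leftmost label length
MIN_DISTINCT = 100   # step 5: floor on distinct names per group
MIN_Z = 2.0          # step 5: anomaly cutoff


def split_name(qname):
    """Return (leftmost label, parent). ASSUMPTION: parent = last two labels."""
    labels = qname.rstrip(".").lower().split(".")
    return labels[0], ".".join(labels[-2:])


def only_private(ips):
    """True when every resolved address falls inside RFC 1918 space."""
    return bool(ips) and all(
        any(ip_address(ip) in net for net in RFC1918) for ip in ips
    )


def detect(records):
    # Steps 1-3: pre-filter, group, and aggregate distinct names per group.
    groups = defaultdict(lambda: {"names": set(), "total": 0})
    for r in records:
        leftmost, parent = split_name(r["qname"])
        if len(leftmost) < MIN_LABEL_LEN or only_private(r.get("resolved_ips", [])):
            continue
        bucket = r["ts"].replace(minute=0, second=0, microsecond=0)
        # Process image name, not PID, is part of the key.
        g = groups[(bucket, r["host"], parent, r["process"])]
        g["names"].add(r["qname"].lower())
        g["total"] += 1

    # Step 4: z-score each group against all groups in the same hour bucket.
    by_bucket = defaultdict(list)
    for key, g in groups.items():
        by_bucket[key[0]].append((key, len(g["names"]), g["total"]))

    alerts = []
    for rows in by_bucket.values():
        counts = [distinct for _, distinct, _ in rows]
        mu, sigma = mean(counts), pstdev(counts)
        if sigma == 0:
            continue  # every group identical: nothing stands out
        for (bucket, host, parent, process), distinct, total in rows:
            z = (distinct - mu) / sigma
            # Step 5: both the floor and the anomaly cutoff must hold.
            if distinct >= MIN_DISTINCT and z >= MIN_Z:
                alerts.append({"bucket": bucket, "host": host, "parent": parent,
                               "process": process, "distinct": distinct,
                               "total": total, "z": z})

    # Step 6: most anomalous first, ties broken by distinct count.
    alerts.sort(key=lambda a: (-a["z"], -a["distinct"]))
    return alerts
```

Note that with very few groups in a bucket the population statistics are unstable: a single outlier among n groups can score at most the square root of n - 1, so a bucket needs at least five groups before any of them can reach z >= 2.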
Method choices
| Confidence | Method | Operational shape | Why this rank |
|---|---|---|---|
| Primary | Standard z-score against the population in the same hour bucket | Single-pass live query. Population stats computed once per bucket and applied to each group. | Self-contained. No maintenance pipeline. With the count floor in place, mean and standard deviation are stable enough for production. Scales because the work happens inside one query. |
| Stronger but expensive | Median and median absolute deviation (MAD) z-score against same-host historical buckets | Maintained reference dataset (scheduled job + storage). Cheap live query because per-host medians and MADs are precomputed. | Higher signal because the host is its own control. Median and MAD ignore extreme values in the host's history. Costs the maintenance pipeline; only worth it if the primary method is producing too much noise. |
| Avoid | Static distinct-count thresholds (e.g., "alert if count > 5000") | Cheapest computationally. | Ages out as soon as a chatty new endpoint agent ships. Cannot account for host diversity. A count that is merely high for a CI server would be catastrophic for a workstation. |
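
For the median and MAD variant, a minimal sketch of the scoring step follows. The 0.6745 scaling constant is the common convention for making a MAD-based score comparable to a standard z-score under normally distributed data; it is an assumption here, not something the method above prescribes. The `robust_z` name and the shape of the history list are likewise illustrative.

```python
from statistics import median


def robust_z(current, history):
    """MAD-based z-score of the current bucket against a host's own history.

    history holds the same host's distinct counts from earlier hour buckets.
    Returns None when the MAD is zero and the score is undefined.
    """
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        return None
    # ASSUMPTION: 0.6745 rescales MAD to be comparable with a standard deviation.
    return 0.6745 * (current - med) / mad


# One extreme value (300) in the history barely moves the median or the MAD.
history = [4, 6, 5, 7, 5, 6, 300, 5]
print(robust_z(6, history))    # an ordinary hour
print(robust_z(150, history))  # a suspicious hour
```

Because the median and MAD are insensitive to the single outlier in the history, an ordinary hour still scores low and a suspicious hour still scores very high.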
Enrichments for triage
Once an alert fires, attach context. None of these change the score. Analysts charged with reading your outputs will love you for it.
| Triage value | Enrichment | Source | Why |
|---|---|---|---|
| High | Sample of the actual queries (first N domain names) | Live query | Direct visual confirmation that the labels look like encoded data. |
| High | Duration and velocity (first/last timestamp, queries per second/minute/hour) | Live query | Whether activity filled the hour or burst inside it, plus channel cadence. |
| High | Estimated exfil capacity | Live query | Per-alert version of the gauging table earlier. Distinct count multiplied by the per-query estimate. Drives response priority. |
| High | Persistence (consecutive anomalous buckets for the same group) | Live query | Tunneling sessions usually run across many hours. Legitimate bursts produce one anomaly and stop. |
| Moderate | Process metadata (full image path, signing status) | Live query | A signed Microsoft binary, an unsigned native binary, and a known LOLBIN are three different triage paths. |
| Moderate | Subdomain entropy (Shannon entropy of leftmost labels, averaged across the bucket) | Live query | Shannon entropy measures how unpredictable the characters in a string are. Encoded data scores high; pronounceable names score lower. |
| Moderate | Domain age (WHOIS or passive DNS first-seen) | External enrichment | Recently registered domains shift the prior on tunneling. |
| Lower | Resolved IP geography and Autonomous System Number | External enrichment | Color for triage. |
| Lower | Host metadata (OS, owner, organizational unit, location) | Maintained reference dataset | Routine triage info. Gets the alert in front of the right team faster. |
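
Two of the live-query enrichments are simple enough to sketch inline. The helper names below are hypothetical, and the 60-byte default is the realistic per-query estimate from the gauging section, not a measured value.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy of a string in bits per character."""
    n = len(label)
    counts = Counter(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mean_label_entropy(qnames) -> float:
    """Average entropy of the leftmost labels in one bucket."""
    labels = [q.split(".")[0] for q in qnames]
    return sum(shannon_entropy(l) for l in labels) / len(labels)


def estimated_exfil_bytes(distinct_count: int, bytes_per_query: int = 60) -> int:
    """Distinct count multiplied by the per-query payload estimate."""
    return distinct_count * bytes_per_query


print(shannon_entropy("mail"))
print(shannon_entropy("mxq3jk7p9zr4w2nv8f5t1ybd6h"))
print(estimated_exfil_bytes(1800))
```

Entropy depends on label length as well as randomness, so it is most useful for comparing labels of similar length or as one signal among several, which is why it sits in the moderate tier.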