| Internet-Draft | DNS Work Amplification | July 2026 |
| Duan, et al. | Expires 2 January 2027 | [Page] |
Recursive DNS resolvers are expected to bound the amount of "work" performed when answering a client query. DNS specifications discuss such bounds but leave key concepts, accounting rules, and safe limits underspecified. This leeway has led to divergent implementations and makes systematic bounding difficult, contributing to denial‑of‑service attacks that amplify resolver work and harm different DNS components. This document describes the problem space of DNS work amplification. It defines terminology for discussing work performed during a single resolution instance, develops a taxonomy of work‑amplification vulnerabilities along resource and mechanism axes, and analyses where existing DNS specifications leave amplification‑relevant behavior underspecified. The document is descriptive: it does not itself specify protocol changes or operational requirements.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 2 January 2027.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
DNS specifications have long warned that resolvers need to bound the amount of work performed per client query in order to avoid runaway loops and chain reactions (see [RFC1034] and [RFC1035]). In practice, a single resolution can exercise multiple interacting mechanisms, including alias processing, delegation traversal, retries and failover, caching, and DNSSEC validation. Many RFCs specify functional behavior but leave resource‑relevant properties underspecified or inconsistently defined: which internal actions are counted toward a bound, whether and when counters reset, what defaults are safe, and what constitutes relevant data for resolvers to process.¶
Over the past several years, a series of disclosures and research results have shown that protocol‑compliant but adversarially crafted authoritative zones and query/response patterns can drive disproportionate resource consumption during resolution, stressing DNS infrastructure. Published examples include NXNSAttack [AFEK2020-NXNSATTACK], TsuNAME [MOURA2021-TSUNAME], KeyTrap [HEFTRIG2024-KEYTRAP], and CAMP [DUAN2024-CAMP]. The underlying amplification potential is largely inherent to DNS design: resolution may legitimately require multiple upstream transactions, may traverse delegation structures, may restart at aliases, and may validate cryptographic proofs. Specification gaps and non‑uniform guidance further increase this underlying complexity by making worst‑case behavior harder to predict and by encouraging divergent accounting and limit choices across implementations.¶
This document addresses this class of availability vulnerability, referred to here as work amplification. A work‑amplification vulnerability allows an attacker to trigger disproportionate resource consumption (such as outbound resolver queries, on‑wire bytes, CPU, or resolver state) during the processing of a single client query (i.e., one resolution instance). In this document, work is the resources consumed while processing one resolution instance and is treated as a vector of resource dimensions (Section 4).¶
Work amplification differs from classic traffic reflection and amplification attacks. Reflection primarily amplifies on‑wire bytes delivered to an external victim and typically relies on source‑address spoofing; mitigations such as anti‑spoofing and response rate limiting address that threat. Work amplification primarily stresses DNS components and nearby infrastructure by amplifying their internal work (e.g., query load, validation CPU, and per‑transaction state), and can be effective even without spoofing. Reflection mitigations are therefore insufficient on their own; bounding work requires explicit budgets, accounting rules, and operational anomaly handling. This document is complementary to existing BCP guidance for reflection mitigation (e.g., [BCP38], [BCP84], and [BCP140]).¶
The goals of this document are:¶
to define a terminology for discussing DNS work amplification, including the notions of resolution instance, resolver query, work budget, and budget exhaustion;¶
to develop a two‑axis taxonomy of work‑amplification vulnerabilities along resource and mechanism dimensions; and¶
to identify where existing DNS specifications leave amplification‑relevant behavior (such as bounding, accounting, and safe defaults) underspecified, without prescribing remedies.¶
Protocol fixes and operational mitigation guidance are out of scope
for this document. Those topics are addressed in the companion
document draft-dnsop-work-amplification-bcp.¶
This document does not attempt to catalogue implementation bugs that rely on malformed packets or memory corruption. Those issues are important, but they are orthogonal to the protocol‑level behaviors considered here.¶
Unless otherwise noted, the terms in this section are used in the sense described in [RFC9499].¶
Client query: A DNS query received by a resolver from a stub resolver or other downstream client.¶
Resolver: A DNS component that performs resolution on behalf of a client query. This includes full recursive resolvers and forwarders that themselves perform recursion.¶
Authoritative server: A DNS server that is authoritative for at least one zone and answers queries for names within that zone.¶
Resolution instance: The entire resolution process undertaken by a resolver in response to a single client query. A resolution instance encompasses all resolver queries, alias-induced restarts, delegation traversal steps, auxiliary lookups (e.g., nameserver address fetches and DNSSEC chain construction), retries, and transport fallback operations.¶
A resolver can coalesce multiple identical client queries (including client retransmissions) onto a single in-flight resolution instance; in that case the work budget applies to the shared instance.¶
Resolver query: A DNS query emitted by a resolver to another resolver or authoritative server during a resolution instance. This includes queries for nameserver addresses (A/AAAA), DNSKEY/DS/NSEC* records, and any queries triggered by extension processing. Retransmissions and retries are counted as separate resolver queries.¶
Restart: The act of continuing a resolution instance for the same client query while changing the current target name due to alias processing (e.g., following a CNAME or DNAME). In RFC 1034 terms, this corresponds to the resolver algorithm's alias handling: substituting a new name for the current name being resolved and re-entering the search procedure at the new target name. The resolver can reuse cached state across restarts, but each restart represents additional work that is accounted for.¶
Delegation step: A transition within a resolution instance where the resolver processes a referral at a zone cut and updates its candidate authoritative server set for a child zone. In RFC 1034 terms, this corresponds to processing an NS RRset and updating the resolver's list of candidate servers ("SLIST") for the next zone cut. Each delegation step represents additional work that is accounted for.¶
Work budget: A per-resolution-instance set of counters that collectively bound the amount of work the resolver performs.¶
Budget exhaustion: The event that occurs when any work-budget counter reaches its configured limit. When budget exhaustion occurs, the resolver stops expanding the resolution instance. The resolver typically returns an error to the client (usually SERVFAIL); under severe overload, a resolver may shed load by dropping responses.¶
Max-work event: A client query and associated upstream response pattern that causes a resolver to exhaust its work budget under the configured limits.¶
Work-amplification vulnerability: A protocol behavior or feature interaction that allows an attacker, via protocol‑compliant authoritative data and/or query patterns, to cause disproportionate resource consumption (queries, bytes, CPU, or state) while processing a single client query. Amplification potential is often exacerbated when specifications omit, loosen, or inconsistently define the bounds and accounting rules needed to make worst‑case behavior predictable.¶
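The relationship between a work budget and budget exhaustion can be sketched in a few lines of Python. This is a minimal illustration only: the class name, the counter names, and the numeric limits are hypothetical and are not taken from this document or from any RFC. The sketch charges each unit of work before it is performed and treats a counter that has reached its limit as exhausted.

```python
from dataclasses import dataclass, field


class BudgetExhausted(Exception):
    """Signals that a work-budget counter has no remaining allowance."""


@dataclass
class WorkBudget:
    """Per-resolution-instance counters (names and limits are hypothetical)."""

    limits: dict = field(default_factory=lambda: {
        "resolver_queries": 32,   # retransmissions and retries count separately
        "restarts": 8,            # alias-induced restarts (CNAME/DNAME)
        "delegation_steps": 16,   # referrals processed at zone cuts
    })
    used: dict = field(default_factory=dict)

    def charge(self, counter: str, amount: int = 1) -> None:
        """Account for work before performing it; refuse work past the limit."""
        limit = self.limits[counter]  # KeyError for an unknown counter
        total = self.used.get(counter, 0) + amount
        if total > limit:
            raise BudgetExhausted(counter)
        self.used[counter] = total

    def exhausted(self) -> bool:
        """True once any counter has reached its configured limit."""
        return any(self.used.get(name, 0) >= limit
                   for name, limit in self.limits.items())
```

In such a design, a resolver would create one `WorkBudget` per resolution instance (shared by any coalesced client queries) and would map `BudgetExhausted` to an error response such as SERVFAIL.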
Work amplification becomes exploitable when an adversary can both (1) influence the authoritative data and server behaviors consulted during a resolution instance so that a resolver following protocol rules performs unusually large work, and (2) trigger resolution instances that exercise those behaviors at one or more target resolvers.¶
Accordingly, this document assumes an attacker with the following core capabilities:¶
Adversarial authoritative inputs: the attacker can publish and serve protocol‑compliant DNS data for one or more zones and can choose response behaviors that remain within protocol syntax. This includes crafting delegations, aliases, glue (or the absence of glue), DNSSEC material, and record sets that induce additional dependency lookups. It also includes response patterns that influence resolver control flow (e.g., selective omission of optional data, returning referrals, and controlled use of truncation) while remaining syntactically valid.¶
Triggering resolution: the attacker can cause client queries to be sent to one or more target resolvers for names whose resolution depends on the adversarial inputs. This may be achieved by sending queries directly, embedding attacker‑controlled names in web resources or email, or leveraging clients that use the target resolvers.¶
Some amplification vectors can also be triggered by querying existing misconfigured zones; attacker‑controlled zones make worst‑case behavior reliable and repeatable.¶
Depending on the scenario, the attacker may additionally have:¶
Distributed clients: enough sources to sustain a desired request rate or to evade per‑client limits.¶
Multiple cooperating authoritative servers/zones: the ability to spread an amplification pattern across multiple zones or servers, which can bypass some per‑upstream operational limits.¶
Source‑address spoofing: while not required for work amplification, spoofing can hinder attribution and can bypass some per‑client defences in deployments where spoofing remains possible.¶
Work amplification can be exacerbated by packet loss and delay (benign or adversarial), because retries and failover can amplify work. This document does not assume an on‑path adversary; on‑path manipulation is out of scope except insofar as it resembles loss/delay that a resolver may experience in normal operation.¶
Off-path side-channel attacks against DNS resolution (e.g., SAD DNS [MAN2020-SADDNS]) are out of scope as an independent threat, but amplified resolution work can compose with such side channels by enlarging the observable surface (more queries, retries, and state transitions) that off-path observers can probe.¶
Depending on the attack construction, different DNS components can be the primary victim (the component whose resources are intentionally exhausted). Other components may act as amplifiers or experience collateral damage.¶
This document distinguishes the following common victim roles:¶
Target resolver (primary victim): a recursive or forwarding resolver is directly targeted. The attacker sends client queries that cause the resolver to consume disproportionate CPU (including DNSSEC validation), recursion state, cache resources, and/or outbound query capacity.¶
Target authoritative server (primary victim): an authoritative server is targeted by inducing many resolvers to generate amplified resolver queries toward it. The attacker typically uses attacker‑controlled authoritative infrastructure to publish an amplification pattern that requires resolvers to fetch many dependencies (e.g., nameserver‑address lookups) from the victim.¶
Uninvolved third‑party server (primary victim): a server that is not under the attacker’s control and not otherwise part of the intended resolution can become the primary victim if resolvers are induced to direct dependency lookups to it by name (e.g., NS targets under the victim’s zone) or by address (e.g., glue addresses pointing at the victim). The victim is often an authoritative server but can also be any DNS‑speaking service reachable at the induced address.¶
Adjacent infrastructure and clients (collateral victims): work amplification can saturate links, overload stateful middleboxes (e.g., NATs, firewalls, load balancers), increase latency, and cause budget‑exhaustion errors for benign clients.¶
The most common primary targets are resolvers that perform the amplified resolution work and authoritative servers that receive the resulting amplified resolver queries. In these cases, we refer to the attack as self-amplification. This distinguishes the attack from conventional reflective amplification, where third-party servers send amplified responses to a spoofed victim address.¶
The taxonomy in Section 4 abstracts attacks into resource and mechanism categories. This section provides non‑exhaustive end‑to‑end scenarios, organised by the primary victim role (Section 3.1). Many scenarios have secondary victims (e.g., authoritative servers receive expanded queries).¶
Target resolver (resolver as primary victim). An attacker operates one or more authoritative zones and publishes well‑formed content that forces a resolver into expensive control flow. By inducing clients to resolve names in those zones (or querying directly), the attacker causes the target resolver to follow alias chains, traverse multiple delegations, perform QNAME‑minimisation probes, fetch nameserver addresses, and validate DNSSEC material. The resolver expands the resolution instance until the configured work budgets are exhausted, degrading service for other clients.¶
Target authoritative server (authoritative as primary victim). An
attacker publishes a delegation in an attacker‑controlled zone whose NS
targets are names under the victim authoritative server’s zone (e.g.,
ns1.victim.example, ns2.victim.example, ...), with little or no glue.
The attacker then sends initiating client queries to many resolvers.
Each resolver must resolve the NS target names to addresses,
sending many A/AAAA resolver queries to the victim authoritative server.
Aggregated across many resolvers, the victim receives amplified query load
that is disproportionate to the attacker’s initiating traffic.¶
Uninvolved third‑party server and infrastructure stress. This scenario resembles the preceding target‑authoritative‑server scenario, but the amplified dependency traffic is directed at an external victim that is not authoritative for the attacker’s zone. The attacker induces resolvers to send dependency traffic to a victim that is not under the attacker’s control, for example by providing in‑bailiwick glue that points to a victim IP address or by referencing many random names under a victim zone. The induced traffic can overload the victim server even though its responses do not influence the attacker‑controlled zone’s correctness. When combined with loss or delay, retries, failover, and transport fallback can further increase outbound queries, wire bytes, and connection state, stressing links and stateful devices along the path.¶
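For the scenario in which an authoritative server is the primary victim, a back-of-envelope calculation shows how the query load scales. The sketch below is illustrative only. It assumes that every resolver looks up every glueless NS target once per address type, that nothing is served from cache, and that the attacker sends one initiating client query per resolver. The function names and example numbers are hypothetical.

```python
def induced_queries(ns_names: int, resolvers: int,
                    addr_types: int = 2, attempts: int = 1) -> int:
    """Resolver queries arriving at the victim, summed over all resolvers.

    Assumes every resolver looks up every NS target for each address type
    (A and AAAA by default) and sends each lookup `attempts` times.
    """
    return ns_names * addr_types * attempts * resolvers


def amplification_factor(ns_names: int, addr_types: int = 2,
                         attempts: int = 1) -> int:
    """Victim-bound queries per initiating client query (one per resolver)."""
    return ns_names * addr_types * attempts


# 20 glueless NS targets, 1000 resolvers, one attempt per lookup.
print(induced_queries(20, 1000))   # 40000
print(amplification_factor(20))    # 40
```

Real resolvers cache, coalesce, and cap such lookups to varying degrees, so these figures are an upper bound under the stated assumptions rather than a measurement.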
This section defines the two‑axis classification used throughout this document. Each vulnerability is labelled with one or more resource amplification labels and mechanism labels as defined below. By separating what resource is amplified from how it is amplified, we avoid overlapping categories and provide a stable basis for both protocol clarifications and operational mitigations.¶
The two axes are intended to be orthogonal: the resource axis (R) describes what resource is amplified, while the mechanism axis (M) describes how amplification is induced. Because the same mechanism can amplify multiple resources (for example, delegation traversal can drive both resolver queries and wire bytes), individual vulnerabilities are typically labelled with a non-empty subset of each axis. Treating the two axes independently keeps the categories non-overlapping at the axis level even when individual labels co-occur in concrete attacks.¶
The label sets in the following two subsections are not claimed to be exhaustive. An issue is included when it has been observed in deployed resolver behavior, demonstrated in published research, or discussed in prior IETF or operational work; emerging mechanisms and unobserved compositions are likely to require additional labels in future revisions. The taxonomy is descriptive, not prescriptive: the goal is to give readers a stable vocabulary for discussing work-amplification vulnerabilities rather than to enumerate every possible vector.¶
R1 - Resolver-query amplification: A client query triggers an excessive number of resolver queries during a resolution instance, including nameserver-address lookups, DNSSEC-related lookups, and transport retries. R1 covers both the count of resolver queries and the concurrency of outstanding resolver queries.¶
R2 - Wire bytes amplification: A client query results in disproportionate bytes on the wire, either upstream (resolver to authoritative) or downstream (authoritative to resolver or resolver to client). Examples include responses with large RRsets or many RRsets, repeated retries, and TCP fallback due to truncation.¶
R3 - Computation amplification: The amount of CPU or wall‑clock time spent by the resolver is disproportionately high. Parsing large responses, following complex rewrite patterns, and performing cryptographic DNSSEC validation all drive R3.¶
R4 - State amplification: A query causes the resolver to allocate disproportionate in‑memory state—either per‑resolution structures (e.g., per-instance dependency tracking) or longer-lived cache state. R4 often accompanies R1-R3 when large numbers of RRsets are processed or stored.¶
NOTE: R1 and R2 are intentionally distinct labels on the resource axis. Many attacks amplify both, but they can occur independently: a single oversized response can trigger R2 without many resolver queries, while many small resolver queries (e.g., for dependencies) can trigger R1 without large responses. Implementations typically account for both dimensions.¶
M1 - Alias and restart semantics: Amplification enabled by aliasing and restart behavior. CNAME and DNAME chains can trigger multiple restarts. Common underspecified aspects include limits on chain length, loop detection, and how work budgets are applied across restarts.¶
M2 - Delegation traversal and nameserver discovery: Issues arising from delegation and referral processing. Examples include glueless referrals, out‑of‑bailiwick nameserver names, and deep or cyclic delegation structures. Missing limits on delegation depth or the number of nameserver addresses to fetch contribute directly to amplification. This label also covers query-name probing behavior during delegation traversal, such as QNAME minimisation ([RFC9156]), which can increase resolver queries for deep names if not bounded.¶
M3 - Server selection, retries and concurrency: Amplification from query retry and failover logic. Underspecified retry limits and concurrency policies can cause explosive growth in outstanding queries when resolvers attempt to contact multiple servers or fall back to other transports or protocols.¶
M4 - Response content multiplicity and section semantics: Amplification enabled by large or numerous RRsets and by the handling of response sections (answer, authority, additional). Large RRsets, many RRsets, or records with many RDATA entries can inflate parsing, validation, and caching work.¶
M5 - DNSSEC validation and proof semantics: DNSSEC introduces additional queries (to fetch DNSKEY and DS RRsets) and cryptographic validation work. Underspecified aspects include whether DNSSEC queries count toward budgets, limits on cryptographic verification work, and bounding NSEC3 parameters and candidate key sets to mitigate algorithmic-complexity DNSSEC attacks.¶
M6 - Cache admission, relevance and maintenance: The rules governing what a resolver may cache, for how long, and when to evict entries are only partially defined. Attackers can fill caches with irrelevant or malicious records, causing memory exhaustion or eviction of useful data. Negative caching and bailiwick rules are also relevant here.¶
M7 - Extension‑induced secondary resolution: New DNS extensions sometimes implicitly trigger additional lookups or state (for example, records that reference additional names that must be resolved, or transport and discovery mechanisms that initiate extra exchanges). If resource bounds are not clearly specified, these features can open new work‑amplification vectors.¶
Each vulnerability analysed in Section 5 is annotated with one or more resource and mechanism labels along with typical victims and preconditions. This labelling enables readers to map specification ambiguities to corresponding resource budgets and to determine which mitigations are most appropriate.¶
This subsection provides an informative mapping from a few well-known work-amplification attacks to the taxonomy in this document. The mapping is illustrative rather than exhaustive; individual attacks often have variants that exercise additional mechanisms.¶
| Attack (informative name) | Typical resource labels | Typical mechanism labels |
|---|---|---|
| NXNSAttack | R1, R2 | M2, M3 |
| TsuNAME | R1 | M1, M2 |
| KeyTrap | R3 | M5 |
| CAMP | R1-R4 | M1-M4 |
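The two-axis labelling lends itself to a simple data representation. The following sketch encodes the informative table above using Python flag enumerations so that attacks can be looked up by label. The type and function names are hypothetical, and the mapping carries the same illustrative, non-exhaustive status as the table itself.

```python
from enum import Flag, auto


class Resource(Flag):
    R1 = auto()  # resolver-query amplification
    R2 = auto()  # wire bytes amplification
    R3 = auto()  # computation amplification
    R4 = auto()  # state amplification


class Mechanism(Flag):
    M1 = auto()  # alias and restart semantics
    M2 = auto()  # delegation traversal and nameserver discovery
    M3 = auto()  # server selection, retries and concurrency
    M4 = auto()  # response content multiplicity and section semantics
    M5 = auto()  # DNSSEC validation and proof semantics
    M6 = auto()  # cache admission, relevance and maintenance
    M7 = auto()  # extension-induced secondary resolution


R, M = Resource, Mechanism

# Typical labels per attack, copied from the informative table.
ATTACKS = {
    "NXNSAttack": (R.R1 | R.R2, M.M2 | M.M3),
    "TsuNAME": (R.R1, M.M1 | M.M2),
    "KeyTrap": (R.R3, M.M5),
    "CAMP": (R.R1 | R.R2 | R.R3 | R.R4, M.M1 | M.M2 | M.M3 | M.M4),
}


def attacks_using(mechanism: Mechanism) -> list:
    """Names of attacks whose typical mechanism labels include `mechanism`."""
    return sorted(name for name, (_, mechs) in ATTACKS.items()
                  if mechs & mechanism)


print(attacks_using(M.M2))  # ['CAMP', 'NXNSAttack', 'TsuNAME']
```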
This section connects the taxonomy in Section 4 to existing DNS specifications. Work amplification is often inherent to DNS design: recursive resolution deliberately combines delegation traversal, alias following, caching, retries, transport fallback, and (in many deployments) DNSSEC validation. Those mechanisms can legitimately require substantial work in some cases.¶
The focus here is narrower: it identifies where specification text and non-uniform guidance can make worst‑case work harder to predict and harder to bound, and therefore make implementation and operational mitigations less systematic. Concretely, this section highlights places where RFCs:¶
do not clearly identify which work needs to be bounded for a single client query;¶
do not define what counts toward those bounds, or when counters reset;¶
do not provide (or align on) safe default limits; or¶
introduce feature interactions that change work without calling out the associated bounding and accounting implications.¶
DNS behavior relevant to work amplification is specified across a large set of RFCs, and relevant text is often scattered. Reproducing all relevant material verbatim would be both impractical and potentially misleading without the original context.¶
For each issue below, this document therefore provides two reference lists:¶
Primary normative anchors (complete within scope): the RFC sections that most directly define the base protocol behavior for the issue. These are the first places implementers and reviewers should consult.¶
Additional related documents (non‑exhaustive): other RFCs, BCPs, and active Internet‑Drafts that refine the behavior, introduce related mechanisms, or provide operational guidance.¶
Each issue discussion then explains (a) what the anchor text requires, (b) what is left to implementation or local policy, and (c) how that leeway can exacerbate the amplification vectors in Section 4.¶
An issue (A-*) is included in this section when published specification text leaves leeway in bounding, accounting, or safe defaults that directly enables one or more of the resource and mechanism vectors catalogued in Section 4. Issues that arise solely from implementation defects (e.g., bugs in parsing or memory handling) are excluded; the focus is on specification-level ambiguity that propagates predictably across conformant implementations.¶
The issues are grouped to mirror the mechanism axis (M1-M7) of the taxonomy: A-1 covers the cross-cutting work-budget framing, while A-2 through A-8 align with the alias, delegation, retry, response-content, DNSSEC, caching, and extension mechanisms respectively. This grouping lets readers move between an attack's mechanism labels and the specification anchors most relevant to clarifying it.¶
The list of issues is not claimed to be exhaustive. As with the label sets in Section 4, additional issues are likely to be identified as new DNS extensions are deployed and as further work-amplification research is published; the issues here are the ones for which sufficient deployment or research evidence currently exists to motivate inclusion.¶
Issue identifier: A‑1¶
Primary normative anchors (complete within scope):¶
[RFC1034] §5.3.3 — describes the resolver top‑level algorithm and notes that resolvers should bound the work performed for a single request, but does not formalise a complete accounting model.¶
[RFC1035] §7.1-§7.2 — sketch resolver implementation considerations, including a per‑request work counter (§7.1) and query sending, server selection, and retry handling (§7.2), while leaving most accounting choices to implementations.¶
Additional related documents (non‑exhaustive):¶
[RFC2181] (RRset handling and ranking), [RFC2308] (negative caching), [RFC4033]-[RFC4035] (DNSSEC), [RFC5155] and [RFC9276] (NSEC3 and operational guidance), [RFC7766] (DNS over TCP), [RFC8198] (aggressive use of DNSSEC‑validated cache), [RFC8767] (serve‑stale), [RFC9156] (QNAME minimisation), and [I-D.fujiwara-dnsop-dns-upper-limit-values] (upper‑limit values).¶
What is specified:¶
The base DNS specifications ([RFC1034] §5.3.3 and [RFC1035] §7.1-§7.2) acknowledge that resolvers must place bounds on the work performed on behalf of a client query and sketch one approach (e.g., a per‑request counter). They also define multiple mechanisms that can legitimately trigger additional resolver queries and processing.¶
What is left to implementation or policy:¶
The base specifications do not define a concrete accounting model for work, including:¶
which internal actions are charged to the same budget as the client query (e.g., nameserver address discovery, DNSSEC validation fetches, retries and failover, transport fallback, QNAME‑minimisation probes);¶
whether and when counters reset (e.g., on alias‑induced restarts);¶
whether independent limits compose safely (e.g., separate limits for restarts, depth, and outstanding queries without an overall envelope); and¶
what safe defaults are for modern resolver deployments.¶
These choices materially affect worst‑case work and therefore vary across implementations.¶
Work‑amplification relevance:¶
When budgets are incomplete or inconsistently applied, an attacker can force expensive combinations of mechanisms (e.g., restarts × delegation traversal × retries × validation), increasing work along R1-R4 through any of the mechanisms M1-M7.¶
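The composition concern can be made concrete with a small calculation. The sketch below assumes a simplified worst case in which independently enforced limits nest, so that the total number of resolver queries is bounded only by their product. The limit names and values are hypothetical and are not recommendations.

```python
from math import prod

# Hypothetical per-mechanism limits, each enforced independently.
LIMITS = {
    "alias_targets": 8,        # names visited through alias restarts
    "delegation_steps": 4,     # referrals followed per name
    "ns_address_lookups": 5,   # nameserver names resolved per referral
    "attempts": 3,             # transmissions per lookup
}


def worst_case_queries(limits: dict) -> int:
    """Multiplicative upper bound when the limits nest and nothing caps the total."""
    return prod(limits.values())


def with_envelope(limits: dict, envelope: int) -> int:
    """Same bound when one overall per-instance counter is also enforced."""
    return min(worst_case_queries(limits), envelope)


print(worst_case_queries(LIMITS))   # 480
print(with_envelope(LIMITS, 100))   # 100
```

Each limit looks modest on its own, yet the nested worst case is 480 resolver queries for one client query; an overall envelope caps the total regardless of how the individual limits interact.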
Issue identifier: A‑2¶
Primary normative anchors (complete within scope):¶
[RFC1034] §3.6.2 and §5.3.3 — describe CNAME processing and the resolver‑algorithm restart step, but do not bound chain length or loop‑handling work as part of a single client query's budget.¶
[RFC1035] §3.3.1 — defines CNAME RR semantics without specifying resolver‑side restart accounting.¶
[RFC6672] §3 — defines DNAME substitution semantics and CNAME synthesis without specifying combined alias‑chain bounds.¶
Additional related documents (non‑exhaustive):¶
[RFC2181] (clarifications), [RFC8020] (NXDOMAIN termination under empty non-terminals), and resolver‑implementation guidance and limits such as [I-D.fujiwara-dnsop-dns-upper-limit-values].¶
What is specified:¶
Alias records (CNAME and DNAME) intentionally redirect resolution and can cause the resolver to restart its search for a different name while still answering the original client query.¶
What is left to implementation or policy:¶
Specifications do not provide consistent, protocol‑level guidance on:¶
bounding alias chain length and alias‑induced restarts as part of a single client query’s work;¶
detecting and handling alias loops across administrative boundaries;¶
how restarts interact with other bounds (depth, outstanding queries, timeouts); and¶
whether partial state can be reused safely across restarts (which in practice affects whether work is repeated).¶
Work‑amplification relevance:¶
Alias chains and loops can multiply resolver queries (R1) and state tracking (R4), and can be composed with delegation traversal (M2) and retries (M3). Ambiguous or non‑uniform restart accounting can make these worst cases larger. Cyclic CNAME and NS dependencies, as in the TsuNAME phenomenon [MOURA2021-TSUNAME], illustrate how alias and delegation interactions in deployed zones can amplify queries to authoritative servers.¶
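A minimal sketch of bounded alias following with loop detection is given below. It is illustrative only: the `cname_map` dictionary stands in for data that a real resolver would obtain through resolver queries, names are assumed to be already normalised to a single case, DNAME substitution is not modelled, and the `max_restarts` default is an arbitrary value rather than a recommended limit.

```python
class AliasLoop(Exception):
    """The alias chain revisits a name already seen in this instance."""


class RestartLimitExceeded(Exception):
    """The alias chain needs more restarts than the configured limit."""


def follow_aliases(qname: str, cname_map: dict, max_restarts: int = 8):
    """Follow CNAME-style aliases from qname.

    Returns (final_name, restarts). `cname_map` maps an owner name to its
    alias target and is a hypothetical stand-in for authoritative data.
    """
    seen = {qname}
    current = qname
    restarts = 0
    while current in cname_map:
        target = cname_map[current]
        if target in seen:
            raise AliasLoop(target)
        if restarts >= max_restarts:
            raise RestartLimitExceeded(current)
        seen.add(target)
        current = target
        restarts += 1  # each restart is additional, accounted-for work
    return current, restarts


chain = {"a.example.": "b.example.", "b.example.": "c.example."}
print(follow_aliases("a.example.", chain))  # ('c.example.', 2)
```

The `seen` set detects loops only within one resolution instance; whether the restart counter is shared with other bounds is exactly the kind of accounting choice that the specifications leave open.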
Issue identifier: A‑3¶
Primary normative anchors (complete within scope):¶
[RFC1034] §4.3 and §5.3.3 — describe the delegation model and iterative search, including nameserver‑address discovery, but do not bound dependent‑lookup fan‑out per client query.¶
[RFC1035] §3.3.11 (NS RR semantics) and §4.1 (message sections) — define the on‑wire encoding of referrals and additional‑section glue without prescribing resolver‑side admission or fetch limits.¶
[RFC2181] §5.4.1 and §6 — clarify RRset ranking, zone cuts, and glue handling, but leave out‑of‑bailiwick and missing‑glue policy to resolver implementations.¶
Additional related documents (non‑exhaustive):¶
[RFC9156] (QNAME minimisation) and operational guidance on bailiwick and glue handling (e.g., [RFC2181] and DNSOP operational documents).¶
What is specified:¶
Delegation traversal is fundamental to DNS. Referrals provide NS RRsets, and resolvers may need to obtain IP addresses for nameservers (via glue or by resolving nameserver names).¶
What is left to implementation or policy:¶
Specifications leave substantial leeway that affects worst‑case work, including:¶
how aggressively to pursue nameserver address discovery when glue is missing or inconsistent (including out‑of‑bailiwick and cyclic cases);¶
how to bound and account for the resulting dependent lookups as part of the client query's budget;¶
how to bound "fan‑out" across multiple nameservers and multiple referral steps; and¶
how QNAME minimisation changes the number and structure of delegation queries, and how those extra queries are bounded and coalesced.¶
Work‑amplification relevance:¶
Delegation traversal and nameserver discovery are a primary source of resolver‑query fan‑out (R1) and wire bytes (R2) and interact strongly with restarts (M1) and retries/failover (M3). NXNSAttack [AFEK2020-NXNSATTACK] is a concrete instance of this fan‑out, in which glueless delegations to many nameserver names force resolvers to issue dependent A/AAAA queries. Such delegation fan‑out can compose with alias restarts (M1), retries (M3), and DNSSEC validation (M5) to reach high query counts per client query.¶
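One of the local policy choices described above, how many missing nameserver addresses to pursue for a single referral, can be sketched as follows. This is a hypothetical helper, not behavior required by any specification: the function name, the `glue` mapping, and the `max_fetches` parameter are assumptions made for illustration.

```python
def plan_address_fetches(ns_names: list, glue: dict, max_fetches: int):
    """Split glueless NS names into those to resolve now and those deferred.

    `glue` maps a nameserver name to a list of addresses supplied in the
    referral; names with no usable glue require dependent A/AAAA lookups.
    Returns (to_fetch, deferred), preserving the referral's order.
    """
    missing = [name for name in ns_names if not glue.get(name)]
    return missing[:max_fetches], missing[max_fetches:]


referral = [f"ns{i}.victim.example." for i in range(1, 6)]
glue = {"ns1.victim.example.": ["192.0.2.1"]}
to_fetch, deferred = plan_address_fetches(referral, glue, max_fetches=2)
print(to_fetch)   # ['ns2.victim.example.', 'ns3.victim.example.']
print(deferred)   # ['ns4.victim.example.', 'ns5.victim.example.']
```

Without such a cap, the number of dependent lookups per referral is set by the size of the NS RRset, which the publisher of the delegation controls.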
Issue identifier: A‑4¶
Primary normative anchors (complete within scope):¶
[RFC1034] §5.3.3 — describes server selection and the resolver's behavior after timeouts in the iterative algorithm, but does not prescribe concrete concurrency or retry budgets.¶
[RFC1035] §7.2 — provides resolver implementation considerations including request retry handling, while leaving timing, fan‑out, and failover policy to implementations.¶
Additional related documents (non‑exhaustive):¶
What is specified:¶
Resolvers retry queries on timeouts and may query multiple servers for robustness ([RFC1034] §5.3.3 and [RFC1035] §7.2). Modern resolvers also issue queries in parallel for performance and implement transport fallback (e.g., UDP to TCP).¶
What is left to implementation or policy:¶
Specifications do not define interoperable guidance for:¶
concurrency limits (outstanding resolver queries) as part of a per‑query budget;¶
retry timing, backoff, and failover policies under adversarial load;¶
whether and how to coalesce identical in‑flight work across multiple client queries; and¶
how to bound transport fallback behavior (e.g., repeated TCP fallback to unresponsive servers).¶
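Two of the policy choices listed above can be sketched in a few lines of Python. The fragment below is illustrative only: `retry_schedule` bounds retries both in count and in total waiting time using exponential backoff, and `InFlightTable` coalesces identical in-flight upstream queries across client queries. Both names, the key layout, and every numeric default are hypothetical and are not taken from any RFC or implementation.

```python
def retry_schedule(initial_timeout=0.4, factor=2.0, max_attempts=3, max_total=3.0):
    """Return per-attempt timeouts in seconds, bounded in count and total time."""
    schedule, total, timeout = [], 0.0, initial_timeout
    for _ in range(max_attempts):
        if total + timeout > max_total:
            break  # another attempt would exceed the overall time allowance
        schedule.append(timeout)
        total += timeout
        timeout *= factor  # exponential backoff between attempts
    return schedule


class InFlightTable:
    """Tracks outstanding upstream queries so identical work is sent once."""

    def __init__(self):
        self._waiters = {}

    def join(self, key, client_id):
        """Register interest in key; True means the caller must send the query."""
        first = key not in self._waiters
        self._waiters.setdefault(key, []).append(client_id)
        return first

    def complete(self, key):
        """Remove key and return every client waiting on its result."""
        return self._waiters.pop(key, [])
```

With the defaults shown, `retry_schedule()` yields three attempts with timeouts of 0.4, 0.8, and 1.6 seconds, and raising `max_attempts` alone does not add attempts because the total-time bound applies first. A key such as `("example.com.", "A", "192.0.2.53")` is one possible choice for `InFlightTable`; which fields belong in the key is itself a policy decision.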
Work‑amplification relevance:¶
Retry and failover logic can turn loss, delay, or adversarial response patterns into additional resolver queries (R1) and state (R4). Under stress, naive retry policies can also cause "query storms" that amplify impact on upstream authoritative infrastructure.¶
Issue identifier: A‑5¶
Primary normative anchors (complete within scope):¶
[RFC1035] §4.1-§4.2 — define DNS message format and the semantics of Question, Answer, Authority, and Additional sections, but do not bound per‑response processing work.¶
[RFC2181] §5.4.1 — defines RRset ranking and credibility rules but leaves cache‑admission limits to implementations.¶
[RFC2181] §6 — clarifies zone cuts and glue handling without bounding admission of large or repeated additional‑section data.¶
Additional related documents (non‑exhaustive):¶
[RFC6891] (EDNS(0)), [RFC8482] (minimal ANY responses), [RFC7766] (TCP fallback), and DNSOP guidance on response minimisation and truncation behavior.¶
What is specified:¶
DNS messages have Answer, Authority, and Additional sections ([RFC1035] §4.1-§4.2), and specifications describe how resolvers interpret and cache data from those sections, including glue handling and basic credibility rules ([RFC2181] §5.4.1 and §6).¶
What is left to implementation or policy:¶
Specifications do not provide consistent guidance on:¶
bounding the amount of processing for large but well‑formed responses (record multiplicity, section size, name compression effects);¶
which kinds of Additional or Authority data are "relevant enough" to admit into cache or to act upon during the current resolution;¶
how to treat repeated or inconsistent data across multiple responses in the same resolution instance; and¶
how EDNS(0) buffer sizes and response truncation interact with query retry and TCP fallback budgets.¶
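To make the admission question concrete, the following Python sketch filters Additional-section records before they are cached or acted upon. It assumes one possible policy: only address records whose owner name lies within the zone being queried are admitted, and at most a fixed number per response. The function names, the record representation as `(owner, rtype, rdata)` tuples, and the limit of 16 are assumptions for illustration and do not reflect any specification text.

```python
def in_bailiwick(owner: str, zone: str) -> bool:
    """True if owner is at or below zone (case-insensitive, trailing dot ignored)."""
    owner = owner.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    return zone == "" or owner == zone or owner.endswith("." + zone)


def admit_additional(records, zone, max_records=16):
    """Return the Additional-section records admitted under the assumed policy.

    records: iterable of (owner, rtype, rdata) tuples from one response.
    zone: the zone whose server produced the response.
    """
    admitted = []
    for owner, rtype, rdata in records:
        if len(admitted) >= max_records:
            break  # per-response admission cap reached
        if rtype in ("A", "AAAA") and in_bailiwick(owner, zone):
            admitted.append((owner, rtype, rdata))
    return admitted
```

The suffix check compares whole labels, so `ns1.badexample.com.` is not treated as being inside `example.com.`. A real resolver would also need to decide how such a cap interacts with truncation and TCP fallback, which this sketch does not address.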
Work‑amplification relevance:¶
Large responses and permissive acceptance rules increase bytes processed and stored (R2/R4) and can induce additional follow‑up work (M4/M6), especially when combined with caching behavior.¶
Issue identifier: A‑6¶
Primary normative anchors (complete within scope):¶
[RFC4033] §3, [RFC4034] §2 (DNSKEY), §3 (RRSIG), and §5 (DS), and [RFC4035] §5 — define the DNSSEC trust model and the validator's chain‑of‑trust construction, which drives additional fetches and cryptographic work without bounding per‑query validation cost.¶
[RFC5155] §5 and §10.3 — define NSEC3 hash calculation and the iteration‑count guidance that together determine validation cost.¶
[RFC9276] — provides operational guidance for NSEC3 iteration counts but does not normatively bound per‑query validation work.¶
Additional related documents (non‑exhaustive):¶
[RFC6840] (DNSSEC clarifications and implementation notes), [RFC8198] (aggressive use of DNSSEC‑validated cache), and documents describing algorithmic complexity attacks against DNSSEC validation (informative).¶
What is specified:¶
Validating resolvers ([RFC4035] §5) perform additional fetches (e.g., DNSKEY/DS/NSEC*) and cryptographic work (e.g., RRSIG verification) to establish authenticity and denial of existence.¶
What is left to implementation or policy:¶
DNSSEC specifications and operational guidance do not fully align on:¶
how much validation work is acceptable per client query and how to account for it in a unified budget;¶
safe default bounds for attacker‑controlled parameters (e.g., number of candidate keys, signature multiplicity, and NSEC3 iteration cost); and¶
how validation work interacts with caching (including whether expensive intermediate results are cached and reused).¶
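The following Python sketch illustrates one way a validator could bound signature-verification attempts for a single RRset when many RRSIGs and DNSKEYs share a key tag. It is a hypothetical illustration, not a description of any validator: `validate_rrset`, the dictionary records with a `"tag"` field, the `verify` callable standing in for real cryptographic verification, and the limit of 8 attempts are all assumptions. Algorithm matching and signature validity periods are omitted for brevity.

```python
def validate_rrset(rrsigs, dnskeys, verify, max_attempts=8):
    """Try (RRSIG, DNSKEY) pairs with matching key tags, up to max_attempts.

    rrsigs, dnskeys: lists of dicts carrying at least a "tag" entry.
    verify: callable (sig, key) -> bool; a stand-in for cryptographic checks.
    Returns (status, attempts) where status is "secure", "bogus", or "limit".
    """
    attempts = 0
    for sig in rrsigs:
        for key in dnskeys:
            if key["tag"] != sig["tag"]:
                continue  # key tag mismatch: no cryptographic work is spent
            if attempts >= max_attempts:
                return "limit", attempts  # budget exhausted before success
            attempts += 1
            if verify(sig, key):
                return "secure", attempts
    return "bogus", attempts
```

With ten signatures and ten keys that all share one key tag and never verify, an unbounded loop would perform 100 verifications, whereas this sketch stops after 8 and reports `"limit"`. How a resolver should answer the client in that case, and whether the outcome is cached, remain the open policy questions listed above.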
Work‑amplification relevance:¶
DNSSEC validation can dominate CPU work (R3) and can drive additional resolver queries (R1). Key‑set amplification, key‑tag collision, and expensive denial‑of‑existence proofs are examples of M5‑style vectors; KeyTrap [HEFTRIG2024-KEYTRAP] illustrates the algorithmic‑complexity dimension of M5.¶
Issue identifier: A‑7¶
Primary normative anchors (complete within scope):¶
[RFC2181] §5.4.1-§5.4.2 — define RRset credibility ranking and ordering rules used during cache admission, but leave admission limits and replacement policy to implementations.¶
[RFC2308] §3 and §5 — define negative caching of name and type errors, without prescribing detailed eviction or replacement policy.¶
Additional related documents (non‑exhaustive):¶
[RFC1034] (caching discussion), [RFC8198] (aggressive use), [RFC8767] (serve‑stale), and operational guidance on cache sizing and eviction behavior.¶
What is specified:¶
Caching is fundamental for performance. RFCs ([RFC2181] §5.4.1-§5.4.2 and [RFC2308] §3, §5) describe credibility and negative caching and leave detailed cache policy (admission, eviction, replacement) to implementations.¶
What is left to implementation or policy:¶
Specifications do not normatively define:¶
what constitutes relevant data to process and cache for a given response and recursion context (especially for Additional‑section data);¶
limits on cache insertion work per response (e.g., record multiplicity across names and types);¶
how to account for cache maintenance work (eviction, replacement, memory pressure) in the presence of adversarially crafted names; and¶
how to prevent attacker‑controlled names from forcing repeated cache misses (e.g., via cache flooding and eviction attacks).¶
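As an illustration of a per-response insertion limit combined with bounded cache size, the Python sketch below caps how many RRsets one response may insert and evicts least-recently-used entries when capacity is exceeded. `BoundedCache`, its parameters, and the choice of LRU eviction are assumptions made for this example; the specifications cited above do not prescribe any of them, and TTL handling and credibility ranking are omitted.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU cache with a cap on insertions per response."""

    def __init__(self, capacity=1000, max_inserts_per_response=8):
        self.capacity = capacity
        self.max_inserts_per_response = max_inserts_per_response
        self._entries = OrderedDict()

    def __len__(self):
        return len(self._entries)

    def insert_response(self, rrsets):
        """Admit (key, value) pairs from one response; return how many were admitted."""
        admitted = 0
        for key, value in rrsets:
            if admitted >= self.max_inserts_per_response:
                break  # remaining RRsets from this response are ignored
            self._entries[key] = value
            self._entries.move_to_end(key)  # mark as most recently used
            admitted += 1
            while len(self._entries) > self.capacity:
                self._entries.popitem(last=False)  # evict least recently used
        return admitted

    def get(self, key):
        """Return the cached value or None, refreshing recency on a hit."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]
```

A per-response cap limits how much churn a single adversarial response can cause, but it does not by itself prevent an attacker from evicting useful entries through many small responses; that broader accounting question is the gap described above.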
Work‑amplification relevance:¶
Cache admission and maintenance affect state and memory (R4) and can turn a small number of adversarial responses into sustained work via repeated misses, churn, and eviction overhead (M6/M7).¶
In this document, an extension (in the A‑8 sense) is a DNS mechanism that induces additional outgoing resolver queries beyond the direct resolution path. Examples include HTTPS/SVCB target resolution (where processing a record triggers follow‑up lookups for the target name) and custom EDNS options or new RR types whose processing triggers secondary lookups. Core resolution mechanisms that already have a dedicated issue are explicitly out of A‑8 scope: CNAME/DNAME chasing is discussed under A‑2 (alias processing), delegation traversal — including glueless referrals and nameserver fan‑out — is discussed under A‑3, and DNSSEC chain‑of‑trust construction (including RRSIG/DS fetches) is discussed under A‑6. A‑8 concerns the additional work an extension introduces on top of those core mechanisms.¶
Issue identifier: A‑8¶
Primary normative anchors (complete within scope):¶
[RFC6891] §6 — defines the EDNS(0) OPT pseudo‑RR and option‑code framework that extension specifications use, but does not bound the work induced by individual options.¶
Additional related documents (non‑exhaustive):¶
[RFC9460] (SVCB/HTTPS records), [RFC9156] (QNAME minimisation), [RFC8767] (serve‑stale), [RFC7766] (TCP), and future DNSOP extension documents.¶
What is specified:¶
DNS has an extension culture: the EDNS(0) framework ([RFC6891] §6) and new RR types frequently add processing steps and sometimes require additional lookups or validation.¶
What is left to implementation or policy:¶
Extension specifications do not always make explicit:¶
whether a feature introduces additional resolver queries (including "secondary" resolutions triggered by record processing);¶
how those queries are charged to per‑query budgets and concurrency limits; and¶
what safe defaults and failure modes are when those budgets are exhausted.¶
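One possible failure mode on budget exhaustion is to follow as many extension-induced targets as the remaining allowance permits and skip the rest, rather than failing the whole resolution. The Python sketch below shows that behavior under stated assumptions: `follow_extension_targets` is a hypothetical helper, the budget is a plain dictionary with a `"remaining"` count shared with the rest of the resolution instance, and the cost of two queries per target is an arbitrary illustrative value.

```python
def follow_extension_targets(targets, budget, cost_per_target=2):
    """Split follow-up targets into those followed and those skipped.

    targets: names whose resolution an extension record would trigger.
    budget: dict with an integer "remaining" entry, shared across the
            resolution instance so extension work is charged like any other.
    """
    followed, skipped = [], []
    for target in targets:
        if budget["remaining"] >= cost_per_target:
            budget["remaining"] -= cost_per_target  # charge the shared budget
            followed.append(target)
        else:
            skipped.append(target)  # degrade: leave this target unresolved
    return followed, skipped
```

Whether skipping targets is an acceptable outcome depends on the extension; for some features a partial answer is usable, while for others the resolver may need to signal failure explicitly.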
Work‑amplification relevance:¶
Unaccounted extension‑induced work can bypass otherwise effective bounds (M7), and can be composed with other mechanisms (e.g., SVCB/HTTPS address fetches plus retries, or privacy probes plus delegation fan‑out).¶
This entire document concerns security of the DNS against work‑amplification attacks. By describing terminology, a taxonomy, and specification gaps that affect work amplification, this document aims to clarify the attack surface. Readers translating this analysis into mitigations face trade‑offs:¶
Budget exhaustion may cause resolvers to return SERVFAIL for legitimate complex queries; tuning limits and monitoring for false positives involves trade‑offs between security and availability.¶
Strict alias and delegation limits may break badly configured but non‑malicious zones; correcting such configurations at the zone is generally preferable to weakening limits.¶
Per‑upstream fairness and rate limiting may allow a malicious nameserver to consume its share of resolver resources while legitimate queries are slowed; fairness algorithms benefit from monitoring and adjustment.¶
Some mitigations, such as dropping large additional sections, may interfere with future legitimate protocols that rely on these sections.¶
Amplification can never be completely eliminated: an adversary can always consume some resolver resources by issuing queries. The goal is to make amplification predictable and bounded.¶
This document has no IANA actions. It does not define any new DNS resource record types, EDNS options, or protocol registries.¶
This document does not introduce any new mechanisms for collecting or processing personal data. It describes behaviors that affect resource consumption. The analysis does not reveal client identities or change privacy properties beyond those already inherent in the DNS. General DNS privacy guidance (e.g., QNAME minimisation and encrypted transports) is relevant when mitigations are deployed but is outside the scope of this document.¶
This section is to be removed before publication as an RFC.¶
Resolver implementations already include a range of protections, such as maximum alias-chain length, maximum recursion depth, per-client rate limits, query coalescing, and DNSSEC validation limits. However, they differ in accounting (what counts toward which limit), in how limits interact with restarts, and in default values. Systematic work‑profiling of deployed resolvers [SP26-RPROF] has begun to characterise this variation empirically and to surface worst‑case work distributions that the per‑implementation defaults above do not directly expose.¶
This document does not currently track implementation conformance. A future revision may include an implementation report if such tracking is considered useful by the working group.¶