D. J. Bernstein
Internet publication
djbdns

The case against A6 and DNAME

Example: AOL committing A6 suicide

In an IPv6 world, dns-01.ns.aol.com and dns-02.ns.aol.com will probably share a common address prefix. The A6 specification encourages AOL to set up A6 records pointing to the common prefix:

     dns-01.ns.aol.com A6 ... prefix.aol.net
     dns-02.ns.aol.com A6 ... prefix.aol.net
     prefix.aol.net A6 ...

If AOL sets up these A6 records, nobody will be able to reach aol.com or aol.net.

Explanation: Let's say a cache needs the address of www.aol.com. It contacts the .com servers and learns

     aol.com NS dns-01.ns.aol.com
     aol.com NS dns-02.ns.aol.com
     dns-01.ns.aol.com A6 ... prefix.aol.net
     dns-02.ns.aol.com A6 ... prefix.aol.net

but it won't learn the address of prefix.aol.net. Even if the address is provided, the cache won't accept it, because .net addresses are not within the bailiwick of a .com server; this is the standard protection against poison.

(It is theoretically possible for caches to see that the prefix.aol.net address isn't poison, because the .com servers are the same as the .net servers. But let's fast forward to a time when the .com servers and the .net servers have been separated. The .com servers won't know the prefix.aol.net address, just as they don't know the ns1.nic.uk address today, and they won't have the authority to declare it even if they do know it.)

So the cache puts the www.aol.com query on hold. It needs the address of prefix.aol.net. It contacts the .net servers and learns

     aol.net NS dns-01.ns.aol.com
     aol.net NS dns-02.ns.aol.com

but it won't learn .com information from the .net servers.

So the cache puts the prefix.aol.net query on hold. It needs the address of dns-01.ns.aol.com. Repeat ad nauseam.

A quick overview of DNS lookups

I'm writing a program that talks to DNS servers to find an address given a domain name. How much memory do I need? How many servers do I need to contact?

I'll need as much as 64K for an incoming DNS message, during the brief moment before I parse it. There's not much persistent data:

The name I'm looking up.
The name of the closest zone found so far.
The addresses of servers for that zone. 16 addresses will be more than enough.

A response from a server will give me the answer, or refer me to a closer zone. The number of queries is at most 4 for www.aol.com. Simple, isn't it?

Oops, that's not quite right. I might receive a CNAME. In that case I'll have to change the name and start over. This may happen several times. RFC 1034 says that the first CNAME ``should always'' get me to the canonical name, to avoid ``extra indirections,'' but it also says that I should follow chains if they do happen.

Suddenly there's no limit to the number of queries I need.

But wait. The algorithm still doesn't work. A referral to out-of-bailiwick names, such as the aol.net referral to dns-01.ns.aol.com and dns-02.ns.aol.com, won't include the addresses of those servers. I need to put the original query on hold while I look up those addresses. So here's what I actually have to store:

The name I'm looking up.
The name of the closest known zone.
The known addresses of servers for that zone.
The names of servers whose addresses I still have to look up.
The name of a server whose address I'm looking up.
The name of the closest known zone to that.
The known addresses of servers for that zone.
The names of servers whose addresses I still have to look up.

It doesn't end there. If there are names in the last list, I'll have to put the second lookup on hold while I look for the addresses of those names. And so on.

Suddenly there's no limit to the amount of space I need. I don't have a single small array of addresses; I have an unlimited-length array of small arrays of names and addresses. This isn't so simple any more.

DNS reliability problems

Out-of-bailiwick pointers destroy DNS lookups in three ways:

Every out-of-bailiwick pointer means more queries and more opportunities for delay: packets are lost and have to be resent. The chance of finding an answer before client timeout decreases exponentially with the number of out-of-bailiwick pointers.
Caches have to limit the number of queries and the amount of memory that they dedicate to a single lookup. When these limits are exceeded, lookups fail.
As illustrated by the AOL suicide example, every out-of-bailiwick pointer is another opportunity to create a loop. When a loop appears, lookups fail.

These problems are not new. Lookups occasionally fail because system administrators have used too many out-of-bailiwick NS records, for example. (I tell my users to select in-bailiwick server names. My software automatically uses a.ns.fqdn, b.ns.fqdn, etc. as the default server names for fqdn. I also tell my users to avoid CNAME records.)

What is new with A6 and DNAME is that out-of-bailiwick pointers are encouraged. System administrators are encouraged to set up giant A6 chains and giant DNAME chains reflecting their corporate structures and network structures. The result will be a tremendous increase in the frequency of DNS lookup failures.

Server-side indirection

Why did the original DNS design include CNAME records? Why did it have names, rather than addresses, in NS records and MX records? Why do DNS packets have an ``additional section''?

RFC 1035 responds: If you copy a machine's address into an NS record, then you have to watch for changes in the address, and echo them. Indirection ``avoids the opportunity for inconsistency.''

I agree: indirection is good. But it didn't have to be a protocol feature handled by caches. It could have been handled by the server.

Instead of publishing ftp.isc.org CNAME isrv4.pa.vix.com, for example, the isc.org server can periodically look up the address of isrv4.pa.vix.com and copy it to the address of ftp.isc.org. It doesn't matter whether isrv4.pa.vix.com was copied from somewhere else. There are no chains of pointers to follow. The system is reliable.

Server-side indirection has three minor effects on DNS load:

The isc.org server has to periodically check the address of isrv4.pa.vix.com.
Caches that look up ftp.isc.org do not have to look up the address of isrv4.pa.vix.com.
The TTL of ftp.isc.org has to be reduced below the TTL of isrv4.pa.vix.com. This might force some extra lookups of ftp.isc.org.

The total effect could be positive or negative. It is certainly not overwhelming.

I'm not going to throw away reliability, even if doing so might save a few DNS queries.

The cost of signing DNS records

The following discussion is in a hypothetical world of public-key signatures on DNS records.

A signed DNS record has a relative time-to-live and an absolute expiration date.

What is the effect of the time-to-live? Caches are required to throw the record away TTL seconds after receiving the record.

What is the effect of the expiration date? Caches are required to throw the record away after the expiration date.

Administrators normally insist on being able to change their records with at most a few days notice. So they set the TTL on a record to 86400 (1 day). But what about the expiration date?

It is not acceptable from a security perspective to have an expiration date far in the future. Suppose the administrator changes the record today; an attacker can interfere with the publication of this change, by forging old DNS responses under the old signature. The expiration date is the only protection against this attack.

So the expiration date has to be at most a few days after the record is signed. Of course, the record has to be signed again before the expiration date. Conclusion: Every record has to be signed frequently, at least once every day or two.

Is this expensive? Yes. But it is essential for security.

Occasional renumbering, changing a bunch of AAAA records, does not add noticeably to the signing cost. In fact, unless renumbering has to happen with less than one or two days notice, the extra cost is zero.

A few people have made the following argument for A6: ``At large sites, AAAA renumbering changes a huge number of records, while A6 renumbering changes very few records. Signing changed records is expensive.'' This argument is fundamentally flawed. Large sites already need enough computer power to frequently sign every record, not just the records that change.

History

I wrote most of this web page in February 2001, and included the following statement:

I refuse to implement A6 and DNAME. I cannot bring myself to inflict such a rickety system on future Internet users. As of February 2001, nobody is relying on A6 or DNAME; I recommend that the A6 and DNAME proposals be terminated.

I added the signing-costs section in July 2001.

Miscellaneous quotes from other people (* meaning BIND company employee or owner):

Ian Jackson, 2001.02.08: ``I strongly agree.''
David Conrad*, 2001.03.14: ``More operational experience is required.''
Robert Elz, 2001.03.16: ``I suspect this is a pointless discussion.''
Paul Vixie*, 2001.04.28: ``Hint to whoever is in charge: the time to debate A6's merits at this level was: before it got put on the standards track, before a lot of code got written, and before a lot of operational deployment was put into various long and expensive pipelines. ... If we're going to abandon A6 then i for one am ready to listen to tony li's "too little, too soon" comment, through [sic] IPv6 itself into the waste bin, and go back to the whiteboard and solve more of the problems we actually have. There is NO justification for the rapid and early adoption of IPv6 in its present form unless A6 or something very much like A6 is made a part of it.''
Jim Bound, 2001.04.30: ``A6 will be shipped on the street and its a done deal. Do I like A6 no. Do I think a better solution exists yes. But do I advocate deployment of A6 yes. Why because its time to move forward.''
Jim Reid*, 2001.07.17: ``When is djbdns going to support all the DNS protocols that you've so far failed to implement [like] A6/DNAME chaining?''
Matt Crawford (A6 spec author), 2001.07.20: ``Your reasoning is markedly incorrect if applied to A6.''

In July 2002, IETF downgraded the A6, DNAME, and bit-label specifications (RFC 2874 and RFC 2673) from Proposed Standard to Experimental: ``A6, Binary Labels and DNAME DNS extensions should not be widely deployed for use with IPv6 at this time.''

Unfortunately, as of November 2002, the BIND company is continuing to advertise ``IPv6 resource records (A6, DNAME, etc.)'' and ``Bitstring Labels'' as major features of BIND 9. Continued vigilance will be required: the BIND company must not be allowed to fool people into deploying A6.