D. J. Bernstein
Internet publication

Notes on the Domain Name System

If you've seen my reference manuals on Internet mail, the Internet mail message header format, SMTP, and FTP, then you might be expecting something similarly comprehensive for DNS. This isn't it. Sorry.

Trusted servers

When a DNS cache---a ``full-service resolver'' under RFC 1123---wants the address of www.w3.org, it may contact the w3.org DNS servers, the org DNS servers, or the root DNS servers.

For example, as of January 2001, one of the w3.org DNS servers is w3csun1.cis.rl.ac.uk. This server has the power to define the address of www.w3.org. It can flood the other servers to prevent them from providing contradictory information.

When the cache wants the address of w3csun1.cis.rl.ac.uk, it may contact the rl.ac.uk DNS servers, the ac.uk DNS servers, the uk DNS servers, or the root DNS servers. For example, ns.eu.net, one of the ac.uk DNS servers, has the power to define the address of w3csun1.cis.rl.ac.uk. Consequently it also has the power to define the address of www.w3.org.

Similarly, all names under eu.net, hence ac.uk and w3.org, are controlled by sunic.sunet.se; all names under sunet.se, hence eu.net and ac.uk and w3.org, are controlled by beer.pilsnet.sunet.se; and beer.pilsnet.sunet.se is running an ancient version of BIND, known to allow anyone on the Internet to take over the machine.

Are the www.w3.org administrators aware that their DNS service relies on beer.pilsnet.sunet.se and 200 other obscure computers around the world?

In contrast, if w3.org had used in-bailiwick names for its servers, such as a.ns.w3.org and b.ns.w3.org and c.ns.w3.org and d.ns.w3.org, then it would not be relying on the servers for ac.uk and eu.net and sunet.se.
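The chain of dependence described above can be computed mechanically. The following is a minimal sketch, not a survey of the real DNS: the table lists only the one server per zone that the text names, it omits the root and org servers, and the function and variable names are invented for illustration.

```python
# Simplified delegation table built from the example above: each zone maps
# to the server names that can define names in that zone. Only the servers
# named in the text are listed; real zones have more.
EXAMPLE_ZONES = {
    "w3.org": ["w3csun1.cis.rl.ac.uk"],
    "ac.uk": ["ns.eu.net"],
    "eu.net": ["sunic.sunet.se"],
    "sunet.se": ["beer.pilsnet.sunet.se"],
}


def enclosing_zones(name, zones):
    """Return every zone in the table that equals name or is a suffix of it."""
    labels = name.lower().rstrip(".").split(".")
    suffixes = {".".join(labels[i:]) for i in range(len(labels))}
    return [zone for zone in zones if zone in suffixes]


def trusted_servers(name, zones):
    """Collect every server name that directly or indirectly controls name."""
    trusted = set()
    pending = [name]
    while pending:
        current = pending.pop()
        for zone in enclosing_zones(current, zones):
            for server in zones[zone]:
                if server not in trusted:
                    trusted.add(server)
                    # The server's own name must be resolved too, so the
                    # servers for its enclosing zones are trusted as well.
                    pending.append(server)
    return trusted


if __name__ == "__main__":
    for server in sorted(trusted_servers("www.w3.org", EXAMPLE_ZONES)):
        print(server)
```

With the table above, the set for www.w3.org includes beer.pilsnet.sunet.se. With a table in which w3.org lists only in-bailiwick names such as a.ns.w3.org, the set contains only those names.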

I pointed out this type of problem in January 2000. At that time, these same 200 computers had control over practically all names on the Internet, including *.com. The .com server names were subsequently fixed to avoid the problem. Most country-code TLDs have not been fixed.


Cache poisoning

RFC 1034's resolution algorithm allows any server on the Internet to destroy, or take over, yahoo.com. All the nasty.dom server has to do is delegate www.nasty.dom to the yahoo.com servers while providing false addresses for those servers:
     www.nasty.dom NS ns1.yahoo.com
     www.nasty.dom NS ns2.dca.yahoo.com
     www.nasty.dom NS ns3.europe.yahoo.com
     www.nasty.dom NS ns5.dcx.yahoo.com
     ns1.yahoo.com A
     ns2.dca.yahoo.com A
     ns3.europe.yahoo.com A
     ns5.dcx.yahoo.com A
The nasty.dom server can now wait for (or encourage) the cache to ask about www.nasty.dom. When the cache receives the answer, it will, according to RFC 1034, save the forged yahoo.com addresses for future reference. Subsequent queries for yahoo.com will be misdirected.

Cache poisoning was widely known in 1990. But it was viewed as merely a reliability issue, a result of sloppy administration. Someone who listed munnari.oz.au as a backup server with an out-of-date IP address would accidentally poison caches and destroy legitimate connections to munnari.oz.au.

Vixie's first BIND release, version 4.9 in 1992, featured a notion of ``credibility'' that managed to prevent the most severe cases of accidental poisoning. From a security point of view, Vixie's ``credibility'' is garbage; it doesn't even stop the yahoo.com attack described above.

It's obvious how to eliminate all poisoning. Caches must discard yahoo.com information except from the yahoo.com servers, the com servers, and the root servers. This stops malicious poisoning, so of course it stops accidental poisoning too. End of problem.
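The rule above amounts to a suffix check on the owner name of each record. Here is a minimal sketch under stated assumptions: records are modeled as (owner, type, data) tuples, the cache knows which zone it was querying the server for, and the forged address 192.0.2.1 is a placeholder from the documentation range. This is not code from dnscache or BIND.

```python
def in_bailiwick(name, zone):
    """True if name is the zone itself or lies underneath it."""
    name = name.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    if zone == "":
        return True  # the root zone covers every name
    return name == zone or name.endswith("." + zone)


def filter_records(records, server_zone):
    """Keep only records whose owner name the queried zone may speak for."""
    return [record for record in records if in_bailiwick(record[0], server_zone)]


if __name__ == "__main__":
    # Response from a nasty.dom server; the address is a made-up placeholder.
    response = [
        ("www.nasty.dom", "NS", "ns1.yahoo.com"),
        ("ns1.yahoo.com", "A", "192.0.2.1"),
    ]
    for record in filter_records(response, "nasty.dom"):
        print(record)
```

The delegation of www.nasty.dom survives the filter. The forged yahoo.com address does not, because ns1.yahoo.com is not under nasty.dom. Note that the comparison is made on whole labels: notyahoo.com is not inside yahoo.com.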

BIND finally adopted this poison-elimination rule in 1997, after cache poisoning became a popular attack tool. Did Vixie scrap his obsolete ``credibility'' rules? No! As of January 2000, they were still in BIND 8.2.2-P5, more incoherent than ever. For example, if records had ``additional section credibility,'' and if someone sent a query asking for those records, BIND would reduce the TTL of the records by 5%. Some of the other rules appear in RFC 2181.

I pointed out on bugtraq in January 2000 that, when a domain changed all its DNS server names (e.g., to switch ISPs), an attacker could trivially exploit BIND's ``credibility'' rules to break access to that domain. I also tried to point this out on namedroppers, but my message was censored by Randy Bush.

dnscache doesn't discriminate against additional records. Valid records are accepted whether they're additional records in one packet or answer records in the next; timing doesn't affect the semantics.

Limited parents

RFC 1034 assumes that parent servers will list all the NS records of child servers.

In practice, however, some parents limit the number of NS records that they will list; some parents have painful update procedures; and, for many years, the largest .com registrar pointlessly refused any NS record whose host name had an IP address that was already registered under a different host name.

So a child server often lists more NS records than its parent. It includes the NS records along with its answers, so that caches will replace the NS records from the parent with the NS records from the child. If the NS records (and associated addresses) expire after the answers do, the caches will use the complete NS list to find the new answers, and will obtain a fresh NS list at that point. The load is spread among all the servers, though not as evenly as it would be if the parent listed more servers.

Unfortunately, BIND 8.2 won't cache the fresh NS list. After the old list expires, BIND contacts the parent servers and again obtains the incomplete NS list.

Beware that, because of the ``credibility'' rules described above, the NS records from the child servers must include the NS records from the parent. Otherwise an attacker can break BIND's access to the child servers.
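An administrator can check the superset condition above before publishing a zone. The following is a minimal sketch; the function name and the example.org server names are invented for illustration, and the two NS lists are assumed to have been collected already from the parent and child servers.

```python
def missing_parent_servers(parent_ns, child_ns):
    """Return parent-listed server names that the child's NS list omits."""

    def normalize(name):
        # DNS names compare case-insensitively; ignore a trailing dot.
        return name.lower().rstrip(".")

    child = {normalize(name) for name in child_ns}
    parent = {normalize(name) for name in parent_ns}
    return sorted(parent - child)


if __name__ == "__main__":
    parent_list = ["a.ns.example.org", "b.ns.example.org"]
    child_list = ["A.NS.example.org.", "c.ns.example.org", "d.ns.example.org"]
    # Any name printed here should be added to the child's NS records.
    print(missing_parent_servers(parent_list, child_list))
```

An empty result means that the child's list includes everything the parent lists.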


Gluelessness

Suppose you're a DNS cache, and you want the address of www.espn.tv. You happen to know the address of a .tv DNS server, so you ask it for the address of www.espn.tv. ``I don't know, but I know that .espn.tv has two DNS servers, ns-1.disney.corp and ns-2.disney.corp,'' it says. ``Try asking them.''

So you contact ns-1.disney.corp. But what's the address of ns-1.disney.corp? You have to put the original question on hold while you search for the address of ns-1.disney.corp. You happen to know an address of a .corp DNS server, so you ask it for the address of ns-1.disney.corp. ``I don't know, but I know that .disney.corp has two DNS servers, zone.espn.tv and night.espn.tv,'' it says. ``Try asking them.''

Bottom line: You can't reach espn.tv, and you can't reach disney.corp.

If zone.espn.tv had been a DNS server for .espn.tv, the .tv server would have provided glue for zone.espn.tv, i.e., the IP address of zone.espn.tv. So you would have been able to contact zone.espn.tv. RFC 1034 specifically requires glue for referrals to in-bailiwick DNS servers. (Some people use the word ``glue'' only in this case.)

For referrals to out-of-bailiwick DNS servers, however, RFC 1034 says that glue is unnecessary. RFC 1537 says the same thing. RFC 1912 says the same thing. The comp.protocols.tcp-ip.domains FAQ says that ``you do not need a glue record, and, in fact, adding one is a very bad idea.'' (This is an obsolete reference to accidental poisoning; see above.) Some DNS server implementations ignore out-of-bailiwick glue by default. So the glueless domains espn.tv and disney.corp are following the rules---yet neither of them is reachable.
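The espn.tv and disney.corp loop can be detected from the delegation data alone. Here is a minimal sketch under simplifying assumptions: `delegations` maps each zone to the server names in its referral, `glue` maps each zone to the server names whose addresses accompany that referral, and a server name whose enclosing zone is not in the table is conservatively treated as unresolvable. All function names are invented for illustration.

```python
def zone_of(server, delegations):
    """Return the longest zone in the table that encloses the server name."""
    labels = server.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in delegations:
            return candidate
    return None


def reachable(zone, delegations, glue, visiting=frozenset()):
    """True if some server for zone can be given an address without a loop."""
    if zone in visiting:
        return False  # resolving this zone would require the zone itself
    visiting = visiting | {zone}
    for server in delegations.get(zone, []):
        if server in glue.get(zone, set()):
            return True  # the referral came with this server's address
        home = zone_of(server, delegations)
        if home is not None and reachable(home, delegations, glue, visiting):
            return True  # the server's address can be looked up elsewhere
    return False


if __name__ == "__main__":
    delegations = {
        "espn.tv": ["ns-1.disney.corp", "ns-2.disney.corp"],
        "disney.corp": ["zone.espn.tv", "night.espn.tv"],
    }
    print(reachable("espn.tv", delegations, {}))
    print(reachable("disney.corp", delegations, {}))
```

For the glueless pair above, neither zone is reachable. If espn.tv instead lists zone.espn.tv and the .tv referral carries glue for it, both zones become reachable, since the disney.corp servers can then be looked up through espn.tv.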

There can be trouble even when there are no loops. Suppose a BIND cache is looking up www.espn.tv in the following situation:

     espn.tv NS ns-1.disney.corp
     espn.tv NS ns-2.disney.corp

     disney.corp NS ns-1.disney.corp
     disney.corp NS ns-2.disney.corp
When BIND sees the glueless delegation to ns-1.disney.corp, it drops the www.espn.tv query and begins a ``sysquery'' for ns-1.disney.corp, hoping to have the ns-1.disney.corp address cached by the time the www.espn.tv query is retried. (The BIND developers refer to this bug as ``no query restart.'') Clients generally don't retry more than four times, so an initial query for a domain with four levels of gluelessness will fail; an initial query for a domain with three levels of gluelessness will be very likely to fail, and very slow if it succeeds.

``As far as I know, the Internet has not yet lost any domains to gluelessness,'' I wrote in 2000. ``But there are an increasing number of glueless domains, and I've spotted a glueless domain with glueless DNS servers. How much gluelessness must a cache tolerate? Currently dnscache allows three levels of gluelessness. This seems to be enough for now, but will it be enough in the future?''

I subsequently learned about www.monty.de, which had so many levels of gluelessness that BIND caches were completely unable to reach it:

     monty.de NS ns.norplex.net
     monty.de NS ns2.norplex.net

     norplex.net NS vserver.neptun11.de
     norplex.net NS ns1.mars11.de

     neptun11.de NS ns.germany.net
     neptun11.de NS ns2.germany.net

     mars11.de NS ns1.neptun11.de
     mars11.de NS www.gilching.de

     gilching.de NS ecrc.de
     gilching.de NS name.muenchen.roses.de
dnscache was able to find the address of www.monty.de, but it needed fourteen queries to various servers.

I recommend that all DNS servers be in-bailiwick servers with glue. External DNS servers should be given internal names, with address records copied automatically (preferably by some secure mechanism) from the external names to the internal names.
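The copying step in this recommendation might look like the following minimal sketch. It assumes a table mapping each internal name to the external name it mirrors and a caller-supplied `lookup` function that returns the current addresses of an external name. Both are hypothetical, and the secure transport mentioned above is outside the sketch. The addresses in the example are placeholders from the documentation range.

```python
def internal_records(aliases, lookup):
    """Build address-record lines for internal names from external names.

    aliases maps internal server name -> external server name.
    lookup(name) returns a list of IPv4 address strings for name.
    """
    lines = []
    for internal, external in sorted(aliases.items()):
        for address in lookup(external):
            lines.append(f"{internal} A {address}")
    return lines


if __name__ == "__main__":
    # Placeholder lookup table standing in for a real, secured transfer.
    known = {"w3csun1.cis.rl.ac.uk": ["192.0.2.53"]}
    aliases = {"a.ns.w3.org": "w3csun1.cis.rl.ac.uk"}
    for line in internal_records(aliases, lambda name: known.get(name, [])):
        print(line)
```

Rerunning this whenever the external addresses change keeps the in-bailiwick names, and the glue derived from them, in step with the external servers.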

DNS should have been designed with addresses, not names, in NS records and MX records. The ``additional section'' of DNS responses should have been eliminated. RFC 1035 observes correctly that NS indirection and MX indirection ``insure [sic] consistency'' of addresses; however, this indirection should have been handled by the server, not the client. (On a related note: Microsoft Exchange Server 2000 reportedly fails to deliver to MX records that point to CNAME records; fixed in the first Service Pack.)

I have a separate page discussing A6 and DNAME from this perspective.

Expiring glue

Occasionally the address records for some DNS servers all expire from a cache, even though the servers weren't glueless in the first place:
     aol.com NS dns-01.ns.aol.com
     aol.com NS dns-02.ns.aol.com
Usually this means that the A records accompanied the NS records but with lo