Date: 12 Mar 2000 22:24:47 -0000
Message-ID: <20000312222447.11277.qmail@cr.yp.to>
From: "D. J. Bernstein" <djb@cr.yp.to>
To: namedroppers@ops.ietf.org
Subject: DNS query transmission strategy
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii

General question: Given a query and a list of appropriate servers, how
should a DNS cache decide which server to contact first?

Data points: BIND usually contacts the server with the lowest apparent
RTT; there are penalties for servers that don't respond, slight bonuses
for servers that aren't being used, and some configuration options. In
contrast, my current strategy is simply to choose a random server.
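
As a rough illustration of the two strategies, here is a minimal Python
sketch. It is a simplification, not BIND's actual code: the penalties,
bonuses, and configuration options mentioned above are left out, and the
function names and the sample RTT table are hypothetical.

```python
import random

def pick_lowest_rtt(rtt_estimates):
    """Simplified BIND-like rule: choose the server with the lowest apparent RTT."""
    return min(rtt_estimates, key=rtt_estimates.get)

def pick_random(rtt_estimates, rng=random):
    """Ignore the RTT estimates and choose a server uniformly at random."""
    return rng.choice(list(rtt_estimates))

# Hypothetical per-server RTT estimates, in seconds.
rtt = {"a": 0.080, "b": 0.045, "c": 0.120}

print(pick_lowest_rtt(rtt))  # always the same server while the estimates hold
print(pick_random(rtt))      # any of the three servers
```

Under the first rule, every cache with similar estimates sends its
queries to the same server; under the second, the queries spread evenly
regardless of the estimates.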

Specific question: Are the increasingly frequent .com server overloads
being exacerbated by BIND's strategy?

Simulations show the following effect at rather small loads, when there
are mild variations in server performance and network performance. The
query load becomes increasingly unbalanced, with three or four of the
servers receiving more and more of the load. Eventually one of the
servers is overloaded and begins dropping packets; it remains overloaded
until enough caches switch to other servers. Repeat ad nauseam.
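
The imbalance can be reproduced with a toy model. The Python sketch
below is not the simulation referred to above; it is a small stand-in
under assumed parameters (13 servers, 200 caches, base response times
spread over 50 to 62 ms, up to 5 ms of jitter, a 0.1% per-round decay as
the "bonus" for unused servers). It does not model overload or dropped
packets, so it shows only the skew in load, not the oscillation.

```python
import random

def simulate(strategy, n_servers=13, n_caches=200, n_rounds=200, seed=1):
    """Return each server's share of all queries, largest share first."""
    rng = random.Random(seed)
    # Hypothetical base response times: a mild, fixed spread across servers.
    base = [0.050 + 0.001 * i for i in range(n_servers)]
    # est[c][s] is cache c's RTT estimate for server s; 0.0 means "untried".
    est = [[0.0] * n_servers for _ in range(n_caches)]
    load = [0] * n_servers
    for _ in range(n_rounds):
        for c in range(n_caches):
            if strategy == "lowest_rtt":
                s = min(range(n_servers), key=est[c].__getitem__)
            else:
                s = rng.randrange(n_servers)
            load[s] += 1
            # Observed RTT: base time plus mild network jitter.
            sample = base[s] + 0.005 * rng.random()
            if est[c][s] == 0.0:
                est[c][s] = sample
            else:
                est[c][s] = 0.7 * est[c][s] + 0.3 * sample
            # Slight bonus for unused servers: their estimates decay slowly.
            for i in range(n_servers):
                if i != s:
                    est[c][i] *= 0.999
    total = sum(load)
    return sorted((n / total for n in load), reverse=True)

skewed = simulate("lowest_rtt")
uniform = simulate("random")
print("top-4 share, lowest RTT:", round(sum(skewed[:4]), 3))
print("top-4 share, random:    ", round(sum(uniform[:4]), 3))
```

With random selection the four busiest servers carry close to 4/13 of
the load; with lowest-RTT selection their share is larger, because every
cache converges on the same few fast servers.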

What information is available about the actual .com server loads? I see
ISC's statement that f.root-servers.net handles 3000 queries per second.
How many queries per second do all of the servers together handle?
What's the peak load on f? Do all the other servers hit their peak loads
during the same minute that f does?

I've seen reports of second-level zones with two servers where one
server receives a much heavier load than the other. The same servers
could handle nearly twice as many queries if the load were balanced
properly.
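
The arithmetic behind "nearly twice" can be made explicit. Assume two
servers with the same capacity, where the busier one receives a fixed
fraction of all queries; the total rate is then limited by the busier
server. The function name and the capacity figure in this Python sketch
are hypothetical.

```python
def max_total_rate(capacity, heavy_share):
    """Highest total query rate before the busier of two equal servers overloads.

    capacity:    queries per second one server can handle
    heavy_share: fraction of all queries sent to the busier server (0.5 to 1.0)
    """
    if not 0.5 <= heavy_share <= 1.0:
        raise ValueError("heavy_share must be between 0.5 and 1.0")
    return capacity / heavy_share

print(max_total_rate(1000, 0.5))   # balanced load
print(max_total_rate(1000, 0.9))   # 90% of queries hit one server
```

With a balanced load the pair sustains twice the capacity of one
server; with a 90/10 split it sustains only about 1.11 times that
capacity.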

---Dan