I assume all major ISPs use 2 or 3 layers of cache, which makes the size of their perimeter fleet largely irrelevant.
Not really. The resolvers tend to be geographically dispersed and use anycast. Having a multi-layered cache would probably decrease performance, except within a specific location.
Could you perhaps ask your former Route53 colleagues for some log-file insight?
They see what's behind the cache, not how much traffic the resolvers are taking. Could be the same, could be 100x more, hard to tell.
So all it takes is one hit per major ISP per TTL to keep it zippy for almost everyone. That's why DNS works so well, after all?
Caching works great with long TTLs, e.g. as used for NS, MX, and CNAME records. The problem is the 60-second TTLs commonly used for A records in cloud services. Unless a name gets reasonably high query volume, the odds that its A record is sitting in any given cache at any given moment are low. Many applications also use many different domain names (e.g., one per user), which creates a long tail of low-volume names.
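A rough back-of-envelope model makes the point. If you assume queries arrive at a single resolver cache as a Poisson process at rate `qps`, then every miss refreshes the cache for TTL seconds, so misses recur about every `TTL + 1/qps` seconds and the hit ratio works out to `1 - 1/(qps * TTL + 1)`. The numbers below are illustrative, not measurements:

```python
def hit_ratio(qps: float, ttl: float) -> float:
    """Approximate cache hit ratio for one resolver cache, assuming
    Poisson query arrivals at `qps` queries/sec and a fixed TTL.
    Each miss warms the cache for `ttl` seconds, so misses recur
    roughly every (ttl + 1/qps) seconds on average."""
    return 1 - 1 / (qps * ttl + 1)

# With a 60s TTL, a name queried once a minute at a given resolver
# is only cached about half the time; a long-tail name queried once
# every ~17 minutes is almost never cached.
for qps in (0.001, 0.01, 1.0):
    print(f"{qps:>6} qps, TTL 60s -> {hit_ratio(qps, 60):.1%} hits")
```

With a 300s TTL the same once-a-minute name would be cached ~83% of the time, which is why short cloud TTLs push so much more traffic to authoritative servers.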
Of course, traffic is far from uniformly distributed, so there may be parts of the day when your name is consistently served from cache everywhere, or parts of the world where it is never served from cache.
There are some nice research papers studying DNS resolvers, e.g. here's one for cellular networks: http://www.aqualab.cs.northwestern.edu/component/attachments...