We had to stop using Alpine because we have to resolve a DNS name that resolves to 100 hosts. Musl fails to resolve it because it does not support “upgrade to TCP”: the response does not fit into a single UDP packet, so it gets truncated and node fails to resolve the name. And not only node, normal Linux tools as well. The author says it’s a feature, not a bug, so for me it is kinda hard to take that thing seriously when parts of the standard are unsupported.
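For anyone who hasn't hit this: truncation is signalled by the TC bit in the 12-byte DNS header, and the TCP retry is the same message with a 2-byte length prefix. A rough sketch of just those two pieces, not how musl or glibc do it internally; the function names `is_truncated` and `frame_for_tcp` are made up for illustration:

```python
import struct

# "Truncated" bit in the 16-bit flags field of the DNS header (RFC 1035).
TC_FLAG = 0x0200


def is_truncated(response: bytes) -> bool:
    """Return True if a raw DNS response has the TC bit set."""
    if len(response) < 12:
        raise ValueError("a DNS header is 12 bytes; response is too short")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(query: bytes) -> bytes:
    """DNS over TCP prefixes each message with a 2-byte big-endian length."""
    return struct.pack("!H", len(query)) + query
```

A resolver that supports the upgrade sends the query over UDP, checks `is_truncated()` on the reply, and if it is set resends the same query over a TCP connection using the length-prefixed framing.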
Traditionally, in Unix, libc is part of the OS. The situation is different in Linux, but Linux is an outlier here; the various BSDs keep libc in the same tree as the kernel.
It has certainly evolved to be pretty complicated. You have whatever libc chooses to do with getaddrinfo(), nsswitch.conf, resolv.conf, systemd-resolved, various pieces of software (docker, vpns, wsl), and so on, all trying to control the local resolver.
Linux as a system is very ill-defined. I'd argue that GNU/Linux by definition contains glibc even if they are not in the same tree, and musl-based distributions are a variation which you could call "musl/Linux".
The tuples historically had 3 components--cpu, vendor and operating system. But especially as uclibc and musl became more widespread the last component is commonly split into kernel-libc. (I think this was originally extended for the benefit of Debian GNU/kFreeBSD.) The formal OS identifier for glibc-based Linux systems is "linux-gnu" (e.g. x86_64-pc-linux-gnu), and for musl "linux-musl" (e.g. aarch64-alpine-linux-musl).
Vendor is not very useful these days. It's common to see 3-tuples of cpu-kernel-libc, as opposed to 4-tuples or traditional 3-tuples. Sometimes the system is extended into, e.g., 5-tuples like cpu-vendor-kernel-libc-compiler. Autotools projects commonly have a bit of generated shell code for parsing tuples; it's quite complex owing to ~30 years of accumulated idiosyncrasies.
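To make the ambiguity concrete, here is a toy parser for the 3- and 4-component forms mentioned above. It is only a sketch: the `KNOWN_KERNELS` set and the "is the second field a kernel?" heuristic are my own simplification, nothing like the real generated autotools code, and 5-tuples are not handled.

```python
from typing import NamedTuple, Optional

# Hypothetical, deliberately tiny list; the real tuple parsers know far more.
KNOWN_KERNELS = {"linux", "kfreebsd"}


class Target(NamedTuple):
    cpu: str
    vendor: Optional[str]
    kernel: str
    libc: Optional[str]


def parse_tuple(text: str) -> Target:
    """Split a target tuple into cpu, vendor, kernel (or OS) and libc."""
    parts = text.split("-")
    if len(parts) == 4:
        # cpu-vendor-kernel-libc, e.g. x86_64-pc-linux-gnu
        cpu, vendor, kernel, libc = parts
        return Target(cpu, vendor, kernel, libc)
    if len(parts) == 3:
        if parts[1] in KNOWN_KERNELS:
            # Vendor omitted: cpu-kernel-libc, e.g. x86_64-linux-musl
            return Target(parts[0], None, parts[1], parts[2])
        # Traditional cpu-vendor-os; the libc is implied by the OS
        return Target(parts[0], parts[1], parts[2], None)
    raise ValueError("unsupported tuple: %r" % text)
```

Even this toy version needs a lookup table to tell `x86_64-linux-musl` apart from a traditional cpu-vendor-os triple, which hints at why the real thing got so hairy.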
Because people treat libraries differently today than 30 years ago. We're used to the integration points for things being some form of blocking IPC (like dbus on GNU/Linux, COM, or syscalls) but libc is different.
libc is that service on the base OS. But rather than connecting to an OS service and passing messages back and forth you dlopen and setjmp to do the same thing. On GNU/Linux libc isn't an interface to the NSS service, libc is the NSS service. The fact that you access it via your linker is just an implementation detail.
The kernel itself actually exposes integration points this way too, with the vDSO! The kernel will actually just stick its own routines in your program's memory space so that you can avoid the syscall overhead for certain calls.
That's up to your distro technically. DNS queries that use glibc (so everything basically) parse /etc/nsswitch.conf and follow the path of NSS modules which can do whatever they want to produce a name.
The resolve module provided by systemd talks to systemd-resolved but the dns module parses /etc/resolv.conf and does the resolution itself.
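If you want to see which path your own box takes, the order is on the `hosts:` line of /etc/nsswitch.conf. A small sketch that pulls the module names out of that line; `hosts_sources` is a name I made up, and this skips bracketed action items like `[!UNAVAIL=return]` rather than interpreting them the way glibc does:

```python
import re
from typing import List


def hosts_sources(nsswitch_text: str) -> List[str]:
    """Return the NSS module names on the 'hosts:' line, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line.startswith("hosts:"):
            continue
        rest = line[len("hosts:"):]
        # Remove action items such as [!UNAVAIL=return]; only names remain.
        rest = re.sub(r"\[[^\]]*\]", " ", rest)
        return rest.split()
    return []  # no hosts line found


if __name__ == "__main__":
    with open("/etc/nsswitch.conf") as fh:
        print(hosts_sources(fh.read()))
```

If `resolve` shows up before `dns`, lookups go through systemd-resolved first; if only `dns` is listed, glibc reads /etc/resolv.conf and does the resolution itself.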
Ha, I ran into that one at work one time in our custom DNS resolver and had to add TCP upgrade. I was very confused why the tool worked when I shelled out to dig but not when I did the “correct” thing and used a resolver library. I’m very surprised that a project as big as musl would not have support for this.
If someone on my team had built an application and put 100 hosts into a DNS server, I would suggest they upload their hosts file to a webserver someplace. 100 hosts just doesn't do anything useful with most applications using gethostbyname(), even in glibc: it's going to be slow, and the bug reports you get are going to be really confusing. Custom applications that are prepared to deal with all 100 hosts will be easier to implement using the output from a webserver.
Service-Discovery-Over-DNS is typically the use-case. It's used as a compatibility layer for software where you either can't or don't want to integrate the native discovery APIs. Consul is a good example of this. You don't actually have to know how to speak Consul to get automatic service discovery, all you have to do is query a DNS name to get the hosts registered for a particular service.
That doesn’t answer my question at all, unless you mean that people make bad engineering decisions because they like using cute things.
dig is not affected by Alpine’s decision here because dig does not use gethostbyname.
No DNS client would be.
This affects gethostbyname which very few programs in my experience even support robustly, so any “use-case” where someone is using 100 results would surprise me.
It seems if you need to write something custom, a www client is better (which consul also supports).
I think if you insist on writing gethostbyname instead of the res_* calls in bind, and robustly handle all results in a sensible way, then that’s silly, and if you have an existing application that works great with ~70 addresses but not 100 I would be curious to know what it is.
I'm not really sure what you mean by "support gethostbyname robustly" or that "dns clients aren't affected." Because on a GNU/Linux system the only correct method of resolving DNS is by using gethostbyname (or nowadays getaddrinfo) and friends. If you do anything else things will be broken because you aren't following the distro's/system integrator's/sysadmin's/user's configured NSS modules for name resolution.
And getaddrinfo returns a linked list of results so it's not exactly hard to support 100 results. All the actual junk about TCP/UDP is completely abstracted away from the caller.
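To illustrate, in Python (where the C linked list shows up as a plain list) walking every result is a short loop. A minimal sketch; `resolve_all` is just an illustrative name:

```python
import socket
from typing import List


def resolve_all(host: str, port: int) -> List[str]:
    """Return every distinct address getaddrinfo() reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]  # sockaddr is (ip, port) or (ip, port, flow, scope)
        if address not in addresses:
            addresses.append(address)
    return addresses
```

The caller never sees whether the answer came from /etc/hosts, UDP, or a TCP retry; it just gets however many entries the resolver produced.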
So sure, while you could use your own DNS client specifically for talking to Consul's DNS server the whole point of the thing is to act as a compatibility layer for software you didn't write and which will 100% of the time use glibc's methods.
> on a GNU/Linux system the only correct method of resolving DNS is by using gethostbyname
I don't think that's right.
gethostbyname() doesn't query DNS, it queries names, which includes /etc/hosts, and possibly NIS, active directory, and other possible things. Most applications would never be expecting 100 results from one of these queries and many will not tolerate it well.
Specialised users of gethostbyname() can certainly do better, but what I doubt is the wisdom of such specialisation: It certainly has nothing to do with the application -- it is literally under the control of the network administrator as you are well aware. Specialisation can occur in your application, but it can just as easily specialise another way.
On the other hand, if your application really wants to specially speak to Consul's DNS (as opposed to whatever the network administrator is doing) it can definitely use res_query().
> so it's not exactly hard to support 100 results
Maybe we mean different things by "support": What do you do with them?
> I'm not really sure what you mean by "support gethostbyname robustly"
Most applications that connect to a host they get from gethostbyname connect to the first address, and give up if the connection opens and resets. This is exceptionally common with load balancers and address translation. To those applications, what is the point of giving them multiple results in this situation?
A few applications try to handle the result robustly: connect to a random member of the list, or connect to several in parallel and try the request in parallel. Some applications do really wild stuff here to make a good user-experience.
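A rough sketch of the simplest version of that, assuming TCP: pick addresses in random order and fall through to the next one when a connection fails. The fall-through on failure, the `connect_any` name and the 3-second default timeout are my choices for illustration; the parallel variant is not shown.

```python
import random
import socket


def connect_any(host: str, port: int, timeout: float = 3.0) -> socket.socket:
    """Connect to a randomly chosen address for host, trying the rest on failure."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    random.shuffle(infos)  # spread load instead of always taking the first entry
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError("no addresses returned for %r" % host)
```

Python's own `socket.create_connection()` does the in-order version of this loop without the shuffle.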
Most do not.
When someone types `ping google.com` (for example) you only ever get one result. If that name doesn't ping, it doesn't try another.
Most are like that.
Hopefully that makes what I mean by "robustly" clearer.