We had to stop using alpine because we have to resolve a DNS name that resolved ...

tyingq · on Aug 26, 2021

https://twitter.com/RichFelker/status/994629795551031296

Yikes.

fanf2 · on Aug 26, 2021

That thread from 2018 refers to RFC 5966 which was obsoleted by RFC 7766 in 2016. RFC 7766 is much stricter saying TCP support is required. https://datatracker.ietf.org/doc/html/rfc7766#section-5

throwthere · on Aug 26, 2021

Oh wow. There's a big contrast between Linus screaming "don't break userspace," and that sort of crusade against the spec.

Aissen · on Aug 26, 2021

It's not the same because this is not an ABI, but an API needing a recompilation. It's actually explained in the article why it is not an ABI.

ntauthority · on Aug 26, 2021

Why is DNS resolution even part of the libc and not, say, the base OS, a service on the base OS, or if need be, an external library like c-ares?

In fact, I thought Node already depended on c-ares, why is it failing on this?

p2t2p · on Aug 26, 2021

Adding to the other responder.

Traditionally, in Unix, libc is part of the OS. The situation is different in Linux, but Linux is the outlier here: if we look at the various BSDs, they keep libc in the same tree as the kernel.

WJW · on Aug 26, 2021

If the outlier has >100x as much market share as the rest of the other unixes combined, is Linux really still the outlier?

WastingMyTime89 · on Aug 26, 2021

Historically speaking? Yes.

C and Unix are considerably older than Linux after all.

nyrikki · on Aug 26, 2021

Remember "Linux" is just the kernel, it is not an operating system itself.

Alpine or Debian including libc is more equivalent to the BSDs including it.

masklinn · on Aug 26, 2021

Yes. It explains why things work the way they do, and have since long before linux was a thing.

tyingq · on Aug 26, 2021

It has certainly evolved to be pretty complicated. You have whatever libc chooses to do with getaddrinfo(), nsswitch.conf, resolv.conf, systemd-resolved, various pieces of software (docker, vpns, wsl), and so on, all trying to control the local resolver.

sharikone · on Aug 26, 2021

Linux as a system is very ill-defined. I'd argue that GNU/Linux by definition contains glibc even if they are not in the same tree and musl based distributions are a variation which you could call "musl/Linux".

wahern · on Aug 26, 2021

GNU formalized a system of tuple definitions for identifying build, host, and target environments, which was popularized by Autotools. See https://autotools.io/autoconf/canonical.html#autoconf.canoni... and https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/a... Even if you don't use Autotools, this is the canonical way to specify environments in the Unix world, though often a simplified version is employed. (By canonical I mean the one project-agnostic system that everybody at least nominally acknowledges. It's hardly the only system out there. Even Debian has their own alternative: https://wiki.debian.org/Multiarch/Tuples .)

The tuples historically had 3 components--cpu, vendor and operating system. But especially as uclibc and musl became more widespread the last component is commonly split into kernel-libc. (I think this was originally extended for the benefit of Debian GNU/kFreeBSD.) The formal OS identifier for glibc-based Linux systems is "linux-gnu" (e.g. x86_64-pc-linux-gnu), and for musl "linux-musl" (e.g. aarch64-alpine-linux-musl).

Vendor is not very useful these days. It's common to see 3-tuples of cpu-kernel-libc, as opposed to 4-tuples or traditional 3-tuples. Sometimes the system is extended into, e.g., 5-tuples like cpu-vendor-kernel-libc-compiler. Autotools projects commonly have a bit of generated shell code for parsing tuples; it's quite complex owing to ~30 years of accumulated idiosyncrasies.
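To make the tuple shapes above concrete, here is a minimal sketch of splitting a tuple into its parts. It is not the Autotools logic (that shell code handles far more cases); the `TargetTuple` type, the `parse_tuple` function, and the short `KNOWN_KERNELS` list are all hypothetical names made up for illustration.

```python
from typing import NamedTuple, Optional

# Hypothetical and deliberately incomplete: only used to tell the newer
# cpu-kernel-libc form apart from the traditional cpu-vendor-os form.
KNOWN_KERNELS = {"linux", "kfreebsd"}


class TargetTuple(NamedTuple):
    cpu: str
    vendor: Optional[str]
    kernel: Optional[str]
    libc_or_os: str


def parse_tuple(text: str) -> TargetTuple:
    """Split a build/host/target tuple into named fields (simplified)."""
    parts = text.split("-")
    if len(parts) == 4:
        # cpu-vendor-kernel-libc, e.g. x86_64-pc-linux-gnu
        cpu, vendor, kernel, libc = parts
        return TargetTuple(cpu, vendor, kernel, libc)
    if len(parts) == 3 and parts[1] in KNOWN_KERNELS:
        # cpu-kernel-libc with the vendor dropped
        cpu, kernel, libc = parts
        return TargetTuple(cpu, None, kernel, libc)
    if len(parts) == 3:
        # traditional cpu-vendor-os
        cpu, vendor, os_name = parts
        return TargetTuple(cpu, vendor, None, os_name)
    raise ValueError(f"unsupported tuple shape: {text!r}")


print(parse_tuple("x86_64-pc-linux-gnu"))
print(parse_tuple("aarch64-alpine-linux-musl"))
```

The 3-part case is ambiguous without a list of known kernel names, which is one small reason the real parsing code is so long.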

Spivak · on Aug 26, 2021

Because people treat libraries differently today than 30 years ago. We're used to the integration points for things being some form of blocking IPC (like dbus on GNU/Linux, COM, or syscalls) but libc is different.

libc is that service on the base OS. But rather than connecting to an OS service and passing messages back and forth you dlopen and setjmp to do the same thing. On GNU/Linux libc isn't an interface to the NSS service, libc is the NSS service. The fact that you access it via your linker is just an implementation thing.

The kernel itself actually exposes integration points this way too, with the vDSO! The kernel will actually just stick its own routines in your program's memory space so that you can avoid the syscall overhead for certain calls.

trissylegs · on Aug 26, 2021

It probably made sense in 1983.

https://github.com/dank101/4.2BSD/blob/master/include/netdb....

fanf2 · on Aug 26, 2021

I think back then libresolv was separate from libc since many programs didn’t need it, and memory was tight

fanf2 · on Aug 26, 2021

libc provides the standard POSIX sockets APIs, which include DNS functions such as gethostbyname() and getaddrinfo()

Denvercoder9 · on Aug 26, 2021

I certainly don't want DNS resolution inside the kernel, and outside the kernel, libc is as "base OS" as "base OS" comes, imo.

Spivak · on Aug 26, 2021

The Linux kernel can actually resolve names but it farms out the actual work to userspace using the request-key(8) machinery.

I personally think it should be renamed, because it's just a generic way for the kernel to ask for data from userspace, not just keys, but still.

devit · on Aug 26, 2021

Someone added it to libc and now it needs to be provided forever for compatibility.

silon42 · on Aug 26, 2021

Typically, on a modern Linux system, DNS queries from libc (and everything else) will always query a local resolver (example: systemd-resolved).

Spivak · on Aug 26, 2021

That's up to your distro technically. DNS queries that use glibc (so everything basically) parse /etc/nsswitch.conf and follow the path of NSS modules which can do whatever they want to produce a name.

The resolve module provided by systemd talks to systemd-resolved but the dns module parses /etc/resolv.conf and does the resolution itself.
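As a rough illustration of the lookup order described above, this sketch pulls the module list out of the `hosts:` line of an nsswitch.conf. The sample file content is hypothetical, the `hosts_sources` helper is made up for this example, and real glibc parsing also interprets the bracketed actions rather than discarding them.

```python
import re

# Hypothetical excerpt; a real /etc/nsswitch.conf will differ per distro.
SAMPLE = """\
# example nsswitch.conf excerpt
passwd: files systemd
hosts:  files resolve [!UNAVAIL=return] dns
"""


def hosts_sources(conf_text):
    """Return the NSS module names listed for the hosts database, in order."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("hosts:"):
            continue
        spec = line[len("hosts:"):]
        # Drop bracketed actions such as [!UNAVAIL=return]; this sketch
        # only reports the order in which modules would be consulted.
        spec = re.sub(r"\[[^\]]*\]", " ", spec)
        return spec.split()
    return []


print(hosts_sources(SAMPLE))
```

With the sample above, `files` (the /etc/hosts lookup) comes first, then `resolve`, then `dns`.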

synergy20 · on Aug 26, 2021

"For what it's worth, musl's DNS resolver is slated to gain support for TCP responses in the near future" from https://www.linkedin.com/pulse/musl-libc-alpines-greatest-we...

wyager · on Aug 26, 2021

Ha, I ran into that one at work one time in our custom DNS resolver and had to add TCP upgrade. I was very confused why the tool worked when I shelled out to dig but not when I did the “correct” thing and used a resolver library. I’m very surprised that a project as big as musl would not have support for this.
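For anyone unsure what "TCP upgrade" involves, here is a minimal sketch, assuming a plain IPv4 resolver on port 53 and no EDNS. It sends the query over UDP, checks the truncation (TC) bit in the response header, and repeats the same query over TCP with the 2-byte length prefix if the bit is set. The function names are made up for this example, and it skips validation (response ID, source address) that a real resolver needs.

```python
import socket
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags word


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 1 = A, class IN, RD set)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def is_truncated(response):
    """True if the server set TC, meaning the UDP answer was cut short."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed the TCP connection early")
        buf += chunk
    return buf


def query(server, name, timeout=3.0):
    """Query over UDP first; retry over TCP if the answer was truncated."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(packet, (server, 53))
        response, _ = udp.recvfrom(512)
    if not is_truncated(response):
        return response
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        # DNS over TCP prefixes each message with its length (2 bytes).
        tcp.sendall(struct.pack("!H", len(packet)) + packet)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

A resolver that stops after the UDP step only ever sees the truncated answer, which matches the symptom described above.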

geocar · on Aug 26, 2021

If someone on my team had built an application and put 100 hosts into a DNS server, I would suggest they upload their hosts file to a webserver someplace. 100 hosts just doesn't do anything useful with most applications using gethostbyname(), even in glibc: it's going to be slow, and the bug reports you get are going to be really confusing. Custom applications that are prepared to deal with all 100 hosts will be easier to implement using the output from a webserver.

What are you doing?

Spivak · on Aug 26, 2021

Service-Discovery-Over-DNS is typically the use-case. It's used as a compatibility layer for software where you either can't or don't want to integrate the native discovery APIs. Consul is a good example of this. You don't actually have to know how to speak Consul to get automatic service discovery, all you have to do is query a DNS name to get the hosts registered for a particular service.

geocar · on Aug 26, 2021

That doesn’t answer my question at all, unless you mean that people make bad engineering decisions because they like using cute things.

dig is not affected by alpine’s decision here because dig does not use gethostbyname.

No DNS client would be.

This affects gethostbyname which very few programs in my experience even support robustly, so any “use-case” where someone is using 100 results would surprise me.

It seems if you need to write something custom, a www client is better (which consul also supports).

I think if you insist on writing gethostbyname instead of the res_* calls in bind, and robustly handle all results in a sensible way, then that’s silly, and if you have an existing application that works great with ~70 addresses but not 100 I would be curious to know what it is.

Spivak · on Aug 27, 2021

I'm not really sure what you mean by "support gethostbyname robustly" or that "dns clients aren't affected." Because on a GNU/Linux system the only correct method of resolving DNS is by using gethostbyname (or nowadays getaddrinfo) and friends. If you do anything else things will be broken because you aren't following the distro's/system integrator's/sysadmin's/user's configured NSS modules for name resolution.

And getaddrinfo returns a linked list of results so it's not exactly hard to support 100 results. All the actual junk about TCP/UDP is completely abstracted away from the caller.

So sure, while you could use your own DNS client specifically for talking to Consul's DNS server the whole point of the thing is to act as a compatibility layer for software you didn't write and which will 100% of the time use glibc's methods.
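A small sketch of the "list of results" point, using Python's `socket.getaddrinfo` (which, as far as I know, goes through the C library's getaddrinfo() on CPython, so the system's configured name resolution applies). The `resolve_all` helper is a made-up name; the caller never sees whether UDP or TCP was used underneath.

```python
import socket


def resolve_all(host, port):
    """Return every distinct address the system resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element is the address for IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# A numeric address needs no lookup, so this runs without a network.
print(resolve_all("127.0.0.1", 80))
```

Whether the list holds 1 entry or 100, the loop is the same; what the application then does with them is a separate question.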

geocar · on Aug 27, 2021

> on a GNU/Linux system the only correct method of resolving DNS is by using gethostbyname

I don't think that's right.

gethostbyname() doesn't query DNS, it queries names, which includes /etc/hosts, and possibly NIS, active directory, and other possible things. Most applications would never be expecting 100 results from one of these queries and many will not tolerate it well.

Specialised users of gethostbyname() can certainly do better, but what I doubt is the wisdom of such specialisation: It certainly has nothing to do with the application -- it is literally under the control of the network administrator as you are well aware. Specialisation can occur in your application, but it can just as easily specialise another way.

On the other hand, if your application really wants to specially speak to Consul's DNS (as opposed to whatever the network administrator is doing) it can definitely use res_query()

> so it's not exactly hard to support 100 results

Maybe we mean different things by "support": What do you do with them?

> I'm not really sure what you mean by "support gethostbyname robustly"

When most applications connect to a host they get from gethostbyname, they often connect to the first address and give up if the connection opens and resets: this is exceptionally common with load balancers and address translation. To those applications, what is the point of giving them multiple results in this situation?

A few applications try to handle the result robustly: connect to a random member of the list, or connect to several in parallel and try the request in parallel. Some applications do really wild stuff here to make a good user-experience.

Most do not.

When someone types `ping google.com` (for example) you only ever get one result. If that name doesn't ping, it doesn't try another.

Most are like that.

Hopefully that makes what I mean by "robustly" clearer.
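For contrast with the "connect to the first and give up" behaviour, here is a minimal sketch of one of the more robust strategies mentioned above: shuffle the results and try each in turn. All function names are hypothetical, and it only covers failures at connect time; a connection that opens and then resets would need retry logic at the request level.

```python
import random
import socket


def try_addresses(candidates, connect):
    """Call connect(candidate) for each candidate; return the first success."""
    errors = []
    for candidate in candidates:
        try:
            return connect(candidate)
        except OSError as exc:
            errors.append(exc)
    raise OSError(f"all {len(candidates)} candidates failed: {errors}")


def open_socket(info, timeout=3.0):
    """Open a TCP connection to one getaddrinfo() result entry."""
    family, socktype, proto, _canonname, sockaddr = info
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_any(host, port):
    """Resolve host, then try the addresses in random order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    random.shuffle(infos)
    return try_addresses(infos, open_socket)
```

Python's own `socket.create_connection` already walks the address list in order, so the shuffle is the only real difference here; it spreads clients across the addresses instead of piling onto the first one.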