
We had to stop using Alpine because we have to resolve a DNS name that resolves to 100 hosts. musl fails to resolve it because it does not support “upgrade to TCP”: the response does not fit into a single UDP packet, so it gets truncated and node fails to resolve the name. And not only node, normal Linux tools fail as well. The author says it’s a feature, not a bug, so for me it is kind of hard to take the thing seriously when parts of the standard are unsupported.
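For anyone unfamiliar with what “upgrade to TCP” means here, a rough sketch: a resolver checks the TC (truncated) bit in the header of the UDP response and, if it is set, repeats the query over TCP, where each message is prefixed with a 2-byte length. The helper names below (`is_truncated`, `frame_for_tcp`) are made up for illustration and are not taken from musl or glibc.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags field


def is_truncated(response: bytes) -> bool:
    """Return True if a DNS response has the TC (truncated) bit set."""
    if len(response) < 12:  # a DNS header is always 12 bytes long
        raise ValueError("response shorter than a DNS header")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with the 2-byte length field used over TCP."""
    return struct.pack("!H", len(message)) + message
```

A resolver with TCP fallback would call something like `is_truncated()` on the UDP answer and, when it returns True, resend the same query framed with `frame_for_tcp()` over a TCP connection to port 53.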



That thread from 2018 refers to RFC 5966, which was obsoleted by RFC 7766 in 2016. RFC 7766 is much stricter, saying TCP support is required. https://datatracker.ietf.org/doc/html/rfc7766#section-5


Oh wow. There's a big contrast between Linus screaming "don't break userspace," and that sort of crusade against the spec.


It's not the same because this is not an ABI, but an API needing a recompilation. It's actually explained in the article why it is not an ABI.


Why is DNS resolution even part of the libc and not, say, the base OS, a service on the base OS, or if need be, an external library like c-ares?

In fact, I thought Node already depended on c-ares, why is it failing on this?


Adding to the other responder.

Traditionally, in Unix, libc is part of the OS. The situation is different in Linux, but Linux is an outlier here: the various BSDs keep libc in the same tree as the kernel.


If the outlier has >100x as much market share as the rest of the other unixes combined, is Linux really still the outlier?


Historically speaking? Yes.

C and Unix are considerably older than Linux after all.


Remember "Linux" is just the kernel, it is not an operating system itself.

Alpine or Debian including libc is more equivalent to the BSDs including it.


Yes. It explains why things work the way they do, and have since long before linux was a thing.


It has certainly evolved to be pretty complicated. You have whatever libc chooses to do with getaddrinfo(), nsswitch.conf, resolv.conf, systemd-resolved, various pieces of software (docker, vpns, wsl), and so on, all trying to control the local resolver.


Linux as a system is very ill-defined. I'd argue that GNU/Linux by definition contains glibc even if they are not in the same tree and musl based distributions are a variation which you could call "musl/Linux".


GNU formalized a system of tuple definitions for identifying build, host, and target environments, which was popularized by Autotools. See https://autotools.io/autoconf/canonical.html#autoconf.canoni... and https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/a... Even if you don't use Autotools, this is the canonical way to specify environments in the Unix world, though often a simplified version is employed. (By canonical I mean the one project-agnostic system that everybody at least nominally acknowledges. It's hardly the only system out there. Even Debian has their own alternative: https://wiki.debian.org/Multiarch/Tuples .)

The tuples historically had 3 components--cpu, vendor and operating system. But especially as uclibc and musl became more widespread the last component is commonly split into kernel-libc. (I think this was originally extended for the benefit of Debian GNU/kFreeBSD.) The formal OS identifier for glibc-based Linux systems is "linux-gnu" (e.g. x86_64-pc-linux-gnu), and for musl "linux-musl" (e.g. aarch64-alpine-linux-musl).

Vendor is not very useful these days. It's common to see 3-tuples of cpu-kernel-libc, as opposed to 4-tuples or traditional 3-tuples. Sometimes the system is extended into, e.g., 5-tuples like cpu-vendor-kernel-libc-compiler. Autotools projects commonly have a bit of generated shell code for parsing tuples; it's quite complex owing to ~30 years of accumulated idiosyncrasies.
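To make the tuple shapes concrete, here is a deliberately naive sketch of splitting 3- and 4-component tuples. It is nowhere near what the generated Autotools shell code handles; the `Target` type, `parse_target` function, and the `KNOWN_KERNELS` set are hypothetical names invented for this example, and the kernel list is far from complete.

```python
from typing import NamedTuple, Optional

# Illustrative only: used to tell cpu-kernel-libc from cpu-vendor-os.
KNOWN_KERNELS = {"linux", "kfreebsd"}


class Target(NamedTuple):
    cpu: str
    vendor: Optional[str]
    kernel: str
    libc: Optional[str]


def parse_target(tuple_str: str) -> Target:
    """Split a target tuple into its components (simplified heuristic)."""
    parts = tuple_str.split("-")
    if len(parts) == 4:
        # cpu-vendor-kernel-libc, e.g. x86_64-pc-linux-gnu
        cpu, vendor, kernel, libc = parts
        return Target(cpu, vendor, kernel, libc)
    if len(parts) == 3:
        if parts[1] in KNOWN_KERNELS:
            # cpu-kernel-libc, e.g. x86_64-linux-musl
            return Target(parts[0], None, parts[1], parts[2])
        # traditional cpu-vendor-os
        return Target(parts[0], parts[1], parts[2], None)
    raise ValueError(f"unsupported tuple shape: {tuple_str!r}")
```

With this, `parse_target("aarch64-alpine-linux-musl")` yields vendor `alpine` and libc `musl`, while a vendor-less `x86_64-linux-musl` is recognised only because `linux` is in the kernel set, which hints at why the real parsing code is so full of special cases.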


Because people treat libraries differently today than 30 years ago. We're used to the integration points for things being some form of blocking IPC (like dbus on GNU/Linux, COM, or syscalls) but libc is different.

libc is that service on the base OS. But rather than connecting to an OS service and passing messages back and forth, you dlopen and setjmp to do the same thing. On GNU/Linux libc isn't an interface to the NSS service, libc is the NSS service. The fact that you access it via your linker is just an implementation detail.

The kernel itself actually exposes integration points this way too, with the vDSO! The kernel will actually just stick its own routines in your program's memory space so that you can avoid the syscall overhead for certain calls.



I think back then libresolv was separate from libc since many programs didn’t need it, and memory was tight


libc provides the standard POSIX sockets APIs, which include DNS functions such as gethostbyname() and getaddrinfo()


I certainly don't want DNS resolution inside the kernel, and outside the kernel, libc is as "base OS" as "base OS" comes, imo.


The Linux kernel can actually resolve names but it farms out the actual work to userspace using the request-key(8) machinery.

I personally think it should be renamed, because it's just a generic way for the kernel to ask for data from userspace, not just keys, but still.


Someone added it to libc and now it needs to be provided forever for compatibility.


Typically, on a modern Linux system, DNS queries from libc (and everything else) will always query a local resolver (example: systemd-resolved).


That's up to your distro technically. DNS queries that use glibc (so everything basically) parse /etc/nsswitch.conf and follow the path of NSS modules which can do whatever they want to produce a name.

The resolve module provided by systemd talks to systemd-resolved but the dns module parses /etc/resolv.conf and does the resolution itself.
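As a small illustration of the lookup order described above, this sketch pulls the list of NSS modules out of the `hosts:` line of an nsswitch.conf-style file. The function name `hosts_modules` is invented here, and the parsing is simplified: it drops bracketed action items such as `[!UNAVAIL=return]` instead of interpreting them the way glibc does.

```python
import re
from typing import List


def hosts_modules(nsswitch_text: str) -> List[str]:
    """Return the NSS module names on the 'hosts:' line, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments
        if not line.startswith("hosts:"):
            continue
        rest = line[len("hosts:"):]
        # Remove action items like [NOTFOUND=return]; a real resolver
        # would apply them rather than discard them.
        rest = re.sub(r"\[[^\]]*\]", " ", rest)
        return rest.split()
    return []
```

Feeding it the contents of `/etc/nsswitch.conf` on a given machine shows whether `resolve`, `dns`, or something else entirely is consulted, and in which order.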


"For what it's worth, musl's DNS resolver is slated to gain support for TCP responses in the near future" from https://www.linkedin.com/pulse/musl-libc-alpines-greatest-we...


Ha, I ran into that one at work one time in our custom DNS resolver and had to add TCP upgrade. I was very confused why the tool worked when I shelled out to dig but not when I did the “correct” thing and used a resolver library. I’m very surprised that a project as big as musl would not have support for this.


If someone on my team had built an application and put 100 hosts into a DNS server, I would suggest they upload their hosts file to a webserver someplace. 100 hosts just doesn't do anything useful with most applications using gethostbyname(), even in glibc: it's going to be slow, and the bug reports you get are going to be really confusing. Custom applications that are prepared to deal with all 100 hosts will be easier to implement using the output from a webserver.

What are you doing?


Service-Discovery-Over-DNS is typically the use-case. It's used as a compatibility layer for software where you either can't or don't want to integrate the native discovery APIs. Consul is a good example of this. You don't actually have to know how to speak Consul to get automatic service discovery, all you have to do is query a DNS name to get the hosts registered for a particular service.


That doesn’t answer my question at all, unless you mean that people make bad engineering decisions because they like using cute things.

dig is not affected by Alpine’s decision here because dig does not use gethostbyname.

No DNS client would be.

This affects gethostbyname which very few programs in my experience even support robustly, so any “use-case” where someone is using 100 results would surprise me.

It seems if you need to write something custom, a www client is better (which consul also supports).

I think if you insist on writing gethostbyname instead of the res_* calls in bind, and robustly handle all results in a sensible way, then that’s silly, and if you have an existing application that works great with ~70 addresses but not 100 I would be curious to know what it is.


I'm not really sure what you mean by "support gethostbyname robustly" or that "dns clients aren't affected." Because on a GNU/Linux system the only correct method of resolving DNS is by using gethostbyname (or nowadays getaddrinfo) and friends. If you do anything else things will be broken because you aren't following the distro's/system integrator's/sysadmin's/user's configured NSS modules for name resolution.

And getaddrinfo returns a linked list of results so it's not exactly hard to support 100 results. All the actual junk about TCP/UDP is completely abstracted away from the caller.
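To show what that looks like from the caller's side, here is a minimal sketch using Python's `socket.getaddrinfo`, which wraps the libc call and hands back a list of tuples rather than a C linked list. The helper name `resolve_all` is made up for this example.

```python
import socket
from typing import List


def resolve_all(host: str, port: int) -> List[str]:
    """Return every distinct address getaddrinfo reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element is the address for IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Whether the name comes back with 1 address or 100, the loop is the same; whether the answer needed UDP or TCP never shows up at this level.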

So sure, while you could use your own DNS client specifically for talking to Consul's DNS server the whole point of the thing is to act as a compatibility layer for software you didn't write and which will 100% of the time use glibc's methods.


> on a GNU/Linux system the only correct method of resolving DNS is by using gethostbyname

I don't think that's right.

gethostbyname() doesn't query DNS, it queries names, which includes /etc/hosts, and possibly NIS, active directory, and other possible things. Most applications would never be expecting 100 results from one of these queries and many will not tolerate it well.

Specialised users of gethostbyname() can certainly do better, but what I doubt is the wisdom of such specialisation: It certainly has nothing to do with the application -- it is literally under the control of the network administrator as you are well aware. Specialisation can occur in your application, but it can just as easily specialise another way.

On the other hand, if your application really wants to specially speak to Consul's DNS (as opposed to whatever the network administrator is doing) it can definitely use res_query()

> so it's not exactly hard to support 100 results

Maybe we mean different things by "support": What do you do with them?

> I'm not really sure what you mean by "support gethostbyname robustly"

Most applications that connect to a host they get from gethostbyname connect to the first address, and give up if the connection opens and resets: This is exceptionally common with load balancers and address translation. To those applications, what is the point of giving them multiple results in this situation?

A few applications try to handle the result robustly: connect to a random member of the list, or connect to several in parallel and try the request in parallel. Some applications do really wild stuff here to make a good user-experience.

Most do not.

When someone types `ping google.com` (for example) you only ever get one result. If that name doesn't ping, it doesn't try another.

Most are like that.

Hopefully that makes what I mean by "robustly" clearer.
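The two behaviours contrasted above can be sketched roughly like this. Both function names are invented for illustration, and `connect` is a caller-supplied function (an assumption of this sketch) so the logic can be shown without real sockets; the parallel-connect variant is left out.

```python
import random
from typing import Any, Callable, Optional, Sequence


def connect_first(addresses: Sequence[str], connect: Callable[[str], Any]) -> Any:
    """What most applications do: try only the first address."""
    return connect(addresses[0])


def connect_any(
    addresses: Sequence[str],
    connect: Callable[[str], Any],
    rng: Optional[random.Random] = None,
) -> Any:
    """Try the addresses in random order until one connects."""
    order = list(addresses)
    (rng or random).shuffle(order)
    last_error: Optional[OSError] = None
    for addr in order:
        try:
            return connect(addr)
        except OSError as exc:  # refused, reset, timed out, ...
            last_error = exc
    if last_error is None:
        raise OSError("no addresses to try")
    raise last_error
```

With a list where only one member is healthy, `connect_first` fails unless that member happens to be first, while `connect_any` keeps going until it finds it.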



