> The pre-existing harmonic distortion will only serve to mask any distortion and fidelity loss from truncating to 8-bits.
This is blatantly incorrect. The 8-bit conversion is not truncation, it is dithering. Dithering to 8-bit does not introduce any distortion at all, whatsoever, of any kind. If you don't understand the mechanisms and the science of how bit depths work, then you're going to come to false conclusions, like the conclusion that there's any point to the Pono player at all. We're not talking about just "subtle differences" here. In order for it to be even theoretically possible to hear the difference between 16-bit and 24-bit audio, you have to bring your audio system into a quiet room and then crank the volume well past the threshold at which you can damage your ears, and even then, you still won't be able to tell the difference with the most dynamic music.
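To make the distinction concrete, here's a minimal numpy sketch (my own illustration, not from the article): truncation produces an error that is correlated with the signal, which shows up as harmonic distortion, while TPDF dither decorrelates the error into a benign, signal-independent noise floor.

```python
import numpy as np

fs = 48000                                 # sample rate in Hz (arbitrary for the demo)
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)     # -6 dBFS, 1 kHz sine

scale = 2**7 - 1                           # 8-bit signed full scale

# Truncation: the error tracks the signal, producing harmonic spikes.
truncated = np.floor(x * scale) / scale

# TPDF dither: sum of two uniform variables spanning +/-1 LSB,
# added before rounding; the error becomes uncorrelated noise.
tpdf = (np.random.uniform(-0.5, 0.5, fs) +
        np.random.uniform(-0.5, 0.5, fs))
dithered = np.round(x * scale + tpdf) / scale
```

An FFT of `truncated` shows distortion harmonics poking out of the floor; an FFT of `dithered` shows only the tone over flat noise, which is the whole point of the technique.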
So, suppose you have a quiet room in your house, with an ambient noise level of about 30 dB. If you raise the playback level so the 16-bit noise floor sits above 20 dB, then the peaks are going to be well into the 120 dB range. That's like having a symphony orchestra in the room with you, at the very peak of their performance, with all the instruments playing at once. If you've ever listened to a symphony orchestra, you know that the background noise is NOT 30 dB, but somewhat higher. So even at the peak of a symphony, your CD recording should still be able to reproduce the various unwelcome bits of noise that the musicians produce (stomachs gurgling, breathing, shuffling in their chairs, etc.).
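The arithmetic behind those numbers, as a quick sketch using the usual ~6 dB-per-bit rule of thumb:

```python
# Flat-dithered dynamic range is roughly 6.02 dB per bit.
bits = 16
dynamic_range = 6.02 * bits      # ~96 dB
noise_floor_spl = 20             # playback gain puts the dither noise here (dB SPL)
peak_spl = noise_floor_spl + dynamic_range
print(peak_spl)                  # ~116 dB SPL, heading into the 120 dB range
```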
Here is my take on dithering, with online demos: http://www.audiocheck.net/audiotests_dithering.php (still 8-bit though, because at 16-bit, it is barely noticeable, and it won't serve educational purposes very well)
Can you hear the difference between 16-bit audio with dither and 16-bit audio without dither?
Most people can. Now consider - that difference is created by adding a noise signal which is more than 90dB down compared to the maximum possible level.
By all reasonable expectations that difference should be completely inaudible under normal listening conditions.
But the effect it has isn't inaudible at all.
When you understand why, then you'll understand the difference between peer-reviewed and objectively tested psychoacoustic theory, and hand-waving about numbers.
You'll also understand why it's trivially easy to tell amplifiers and converters apart even when they have distortion products well below -90dB.
That aside - you're making the usual mistake of confusing dynamic range with resolution.
What's the effective bit resolution of a -48dB signal on a 16 bit system?
What's the resolution of the same signal on a 24-bit system with the same output level?
What's the minimum number of bits needed to make quantisation noise inaudible? (Clue: rather more than 8.)
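For anyone following along, here's the naive per-sample arithmetic these questions gesture at (my sketch; the bandwidth objection to it is argued just below):

```python
# Rule of thumb: each bit is worth ~6.02 dB.
db_per_bit = 6.02
signal_db = -48.0
bits_of_headroom = -signal_db / db_per_bit   # ~8 bits spent on headroom

effective_16bit = 16 - bits_of_headroom      # ~8 bits left above the floor
effective_24bit = 24 - bits_of_headroom      # ~16 bits left above the floor
print(effective_16bit, effective_24bit)
```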
I'm honestly confused here, because you claim to be disagreeing with me, but when I read the content of your post, it sounds like you actually agree with me?
When I was talking about hearing the difference between 16-bit and 24-bit, I was assuming that we dithered our audio. You can't hear white noise at -90 dBFS in typical listening conditions. You'd have to be in a quiet room with the volume turned way up, and you'd have to have very low-noise equipment.
The effective resolution of a -48dB signal on a 16-bit system will depend on that signal's bandwidth. If you don't understand that part, then you don't understand the math.
It makes no difference if you're sampling a sine wave or broad-band noise - the effective resolution stays the same, because it's solely dependent on quantisation error, not on signal bandwidth.
The latter depends on sample rate, not on sample resolution.
If you're making mistakes like that, it's not a brilliant idea to tell people who write DSP code and have designed audio hardware that they don't understand the math.
The other point still stands. If 16-bit resolution is already good enough to represent signals without audible distortion, why does it need dither to sound acceptable, while 24-bit audio doesn't?
I'm still waiting for anyone who believes 16-bit recording is perfect to explain why the industry bothered to invent a clearly audible conditioning process for signals that are supposed to be ideal already.
Let's not resort to comparing credentials here. For the record, I've designed and built audio hardware, and I'm the author of a sample-rate conversion library, which does SIMD band-limited sample rate conversion, and I also wrote the accompanying test suite. I'm not just some dude who read a blog post about audio.
Let's talk about bandwidth. If you have a pure sine wave and want to measure its amplitude, you can do a DFT on your signal and measure the appropriate bin. Let's assume the sine wave lands in one particular bin. If your data is 16-bit with dithering, the dithering and quantization will add noise to all of the bins, but that noise will be spread evenly across them. As you increase the length of the sample you're analyzing, the bandwidth of each bin decreases, and the amount of noise in each bin decreases as well. However, the signal will always be concentrated in that one bin.
So, as you decrease the bandwidth, the quantization noise decreases as well. This is equivalent to saying that you have increased resolution.
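Here's a sketch of that processing-gain effect in numpy (my own illustration, with the tone placed exactly on a DFT bin): each quadrupling of the analysis length cuts the per-bin noise, so the in-bin SNR climbs by about 6 dB while the tone stays put.

```python
import numpy as np

fs = 48000
f0 = 1000.0               # tone frequency, chosen to land exactly on a bin
scale = 2**15 - 1         # 16-bit quantization

def snr_in_signal_bin(n):
    t = np.arange(n) / fs
    x = 0.5 * np.sin(2 * np.pi * f0 * t)
    tpdf = (np.random.uniform(-0.5, 0.5, n) +
            np.random.uniform(-0.5, 0.5, n))
    q = np.round(x * scale + tpdf) / scale     # dithered 16-bit quantization
    power = np.abs(np.fft.rfft(q))**2
    k = round(f0 * n / fs)                     # the bin the tone lands in
    noise = np.delete(power, k).mean()         # average noise in the other bins
    return 10 * np.log10(power[k] / noise)

# Longer DFT -> narrower bins -> less noise per bin -> higher in-bin SNR.
for n in (4800, 19200, 76800):
    print(n, round(snr_in_signal_bin(n), 1))
```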
I know this is counterintuitive. However, this is the foundation of how most modern ADCs work. It's called delta-sigma modulation, and it uses a low-resolution ADC internally to derive a high-resolution digital output. It's also been used in DACs. For an extreme example, look at DSD, which gives high-resolution outputs using a 1-bit signal.
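As a toy illustration of the principle (a first-order loop, far cruder than any real converter): the 1-bit quantizer sits inside an error-feedback loop, and lowpass filtering the resulting bitstream recovers a high-resolution version of the input.

```python
import numpy as np

def delta_sigma_1bit(x):
    # First-order delta-sigma: integrate the error between the input and
    # the previous 1-bit output, then quantize to +/-1.
    y = np.empty_like(x)
    acc = 0.0
    for i, sample in enumerate(x):
        acc += sample - (y[i - 1] if i else 0.0)
        y[i] = 1.0 if acc >= 0 else -1.0
    return y

fs = 48000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 100 * t)       # slow tone, heavily oversampled
bits = delta_sigma_1bit(x)                  # stream of only +/-1 values
# A crude lowpass (moving average) already recovers the tone:
recovered = np.convolve(bits, np.ones(64) / 64, mode='same')
```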
The argument that "if 16 bits is enough, why do we need dithering" is kind of pointless, because we don't use 16-bit audio without dithering. It's like asking, "if this amplifier is good enough, why does it use negative feedback?" The answer of course is that negative feedback increases the linearity and flattens the response of the amplifier, and makes it less sensitive to variations in manufacturing and temperature.
As I conceded in another comment, dithering trades harmonic distortion for noise.
> In order for it to be even theoretically possible to hear the difference between 16-bit and 24-bit audio,
Only for professionally mastered audio, which is not a safe assumption in this day and age. If some home engineer recorded a track with too much headroom, you're fine if you get the 24-bit track; at 16 bits you have a problem.
I would love a music player where I could play tracks at 88kHz/24-bit, because that's what most music is during the mixing process; then an audio engineer could give you the raw version of what they're working with without having to deal with issues of headroom, downsampling, and dithering.
24-bit audio doesn't hurt anything other than file size and it has real uses, be it remix culture or just high-quality unmastered music.
You haven't done the math. Even home engineers won't have recordings with noise floors below -90 dBFS. That's just absurd. Double-blind tests have shown that trained listeners in ideal listening environments with high-end, calibrated equipment can't tell the difference between 44.1/16 and 88.2/24.
Audio engineers don't have to "deal" with the issues of downsampling and dithering. The DAW just does it. It's a solved problem. It's not even a button you have to press, it's all automatically set up for you these days. We know what algorithms to use: band-limited interpolation with dithering, possibly combined with noise shaping.
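As a sketch of what that automatic export step amounts to (hypothetical function name and parameters; using scipy's polyphase resampler for the band-limited interpolation):

```python
import numpy as np
from scipy.signal import resample_poly

def bounce_to_cd(mix, src_rate=88200, dst_rate=44100):
    # Band-limited (polyphase) sample rate conversion down to 44.1 kHz.
    y = resample_poly(mix, dst_rate, src_rate)
    # TPDF dither at +/-1 LSB of the 16-bit target, then round.
    scale = 2**15 - 1
    tpdf = (np.random.uniform(-0.5, 0.5, y.shape) +
            np.random.uniform(-0.5, 0.5, y.shape))
    return np.clip(np.round(y * scale + tpdf), -scale - 1, scale).astype(np.int16)
```

Noise shaping would replace the plain TPDF step with filtered error feedback, pushing the dither energy to where the ear is least sensitive.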
Headroom is a different issue, but even the sloppiest home engineer is going to take a look at the levels at some point. If they don't, they can just check the "normalize" checkbox when they bounce. Or they can just ignore it and leave it checked. They're still not going to give you files with noise floors below -90 dBFS, and therefore, there's still no point in giving you a 24-bit file.
Even for remixes, you're not getting any benefit, since the noise floor is above -90 dBFS anyway.
However... you want to work on a project together? Let's keep things 24-bit until the final mixdown.
I disagree that these things are, in the real world, "solved problems" and I'd rather have an unmastered raw track at 88/24 than a poorly mastered track at 44/16.
I don't disagree that CD quality basically maxes out the ear's natural capabilities; I just don't think it's that easy to achieve in practice.
For a remix I'd rather get a 24-bit stream than a 16-bit stream that's really 15 bits and has to be padded up to 24 anyway.