I still really like how it generates random passwords from a permitted character set. It uses tr -cd to read only the matching bytes from the random device, discarding everything else. If you instead try to be less wasteful in turning random bytes into characters from the chosen set, you quickly end up in a sticky situation where your passwords may be less random than they should be. Bytes from the random device are cheap, so just throw away all the ones that aren't suitable and read more. So simple, and yet I'm sure it wouldn't have occurred to me.
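The same per-byte rejection idea can be sketched in Python (the function name and use of os.urandom are my own choices for illustration, not pass's actual code, which is a shell pipeline):

```python
import os

def random_password(alphabet: str, length: int) -> str:
    """Rejection sampling, like `tr -cd`: read bytes from the OS's
    random source and keep only those that fall in the alphabet."""
    allowed = set(alphabet.encode("ascii"))
    out = bytearray()
    while len(out) < length:
        b = os.urandom(1)[0]   # one random byte at a time
        if b in allowed:       # keep matching bytes,
            out.append(b)      # discard everything else
    return out.decode("ascii")
```

Because every surviving byte is drawn uniformly from the alphabet, no bias is introduced no matter how awkwardly the alphabet size divides 256.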
If you have an M-sized alphabet and you need an n-character password, why not just take ⌈log₂ Mⁿ⌉ bits and interpret them as a number in base M? That seems even simpler to me.
For example, say I'd like a 1-character password from the alphabet of A, B and C.
So that's 2 bits. We read two bits (awkward: the random device is of course byte-oriented). Now we have a 2-bit value and we're trying to pick A, B or C. But what do we do with the fourth possibility from our 2-bit value?
We could decide that's too bad and treat it as A (or B, or C) anyway, but then we've introduced a non-random bias into our supposedly random password.
The non-horrible option is to throw that possibility away and read two more bits, hoping for a different outcome. That's the exact same algorithm pass uses, except you've added the extra complexity of reading less than one byte at a time from the device...
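That 2-bit rejection loop can be written out in a few lines of Python (reading a whole byte and masking it down to 2 bits, since the device is byte-oriented; the function name is mine):

```python
import os

def pick_abc() -> str:
    """Draw 2 bits; values 0..2 map to A/B/C, and the fourth
    possibility (3) is rejected so we draw again - the same
    rejection idea pass applies per byte."""
    while True:
        two_bits = os.urandom(1)[0] & 0b11   # 0, 1, 2 or 3
        if two_bits < 3:
            return "ABC"[two_bits]
```

Each of A, B and C comes out with probability exactly 1/3, at the cost of an occasional redraw.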
With a non-trivial (= much more realistic) length of your random number, how do you ever lose more than a fraction of a bit of entropy this way? Let's say you want a 16-character alphanumeric (a-zA-Z0-9) password. log₂ 62¹⁶ ≈ 95.267. Let's say you pull 96 bits and ignore the imbalance. Then you have 2×62¹⁶−2⁹⁶ = 16116640899382729306982711296 options with a probability of just 2⁻⁹⁶ each, and 2⁹⁶−62¹⁶ options (hit twice by the reduction) with a probability of 2×2⁻⁹⁶. That comes out as around 95.203 bits of entropy, which isn't significantly different from 95.267. What obvious fault am I missing here?
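For reference, the arithmetic can be checked in Python, with the counts as exact big integers (reducing a 96-bit value mod 62¹⁶ hits the first 2⁹⁶−62¹⁶ passwords twice and the remaining 2×62¹⁶−2⁹⁶ exactly once):

```python
from math import log2

M, n, bits = 62, 16, 96
T = M ** n          # number of 16-char alphanumeric passwords
N = 2 ** bits       # number of 96-bit values

n_double, n_single = N - T, 2 * T - N
assert n_single == 16116640899382729306982711296
assert 2 * n_double + n_single == N      # every 96-bit draw accounted for

# Shannon entropy of the resulting biased password distribution.
p_double, p_single = 2 / N, 1 / N
H = -(n_double * p_double * log2(p_double) +
      n_single * p_single * log2(p_single))

print(round(log2(T), 3), round(H, 3))    # → 95.267 95.203
```

So the loss from ignoring the imbalance is indeed about 0.06 bits over the whole password.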
Well, let's begin with the fact that it's worse. You argue it's not "much" worse, but it is clearly worse, not the same. Under 'pass' all passwords conforming to chosen rules are equally likely, under your approach some are twice as likely as others.
But then the other characteristic is simplicity. Using 'tr -cd' is, by my counting, five characters. How big is your best implementation of this supposedly "even simpler" but worse solution?
> Under 'pass' all passwords conforming to chosen rules are equally likely, under your approach some are twice as likely as others.
But they're passwords. At most half of the passwords will have a different probability than the rest, which means you're still left with a staggering number of un-brute-forceable passwords. Two to the power of ninety-nine is practically indistinguishable from two to the power of one hundred.
By the way, if you're still obsessed with completely equal probabilities here, you can just drop the whole sequence of bits whenever it encodes a number outside the range of alternatives and fetch a new one. It seems to me that the expected number of bits consumed by such a process won't exceed twice the number that's technically necessary (the probabilities of the successive attempt counts form a geometric sequence with a ratio of at most 0.5 summing to exactly 1, which gives an expected value of at most 2 attempts), which is around 12 bits per character here. With a filter like tr -dc "0-z" mentioned below you'll consume roughly 27.3 bits per character on average.

So "better" is a subjective term here; one solution may be simpler to write on a shell command line, the other consumes much less entropy. If you're deciding what to do, you need to weigh your criteria properly when you have more than one. You may very well have different criteria than I do; that happens.
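A sketch of that whole-number rejection in Python (the helper name is mine; 62¹⁶ has a 96-bit length, so 12 bytes are drawn per attempt), plus a quick check of the two per-character costs quoted above:

```python
import os

def base_m_password(alphabet: str, length: int) -> str:
    """Draw ceil(log2 M**n) bits; reject and redraw whenever the
    value falls outside [0, M**n), so every password is exactly
    equally likely. Digits are peeled off with divmod."""
    m = len(alphabet)
    limit = m ** length
    nbits = limit.bit_length()          # 96 for 62**16
    nbytes = (nbits + 7) // 8
    while True:
        raw = int.from_bytes(os.urandom(nbytes), "big")
        x = raw >> (nbytes * 8 - nbits)  # keep exactly nbits bits
        if x < limit:
            break
    out = []
    for _ in range(length):
        x, r = divmod(x, m)             # division with remainder, base M
        out.append(alphabet[r])
    return "".join(out)

alphanumeric = ("abcdefghijklmnopqrstuvwxyz"
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
pw = base_m_password(alphanumeric, 16)

# Expected cost: at most 2 draws of 96 bits for 16 chars = 12 bits/char;
# tr -dc "0-z" keeps 75 of 256 byte values, i.e. 8*256/75 bits/char.
print(round(2 * 96 / 16, 1), round(8 * 256 / 75, 1))  # → 12.0 27.3
```

Which cost matters more is exactly the subjective trade-off described above.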
> Using 'tr -cd' is, by my counting, five characters
I probably wouldn't use this in the first place because it requires a subprocess and a binary that I might not have on some computers. So by my counting this could be hundreds of kilobytes of extra code. Division with a remainder is most likely something you already have in your standard library regardless of whether you use it or not.
To the extent that it means anything to "consume" entropy from the random device, both approaches output the same size of password, so they're consuming roughly the same amount of entropy. Assuming the random device works as intended, this is a distinction without a difference anyway.
> a binary that I might not have on some computers
A binary that's included in the Linux Standard Base. Which computers don't you have it on?
> this could be hundreds of kilobytes of extra code
More like a few dozen kilobytes, although of course on tiny Linux systems it's bundled into something like busybox so much less.
> Division with a remainder is most likely something you already have in your standard library regardless of whether you use it or not.
The shell, which is what pass is written in, does have division with a remainder, but it cannot do this operation on 96-bit integers. So now you've added another constraint: "rewrite this software in a language which has 96-bit integers, or more if longer passwords are allowed". [Hint: longer passwords are very much allowed in pass.]
So far what you've achieved is to show that this is trickier than you thought, yet produces worse results. Again, this is why I like what Jason did here.
> A binary that's included in the Linux Standard Base. Which computers don't you have it on?
I'm using some Windows computers. But for example I can use (and often do) a dumped SBCL binary on them just fine, even on those where I can't install anything.
That assumes a perfect RNG, but PRNGs don't have infinite state. So for any given PRNG, the rejection loop running forever is either impossible or happens with some finite probability; weighting those cases by how likely each is gives an overall finite (non-zero) subjective probability of it running forever (in ideal program-space; obviously it will never run forever in real life).
No, you don't need a perfect RNG. You need anything that's not completely horrible.
Remember, we're not trying to read a whole password from the device, just one matching byte at a time. So even if our password character set is a single character (pass will let you do that, but obviously you should not use such passwords in real life), we're talking about statistically 1 out of every 256 bytes matching, or else our PRNG is useless.
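That 1-in-256 match rate is easy to check empirically; a quick sketch in Python (drawing from os.urandom rather than reading the device file directly):

```python
import os

N = 256_000
data = os.urandom(N)
matches = data.count(ord("A"))   # single-character "alphabet"

# With uniform bytes, roughly 1 in 256 should match: about 1000
# hits here, i.e. about 256 bytes read per kept character.
print(round(matches / N * 256, 2))
```

Any remotely sane random source lands very close to 1.0 here; only a catastrophically broken PRNG would make the rejection loop's cost blow up.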
You correctly observe that the PRNG has finite state, but its state isn't so tiny that it only emits a handful of the 256 possible bytes before getting back to where it started, nobody would use such a busted algorithm.
RC4 is considered broken and unusable because its keystream is biased and thus distinguishable from random, to the extent that if you read a few million bytes from it you can detect the bias. But you're asking us to imagine that the random device has a PRNG many orders of magnitude worse than that in order for this to have any effect at all.
> You correctly observe that the PRNG has finite state, but its state isn't so tiny that it only emits a handful of the 256 possible bytes before getting back to where it started, nobody would use such a busted algorithm.
If the PRNG always loops through all possible states, then sure. But if there's a possibility, however small, of it getting stuck in a small loop, very rarely?
> But if there's a possibility, however small, of it getting stuck in a small loop, very rarely?
We already went around this particular "small loop" so I'm guessing you aren't learning anything from repeating it. That would be a lousy design for a PRNG, so, nobody does that.