Which leads to my question: OK, keeping both a zero and a minus-zero makes sense for some limit calculations...
But when all you have is 4 bits, isn't that quite wasteful? Would using that bit pattern for, e.g., a 2.5 not improve the model?
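To make the "wasted code point" concrete, here is a sketch of one common 4-bit float layout (E2M1: 1 sign, 2 exponent, 1 mantissa bit, bias 1, as used by OCP MXFP4). The decoder below is my own illustration of that convention, not something from the thread; it shows that 2 of the 16 codes decode to zero, one positive and one negative.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code (assumed layout: 1 sign, 2 exp, 1 mantissa, bias 1)."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 1
    if exp == 0:
        # subnormal: 0.m * 2^(1 - bias) = (m/2) * 2^0
        return sign * (man / 2)
    # normal: 1.m * 2^(exp - bias)
    return sign * (1 + man / 2) * 2.0 ** (exp - 1)

# The positive half of the code space: 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0.
# The negative half mirrors it exactly, so code 0b1000 is the redundant -0.0.
positives = [decode_e2m1(c) for c in range(8)]
```

Under this layout, freeing the -0.0 slot would indeed buy exactly one extra representable magnitude.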
It might be useful. The Lion optimizer uses 1-bit values to represent forward or backward, and NNs can pick up on patterns like that in very strange ways. Of course, those are 1's, not 0's, so maybe the benefit disappears when multiplying by zero. But it's important to challenge assumptions like "well, let's get rid of the negative half of 0" by testing experimentally whether it's useful, rather than taking them for granted. NNs are nothing if not shockingly weird when you try to reason about them.
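For context on the Lion reference above: its applied update really is just a sign, +1 or -1 scaled by the learning rate, never a zero-magnitude step. A minimal scalar sketch of that update rule (hyperparameter names are mine, following the Lion paper's conventions):

```python
import math

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion-style update on a scalar weight w with gradient g and momentum m."""
    # Interpolate momentum and gradient, then keep ONLY the sign (the "1 bit").
    update = math.copysign(1.0, beta1 * m + (1 - beta1) * g)
    # Apply the sign-only update plus decoupled weight decay.
    w = w - lr * (update + wd * w)
    # Momentum is tracked at full precision with a separate beta.
    m = beta2 * m + (1 - beta2) * g
    return w, m
```

Because the step is always ±lr, the network sees a constant-magnitude nudge in one of two directions, which is exactly the kind of crude signal NNs turn out to exploit surprisingly well.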
I don't know enough about other industries to know if this is true in general. But I do know that in tech it is all about networking (aka soft nepotism): spending your career making friends ("swiping right on every work relationship"), and then calling in favors from the fraction of those friends who go on to massively succeed. At least that's how I made three huge jumps in my 35+ year career, and how the majority of my peers got their big step-functions in pay.
Perhaps in my original post I'm just confusing academia with industry, since I know so few academics.
Lots of comments talking about how this is just some sort of ploy to feed the machine. I don't know what to tell you. I can only tell you it changed my life and the lives of many others. Hope it can help you too!