Hacker News | new | past | comments | ask | show | jobs | submit | login

> This represents an almost 8x compression ratio for every weight matrix in the transformer model

Surely you’d need more ternary weights, though, to achieve the same performance outcome?

A bit like how a Q4 quant is smaller than a Q8 but also tangibly worse, so the “compression” isn’t really like for like.

Either way, excited about more ternary progress.
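For what it’s worth, the “almost 8x” in the quoted claim pencils out if the baseline weights are fp16 and the ternary weights are packed at 2 bits each — both of which are my assumptions, not something stated in the quote:

```python
import math

fp16_bits = 16
ternary_info_bits = math.log2(3)   # a ternary digit carries ~1.585 bits of information
packed_bits = 2                    # simplest whole-bit packing of a ternary value

ratio_packed = fp16_bits / packed_bits        # 8.0x with plain 2-bit packing
ratio_ideal = fp16_bits / ternary_info_bits   # ~10.1x at the entropy limit
```

Whether a 2-bit ternary model at the same parameter count actually matches fp16 quality is exactly the like-for-like question.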



We do quantization-aware training, so the model learns to minimize the loss with respect to the ternary weights; hence there is no degradation in performance.
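A minimal numpy sketch of the general idea, for anyone unfamiliar: the forward pass uses ternarized weights, while the gradient is applied straight through to full-precision latent weights. The 0.7·mean|w| threshold and single per-tensor scale are TWN-style assumptions for illustration, not necessarily the exact scheme described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def ternarize(w):
    # Per-tensor threshold and scale (assumed heuristic, not the author's scheme).
    t = 0.7 * np.mean(np.abs(w))
    q = np.where(w > t, 1.0, np.where(w < -t, -1.0, 0.0))
    alpha = np.abs(w[q != 0]).mean() if np.any(q != 0) else 1.0
    return alpha * q   # weights constrained to {-alpha, 0, +alpha}

# Toy regression y = X @ w_true, standing in for a single linear layer.
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
y = X @ w_true

w = 0.1 * rng.normal(size=8)   # full-precision "latent" weights
init_loss = np.mean((X @ ternarize(w) - y) ** 2)

lr = 0.05
for _ in range(500):
    wq = ternarize(w)                     # forward pass uses ternary weights
    grad = X.T @ (X @ wq - y) / len(X)    # gradient of MSE w.r.t. wq
    w -= lr * grad                        # straight-through: update latent w

final_loss = np.mean((X @ ternarize(w) - y) ** 2)
```

Because the loss is computed through the ternary weights from the start, the latent weights drift toward values whose ternarization fits the task, rather than quantizing a trained model after the fact.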



