Hacker News | new | past | comments | ask | show | jobs | submit | login

> This represents an almost 8x compression ratio for every weight matrix in the transformer model

Surely you’d need more ternary weights, though, to achieve the same performance outcome?

A bit like how a Q4 quant is smaller than a Q8 but also tangibly worse, so the “compression” isn’t really like for like.

Either way, excited about more ternary progress.
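For what it’s worth, the “almost 8x” in the quoted claim pencils out if the baseline weights are fp16 and the ternary weights are packed at 2 bits each — both of which are my assumptions, not something stated in the quote:

```python
import math

fp16_bits = 16
ternary_info_bits = math.log2(3)   # a ternary digit carries ~1.585 bits of information
packed_bits = 2                    # simplest whole-bit packing of a ternary value

ratio_packed = fp16_bits / packed_bits        # 8.0x with plain 2-bit packing
ratio_ideal = fp16_bits / ternary_info_bits   # ~10.1x at the entropy limit
```

Whether a 2-bit ternary model at the same parameter count actually matches fp16 quality is exactly the like-for-like question.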



We do quantization-aware training, so the model learns to minimize the loss with respect to the ternary weights; hence there is no degradation in performance.
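A minimal numpy sketch of the general idea, for anyone unfamiliar: the forward pass uses ternarized weights, while the gradient is applied straight through to full-precision latent weights. The 0.7·mean|w| threshold and single per-tensor scale are TWN-style assumptions for illustration, not necessarily the exact scheme described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def ternarize(w):
    # Per-tensor threshold and scale (assumed heuristic, not the author's scheme).
    t = 0.7 * np.mean(np.abs(w))
    q = np.where(w > t, 1.0, np.where(w < -t, -1.0, 0.0))
    alpha = np.abs(w[q != 0]).mean() if np.any(q != 0) else 1.0
    return alpha * q   # weights constrained to {-alpha, 0, +alpha}

# Toy regression y = X @ w_true, standing in for a single linear layer.
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
y = X @ w_true

w = 0.1 * rng.normal(size=8)   # full-precision "latent" weights
init_loss = np.mean((X @ ternarize(w) - y) ** 2)

lr = 0.05
for _ in range(500):
    wq = ternarize(w)                     # forward pass uses ternary weights
    grad = X.T @ (X @ wq - y) / len(X)    # gradient of MSE w.r.t. wq
    w -= lr * grad                        # straight-through: update latent w

final_loss = np.mean((X @ ternarize(w) - y) ** 2)
```

Because the loss is computed through the ternary weights from the start, the latent weights drift toward values whose ternarization fits the task, rather than quantizing a trained model after the fact.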



