
Yes, this is pretty much TF-IDF for people too lazy to count the number of unique items in the corpus.

Since that count should be the same (or at least close!) in both the good and the bad dataset, I'm not sure the extra math matters much.
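To make that concrete, here is a rough sketch, assuming "that number" plays the role of the corpus size N in the usual IDF term log(N / df). The toy datasets and the helper names (`idf`, `idf_gap`, `df_only_gap`, `size_offset`) are made up for illustration and are not from the article. When you compare a term's IDF across two datasets, N only contributes a constant offset log(N_good / N_bad), which is zero when the sizes match.

```python
import math

def doc_freq(term, docs):
    """Number of documents (sets of terms) that contain the term."""
    return sum(1 for d in docs if term in d)

def idf(term, docs):
    """Textbook inverse document frequency: log(N / df). Assumes df > 0."""
    return math.log(len(docs) / doc_freq(term, docs))

def idf_gap(term, good, bad):
    """Difference in IDF for one term between the two datasets."""
    return idf(term, good) - idf(term, bad)

def df_only_gap(term, good, bad):
    """Same comparison, but without ever looking at the corpus sizes."""
    return math.log(doc_freq(term, bad) / doc_freq(term, good))

def size_offset(good, bad):
    """The part of the gap that depends only on the corpus sizes."""
    return math.log(len(good) / len(bad))

# Hypothetical toy data: each document is a set of terms.
good = [{"cat", "dog"}, {"cat"}, {"bird"}, {"cat", "bird"}]
bad = [{"dog"}, {"dog", "cat"}, {"dog", "bird"}, {"bird"}]

# Equal sizes: the corpus-size term cancels and both comparisons agree.
same_size_match = math.isclose(idf_gap("cat", good, bad),
                               df_only_gap("cat", good, bad))

# Unequal sizes: the two differ by a constant that is the same for every term.
bigger_bad = bad + [{"cat"}]
offset_match = math.isclose(
    idf_gap("cat", good, bigger_bad),
    df_only_gap("cat", good, bigger_bad) + size_offset(good, bigger_bad),
)
```

Because the offset is identical for every term, it shifts all scores by the same amount and does not change how terms rank against each other, which is the sense in which the extra math may not matter much.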


