jldugger on Dec 13, 2022, on: Tips for analyzing logs

Yes, this is pretty much TF-IDF for people too lazy to count the number of unique items in the corpus. Since that number should be the same (or at least close!) in both good and bad datasets, I'm not sure the extra math matters much.
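
To make the "extra math doesn't matter" point concrete, here is a rough sketch. It is not real TF-IDF and not necessarily the exact method from the parent comment; it just assumes you score each log line by how much more often it appears in the bad dataset than in the good one. The function name `rank_suspicious`, the `corpus_size` parameter, the add-one smoothing, and the sample log lines are all made up for illustration.

```python
from collections import Counter
from fractions import Fraction


def rank_suspicious(good_lines, bad_lines, corpus_size=1):
    """Rank lines by (bad count) / (good count + 1), most suspicious first.

    corpus_size is a hypothetical shared normalizer (for example, the number
    of unique items). It divides both counts, so it cancels in the ratio.
    """
    good = Counter(good_lines)
    bad = Counter(bad_lines)
    scores = {}
    for line, bad_count in bad.items():
        # Exact fractions avoid floating-point noise when comparing rankings.
        numerator = Fraction(bad_count, corpus_size)
        # Add-one smoothing (an assumption) so unseen lines don't divide by zero.
        denominator = Fraction(good[line] + 1, corpus_size)
        scores[line] = numerator / denominator
    return sorted(scores, key=lambda line: (-scores[line], line))


# Invented sample data: a healthy run and a failing run.
good_log = ["connect ok"] * 5 + ["timeout"]
bad_log = ["connect ok"] * 5 + ["timeout"] * 4 + ["disk full"] * 3

unique_items = len(set(good_log) | set(bad_log))
plain = rank_suspicious(good_log, bad_log)
normalized = rank_suspicious(good_log, bad_log, corpus_size=unique_items)

print(plain)
print(plain == normalized)
```

Both rankings come out identical because the same normalizer is applied to both datasets. If the good and bad corpora had noticeably different numbers of unique items, the normalizers would differ and the cancellation would only be approximate.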