Ask YC: Bayesian filter for NSFW content?
4 points by ptm on May 5, 2008 | 5 comments
I've just launched No-NSFW, an NSFW content warning system that relies on user feedback to determine site ratings.

I'm now thinking of introducing a Bayesian filter to determine site content. Does this make sense?

Also, where do I hunt for seed data? I'm using nsfw.reddit for NSFW data (thanks kirubakaran); what do I use for SFW data?
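A Bayesian filter here would most likely be a two-class naive Bayes text classifier, the same technique used in spam filtering: train it on page text labelled NSFW (e.g. from the nsfw.reddit seed data) and SFW, then score new pages. The sketch below is a minimal illustration under stated assumptions, not No-NSFW's actual design. It assumes page text has already been fetched and stripped of HTML, and the class name `NaiveBayes`, the `tokenize` helper and the labels `"nsfw"`/`"sfw"` are hypothetical. It needs at least one training document per label.

```python
import math
import re
from collections import Counter

LABELS = ("nsfw", "sfw")  # hypothetical label names

def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayes:
    """Two-class multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = {label: 0 for label in LABELS}

    def train(self, text, label):
        """Add one labelled document (e.g. a seed page) to the model."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """log P(label) + sum of log P(word | label) over the text's tokens."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = set().union(*(self.word_counts[l] for l in LABELS))
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            # add-one smoothing keeps unseen words from zeroing out the product
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        return score

    def classify(self, text):
        """Return the label with the highest posterior score."""
        return max(LABELS, key=lambda label: self.log_score(text, label))

nb = NaiveBayes()
nb.train("explicit adult nude photos", "nsfw")
nb.train("python programming tutorial", "sfw")
print(nb.classify("adult photos"))
```

Working in log space avoids floating-point underflow when a page contains many words. In practice the user feedback No-NSFW already collects could serve as additional labelled training data.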



Also have a look at DansGuardian http://dansguardian.org/. Blacklist files are available here: http://urlblacklist.com/

I'm not sure what you are looking for in terms of safe-for-work data; maybe Technorati tags?
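A domain blacklist like the ones linked above could give a site a rating before any user feedback or classifier score exists. The sketch below is a minimal illustration that assumes a plain-text list with one domain per line and optional `#` comments. That format is an assumption, so check the actual files before relying on it. The function names `load_blacklist` and `is_blacklisted` are hypothetical. A URL matches if its host or any parent domain is listed.

```python
from urllib.parse import urlsplit

def load_blacklist(lines):
    """Build a set of lowercase domains, skipping blank lines and '#' comments.

    Assumes one domain per line (hypothetical format; verify against the real list).
    """
    domains = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            domains.add(entry)
    return domains

def is_blacklisted(url, domains):
    """True if the URL's host, or any parent domain of it, is in the blacklist."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # for "www.example.com", check "www.example.com", "example.com", "com"
    return any(".".join(parts[i:]) in domains for i in range(len(parts)))

blacklist = load_blacklist(["# sample list", "example-adult.com"])
print(is_blacklisted("http://www.example-adult.com/page", blacklist))
```

Checking parent domains means a single entry covers its subdomains, which keeps the list short at the cost of occasionally flagging an unrelated subdomain.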



NSFW? I'm not familiar with the term (yes, I could google it, but perhaps you could enlighten those of us who aren't, so we don't all have to).


Not Safe For Work, i.e., not suitable for looking at while at work.


Ah, thank you. Once again, I'm familiar with the long version but not the acronym (I find this happening a lot lately).

Personally, I'd just like a Bayesian filter on my RSS feeds.



