Hacker Timesnew | past | comments | ask | show | jobs | submitlogin

One thing on my "bucket list" is to "use a bloom filter for something". They seem like such awesome data structures but I've never found a place to use one :-(


Chrome uses them for the "safe browsing" filter. Google HQ makes a bloom filter out of all the blocked websites and sends out the filter occasionally in an update. That way Chrome can test the URL you're about to visit for membership in the huge blacklist without taking much network/disk/memory/CPU. Of course there could be false positives but Chrome only has to phone home to double-check on possible matches. http://blog.alexyakunin.com/2010/03/nice-bloom-filter-applic...


Next time you are caching a bit list with memcached and running out of space, you may replace that big list with a bloom filter.


Or, more likely, replace it with a bloom filter and a database, unless the cost of misidentifying false positives is very low for your application


Could you explain that a bit more?


This is exactly the scenario that a bloom filter is for.

You have an expensive lookup. You're caching information on success/fail so that you don't have to do the expensive lookup every time. But the caches are getting large.

What you do is replace the local caches with a bloomfilter. That data structure takes a bounded amount of memory. When it says, "No, I have not seen you before," you really haven't. And when it says, "I might recognize you," it is only sometimes right. However its mistakes will not really matter because you'll do the expensive lookup.

The tradeoff is that the more data you put into a bloom filter, the higher the odds are that it will think think you might have seen things before, and therefore the less useful it becomes. But in this caching situation, it saves you work even if the false positive rate is fairly high.


Probably.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: