Downloading them as we speak. I have a big list of hosts/domains that I have collected through spidering for my site, DNSDigger.com.
This is a hobby project that has grown a bit over my head hehe.
And there is no scraping needed. robots.txt files are just simple text files. Download and parse, repeat a couple of million times, and build an index :)
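
For anyone curious, the download/parse/index loop could look roughly like the sketch below. This is not the actual DNSDigger code, just a minimal illustration: the function names (`fetch_robots`, `parse_robots`, `build_index`) are made up for this example, plain `http://` is assumed, and the index shown (disallowed path to set of hosts) is only one of many you could build.

```python
from collections import defaultdict
from urllib.request import urlopen


def fetch_robots(host, timeout=10):
    """Download robots.txt for one host and return the body as text.
    Assumes plain http:// and no retry/error handling (illustration only)."""
    with urlopen(f"http://{host}/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_robots(text):
    """Return a list of (user_agent, directive, value) tuples."""
    rules = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:                       # a new group starts here
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                rules.append((agent, field, value))
    return rules


def build_index(hosts_to_text):
    """Map each disallowed path to the set of hosts that list it."""
    index = defaultdict(set)
    for host, text in hosts_to_text.items():
        for _agent, directive, value in parse_robots(text):
            if directive == "disallow" and value:
                index[value].add(host)
    return index
```

In a real run you would call `fetch_robots` for every host in the list (with error handling and some politeness delay) and feed the results into `build_index`.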