There are two aspects of the problem that I consider particularly interesting: a) defining the problem, since being able to pin down the task and justify your answer is half the puzzle in much NLP work; b) scaling to a corpus of this size (a vocabulary of 1M words), since this scale is tricky but useful in many web problems.
Semioticians typically distinguish between paradigmatic and syntagmatic axes of semantic relatedness. Paradigmatic relatedness means that two words occur alongside similar sets of other words, e.g. they typically have the same word immediately to their left, like "blue" and "azure". Syntagmatic relatedness means that the words typically co-occur in usage, like "blue" and "sky". Check out the image on this page for another illustration of these axes: http://www.aber.ac.uk/media/Documents/S4B/sem03.html
Regardless of whether you choose to do a paradigmatic or syntagmatic analysis, it's interesting to see how you motivate your approach and if you can scale it to 1M different vocabulary words.
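To make the distinction concrete, here is a minimal sketch (my own toy illustration, not any particular system) of how the two axes could be measured from raw co-occurrence counts: syntagmatic relatedness as direct co-occurrence within a window, paradigmatic relatedness as similarity between the words' context distributions. At a 1M-word vocabulary you would of course need sparse matrices and pruning rather than plain dicts, but the definitions stay the same.

```python
from collections import Counter, defaultdict

def context_counts(tokens, window=2):
    """Count, for each word, how often every other word appears within +/- window."""
    counts = defaultdict(Counter)
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[w][tokens[j]] += 1
    return counts

def syntagmatic(counts, a, b):
    """Syntagmatic relatedness: how often a and b directly co-occur (e.g. 'blue'/'sky')."""
    return counts[a][b]

def paradigmatic(counts, a, b):
    """Paradigmatic relatedness: cosine similarity of the two words' context
    vectors, i.e. whether they occur alongside the same other words
    (e.g. 'blue'/'azure')."""
    ca, cb = counts[a], counts[b]
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sum(v * v for v in ca.values()) ** 0.5
    nb = sum(v * v for v in cb.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0
```

In practice the raw counts would be replaced by PMI or another association weight, but even this sketch shows why the two axes give different neighbor lists for the same word.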
I think the fairer comparison would be to show THE SAME random subset for each entry, i.e. the same X words for all result sets.
Otherwise, a superior system might happen to be shown on words for which no method produces good results, while an inferior one gets words that are better covered.
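The fix is cheap: draw the evaluation subset once with a fixed seed and reuse it for every system. A sketch (function name and parameters are mine):

```python
import random

def eval_subset(vocab, k=50, seed=42):
    """Pick one fixed random subset of words to evaluate every system on.

    Sorting before sampling makes the draw deterministic regardless of
    the iteration order of the input collection.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(vocab), k)
```

Every result set is then judged on exactly the same words, so differences reflect the methods rather than the luck of the draw.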