I'm trying to automate the classification of documents coming in from our crawler. I've searched Google for keyword classification using Python, but I'm not having any luck. Is anyone here automatically tagging or classifying documents (maybe after some training)? If so, it would be good to know how you are doing it.
The current state-of-the-art algorithms are based on support vector machines, but their training step can be tricky to implement in a scalable fashion. If you are looking for a quick and dirty approach, a TF-IDF based classifier (it is a naive "naive Bayes" :) is simple and adequate for many applications.
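To make the quick and dirty option concrete, here is a minimal sketch of one way a TF-IDF classifier could look, using only the standard library. It is an assumption about what "TF-IDF algorithm" means in practice, not a definitive implementation: it weights terms by TF-IDF, sums the training vectors of each label into a centroid, and assigns a new document to the label whose centroid has the highest cosine similarity. The names `tokenize` and `TfidfClassifier`, the smoothed IDF formula, and the training examples are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfClassifier:
    """Nearest-centroid classifier over TF-IDF vectors (illustrative only)."""

    def fit(self, docs, labels):
        token_lists = [tokenize(d) for d in docs]
        n_docs = len(token_lists)

        # Document frequency: in how many training documents each term occurs.
        doc_freq = Counter()
        for tokens in token_lists:
            doc_freq.update(set(tokens))

        # Smoothed inverse document frequency (one common variant, assumed here).
        self.idf = {
            term: math.log((1 + n_docs) / (1 + count)) + 1.0
            for term, count in doc_freq.items()
        }

        # Sum the normalized TF-IDF vectors of each label into one centroid.
        self.centroids = defaultdict(Counter)
        for tokens, label in zip(token_lists, labels):
            for term, weight in self._vector(tokens).items():
                self.centroids[label][term] += weight
        return self

    def _vector(self, tokens):
        """Return a unit-length TF-IDF vector; unseen terms are ignored."""
        counts = Counter(tokens)
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def predict(self, text):
        """Return the label whose centroid is most similar to the text."""
        vec = self._vector(tokenize(text))

        def score(label):
            centroid = self.centroids[label]
            norm = math.sqrt(sum(w * w for w in centroid.values()))
            if not norm:
                return 0.0
            dot = sum(w * centroid.get(t, 0.0) for t, w in vec.items())
            return dot / norm

        return max(self.centroids, key=score)


# Hypothetical hand-labelled examples standing in for crawler output.
train_docs = [
    "the team won the football match",
    "the striker scored a goal in the match",
    "python code for the web crawler",
    "the crawler parses html pages with python",
]
train_labels = ["sports", "sports", "tech", "tech"]

clf = TfidfClassifier().fit(train_docs, train_labels)
print(clf.predict("football goal"))
print(clf.predict("python crawler html"))
```

A document with no known terms scores zero against every label and simply gets the first label seen in training, so a real setup would probably want a score threshold and an "unknown" bucket. If an SVM turns out to be worth the effort later, the same TF-IDF vectors can serve as its input features.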