I am planning to work on large-scale biomedical text mining and intend to use Mahout to scale to very large datasets. My idea is to build a training set from a list of biological terms and then train classifiers with the different machine learning algorithms in Mahout. Has anyone used word-level entity recognition before? If we tokenize the documents into words, how can we train on them with Mahout's machine learning algorithms? Is there anybody working on NLP problems with Mahout and Hadoop?
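To make the question concrete, here is a minimal sketch of the word-level setup I have in mind, written in plain Python rather than Mahout. Everything in it is an assumption for illustration: the term list `BIO_TERMS`, the helper names `tokenize`, `label_tokens` and `token_features`, and the choice of features. It only shows how tokens could be labeled against a term list and turned into per-token feature records; it does not use any Mahout API.

```python
import re

# Hypothetical term list; a real one would come from a curated biomedical vocabulary.
BIO_TERMS = {"brca1", "p53", "insulin"}


def tokenize(text):
    """Split text into word tokens made of letters and digits, keeping internal hyphens."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)


def label_tokens(tokens, terms=BIO_TERMS):
    """Label each token TERM if its lowercase form is in the term list, else O."""
    return [(tok, "TERM" if tok.lower() in terms else "O") for tok in tokens]


def token_features(tokens, i):
    """Build a small, illustrative feature record for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "has_digit": any(c.isdigit() for c in tok),
        "is_upper": tok.isupper(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }


sentence = "Mutations in BRCA1 and p53 alter insulin signalling."
tokens = tokenize(sentence)
labeled = label_tokens(tokens)

# One (features, label) pair per token; this is the word-level training data.
training_rows = [(token_features(tokens, i), label) for i, (_, label) in enumerate(labeled)]
```

Each row in `training_rows` pairs one token's features with its label. What I am unsure about is how best to feed records of this kind to Mahout's classifiers at scale.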