Indexing problem

Falko Guderian-2
Hi,

I indexed 20 documents and want to evaluate my Lucene index, so I extract all terms with their frequencies in each document.
The following code has helped a lot.
-------------------------------------------------------------
// Declared outside the try block so that it is still in scope in finally.
TermEnum terms = indexReader.terms(new Term("content", ""));
try
{
    while ("content".equals(terms.term().field()))
    {
        TermDocs termDocs = indexReader.termDocs();
        termDocs.seek(terms);
        // ... collect terms.term().text() ...
        int frequency = 0;
        for(int i = 0; i< indexWriter.numDocs(); i++) {
        ...
        frequency = termDocs.freq();
        ...
        termDocs.next();        
        }
        if (!terms.next())
            break;
    }
}
finally
{
    terms.close();
}
-------------------------------------------------------------

But there is an anomaly: in the first document (termDocs.doc() = 0), all term frequencies are greater than 0.
That isn't correct, because the first doc doesn't contain all terms.

Do you know this problem? How can I get the correct term frequencies in all docs?


Best regards
Falko Guderian
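
A possible explanation, offered as a sketch rather than a confirmed diagnosis: TermDocs enumerates only the documents that actually contain the current term, so the loop counter i in the code above is not a document number. The document number would have to be read from termDocs.doc() after each successful termDocs.next(), and every document that is never visited has frequency 0. The self-contained Java below uses no Lucene at all. The class TermFrequencySketch, the method frequenciesPerDoc, and the {docNumber, freq} array layout are hypothetical stand-ins for one TermDocs enumeration.

```java
import java.util.Arrays;

public class TermFrequencySketch {

    // Hypothetical stand-in for one TermDocs enumeration: each entry is
    // {docNumber, freq} and only documents containing the term are listed.
    static int[] frequenciesPerDoc(int[][] postings, int numDocs) {
        int[] freqs = new int[numDocs];   // every document starts at frequency 0
        for (int[] posting : postings) {  // analogous to: while (termDocs.next())
            int doc = posting[0];         // analogous to: termDocs.doc()
            int freq = posting[1];        // analogous to: termDocs.freq()
            freqs[doc] = freq;            // index by document number, not by loop counter
        }
        return freqs;
    }

    public static void main(String[] args) {
        // A term that occurs 3 times in document 2 and once in document 4.
        int[][] postings = { {2, 3}, {4, 1} };
        // prints [0, 0, 3, 0, 1]
        System.out.println(Arrays.toString(frequenciesPerDoc(postings, 5)));
    }
}
```

In the Lucene code, the corresponding change would presumably be to replace the for loop over numDocs() with while (termDocs.next()) and to read termDocs.doc() together with termDocs.freq() inside that loop.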

