Can't find documents with certain terms

Urs Eichmann-2
My index consists of about 26 different fields. I have a very weird problem:
On certain fields, I cannot search - i.e. the search always returns 0
documents. I used Luke's Lucene Index Toolbox, and the behaviour there is
weird as well:
 
I do the following in Luke's Program:
 
a) go to the Documents Tab
b) Enter term field "unit" and value="DOSE", hit "Show all docs"
c) A list of 5 documents is displayed, which is ok. The query which was
generated is unit:DOSE. The parsed query is unit:DOSE and the rewritten
query is unit:dose
d) Then I just hit the "Search" button without changing the query or
anything
e) now the result list is empty. The only difference I can see is that the
parsed query is unit:dose now instead of unit:DOSE as before.

The unit field is non-tokenized, indexed, stored.
 
Does anyone have an explanation for this behaviour? The problem is, the same
behaviour is found in my program, e.g. if I look for "unit:DOSE", I will get
no documents returned. However, on many of the other 26 fields, it runs OK,
and I can't see any difference in the field definitions.
 
I had this problem in 1.4.3 and have now changed to 1.9 RC1, but the
problem is still the same.

Perhaps I should mention that I built the index with the dotNet version
of Lucene (dotLucene), but since Luke (a Java program) shows the same
behaviour, I don't believe it is a problem with the dotNet port.

Many thanks for any help or suggestions!
Urs
 
 


Re: Can't find documents with certain terms

Andrzej Białecki-2
Urs Eichmann wrote:

> My index consists of about 26 different fields. I have a very weird problem:
> On certain fields, I cannot search - i.e. the search always returns 0
> documents. I used Luke's Lucene Index Toolbox, and the behaviour there is
> weird as well:
>  
> I do the following in Luke's Program:
>  
> a) go to the Documents Tab
> b) Enter term field "unit" and value="DOSE", hit "Show all docs"
> c) A list of 5 documents is displayed, which is ok. The query which was
> generated is unit:DOSE. The parsed query is unit:DOSE and the rewritten
> query is unit:dose
> d) Then I just hit the "Search" button without changing the query or
> anything
> e) now the result list is empty. The only difference I can see is that the
> parsed query is unit:dose now instead of unit:DOSE as before.

There are two possible explanations:

1. perhaps the actual term in the index is 'DOSE', and you are looking
for 'dose' (what counts is what the "rewritten query" shows). Lucene
doesn't lowercase the value of a non-tokenized field when indexing it,
but the analyzer used to parse the query may lowercase the query text,
so these two terms won't match. Or perhaps the term is 'DOSE '
(whitespace included)?

2. or, the hits you are getting are below the threshold. Please use a
HitCollector to collect all hits, no matter how small the score.
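
To illustrate the first explanation, here is a minimal sketch in plain
Java. It is not the Lucene API: it is a toy inverted index in which an
untokenized value is stored verbatim, next to a query path that
lowercases the text before looking it up. The class and method names
(UntokenizedFieldDemo, addUntokenized, lookupExact, searchLowercased)
are hypothetical, and the lowercasing step is an assumption standing in
for whatever analyzer the query parser happens to use.

```java
import java.util.*;

// Toy model of an inverted index: field -> term -> document ids.
// Hypothetical names for illustration only; this is not Lucene code.
class UntokenizedFieldDemo {
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // An untokenized field is indexed as a single term, exactly as given
    // (no lowercasing, no trimming).
    void addUntokenized(int docId, String field, String value) {
        postings.computeIfAbsent(field, f -> new TreeMap<>())
                .computeIfAbsent(value, t -> new TreeSet<>())
                .add(docId);
    }

    // Exact, case-sensitive term lookup (comparable to a direct term query).
    Set<Integer> lookupExact(String field, String term) {
        return postings.getOrDefault(field, Collections.emptyMap())
                       .getOrDefault(term, Collections.emptySet());
    }

    // Query path that first runs the text through a lowercasing step,
    // as an analyzer in the query parser might do (assumption).
    Set<Integer> searchLowercased(String field, String queryText) {
        return lookupExact(field, queryText.toLowerCase(Locale.ROOT));
    }
}
```

With five documents added as unit=DOSE, lookupExact("unit", "DOSE")
returns all five, while searchLowercased("unit", "DOSE") returns none,
which mirrors the two different results seen in Luke.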

You can write a simple class using IndexReader to iterate over all index
terms, and see if the exact term exists (with exactly the same case as
in your rewritten query).
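
A rough sketch of such a check, in plain Java so that it runs on its
own: it takes the terms of one field as strings and reports those that
equal the wanted term or differ from it only in case or in surrounding
whitespace. The names TermDiagnostics, Match and nearMatches are made
up for this example. In a real program the terms would come from the
index, e.g. from the term enumeration that IndexReader provides; that
part is left out here.

```java
import java.util.*;

// Hypothetical helper for illustration; not part of Lucene.
class TermDiagnostics {
    enum Match { EXACT, CASE_DIFFERS, WHITESPACE_DIFFERS, CASE_AND_WHITESPACE_DIFFER }

    // Returns every indexed term that is equal to 'wanted' or differs from it
    // only in letter case and/or leading/trailing whitespace.
    static Map<String, Match> nearMatches(Iterable<String> indexedTerms, String wanted) {
        Map<String, Match> result = new LinkedHashMap<>();
        String wantedTrimmed = wanted.trim();
        for (String term : indexedTerms) {
            String termTrimmed = term.trim();
            if (term.equals(wanted)) {
                result.put(term, Match.EXACT);
            } else if (term.equalsIgnoreCase(wanted)) {
                result.put(term, Match.CASE_DIFFERS);
            } else if (termTrimmed.equals(wantedTrimmed)) {
                result.put(term, Match.WHITESPACE_DIFFERS);
            } else if (termTrimmed.equalsIgnoreCase(wantedTrimmed)) {
                result.put(term, Match.CASE_AND_WHITESPACE_DIFFER);
            }
            // Anything else is an unrelated term and is skipped.
        }
        return result;
    }
}
```

If the result for the rewritten query term contains no EXACT entry but
does contain a CASE_DIFFERS or whitespace entry, the query and the index
disagree about the term, and the search cannot match.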

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

