Lucene query question


Lucene query question

OneWhoMikes
I am new to Lucene, but the behavior I am seeing does not make sense to
me. I am using the latest version of Lucene (1.9.1) and running the code
below, which creates an index with a single document containing only one
field (named "test") with a value of "[hidden email]".

If I use Luke to search this newly created index with a query such as
"test:[hidden email]", I get no matches. However, if I use Luke to browse
to the document, click the "Reconstruct & Edit" button, and save the
document without making any changes, rerunning the same query finds the
document!

Is this normal? The only explanation I can think of is that the index was
created with Lucene 1.9.1 but is being searched with Luke, which was
probably built against an older version of Lucene. Any help would be
greatly appreciated.


Thanks in advance,

Mike

-------------------CODE SNIPPET BELOW-----------------


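import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Create a new index in the "index" directory (true = overwrite any existing one)
IndexWriter iw = new IndexWriter("index", new StandardAnalyzer(), true);
Document d = new Document();

// A single stored field whose value is run through the analyzer at index time
d.add(new Field("test", "[hidden email]", Field.Store.YES,
        Field.Index.TOKENIZED));

iw.addDocument(d);

iw.close();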

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Lucene query question

Otis Gospodnetic-2
Mike,

Do you really want to tokenize your email addresses? StandardAnalyzer may in fact recognize email addresses and keep each one as a single token, but it is probably better practice to make that email field UN_TOKENIZED.

Most of the time when people have trouble finding a Document they _know_ is in the index, the problem involves Analyzers, and sometimes the QueryParser+Analyzer combination. Grab the Lucene in Action sample code and run the application that takes your input and passes it through various Analyzers; it will give you an idea of what's happening with your test field.

Otis
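The Analyzer mismatch Otis describes can be illustrated with a toy inverted index in plain Java. This is a hedged sketch, not Lucene's implementation: `tokenize` is a hypothetical stand-in for a tokenizing analyzer (lowercase, then split on non-alphanumerics), `keyword` stands in for UN_TOKENIZED indexing, the address is made up, and a query is assumed to match only when every term produced by the query-time analyzer is present verbatim in the index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy model, NOT Lucene's implementation: an inverted index maps each indexed
// term to the ids of the documents containing it. A query matches a document
// only if every term produced by the query-time analyzer is indexed verbatim.
class AnalyzerMismatchDemo {

    // Hypothetical stand-in for a tokenizing analyzer: lowercase the text,
    // then split on any run of characters that are not ASCII letters or digits.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Stand-in for UN_TOKENIZED indexing: the whole value is one unchanged term.
    static List<String> keyword(String text) {
        return List.of(text);
    }

    static Map<String, Set<Integer>> buildIndex(Map<Integer, String> docs,
                                                Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        docs.forEach((id, text) -> {
            for (String term : analyzer.apply(text)) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        });
        return index;
    }

    // Analyze the query text, then intersect the posting sets of its terms.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query,
                               Function<String, List<String>> queryAnalyzer) {
        Set<Integer> hits = null;
        for (String term : queryAnalyzer.apply(query)) {
            Set<Integer> postings = index.getOrDefault(term, Set.of());
            if (hits == null) {
                hits = new TreeSet<>(postings);
            } else {
                hits.retainAll(postings);
            }
        }
        if (hits == null) return Set.of(); // the query produced no terms
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = Map.of(1, "Mike.R@Example.com"); // hypothetical value
        String query = "Mike.R@Example.com";

        Map<String, Set<Integer>> tokenized = buildIndex(docs, AnalyzerMismatchDemo::tokenize);
        Map<String, Set<Integer>> untokenized = buildIndex(docs, AnalyzerMismatchDemo::keyword);

        System.out.println(search(tokenized, query, AnalyzerMismatchDemo::tokenize));   // [1]
        System.out.println(search(untokenized, query, AnalyzerMismatchDemo::tokenize)); // []
        System.out.println(search(untokenized, query, AnalyzerMismatchDemo::keyword));  // [1]
    }
}
```

In this toy model the middle case mirrors the QueryParser+Analyzer combination: the field was indexed as one untouched term, but the query text was analyzed into different terms, so nothing matches.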

----- Original Message ----
From: Mike Richmond <[hidden email]>
To: [hidden email]
Sent: Tuesday, May 9, 2006 10:18:29 PM
Subject: Lucene query question



Re: Lucene query question

OneWhoMikes
Mr. Gospodnetic,

Thanks for the quick response. You make a good point about the field
being tokenized. I initially had the e-mail field UN_TOKENIZED, but that
did not change the result (my example search still failed). Do you have
any ideas about what could be causing that?


Thanks again,

--Mike




On 5/10/06, Otis Gospodnetic <[hidden email]> wrote:


Re: Lucene query question

Erick Erickson
I'll take a quick stab at it. What analyzer are you using with the query? On
the search page of Luke, near the upper right, there's the "Analyzer to use
for query parsing:" box. You might try the WhitespaceAnalyzer, since that
shouldn't do anything "interesting". Also, below the search box on the
search page, you can get a lot of information from the "update" and "explain
structure" buttons.

These make a LOT of Lucene's behavior clearer.

Finally, watch out for case. Lucene's comparisons are often
case-sensitive.

Best
Erick
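Erick's two points, that WhitespaceAnalyzer shouldn't do anything "interesting" and that case matters, can be sketched with a toy model in plain Java (not Lucene code). It assumes a WhitespaceAnalyzer-like step that splits text on whitespace only, and that term lookup is exact string equality; `lowercased` is a hypothetical normalization step, and the field value is made up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model, NOT Lucene code: terms are compared with exact String equality,
// so "Foo" and "foo" are different terms unless both sides are normalized.
class CaseSensitivityDemo {

    // Assumed WhitespaceAnalyzer-like behavior: split on whitespace, change nothing else.
    static List<String> whitespaceTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Hypothetical normalization step, applied at BOTH index and query time.
    static List<String> lowercased(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String t : terms) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    public static void main(String[] args) {
        String field = "Contact Mike.R@Example.com"; // hypothetical field value
        Set<String> exact = new HashSet<>(whitespaceTerms(field));
        Set<String> folded = new HashSet<>(lowercased(whitespaceTerms(field)));

        System.out.println(exact.contains("Mike.R@Example.com")); // true: same case
        System.out.println(exact.contains("mike.r@example.com")); // false: case differs
        String q = lowercased(whitespaceTerms("MIKE.R@EXAMPLE.COM")).get(0);
        System.out.println(folded.contains(q));                   // true: both sides folded
    }
}
```

In this sketch, folding case on only one side would not help; the same normalization has to be applied when indexing and when querying.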

RE: Lucene query question

Kinnar Kumar Sen, Noida
In reply to this post by OneWhoMikes

Hi Erick,

Is there any way to do a case-insensitive search without modifying the
index?

Regards and Thanks
Kinnar Kumar Sen
  HCL Technologies Ltd.
  Sec-60, Noida-201301
    Ph: - 09313297423



-----Original Message-----
From: Erick Erickson [mailto:[hidden email]]
Sent: Wednesday, May 10, 2006 5:59 PM
To: [hidden email]
Subject: Re: Lucene query question



Re: Lucene query question

OneWhoMikes
In reply to this post by Erick Erickson
Hi Erick,

I appreciate the help. I am using StandardAnalyzer for both querying
and indexing.


--Mike


On 5/10/06, Erick Erickson <[hidden email]> wrote:
