Anomaly in defining search phrase


tareque
I found a discrepancy in the results of an identical search ("processing")
run with Lucene and with MySQL. Lucene does not seem to return results
where the search word is joined to another word by "-" (hyphen) or "."
(period). For example, it didn't return a result for text that contained
"processing-7-bit" or "straighforwerd.processing", but MySQL did. Is this
a settings issue, or is it unavoidable?

Thanks
Tareque
ControlDOCS

PS: In contrast, I previously found Lucene returning some results that
MySQL didn't, for example where the search phrase is joined to "'"
(apostrophe) or "_" (underscore). I am not complaining about this; I
actually find it preferable for my purpose.




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Anomaly in defining search phrase

Erik Hatcher

On Jun 21, 2005, at 2:59 PM, [hidden email] wrote:

> I found a discrepancy in the results of an identical search
> ("processing") run with Lucene and with MySQL. Lucene does not seem to
> return results where the search word is joined to another word by "-"
> (hyphen) or "." (period). For example, it didn't return a result for
> text that contained "processing-7-bit" or "straighforwerd.processing",
> but MySQL did. Is this a settings issue, or is it unavoidable?
>
> Thanks
> Tareque
> ControlDOCS
>
> PS: In contrast, I previously found Lucene returning some results that
> MySQL didn't, for example where the search phrase is joined to "'"
> (apostrophe) or "_" (underscore). I am not complaining about this; I
> actually find it preferable for my purpose.

These all boil down to your choice of analyzer.  What analyzer are  
you using?

As you can see below, "processing-7-bit" is tokenized quite  
differently depending on the analyzer:

$ ant AnalyzerDemo
Buildfile: build.xml

     [input] String to analyze: [This string will be analyzed.]
processing-7-bit
      [echo] Running lia.analysis.AnalyzerDemo...
      [java] Analyzing "processing-7-bit"
      [java]   WhitespaceAnalyzer:
      [java]     [processing-7-bit]

      [java]   SimpleAnalyzer:
      [java]     [processing] [bit]

      [java]   StopAnalyzer:
      [java]     [processing] [bit]

      [java]   StandardAnalyzer:
      [java]     [processing-7-bit]

If you're using the StandardAnalyzer, you are not indexing the word
"processing" on its own at all; the only term indexed is
"processing-7-bit", so a search for "processing" cannot match it.  Grab
the source code from Lucene in Action at lucenebook.com and type
"ant AnalyzerDemo" to try out the basic analyzers.
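
For readers without the book's source at hand, the effect can be approximated in a few lines of plain Python. This is only a rough sketch and does not use Lucene: the function names are hypothetical, the stop list is a tiny illustrative subset, and the tokenizers merely imitate splitting on whitespace (as WhitespaceAnalyzer does) and keeping lowercased runs of letters (as SimpleAnalyzer and StopAnalyzer do). StandardAnalyzer is not modeled.

```python
import re

# Tiny illustrative stop list (an assumption); Lucene's real list is longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}


def whitespace_tokens(text):
    """Split on whitespace only, keeping case and punctuation."""
    return text.split()


def letter_tokens(text):
    """Lowercase the text and keep maximal runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stop_tokens(text):
    """Letter tokens with the illustrative stop words removed."""
    return [t for t in letter_tokens(text) if t not in STOP_WORDS]


def matches(term, text, tokenizer):
    """A single-term lookup succeeds only if the term is one of the tokens."""
    return term in tokenizer(text)


if __name__ == "__main__":
    for name, tok in [("whitespace", whitespace_tokens),
                      ("letter", letter_tokens),
                      ("stop", stop_tokens)]:
        text = "processing-7-bit"
        print(name, tok(text), matches("processing", text, tok))
```

When the whole string is kept as one token, the lookup for "processing" fails; when the text is split into runs of letters, it succeeds, which is the difference seen in the demo output above.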

     Erik


Re: Anomaly in defining search phrase

tareque

Thanks! Using StopAnalyzer helped solve the problem. Is there any detailed
documentation of what each of these analyzers does?
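
One trade-off is visible in the demo output quoted earlier in the thread: StopAnalyzer produced only [processing] [bit], so the "7" was not indexed. The sketch below illustrates that with a toy in-memory index. It is plain Python, not Lucene; `letter_tokens` and `search` are hypothetical helpers that only imitate letter-run tokenization, and stop-word removal is left out for brevity.

```python
import re


def letter_tokens(text):
    """Lowercase the text and keep maximal runs of letters (digits are dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


# Toy "index": each sample document mapped to its tokens.
SAMPLES = ["processing-7-bit", "straighforwerd.processing", "7-bit"]
INDEX = {doc: letter_tokens(doc) for doc in SAMPLES}


def search(term):
    """Return the sample documents whose token list contains the term."""
    return [doc for doc, tokens in INDEX.items() if term.lower() in tokens]


if __name__ == "__main__":
    for term in ("processing", "bit", "7"):
        print(term, "->", search(term))
```

Under these assumptions, both the hyphenated and the dotted samples are found for "processing", while a search for "7" finds nothing because digits never become tokens.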

Tareque


Re: Anomaly in defining search phrase

Erik Hatcher

On Jun 22, 2005, at 11:35 AM, [hidden email] wrote:
> Thanks! Using StopAnalyzer helped solve the problem. Is there any
> detailed documentation of what each of these analyzers does?

Here are some pointers:

     - Lucene's javadocs give a brief description, such as
       <http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/StopAnalyzer.html>

     - The source code is the ultimate documentation:
       <http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/analysis/StopAnalyzer.java?rev=168970&view=markup>
       (look at the tokenStream method)

     - Several Lucene articles:
       <http://wiki.apache.org/jakarta-lucene/Resources>
       with the most relevant being my java.net article here:
       <http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html>
       where the AnalysisDemo code is provided.

     - And last but certainly not least, "Lucene in Action" :)  You
       can search for details of analyzers at the lucenebook.com site,
       like this: <http://www.lucenebook.com/search?query=StopAnalyzer>
       The Analysis chapter in LIA provides in-depth details of each of
       the built-in analyzers.

Erik

