alpha numeric searching or highlighting problem.


Shah, Yagnesh
Hi folks,
        I am playing with HighlightIt.java from the "Lucene in Action" code. I have modified the text string so that one "45 BC" is now "45BC" and the other "45 BC" is now "Z3950". I have also modified the query line shown below, and my output file does not contain any highlighting.

Works:
    TermQuery query = new TermQuery(new Term("f", "ipsum"));
    TermQuery query = new TermQuery(new Term("f", "2000"));

Do not work:

    TermQuery query = new TermQuery(new Term("f", "45BC"));
    TermQuery query = new TermQuery(new Term("f", "Z3950"));

Modified text string from:

  private static final String text =
      "Contrary to popular belief, Lorem Ipsum is" +
      " not simply random text. It has roots in a piece of" +
      " classical Latin literature from 45 BC, making it over" +
      " 2000 years old. Richard McClintock, a Latin professor" +
      " at Hampden-Sydney College in Virginia, looked up one" +
      " of the more obscure Latin words, consectetur, from" +
      " a Lorem Ipsum passage, and going through the cites" +
      " of the word in classical literature, discovered the" +
      " undoubtable source. Lorem Ipsum comes from sections" +
      " 1.10.32 and 1.10.33 of \"de Finibus Bonorum et" +
      " Malorum\" (The Extremes of Good and Evil) by Cicero," +
      " written in 45 BC. This book is a treatise on the" +
      " theory of ethics, very popular during the" +
      " Renaissance. The first line of Lorem Ipsum, \"Lorem" +
      " ipsum dolor sit amet..\", comes from a line in" +
      " section 1.10.32.";  
to

  private static final String text =
      "Contrary to popular belief, Lorem Ipsum is" +
      " not simply random text. It has roots in a piece of" +
      " classical Latin literature from 45BC, making it over" +
      " 2000 years old. Richard McClintock, a Latin professor" +
      " at Hampden-Sydney College in Virginia, looked up one" +
      " of the more obscure Latin words, consectetur, from" +
      " a Lorem Ipsum passage, and going through the cites" +
      " of the word in classical literature, discovered the" +
      " undoubtable source. Lorem Ipsum comes from sections" +
      " 1.10.32 and 1.10.33 of \"de Finibus Bonorum et" +
      " Malorum\" (The Extremes of Good and Evil) by Cicero," +
      " written in Z3950. This book is a treatise on the" +
      " theory of ethics, very popular during the" +
      " Renaissance. The first line of Lorem Ipsum, \"Lorem" +
      " ipsum dolor sit amet..\", comes from a line in" +
      " section 1.10.32.";  


Yagnesh N. Shah
Senior Technology Engineer
CS Dept., 4th Floor
H. W. Wilson
950 University Avenue,
Bronx NY 10452
(718) 588 8400 x2721
http://www.hwwilson.com

 

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: alpha numeric searching or highlighting problem.

Erik Hatcher
On May 6, 2005, at 5:43 PM, Yagnesh Shah wrote:

> Hi folks,
>     I am playing with HighlightIt.java from the "Lucene in Action" code. I
> have modified the text string so that one "45 BC" is now "45BC" and
> the other "45 BC" is now "Z3950". I have also modified the query line
> shown below, and my output file does not contain any highlighting.
>
> Works:
>     TermQuery query = new TermQuery(new Term("f", "ipsum"));
>     TermQuery query = new TermQuery(new Term("f", "2000"));
>
> Do not work:
>
>     TermQuery query = new TermQuery(new Term("f", "45BC"));
>     TermQuery query = new TermQuery(new Term("f", "Z3950"));

This is a classic "analysis paralysis" issue.  The HighlightIt code  
uses the StandardAnalyzer.  Analyzing 45BC and Z3950 yields the  
following (from the Lucene In Action code, run "ant AnalyzerDemo"):

AnalyzerDemo:
      [echo]
      [echo]       Demonstrates analysis of sample text.
      [echo]
      [echo]       Refer to the "Analysis" chapter for much more on this
      [echo]       extremely crucial topic.
      [echo]
     [input] Press return to continue...
     [input] String to analyze: [This string will be analyzed.]
45BC Z3950
      [echo] Running lia.analysis.AnalyzerDemo...
      [java] Analyzing "45BC Z3950"
...
      [java]   StandardAnalyzer:
      [java]     [45bc] [z3950]

Notice that both tokens have been lowercased.  A TermQuery is not run
through the analyzer, so its term must match the tokens produced by the
analysis process exactly, including case.

Change to "45bc" (or "z3950") in your TermQuery and you'll see highlighting.
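
To make the mismatch concrete, here is a minimal plain-Java sketch that does not use Lucene at all. The class AnalysisSketch and its analyze() and matches() methods are hypothetical names made up for illustration. analyze() is only a rough stand-in for StandardAnalyzer: it splits on non-alphanumeric characters and lowercases, and it skips stop-word removal and everything else the real analyzer does. matches() is an exact, case-sensitive lookup, which is the assumption being illustrated about how an unanalyzed term is compared against indexed tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalysisSketch {

    // Hypothetical stand-in for an analyzer (NOT Lucene's StandardAnalyzer):
    // split on runs of non-alphanumeric characters, then lowercase each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{Alnum}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Exact, case-sensitive comparison: the query term is used as given,
    // with no analysis applied to it.
    static boolean matches(List<String> indexedTokens, String term) {
        return indexedTokens.contains(term);
    }

    public static void main(String[] args) {
        List<String> tokens = analyze("literature from 45BC, written in Z3950.");
        System.out.println(tokens);                   // [literature, from, 45bc, written, in, z3950]
        System.out.println(matches(tokens, "45BC"));  // false: indexed token is "45bc"
        System.out.println(matches(tokens, "45bc"));  // true
        System.out.println(matches(tokens, "Z3950")); // false: indexed token is "z3950"
        System.out.println(matches(tokens, "z3950")); // true
    }
}
```

"ipsum" and "2000" worked in the original code because lowercasing leaves them unchanged, so the raw term already equals the indexed token.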

     Erik


RE: alpha numeric searching or highlighting problem.

Shah, Yagnesh
Thanks for pointing this out!

-----Original Message-----
From: Erik Hatcher [mailto:[hidden email]]
Sent: Friday, May 06, 2005 8:15 PM
To: [hidden email]
Subject: Re: alpha numeric searching or highlighting problem.

