Fw: Urgent : Specific search problem with whitespace analyzer


Fw: Urgent : Specific search problem with whitespace analyzer

Krishnendra Nandi
Hi,

I am running a "field:text" style search using my own analyzer, which
behaves like WhitespaceAnalyzer. The code for my custom analyzer and
tokenizer follows.


// WhitespaceAnalyzerMaestro.java
package com.hewitt.itk.maestro.support.service.simplesearch;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

/** An Analyzer that uses WhitespaceTokenizerMaestro. */

public final class WhitespaceAnalyzerMaestro extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new WhitespaceTokenizerMaestro(reader);
  }
}



// WhitespaceTokenizerMaestro.java
package com.hewitt.itk.maestro.support.service.simplesearch;

import java.io.Reader;

import org.apache.lucene.analysis.WhitespaceTokenizer;

/** A WhitespaceTokenizerMaestro is a tokenizer that divides text at whitespace.
 * Adjacent sequences of non-whitespace characters form tokens. */

public class WhitespaceTokenizerMaestro extends WhitespaceTokenizer {
  /** Construct a new WhitespaceTokenizerMaestro. */
  public WhitespaceTokenizerMaestro(Reader in) {
    super(in);
  }

  /** Collects only characters which do not satisfy
   * {@link Character#isWhitespace(char)}
   * and lowercases that character before returning.*/
  protected boolean isTokenChar(char c) {
    c = Character.toLowerCase(c);
    return !Character.isWhitespace(c);
  }
}



I have modified the tokenizer class by making it return characters in
lower case.

My search query is ISSUE_TITLE:test, where ISSUE_TITLE is the field to
search and test is the term to look for.

Following is my code snippet which is doing the search:

BooleanQuery masterQuery = new BooleanQuery();

masterQuery.add(MultiFieldQueryParser.parse(searchQuery, fields, analyzer),
                REQUIRED,
                PROHIBITED);

Here searchQuery is ISSUE_TITLE:test, fields is an array of field names
that includes ISSUE_TITLE, and analyzer is a WhitespaceAnalyzerMaestro
instance (shown above).

When I run the search, the masterQuery I get after running the above code
snippet has the following value:
+(ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
ISSUE_TITLE:test* ISSUE_TITLE:test*)

which does not look correct to me. Does MultiFieldQueryParser not support
WhitespaceAnalyzer?

Please help.

Regards
Krishnendra Nandi

 



Re: Fw: Urgent : Specific search problem with whitespace analyzer

Daniel Naber-5
On Monday 20 November 2006 13:54, Krishnendra Nandi wrote:

> When I run the search, the masterQuery I get after running the above
> code snippet has the following value:
> +(ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
> ISSUE_TITLE:test*

Could you make a small self-contained test case that demonstrates this?
This would help in analyzing the problem. Also, are you using Lucene 1.4?
Have you tried to update?

Regards
 Daniel

--
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Fw: Urgent : Specific search problem with whitespace analyzer

Chris Hostetter-3
In reply to this post by Krishnendra Nandi

: I have modified the tokenizer class by making it return characters in
: lower case.

there is really no reason to do this ... have your analyzer use the
WhitespaceTokenizer, wrapped in a LowerCaseFilter ... that will eliminate
some of your custom code, and perhaps some of your problems as well.
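
The tokenizer-plus-filter arrangement suggested above can be sketched in plain Java. This is only an illustration of the two-stage idea, not Lucene code: TokenPipelineSketch, whitespaceTokens, lowerCase and analyze are hypothetical names, and the assumption is that the real chain would be built from WhitespaceTokenizer and LowerCaseFilter inside the analyzer's tokenStream method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java stand-in for a whitespace tokenizer wrapped in a lowercasing filter (not Lucene code). */
final class TokenPipelineSketch {

    /** Stage 1: split on whitespace only, leaving case untouched. */
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // A whitespace character closes the token collected so far, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Stage 2: lowercase every token produced by stage 1. */
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** The "analyzer": stage 2 wrapped around stage 1. */
    static List<String> analyze(String text) {
        return lowerCase(whitespaceTokens(text));
    }

    public static void main(String[] args) {
        // prints [login, fails, on, test-server]
        System.out.println(analyze("Login  FAILS on Test-Server"));
    }
}
```

Keeping the splitting and the lowercasing in separate stages means neither stage needs custom subclassing, which is the point of the suggestion.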

regarding the rest of your code...

:  masterQuery.add(MultiFieldQueryParser.parse(
:                                                         searchQuery,
:                                                         fields,
:                                                         analyzer),
:                             REQUIRED,
:                             PROHIBITED);
:
: Here the searchquery is   ISSUE_TITLE:test , fields is the array of fields
: in which ISSUE_TITLE is one of the fields and analyzer is
: WhitespaceAnalyzerMaestro() (already mentioned above).

...there is a lot going on here, some of which you haven't included so we
can't be sure what exactly it is...
  1) have you tested your analyzer in isolation to ensure that it's
     working properly outside of a QueryParser?
  2) have you tried it with a plain QueryParser instead of a MultiFieldQueryParser?
  3) have you verified that fields really contains what you think it does?
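
Check 1 can be tried without any QueryParser involved. The sketch below is a plain-Java stand-in, not Lucene code: CharTokenizerSketch and IsolationCheck are hypothetical names, and the normalize hook is modeled on the assumption that Lucene's CharTokenizer offers a similar per-character hook alongside isTokenChar. It feeds the same text to an override written like the posted one and to a variant that lowercases in normalize, so the emitted tokens can be compared directly.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java stand-in for a character-based tokenizer with two hooks (hypothetical, not Lucene code). */
abstract class CharTokenizerSketch {
    /** Decides whether c belongs to a token. */
    protected abstract boolean isTokenChar(char c);

    /** Transforms a character before it is appended to a token; identity by default. */
    protected char normalize(char c) {
        return c;
    }

    /** Collects runs of token characters, passing each one through normalize. */
    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

final class IsolationCheck {
    /** Mirrors the posted override: the assignment changes only the local copy of c. */
    static final class AsPosted extends CharTokenizerSketch {
        protected boolean isTokenChar(char c) {
            c = Character.toLowerCase(c); // Java passes char by value, so the caller never sees this
            return !Character.isWhitespace(c);
        }
    }

    /** Variant that lowercases in the normalize hook instead. */
    static final class WithNormalize extends CharTokenizerSketch {
        protected boolean isTokenChar(char c) {
            return !Character.isWhitespace(c);
        }

        protected char normalize(char c) {
            return Character.toLowerCase(c);
        }
    }

    public static void main(String[] args) {
        // prints [Search, TEST, Case]
        System.out.println(new AsPosted().tokenize("Search TEST Case"));
        // prints [search, test, case]
        System.out.println(new WithNormalize().tokenize("Search TEST Case"));
    }
}
```

If the real analyzer behaves like the first variant, indexed and queried terms keep their original case. That would be worth knowing, although it does not by itself explain the repeated clauses in the generated query.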



-Hoss

