Searching UN_TOKENIZED fields

Searching UN_TOKENIZED fields

deshmol-lists
Hi,

I have a field indexed as follows:

new Field(name, value, Store.YES, Index.UN_TOKENIZED)


I would like to search this field for an exact match of
the query term. For instance, if in the above
code snippet:

  String name="PROJECT";
  String value="Apache Lucene";

I would like to get a hit in the following cases:
   query is PROJECT:"apache lucene"
OR query is PROJECT:"Apache Lucene"
OR query is PROJECT:"Apache Luc*"

...but not in the following cases:
   query is PROJECT:apache
OR query is PROJECT:lucene

With the indexing call above and a query string of
PROJECT:"Apache Lucene", I get 0 hits. I do get hits if
I create the Field as TOKENIZED, but then it also
matches the query PROJECT:apache, which is not what I
want.

My understanding is that I'm indexing correctly,
but that when I query, I need to tell the
QueryParser not to tokenize the query
string, because the call:
Term[] terms = ((PhraseQuery) query).getTerms();

returns 2 terms, which for the above example are
"Apache" and "Lucene".

Any ideas on how I can make this work?
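The 0-hit result can be reproduced with a toy model in plain Java. This is not Lucene's API, and all class and method names are hypothetical. The model makes two assumptions:

- An UN_TOKENIZED value is stored as exactly one whole term.
- The query side splits on whitespace without lowercasing, which matches the two terms "Apache" and "Lucene" reported above.

A phrase can only match if each of its terms exists in the index. Neither "Apache" nor "Lucene" equals the single indexed term "Apache Lucene", so nothing matches.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (NOT Lucene's API): why an UN_TOKENIZED field gets 0 hits
// for a tokenized phrase query. All names here are hypothetical.
public class TokenMismatchDemo {

    // UN_TOKENIZED: the whole value becomes exactly one indexed term.
    static Set<String> indexUntokenized(String value) {
        Set<String> terms = new HashSet<>();
        terms.add(value);
        return terms;
    }

    // Assumed stand-in for the query-side analyzer: whitespace split only.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Necessary condition for a phrase match: every query term is indexed.
    // (Term positions are ignored in this simplified model.)
    static boolean allTermsIndexed(Set<String> indexed, List<String> queryTerms) {
        return indexed.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> indexed = indexUntokenized("Apache Lucene");
        List<String> query = whitespaceTokenize("Apache Lucene");
        System.out.println("indexed terms: " + indexed); // [Apache Lucene]
        System.out.println("query terms:   " + query);   // [Apache, Lucene]
        System.out.println("can match:     " + allTermsIndexed(indexed, query)); // false
    }
}
```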

Thanks,
~ amol

~ They have even applied logic to probability and vice versa. ~

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Searching UN_TOKENIZED fields

Michael D. Curtin
If your queries are really as simple as this example, then you could just
build the queries programmatically instead of using QueryParser, e.g. with new
TermQuery().  Otherwise, you might want to look into using PerFieldAnalyzerWrapper
and KeywordAnalyzer with QueryParser.
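The programmatic route can be sketched with another toy model in plain Java. This is not Lucene's API, and the class and method names are hypothetical. In Lucene itself, the exact match would be something like new TermQuery(new Term("PROJECT", "Apache Lucene")), and "Apache Luc*" would correspond to a PrefixQuery.

The model keeps the field's terms in a sorted set. It checks exact membership and walks forward from the prefix to collect prefix matches. Matching is case-sensitive, so "apache lucene" would only hit if the value were lowercased at index time and the query lowercased the same way.

```java
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model (NOT Lucene's API) of querying an untokenized field directly:
// exact term lookup and prefix enumeration over a sorted term dictionary.
public class KeywordFieldLookup {

    private final NavigableSet<String> terms = new TreeSet<>();

    // Each UN_TOKENIZED value is stored as a single term.
    void add(String value) {
        terms.add(value);
    }

    // Similar in spirit to a TermQuery: the query must equal a whole term.
    boolean exact(String term) {
        return terms.contains(term);
    }

    // Similar in spirit to a PrefixQuery: all terms starting with the prefix.
    SortedSet<String> prefix(String prefix) {
        SortedSet<String> matches = new TreeSet<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // sorted order: no later term can start with the prefix
            }
            matches.add(t);
        }
        return matches;
    }

    public static void main(String[] args) {
        KeywordFieldLookup project = new KeywordFieldLookup();
        project.add("Apache Lucene");
        System.out.println(project.exact("Apache Lucene")); // true
        System.out.println(project.prefix("Apache Luc"));   // [Apache Lucene]
        System.out.println(project.exact("apache"));        // false
        System.out.println(project.prefix("lucene"));       // []
    }
}
```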

--MDC


Re: Searching UN_TOKENIZED fields

deshmol-lists
Thanks Michael,

In general my queries could be more complex than the
example I outlined earlier, so I do need to use the
QueryParser. PerFieldAnalyzerWrapper combined with
KeywordAnalyzer did the trick for me:

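import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;

// StandardAnalyzer for all fields except PROJECT, which is analyzed
// as a single keyword, matching how it was indexed (UN_TOKENIZED)
PerFieldAnalyzerWrapper analyzer =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer());
analyzer.addAnalyzer("PROJECT", new KeywordAnalyzer());
QueryParser parser = new QueryParser(defaultField, analyzer);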
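Conceptually, a per-field wrapper picks an analyzer by field name and falls back to a default for every other field. The toy model below shows that effect: PROJECT comes out as one whole token, while other fields are split. It is plain Java, not Lucene's API, and it rests on a few assumptions:

- The PerFieldDispatch class is hypothetical.
- A lowercasing whitespace tokenizer stands in for StandardAnalyzer.
- The field name "body" is made up for illustration.

The keyword analysis does not lowercase. As a result, PROJECT:"apache lucene" still produces a term different from the indexed "Apache Lucene".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model (NOT Lucene's API) of per-field analyzer dispatch:
// a default analyzer plus per-field overrides. Names are hypothetical.
public class PerFieldDispatch {

    // Assumed stand-in for StandardAnalyzer: whitespace split + lowercase.
    static List<String> tokenizing(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Keyword-style analysis: the entire input is one token, unchanged.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldDispatch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Use the field's registered analyzer if there is one, else the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        PerFieldDispatch analyzer = new PerFieldDispatch(PerFieldDispatch::tokenizing);
        analyzer.addAnalyzer("PROJECT", PerFieldDispatch::keyword);
        System.out.println(analyzer.analyze("PROJECT", "Apache Lucene")); // [Apache Lucene]
        System.out.println(analyzer.analyze("body", "Apache Lucene"));    // [apache, lucene]
    }
}
```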

~ amol