How to search both Tokenized and Untokenized fields

rokham
Hi,

I've been trying to find a way which allows executing a query that contains both Tokenized and Untokenized fields on Lucene's index, without having to parse the query. I've been able to execute a query which only uses Tokenized fields as follows:

   QueryParser queryParser = new QueryParser( DEFAULT_FIELD, analyzer);
   Query query = queryParser.parse(queryString);
   Hits hits = indexSearcher.search(query);

This works fine for Tokenized fields but I'm not sure how to execute a query ("queryString") which contains both tokenized and untokenized fields.

Any suggestion is very much appreciated.

Rokham

Re: How to search both Tokenized and Untokenized fields

Erick Erickson
PerFieldAnalyzerWrapper is your friend, assuming that you have separate
fields, some tokenized and some not. If you *don't* have separate
fields, then we need more details of what you hope to accomplish...

something like

(+tokenized:value1 +tokenized:value2) (+untokenized:value3 +untokenized:value4)

should do the trick, where you've constructed a PerFieldAnalyzerWrapper
with a tokenizing analyzer for field "tokenized" and a non-tokenizing
analyzer for field "untokenized".
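
To make the per-field idea concrete, here is a minimal plain-Java sketch of the lookup that a per-field wrapper performs: each field name maps to its own analysis function, with a default for everything else. This is not Lucene code; the class `PerFieldLookupDemo`, its `register` and `analyze` methods, and the two toy "analyzers" are hypothetical names invented for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: mimics the field -> analyzer dispatch,
// it does not use or reproduce the Lucene API.
final class PerFieldLookupDemo {
    // Toy tokenizing analyzer: lowercases the text and splits it on whitespace.
    static final Function<String, List<String>> TOKENIZING =
            text -> Arrays.asList(text.trim().toLowerCase().split("\\s+"));

    // Toy non-tokenizing analyzer: the whole text is kept as a single term.
    static final Function<String, List<String>> NON_TOKENIZING =
            Collections::singletonList;

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldLookupDemo(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Associates one field name with the analyzer to use for it.
    void register(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Picks the analyzer registered for the field, falling back to the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        PerFieldLookupDemo demo = new PerFieldLookupDemo(TOKENIZING);
        demo.register("tokenized", TOKENIZING);
        demo.register("untokenized", NON_TOKENIZING);
        System.out.println(demo.analyze("tokenized", "Quick Fox"));
        System.out.println(demo.analyze("untokenized", "Quick Fox"));
    }
}
```

The point of the sketch is that the mapping is set up once, by field name, and is independent of any particular query.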

Best
Erick

(In reply to rokham's post of Mon, Mar 9, 2009 at 11:01 AM, sent from the Lucene - Java Users mailing list archive at Nabble.com:
http://www.nabble.com/How-to-search-both-Tokenized-and-Untokenized-fields-tp22413438p22413438.html)

Re: How to search both Tokenized and Untokenized fields

rokham
Thanks a bunch for your very prompt reply. I looked into the PerFieldAnalyzerWrapper class and I understand how you can add a specific analyzer for each field. My question is how this links to the query that's sent to me.

If I'm given a query as follows:
(+tokenized:value1 +tokenized:value2) (+untokenized:value3 +untokenized:value4)

can you please give me pseudocode or a code example where I would search Lucene's index based on the given fields and my desired analyzer for each field? I'm not clear on how I can go about building a PerFieldAnalyzerWrapper object without having to parse the query, take out the fields, and assign their specific analyzers to them.

Rokham



Re: How to search both Tokenized and Untokenized fields

Erick Erickson
Well, PerFieldAnalyzerWrapper is just a bunch of Analyzers, independent of
queries. See the API, but in general

   // The constructor takes the default analyzer, used for any field not added below.
   PerFieldAnalyzerWrapper perf = new PerFieldAnalyzerWrapper(new StandardAnalyzer());

   perf.addAnalyzer("untokenized", new WhitespaceAnalyzer());
   perf.addAnalyzer("tokenized", new SnowballAnalyzer("English"));

etc...

Some time later...

   // The same wrapper is handed to the parser, which analyzes each clause by field.
   QueryParser qp = new QueryParser("defaultfield", perf);
   Query q = qp.parse("(+tokenized:value1 +tokenized:value2) (+untokenized:value3 +untokenized:value4)");


But you have to get from your user input to the field:value form
and that's what your application has to do. Presumably
your application has some way of getting the query from
the user in such a fashion that you can map particular terms
to particular fields. If you don't, you have a problem that
Lucene can't help you with <G>..
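
As a rough sketch of that application-side step, the following plain-Java helper assembles a field:value query string from terms the application has already assigned to fields. Everything here is an assumption for illustration: `FieldQueryBuilder`, `clause`, and `build` are hypothetical names, every clause is made required with `+`, and the escaping is a simplification that only handles backslashes and double quotes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Lucene): turns field -> values collected by
// the application into the field:value syntax that a query parser can read.
final class FieldQueryBuilder {

    // Quotes the value as a phrase so embedded whitespace stays in one clause;
    // backslashes and double quotes are escaped so they cannot end the phrase early.
    static String clause(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "+" + field + ":\"" + escaped + "\"";
    }

    // Emits one required clause per value, separated by single spaces.
    static String build(Map<String, List<String>> fieldValues) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : fieldValues.entrySet()) {
            for (String value : entry.getValue()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(clause(entry.getKey(), value));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the clauses in insertion order.
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("tokenized", Arrays.asList("value1", "value2"));
        input.put("untokenized", Arrays.asList("value3", "value4"));
        System.out.println(build(input));
    }
}
```

The resulting string would then be passed to `qp.parse(...)`; how the terms get assigned to fields in the first place remains up to the application.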

Best
Erick


(In reply to rokham's post of Wed, Mar 11, 2009 at 1:22 AM:
http://www.nabble.com/How-to-search-both-Tokenized-and-Untokenized-fields-tp22413438p22449012.html)

Re: How to search both Tokenized and Untokenized fields

hossman

: Well, PerFieldAnalyzerWrapper is just a bunch of Analyzers, independent of
: queries. See the API, but in general
: PerFieldAnalyzerWrapper perf = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
:
: perf.addAnalyzer("untokenized", new WhitespaceAnalyzer());
: perf.addAnalyzer("tokenized", new SnowballAnalyzer("English"));

if the "untokenized" field was indexed using Field.Index.UN_TOKENIZED (or
NO_NORMS) then you'll probably want to use KeywordAnalyzer instead of
WhitespaceAnalyzer ... that way a query string like...

    +untokenized:"string with whitespace" +tokenized:"other string"

...will correctly match a doc containing that value in the untokenized
field.
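
A small plain-Java simulation may help show why the choice matters. It is not Lucene code: `UntokenizedMatchDemo` and its methods are hypothetical, and they only mimic what the two analyzers do to the text of a query clause, assuming the untokenized field holds its whole value as one indexed term.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation, not Lucene code: it only mimics how the two analyzers
// would break up the text of a query clause.
final class UntokenizedMatchDemo {

    // Like a whitespace analyzer: split the text on whitespace into several terms.
    static List<String> whitespaceTerms(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Like a keyword analyzer: the whole text is emitted as one term.
    static List<String> keywordTerms(String text) {
        return Collections.singletonList(text);
    }

    // An untokenized field holds its value as one indexed term, so in this
    // simplified model a query matches only if it yields exactly that term.
    static boolean matchesUntokenized(String indexedValue, List<String> queryTerms) {
        return queryTerms.size() == 1 && queryTerms.get(0).equals(indexedValue);
    }

    public static void main(String[] args) {
        String indexed = "string with whitespace";
        // Whitespace splitting yields three terms, none equal to the indexed value.
        System.out.println(matchesUntokenized(indexed, whitespaceTerms(indexed)));
        // Keeping the text whole yields the single indexed term.
        System.out.println(matchesUntokenized(indexed, keywordTerms(indexed)));
    }
}
```

Under these assumptions the first check prints `false` and the second prints `true`; a value with no whitespace would match either way, which is why the problem is easy to miss.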




-Hoss



RE: How to search both Tokenized and Untokenized fields

Fang_Li
In reply to this post by rokham
Hi,

What do you mean by an untokenized field?

Are you using a different analyzer for each field? If so, I think
you can just use the same analyzer (PerFieldAnalyzerWrapper, I guess) for the query.

Li
