Search with multiple wildcards

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Search with multiple wildcards

Sertic Mirko, Bedag
Hi@all

 

Is it possible to do a search with multiple wildcards in one query, for
instance "%MANAGE%" AND "CORE%"? Is there a code example available?

 

Thanks a lot

Mirko

 

Reply | Threaded
Open this post in threaded view
|

Re: Search with multiple wildcards

Erick Erickson
Sure, but you'll have to set the leading wildcard option,
which I've forgotten the exact call, but it's in the docs.

And use * rather than % <G>.

But wildcards are tricky, especially the TooManyClauses
exception. You might want to peruse the archive for wildcard
posts...

Best
Erick

On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
<[hidden email]>wrote:

> Hi@all
>
>
>
> Is it possible to do a search with multiple wildcards in one query, for
> instance "%MANAGE%" AND "CORE%"? Is there a code example available?
>
>
>
> Thanks a lot
>
> Mirko
>
>
>
>
Reply | Threaded
Open this post in threaded view
|

AW: Search with multiple wildcards

Sertic Mirko, Bedag
Hi

Thank you for your quick response:-)

Of course I need to use the * character :-) But I have read somewhere in the documentation that leading wildcards are not supported, and only one wildcard term per query. Is this limitation resolved in the current version?

Regards
Mirko

-----Ursprüngliche Nachricht-----
Von: Erick Erickson [mailto:[hidden email]]
Gesendet: Mittwoch, 10. September 2008 15:47
An: [hidden email]
Betreff: Re: Search with multiple wildcards

Sure, but you'll have to set the leading wildcard option,
which I've forgotten the exact call, but it's in the docs.

And use * rather than % <G>.

But wildcards are tricky, especially the TooManyClauses
exception. You might want to peruse the archive for wildcard
posts...

Best
Erick

On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
<[hidden email]>wrote:

> Hi@all
>
>
>
> Is it possible to do a search with multiple wildcards in one query, for
> instance "%MANAGE%" AND "CORE%"? Is there a code example available?
>
>
>
> Thanks a lot
>
> Mirko
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Search with multiple wildcards

Erick Erickson
Is this what you're referring to?

Lucene supports single and multiple character wildcard searches within
single terms (not within phrase queries).
(from http://lucene.apache.org/java/docs/queryparsersyntax.html)

I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
friend here, download a copy and try it <G>. Be sure on the search tab to
specify StandardAnalyzer or some such, rather than keywordanalyzer.

The phrase is trying to point out that a phrase query does NOT respect
wildcards. That is, submitting
"ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
sure that

+field:ab* +field:bc* +field:cd*

will work just fine. The key here is "within single terms", which I think of
as
"within a single term query". You can add as many TermQuerys as you want.
See the query documentation for how to submit phrase queries.

Best
Erick

On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag <[hidden email]
> wrote:

> Hi
>
> Thank you for your quick response:-)
>
> Of course I need to use the * character :-) But I have read somewhere in
> the documentation that leading wildcards are not supported, and only one
> wildcard term per query. Is this limitation resolved in the current version?
>
> Regards
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: Erick Erickson [mailto:[hidden email]]
> Gesendet: Mittwoch, 10. September 2008 15:47
> An: [hidden email]
> Betreff: Re: Search with multiple wildcards
>
> Sure, but you'll have to set the leading wildcard option,
> which I've forgotten the exact call, but it's in the docs.
>
> And use * rather than % <G>.
>
> But wildcards are tricky, especially the TooManyClauses
> exception. You might want to peruse the archive for wildcard
> posts...
>
> Best
> Erick
>
> On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
> <[hidden email]>wrote:
>
> > Hi@all
> >
> >
> >
> > Is it possible to do a search with multiple wildcards in one query, for
> > instance "%MANAGE%" AND "CORE%"? Is there a code example available?
> >
> >
> >
> > Thanks a lot
> >
> > Mirko
> >
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

AW: Search with multiple wildcards

Sertic Mirko, Bedag
Jep, this is what i have read.

do I need to use the query parser, or can I create a query by the api?
Is there an example available?

Thanks a lot
Mirko

-----Ursprüngliche Nachricht-----
Von: Erick Erickson [mailto:[hidden email]]
Gesendet: Mittwoch, 10. September 2008 16:45
An: [hidden email]
Betreff: Re: Search with multiple wildcards

Is this what you're referring to?

Lucene supports single and multiple character wildcard searches within
single terms (not within phrase queries).
(from http://lucene.apache.org/java/docs/queryparsersyntax.html)

I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
friend here, download a copy and try it <G>. Be sure on the search tab to
specify StandardAnalyzer or some such, rather than keywordanalyzer.

The phrase is trying to point out that a phrase query does NOT respect
wildcards. That is, submitting
"ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
sure that

+field:ab* +field:bc* +field:cd*

will work just fine. The key here is "within single terms", which I think of
as
"within a single term query". You can add as many TermQuerys as you want.
See the query documentation for how to submit phrase queries.

Best
Erick

On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag <[hidden email]
> wrote:

> Hi
>
> Thank you for your quick response:-)
>
> Of course I need to use the * character :-) But I have read somewhere in
> the documentation that leading wildcards are not supported, and only one
> wildcard term per query. Is this limitation resolved in the current version?
>
> Regards
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: Erick Erickson [mailto:[hidden email]]
> Gesendet: Mittwoch, 10. September 2008 15:47
> An: [hidden email]
> Betreff: Re: Search with multiple wildcards
>
> Sure, but you'll have to set the leading wildcard option,
> which I've forgotten the exact call, but it's in the docs.
>
> And use * rather than % <G>.
>
> But wildcards are tricky, especially the TooManyClauses
> exception. You might want to peruse the archive for wildcard
> posts...
>
> Best
> Erick
>
> On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
> <[hidden email]>wrote:
>
> > Hi@all
> >
> >
> >
> > Is it possible to do a search with multiple wildcards in one query, for
> > instance "%MANAGE%" AND "CORE%"? Is there a code example available?
> >
> >
> >
> > Thanks a lot
> >
> > Mirko
> >
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Search with multiple wildcards

Erick Erickson
Of course you can construct your own BooleanQuery
programmatically.

It's relatively easy, just try it.

On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <[hidden email]
> wrote:

> Jep, this is what i have read.
>
> do I need to use the query parser, or can I create a query by the api?
> Is there an example available?
>
> Thanks a lot
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: Erick Erickson [mailto:[hidden email]]
> Gesendet: Mittwoch, 10. September 2008 16:45
> An: [hidden email]
> Betreff: Re: Search with multiple wildcards
>
> Is this what you're referring to?
>
> Lucene supports single and multiple character wildcard searches within
> single terms (not within phrase queries).
> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>
> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
> friend here, download a copy and try it <G>. Be sure on the search tab to
> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>
> The phrase is trying to point out that a phrase query does NOT respect
> wildcards. That is, submitting
> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
> sure that
>
> +field:ab* +field:bc* +field:cd*
>
> will work just fine. The key here is "within single terms", which I think
> of
> as
> "within a single term query". You can add as many TermQuerys as you want.
> See the query documentation for how to submit phrase queries.
>
> Best
> Erick
>
> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
> <[hidden email]
> > wrote:
>
> > Hi
> >
> > Thank you for your quick response:-)
> >
> > Of course I need to use the * character :-) But I have read somewhere in
> > the documentation that leading wildcards are not supported, and only one
> > wildcard term per query. Is this limitation resolved in the current
> version?
> >
> > Regards
> > Mirko
> >
> > -----Ursprüngliche Nachricht-----
> > Von: Erick Erickson [mailto:[hidden email]]
> > Gesendet: Mittwoch, 10. September 2008 15:47
> > An: [hidden email]
> > Betreff: Re: Search with multiple wildcards
> >
> > Sure, but you'll have to set the leading wildcard option,
> > which I've forgotten the exact call, but it's in the docs.
> >
> > And use * rather than % <G>.
> >
> > But wildcards are tricky, especially the TooManyClauses
> > exception. You might want to peruse the archive for wildcard
> > posts...
> >
> > Best
> > Erick
> >
> > On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
> > <[hidden email]>wrote:
> >
> > > Hi@all
> > >
> > >
> > >
> > > Is it possible to do a search with multiple wildcards in one query, for
> > > instance "%MANAGE%" AND "CORE%"? Is there a code example available?
> > >
> > >
> > >
> > > Thanks a lot
> > >
> > > Mirko
> > >
> > >
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

AW: Search with multiple wildcards

Sertic Mirko, Bedag
Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
3ildcard queries are expanded before they are processed, and I see that i can
set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
but one final thing is still open: does a wildcard query work together with
the Lucene Highlighter? I tried it, but I only got an empty result. Without
wildcards, the highlighter works pretty smooth!

Regards
Mirko

-----Ursprüngliche Nachricht-----
Von: Erick Erickson [mailto:[hidden email]]
Gesendet: Mittwoch, 10. September 2008 18:15
An: [hidden email]
Betreff: Re: Search with multiple wildcards

Of course you can construct your own BooleanQuery
programmatically.

It's relatively easy, just try it.

On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <[hidden email]
> wrote:

> Jep, this is what i have read.
>
> do I need to use the query parser, or can I create a query by the api?
> Is there an example available?
>
> Thanks a lot
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: Erick Erickson [mailto:[hidden email]]
> Gesendet: Mittwoch, 10. September 2008 16:45
> An: [hidden email]
> Betreff: Re: Search with multiple wildcards
>
> Is this what you're referring to?
>
> Lucene supports single and multiple character wildcard searches within
> single terms (not within phrase queries).
> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>
> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
> friend here, download a copy and try it <G>. Be sure on the search tab to
> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>
> The phrase is trying to point out that a phrase query does NOT respect
> wildcards. That is, submitting
> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
> sure that
>
> +field:ab* +field:bc* +field:cd*
>
> will work just fine. The key here is "within single terms", which I think
> of
> as
> "within a single term query". You can add as many TermQuerys as you want.
> See the query documentation for how to submit phrase queries.
>
> Best
> Erick
>
> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
> <[hidden email]
> > wrote:
>
> > Hi
> >
> > Thank you for your quick response:-)
> >
> > Of course I need to use the * character :-) But I have read somewhere in
> > the documentation that leading wildcards are not supported, and only one
> > wildcard term per query. Is this limitation resolved in the current
> version?
> >
> > Regards
> > Mirko
> >
> > -----Ursprüngliche Nachricht-----
> > Von: Erick Erickson [mailto:[hidden email]]
> > Gesendet: Mittwoch, 10. September 2008 15:47
> > An: [hidden email]
> > Betreff: Re: Search with multiple wildcards
> >
> > Sure, but you'll have to set the leading wildcard option,
> > which I've forgotten the exact call, but it's in the docs.
> >
> > And use * rather than % <G>.
> >
> > But wildcards are tricky, especially the TooManyClauses
> > exception. You might want to peruse the archive for wildcard
> > posts...
> >
> > Best
> > Erick
> >
> > On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
> > <[hidden email]>wrote:
> >
> > > Hi@all
> > >
> > >
> > >
> > > Is it possible to do a search with multiple wildcards in one query, for
> > > instance "%MANAGE%" AND "CORE%"? Is there a code example available?
> > >
> > >
> > >
> > > Thanks a lot
> > >
> > > Mirko
> > >
> > >
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: AW: Search with multiple wildcards

mark harwood
In reply to this post by Sertic Mirko, Bedag
You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description


Cheers
Mark




----- Original Message ----
From: "Sertic Mirko, Bedag" <[hidden email]>
To: [hidden email]
Sent: Thursday, 11 September, 2008 9:34:13
Subject: AW: Search with multiple wildcards

Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
3ildcard queries are expanded before they are processed, and I see that i can
set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
but one final thing is still open: does a wildcard query work together with
the Lucene Highlighter? I tried it, but I only got an empty result. Without
wildcards, the highlighter works pretty smooth!

Regards
Mirko

-----Ursprüngliche Nachricht-----
Von: Erick Erickson [mailto:[hidden email]]
Gesendet: Mittwoch, 10. September 2008 18:15
An: [hidden email]
Betreff: Re: Search with multiple wildcards

Of course you can construct your own BooleanQuery
programmatically.

It's relatively easy, just try it.

On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <[hidden email]
> wrote:

> Jep, this is what i have read.
>
> do I need to use the query parser, or can I create a query by the api?
> Is there an example available?
>
> Thanks a lot
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: Erick Erickson [mailto:[hidden email]]
> Gesendet: Mittwoch, 10. September 2008 16:45
> An: [hidden email]
> Betreff: Re: Search with multiple wildcards
>
> Is this what you're referring to?
>
> Lucene supports single and multiple character wildcard searches within
> single terms (not within phrase queries).
> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>
> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
> friend here, download a copy and try it <G>. Be sure on the search tab to
> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>
> The phrase is trying to point out that a phrase query does NOT respect
> wildcards. That is, submitting
> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
> sure that
>
> +field:ab* +field:bc* +field:cd*
>
> will work just fine. The key here is "within single terms", which I think
> of
> as
> "within a single term query". You can add as many TermQuerys as you want.
> See the query documentation for how to submit phrase queries.
>
> Best
> Erick
>
> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
> <[hidden email]
> > wrote:
>
> > Hi
> >
> > Thank you for your quick response:-)
> >
> > Of course I need to use the * character :-) But I have read somewhere in
> > the documentation that leading wildcards are not supported, and only one
> > wildcard term per query. Is this limitation resolved in the current
> version?
> >
> > Regards
> > Mirko
> >
> > -----Ursprüngliche Nachricht-----
> > Von: Erick Erickson [mailto:[hidden email]]
> > Gesendet: Mittwoch, 10. September 2008 15:47
> > An: [hidden email]
> > Betreff: Re: Search with multiple wildcards
> >
> > Sure, but you'll have to set the leading wildcard option,
> > which I've forgotten the exact call, but it's in the docs.
> >
> > And use * rather than % <G>.
> >
> > But wildcards are tricky, especially the TooManyClauses
> > exception. You might want to peruse the archive for wildcard
> > posts...
> >
> > Best
> > Erick
> >
> > On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
> > <[hidden email]>wrote:
> >
> > > Hi@all
> > >
> > >
> > >
> > > Is it possible to do a search with multiple wildcards in one query, for
> > > instance "%MANAGE%" AND "CORE%"? Is there a code example available?
> > >
> > >
> > >
> > > Thanks a lot
> > >
> > > Mirko
> > >
> > >
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

AW: AW: Search with multiple wildcards

Sertic Mirko, Bedag
Ok, one final question:

If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
Highligter will highlight the words "hallo" or "alle". But how can i highlight only
the original query, so only the "ll"? Is this possible?

Thanks a lot
Mirko

-----Ursprüngliche Nachricht-----
Von: mark harwood [mailto:[hidden email]]
Gesendet: Donnerstag, 11. September 2008 11:20
An: [hidden email]
Betreff: Re: AW: Search with multiple wildcards

You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description


Cheers
Mark




----- Original Message ----
From: "Sertic Mirko, Bedag" <[hidden email]>
To: [hidden email]
Sent: Thursday, 11 September, 2008 9:34:13
Subject: AW: Search with multiple wildcards

Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
3ildcard queries are expanded before they are processed, and I see that i can
set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
but one final thing is still open: does a wildcard query work together with
the Lucene Highlighter? I tried it, but I only got an empty result. Without
wildcards, the highlighter works pretty smooth!

Regards
Mirko

-----Ursprüngliche Nachricht-----
Von: Erick Erickson [mailto:[hidden email]]
Gesendet: Mittwoch, 10. September 2008 18:15
An: [hidden email]
Betreff: Re: Search with multiple wildcards

Of course you can construct your own BooleanQuery
programmatically.

It's relatively easy, just try it.

On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <[hidden email]
> wrote:

> Jep, this is what i have read.
>
> do I need to use the query parser, or can I create a query by the api?
> Is there an example available?
>
> Thanks a lot
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: Erick Erickson [mailto:[hidden email]]
> Gesendet: Mittwoch, 10. September 2008 16:45
> An: [hidden email]
> Betreff: Re: Search with multiple wildcards
>
> Is this what you're referring to?
>
> Lucene supports single and multiple character wildcard searches within
> single terms (not within phrase queries).
> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>
> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
> friend here, download a copy and try it <G>. Be sure on the search tab to
> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>
> The phrase is trying to point out that a phrase query does NOT respect
> wildcards. That is, submitting
> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
> sure that
>
> +field:ab* +field:bc* +field:cd*
>
> will work just fine. The key here is "within single terms", which I think
> of
> as
> "within a single term query". You can add as many TermQuerys as you want.
> See the query documentation for how to submit phrase queries.
>
> Best
> Erick
>
> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
> <[hidden email]
> > wrote:
>
> > Hi
> >
> > Thank you for your quick response:-)
> >
> > Of course I need to use the * character :-) But I have read somewhere in
> > the documentation that leading wildcards are not supported, and only one
> > wildcard term per query. Is this limitation resolved in the current
> version?
> >
> > Regards
> > Mirko
> >
> > -----Ursprüngliche Nachricht-----
> > Von: Erick Erickson [mailto:[hidden email]]
> > Gesendet: Mittwoch, 10. September 2008 15:47
> > An: [hidden email]
> > Betreff: Re: Search with multiple wildcards
> >
> > Sure, but you'll have to set the leading wildcard option,
> > which I've forgotten the exact call, but it's in the docs.
> >
> > And use * rather than % <G>.
> >
> > But wildcards are tricky, especially the TooManyClauses
> > exception. You might want to peruse the archive for wildcard
> > posts...
> >
> > Best
> > Erick
> >
> > On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
> > <[hidden email]>wrote:
> >
> > > Hi@all
> > >
> > >
> > >
> > > Is it possible to do a search with multiple wildcards in one query, for
> > > instance "%MANAGE%" AND "CORE%"? Is there a code example available?
> > >
> > >
> > >
> > > Thanks a lot
> > >
> > > Mirko
> > >
> > >
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


     

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: AW: AW: Search with multiple wildcards

mark harwood
In reply to this post by Sertic Mirko, Bedag
>> Is this possible?

Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those.
To do anything else would require the higlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more challenging/difficult to maintain.



----- Original Message ----
From: "Sertic Mirko, Bedag" <[hidden email]>
To: [hidden email]
Sent: Thursday, 11 September, 2008 12:07:36
Subject: AW: AW: Search with multiple wildcards

Ok, one final question:

If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
Highligter will highlight the words "hallo" or "alle". But how can i highlight only
the original query, so only the "ll"? Is this possible?

Thanks a lot
Mirko

-----Ursprüngliche Nachricht-----
Von: mark harwood [mailto:[hidden email]]
Gesendet: Donnerstag, 11. September 2008 11:20
An: [hidden email]
Betreff: Re: AW: Search with multiple wildcards

You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description


Cheers
Mark




----- Original Message ----
From: "Sertic Mirko, Bedag" <[hidden email]>
To: [hidden email]
Sent: Thursday, 11 September, 2008 9:34:13
Subject: AW: Search with multiple wildcards

Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
3ildcard queries are expanded before they are processed, and I see that i can
set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
but one final thing is still open: does a wildcard query work together with
the Lucene Highlighter? I tried it, but I only got an empty result. Without
wildcards, the highlighter works pretty smooth!

Regards
Mirko

-----Ursprüngliche Nachricht-----
Von: Erick Erickson [mailto:[hidden email]]
Gesendet: Mittwoch, 10. September 2008 18:15
An: [hidden email]
Betreff: Re: Search with multiple wildcards

Of course you can construct your own BooleanQuery
programmatically.

It's relatively easy, just try it.

On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <[hidden email]
> wrote:

> Jep, this is what i have read.
>
> do I need to use the query parser, or can I create a query by the api?
> Is there an example available?
>
> Thanks a lot
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: Erick Erickson [mailto:[hidden email]]
> Gesendet: Mittwoch, 10. September 2008 16:45
> An: [hidden email]
> Betreff: Re: Search with multiple wildcards
>
> Is this what you're referring to?
>
> Lucene supports single and multiple character wildcard searches within
> single terms (not within phrase queries).
> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>
> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
> friend here, download a copy and try it <G>. Be sure on the search tab to
> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>
> The phrase is trying to point out that a phrase query does NOT respect
> wildcards. That is, submitting
> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
> sure that
>
> +field:ab* +field:bc* +field:cd*
>
> will work just fine. The key here is "within single terms", which I think
> of
> as
> "within a single term query". You can add as many TermQuerys as you want.
> See the query documentation for how to submit phrase queries.
>
> Best
> Erick
>
> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
> <[hidden email]
> > wrote:
>
> > Hi
> >
> > Thank you for your quick response:-)
> >
> > Of course I need to use the * character :-) But I have read somewhere in
> > the documentation that leading wildcards are not supported, and only one
> > wildcard term per query. Is this limitation resolved in the current
> version?
> >
> > Regards
> > Mirko
> >
> > -----Ursprüngliche Nachricht-----
> > Von: Erick Erickson [mailto:[hidden email]]
> > Gesendet: Mittwoch, 10. September 2008 15:47
> > An: [hidden email]
> > Betreff: Re: Search with multiple wildcards
> >
> > Sure, but you'll have to set the leading wildcard option,
> > which I've forgotten the exact call, but it's in the docs.
> >
> > And use * rather than % <G>.
> >
> > But wildcards are tricky, especially the TooManyClauses
> > exception. You might want to peruse the archive for wildcard
> > posts...
> >
> > Best
> > Erick
> >
> > On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
> > <[hidden email]>wrote:
> >
> > > Hi@all
> > >
> > >
> > >
> > > Is it possible to do a search with multiple wildcards in one query, for
> > > instance "%MANAGE%" AND "CORE%"? Is there a code example available?
> > >
> > >
> > >
> > > Thanks a lot
> > >
> > > Mirko
> > >
> > >
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


     

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: AW: AW: Search with multiple wildcards

Matthew Hall-7
Well, you could certainly manipulate your search string, removing the
wildcard punctuations, and then use that for what you pass to the
highlighter.

That should give you the functionality you are looking for.


-Matt
mark harwood wrote:

>>> Is this possible?
>>>      
>
> Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those.
> To do anything else would require the higlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more challenging/difficult to maintain.
>
>
>
> ----- Original Message ----
> From: "Sertic Mirko, Bedag" <[hidden email]>
> To: [hidden email]
> Sent: Thursday, 11 September, 2008 12:07:36
> Subject: AW: AW: Search with multiple wildcards
>
> Ok, one final question:
>
> If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
> Highligter will highlight the words "hallo" or "alle". But how can i highlight only
> the original query, so only the "ll"? Is this possible?
>
> Thanks a lot
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: mark harwood [mailto:[hidden email]]
> Gesendet: Donnerstag, 11. September 2008 11:20
> An: [hidden email]
> Betreff: Re: AW: Search with multiple wildcards
>
> You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description
>
>
> Cheers
> Mark
>
>
>
>
> ----- Original Message ----
> From: "Sertic Mirko, Bedag" <[hidden email]>
> To: [hidden email]
> Sent: Thursday, 11 September, 2008 9:34:13
> Subject: AW: Search with multiple wildcards
>
> Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
> 3ildcard queries are expanded before they are processed, and I see that i can
> set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
> but one final thing is still open: does a wildcard query work together with
> the Lucene Highlighter? I tried it, but I only got an empty result. Without
> wildcards, the highlighter works pretty smooth!
>
> Regards
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: Erick Erickson [mailto:[hidden email]]
> Gesendet: Mittwoch, 10. September 2008 18:15
> An: [hidden email]
> Betreff: Re: Search with multiple wildcards
>
> Of course you can construct your own BooleanQuery
> programmatically.
>
> It's relatively easy, just try it.
>
> On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <[hidden email]
>  
>> wrote:
>>    
>
>  
>> Jep, this is what i have read.
>>
>> do I need to use the query parser, or can I create a query by the api?
>> Is there an example available?
>>
>> Thanks a lot
>> Mirko
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Erick Erickson [mailto:[hidden email]]
>> Gesendet: Mittwoch, 10. September 2008 16:45
>> An: [hidden email]
>> Betreff: Re: Search with multiple wildcards
>>
>> Is this what you're referring to?
>>
>> Lucene supports single and multiple character wildcard searches within
>> single terms (not within phrase queries).
>> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>>
>> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
>> friend here, download a copy and try it <G>. Be sure on the search tab to
>> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>>
>> The phrase is trying to point out that a phrase query does NOT respect
>> wildcards. That is, submitting
>> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
>> sure that
>>
>> +field:ab* +field:bc* +field:cd*
>>
>> will work just fine. The key here is "within single terms", which I think
>> of
>> as
>> "within a single term query". You can add as many TermQuerys as you want.
>> See the query documentation for how to submit phrase queries.
>>
>> Best
>> Erick
>>
>> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
>> <[hidden email]
>>    
>>> wrote:
>>>      
>>> Hi
>>>
>>> Thank you for your quick response:-)
>>>
>>> Of course I need to use the * character :-) But I have read somewhere in
>>> the documentation that leading wildcards are not supported, and only one
>>> wildcard term per query. Is this limitation resolved in the current
>>>      
>> version?
>>    
>>> Regards
>>> Mirko
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: Erick Erickson [mailto:[hidden email]]
>>> Gesendet: Mittwoch, 10. September 2008 15:47
>>> An: [hidden email]
>>> Betreff: Re: Search with multiple wildcards
>>>
>>> Sure, but you'll have to set the leading wildcard option,
>>> which I've forgotten the exact call, but it's in the docs.
>>>
>>> And use * rather than % <G>.
>>>
>>> But wildcards are tricky, especially the TooManyClauses
>>> exception. You might want to peruse the archive for wildcard
>>> posts...
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
>>> <[hidden email]>wrote:
>>>
>>>      
>>>> Hi@all
>>>>
>>>>
>>>>
>>>> Is it possible to do a search with multiple wildcards in one query, for
>>>> instance "%MANAGE%" AND "CORE%"? Is there a code example available?
>>>>
>>>>
>>>>
>>>> Thanks a lot
>>>>
>>>> Mirko
>>>>
>>>>
>>>>
>>>>
>>>>        
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
>>>      
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>>    
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>      
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>      
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>  



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: AW: AW: Search with multiple wildcards

mark harwood
In reply to this post by Sertic Mirko, Bedag
>>That should give you the functionality you are looking for.

If I understand your suggestion correctly, It won't. The Highlighter uses a tokenized version of the document text.

Simplistically it does the following psuedo code:

for all tokens in documentTokenStream,
   if(queryTermsSet.contains(token))
        output "<b>"+token+"</b">
   else
        output token

NOT

for all tokens in query string
     fullDocumentString.replaceAll(queryStringToken, "<b>"+queryStringToken+"</b>"

So in the given example while you suggest manipulating "ll" to be in the query string,  you cannot make "ll" appear as a token in documentTokenStream.

Actually the Highlighter logic is a fair bit more involved than this (especially when using SpanQueryScorer) but the basis of it is there in the above pseudo code.





----- Original Message ----
From: Matthew Hall <[hidden email]>
To: [hidden email]
Sent: Thursday, 11 September, 2008 14:40:26
Subject: Re: AW: AW: Search with multiple wildcards

Well, you could certainly manipulate your search string, removing the
wildcard punctuations, and then use that for what you pass to the
highlighter.

That should give you the functionality you are looking for.


-Matt
mark harwood wrote:

>>> Is this possible?
>>>      
>
> Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those.
> To do anything else would require the higlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more challenging/difficult to maintain.
>
>
>
> ----- Original Message ----
> From: "Sertic Mirko, Bedag" <[hidden email]>
> To: [hidden email]
> Sent: Thursday, 11 September, 2008 12:07:36
> Subject: AW: AW: Search with multiple wildcards
>
> Ok, one final question:
>
> If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
> Highligter will highlight the words "hallo" or "alle". But how can i highlight only
> the original query, so only the "ll"? Is this possible?
>
> Thanks a lot
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: mark harwood [mailto:[hidden email]]
> Gesendet: Donnerstag, 11. September 2008 11:20
> An: [hidden email]
> Betreff: Re: AW: Search with multiple wildcards
>
> You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description
>
>
> Cheers
> Mark
>
>
>
>
> ----- Original Message ----
> From: "Sertic Mirko, Bedag" <[hidden email]>
> To: [hidden email]
> Sent: Thursday, 11 September, 2008 9:34:13
> Subject: AW: Search with multiple wildcards
>
> Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
> 3ildcard queries are expanded before they are processed, and I see that i can
> set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
> but one final thing is still open: does a wildcard query work together with
> the Lucene Highlighter? I tried it, but I only got an empty result. Without
> wildcards, the highlighter works pretty smooth!
>
> Regards
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: Erick Erickson [mailto:[hidden email]]
> Gesendet: Mittwoch, 10. September 2008 18:15
> An: [hidden email]
> Betreff: Re: Search with multiple wildcards
>
> Of course you can construct your own BooleanQuery
> programmatically.
>
> It's relatively easy, just try it.
>
> On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <[hidden email]
>  
>> wrote:
>>    
>
>  
>> Jep, this is what i have read.
>>
>> do I need to use the query parser, or can I create a query by the api?
>> Is there an example available?
>>
>> Thanks a lot
>> Mirko
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Erick Erickson [mailto:[hidden email]]
>> Gesendet: Mittwoch, 10. September 2008 16:45
>> An: [hidden email]
>> Betreff: Re: Search with multiple wildcards
>>
>> Is this what you're referring to?
>>
>> Lucene supports single and multiple character wildcard searches within
>> single terms (not within phrase queries).
>> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>>
>> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
>> friend here, download a copy and try it <G>. Be sure on the search tab to
>> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>>
>> The phrase is trying to point out that a phrase query does NOT respect
>> wildcards. That is, submitting
>> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
>> sure that
>>
>> +field:ab* +field:bc* +field:cd*
>>
>> will work just fine. The key here is "within single terms", which I think
>> of
>> as
>> "within a single term query". You can add as many TermQuerys as you want.
>> See the query documentation for how to submit phrase queries.
>>
>> Best
>> Erick
>>
>> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
>> <[hidden email]
>>    
>>> wrote:
>>>      
>>> Hi
>>>
>>> Thank you for your quick response:-)
>>>
>>> Of course I need to use the * character :-) But I have read somewhere in
>>> the documentation that leading wildcards are not supported, and only one
>>> wildcard term per query. Is this limitation resolved in the current
>>>      
>> version?
>>    
>>> Regards
>>> Mirko
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: Erick Erickson [mailto:[hidden email]]
>>> Gesendet: Mittwoch, 10. September 2008 15:47
>>> An: [hidden email]
>>> Betreff: Re: Search with multiple wildcards
>>>
>>> Sure, but you'll have to set the leading wildcard option,
>>> which I've forgotten the exact call, but it's in the docs.
>>>
>>> And use * rather than % <G>.
>>>
>>> But wildcards are tricky, especially the TooManyClauses
>>> exception. You might want to peruse the archive for wildcard
>>> posts...
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
>>> <[hidden email]>wrote:
>>>
>>>      
>>>> Hi@all
>>>>
>>>>
>>>>
>>>> Is it possible to do a search with multiple wildcards in one query, for
>>>> instance "%MANAGE%" AND "CORE%"? Is there a code example available?
>>>>
>>>>
>>>>
>>>> Thanks a lot
>>>>
>>>> Mirko
>>>>
>>>>
>>>>
>>>>
>>>>        
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
>>>      
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>>    
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>      
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>      
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>  



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


 

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: AW: AW: Search with multiple wildcards

Matthew Hall-7
Ah.. that's a darn good point..

Though, that second bit of code you have there could be used at display
time for him to get the functionality that he wants.  You could also
modify it somewhat, and apply it against the displayable part of the hit
he's getting back rather than the individual tokens.

This if of course only assuming that this functionality would be used
after a contains type search was detected.

Considering that's the only real usecase for a technique like this, I'm
thinking its probably more trouble than its worth for his case.



mark harwood wrote:

>>> That should give you the functionality you are looking for.
>>>      
>
> If I understand your suggestion correctly, It won't. The Highlighter uses a tokenized version of the document text.
>
> Simplistically it does the following psuedo code:
>
> for all tokens in documentTokenStream,
>    if(queryTermsSet.contains(token))
>         output "<b>"+token+"</b">
>    else
>         output token
>
> NOT
>
> for all tokens in query string
>      fullDocumentString.replaceAll(queryStringToken, "<b>"+queryStringToken+"</b>"
>
> So in the given example while you suggest manipulating "ll" to be in the query string,  you cannot make "ll" appear as a token in documentTokenStream.
>
> Actually the Highlighter logic is a fair bit more involved than this (especially when using SpanQueryScorer) but the basis of it is there in the above pseudo code.
>
>
>
>
>
> ----- Original Message ----
> From: Matthew Hall <[hidden email]>
> To: [hidden email]
> Sent: Thursday, 11 September, 2008 14:40:26
> Subject: Re: AW: AW: Search with multiple wildcards
>
> Well, you could certainly manipulate your search string, removing the
> wildcard punctuations, and then use that for what you pass to the
> highlighter.
>
> That should give you the functionality you are looking for.
>
>
> -Matt
> mark harwood wrote:
>  
>>>> Is this possible?
>>>>      
>>>>        
>> Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those.
>> To do anything else would require the higlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more challenging/difficult to maintain.
>>
>>
>>
>> ----- Original Message ----
>> From: "Sertic Mirko, Bedag" <[hidden email]>
>> To: [hidden email]
>> Sent: Thursday, 11 September, 2008 12:07:36
>> Subject: AW: AW: Search with multiple wildcards
>>
>> Ok, one final question:
>>
>> If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
>> Highligter will highlight the words "hallo" or "alle". But how can i highlight only
>> the original query, so only the "ll"? Is this possible?
>>
>> Thanks a lot
>> Mirko
>>
>> -----Ursprüngliche Nachricht-----
>> Von: mark harwood [mailto:[hidden email]]
>> Gesendet: Donnerstag, 11. September 2008 11:20
>> An: [hidden email]
>> Betreff: Re: AW: Search with multiple wildcards
>>
>> You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
>> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description
>>
>>
>> Cheers
>> Mark
>>
>>
>>
>>
>> ----- Original Message ----
>> From: "Sertic Mirko, Bedag" <[hidden email]>
>> To: [hidden email]
>> Sent: Thursday, 11 September, 2008 9:34:13
>> Subject: AW: Search with multiple wildcards
>>
>> Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
>> 3ildcard queries are expanded before they are processed, and I see that i can
>> set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
>> but one final thing is still open: does a wildcard query work together with
>> the Lucene Highlighter? I tried it, but I only got an empty result. Without
>> wildcards, the highlighter works pretty smooth!
>>
>> Regards
>> Mirko
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Erick Erickson [mailto:[hidden email]]
>> Gesendet: Mittwoch, 10. September 2008 18:15
>> An: [hidden email]
>> Betreff: Re: Search with multiple wildcards
>>
>> Of course you can construct your own BooleanQuery
>> programmatically.
>>
>> It's relatively easy, just try it.
>>
>> On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <[hidden email]
>>  
>>    
>>> wrote:
>>>    
>>>      
>>  
>>    
>>> Jep, this is what i have read.
>>>
>>> do I need to use the query parser, or can I create a query by the api?
>>> Is there an example available?
>>>
>>> Thanks a lot
>>> Mirko
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: Erick Erickson [mailto:[hidden email]]
>>> Gesendet: Mittwoch, 10. September 2008 16:45
>>> An: [hidden email]
>>> Betreff: Re: Search with multiple wildcards
>>>
>>> Is this what you're referring to?
>>>
>>> Lucene supports single and multiple character wildcard searches within
>>> single terms (not within phrase queries).
>>> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>>>
>>> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
>>> friend here, download a copy and try it <G>. Be sure on the search tab to
>>> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>>>
>>> The phrase is trying to point out that a phrase query does NOT respect
>>> wildcards. That is, submitting
>>> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
>>> sure that
>>>
>>> +field:ab* +field:bc* +field:cd*
>>>
>>> will work just fine. The key here is "within single terms", which I think
>>> of
>>> as
>>> "within a single term query". You can add as many TermQuerys as you want.
>>> See the query documentation for how to submit phrase queries.
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
>>> <[hidden email]
>>>    
>>>      
>>>> wrote:
>>>>      
>>>> Hi
>>>>
>>>> Thank you for your quick response:-)
>>>>
>>>> Of course I need to use the * character :-) But I have read somewhere in
>>>> the documentation that leading wildcards are not supported, and only one
>>>> wildcard term per query. Is this limitation resolved in the current
>>>>      
>>>>        
>>> version?
>>>    
>>>      
>>>> Regards
>>>> Mirko
>>>>
>>>> -----Ursprüngliche Nachricht-----
>>>> Von: Erick Erickson [mailto:[hidden email]]
>>>> Gesendet: Mittwoch, 10. September 2008 15:47
>>>> An: [hidden email]
>>>> Betreff: Re: Search with multiple wildcards
>>>>
>>>> Sure, but you'll have to set the leading wildcard option,
>>>> which I've forgotten the exact call, but it's in the docs.
>>>>
>>>> And use * rather than % <G>.
>>>>
>>>> But wildcards are tricky, especially the TooManyClauses
>>>> exception. You might want to peruse the archive for wildcard
>>>> posts...
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
>>>> <[hidden email]>wrote:
>>>>
>>>>      
>>>>        
>>>>> Hi@all
>>>>>
>>>>>
>>>>>
>>>>> Is it possible to do a search with multiple wildcards in one query, for
>>>>> instance "%MANAGE%" AND "CORE%"? Is there a code example available?
>>>>>
>>>>>
>>>>>
>>>>> Thanks a lot
>>>>>
>>>>> Mirko
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>        
>>>>>          
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [hidden email]
>>>> For additional commands, e-mail: [hidden email]
>>>>
>>>>
>>>>      
>>>>        
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
>>>    
>>>      
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>>      
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>>      
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>>  
>>    
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>      
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>  


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

AW: AW: AW: Search with multiple wildcards

Sertic Mirko, Bedag
Ok, i see the problems.

I will talk to my customer about this requirement. Perhaps he doesn't need it anymore.

Again, thanks a lot to all, you saved my day!

Regards
Mirko

-----Ursprüngliche Nachricht-----
Von: Matthew Hall [mailto:[hidden email]]
Gesendet: Donnerstag, 11. September 2008 16:05
An: [hidden email]
Betreff: Re: AW: AW: Search with multiple wildcards

Ah.. that's a darn good point..

Though, that second bit of code you have there could be used at display
time for him to get the functionality that he wants.  You could also
modify it somewhat, and apply it against the displayable part of the hit
he's getting back rather than the individual tokens.

This if of course only assuming that this functionality would be used
after a contains type search was detected.

Considering that's the only real usecase for a technique like this, I'm
thinking its probably more trouble than its worth for his case.



mark harwood wrote:

>>> That should give you the functionality you are looking for.
>>>      
>
> If I understand your suggestion correctly, It won't. The Highlighter uses a tokenized version of the document text.
>
> Simplistically it does the following psuedo code:
>
> for all tokens in documentTokenStream,
>    if(queryTermsSet.contains(token))
>         output "<b>"+token+"</b">
>    else
>         output token
>
> NOT
>
> for all tokens in query string
>      fullDocumentString.replaceAll(queryStringToken, "<b>"+queryStringToken+"</b>"
>
> So in the given example while you suggest manipulating "ll" to be in the query string,  you cannot make "ll" appear as a token in documentTokenStream.
>
> Actually the Highlighter logic is a fair bit more involved than this (especially when using SpanQueryScorer) but the basis of it is there in the above pseudo code.
>
>
>
>
>
> ----- Original Message ----
> From: Matthew Hall <[hidden email]>
> To: [hidden email]
> Sent: Thursday, 11 September, 2008 14:40:26
> Subject: Re: AW: AW: Search with multiple wildcards
>
> Well, you could certainly manipulate your search string, removing the
> wildcard punctuations, and then use that for what you pass to the
> highlighter.
>
> That should give you the functionality you are looking for.
>
>
> -Matt
> mark harwood wrote:
>  
>>>> Is this possible?
>>>>      
>>>>        
>> Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those.
>> To do anything else would require the higlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more challenging/difficult to maintain.
>>
>>
>>
>> ----- Original Message ----
>> From: "Sertic Mirko, Bedag" <[hidden email]>
>> To: [hidden email]
>> Sent: Thursday, 11 September, 2008 12:07:36
>> Subject: AW: AW: Search with multiple wildcards
>>
>> Ok, one final question:
>>
>> If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
>> Highligter will highlight the words "hallo" or "alle". But how can i highlight only
>> the original query, so only the "ll"? Is this possible?
>>
>> Thanks a lot
>> Mirko
>>
>> -----Ursprüngliche Nachricht-----
>> Von: mark harwood [mailto:[hidden email]]
>> Gesendet: Donnerstag, 11. September 2008 11:20
>> An: [hidden email]
>> Betreff: Re: AW: Search with multiple wildcards
>>
>> You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
>> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description
>>
>>
>> Cheers
>> Mark
>>
>>
>>
>>
>> ----- Original Message ----
>> From: "Sertic Mirko, Bedag" <[hidden email]>
>> To: [hidden email]
>> Sent: Thursday, 11 September, 2008 9:34:13
>> Subject: AW: Search with multiple wildcards
>>
>> Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
>> 3ildcard queries are expanded before they are processed, and I see that i can
>> set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
>> but one final thing is still open: does a wildcard query work together with
>> the Lucene Highlighter? I tried it, but I only got an empty result. Without
>> wildcards, the highlighter works pretty smooth!
>>
>> Regards
>> Mirko
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Erick Erickson [mailto:[hidden email]]
>> Gesendet: Mittwoch, 10. September 2008 18:15
>> An: [hidden email]
>> Betreff: Re: Search with multiple wildcards
>>
>> Of course you can construct your own BooleanQuery
>> programmatically.
>>
>> It's relatively easy, just try it.
>>
>> On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <[hidden email]
>>  
>>    
>>> wrote:
>>>    
>>>      
>>  
>>    
>>> Jep, this is what i have read.
>>>
>>> do I need to use the query parser, or can I create a query by the api?
>>> Is there an example available?
>>>
>>> Thanks a lot
>>> Mirko
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: Erick Erickson [mailto:[hidden email]]
>>> Gesendet: Mittwoch, 10. September 2008 16:45
>>> An: [hidden email]
>>> Betreff: Re: Search with multiple wildcards
>>>
>>> Is this what you're referring to?
>>>
>>> Lucene supports single and multiple character wildcard searches within
>>> single terms (not within phrase queries).
>>> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>>>
>>> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
>>> friend here, download a copy and try it <G>. Be sure on the search tab to
>>> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>>>
>>> The phrase is trying to point out that a phrase query does NOT respect
>>> wildcards. That is, submitting
>>> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
>>> sure that
>>>
>>> +field:ab* +field:bc* +field:cd*
>>>
>>> will work just fine. The key here is "within single terms", which I think
>>> of
>>> as
>>> "within a single term query". You can add as many TermQuerys as you want.
>>> See the query documentation for how to submit phrase queries.
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
>>> <[hidden email]
>>>    
>>>      
>>>> wrote:
>>>>      
>>>> Hi
>>>>
>>>> Thank you for your quick response:-)
>>>>
>>>> Of course I need to use the * character :-) But I have read somewhere in
>>>> the documentation that leading wildcards are not supported, and only one
>>>> wildcard term per query. Is this limitation resolved in the current
>>>>      
>>>>        
>>> version?
>>>    
>>>      
>>>> Regards
>>>> Mirko
>>>>
>>>> -----Ursprüngliche Nachricht-----
>>>> Von: Erick Erickson [mailto:[hidden email]]
>>>> Gesendet: Mittwoch, 10. September 2008 15:47
>>>> An: [hidden email]
>>>> Betreff: Re: Search with multiple wildcards
>>>>
>>>> Sure, but you'll have to set the leading wildcard option,
>>>> which I've forgotten the exact call, but it's in the docs.
>>>>
>>>> And use * rather than % <G>.
>>>>
>>>> But wildcards are tricky, especially the TooManyClauses
>>>> exception. You might want to peruse the archive for wildcard
>>>> posts...
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
>>>> <[hidden email]>wrote:
>>>>
>>>>      
>>>>        
>>>>> Hi@all
>>>>>
>>>>>
>>>>>
>>>>> Is it possible to do a search with multiple wildcards in one query, for
>>>>> instance "%MANAGE%" AND "CORE%"? Is there a code example available?
>>>>>
>>>>>
>>>>>
>>>>> Thanks a lot
>>>>>
>>>>> Mirko
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>        
>>>>>          
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [hidden email]
>>>> For additional commands, e-mail: [hidden email]
>>>>
>>>>
>>>>      
>>>>        
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
>>>    
>>>      
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>>      
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>>      
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>>  
>>    
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>      
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>  


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]