Questions on Query Scorer

Questions on Query Scorer

Ferdinand Chan
How can I create a QueryScorer in Lucene 2.0?

When I create a QueryScorer using the following code:

BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(q1, BooleanClause.Occur.SHOULD);
booleanQuery.add(q2, BooleanClause.Occur.SHOULD);

QueryScorer scorer = new QueryScorer(booleanQuery);

it compiles successfully but throws a runtime exception when I execute the
code:

java.lang.NoSuchFieldError: prohibited
        at org.apache.lucene.search.highlight.QueryTermExtractor.getTermsFromBooleanQuery(QueryTermExtractor.java:91)
        at org.apache.lucene.search.highlight.QueryTermExtractor.getTerms(QueryTermExtractor.java:66)
        at org.apache.lucene.search.highlight.QueryTermExtractor.getTerms(QueryTermExtractor.java:59)
        at org.apache.lucene.search.highlight.QueryTermExtractor.getTerms(QueryTermExtractor.java:45)
        at org.apache.lucene.search.highlight.QueryScorer.<init>(QueryScorer.java:48)

Can anyone suggest a solution to this problem?

Thanks

Ferdinand


RE: Questions on Query Scorer

Mile Rosu
Hello,

The problem may instead lie in the name of the field you are querying,
"prohibited" in your case.
You can check the structure of the index you are querying with Luke
(http://www.getopt.org/luke/).

Mile

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


RE: Questions on Query Scorer

Ferdinand Chan
Thanks Mile,

But my code doesn't query the term "prohibited". Also, my index has no
field called "prohibited".
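
A NoSuchFieldError is a linkage error: the JVM raises it when already-compiled code refers to a field that the class loaded at run time does not have. The missing name here, "prohibited", is reported from inside QueryTermExtractor, so it looks more like a field of a Lucene class than a field of the index. One plausible cause, not confirmed anywhere in this thread, is a highlighter jar built against an older Lucene core than the 2.0 jar on the classpath. A minimal sketch for checking where each class is loaded from, using only the JDK (the class name JarLocator is made up for illustration):

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Lucene): reports which jar or directory
// a class was loaded from, to help spot mismatched library versions.
final class JarLocator {

    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (e.g. java.lang.String)
        // have no code source.
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments, for example
        // org.apache.lucene.search.BooleanClause
        // org.apache.lucene.search.highlight.QueryScorer
        for (String name : args) {
            System.out.println(name + " <- " + locate(Class.forName(name)));
        }
    }
}
```

Running it with the BooleanClause and QueryScorer class names as arguments would show whether the core and highlighter classes come from jars of matching versions.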


BooleanQuery.TooManyClauses on MultiSearcher

Rob Staveley (Tom)
In reply to this post by Ferdinand Chan
I've just added a 3rd index directory (i.e. 3rd IndexSearcher) to my
MultiSearcher and I'm getting BooleanQuery.TooManyClauses errors on queries
which were working happily on 2 indexes.

Here's an example query, which hopefully you'll find self-explanatory from
the XML structure.
--------8<--------
<composite-query analyzer='1'>
    <group required="true" prohibited="false">
        <group required="false" prohibited="false">
            <prefix field="to" required="false" prohibited="false">james</prefix>
            <prefix field="cc" required="false" prohibited="false">james</prefix>
            <prefix field="smtp-rcptto" required="false" prohibited="false">james</prefix>
            <prefix field="from" required="false" prohibited="false">james</prefix>
            <prefix field="smtp-mailfrom" required="false" prohibited="false">james</prefix>
        </group>
        <parse field="body" required="false" prohibited="false">james</parse>
        <parse field="subject" required="false" prohibited="false">james</parse>
    </group>
</composite-query>
--------8<--------

Note that there isn't even a range in there.

Do BooleanQueries not scale well across indexes?


RE: BooleanQuery.TooManyClauses on MultiSearcher

Rob Staveley (Tom)
I guess the most expensive thing I'm doing from the perspective of Boolean
clauses is heavily using PrefixQuery.

I want my user to be able to find e-mail to, cc or from james@anydomain, so
I opted for PrefixQuery on James. Bearing in mind that this is causing me
grief with BooleanQuery.TooManyClauses on my MultiSearcher, is there a
smarter approach that I should be adopting?


Re: BooleanQuery.TooManyClauses on MultiSearcher

Michael D. Curtin
Rob Staveley (Tom) wrote:

> I guess the most expensive thing I'm doing from the perspective of Boolean
> clauses is heavily using PrefixQuery.
>
> I want my user to be able to find e-mail to, cc or from james@anydomain, so
> I opted for PrefixQuery on James. Bearing in mind that this is causing me
> grief with BooleanQuery.TooManyClauses on my MultiSearcher, is there a
> smarter approach that I should be adopting?

I don't know about "smarter", but it seems like separating (parsing) out the
username from the hostname in the email addresses, into separate fields at
index time, would get you what you want.  If you want documents to, cc, or
from "james" at any domain, then the query
        "touser:james ccuser:james fromuser:james"
would work.  If you were looking for a specific sender at a specific site,
then the query
        "+fromuser:james +fromhost:foo.com"
would work.

Good luck!

--MDC
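
The split MDC describes could be sketched as follows, before the two parts are indexed into fields such as fromuser and fromhost. This is a minimal sketch in plain Java with no Lucene calls; the class name EmailParts and the lower-casing step are assumptions made for illustration, not something stated in the thread.

```java
import java.util.Locale;

// Hypothetical helper: splits an address into the user and host parts
// that would be indexed as separate fields (e.g. fromuser, fromhost).
final class EmailParts {
    final String user;
    final String host;

    private EmailParts(String user, String host) {
        this.user = user;
        this.host = host;
    }

    static EmailParts parse(String address) {
        int at = address.lastIndexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            throw new IllegalArgumentException("not an address: " + address);
        }
        // Assumption: lower-case both parts so "James" and "james"
        // end up as the same indexed term.
        String user = address.substring(0, at).toLowerCase(Locale.ROOT);
        String host = address.substring(at + 1).toLowerCase(Locale.ROOT);
        return new EmailParts(user, host);
    }
}
```

With the parts indexed as whole terms, a search for "james" becomes an exact term match on the user field, so no prefix expansion is involved.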


RE: BooleanQuery.TooManyClauses on MultiSearcher

Chris Hostetter-3
In reply to this post by Rob Staveley (Tom)

: I guess the most expensive thing I'm doing from the perspective of Boolean
: clauses is heavily using PrefixQuery.
:
: I want my user to be able to find e-mail to, cc or from james@anydomain, so
: I opted for PrefixQuery on James. Bearing in mind that this is causing me
: grief with BooleanQuery.TooManyClauses on my MultiSearcher, is there a
: smarter approach that I should be adopting?

if the only reason you are using a PrefixQuery is so that searching for
"james" matches "[hidden email]" then i think MDC is right, split that
field up (or have one field, but put three terms in "james", "domain.com"
and "[hidden email]") .. but if you genuinely need flexible PrefixQuery
support, you may want to look at the ConstantScorePrefixQuery in Solr ...
there's nothing Solr-specific about it, so you could drop it into your
Lucene installation.  I'm not entirely sure how well the
ConstantScoreQueries work with a MultiSearcher (mainly because i don't
know how well Filters work with MultiSearchers) but you could give it a
try -- it certainly won't have a TooManyClauses problem.

-Hoss



Re: BooleanQuery.TooManyClauses on MultiSearcher

eks dev
I did not check, but Solr uses SkippingFilter, which is not yet committed in Lucene, so maybe this will not work?

By the way, is there any reason not to commit SkippingFilter to Lucene today? As far as I can see, nothing remains to be done except committing the existing SkippingFilter. If there is something, I do not mind spending a few hours to help.


RE: BooleanQuery.TooManyClauses on MultiSearcher

Rob Staveley (Tom)
In reply to this post by Chris Hostetter-3
I'd quite like to avoid tokenising james from [hidden email], because I
like the way PrefixQuery (when it works) matches [hidden email]
too. I'll take a look at ConstantScorePrefixQuery.


Re: BooleanQuery.TooManyClauses on MultiSearcher

Chris Hostetter-3
In reply to this post by eks dev

: Did not check it, but solr is using SkippingFilter which is not yet
: commited in Lucene... so this will maybe not work?

Solr does not use SkippingFilter.



-Hoss



RE: BooleanQuery.TooManyClauses on MultiSearcher

Chris Hostetter-3
In reply to this post by Rob Staveley (Tom)

: I'd quite like to avoid tokenising james from  [hidden email], because I
: like the way PrefixQuery (when it works) matches [hidden email]

well sure ... but if you say that because you want "." and "-" to be
treated specially you could write an EmailAnalyzer that produces the
token stream: "james", "dean", "james.dean", "holywood.com",
"[hidden email]" ... the real question is do you really want a
search for "jam" to match "[hidden email]" while searches for
"dean", "james dean" and "holywood.com" don't?


-Hoss
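
The token stream Hoss lists could be produced along these lines. This is a sketch in plain Java of the tokenising logic only, not a Lucene Analyzer; the class name EmailTokens and the lower-casing are assumptions, and james.dean@example.com is a made-up address, since the thread's own address is hidden.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: emits the name parts, the full user name, the host
// and the full address as separate tokens for one e-mail address.
final class EmailTokens {

    static List<String> tokenize(String address) {
        String lower = address.toLowerCase(Locale.ROOT);
        int at = lower.lastIndexOf('@');
        if (at <= 0 || at == lower.length() - 1) {
            throw new IllegalArgumentException("not an address: " + address);
        }
        String user = lower.substring(0, at);
        String host = lower.substring(at + 1);

        // LinkedHashSet keeps insertion order and drops repeats
        // (e.g. when the user name has no "." or "-" in it).
        Set<String> tokens = new LinkedHashSet<>();
        for (String part : user.split("[.-]")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        tokens.add(user);
        tokens.add(host);
        tokens.add(lower);
        return new ArrayList<>(tokens);
    }

    public static void main(String[] args) {
        // Made-up address, for illustration only.
        System.out.println(tokenize("james.dean@example.com"));
    }
}
```

Indexed this way, "james", "dean" and "example.com" each match as exact terms, which is the trade-off Hoss raises against relying on prefix matching alone.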



RE: BooleanQuery.TooManyClauses on MultiSearcher

Rob Staveley (Tom)
In reply to this post by Chris Hostetter-3
I'm still trying to get my head around ConstantScorePrefixQuery. Could I
simply use this as a drop-in replacement for PrefixQuery?


RE: BooleanQuery.TooManyClauses on MultiSearcher

Rob Staveley (Tom)
Incidentally, I'm getting BooleanQuery.TooManyClauses when I search on
"james", but I don't when I search on "James". Surely the number of clauses
isn't dependent on the number of hits?!

However, I know that "fred" is relatively uncommon in my index and "neil" is
relatively common and yet "fred" is getting the BooleanQuery.TooManyClauses
and "neil" isn't. Does that make sense?

Should the actual term used in a PrefixQuery affect the number of clauses?


RE: BooleanQuery.TooManyClauses on MultiSearcher

Chris Hostetter-3
In reply to this post by Rob Staveley (Tom)

: I'm still trying to get my head around ConstantScorePrefixQuery. Could I
: simply use this as a drop-in replacement for PrefixQuery?

that's what it was designed to do .. you just need to grab a copy of
ConstantScorePrefixQuery and PrefixFilter from the same package
(ConstantScorePrefixQuery is just a convenient wrapper around "new
ConstantScoreQuery(new PrefixFilter(prefix))")

-Hoss
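
Why a filter-based prefix match would not run into the clause limit can be shown with a toy model. The sketch below is plain Java and is not Solr's or Lucene's actual code: it assumes a sorted term dictionary that maps each term to the ids of the documents containing it (the class name PrefixBits is made up), and it marks matching documents in a bit set instead of creating one query clause per matching term.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;

// Toy model of a prefix filter: walks the sorted terms that start with the
// prefix and sets one bit per matching document. No clause is created per
// term, so the number of matching terms is not limited.
final class PrefixBits {

    static BitSet matchingDocs(SortedMap<String, int[]> termToDocs,
                               String prefix, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // tailMap starts at the first term >= prefix; because terms are
        // sorted, all terms sharing the prefix are contiguous from there.
        for (Map.Entry<String, int[]> entry : termToDocs.tailMap(prefix).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break;
            }
            for (int doc : entry.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

Every matching document is treated alike in this model, which mirrors the "constant score" part of the name: per-term scoring is given up in exchange for an expansion that cannot overflow.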



RE: BooleanQuery.TooManyClauses on MultiSearcher

Chris Hostetter-3
In reply to this post by Rob Staveley (Tom)

: Incidentally, I'm getting BooleanQuery.TooManyClauses when I search on
: "james", but I don't when I search on "James". Surely  the number of clauses
: isn't dependent on the number of hits?!

not the number of hits -- just the number of terms in your index that
start with the prefix.

: However, I know that "fred" is relatively uncommon in my index and "neil" is
: relatively common and yet "fred" is getting the BooleanQuery.TooManyClauses
: and "neil" isn't. Does that make sense?
:
: Should the actual term used in a PrefixQuery effect the number of clauses?

yes .. the Term used in the PrefixQuery is just a convenient holder for a
fieldname and a term value prefix -- what matters is how many terms in
that field start with that prefix.  if "james*" causes a problem, but
"James*" doesn't then it sounds like your indexing analyzer is case
sensitive and you have a lot more lowercase values starting with james
than uppercase values starting with James .. if "fred*" causes a problem
but "neil*" doesn't then you probably have a lot more terms that start
with "fred" than you do that start with "neil" -- it doesn't matter if
"neil@foo" is the value in more documents than the total number of docs
that contain any value starting with "fred", what matters is how many
unique values there are starting with "fred"




-Hoss
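
The point about unique terms can be illustrated with a small count over a sorted set of terms. This is a plain-Java sketch rather than Lucene code: the class name PrefixExpansion and the sample addresses are made up, and 1024 is BooleanQuery's default clause limit, which can be raised with BooleanQuery.setMaxClauseCount.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model: a prefix query expands to one clause per unique term that
// starts with the prefix, regardless of how many documents hold each term.
final class PrefixExpansion {
    // Default BooleanQuery clause limit in Lucene.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    static int countTermsWithPrefix(SortedSet<String> uniqueTerms, String prefix) {
        int count = 0;
        // Terms sharing the prefix are contiguous in sorted order.
        for (String term : uniqueTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            count++;
        }
        return count;
    }

    static boolean wouldOverflow(SortedSet<String> uniqueTerms, String prefix) {
        return countTermsWithPrefix(uniqueTerms, prefix) > DEFAULT_MAX_CLAUSES;
    }

    public static void main(String[] args) {
        // Made-up terms from a case-sensitive index.
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "James@x.com", "james@a.com", "james@b.com",
                "james.dean@c.com", "neil@foo"));
        System.out.println("james -> " + countTermsWithPrefix(terms, "james"));
        System.out.println("James -> " + countTermsWithPrefix(terms, "James"));
        System.out.println("neil  -> " + countTermsWithPrefix(terms, "neil"));
    }
}
```

In this model "neil@foo" counts as a single clause however many documents contain it, while every distinct address starting with "james" adds one more.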



RE: BooleanQuery.TooManyClauses on MultiSearcher

Rob Staveley (Tom)
In reply to this post by Chris Hostetter-3
It is a good point that you raise, Chris. I'm already treating To, Cc, From,
MAIL-FROM, and RCPT-TO as separate fields (the latter fields being from
SMTP). I'd like a "fast and loose" query on james, to find anything relevant
to James. I guess to avoid getting too many Boolean terms, I should have
another field which is a soup of the sender and recipient fields and
tokenise e-mail addresses in it as you suggest.


RE: BooleanQuery.TooManyClauses on MultiSearcher

Rob Staveley (Tom)
In reply to this post by Chris Hostetter-3
The penny drops. Thank you so much for your time, Chris :-)
