[jira] Created: (LUCENE-443) ConjunctionScorer tune-up

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

[jira] Created: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
ConjunctionScorer tune-up
-------------------------

         Key: LUCENE-443
         URL: http://issues.apache.org/jira/browse/LUCENE-443
     Project: Lucene - Java
        Type: Bug
  Components: Search  
    Versions: 1.9    
 Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
    Reporter: Abdul Chaudhry


I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.

'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries

Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.

This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-443?page=all ]

Abdul Chaudhry updated LUCENE-443:
----------------------------------

    Attachment: ConjunctionScorer.java

Updated a version of the ConjunctionScorer which uses an array rather than a linked list.
The first(), last()  and doc() method calls should probably be in-lined - maybe.


> ConjunctionScorer tune-up
> -------------------------
>
>          Key: LUCENE-443
>          URL: http://issues.apache.org/jira/browse/LUCENE-443
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.9
>  Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>     Reporter: Abdul Chaudhry
>  Attachments: ConjunctionScorer.java
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12331081 ]

paul.elschot commented on LUCENE-443:
-------------------------------------

I like performance, so I'm all for this.
One thing about the code: it has a few probably
superfluous casts to (Scorer) from Scorer array elements.

Regards,
Paul Elschot


> ConjunctionScorer tune-up
> -------------------------
>
>          Key: LUCENE-443
>          URL: http://issues.apache.org/jira/browse/LUCENE-443
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.9
>  Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>     Reporter: Abdul Chaudhry
>  Attachments: ConjunctionScorer.java
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-443?page=all ]

paul.elschot updated LUCENE-443:
--------------------------------

    Attachment: ConjunctionScorer.java

Simplified version, removed most casts, inlined first() and last().
Passes all tests here.

Regards,
Paul Elschot

> ConjunctionScorer tune-up
> -------------------------
>
>          Key: LUCENE-443
>          URL: http://issues.apache.org/jira/browse/LUCENE-443
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.9
>  Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>     Reporter: Abdul Chaudhry
>  Attachments: ConjunctionScorer.java, ConjunctionScorer.java
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12331227 ]

Abdul Chaudhry commented on LUCENE-443:
---------------------------------------

Thanks Paul,  re-ran load tests here, everything looks good and it looks easier to read as well.
I also checked Arrays.sort, it's a merge sort - guaranteed nlogn.
This all looks squeaky clean.
Cheers,
-- Abdul

> ConjunctionScorer tune-up
> -------------------------
>
>          Key: LUCENE-443
>          URL: http://issues.apache.org/jira/browse/LUCENE-443
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.9
>  Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>     Reporter: Abdul Chaudhry
>  Attachments: ConjunctionScorer.java, ConjunctionScorer.java
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12331735 ]

paul.elschot commented on LUCENE-443:
-------------------------------------

The score() method can be simplified to use only the 'i' index. There is no need
to offset the index into the scorers with the 'pos' variable, because all scorers match.

Regards,
Paul Elschot

> ConjunctionScorer tune-up
> -------------------------
>
>          Key: LUCENE-443
>          URL: http://issues.apache.org/jira/browse/LUCENE-443
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.9
>  Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>     Reporter: Abdul Chaudhry
>  Attachments: ConjunctionScorer.java, ConjunctionScorer.java
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12331775 ]

Abdul Chaudhry commented on LUCENE-443:
---------------------------------------

ok, this makes sense as the scoring engine runs something like this

while (scorer.next()) {
  int doc = scorer.doc();
  float scorer = scorer.score();
  collector.collect(doc, score);
}

That is, next() will have ordered everything, so that by the time we call the scorer.score() method , everything should be in-order.

Thanks, ill give that a go.

The impression I have with lucene, and correct me if Im wrong, is that complex queries with many terms and clauses have their bottleneck in terms of performance in the ordering phase, that is scorer.next() requires everything to be in-document order and all the scorer sub-engines must comply. Collection is a moot point as you probably have small numbers of hits. However, on the other end of the scale, for queries with one or two terms that have a very high frequency the bottleneck is really in collection, that is the priority queue in collector.collect(),
Essentially this is a sorting issue, somewhat masked and manipulated at various stages.
This looks to me like lucene needs a "Query Plan".


> ConjunctionScorer tune-up
> -------------------------
>
>          Key: LUCENE-443
>          URL: http://issues.apache.org/jira/browse/LUCENE-443
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.9
>  Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>     Reporter: Abdul Chaudhry
>  Attachments: ConjunctionScorer.java, ConjunctionScorer.java
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12331778 ]

paul.elschot commented on LUCENE-443:
-------------------------------------

Once the ConjunctionScorer matches the order of the subqueries/subscorers does not matter
because they all match and a sum score needs to be formed.

Scorer.next() can only tolerate documents not to be in order at top level, where hit collection is done.
At lower levels in nested scorers, Lucene works a document at a time, and there Scorer.skipTo(docNr)
requires that all document numbers are in order. Such skipping is needed for conjunctions.
Since the score value of a document for a query depends on the score value of the subqueries,
at some point the association on a single document must be done.
For conjunctions, skipTo() is used, but for disjunctions this association is done by
a priority queue in the trunk, and a distribution like method in 1.4.3. This distribution method works
somewhat loosely on document order, and is therefore incompatible with skipping.

Regards,
Paul Elschot


> ConjunctionScorer tune-up
> -------------------------
>
>          Key: LUCENE-443
>          URL: http://issues.apache.org/jira/browse/LUCENE-443
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.9
>  Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>     Reporter: Abdul Chaudhry
>  Attachments: ConjunctionScorer.java, ConjunctionScorer.java
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12361284 ]

Yonik Seeley commented on LUCENE-443:
-------------------------------------

It seems like more cruft could be removed if support for incremental adding of subscorers was removed.
Is there a reason we can't simplify things and just pass Collection<Scorer> in the constructor?



> ConjunctionScorer tune-up
> -------------------------
>
>          Key: LUCENE-443
>          URL: http://issues.apache.org/jira/browse/LUCENE-443
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.9
>  Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>     Reporter: Abdul Chaudhry
>  Attachments: ConjunctionScorer.java, ConjunctionScorer.java
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12436414 ]
           
Grant Ingersoll commented on LUCENE-443:
----------------------------------------

Yonik, Paul, do either of you know the status on this one?  From the looks of it, it hasn't been implemented.  It also has the highest number of votes in JIRA, so I thought I would take a look at it.  One downside is it is not in patch form, but it also doesn't look to hard to extract the changes, either.

One issue I have with these performance issues is that we don't have a reliable benchmarking suite.  I am not a lawyer, but might we be able to use something like http://trec.nist.gov/data/reuters/reuters.html to build a sample benchmark suite?  This corpus, plus 100 or so queries could work nicely.  Of course, we would have to figure out some way for those interested to get their hands on the data.  What do others do for benchmarking?

> ConjunctionScorer tune-up
> -------------------------
>
>                 Key: LUCENE-443
>                 URL: http://issues.apache.org/jira/browse/LUCENE-443
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.9
>         Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>            Reporter: Abdul Chaudhry
>         Attachments: ConjunctionScorer.java, ConjunctionScorer.java
>
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-443?page=all ]

Paul Elschot updated LUCENE-443:
--------------------------------

    Attachment: Conjunction20060921.patch

Iirc the orginal performance problem was caused by creation of objects in the tight loop
doing skipTo() on al  the scorers.

This patch is against current trunk and based on the earlier posted versions of ConjunctionScorer.
which was based (by the first poster) on an existing ConjunctionScorer with an ASL notice,
which is why I could grant the licence to ASF.

> ConjunctionScorer tune-up
> -------------------------
>
>                 Key: LUCENE-443
>                 URL: http://issues.apache.org/jira/browse/LUCENE-443
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.9
>         Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>            Reporter: Abdul Chaudhry
>         Attachments: Conjunction20060921.patch, ConjunctionScorer.java, ConjunctionScorer.java
>
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12436453 ]
           
Paul Elschot commented on LUCENE-443:
-------------------------------------

I just overlooked the grant by Abdul to the ASF.

> ConjunctionScorer tune-up
> -------------------------
>
>                 Key: LUCENE-443
>                 URL: http://issues.apache.org/jira/browse/LUCENE-443
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.9
>         Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>            Reporter: Abdul Chaudhry
>         Attachments: Conjunction20060921.patch, ConjunctionScorer.java, ConjunctionScorer.java
>
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Resolved: (LUCENE-443) ConjunctionScorer tune-up

Soren Daugaard (Jira)
In reply to this post by Soren Daugaard (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-443?page=all ]

Yonik Seeley resolved LUCENE-443.
---------------------------------

    Fix Version/s: 2.1
       Resolution: Fixed
         Assignee: Yonik Seeley

Thanks!  I just committed this.

> ConjunctionScorer tune-up
> -------------------------
>
>                 Key: LUCENE-443
>                 URL: http://issues.apache.org/jira/browse/LUCENE-443
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.9
>         Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested boolean queries
>            Reporter: Abdul Chaudhry
>         Assigned To: Yonik Seeley
>             Fix For: 2.1
>
>         Attachments: Conjunction20060921.patch, ConjunctionScorer.java, ConjunctionScorer.java
>
>
> I just recently ran a load test on the latest code from lucene , which is using a new BooleanScore and noticed the ConjunctionScorer was crunching through objects , especially while sorting as part of the skipTo call. It turns a linked list into an array, sorts the array, then converts the array back to a linked list for further processing by the scoring engines below.
> 'm not sure if anyone else is experiencing this as I have a very large index (> 4 million items) and I am issuing some heavily nested queries
> Anyway, I decide to change the link list into an array and use a first and last marker to "simulate" a linked list.
> This scaled much better during my load test as the java gargbage collector was less - umm - virulent

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [jira] Resolved: (LUCENE-443) ConjunctionScorer tune-up

Peter Keegan
I did some profile testing with the new ConjuctionScorer in 2.1 and
discovered a new bottleneck in ConjunctionScorer.sortScorers. The
java.utils.Arrays.sort method is cloning the Scorers array on every sort,
which is quite expensive on large indexes because of the size of the 'norms'
array within, and isn't necessary.  We rewrote this to use simple insertion
sorting as follows:

  private void sortScorers() {
// squeeze the array down for the sort
//    if (length != scorers.length) {
//      Scorer[] temps = new Scorer[length];
//      System.arraycopy(scorers, 0, temps, 0, length);
//      scorers = temps;
//    }
    insertionSort( scorers,length );
    // note that this comparator is not consistent with equals!
//    Arrays.sort(scorers, new Comparator() {         // sort the array
//        public int compare(Object o1, Object o2) {
//          return ((Scorer)o1).doc() - ((Scorer)o2).doc();
//        }
//      });

    first = 0;
    last = length - 1;
  }
  private void insertionSort( Scorer[] scores, int len)
  {
      for (int i=0; i<len; i++) {
          for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j-- ) {
              swap (scores, j, j-1);
          }
      }
      return;
  }
  private void swap(Object[] x, int a, int b) {
    Object t = x[a];
    x[a] = x[b];
    x[b] = t;
  }

The squeezing of the array is no longer needed.
We also initialized the Scorers array to 8 (instead of 2) to avoid having to
grow the array for common queries, although this probably has less
performance impact.

This change added about 3% to query throughput.

Peter


On 10/17/06, Yonik Seeley (JIRA) <[hidden email]> wrote:

>
>      [ http://issues.apache.org/jira/browse/LUCENE-443?page=all ]
>
> Yonik Seeley resolved LUCENE-443.
> ---------------------------------
>
>     Fix Version/s: 2.1
>        Resolution: Fixed
>          Assignee: Yonik Seeley
>
> Thanks!  I just committed this.
>
> > ConjunctionScorer tune-up
> > -------------------------
> >
> >                 Key: LUCENE-443
> >                 URL: http://issues.apache.org/jira/browse/LUCENE-443
> >             Project: Lucene - Java
> >          Issue Type: Bug
> >          Components: Search
> >    Affects Versions: 1.9
> >         Environment: Linux, Java 1.5, Large Index with 4 million items
> and some heavily nested boolean queries
> >            Reporter: Abdul Chaudhry
> >         Assigned To: Yonik Seeley
> >             Fix For: 2.1
> >
> >         Attachments: Conjunction20060921.patch, ConjunctionScorer.java,
> ConjunctionScorer.java
> >
> >
> > I just recently ran a load test on the latest code from lucene , which
> is using a new BooleanScore and noticed the ConjunctionScorer was crunching
> through objects , especially while sorting as part of the skipTo call. It
> turns a linked list into an array, sorts the array, then converts the array
> back to a linked list for further processing by the scoring engines below.
> > 'm not sure if anyone else is experiencing this as I have a very large
> index (> 4 million items) and I am issuing some heavily nested queries
> > Anyway, I decide to change the link list into an array and use a first
> and last marker to "simulate" a linked list.
> > This scaled much better during my load test as the java gargbage
> collector was less - umm - virulent
>
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
> http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [jira] Resolved: (LUCENE-443) ConjunctionScorer tune-up

Yonik Seeley-2
On 10/23/06, Peter Keegan <[hidden email]> wrote:
> I did some profile testing with the new ConjuctionScorer in 2.1 and
> discovered a new bottleneck in ConjunctionScorer.sortScorers. The
> java.utils.Arrays.sort method is cloning the Scorers array on every sort,

Huh... that's interesting.  I wonder why Arrays.sort(int[]) is all
in-place but sort(Object[]) is not.

> which is quite expensive on large indexes because of the size of the 'norms'
> array within, and isn't necessary.  We rewrote this to use simple insertion
> sorting

The code looks more like a bubble sort to me.  Anyway, conjunctions
will normally not have many scorers I think so this should be OK.

Can you open a JIRA issue for this?


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [jira] Resolved: (LUCENE-443) ConjunctionScorer tune-up

Paul Elschot
In reply to this post by Peter Keegan
On Monday 23 October 2006 22:12, Peter Keegan wrote:
> I did some profile testing with the new ConjuctionScorer in 2.1 and
> discovered a new bottleneck in ConjunctionScorer.sortScorers. The
> java.utils.Arrays.sort method is cloning the Scorers array on every sort,
> which is quite expensive on large indexes because of the size of the 'norms'
> array within, and isn't necessary.

Isn't the issue the creation of a new Comparator each time the scorers
are sorted? That could be easily fixed by keeping single comparator
around to do all the work.

Regards,
Paul Elschot


> We rewrote this to use simple insertion sorting as follows:
>
>   private void sortScorers() {
> // squeeze the array down for the sort
> //    if (length != scorers.length) {
> //      Scorer[] temps = new Scorer[length];
> //      System.arraycopy(scorers, 0, temps, 0, length);
> //      scorers = temps;
> //    }
>     insertionSort( scorers,length );
>     // note that this comparator is not consistent with equals!
> //    Arrays.sort(scorers, new Comparator() {         // sort the array
> //        public int compare(Object o1, Object o2) {
> //          return ((Scorer)o1).doc() - ((Scorer)o2).doc();
> //        }
> //      });
>
>     first = 0;
>     last = length - 1;
>   }
>   private void insertionSort( Scorer[] scores, int len)
>   {
>       for (int i=0; i<len; i++) {
>           for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j-- ) {
>               swap (scores, j, j-1);
>           }
>       }
>       return;
>   }
>   private void swap(Object[] x, int a, int b) {
>     Object t = x[a];
>     x[a] = x[b];
>     x[b] = t;
>   }
>
> The squeezing of the array is no longer needed.
> We also initialized the Scorers array to 8 (instead of 2) to avoid having to
> grow the array for common queries, although this probably has less
> performance impact.
>
> This change added about 3% to query throughput.
>
> Peter
>
>
> On 10/17/06, Yonik Seeley (JIRA) <[hidden email]> wrote:
> >
> >      [ http://issues.apache.org/jira/browse/LUCENE-443?page=all ]
> >
> > Yonik Seeley resolved LUCENE-443.
> > ---------------------------------
> >
> >     Fix Version/s: 2.1
> >        Resolution: Fixed
> >          Assignee: Yonik Seeley
> >
> > Thanks!  I just committed this.
> >
> > > ConjunctionScorer tune-up
> > > -------------------------
> > >
> > >                 Key: LUCENE-443
> > >                 URL: http://issues.apache.org/jira/browse/LUCENE-443
> > >             Project: Lucene - Java
> > >          Issue Type: Bug
> > >          Components: Search
> > >    Affects Versions: 1.9
> > >         Environment: Linux, Java 1.5, Large Index with 4 million items
> > and some heavily nested boolean queries
> > >            Reporter: Abdul Chaudhry
> > >         Assigned To: Yonik Seeley
> > >             Fix For: 2.1
> > >
> > >         Attachments: Conjunction20060921.patch, ConjunctionScorer.java,
> > ConjunctionScorer.java
> > >
> > >
> > > I just recently ran a load test on the latest code from lucene , which
> > is using a new BooleanScore and noticed the ConjunctionScorer was
crunching
> > through objects , especially while sorting as part of the skipTo call. It
> > turns a linked list into an array, sorts the array, then converts the
array

> > back to a linked list for further processing by the scoring engines below.
> > > 'm not sure if anyone else is experiencing this as I have a very large
> > index (> 4 million items) and I am issuing some heavily nested queries
> > > Anyway, I decide to change the link list into an array and use a first
> > and last marker to "simulate" a linked list.
> > > This scaled much better during my load test as the java gargbage
> > collector was less - umm - virulent
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > If you think it was sent incorrectly contact one of the administrators:
> > http://issues.apache.org/jira/secure/Administrators.jspa
> > -
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [jira] Resolved: (LUCENE-443) ConjunctionScorer tune-up

Peter Keegan
In reply to this post by Yonik Seeley-2
Huh... that's interesting.  I wonder why Arrays.sort(int[]) is all
in-place but sort(Object[]) is not.

I was wondering that myself. Here's the code:

    public static <T> void sort(T[] a, Comparator<? super T> c) {
    T[] aux = (T[])a.clone();
        if (c==null)
            mergeSort(aux, a, 0, a.length, 0);
        else
            mergeSort(aux, a, 0, a.length, 0, c);
    }
Reply | Threaded
Open this post in threaded view
|

Re: [jira] Resolved: (LUCENE-443) ConjunctionScorer tune-up

Peter Keegan
In reply to this post by Paul Elschot
Isn't the issue the creation of a new Comparator each time the scorers
are sorted? That could be easily fixed by keeping single comparator
around to do all the work.

Yes, it's the Comparator, but I think even if you kept it around, the
Array.sort would still clone the Scorers, no?

Peter

On 10/23/06, Paul Elschot <[hidden email]> wrote:

>
> On Monday 23 October 2006 22:12, Peter Keegan wrote:
> > I did some profile testing with the new ConjuctionScorer in 2.1 and
> > discovered a new bottleneck in ConjunctionScorer.sortScorers. The
> > java.utils.Arrays.sort method is cloning the Scorers array on every
> sort,
> > which is quite expensive on large indexes because of the size of the
> 'norms'
> > array within, and isn't necessary.
>
> Isn't the issue the creation of a new Comparator each time the scorers
> are sorted? That could be easily fixed by keeping single comparator
> around to do all the work.
>
> Regards,
> Paul Elschot
>
>
> > We rewrote this to use simple insertion sorting as follows:
> >
> >   private void sortScorers() {
> > // squeeze the array down for the sort
> > //    if (length != scorers.length) {
> > //      Scorer[] temps = new Scorer[length];
> > //      System.arraycopy(scorers, 0, temps, 0, length);
> > //      scorers = temps;
> > //    }
> >     insertionSort( scorers,length );
> >     // note that this comparator is not consistent with equals!
> > //    Arrays.sort(scorers, new Comparator() {         // sort the array
> > //        public int compare(Object o1, Object o2) {
> > //          return ((Scorer)o1).doc() - ((Scorer)o2).doc();
> > //        }
> > //      });
> >
> >     first = 0;
> >     last = length - 1;
> >   }
> >   private void insertionSort( Scorer[] scores, int len)
> >   {
> >       for (int i=0; i<len; i++) {
> >           for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j-- )
> {
> >               swap (scores, j, j-1);
> >           }
> >       }
> >       return;
> >   }
> >   private void swap(Object[] x, int a, int b) {
> >     Object t = x[a];
> >     x[a] = x[b];
> >     x[b] = t;
> >   }
> >
> > The squeezing of the array is no longer needed.
> > We also initialized the Scorers array to 8 (instead of 2) to avoid
> having to
> > grow the array for common queries, although this probably has less
> > performance impact.
> >
> > This change added about 3% to query throughput.
> >
> > Peter
> >
> >
> > On 10/17/06, Yonik Seeley (JIRA) <[hidden email]> wrote:
> > >
> > >      [ http://issues.apache.org/jira/browse/LUCENE-443?page=all ]
> > >
> > > Yonik Seeley resolved LUCENE-443.
> > > ---------------------------------
> > >
> > >     Fix Version/s: 2.1
> > >        Resolution: Fixed
> > >          Assignee: Yonik Seeley
> > >
> > > Thanks!  I just committed this.
> > >
> > > > ConjunctionScorer tune-up
> > > > -------------------------
> > > >
> > > >                 Key: LUCENE-443
> > > >                 URL: http://issues.apache.org/jira/browse/LUCENE-443
> > > >             Project: Lucene - Java
> > > >          Issue Type: Bug
> > > >          Components: Search
> > > >    Affects Versions: 1.9
> > > >         Environment: Linux, Java 1.5, Large Index with 4 million
> items
> > > and some heavily nested boolean queries
> > > >            Reporter: Abdul Chaudhry
> > > >         Assigned To: Yonik Seeley
> > > >             Fix For: 2.1
> > > >
> > > >         Attachments: Conjunction20060921.patch,
> ConjunctionScorer.java,
> > > ConjunctionScorer.java
> > > >
> > > >
> > > > I just recently ran a load test on the latest code from lucene ,
> which
> > > is using a new BooleanScore and noticed the ConjunctionScorer was
> crunching
> > > through objects , especially while sorting as part of the skipTo call.
> It
> > > turns a linked list into an array, sorts the array, then converts the
> array
> > > back to a linked list for further processing by the scoring engines
> below.
> > > > 'm not sure if anyone else is experiencing this as I have a very
> large
> > > index (> 4 million items) and I am issuing some heavily nested queries
> > > > Anyway, I decide to change the link list into an array and use a
> first
> > > and last marker to "simulate" a linked list.
> > > > This scaled much better during my load test as the java gargbage
> > > collector was less - umm - virulent
> > >
> > > --
> > > This message is automatically generated by JIRA.
> > > -
> > > If you think it was sent incorrectly contact one of the
> administrators:
> > > http://issues.apache.org/jira/secure/Administrators.jspa
> > > -
> > > For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [hidden email]
> > > For additional commands, e-mail: [hidden email]
> > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [jira] Resolved: (LUCENE-443) ConjunctionScorer tune-up

Peter Keegan
In reply to this post by Yonik Seeley-2
Can you open a JIRA issue for this?

Yes, it's: http://issues.apache.org/jira/browse/LUCENE-693

Peter