performance drop on 27 oct?

performance drop on 27 oct?

Rob Audenaerde
Hi all,

There seems to be a performance drop in some benchmarks, see:


etc.

Maybe it's worth annotating what caused this drop?

Thanks,
-Rob

Re: performance drop on 27 oct?

Alan Woodward-2
That’s a very odd drop.  The only lucene commit that happened around then is LUCENE-8018, which really shouldn’t be making a difference to query performance.  And there’s no change to the PhraseQuery graphs.

Alan Woodward
www.flax.co.uk




Re: performance drop on 27 oct?

Chris Hostetter-3

: That’s a very odd drop.  The only lucene commit that happened around
: then is LUCENE-8018, which really shouldn’t be making a difference to
: query performance.  And there’s no change to the PhraseQuery graphs.

Each run records the Git SHA it was run against -- the dip that's been
noted was between these 2 runs...

https://home.apache.org/~mikemccand/lucenebench/2017.10.24.22.16.06.html
  Lucene/Solr trunk rev 81a4f7cc9cebf9c75387b1b498b556f6aa799932
  luceneutil rev 09a663e8054625f5173a92a21c9dd82c5a753dd7

https://home.apache.org/~mikemccand/lucenebench/2017.10.27.22.16.00.html
  Lucene/Solr trunk rev f1a6b68d75e58f464b2ed4ee3702a6c1b14511a0
  luceneutil rev 09a663e8054625f5173a92a21c9dd82c5a753dd7

(although to be honest, i really don't understand the "stats" listed on
those URLs -- the "QPS prev" vs the "QPS now" doesn't seem to match the
graph plot data for the same two dates -- so perhaps there is a reporting
glitch in what's actually been tested in each run?)


Assuming those SHAs are correct, there are 2 other candidate commits
besides LUCENE-8018 (see below)

I'm not very familiar with exactly what code is run by each of these
benchmarks, but is it possible the Similarity changes in LUCENE-7997 had
an impact?  IIUC some stats/calculations were changed from floats to
doubles ... could that change account for this?
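For anyone wondering how a float-to-double change could matter at all, here's a tiny standalone illustration (not Lucene code; the magnitudes are chosen purely for effect): float stops being able to resolve differences that double still can once values get large.

```java
// Illustration only: float has a 24-bit mantissa, so at magnitudes around
// 1e8 its spacing between representable values is 8 -- adding 1 is lost
// to rounding. double (53-bit mantissa) still resolves the difference.
public class FloatVsDouble {
    public static void main(String[] args) {
        float f = 1e8f;
        double d = 1e8;
        System.out.println(f + 1f == f);   // true: the +1 rounds away in float
        System.out.println(d + 1.0 == d);  // false: double keeps the difference
    }
}
```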


-Hoss
http://www.lucidworks.com/




hossman@tray:~/lucene/dev [master] $ git log 81a4f7cc9cebf9c75387b1b498b556f6aa799932..f1a6b68d75e58f464b2ed4ee3702a6c1b14511a0 lucene/
commit 401dda7e064b6f621cba405985143724d79620c4
Author: Adrien Grand <[hidden email]>
Date:   Fri Oct 27 08:36:27 2017 +0200

    LUCENE-8018: FieldInfos retains garbage if non-sparse.

commit 7d9cf438730e9cb7e251d1d2f5c6e81eba456f5c
Author: Steve Rowe <[hidden email]>
Date:   Thu Oct 26 17:12:26 2017 -0400

    Add 6.6.2 back compat test indexes.

commit 42717d5f4bbed46009f11a86f307541a19fd7fb5
Author: Robert Muir <[hidden email]>
Date:   Tue Oct 24 22:48:04 2017 -0400

    LUCENE-7997: More sanity testing of similarities




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: performance drop on 27 oct?

Robert Muir
On Mon, Nov 13, 2017 at 8:14 PM, Chris Hostetter
<[hidden email]> wrote:
>
> I'm not very familiar with exactly what code is run by each of these
> benchmarks, but is it possible the Similarity changes in LUCENE-7997 had
> an impact?  IIUC some stats/calculations were changed from floats to
> doubles ... could that change account for this?
>

It may be the case: the problem we found there is that the previous
BM25 did not obey the monotonicity requirements needed for score-based
optimizations such as LUCENE-4100 and LUCENE-7993. These algorithms
can greatly speed up our slowest queries (disjunctions, and phrase)
but need the similarity to be well-behaved in this way in order to be
correct.

In the BM25 case, scores would decrease in some situations with very
high TF values because of floating point issues: e.g.,
score(freq=100,000) would be unexpectedly less than
score(freq=99,999), all other things being equal. There may be other
ways to re-arrange the code to avoid this problem; feel free to open
an issue if you can optimize the code better while still behaving
properly!
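The property being described can be sketched standalone: the BM25 term-frequency saturation term, computed in double precision, should never decrease as freq grows, all else equal. The class and method names below are illustrative, not Lucene's actual API.

```java
// Sketch of the monotonicity requirement: for score-based skipping to be
// correct, the score must be non-decreasing in freq (all else equal).
// This is the standard BM25 tf-saturation term in double precision.
public class Bm25Monotonicity {
    static final double K1 = 1.2, B = 0.75; // common BM25 defaults

    // BM25 term-frequency normalization for a given doc length.
    static double tfNorm(double freq, double docLen, double avgDocLen) {
        double norm = K1 * (1 - B + B * docLen / avgDocLen);
        return (freq * (K1 + 1)) / (freq + norm);
    }

    // Check that scores never decrease as freq increases over a range.
    static boolean isMonotonic(long maxFreq) {
        double prev = -1;
        for (long freq = 1; freq <= maxFreq; freq++) {
            double s = tfNorm(freq, 100, 100);
            if (s < prev) return false;
            prev = s;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMonotonic(200_000)); // true in double precision
    }
}
```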



Re: performance drop on 27 oct?

Chris Hostetter-3
: In the BM25 case, scores would decrease in some situations with very
: high TF values because of floating point issues, e.g. so
: score(freq=100,000) would be unexpectedly less than
: score(freq=99,999), all other things being equal. There may be other
: ways to re-arrange the code to avoid this problem, feel free to open
: an issue if you can optimize the code better while still behaving
: properly!

i don't have any idea how to optimize the current code, and I am
completely willing to believe the changes in LUCENE-7997 are an
improvement in terms of correctness -- which is certainly more important
than performance -- I just wanted to point out that Alan's observation
about LUCENE-8018 being the only commit around the time the performance
graphs dip wasn't accurate, before anyone started ripping their hair out
trying to explain it.

If you think the float/double math in LUCENE-7997 might explain the change
in mike's graphs, then maybe mike can annotate them to record that?

(Wild spitballing idea: would it be worthwhile to offer an
"ImpreciseBM25Similarity" that used floats instead of doubles, for people
who want to eke out every last bit of performance -- provided it was
heavily documented with caveats about inaccurate scores due to
rounding errors?)


-Hoss
http://www.lucidworks.com/



Re: performance drop on 27 oct?

Walter Underwood
The other approach would be to do equality tests with a fuzz factor, because floating point is like that. But that would probably make things slower.

Here is an example of fuzzy equals:

https://github.com/OpenGamma/Strata/blob/master/modules/math/src/test/java/com/opengamma/strata/math/impl/FuzzyEquals.java
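The idea in miniature (the class name and tolerance below are illustrative, not the OpenGamma implementation): treat two doubles as equal when their difference is small relative to their magnitude.

```java
// Minimal fuzzy-equals sketch: exact comparison first, then a relative
// tolerance scaled to the larger operand's magnitude.
public class FuzzyEquals {
    static final double EPS = 1e-12; // illustrative tolerance

    static boolean fuzzyEquals(double a, double b) {
        if (a == b) return true; // covers exact matches and infinities
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= EPS * scale;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyEquals(0.1 + 0.2, 0.3)); // true despite rounding
        System.out.println(fuzzyEquals(1.0, 1.1));       // false
    }
}
```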

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)





Re: performance drop on 27 oct?

Robert Muir
In reply to this post by Chris Hostetter-3
On Tue, Nov 14, 2017 at 11:57 AM, Chris Hostetter
<[hidden email]> wrote:
>
> (Wild spit balling idea: would be worth while to offer an
> "ImpreciseBM25Similarity" that used floats instead of doubles for people
> who want to eek out every lsat bit of performance -- provided it was
> heavily documented with caveats regarding inaccurate scores due to
> rounding errors?)
>

I think you are missing the forest for the trees: after LUCENE-4100
and LUCENE-7993 it would really be much slower -- for example, around 10x
slower for boolean OR queries -- because it would have no choice but to
return POSITIVE_INFINITY from maxScore(). And it would be much slower
for phrase queries too, because it would be forced to always enumerate
all positions, and we'd have to add crappy methods so that it could
publicly confess its brokenness, plus fallback algorithms for phrase
scoring in that case. This is not sustainable and completely the
wrong tradeoff. Please read the issues I referenced and see those
benchmarks; they are extremely important to understanding the entire
issue.
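The skipping argument can be sketched in a few lines (this is an illustration, not Lucene's actual API): once the current k-th best hit scores above a clause's upper bound, that clause never needs to be scored again -- but a similarity that can't bound its scores must report POSITIVE_INFINITY and is never skippable.

```java
// Illustrative sketch of why a finite per-clause score upper bound matters
// for top-k disjunction scoring. All names here are invented for the sketch.
import java.util.List;

public class MaxScoreSketch {

    // A disjunction clause that reports the best score it could produce.
    interface Clause {
        double maxScore();
    }

    // Count clauses that no longer need to be evaluated at all, given the
    // score of the current k-th best hit.
    static int skippableClauses(List<Clause> clauses, double kthBestScore) {
        int skipped = 0;
        for (Clause c : clauses) {
            if (c.maxScore() <= kthBestScore) {
                skipped++; // this clause can never change the top-k
            }
        }
        return skipped;
    }

    static int demo() {
        Clause rare = () -> 9.0;                           // high-impact term
        Clause common = () -> 1.5;                         // low-impact term
        Clause unbounded = () -> Double.POSITIVE_INFINITY; // unboundable similarity
        // With the k-th best hit at 2.0, only the low-impact clause is
        // skippable; the unbounded clause forces full evaluation forever.
        return skippableClauses(List.of(rare, common, unbounded), 2.0);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1
    }
}
```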
