Refactored FuzzyTermEnum

Refactored FuzzyTermEnum

Bob Carpenter
CONTENTS DELETED
The author has deleted this message.

Re: Refactored FuzzyTermEnum

Chris Hostetter-3

: I don't know what the protocol is for one-off contributions.

I'm not sure what you mean by a "one-off contribution" but the process for
submitting changes/improvements/additions was recently wiki-ized...

http://wiki.apache.org/jakarta-lucene/HowToContribute

(Incidentally, please feel free to share any comments/questions you have
about the process that you feel the wiki doesn't address well)

: I'm happy with the Apache license, so that shouldn't

When you go to attach your patch to a Jira issue, there will be an option
to make it explicit that you "Grant license to ASF for inclusion in ASF
works"

: be a problem.  I also don't know whether you use tabs
: or spaces -- I untabified the final version and used your
: two-space format in emacs.

The majority of the Lucene code base seems to be no tabs, 2 space
indenting -- but the golden rule is don't change the formatting of any code
(even code that is hideously formatted) as part of a patch that changes
functionality.  Formatting changes should be in their own patches/commits
so it's clear that's all that's changed.


One comment I have about your new submission is that it's not clear to me
what the purpose of your changes is -- please don't feel that's a
criticism, I just don't use FuzzyQueries myself, so maybe it's obvious to
others, but when opening the Jira issue please make sure the problem
statement that motivated you to make these changes is explained.  If it
was to fix a bug, please include a test case that fails without this
change; if it was to improve performance, please include some sample code
that demonstrates the performance improvement; if it was to add new
functionality, please include a test that demonstrates the new
functionality working.



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Refactored FuzzyTermEnum

Marvin Humphrey
In reply to this post by Bob Carpenter

On Jun 13, 2006, at 12:14 PM, Bob Carpenter wrote:

> Does anyone have regression/performance test harnesses?

There's been talk of formalizing a benchmarker suite.  Andrzej wrote  
something a while ago; I don't know if it would be appropriate for this.

The benchmarker I wrote is indexing only, unfortunately.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




Re: Refactored FuzzyTermEnum

Karl Wettin-3
In reply to this post by Bob Carpenter
On Tue, 2006-06-13 at 15:14 -0400, Bob Carpenter wrote:
> I refactored the org.apache.lucene.search.FuzzyTermEnum

Nice!

> Does anyone have regression/performance test harnesses?

I ran a quite unscientific test: 500 documents in the corpus, one
IndexReader, and a new IndexSearcher for each query iteration. 100
iterations, each containing 10 queries:

is.search(new FuzzyQuery(new Term(field, "hare"), d));
is.search(new FuzzyQuery(new Term(field, "nimi"), d));
is.search(new FuzzyQuery(new Term(field, "miama"), d));
is.search(new FuzzyQuery(new Term(field, "mamma"), d));
is.search(new FuzzyQuery(new Term(field, "sumatra"), d));
is.search(new FuzzyQuery(new Term(field, "buch"), d));
is.search(new FuzzyQuery(new Term(field, "busch"), d));
is.search(new FuzzyQuery(new Term(field, "hejples"), d));
is.search(new FuzzyQuery(new Term(field, "sveden"), d));
is.search(new FuzzyQuery(new Term(field, "cwedish"), d));

I do not gather the Documents from the index.
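
For anyone who wants to repeat a run like this, the loop described above could be sketched roughly as follows. This is only a guess at the shape of the harness, not the code actually used: the class name FuzzyBench and the methods run and perMinute are made up for illustration, and the real Lucene call is left as a comment because the index setup (is, field, d) is not shown in the post.

```java
import java.util.function.Consumer;

/** Hypothetical throughput harness: repeats a batch of queries and reports queries per minute. */
final class FuzzyBench {

    /** Converts a query count and an elapsed time in nanoseconds to queries per minute. */
    static double perMinute(long queries, long elapsedNanos) {
        return queries * 60.0e9 / elapsedNanos;
    }

    /** Runs the given number of iterations, issuing one query per term each time; returns the query count. */
    static long run(int iterations, String[] terms, Consumer<String> query) {
        long count = 0;
        for (int i = 0; i < iterations; i++) {
            for (String term : terms) {
                query.accept(term);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The ten terms from the post.
        String[] terms = {"hare", "nimi", "miama", "mamma", "sumatra",
                          "buch", "busch", "hejples", "sveden", "cwedish"};
        long start = System.nanoTime();
        long n = run(100, terms, term -> {
            // Placeholder: in a real run this would be something like
            // is.search(new FuzzyQuery(new Term(field, term), d));
        });
        long elapsed = Math.max(1, System.nanoTime() - start);
        System.out.printf("%d fuzzy, %.0f per min.%n", n, perMinute(n, elapsed));
    }
}
```

With 100 iterations of 10 terms the loop issues 1000 queries per run. Because the placeholder does no work, the figure printed by this sketch only measures loop overhead until a real search call is plugged in.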


1. On my issue 550-index

Old implementation:

1000 fuzzy~0.1, 633 per min.
1000 fuzzy~0.2, 784 per min.
1000 fuzzy~0.3, 1236 per min.
1000 fuzzy~0.4, 1462 per min.
1000 fuzzy~0.5, 1917 per min.
1000 fuzzy~0.6, 2574 per min.
1000 fuzzy~0.7, 2750 per min.
1000 fuzzy~0.8, 3375 per min.
1000 fuzzy~0.9, 3524 per min.

With your fixes:

1000 fuzzy~0.1, 603 per min.
1000 fuzzy~0.2, 886 per min.
1000 fuzzy~0.3, 1403 per min.
1000 fuzzy~0.4, 1681 per min.
1000 fuzzy~0.5, 2165 per min.
1000 fuzzy~0.6, 2961 per min.
1000 fuzzy~0.7, 3137 per min.
1000 fuzzy~0.8, 3948 per min.
1000 fuzzy~0.9, 4594 per min.
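
To make the two lists easier to compare, the relative change per similarity setting can be computed with a few lines of Java. This is just a convenience sketch over the numbers in the first result set above; the class name SpeedupTable and the helper percentChange are invented for illustration, and a single unscientific run like this says nothing about variance.

```java
/** Hypothetical helper: percent change in queries per minute between two runs. */
final class SpeedupTable {

    /** Percent change of after relative to before; positive means more queries per minute. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Queries per minute copied from the first result set, minimum similarity 0.1 to 0.9.
        int[] oldImpl = {633, 784, 1236, 1462, 1917, 2574, 2750, 3375, 3524};
        int[] newImpl = {603, 886, 1403, 1681, 2165, 2961, 3137, 3948, 4594};
        for (int i = 0; i < oldImpl.length; i++) {
            System.out.printf("fuzzy~0.%d: %+.1f%%%n",
                    i + 1, percentChange(oldImpl[i], newImpl[i]));
        }
    }
}
```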

2. Standard RAMDirectory:

Old implementation:

1000 fuzzy~0.1, 121 per min.
1000 fuzzy~0.2, 190 per min.
1000 fuzzy~0.3, 342 per min.
1000 fuzzy~0.4, 456 per min.
1000 fuzzy~0.5, 578 per min.
1000 fuzzy~0.6, 632 per min.
1000 fuzzy~0.7, 645 per min.
1000 fuzzy~0.8, 679 per min.
1000 fuzzy~0.9, 696 per min.

With your fixes:

1000 fuzzy~0.1, 117 per min.
1000 fuzzy~0.2, 185 per min.
1000 fuzzy~0.3, 329 per min.
1000 fuzzy~0.4, 425 per min.
1000 fuzzy~0.5, 585 per min.
1000 fuzzy~0.6, 612 per min.
1000 fuzzy~0.7, 653 per min.
1000 fuzzy~0.8, 615 per min.
1000 fuzzy~0.9, 671 per min.






Re: Refactored FuzzyTermEnum

Karl Wettin-3
On Thu, 2006-06-15 at 11:42 +0200, karl wettin wrote:
>
> 2. Standard RAMDirectory:

Oops, I got the headers wrong here. Again:

With your fixes:

1000 fuzzy~0.1, 121 per min.
1000 fuzzy~0.2, 190 per min.
1000 fuzzy~0.3, 342 per min.
1000 fuzzy~0.4, 456 per min.
1000 fuzzy~0.5, 578 per min.
1000 fuzzy~0.6, 632 per min.
1000 fuzzy~0.7, 645 per min.
1000 fuzzy~0.8, 679 per min.
1000 fuzzy~0.9, 696 per min.

Old implementation:

1000 fuzzy~0.1, 117 per min.
1000 fuzzy~0.2, 185 per min.
1000 fuzzy~0.3, 329 per min.
1000 fuzzy~0.4, 425 per min.
1000 fuzzy~0.5, 585 per min.
1000 fuzzy~0.6, 612 per min.
1000 fuzzy~0.7, 653 per min.
1000 fuzzy~0.8, 615 per min.
1000 fuzzy~0.9, 671 per min.




Re: Refactored FuzzyTermEnum

Otis Gospodnetic-2
In reply to this post by Marvin Humphrey
I'm still waiting for my employer to send in the CCLA. :(

But I've got a cool name for this thing - "lube" - contrib/lube.

Otis

----- Original Message ----
From: Marvin Humphrey <[hidden email]>
To: [hidden email]
Sent: Wednesday, June 14, 2006 3:51:52 AM
Subject: Re: Refactored FuzzyTermEnum


On Jun 13, 2006, at 12:14 PM, Bob Carpenter wrote:

> Does anyone have regression/performance test harnesses?

There's been talk of formalizing a benchmarker suite.  Andrzej wrote  
something a while ago; I don't know if it would be appropriate for this.

The benchmarker I wrote is indexing only, unfortunately.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

