DO NOT REPLY [Bug 34930] - [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources


Bugzilla from bugzilla@apache.org
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34930>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.


------- Additional Comments From [hidden email]  2005-05-25 06:40 -------
Yep, throwing and catching exceptions in the critical path is always a performance gotcha, common case
or not. See any of the VM implementation performance papers, such as those in the IBM Systems Journal
some years ago, and others.

No idea why the javacc folks didn't come up with an API that does not involve exceptions for *normal*
control flow. Well, javacc has probably been unmaintained dead code for some time now. [Even Xerces
has such gotchas deep inside its low-level native API - I chatted about this some time ago with a Sun
engineer].

Anyway, you can preallocate the IOException in FastCharStream in a private static final var, and then
throw the same exception object again and again on EOS. That gives roughly a factor of 2x the cheap way,
because the stack trace does not have to be generated and filled in repeatedly (same for the QueryParser
copy of FastCharStream).
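
A minimal sketch of the preallocation idea, for illustration only. PreallocatedEofStream is a
hypothetical stand-in, not Lucene's actual FastCharStream, and the helper methods exist only to
exercise it. Note that a shared exception carries the stack trace from the point where it was
created, so it is useless for debugging.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a FastCharStream-like class (not the Lucene source).
final class PreallocatedEofStream {
    // Created once at class initialization. The stack trace is filled in here,
    // in the Throwable constructor, and is not regenerated when the same
    // object is thrown again.
    private static final IOException EOS = new IOException("read past eof");

    private final Reader input;

    PreallocatedEofStream(Reader input) {
        this.input = input;
    }

    char readChar() throws IOException {
        int c = input.read();
        if (c == -1) {
            throw EOS; // reuse the shared instance: no allocation, no new stack trace
        }
        return (char) c;
    }

    // Illustrative helper: counts characters, using the exception as the end signal.
    static int countChars(String text) {
        PreallocatedEofStream stream = new PreallocatedEofStream(new StringReader(text));
        int count = 0;
        try {
            while (true) {
                stream.readChar();
                count++;
            }
        } catch (IOException e) {
            if (e != EOS) {
                throw new UncheckedIOException(e); // a genuine I/O failure
            }
        }
        return count;
    }

    // Illustrative helper: returns the exception thrown when reading past the end.
    static IOException readPastEnd(String text) {
        PreallocatedEofStream stream = new PreallocatedEofStream(new StringReader(text));
        try {
            while (true) {
                stream.readChar();
            }
        } catch (IOException e) {
            return e;
        }
    }
}
```

The identity check (e != EOS) is what keeps real I/O errors distinguishable from the normal end of
input.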

A further 5x or so comes from getting rid of the exception completely - catching exceptions is
expensive. This is done by dirty-patching the javacc generated code so that it does not require EOS
exceptions at all. Instead you can return 0xFFFF as an EOS marker, or some other unused Unicode value.
Look at the javacc generated code and see where it catches the EOS exception. That's where you'd need
to fiddle around, making sure true abnormal exceptions are still handled properly. It's really an awkward
maintenance nightmare because it interferes with generated code, so I don't really recommend this.
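
A rough sketch of the sentinel approach, under stated assumptions: SentinelCharStream and
countWords are hypothetical names, and the loop below is a hand-written stand-in for the javacc
generated code, which would need equivalent changes wherever it catches the EOS exception. It also
assumes the input never contains U+FFFF itself; such a character would be misread as end of input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical char stream that signals end of input with a sentinel value.
final class SentinelCharStream {
    // U+FFFF is a Unicode noncharacter, so well-formed text should not contain it.
    static final char EOS = '\uFFFF';

    private final Reader input;

    SentinelCharStream(Reader input) {
        this.input = input;
    }

    // Returns the next char, or EOS at end of input. An IOException now only
    // ever means a genuine I/O failure, never normal control flow.
    char readChar() throws IOException {
        int c = input.read();
        return c == -1 ? EOS : (char) c;
    }

    // Illustrative consumer: counts whitespace-separated words without any
    // exception being thrown or caught on the normal path.
    static int countWords(String text) {
        SentinelCharStream stream = new SentinelCharStream(new StringReader(text));
        int words = 0;
        boolean inWord = false;
        try {
            for (char c = stream.readChar(); c != EOS; c = stream.readChar()) {
                if (Character.isWhitespace(c)) {
                    inWord = false;
                } else if (!inWord) {
                    inWord = true;
                    words++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // abnormal errors still surface
        }
        return words;
    }
}
```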

Still, StandardAnalyzer eats CPU (and buffer memory) like there's no tomorrow. Instead, I'd recommend
giving PatternAnalyzer (from the "memory" SVN contrib area) a try. The functionality is almost the same
as StandardAnalyzer, but it can be many times faster, especially when used with a String rather than
a Reader, and you don't have to wait indefinitely for Lucene to get fixed.
