major searching performance improvement

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

major searching performance improvement

Robert Engels
Attached are files that dramatically improve the searching performance (2x improvement on several hardware configurations!) in a multithreaded, high concurrency environment.
 
The change has 3 parts:
 
1) remove synchronization required in SegmentReader document. This required changes to FieldsReader to handle concurrent access.
 
2) change FSDirectory to use a 'nio' to improve concurrency. Changed to use NioFile. This class has some workaround because under Windows, the FileChannel is not fully reentrant, and so allocates multiple handles per physical file - this code can be removed under non-Windows systems. This also required changes to InputStream to allow for reading at a direct offset.
 
3) move disk buffering into the Java layer to avoid the overhead of OS calls. The buffer percentage can be configured to store the entire index in memory. Running with as little as a 10% cache, the performance is dramatically improved. Reading larger blocks also improves the performance in most cases, but can actually degrade performance if doing very small reads. Using the cache implies that you have configured the JVM to have as much heap space available as the percent of index size on the disk. The NioFile can be easily changed to use a "soft" cache to avoid the potential of OutOfMemoryExceptions.
 
I welcome comments/feedback.
 
Robert Engels
 
 

SegmentReader.java (14K) Download Attachment
NioFile.java (3K) Download Attachment
LinkedHashMapLong.java (6K) Download Attachment
HashMapLong.java (10K) Download Attachment
MemoryLRUCache.java (1K) Download Attachment
FSDirectory.java (15K) Download Attachment
InputStream.java (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: major searching performance improvement

Doug Cutting
Robert Engels wrote:
> Attached are files that dramatically improve the searching performance
> (2x improvement on several hardware configurations!) in a multithreaded,
> high concurrency environment.

This looks like some good stuff!  Can you perhaps break it down into
independent, layered patches?  That way it would be easier to discuss
and integrate them.

> The change has 3 parts:
>  
> 1) remove synchronization required in SegmentReader document. This
> required changes to FieldsReader to handle concurrent access.

This makes good sense.  Stylistically, I would prefer the cloning be
done in ThreadLocal.initialValue().  That way if another method ever
needs the input streams the cloning code need not be altered.

> 2) change FSDirectory to use a 'nio' to improve concurrency. Changed to
> use NioFile. This class has some workaround because under Windows, the
> FileChannel is not fully reentrant, and so allocates multiple handles
> per physical file - this code can be removed under non-Windows
> systems. This also required changes to InputStream to allow for reading
> at a direct offset.

Could you please explore making this a new Directory class, extending
rather than replacing FSDirectory?  That would make it easier for folks
to evaluate.  Look at MMapDirectory for an example.

Also, did you compare the performance of this to MMapDirectory?  That
already uses nio, and should thus avoid the thread contention of
FSDirectory.  However it does not scale well on 32-bit machines whose
address space limits indexes to 4GB.

Finally, for Windows-specific code, you can check
org.apache.lucene.util.Constants.WINDOWS at runtime.

> 3) move disk buffering into the Java layer to avoid the overhead of OS
> calls. The buffer percentage can be configured to store the entire index
> in memory. Running with as little as a 10% cache, the performance is
> dramatically improved. Reading larger blocks also improves the
> performance in most cases, but can actually degrade performance if doing
> very small reads. Using the cache implies that you have configured the
> JVM to have as much heap space available as the percent of index size on
> the disk. The NioFile can be easily changed to use a "soft" cache to
> avoid the potential of OutOfMemoryExceptions.

It would be nice if this functionality could be layered on any
Directory.  Did you consider making a CachingDirectory that one can wrap
around an existing Directory implementation, that keeps an LRU cache of
data?  Even 10% by default will probably break a lot of applications.
At the Internet Archive I frequently search indexes 100GB gigabyte
indexes on machines with just 1GB of RAM.  So I am leery of enabling
this by default.

Cheers,

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: major searching performance improvement

Robert Engels
I will look at separating it out. I wanted to get initial feedback before
moving on.

1. I agree that the initialValue() is the way to go. I'll make the changes.

2. I agree that creating NioFSDirectory rather than modifying FSDirectory. I
originally felt the memory mapped files would be the fastest, but it also
requires OS calls, the "caching" code is CONSIDERABLY faster, since it does
not need to do any JNI, or make OS calls.

3. I think a "simple" fix for the case you cite, is to add an additional
'max size' parameter, which controls the maximum size of the cache for each
'segment file', so using the mergeFactor, and compound files, you can easily
compute what this max would be based on available memory and expected index
size (number of files). The problem with a SoftCache and indices of that
size, is that the JVM memory consumption would still grow to the limit
before it discarded anything (which may be ideal in some cases).

As for creating a CachingDirectory that can cache any directory that should
be feasible as well, but I am not sure it would perform as well as the
direct internal cache version.

Robert

-----Original Message-----
From: Doug Cutting [mailto:[hidden email]]
Sent: Wednesday, May 25, 2005 4:20 PM
To: [hidden email]
Subject: Re: major searching performance improvement


Robert Engels wrote:
> Attached are files that dramatically improve the searching performance
> (2x improvement on several hardware configurations!) in a multithreaded,
> high concurrency environment.

This looks like some good stuff!  Can you perhaps break it down into
independent, layered patches?  That way it would be easier to discuss
and integrate them.

> The change has 3 parts:
>
> 1) remove synchronization required in SegmentReader document. This
> required changes to FieldsReader to handle concurrent access.

This makes good sense.  Stylistically, I would prefer the cloning be
done in ThreadLocal.initialValue().  That way if another method ever
needs the input streams the cloning code need not be altered.

> 2) change FSDirectory to use a 'nio' to improve concurrency. Changed to
> use NioFile. This class has some workaround because under Windows, the
> FileChannel is not fully reentrant, and so allocates multiple handles
> per physical file - this code can be removed under non-Windows
> systems. This also required changes to InputStream to allow for reading
> at a direct offset.

Could you please explore making this a new Directory class, extending
rather than replacing FSDirectory?  That would make it easier for folks
to evaluate.  Look at MMapDirectory for an example.

Also, did you compare the performance of this to MMapDirectory?  That
already uses nio, and should thus avoid the thread contention of
FSDirectory.  However it does not scale well on 32-bit machines whose
address space limits indexes to 4GB.

Finally, for Windows-specific code, you can check
org.apache.lucene.util.Constants.WINDOWS at runtime.

> 3) move disk buffering into the Java layer to avoid the overhead of OS
> calls. The buffer percentage can be configured to store the entire index
> in memory. Running with as little as a 10% cache, the performance is
> dramatically improved. Reading larger blocks also improves the
> performance in most cases, but can actually degrade performance if doing
> very small reads. Using the cache implies that you have configured the
> JVM to have as much heap space available as the percent of index size on
> the disk. The NioFile can be easily changed to use a "soft" cache to
> avoid the potential of OutOfMemoryExceptions.

It would be nice if this functionality could be layered on any
Directory.  Did you consider making a CachingDirectory that one can wrap
around an existing Directory implementation, that keeps an LRU cache of
data?  Even 10% by default will probably break a lot of applications.
At the Internet Archive I frequently search indexes 100GB gigabyte
indexes on machines with just 1GB of RAM.  So I am leery of enabling
this by default.

Cheers,

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: major searching performance improvement

Yonik Seeley
Looks like really great stuff Robert!

> 2. I agree that creating NioFSDirectory rather than modifying FSDirectory. I
> originally felt the memory mapped files would be the fastest, but it also
> requires OS calls, the "caching" code is CONSIDERABLY faster, since it does
> not need to do any JNI, or make OS calls.

What do you mean by OS calls required by memory mapped files?

I'm not 100% sure how mmap works in java, but in C the only OS calls
are typically to set up and tear down the mapping.  Reads from the
mmaped region that are already in memory proceed at the same speed as
reads from any other piece of memory.


> The problem with a SoftCache and indices of that
> size, is that the JVM memory consumption would still grow to the limit
> before it discarded anything (which may be ideal in some cases).

Soft caches aren't great, esp with apps that generate a lot of garbage.
What might be better is a multi-level LRU that spills over into soft
references at a certain point.

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: major searching performance improvement

Robert Engels
In reply to this post by Doug Cutting
Ok. Attached are the updated files. I also forgot some of the changed files
the first time around (CompoundFileReader also had synchronization that
needed to be removed).

Since I am working off the 1.4 codebase, I needed to modify FSDirectory,
since most of the methods are private and/or final. I don't have the time to
port to 1.9 at the moment, but maybe someone else does.

Some "benchmarks" using our "search server", which contains 1 million
generated documents, each with 10 fields, and a total index size of 340mb.

The client test creates 16 threads, where each thread does 20 searches (via
TCP to the server) on a random field & word, returning a maximum of 500
documents. I started the server under the 3 different configurations, and
ran the client test 5 times (without restarting the server).

The BEST times for each configuration (the client code is identical in all
cases).

Using the standard FS Directory with no changes:

84375 FindPerformanceTest : 17877010 hits in 84297 ms
0 FindPerformanceTest : 320 searches in 84297 ms, avg = 263
0 FindPerformanceTest : response time: min = 47, max = 48781, avg = 2555

Using the standard FS Directory with
'SegmentReader/FieldsReader/CompoundFileReader' concurrency changes:

31437 FindPerformanceTest : 19148916 hits in 31375 ms
0 FindPerformanceTest : 320 searches in 31375 ms, avg = 98
0 FindPerformanceTest : response time: min = 140, max = 3125, avg = 1504

Using the the NioFSDirectory with block size of 64k, and cache % of 30:

21031 FindPerformanceTest : 19558942 hits in 20938 ms
0 FindPerformanceTest : 320 searches in 20938 ms, avg = 65
0 FindPerformanceTest : response time: min = 31, max = 5890, avg = 922

Some notes...

1) my machine is relatively slow, only has a single processor, and was
running both the client "test" and the "search server" and becomes CPU bound
during concurrency. On a faster machine the avg response time went from 45ms
to 24ms using the concurrency and caching modifications.

2) you probably noticed that that the avg time is less than the minimum time
in some cases. This is due to thread switching, and calculated individual
response times. The avg time uses only the start time, end time, and number
of searches.

3) I would expect a Linux/Unix machine to show even greater improvement
since there will be less contention for the FileChannel.

4) The performance varies greatly with the BLOCKSIZE chosen in the NioFile.
I assume that this is going to be hardware/configuration and "type of
search" dependent, and as such is a configurable parameter.

Robert Engels

-----Original Message-----
From: Doug Cutting [mailto:[hidden email]]
Sent: Wednesday, May 25, 2005 4:20 PM
To: [hidden email]
Subject: Re: major searching performance improvement


Robert Engels wrote:
> Attached are files that dramatically improve the searching performance
> (2x improvement on several hardware configurations!) in a multithreaded,
> high concurrency environment.

This looks like some good stuff!  Can you perhaps break it down into
independent, layered patches?  That way it would be easier to discuss
and integrate them.

> The change has 3 parts:
>
> 1) remove synchronization required in SegmentReader document. This
> required changes to FieldsReader to handle concurrent access.

This makes good sense.  Stylistically, I would prefer the cloning be
done in ThreadLocal.initialValue().  That way if another method ever
needs the input streams the cloning code need not be altered.

> 2) change FSDirectory to use a 'nio' to improve concurrency. Changed to
> use NioFile. This class has some workaround because under Windows, the
> FileChannel is not fully reentrant, and so allocates multiple handles
> per physical file - this code can be removed under non-Windows
> systems. This also required changes to InputStream to allow for reading
> at a direct offset.

Could you please explore making this a new Directory class, extending
rather than replacing FSDirectory?  That would make it easier for folks
to evaluate.  Look at MMapDirectory for an example.

Also, did you compare the performance of this to MMapDirectory?  That
already uses nio, and should thus avoid the thread contention of
FSDirectory.  However it does not scale well on 32-bit machines whose
address space limits indexes to 4GB.

Finally, for Windows-specific code, you can check
org.apache.lucene.util.Constants.WINDOWS at runtime.

> 3) move disk buffering into the Java layer to avoid the overhead of OS
> calls. The buffer percentage can be configured to store the entire index
> in memory. Running with as little as a 10% cache, the performance is
> dramatically improved. Reading larger blocks also improves the
> performance in most cases, but can actually degrade performance if doing
> very small reads. Using the cache implies that you have configured the
> JVM to have as much heap space available as the percent of index size on
> the disk. The NioFile can be easily changed to use a "soft" cache to
> avoid the potential of OutOfMemoryExceptions.

It would be nice if this functionality could be layered on any
Directory.  Did you consider making a CachingDirectory that one can wrap
around an existing Directory implementation, that keeps an LRU cache of
data?  Even 10% by default will probably break a lot of applications.
At the Internet Archive I frequently search indexes 100GB gigabyte
indexes on machines with just 1GB of RAM.  So I am leery of enabling
this by default.

Cheers,

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

CompoundFileReader.java (8K) Download Attachment
FSDirectory.java (16K) Download Attachment
InputStream.java (9K) Download Attachment
SegmentReader.java (14K) Download Attachment
NioFSDirectory.java (1K) Download Attachment
FieldsReader.java (3K) Download Attachment
NioFile.java (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: major searching performance improvement

Robert Engels
In reply to this post by Yonik Seeley
It is my understanding of memory mapped files is that the file is assigned
an address range in the virtual address space, and using the MMU/paging
facilities the file is mapped into that range. Java cannot work with memory
pointers directly, so there is at minimum some JNI calls that are made when
filling the ByteBuffer (vs. C which will can read the memory addresses
directly).  I believe the "direct" buffers in nio improve this somewhat, but
still not to the efficiency of C.

I think the only caching performed for memory mapped files is the same that
is performed for any file by the OS, i.e. when a page of the mapped file
needs to be brought into main memory, it may be available in the disk cache,
and will be retrieved from there rather than from disk.

I agree with the soft cache... what I planned to do was change the
MemoryLRUCache to have a maximum hard size, and a maximum soft size, and
when blocks are evicted from the hard cache, they are put into the soft
cache using soft references. The soft cache will have a maximum number of
elements so that the soft cache will not grow to the maximum heap size.

Robert

-----Original Message-----
From: Yonik Seeley [mailto:[hidden email]]
Sent: Wednesday, May 25, 2005 9:13 PM
To: [hidden email]; [hidden email]
Subject: Re: major searching performance improvement


Looks like really great stuff Robert!

> 2. I agree that creating NioFSDirectory rather than modifying FSDirectory.
I
> originally felt the memory mapped files would be the fastest, but it also
> requires OS calls, the "caching" code is CONSIDERABLY faster, since it
does
> not need to do any JNI, or make OS calls.

What do you mean by OS calls required by memory mapped files?

I'm not 100% sure how mmap works in java, but in C the only OS calls
are typically to set up and tear down the mapping.  Reads from the
mmaped region that are already in memory proceed at the same speed as
reads from any other piece of memory.


> The problem with a SoftCache and indices of that
> size, is that the JVM memory consumption would still grow to the limit
> before it discarded anything (which may be ideal in some cases).

Soft caches aren't great, esp with apps that generate a lot of garbage.
What might be better is a multi-level LRU that spills over into soft
references at a certain point.

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: major searching performance improvement

Doug Cutting
In reply to this post by Robert Engels
Robert Engels wrote:
> Ok. Attached are the updated files. I also forgot some of the changed files
> the first time around (CompoundFileReader also had synchronization that
> needed to be removed).

Again, it would be much easier to understand if you supplied patches,
i.e., diffs, so that we can focus on the changes.  Could you start a bug
report and attach the diffs there?

> Since I am working off the 1.4 codebase, I needed to modify FSDirectory,
> since most of the methods are private and/or final. I don't have the time to
> port to 1.9 at the moment, but maybe someone else does.

There is currently no development on the 1.4 branch.  So, if we want
these changes incorporated into Lucene, someone will need to port them
to the SVN trunk.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: major searching performance improvement

Doug Cutting
In reply to this post by Robert Engels
Robert Engels wrote:
> 2. I agree that creating NioFSDirectory rather than modifying FSDirectory. I
> originally felt the memory mapped files would be the fastest, but it also
> requires OS calls, the "caching" code is CONSIDERABLY faster, since it does
> not need to do any JNI, or make OS calls.

On the contrary, mmap, if implemented well, should require fewer system
calls and fewer buffer copies.  However, in prior, single-threaded
benchmarks, FSDirectory appeared to still be a bit faster than
MmapDirectory, as have other nio-based directories that folks have
tried.  So nio's implementation may not have been optimal, although
perhaps it is improving.  But in multi-threaded benchmarks like yours,
an nio-based Directory should shine.

> As for creating a CachingDirectory that can cache any directory that should
> be feasible as well, but I am not sure it would perform as well as the
> direct internal cache version.

I would be surprised if it is measurably slower.  It would add only a
single additional method call in the case of cache misses.

If separate, this could be layered on: FSDirectory for best
single-threaded performance; on an MMapDirectory for best multi-threaded
performance on 64-bit machines or 32-bit machines with indexes less than
a few hundred megabytes; or on an NioDirectory for best multi-threaded
performance on a 32-bit machine with a large index.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]