Poor performance "race condition" in FieldSortedHitQueue

classic Classic list List threaded Threaded
39 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Poor performance "race condition" in FieldSortedHitQueue

hutchiko
Hey all, just want to run an issue that I've recently identified while
looking at some performance issues we are having with our larger
indexes past you all.

Basically what we are seeing is that when there are a number of
concurrent searches being executed over a new IndexSearcher, the quite
expensive ScoreDocComparator generation that is done in the
FieldSortedHitQueue#getCachedComparator method ends up executing
multiple times rather the ideal case of once. This issue does not
effect the correctness of the searches only performance.

For my relatively weak understanding of the code the core of this
issue appears to lie with the FieldCacheImpl#getStringIndex method
which allows multiple concurrent requests to each generate their own
StringIndex rather than allowing the first request to do the
generation and then blocking subsequent requests until the first
request has finished.

Is this a know problem? Should I raise this as an issue or is this
"expected" behaviour. A solution would naturally require more
synchronization than is currently used but nothing particularly
complex.

Thanks,

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Yonik Seeley-2
The nature of the field cache itself means that the first sort on a
particular field can take a long, long time.  Synchronization won't
really help that much.

There are two ways around this...
1) incrementally generate the field cache (hard... not currently
supported by Lucene)
2) warm searchers in the background before exposing them to live
queries (the approach Solr takes).

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


On 8/8/06, [hidden email] <[hidden email]> wrote:

> Hey all, just want to run an issue that I've recently identified while
> looking at some performance issues we are having with our larger
> indexes past you all.
>
> Basically what we are seeing is that when there are a number of
> concurrent searches being executed over a new IndexSearcher, the quite
> expensive ScoreDocComparator generation that is done in the
> FieldSortedHitQueue#getCachedComparator method ends up executing
> multiple times rather the ideal case of once. This issue does not
> effect the correctness of the searches only performance.
>
> For my relatively weak understanding of the code the core of this
> issue appears to lie with the FieldCacheImpl#getStringIndex method
> which allows multiple concurrent requests to each generate their own
> StringIndex rather than allowing the first request to do the
> generation and then blocking subsequent requests until the first
> request has finished.
>
> Is this a know problem? Should I raise this as an issue or is this
> "expected" behaviour. A solution would naturally require more
> synchronization than is currently used but nothing particularly
> complex.
>
> Thanks,
>
> Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

psmith-3

On 09/08/2006, at 12:47 PM, Yonik Seeley wrote:

> The nature of the field cache itself means that the first sort on a
> particular field can take a long, long time.  Synchronization won't
> really help that much.
>

I'm not so sure I agree with that.  If you have, say, 4 threads  
concurrently starting a search on a cold index, they will _all_  
effectively do a warm of the searcher, chewing up CPU and disk, which  
may be better utilised by other threads.  Wouldn't it be better for 1  
thread to do the warming while the others block waiting?

The option to warm-up the index before making it available to  
concurrent searches is effectively the same thing as this.  I would  
have thought it would be nicer to have it part of the search  
mechanism rather than rely on coders to constantly build that warm-
ing thread into their application.

My 5 Australian cents (currently 3.75 US cents).

Paul Smith

smime.p7s (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Poor performance "race condition" in FieldSortedHitQueue

Oliver Hutchison-2
In reply to this post by Yonik Seeley-2
> The nature of the field cache itself means that the first
> sort on a particular field can take a long, long time.  
> Synchronization won't really help that much.

I think you may be misunderstanding my description (probably because it was
not clear enough :). The issue is not that the first search is going to take
a while as this is clearly unavoidable. The issue I'm seeing is that when
there are a number of concurrent searches that start executing before the
cache has been populated they *all* end up doing the very expensive
ScoreDocComparator generation rather than just one of them doing the
generation and the rest simply blocking until that one is done. The more
concurrent searches and the longer the generation takes the worse the effect
becomes.

> There are two ways around this...
> 2) warm searchers in the background before exposing them to
> live queries (the approach Solr takes).

This is basically how we are working around this issue, we don't actually
pre-warm the search results as we don't have a window in which to do this
but we do synchronize the FieldSortedHitQueue cache generation so it can
never gets executed more than once per index reader:

    private final Set<String> primedSortFields = new HashSet<String>();

    protected void primeCache(Sort sort) throws IOException {
        // This synchronized block allows us to be sure that a given sort
field is only primed once
        // per searcher rather than the multiple times Lucene may prime the
field if left to its
        // own devices (something we *really* want to avoid for big
indexes).
        synchronized (primedSortFields) {
            SortField[] sortFields = sort.getSort();
            for (int i = 0; i < sortFields.length; i++) {
                SortField sortField = sortFields[i];
                if (!primedSortFields.contains(sortField.getField())) {
                    primedSortFields.add(sortField.getField());
                    new FieldSortedHitQueue(getIndexReader(), new
SortField[] { sortField }, 0);
                }
            }
        }
    }

obviously this not ideal.

Thanks,

Oliver





---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Otis Gospodnetic-2
In reply to this post by hutchiko
Hi Oliver,

I think Yonik simply misunderstood you in that earlier email.
Have you tried modifying that FieldSortedHitQueue class and making the appropriate method(s) synchronized?
It sounds like that would fix the issue. If it does, please let us know.

Otis

----- Original Message ----
From: [hidden email]
To: [hidden email]
Sent: Tuesday, August 8, 2006 2:05:36 AM
Subject: Poor performance "race condition" in FieldSortedHitQueue

Hey all, just want to run an issue that I've recently identified while
looking at some performance issues we are having with our larger
indexes past you all.

Basically what we are seeing is that when there are a number of
concurrent searches being executed over a new IndexSearcher, the quite
expensive ScoreDocComparator generation that is done in the
FieldSortedHitQueue#getCachedComparator method ends up executing
multiple times rather the ideal case of once. This issue does not
effect the correctness of the searches only performance.

For my relatively weak understanding of the code the core of this
issue appears to lie with the FieldCacheImpl#getStringIndex method
which allows multiple concurrent requests to each generate their own
StringIndex rather than allowing the first request to do the
generation and then blocking subsequent requests until the first
request has finished.

Is this a know problem? Should I raise this as an issue or is this
"expected" behaviour. A solution would naturally require more
synchronization than is currently used but nothing particularly
complex.

Thanks,

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]





---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Doron Cohen
Hi Otis,

I think that synchronizing the entire method would be an overkill - instead
it would be sufficient to synchronize on a "by field" object so that only
if two requests for the same "cold/missing" field are racing, one of them
would wait for the other to complete loading that field.  I think there is
no need to that a lookup() for field2 would wait while a different field1
is being loaded.  I am not sure if IO wise it makes sense to serialize the
loading of two different fields (i.e. the case that both field1 and field2
are not in the readerCache), I would prefer not to.
One fast way to do this, for testing performance impact in Oliver's test
case, would be to sync on the interned field name. as follows:

  public StringIndex getStringIndex (IndexReader reader, String field)
  throws IOException {
    field = field.intern();
    synchronize(field) {  // < ----------- line added
      Object ret = lookup (reader, field, STRING_INDEX, null);
      if (ret == null) {
         final int[] retArray = new int[reader.maxDoc()];
         ... load field to cache ...
      }

This way only requests for (loading) the same field would wait. But for the
working code, it wouls be better to maintain a by-field (and by-reader)
object to avoid messing up with a system wide string - who knows who else
is synchronizing on it...

Hope this makes sense,
Doron

Otis Gospodnetic <[hidden email]> wrote on 08/08/2006 21:07:41:

> Hi Oliver,
>
> I think Yonik simply misunderstood you in that earlier email.
> Have you tried modifying that FieldSortedHitQueue class and making
> the appropriate method(s) synchronized?
> It sounds like that would fix the issue. If it does, please let us know.
>
> Otis
>
> ----- Original Message ----
> From: [hidden email]
> To: [hidden email]
> Sent: Tuesday, August 8, 2006 2:05:36 AM
> Subject: Poor performance "race condition" in FieldSortedHitQueue
>
> Hey all, just want to run an issue that I've recently identified while
> looking at some performance issues we are having with our larger
> indexes past you all.
>
> Basically what we are seeing is that when there are a number of
> concurrent searches being executed over a new IndexSearcher, the quite
> expensive ScoreDocComparator generation that is done in the
> FieldSortedHitQueue#getCachedComparator method ends up executing
> multiple times rather the ideal case of once. This issue does not
> effect the correctness of the searches only performance.
>
> For my relatively weak understanding of the code the core of this
> issue appears to lie with the FieldCacheImpl#getStringIndex method
> which allows multiple concurrent requests to each generate their own
> StringIndex rather than allowing the first request to do the
> generation and then blocking subsequent requests until the first
> request has finished.
>
> Is this a know problem? Should I raise this as an issue or is this
> "expected" behaviour. A solution would naturally require more
> synchronization than is currently used but nothing particularly
> complex.
>
> Thanks,
>
> Oliver
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Poor performance "race condition" in FieldSortedHitQueue

Oliver Hutchison-2
Otis, Doron, thanks for the feedback.

First up I'd just like to say that I totally agree with Doron on this - any
attempt to fix this issue needs to be done using as fine grain
synchronization as is possible or you'd just be introducing a new bottle
neck.

It terms of the level of granularity, the work around I posted in my
previous email and the approach suggested by Doron are basically the same
(though Doron's code is certainly preferable) and I can certainly say that
synchronizing the object creation against the field name does solve the
problem.

However I have another solution that I'm working on that may be cleaner - by
encapsulate the caching logic that is currently spread across FieldCacheImpl
and FieldSortedHitQueue it becomes quite easy to implement a more complex
but certainly more fine grained level of synchronization and we don't have
to worry about synchronizing against an interned String or using some other
trick to synchronize on the field name.

I currently have:

public abstract class Cache {

        private final Map readerCache = new WeakHashMap();

        protected Cache() {
        }

        protected abstract Object createValue(IndexReader reader, Object
key)
                        throws IOException;

        public Object get(IndexReader reader, Object key) throws IOException
{
                Map innerCache;
                Object value;
                synchronized (readerCache) {
                        innerCache = (Map) readerCache.get(reader);
                        // no inner cache create it
                        if (innerCache == null) {
                                innerCache = new HashMap();
                                readerCache.put(reader, innerCache);
                                value = null;
                        } else {
                                value = innerCache.get(key);
                        }
                        if (value == null) {
                                value = new CreationPlaceholder();
                                readerCache.put(reader, value);
                        }
                }
                if (value instanceof CreationPlaceholder) {
                        // must be one of the first threads to request this
value,
                        // synchronize on the CreationPlaceholder so we
don't block
                        // any other calls for different values
                        CreationPlaceholder ph = (CreationPlaceholder)
value;
                        synchronized (ph) {
                                // if this thread is the very first one to
reach this point
                                // then this test will be true and we should
do the creation
  if (ph.value == null) {
                                        ph.value = createValue(reader, key);
                                        synchronized (readerCache) {
                                                innerCache.put(key,
ph.value);
                                        }
                                }
                                return ph.value;
                        }
                }
                return value;
        }

        static final class CreationPlaceholder {
                Object value;
        }
}


class FieldCacheImpl implements FieldCache {

...

        public String[] getStrings(IndexReader reader, String field)
                        throws IOException {
                return (String[]) stringsCache.get(reader, field);
        }

        Cache stringsCache = new Cache() {

                protected Object createValue(IndexReader reader, Object
fieldKey)
                                throws IOException {
                        String field = ((String) fieldKey).intern();

... create String[] ...

                        return retArray;
                }
        };

        public StringIndex getStringIndex(IndexReader reader, String field)
                        throws IOException {
                return (StringIndex) stringsIndexCache.get(reader, field);
        }

        Cache stringsIndexCache = new Cache() {

                protected Object createValue(IndexReader reader, Object
fieldKey)
                                throws IOException {
                        String field = ((String) fieldKey).intern();

... create StringIndex ...

                        return value;
                }
        };

... etc
 
}

Is this an avenue worth pursuing further? Or are you guys happy to simply
synchronizing on the field?

Thanks again,

Oliver


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Yonik Seeley-2
In reply to this post by Oliver Hutchison-2
On 8/8/06, Oliver Hutchison <[hidden email]> wrote:
> > The nature of the field cache itself means that the first
> > sort on a particular field can take a long, long time.
> > Synchronization won't really help that much.
>
> I think you may be misunderstanding my description (probably because it was
> not clear enough :). The issue is not that the first search is going to take
> a while as this is clearly unavoidable.

Sorry, I understood your problem perfectly, I just wasn't clear on
what I was saying.

My point was that for many uses, even the first-sorted-search delay is
not acceptable (and fixing multiple threads trying to fill the same
cache entry wouldn't solve that).  The warming-in-the-background that
Solr currenty uses solves both.

For those who can't warm in the background though, synchronizing
per-fieldcache entry would probably be a good idea.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Yonik Seeley-2
In reply to this post by Doron Cohen
On 8/9/06, Doron Cohen <[hidden email]> wrote:
>   public StringIndex getStringIndex (IndexReader reader, String field)
>   throws IOException {
>     field = field.intern();
>     synchronize(field) {  // < ----------- line added
>       Object ret = lookup (reader, field, STRING_INDEX, null);
>       if (ret == null) {
>          final int[] retArray = new int[reader.maxDoc()];
>          ... load field to cache ...
>       }

Assuming "field" wasn't being used to synchronize on something else,
this would still block *all* IndexReaders/Searchers trying to sort on
that field.

In Solr, it would make the situation worse.  If I had my warmed-up
IndexSearcher serving live requests, and a new Searcher is opened in
the background to be warmed, a getStringIndex(warming,"foo") would
also block all getStringIndex(live,"foo").

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Yonik Seeley-2
In reply to this post by Oliver Hutchison-2
Definitely the right track Oliver... it's called a blocking map (most
easily implemented in Java5 via Future).  I don't think you need two
maps though, right?  just stick a placeholder in the outer map.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

On 8/9/06, Oliver Hutchison <[hidden email]> wrote:

> Otis, Doron, thanks for the feedback.
>
> First up I'd just like to say that I totally agree with Doron on this - any
> attempt to fix this issue needs to be done using as fine grain
> synchronization as is possible or you'd just be introducing a new bottle
> neck.
>
> It terms of the level of granularity, the work around I posted in my
> previous email and the approach suggested by Doron are basically the same
> (though Doron's code is certainly preferable) and I can certainly say that
> synchronizing the object creation against the field name does solve the
> problem.
>
> However I have another solution that I'm working on that may be cleaner - by
> encapsulate the caching logic that is currently spread across FieldCacheImpl
> and FieldSortedHitQueue it becomes quite easy to implement a more complex
> but certainly more fine grained level of synchronization and we don't have
> to worry about synchronizing against an interned String or using some other
> trick to synchronize on the field name.
>
> I currently have:
>
> public abstract class Cache {
>
>         private final Map readerCache = new WeakHashMap();
>
>         protected Cache() {
>         }
>
>         protected abstract Object createValue(IndexReader reader, Object
> key)
>                         throws IOException;
>
>         public Object get(IndexReader reader, Object key) throws IOException
> {
>                 Map innerCache;
>                 Object value;
>                 synchronized (readerCache) {
>                         innerCache = (Map) readerCache.get(reader);
>                         // no inner cache create it
>                         if (innerCache == null) {
>                                 innerCache = new HashMap();
>                                 readerCache.put(reader, innerCache);
>                                 value = null;
>                         } else {
>                                 value = innerCache.get(key);
>                         }
>                         if (value == null) {
>                                 value = new CreationPlaceholder();
>                                 readerCache.put(reader, value);
>                         }
>                 }
>                 if (value instanceof CreationPlaceholder) {
>                         // must be one of the first threads to request this
> value,
>                         // synchronize on the CreationPlaceholder so we
> don't block
>                         // any other calls for different values
>                         CreationPlaceholder ph = (CreationPlaceholder)
> value;
>                         synchronized (ph) {
>                                 // if this thread is the very first one to
> reach this point
>                                 // then this test will be true and we should
> do the creation
>                                 if (ph.value == null) {
>                                         ph.value = createValue(reader, key);
>                                         synchronized (readerCache) {
>                                                 innerCache.put(key,
> ph.value);
>                                         }
>                                 }
>                                 return ph.value;
>                         }
>                 }
>                 return value;
>         }
>
>         static final class CreationPlaceholder {
>                 Object value;
>         }
> }
>
>
> class FieldCacheImpl implements FieldCache {
>
> ...
>
>         public String[] getStrings(IndexReader reader, String field)
>                         throws IOException {
>                 return (String[]) stringsCache.get(reader, field);
>         }
>
>         Cache stringsCache = new Cache() {
>
>                 protected Object createValue(IndexReader reader, Object
> fieldKey)
>                                 throws IOException {
>                         String field = ((String) fieldKey).intern();
>
> ... create String[] ...
>
>                         return retArray;
>                 }
>         };
>
>         public StringIndex getStringIndex(IndexReader reader, String field)
>                         throws IOException {
>                 return (StringIndex) stringsIndexCache.get(reader, field);
>         }
>
>         Cache stringsIndexCache = new Cache() {
>
>                 protected Object createValue(IndexReader reader, Object
> fieldKey)
>                                 throws IOException {
>                         String field = ((String) fieldKey).intern();
>
> ... create StringIndex ...
>
>                         return value;
>                 }
>         };
>
> ... etc
>
> }
>
> Is this an avenue worth pursuing further? Or are you guys happy to simply
> synchronizing on the field?
>
> Thanks again,
>
> Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Poor performance "race condition" in FieldSortedHitQueue

Oliver Hutchison-2
Yonik,

> most easily implemented in Java5 via Future.

I didn't use Java5 as I had a feeling that code is Lucene needs to compile
on Java1.3 right?

> I don't think you need two maps though, right?  just stick a
> placeholder in the outer map.

I'm using 2 maps mainly because it simplifies the implementation.
Technically all that is needed is a singe map with a key that is a composite
of index reader and field name however, given that there is also the
requirement that we only maintain a weak reference to the index reader and
the associated need to clean up the cache if the reader gets gc'd, it was
simpler for me to simulate the composite key using the 2 maps.

On a related note it would be great if there was a way to plug a custom
FieldCache implementation into Lucene, given there is a FieldCache interface
it's a shame there's no way to actually provide an alternative
implementation.

Since there seems to be a decent level of interest in this I'll try to put
together a patch and raise an issue over the next few days.

Cheers,

Oliver

> On 8/9/06, Oliver Hutchison <[hidden email]> wrote:
> > Otis, Doron, thanks for the feedback.
> >
> > First up I'd just like to say that I totally agree with
> Doron on this
> > - any attempt to fix this issue needs to be done using as
> fine grain
> > synchronization as is possible or you'd just be introducing a new
> > bottle neck.
> >
> > It terms of the level of granularity, the work around I
> posted in my
> > previous email and the approach suggested by Doron are
> basically the
> > same (though Doron's code is certainly preferable) and I
> can certainly
> > say that synchronizing the object creation against the
> field name does
> > solve the problem.
> >
> > However I have another solution that I'm working on that may be
> > cleaner - by encapsulate the caching logic that is currently spread
> > across FieldCacheImpl and FieldSortedHitQueue it becomes
> quite easy to
> > implement a more complex but certainly more fine grained level of
> > synchronization and we don't have to worry about
> synchronizing against
> > an interned String or using some other trick to synchronize
> on the field name.
> >
> > I currently have:
> >
> > public abstract class Cache {
> >
> >         private final Map readerCache = new WeakHashMap();
> >
> >         protected Cache() {
> >         }
> >
> >         protected abstract Object createValue(IndexReader reader,
> > Object
> > key)
> >                         throws IOException;
> >
> >         public Object get(IndexReader reader, Object key) throws
> > IOException {
> >                 Map innerCache;
> >                 Object value;
> >                 synchronized (readerCache) {
> >                         innerCache = (Map) readerCache.get(reader);
> >                         // no inner cache create it
> >                         if (innerCache == null) {
> >                                 innerCache = new HashMap();
> >                                 readerCache.put(reader, innerCache);
> >                                 value = null;
> >                         } else {
> >                                 value = innerCache.get(key);
> >                         }
> >                         if (value == null) {
> >                                 value = new CreationPlaceholder();
> >                                 readerCache.put(reader, value);
> >                         }
> >                 }
> >                 if (value instanceof CreationPlaceholder) {
> >                         // must be one of the first threads
> to request
> > this value,
> >                         // synchronize on the
> CreationPlaceholder so
> > we don't block
> >                         // any other calls for different values
> >                         CreationPlaceholder ph =
> (CreationPlaceholder)
> > value;
> >                         synchronized (ph) {
> >                                 // if this thread is the very first
> > one to reach this point
> >                                 // then this test will be
> true and we
> > should do the creation
> >                                 if (ph.value == null) {
> >                                         ph.value =
> createValue(reader, key);
> >                                         synchronized (readerCache) {
> >                                                 innerCache.put(key,
> > ph.value);
> >                                         }
> >                                 }
> >                                 return ph.value;
> >                         }
> >                 }
> >                 return value;
> >         }
> >
> >         static final class CreationPlaceholder {
> >                 Object value;
> >         }
> > }
> >
> >
> > class FieldCacheImpl implements FieldCache {
> >
> > ...
> >
> >         public String[] getStrings(IndexReader reader, String field)
> >                         throws IOException {
> >                 return (String[]) stringsCache.get(reader, field);
> >         }
> >
> >         Cache stringsCache = new Cache() {
> >
> >                 protected Object createValue(IndexReader reader,
> > Object
> > fieldKey)
> >                                 throws IOException {
> >                         String field = ((String) fieldKey).intern();
> >
> > ... create String[] ...
> >
> >                         return retArray;
> >                 }
> >         };
> >
> >         public StringIndex getStringIndex(IndexReader
> reader, String field)
> >                         throws IOException {
> >                 return (StringIndex)
> stringsIndexCache.get(reader, field);
> >         }
> >
> >         Cache stringsIndexCache = new Cache() {
> >
> >                 protected Object createValue(IndexReader reader,
> > Object
> > fieldKey)
> >                                 throws IOException {
> >                         String field = ((String) fieldKey).intern();
> >
> > ... create StringIndex ...
> >
> >                         return value;
> >                 }
> >         };
> >
> > ... etc
> >
> > }
> >
> > Is this an avenue worth pursuing further? Or are you guys happy to
> > simply synchronizing on the field?
> >
> > Thanks again,
> >
> > Oliver
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Yonik Seeley-2
On 8/9/06, Oliver Hutchison <[hidden email]> wrote:
> Yonik,
>
> > most easily implemented in Java5 via Future.
>
> I didn't use Java5 as I had a feeling that code is Lucene needs to compile
> on Java1.3 right?

Lucene 2 currently requires Java 1.4

It was really just a side comment - people have implemented these
blocking maps before, and I've seen it done with Java5 concurrency
things like Future - a natural fit.  The way you were going about it
is perfectly fine though.

> > I don't think you need two maps though, right?  just stick a
> > placeholder in the outer map.
>
> I'm using 2 maps mainly because it simplifies the implementation.
> Technically all that is needed is a singe map with a key that is a composite
> of index reader and field name however, given that there is also the
> requirement that we only maintain a weak reference to the index reader and
> the associated need to clean up the cache if the reader gets gc'd, it was
> simpler for me to simulate the composite key using the 2 maps.

Ah, right... I browsed your code a bit too fast.  It looks fine.

> On a related note it would be great if there was a way to plug a custom
> FieldCache implementation into Lucene, given there is a FieldCache interface
> it's a shame there's no way to actually provide an alternative
> implementation.

Well, there's FieldCache.DEFAULT

  /** Expert: The cache used internally by sorting and range query classes. */
  public static FieldCache DEFAULT = new FieldCacheImpl();


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Poor performance "race condition" in FieldSortedHitQueue

Oliver Hutchison-2

> Ah, right... I browsed your code a bit too fast.  It looks fine.

Great.

> > On a related note it would be great if there was a way to plug a
> > custom FieldCache implementation into Lucene, given there is a
> > FieldCache interface it's a shame there's no way to
> actually provide
> > an alternative implementation.
>
> Well, there's FieldCache.DEFAULT

I thought the exact same thing but what I'd forgotten was that all fields on
an interface are implicitly final.

http://java.sun.com/docs/books/jls/third_edition/html/interfaces.html#9.3

Anyway, thanks for the feedback I'll raise a couple of issues shortly.

Cheers,

Oliver


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Yonik Seeley-2
On 8/9/06, Oliver Hutchison <[hidden email]> wrote:
> > Well, there's FieldCache.DEFAULT
>
> I thought the exact same thing but what I'd forgotten was that all fields on
> an interface are implicitly final.

Heh... interfaces strike again.

Well then since we *know* that no one has their own implementation
(because they would not have been able to register it), we should be
able to safely upgrade the interface to a class (anyone want to supply
a patch?)

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Doron Cohen
In reply to this post by Yonik Seeley-2
[hidden email] wrote on 09/08/2006 11:22:12:
> Assuming "field" wasn't being used to synchronize on something else,
> this would still block *all* IndexReaders/Searchers trying to sort on
> that field.
>
> In Solr, it would make the situation worse.  If I had my warmed-up
> IndexSearcher serving live requests, and a new Searcher is opened in
> the background to be warmed, a getStringIndex(warming,"foo") would
> also block all getStringIndex(live,"foo").

Right, this is what I had in mind with "by-field (and by-reader)" (a few
lines further).

- Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Doron Cohen
In reply to this post by Yonik Seeley-2
[hidden email] wrote on 09/08/2006 20:32:20:
> Heh... interfaces strike again.
>
> Well then since we *know* that no one has their own implementation
> (because they would not have been able to register it), we should be
> able to safely upgrade the interface to a class (anyone want to supply
> a patch?)
>
> -Yonik

I'd be happy to do supply this patch - unless someone already works on it
(Oliver?).

I have one more comment on the cache implementation. It feels to me
somewhat not right that a static system wide object (FieldCache.DEFAULT) is
managing the field caching for all the indexReaders in the JVM (possibly of
different indexes), when in fact there is no
dependency/relation/cooperation between the different indexReaders, cache
wise. It seems cleaner and simpler to have FieldCacheImpl take care of a
single IndexReader, and so have that cache "belong" to the indexReader.
This would make the cache implementation simpler. Synchronization would
only need to be on field values. This way we also get rid of the
WeakHashMap (which, btw, I never got to fully trust).

Regards,
Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Poor performance "race condition" in FieldSortedHitQueue

Oliver Hutchison-2

> [hidden email] wrote on 09/08/2006 20:32:20:
> > Heh... interfaces strike again.
> >
> > Well then since we *know* that no one has their own implementation
> > (because they would not have been able to register it), we
> should be
> > able to safely upgrade the interface to a class (anyone
> want to supply
> > a patch?)
> >
> > -Yonik
>
> I'd be happy to do supply this patch - unless someone already
> works on it (Oliver?).

I was intending to do this but perhaps this is not needed given your
following comments.

> I have one more comment on the cache implementation. It feels
> to me somewhat not right that a static system wide object
> (FieldCache.DEFAULT) is managing the field caching for all
> the indexReaders in the JVM (possibly of different indexes),
> when in fact there is no dependency/relation/cooperation
> between the different indexReaders, cache wise. It seems
> cleaner and simpler to have FieldCacheImpl take care of a
> single IndexReader, and so have that cache "belong" to the
> indexReader.
> This would make the cache implementation simpler.
> Synchronization would only need to be on field values. This
> way we also get rid of the WeakHashMap (which, btw, I never
> got to fully trust).

This sounds like a much nicer solution than what I was proposing. I'm still
happy to produce a patch if that would be helpful?

Cheers,

Oliver



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

NPE when sorting on a field that is missing from a doc

Oliver Hutchison-2
Hi all,

we have recently noticed that doing a locale sensitive sort on a field that
is missing from some docs causes an NPE inside the call to Collator#compare
at FieldSortedHitQueue line 320 (Lucene 2.0 src):

static ScoreDocComparator comparatorStringLocale (final IndexReader reader,
final String fieldname, final Locale locale)
  throws IOException {
    final Collator collator = Collator.getInstance (locale);
    final String field = fieldname.intern();
    final String[] index = FieldCache.DEFAULT.getStrings (reader, field);
    return new ScoreDocComparator() {

      public final int compare (final ScoreDoc i, final ScoreDoc j) {
                return collator.compare (index[i.doc], index[j.doc]);  <----
NPE in compare call one/both param null
      }


From looking at the standard String, float and int sorting and reading
LUCENE-406 I assume this in not expected behavior and that docs that do not
include the field should be sorted to appear at the start of the list of
results.

Is this a know issue? If not I'll raise the issue and create a patch.

Thanks again,

Oliver



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Yonik Seeley-2
In reply to this post by Doron Cohen
On 8/10/06, Doron Cohen <[hidden email]> wrote:

> I have one more comment on the cache implementation. It feels to me
> somewhat not right that a static system wide object (FieldCache.DEFAULT) is
> managing the field caching for all the indexReaders in the JVM (possibly of
> different indexes), when in fact there is no
> dependency/relation/cooperation between the different indexReaders, cache
> wise. It seems cleaner and simpler to have FieldCacheImpl take care of a
> single IndexReader, and so have that cache "belong" to the indexReader.
> This would make the cache implementation simpler. Synchronization would
> only need to be on field values. This way we also get rid of the
> WeakHashMap (which, btw, I never got to fully trust).

Sorting was introduced to Lucene before my time, so I don't know the
reasons behind it.  Maybe it was seen as non-optimial or non-core and
so was kept out of the IndexReader.

I admit, it does feel like the level of abstraction that FieldCache is
at is higher than that of the IndexReader (the lowest level).

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Poor performance "race condition" in FieldSortedHitQueue

Doron Cohen

> On 8/10/06, Doron Cohen <[hidden email]> wrote:
> Sorting was introduced to Lucene before my time, so I don't know the
> reasons behind it.  Maybe it was seen as non-optimial or non-core and
> so was kept out of the IndexReader.
>
> I admit, it does feel like the level of abstraction that FieldCache is
> at is higher than that of the IndexReader (the lowest level).

... right, thanks, now I see what you mean. In other words, IndexReader
provides the ability to read/iterate terms and docs, but caching the term
values per doc is for a higher layer - this way keeping IndexReader simpler
and maintainable. So I guess Oliver can continue with the change as he
proposed it.




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

12