LUCENE-831 (complete cache overhaul) -> mem use


LUCENE-831 (complete cache overhaul) -> mem use

britske
Hi,

I recently saw activity on LUCENE-831 (Complete overhaul of FieldCache API/Implementation), which I have an interest in.
I posted previously with my concern that, given the current default cache, I sometimes get OOM errors because I sort on a lot of fields, which ultimately causes the FieldCache to grow larger than the available RAM.

Ultimately I want to subclass the new pluggable FieldCache of LUCENE-831 to offload to disk (using ehcache or memcachedB or something), but I haven't found the time yet.

What I would like to know for now is whether the newly implemented standard cache in LUCENE-831 uses a different caching strategy than the standard FieldCache in Lucene.

I.e., the normal cache consumes memory for every document in the index when generating a field cache, even if a document doesn't have that field set.

Since my documents are very sparse in the fields I want to sort on, it would make a big difference if documents that don't have the field in question set didn't add to the memory used.

So am I lucky? Or would I indeed have to cook up something myself?
Thanks and best regards,

Geert-Jan


Re: LUCENE-831 (complete cache overhaul) -> mem use

Pablo Saavedra
I have the same problem with the cache and too many sorted fields, and had to
implement a big workaround to be able to plug my own cache implementation into
Lucene 2.3.2. What I'd really like to see in the new cache implementation is
easier pluggability and extension of the Lucene classes, which is currently
not possible due to visibility issues.

My 2 cents.


Re: LUCENE-831 (complete cache overhaul) -> mem use

Mark Miller-3
In reply to this post by britske
It's hard to predict the future of LUCENE-831. I would bet that it will
end up in Lucene at some point in one form or another, but it's hard to
say whether that form will be what's in the available patches (I'm a contrib
committer, so I won't have any real say in that; take that prediction
with a grain of salt). It has strong ties to other issues, and a
committer hasn't really had their whack at it yet.

Having said that, though, LUCENE-831 allows for two ways of dealing
with field values: either the old-style int/String/long/etc. arrays, or,
for a small speed hit and faster reopens, an ObjectArray type that is
basically an Object that can provide access to one or two real or
virtual arrays. So technically you could use an ObjectArray that had a
sparse implementation behind it. Unfortunately, you would have to
implement new CacheKeys to do this. That's trivial to do, but it reveals
our LUCENE-831 problem of CacheKeys multiplying with every new little
option/idea, and the juggling of which one to use. I haven't thought
about it much, but I'm hoping an API tweak can alleviate some of this.
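
To make the sparse idea concrete, here is a minimal, illustrative sketch
(not from the patch): a value store with an ObjectArray-style get(int),
assuming that single accessor is the contract, as in the patch code quoted
later in this thread. The class name and the HashMap backing are made up;
only documents that actually have the field cost memory.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a sparse value store. Docs without the field
// take no space; their get(int) simply returns null.
public class SparseValueArray {
  private final Map values = new HashMap(); // Integer doc id -> field value

  public void set(int doc, Object value) {
    values.put(new Integer(doc), value);
  }

  public Object get(int index) {
    return values.get(new Integer(index)); // null: doc has no value
  }
}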

- Mark


Extremely Large Strings Comparison (slightly off-topic)

Aaron Schon
Hi, I need to compare two Base64-encoded strings of some MIME content that I am storing within a Lucene index. I need to compare them efficiently to find the closest match to a query Base64 string, after the Lucene query has run.

I am not sure of the best way to approach this. Could I compare hashes and compute their similarity? Levenshtein distance seems hard and inefficient because of the size of the strings. Is there any other method you could suggest?

N.B.: The idea is not to determine an exact match, but to compute a similarity metric. For example:

John & Johnson (closer)
vs.
John & Jimmy (farther)

tia,
Aaron


     



Re: Extremely Large Strings Comparison (slightly off-topic)

Aaron Schon
Thanks for responding, Jonathan. I will look into the k-grams approach.

The objects could differ by small local changes. To provide some business context: the application requires indexing email messages and attachments. If an attachment differs by more than some threshold (users making edits/reviews), it needs to be flagged and given a major/minor version.

Thanks
AS


----- Original Message ----
From: Jonathan Young <[hidden email]>
To: Aaron Schon <[hidden email]>
Sent: Friday, November 14, 2008 5:52:17 PM
Subject: RE: Extremely Large Strings Comparison (slightly off-topic)

Aaron - Although a naïve implementation of a Levenshtein distance metric takes O(n*m) time, if you are willing to bound the maximum distance by k << n,m (e.g. if you aren't interested in distances greater than k) then the distance calculation can take O(k*min(n,m)).
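
For illustration, a minimal sketch of that bounded calculation (a classic
banded edit distance; nothing Lucene-specific, and the class name is made
up). Only cells within k of the diagonal are computed, and anything beyond
the bound is reported as k + 1:

import java.util.Arrays;

// Sketch of Levenshtein distance with an upper bound k. Cells more than k
// off the diagonal cannot contribute to a distance <= k, so only the band
// j in [i-k, i+k] is computed. The full-row fill is kept for simplicity;
// a tighter version would clear only the band edges.
public class BoundedLevenshtein {
  private static final int INF = Integer.MAX_VALUE / 2; // avoids overflow on +1

  public static int distance(String a, String b, int k) {
    int n = a.length(), m = b.length();
    if (Math.abs(n - m) > k) return k + 1; // length gap alone exceeds k
    int[] prev = new int[m + 1];
    int[] curr = new int[m + 1];
    for (int j = 0; j <= m; j++) prev[j] = (j <= k) ? j : INF;
    for (int i = 1; i <= n; i++) {
      Arrays.fill(curr, INF);
      int lo = Math.max(0, i - k), hi = Math.min(m, i + k);
      if (lo == 0) curr[0] = i;
      for (int j = Math.max(1, lo); j <= hi; j++) {
        int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
        curr[j] = Math.min(prev[j - 1] + cost,   // substitute / match
                  Math.min(curr[j - 1] + 1,      // insert
                           prev[j] + 1));        // delete
      }
      int[] tmp = prev; prev = curr; curr = tmp;
    }
    return Math.min(prev[m], k + 1); // k + 1 means "more than k apart"
  }
}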

Another approach would be to use shingles or k-grams from the original document.  I believe Lucene has some support for that.
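
As a sketch of that idea (plain Java, no Lucene machinery; the class name is
made up), one simple similarity metric is the Jaccard coefficient over the
two strings' k-gram sets:

import java.util.HashSet;
import java.util.Set;

// Sketch: slide a window of length k over each string, collect the distinct
// k-grams, and score similarity as |intersection| / |union| of the two sets.
// Near 1.0 means "closer" (John/Johnson), near 0.0 means "farther" (John/Jimmy).
public class KGramSimilarity {
  public static double jaccard(String a, String b, int k) {
    Set gramsA = grams(a, k);
    Set gramsB = grams(b, k);
    if (gramsA.isEmpty() && gramsB.isEmpty()) return 1.0;
    Set intersection = new HashSet(gramsA);
    intersection.retainAll(gramsB);
    Set union = new HashSet(gramsA);
    union.addAll(gramsB);
    return (double) intersection.size() / union.size();
  }

  private static Set grams(String s, int k) {
    Set result = new HashSet();
    for (int i = 0; i + k <= s.length(); i++) {
      result.add(s.substring(i, i + k));
    }
    return result;
  }
}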

It depends a lot on the ways in which you expect the two strings to differ. The fact that it is Base64-encoded MIME content doesn't matter: are they actually objects which are going to have small local changes, or are there likely to be changes which cascade (e.g. if the data is compressed)?

I hope this helps,

--- Jonathan


Re: LUCENE-831 (complete cache overhaul) -> mem use

britske
In reply to this post by Pablo Saavedra
But aren't these pluggability/visibility issues resolved by using LUCENE-831? Or did you roll your own before 831 was available?


Re: LUCENE-831 (complete cache overhaul) -> mem use

britske
In reply to this post by Mark Miller-3
That ObjectArray suggestion makes sense to me. It almost seemed as if you were saying that this option (or at least the interfaces needed to implement it) is already available as one of the two options in 831?

Could you give me a hint where I would have to look to extend what you're suggesting?
A new Cache, CacheFactory and CacheKey implementation for all types of cache keys? This may sound a bit ignorant, but it would be my first time getting my head around the internals of an API instead of merely embedding it in a client application, so any help is highly appreciated.

Thanks for your help,

Geert-Jan



Re: LUCENE-831 (complete cache overhaul) -> mem use

Mark Miller-3
Like I said, it's pretty easy to add this, but it's also going to suck.
It kind of exposes the fact that the right extensibility is missing at the
moment. Things are still a bit ugly overall.


You're going to need new CacheKeys for the data types you want to support.
A CacheKey builds and provides access to the field data, and is simply:


public abstract class CacheKey {

  public abstract CacheData buildData(IndexReader r) throws IOException;

  public abstract boolean equals(Object o);

  public abstract int hashCode();

  public boolean isMergable() { ... }                                // default

  public CacheData mergeData(int[] starts, CacheData[] data) { ... } // impls;

  public boolean usesObjectArray() { ... }                           // bodies elided
}


For a sparse storage implementation you would use an object array, so
have usesObjectArray return true; isMergable can then return false, and
you don't have to support the mergeData method.


In buildData you load your object array and return it. Here is an
array-backed IntObjectArrayCacheKey build method:

public CacheData buildData(IndexReader reader) throws IOException {
  final int[] retArray = getIntArray(reader);
  ObjectArray fieldValues = new ObjectArray() {
    public Object get(int index) {
      return new Integer(retArray[index]);
    }
  };
  return new CacheData(fieldValues);
}


protected int[] getIntArray(IndexReader reader) throws IOException {
  final int[] retArray = new int[reader.maxDoc()];
  TermDocs termDocs = reader.termDocs();
  TermEnum termEnum = reader.terms(new Term(field, ""));
  try {
    do {
      Term term = termEnum.term();
      // field names are interned, so != is a valid comparison here
      if (term == null || term.field() != field)
        break;
      int termval = parser.parseInt(term.text());
      termDocs.seek(termEnum);
      while (termDocs.next()) {
        retArray[termDocs.doc()] = termval;
      }
    } while (termEnum.next());
  } finally {
    termDocs.close();
    termEnum.close();
  }
  return retArray;
}


So it should be fairly straightforward to return an object array backed by
a sparse implementation from your new CacheKey (SparseIntObjectArrayCacheKey
or something).
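
To make that concrete, a hedged sketch of such a key, mirroring the
IntObjectArrayCacheKey code above. The SparseIntObjectArrayCacheKey name is
the hypothetical one from above, the HashMap backing is illustrative, and
equals/hashCode are kept minimal:

public class SparseIntObjectArrayCacheKey extends CacheKey {

  private final String field;

  public SparseIntObjectArrayCacheKey(String field) {
    // intern so the reference comparison against Term.field() works
    this.field = field.intern();
  }

  public boolean usesObjectArray() { return true; }

  public boolean isMergable() { return false; } // so mergeData need not be supported

  public CacheData buildData(IndexReader reader) throws IOException {
    // only documents that actually carry the field get an entry
    final java.util.Map values = new java.util.HashMap();
    TermDocs termDocs = reader.termDocs();
    TermEnum termEnum = reader.terms(new Term(field, ""));
    try {
      do {
        Term term = termEnum.term();
        if (term == null || term.field() != field)
          break;
        Integer termval = Integer.valueOf(term.text());
        termDocs.seek(termEnum);
        while (termDocs.next()) {
          values.put(new Integer(termDocs.doc()), termval);
        }
      } while (termEnum.next());
    } finally {
      termDocs.close();
      termEnum.close();
    }
    ObjectArray fieldValues = new ObjectArray() {
      public Object get(int index) {
        return values.get(new Integer(index)); // null: doc has no value
      }
    };
    return new CacheData(fieldValues);
  }

  public boolean equals(Object o) {
    return o instanceof SparseIntObjectArrayCacheKey
        && ((SparseIntObjectArrayCacheKey) o).field == field;
  }

  public int hashCode() {
    return SparseIntObjectArrayCacheKey.class.hashCode() ^ field.hashCode();
  }
}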

Now some more ugliness: you can turn on the ObjectArray CacheKeys by
setting the system property 'use.object.array.sort' to true. This will
cause FieldSortedHitQueue to return ScoreDocComparators that use the
standard ObjectArray CacheKeys: IntObjectArrayCacheKey,
FloatObjectArrayCacheKey, etc. The method that builds each comparator
type knows what type to build for and whether to use primitive arrays or
ObjectArrays, i.e. (from FieldSortedHitQueue):


static ScoreDocComparator comparatorDoubleOA(final IndexReader reader,
    final String fieldname)

does this (it has to provide the CacheKey and know the return type):

  final ObjectArray fieldOrder = (ObjectArray) reader.getCachedData(
      new DoubleObjectArrayCacheKey(field)).getCachePayload();


So you have to either change all of the ObjectArray comparator builders
to use your CacheKeys:


  final ObjectArray fieldOrder = (ObjectArray) reader.getCachedData(
      new SparseIntObjectArrayCacheKey(field)).getCachePayload();


Or you have to add more options in
FieldSortedHitQueue.CacheEntry.buildData(IndexReader reader) and more
static comparator builders in FieldSortedHitQueue that use the right
CacheKeys (see the sketch below). Obviously not very extensibility-friendly
at the moment. I'm sure that with some thought things could be much better.
If you decide to jump into any of this, let me know if you have any
suggestions or feedback.
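
For illustration, a sketch of such a static builder wired to the
hypothetical sparse key from above, using the Lucene 2.x ScoreDocComparator
interface. How missing values sort is a policy choice; here they arbitrarily
sort first:

static ScoreDocComparator comparatorSparseIntOA(final IndexReader reader,
    final String fieldname) throws IOException {
  final ObjectArray fieldOrder = (ObjectArray) reader.getCachedData(
      new SparseIntObjectArrayCacheKey(fieldname)).getCachePayload();
  return new ScoreDocComparator() {
    public int compare(ScoreDoc i, ScoreDoc j) {
      Integer a = (Integer) fieldOrder.get(i.doc);
      Integer b = (Integer) fieldOrder.get(j.doc);
      if (a == null) return (b == null) ? 0 : -1; // missing field sorts first
      if (b == null) return 1;
      return a.compareTo(b);
    }
    public Comparable sortValue(ScoreDoc i) {
      return (Comparable) fieldOrder.get(i.doc);
    }
    public int sortType() {
      return SortField.INT;
    }
  };
}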


- Mark




Re: LUCENE-831 (complete cache overhaul) -> mem use

britske
Thanks Mark,

This gets me moving. I will look into it soon.

Geert-Jan

