Performance issue.

8 messages

Performance issue.

maustin
Sorry.. I put the wrong subject on my message. I also wanted to mention that
my CPU jumps to almost 100% on each query.

> I'm having slow performance with my solr index. I'm not sure what to do. I
> need some suggestions on what to try. I have updated all my records in the
> last couple of days. I'm not sure how much it degraded because of that,
> but it now takes about 3 seconds per search. My cache statistics don't
> look so good either.
>
> Also... I'm not sure I was supposed to do a couple of things.
>    - I did an optimize index through Luke with compound format and noticed
> in the solrconfig file that useCompoundFile is set to false.
>    - I changed one of the fields in the schema from text_ws to string
>    - I added a field (type="text" indexed="false" stored="true")
>
> My schema and solrconfig are the same as the example except I have a few
> more fields. My PC is WinXP and has 2GB of RAM. Below are some stats from
> the Solr admin stats page.
>
> Thanks!
>
>
> caching : true
> numDocs : 1185814
> maxDoc : 2070472
> readerImpl : MultiReader
>
>      name:  filterCache
>      class:  org.apache.solr.search.LRUCache
>      version:  1.0
>      description:  LRU Cache(maxSize=512, initialSize=512,
> autowarmCount=256,
> regenerator=org.apache.solr.search.SolrIndexSearcher$1@d55986)
>      stats:  lookups : 658446
>      hits : 30
>      hitratio : 0.00
>      inserts : 658420
>      evictions : 657908
>      size : 512
>      cumulative_lookups : 658446
>      cumulative_hits : 30
>      cumulative_hitratio : 0.00
>      cumulative_inserts : 658420
>      cumulative_evictions : 657908
>
>
>      name:  queryResultCache
>      class:  org.apache.solr.search.LRUCache
>      version:  1.0
>      description:  LRU Cache(maxSize=512, initialSize=512,
> autowarmCount=256,
> regenerator=org.apache.solr.search.SolrIndexSearcher$2@1b4c1d7)
>      stats:  lookups : 88
>      hits : 83
>      hitratio : 0.94
>      inserts : 6
>      evictions : 0
>      size : 5
>      cumulative_lookups : 88
>      cumulative_hits : 83
>      cumulative_hitratio : 0.94
>      cumulative_inserts : 6
>      cumulative_evictions : 0
>
>
>      name:  documentCache
>      class:  org.apache.solr.search.LRUCache
>      version:  1.0
>      description:  LRU Cache(maxSize=512, initialSize=512)
>      stats:  lookups : 780
>      hits : 738
>      hitratio : 0.94
>      inserts : 42
>      evictions : 0
>      size : 42
>      cumulative_lookups : 780
>      cumulative_hits : 738
>      cumulative_hitratio : 0.94
>      cumulative_inserts : 42
>      cumulative_evictions : 0
>
>
>


Re: Performance issue.

Yonik Seeley-2
On 12/5/06, Gmail Account <[hidden email]> wrote:
> Sorry.. I put the wrong subject on my message. I also wanted to mention that
> my cpu jumps to to almost 100% each query.

There's nothing wrong with the CPU jumping to 100% on each query; that
just means you aren't IO bound :-)

It's the 3 seconds that are the problem.

> > I'm having slow performance with my solr index. I'm not sure what to do. I
> > need some suggestions on what to try. I have updated all my records in the
> > last couple of days. I'm not sure how much it degraded because of that,
> > but it now takes about 3 seconds per search.

> >    - I did an optimize index through Luke with compound format and noticed
> > in the solrconfig file that useCompoundFile is set to false.

Don't do this unless you really know what you are doing... Luke is
probably using a different version of Lucene than Solr, and it could
be dangerous.

> > numDocs : 1185814
> > maxDoc : 2070472

Your index isn't optimized... you can see that from the difference
between maxDoc and numDocs (which represents deleted documents still in
the index).  This will slow things down in several ways:
 - Since you only have 2GB of RAM on your PC, the larger index files
mean less effective OS-level caching.
 - If you are using filters, any filter matching more than 3000 documents
will be double the size (maxDoc bits).
 - The lowest-level Lucene search functions iterate over all documents for
a given term, even if they are deleted (deleted docs are filtered out by
consulting a bit vector).

Do an optimize of the index via Solr (use your "commit" command as a
template, but substitute "optimize").
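For reference, the message posted to Solr's update URL is plain XML; a minimal sketch, assuming the stock update handler at the usual example port (adjust the URL to your deployment):

```xml
<!-- POST this to the same URL you send <commit/> to,
     e.g. http://localhost:8983/solr/update -->
<optimize/>
```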


> > My cache statistics don't
> > look so good either.

Ouch... the filterCache doesn't look good.
Are you using faceted browsing?  It looks like the number of terms in
the field you are faceting on is larger than the number of items the
filterCache can hold... which means a 0% cache hit rate with an LRU
cache.
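To see why an LRU cache degenerates to a 0% hit rate here, consider a toy sketch (not Solr's actual implementation): if every query cycles through more distinct filter entries than the cache can hold, each entry is evicted before it is ever looked up again. The term count of 10,000 below is an illustrative assumption.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache, analogous in behavior to Solr's filterCache."""
    def __init__(self, max_size):
        self.max_size = max_size
        self.data = OrderedDict()
        self.lookups = self.hits = 0

    def get(self, key):
        self.lookups += 1
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.max_size:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(max_size=512)
# Faceting on a field with 10,000 distinct terms: every query touches
# every term's filter, so each entry is evicted before it is reused.
for _ in range(3):  # three queries
    for term in range(10000):
        if cache.get(term) is None:
            cache.put(term, "docset")

print(cache.hits, "/", cache.lookups)  # 0 hits out of 30000 lookups
```

This reproduces the pattern in the stats above: lookups and inserts climb together, evictions track inserts, and hits stay near zero.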

Can you give some examples of what your queries look like?

Depending on the size of the corpus and what faceted queries you want
to run, you may need more than 2GB of ram for faster queries.  If you
always keep the index optimized, this will lower the size of some of
the filters and allow you to increase the number of cached filters at
least.

> >    - I changed one of the fields in the schema from text_ws to string

OK, as long as you re-indexed.

> >    - I added a field (type="text" indexed="false" stored="true")

OK

-Yonik


Re: Performance issue.

maustin

> There's nothing wrong with CPU jumping to 100% each query, that just
> means you aren't IO bound :-)
What do you mean not IO bound?

>> >    - I did an optimize index through Luke with compound format and
>> > noticed
>> > in the solrconfig file that useCompoundFile is set to false.
>
> Don't do this unless you really know what you are doing... Luke is
> probably using a different version of Lucene than Solr, and it could
> be dangerous.
Do you think I should reindex everything?


> - if you are using filters, any larger than 3000 will be double the
> size (maxDoc bits)
What do you mean larger than 3000? 3000 what and how do I tell?

> Can you give some examples of what your queries look like?
I will get this and send it.

Thanks,
Yonik


Re: Performance issue.

Yonik Seeley-2
On 12/5/06, Gmail Account <[hidden email]> wrote:
> > There's nothing wrong with CPU jumping to 100% each query, that just
> > means you aren't IO bound :-)
> What do you mean not IO bound?

There is always going to be a bottleneck somewhere.  In very large
indices, the bottleneck may be waiting for IO (waiting for data to be
read from the disk).  If you are on a single processor system and you
aren't waiting for data to be read from the disk or the network, then
the request will be using close to 100% CPU, which is actually a good
thing.

The bad thing is how long the query takes, not the fact that it's CPU bound.

> >> >    - I did an optimize index through Luke with compound format and
> >> > noticed
> >> > in the solrconfig file that useCompoundFile is set to false.
> >
> > Don't do this unless you really know what you are doing... Luke is
> > probably using a different version of Lucene than Solr, and it could
> > be dangerous.
> Do you think I should reindex everything?

That would be the safest thing to do.

> > - if you are using filters, any larger than 3000 will be double the
> > size (maxDoc bits)
> What do you mean larger than 3000? 3000 what and how do I tell?

From solrconfig.xml:
    <!-- This entry enables an int hash representation for filters (DocSets)
         when the number of items in the set is less than maxSize.  For smaller
         sets, this representation is more memory efficient, more efficient to
         iterate over, and faster to take intersections.  -->
    <HashDocSet maxSize="3000" loadFactor="0.75"/>

The key is that the memory consumed by a HashDocSet is independent of
maxDoc (the maximum internal Lucene docid), but a BitSet-based set has
maxDoc bits in it.  Thus, an unoptimized index with more deleted
documents has a higher maxDoc and higher memory usage for any
BitSet-based filters.
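Plugging in the numbers from the stats earlier in this thread makes the cost concrete; a quick back-of-the-envelope sketch:

```python
# Memory for a BitSet-based filter is maxDoc bits, regardless of how
# many documents actually match; a HashDocSet scales with the match count.
num_docs = 1185814   # live documents (from the admin stats above)
max_doc  = 2070472   # includes ~885k deleted docs before optimizing

bitset_bytes_unoptimized = max_doc // 8    # ~253 KB per cached filter
bitset_bytes_optimized   = num_docs // 8   # ~145 KB per cached filter

# Optimizing shrinks every BitSet filter by roughly this factor:
print(round(max_doc / num_docs, 2))  # about 1.75x
```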

-Yonik

Re: Performance issue.

maustin
I reindexed and optimized and it helped. However, now each query averages
about 1 second (down from 3-4 seconds). The bottleneck now is the
getFacetTermEnumCounts function. If I take that call out, query time is
negligible and the filterCache is being used. With getFacetTermEnumCounts
in, the filter cache after three queries is below, with the hit ratio at 0
and everything being evicted. This call is for the brand/manufacturer, so
I'm sure it is going through many thousands of queries. I'm thinking about
pre-processing the brand/manu to get a small set of top brands per category
and just querying them no matter what the other facets are set to (with
certain filters, no brands will be shown).  If I still want to call
getFacetTermEnumCounts for ALL brands, why is it not using the cache?


lookups : 32849
hits : 0
hitratio : 0.00
inserts : 32850
evictions : 32338
size : 512
cumulative_lookups : 32849
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 32850
cumulative_evictions : 32338


Thanks,
Mike

Re: Performance issue.

Yonik Seeley-2
It is using the cache, but the number of items is larger than the size
of the cache.

If you want to continue to use the filter method, then you need to
increase the size of the filter cache to something larger than the
number of unique values you are filtering on.  I don't know if you
will have enough memory to take this approach or not.
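As a sketch, the filterCache entry in solrconfig.xml could be enlarged like this; the sizes are illustrative assumptions and would need tuning against your actual unique-term count and available heap:

```xml
<filterCache
    class="org.apache.solr.search.LRUCache"
    size="50000"
    initialSize="50000"
    autowarmCount="1000"/>
```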

The second option is to make brand/manu a non-multi-valued string
type.  When you do that, Solr will use a different method to calculate
the facet counts (it will use the FieldCache rather than filters).
You would need to reindex to try this approach.

-Yonik


Re: Performance issue.

maustin
It is currently a string type. Here is everything that has to do with manu
in my schema... Should it have been multi-valued? Do you see anything wrong
with this?

<field name="manu" type="text" indexed="true" stored="true"/>
<!-- copied from "manu" via copyField -->
<field name="manu_exact" type="string" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="false"
multiValued="true"/>
.....

<copyField source="manu" dest="text"/>
<copyField source="manu" dest="manu_exact"/>

Thanks...


Re: Performance issue.

Yonik Seeley-2
Your snippet shows it as "text", not "string".
Try faceting on manu_exact and you may get better results.
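Assuming your Solr version supports the standard request handler's simple facet parameters, such a request might look like the following (the query term and limit are placeholders):

```
http://localhost:8983/solr/select?q=camera&facet=true&facet.field=manu_exact&facet.limit=20
```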

-Yonik
