facet.method enum vs fc

Mingfeng Yang
I am doing faceting on an index of 120M documents, on the url field,
using the following two queries.  Note that the only difference between the
two queries is that the first one uses the default facet.method, and the
second one uses facet.method=enum.  (Each document in the index contains a
review we extracted from the internet, with multiple fields; the url field
holds the link to the original web page.  About 5.3 million documents match
the query.)

http://autos-solr-api.wisewindow.com:8995/solr/select?q=*:*&indent=on&version=2.2&fq=language:english&start=0&rows=1&facet.mincount=1&facet=true&wt=json&fq=search_source:%22Video%22&sort=date%20desc&fl=topic&facet.limit=25&facet.field=url&facet.offset=0

http://autos-solr-api.wisewindow.com:8995/solr/select?q=*:*&indent=on&version=2.2&fq=language:english&start=0&rows=1&facet.mincount=1&facet=true&wt=json&fq=search_source:%22Video%22&sort=date%20desc&fl=topic&facet.limit=25&facet.field=url&facet.offset=0&facet.method=enum
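A minimal sketch of how the two requests above could be built programmatically, to make it explicit that they differ only in facet.method. The host and parameters are copied from the URLs above; the function name facet_url is hypothetical, and nothing is sent over the network. A list of tuples is used instead of a dict so that the repeated fq parameter survives encoding.

```python
from urllib.parse import urlencode

# Base URL taken from the post; this sketch only builds strings.
SOLR_SELECT = "http://autos-solr-api.wisewindow.com:8995/solr/select"

# Parameters shared by both requests, in the same order as the posted URLs.
# A list of tuples keeps both fq parameters (a dict would drop one).
BASE_PARAMS = [
    ("q", "*:*"),
    ("indent", "on"),
    ("version", "2.2"),
    ("fq", "language:english"),
    ("start", "0"),
    ("rows", "1"),
    ("facet.mincount", "1"),
    ("facet", "true"),
    ("wt", "json"),
    ("fq", 'search_source:"Video"'),
    ("sort", "date desc"),
    ("fl", "topic"),
    ("facet.limit", "25"),
    ("facet.field", "url"),
    ("facet.offset", "0"),
]


def facet_url(method=None):
    """Build the select URL; method=None leaves facet.method at Solr's default."""
    params = list(BASE_PARAMS)
    if method is not None:
        params.append(("facet.method", method))
    return SOLR_SELECT + "?" + urlencode(params)


default_url = facet_url()     # first query: default facet.method
enum_url = facet_url("enum")  # second query: facet.method=enum
```

urlencode percent-encodes characters such as ":" and "*", which Solr decodes on its side, so the strings look different from the hand-written URLs while carrying the same parameters.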

The first query gives me an out-of-memory error (ERROR 500: Java heap space
java.lang.OutOfMemoryError: Java heap space), but the second one runs fine,
though very slowly (163 seconds).

According to the wiki and the Solr documentation, the default facet.method=fc
uses less memory than facet.method=enum, doesn't it?

Thanks,
Ming

Re: facet.method enum vs fc

Timothy Potter
What are your results when using facet.method=fcs?


On Wed, Apr 17, 2013 at 12:06 PM, Mingfeng Yang <[hidden email]> wrote:

Re: facet.method enum vs fc

Mingfeng Yang
Does Solr 3.6 have facet.method=fcs?  I tried it anyway and got

ERROR 500: GC overhead limit exceeded  java.lang.OutOfMemoryError: GC
overhead limit exceeded.


On Wed, Apr 17, 2013 at 12:38 PM, Timothy Potter <[hidden email]> wrote:

Re: facet.method enum vs fc

Toke Eskildsen
In reply to this post by Mingfeng Yang
On Wed, 2013-04-17 at 20:06 +0200, Mingfeng Yang wrote:
> I am doing faceting on an index of 120M documents,
> on the field of url[...]

I would guess that you would need 3-4GB for that.
How much memory do you allocate to Solr?

- Toke Eskildsen


Re: facet.method enum vs fc

Mingfeng Yang
20GB is already allocated to Solr.

Ming


On Wed, Apr 17, 2013 at 11:56 PM, Toke Eskildsen <[hidden email]> wrote:

Re: facet.method enum vs fc

Joel Bernstein
Faceting on a high-cardinality string field like url, on a 120-million-record
index, is going to be very memory intensive.

You will very likely need to shard the index to get the performance that
you need.

In Solr 4.2, you can make the url field a disk-based DocValues field and shift
the memory from the Solr heap to the file system cache. But to run
efficiently, this is still going to take a lot of memory in the OS file cache.
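To make the cardinality point concrete, here is a rough back-of-envelope estimator. It is a sketch under loudly stated assumptions, not a measurement: it assumes an fc-style structure made of one 4-byte ordinal per document plus one Java String per unique term, costed at an assumed ~40 bytes of object overhead plus 2 bytes per UTF-16 character. The thread never says how many unique URLs the index contains, so the 50M unique terms and 80-character average used in the usage note are hypothetical placeholders, as are both function names.

```python
def fc_heap_estimate_bytes(num_docs, unique_terms, avg_term_chars,
                           bytes_per_ord=4, per_term_overhead=40):
    """Rough heap estimate for an fc-style structure on one string field.

    Assumptions (hypothetical, for illustration only):
      * one fixed-width ordinal per document (4 bytes by default);
      * one Java String per unique term: 2 bytes per char plus a fixed
        per-object overhead (40 bytes by default).
    """
    ord_bytes = num_docs * bytes_per_ord
    term_bytes = unique_terms * (per_term_overhead + 2 * avg_term_chars)
    return ord_bytes + term_bytes


def per_shard_estimate_bytes(num_docs, unique_terms, avg_term_chars, shards):
    """Same estimate per shard, assuming docs AND unique terms split evenly.

    An even split of unique terms is optimistic: a URL whose documents land
    on several shards is stored once per shard.
    """
    return fc_heap_estimate_bytes(num_docs // shards,
                                  unique_terms // shards,
                                  avg_term_chars)
```

With the made-up inputs fc_heap_estimate_bytes(120_000_000, 50_000_000, 80), the term storage (10 GB) dwarfs the per-document ordinals (0.48 GB), which illustrates why the number of unique terms, not just the document count, drives fc memory. Splitting the same hypothetical index evenly over 4 shards brings each shard's share down to about 2.6 GB.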




On Thu, Apr 18, 2013 at 12:00 PM, Mingfeng Yang <[hidden email]> wrote:

--
Joel Bernstein
Professional Services LucidWorks

Re: facet.method enum vs fc

Mingfeng Yang
Joel,

Thanks for your kind reply.   The problem is solved with sharding and using
facet.method=enum.  I am curious about what the difference between enum
and fc is, such that enum works but fc does not.   Do you know anything
about this?

Thank you!

Regards,
Ming


On Fri, Apr 19, 2013 at 6:18 AM, Joel Bernstein <[hidden email]> wrote:

Re: facet.method enum vs fc

Chris Hostetter-3

: Thanks for your kind reply.   The problem is solved with sharding and using
: facet.method=enum.  I am curious about what the difference between enum
: and fc is, such that enum works but fc does not.   Do you know anything
: about this?

method=fc/fcs uses the field caches (or uninverted fields if the field is
multivalued) to build a large data structure that is reusable across
many requests and allows faceting to happen very quickly even when the
number of terms is large.
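A toy, in-memory model of the fc idea for a single-valued field such as url may help. It is a sketch, not Solr's actual code, and the names fc_facet_counts, doc_ords and terms are hypothetical. The per-document ordinal array and the sorted term list play the role of the large reusable structure; each request then only walks the matching documents and bumps one counter per term ordinal.

```python
def fc_facet_counts(doc_ords, terms, matching_docs, limit, mincount=1):
    """Count facet values the fc way (toy model, single-valued field).

    doc_ords[d] is the ordinal of document d's term, or -1 if d has no value.
    terms[o] is the term with ordinal o, in sorted index order.
    doc_ords and terms are built once and reused across requests.
    """
    # Per-request array: one counter per unique term, so its size grows
    # with the field's cardinality, not with the number of matches.
    counts = [0] * len(terms)
    for doc in matching_docs:
        o = doc_ords[doc]
        if o >= 0:
            counts[o] += 1
    pairs = [(terms[o], c) for o, c in enumerate(counts) if c >= mincount]
    # Highest count first; ties broken by term (index) order.
    pairs.sort(key=lambda p: (-p[1], p[0]))
    return pairs[:limit]
```

For example, with terms ["a.com", "b.com", "c.com"], doc_ords [0, 1, 1, 2, -1, 1] and matching documents {1, 2, 3, 4}, the call returns [("b.com", 2), ("c.com", 1)].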

enum causes Solr to walk the term enum for the field and generate a DocSet
for each term, which is then intersected with the main results; this is
basically doing "facet.field" just like "facet.query" with simple term queries.
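The same toy data can be faceted the enum way. Again this is a hypothetical sketch rather than Solr's implementation: postings stands in for the term enum (term to set of document ids), and each term's DocSet is intersected with the main result set, exactly as a term facet.query would be. Note that there is no per-document array; the work and any caching scale with the number of terms instead.

```python
def enum_facet_counts(postings, matching_docs, limit, mincount=1):
    """Count facet values the enum way (toy model).

    postings maps each term to the set of doc ids containing it, standing in
    for walking the field's term enum. matching_docs must be a set.
    """
    results = []
    for term in sorted(postings):            # walk terms in index order
        docset = postings[term]              # the DocSet for this term
        count = len(docset & matching_docs)  # intersect with main results
        if count >= mincount:
            results.append((term, count))
    # Highest count first; ties broken by term order.
    results.sort(key=lambda p: (-p[1], p[0]))
    return results[:limit]
```

With postings {"a.com": {0}, "b.com": {1, 2, 5}, "c.com": {3}} and matching documents {1, 2, 3, 4}, this returns [("b.com", 2), ("c.com", 1)], the same answer the fc model gives for equivalent data.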

These DocSets from facet.method=enum will be cached in the
filterCache, so there are some performance savings there if/when people
filter on these facet constraints, but the regular rules about cache
evictions apply.
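To illustrate what "the regular rules about cache evictions" can mean for enum faceting, here is a toy LRU cache of term DocSets. It assumes plain least-recently-used eviction; Solr's filterCache implementation and size are configurable, and the class name DocSetCache is hypothetical. The point it shows: if the cache holds fewer entries than the field has terms, walking all terms on every request keeps evicting the entries that the next request will need, so nothing gets reused.

```python
from collections import OrderedDict


class DocSetCache:
    """Toy LRU cache mapping term -> DocSet (not Solr's filterCache code)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get_docset(self, term, compute):
        """Return the cached DocSet for term, computing it on a miss."""
        if term in self._entries:
            self._entries.move_to_end(term)  # mark as most recently used
            self.hits += 1
            return self._entries[term]
        self.misses += 1
        docset = compute(term)
        self._entries[term] = docset
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
            self.evictions += 1
        return docset
```

Faceting twice over three terms (a, b, c, a, b, c) with a cache of 2 entries gives 0 hits, 6 misses and 4 evictions; with 3 entries the second pass is all hits.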

So in a situation where the heap size is "big enough not to matter",
method=fc should be faster, and should take up less RAM than method=enum
would if you sized your filterCache big enough to hold all of the DocSets
involved (so as to avoid cache evictions).

In most cases, the only motivation for using method=enum is if you know
the cardinality of your set of constraints is relatively small and fixed
(e.g. there are only 50 states in the US, so you might find that faceting
on a "state" field with method=enum is just as fast as using method=fc and
takes less RAM; this is why boolean fields default to method=enum, since the
cardinality is guaranteed to be 2).  But in some less common cases, you
might care more about saving RAM than speed, or you might be trying to
facet on a huge index with fields containing lots of terms (e.g. full text),
so that method=fc just won't work with any conceivable amount of RAM; then it
could make sense to use method=enum with the filterCache disabled.
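The cardinality argument can be put into rough numbers for the enum side. This sketch assumes the worst case in which every cached DocSet is a plain bitset of maxDoc bits (Solr can use smaller representations for sparse sets, so real figures may be lower). It uses the 120M document count from the thread; the 1,000,000-term url figure and the function names are hypothetical.

```python
def bitset_bytes(max_doc):
    """Bytes for one bitset-backed DocSet: one bit per document, rounded up."""
    return (max_doc + 7) // 8


def enum_cache_bytes(max_doc, num_terms):
    """Worst case: the filterCache holds one bitset DocSet per facet term."""
    return num_terms * bitset_bytes(max_doc)
```

Under these assumptions, enum_cache_bytes(120_000_000, 2) for a boolean field is 30,000,000 bytes (30 MB), while a hypothetical url field with 1,000,000 terms would need 15,000,000,000,000 bytes (15 TB) to cache every DocSet. That gap is why a small, fixed cardinality suits method=enum, and why a huge-cardinality field only works with enum if the filterCache is kept out of the picture.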


-Hoss