All facet.fields for a given facet.query?

floehopper
Thanks for a great project.

Is it possible to request all facet.fields for a given facet.query instead
of having to request specific facet.fields? e.g. is there a wildcard for
facet.fields?
--
James.
http://blog.floehopper.org

Re: All facet.fields for a given facet.query?

Yonik Seeley-2
On 6/18/07, James Mead wrote:
> Is it possible to request all facet.fields for a given facet.query instead
> of having to request specific facet.fields? e.g. is there a wildcard for
> facet.fields?

Not currently.
Can you elaborate on the problem you are trying to solve?  Are you
using dynamic fields and hence don't know the exact names of the
fields to facet on?

-Yonik

Re: All facet.fields for a given facet.query?

Thomas Traeger-2
Hi,

I'm also just at that point where I think I need a wildcard facet.field
parameter (or someone points out another solution for my problem...).
Here is my situation:

I have many products of different types with totally different
attributes. There are currently more than 300 attributes....
I use dynamic fields to import the attributes into solr without having
to define a specific field for each attribute. Now when I make a query I
would like to get back all facet.fields that are relevant for that query.

I think it would be really nice if I didn't have to know which facet
fields exist at query time, and could instead just import attributes into
dynamic fields, get the relevant facets back, and decide in the frontend
which to display and how...

What do the experts think about this?

Tom

Re: All facet.fields for a given facet.query?

Chris Hostetter-3
: I have many products of different types with totally different
: attributes. There are currently more than 300 attributes....
: I use dynamic fields to import the attributes into solr without having
: to define a specific field for each attribute. Now when I make a query I
: would like to get back all facet.fields that are relevant for that query.
:
: I think it would be really nice if I didn't have to know which facet
: fields exist at query time, and could instead just import attributes into

The problem is there may be lots of fields you index but don't want to
facet on (full text search fields), and Solr has no easy way of knowing the
difference between those and the fields you think it makes sense to facet
on ... even if a field does make sense to facet on some of the time, that
doesn't mean it makes sense all of the time (as you say: "when I make a
query I would like to get back all facet.fields that are relevant for that
query") ... Solr has no way of knowing which fields make sense for that
query unless it tries them all (which can be very expensive) or you tell it.

I solve this problem by having metadata stored in my index which tells
my custom request handler what fields to facet on for each category ...
but I've also got several thousand categories.  If you've got fewer than
100 categories, you could easily enumerate them all with default
facet.field params in your solrconfig using separate request handler
instances.

: What do the experts think about this?

You may want to read up on the past discussion of this in SOLR-247 ... in
particular, note the link to the mail archive where there was additional
discussion about it as well.  Where we left things is that it
might make sense to support true globbing in both fl and facet.field, so
you can use naming conventions and say things like facet.field=facet_*,
but that in general trying to do something like facet.field=* would be a
very bad idea even if it was supported.

http://issues.apache.org/jira/browse/SOLR-247
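Since facet.field did not accept globs when this was written, a client could approximate the facet_* naming convention itself. This Python sketch assumes the client already knows the list of indexed field names (for example from its own index-time bookkeeping); the helper expand_facet_fields is hypothetical and never contacts Solr.

```python
from fnmatch import fnmatchcase


def expand_facet_fields(pattern, known_fields):
    """Return the known field names that match a glob such as 'facet_*'.

    known_fields is assumed to come from the client's own records;
    this hypothetical helper does not ask Solr for anything.
    """
    return sorted(f for f in known_fields if fnmatchcase(f, pattern))


fields = ["text", "title", "facet_color", "facet_megapixels", "price"]
print(expand_facet_fields("facet_*", fields))  # ['facet_color', 'facet_megapixels']
```

Each returned name would be sent as its own facet.field parameter. This only moves the glob to the client; Solr still computes counts for every field in the expanded list.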


-Hoss


Re: All facet.fields for a given facet.query?

martin.grotzke
On Tue, 2007-06-19 at 11:09 -0700, Chris Hostetter wrote:
> I solve this problem by having metadata stored in my index which tells
> my custom request handler what fields to facet on for each category ...
How do you define this metadata?

Cheers,
Martin


--
Martin Grotzke
http://www.javakaffee.de/blog/


Re: All facet.fields for a given facet.query?

martin.grotzke
In reply to this post by Thomas Traeger-2
On Tue, 2007-06-19 at 19:16 +0200, Thomas Traeger wrote:

> Hi,
>
> I'm also just at that point where I think I need a wildcard facet.field
> parameter (or someone points out another solution for my problem...).
> Here is my situation:
>
> I have many products of different types with totally different
> attributes. There are currently more than 300 attributes....
> I use dynamic fields to import the attributes into solr without having
> to define a specific field for each attribute. Now when I make a query I
> would like to get back all facet.fields that are relevant for that query.
>
> I think it would be really nice if I didn't have to know which facet
> fields exist at query time, and could instead just import attributes into
> dynamic fields, get the relevant facets back, and decide in the frontend
> which to display and how...
Do you really need all facets in the frontend?

Would it be a solution to have a facet ranking in the field definitions,
and then decide at query time which fields to facet on? This would
need an additional query parameter like facet.query.count.

E.g. if you have a query with q=foo+AND+prop1:bar+AND+prop2:baz
and you have fields
prop1 with facet-ranking 100
prop2 with facet-ranking 90
prop3 with facet-ranking 80
prop4 with facet-ranking 70
prop5 with facet-ranking 60

then you might decide not to facet on prop1 and prop2, since you already
have constraints on them, but to facet on prop3 and prop4 if
facet.query.count is 2.
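Martin's proposal can be sketched in Python under stated assumptions: the rankings live in a hypothetical dict rather than in the schema, facet.query.count is not a real Solr parameter (it becomes the count argument here), and the query is parsed only crudely with a regular expression to spot field:value constraints.

```python
import re

# Hypothetical facet rankings, mirroring Martin's example; not a Solr feature.
FACET_RANKING = {"prop1": 100, "prop2": 90, "prop3": 80, "prop4": 70, "prop5": 60}


def constrained_fields(q):
    """Crudely collect field names used as 'field:value' in the query."""
    return set(re.findall(r"(\w+):", q))


def choose_facet_fields(q, count):
    """Return the `count` highest-ranked fields not already constrained in q.

    `count` plays the role of the hypothetical facet.query.count parameter.
    """
    skip = constrained_fields(q)
    ranked = sorted(FACET_RANKING, key=FACET_RANKING.get, reverse=True)
    return [f for f in ranked if f not in skip][:count]


print(choose_facet_fields("foo AND prop1:bar AND prop2:baz", 2))  # ['prop3', 'prop4']
```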

Just thinking about that... :)

Cheers,
Martin



Re: All facet.fields for a given facet.query?

Thomas Traeger-2
In reply to this post by Chris Hostetter-3
First, sorry for the bad quoting; I only found your message in the archive...

>> I have many products of different types with totally different
>> attributes. There are currently more than 300 attributes....
>> I use dynamic fields to import the attributes into solr without having
>> to define a specific field for each attribute. Now when I make a query I
>> would like to get back all facet.fields that are relevant for that query.
>> I think it would be really nice if I didn't have to know which facet
>> fields exist at query time, and could instead just import attributes into

> The problem is there may be lots of fields you index but don't want to
> facet on (full text search fields), and Solr has no easy way of knowing the
> difference between those and the fields you think it makes sense to facet
> on ... even if a field does make sense to facet on some of the time, that
> doesn't mean it makes sense all of the time (as you say: "when I make a
> query I would like to get back all facet.fields that are relevant for that
> query") ... Solr has no way of knowing which fields make sense for that
> query unless it tries them all (which can be very expensive) or you tell it.
> I solve this problem by having metadata stored in my index which tells
> my custom request handler what fields to facet on for each category ...
> but I've also got several thousand categories.  If you've got fewer than
> 100 categories, you could easily enumerate them all with default
> facet.field params in your solrconfig using separate request handler
> instances.

>> What do the experts think about this?

> You may want to read up on the past discussion of this in SOLR-247 ... in
> particular, note the link to the mail archive where there was additional
> discussion about it as well.  Where we left things is that it
> might make sense to support true globbing in both fl and facet.field, so
> you can use naming conventions and say things like facet.field=facet_*,
> but that in general trying to do something like facet.field=* would be a
> very bad idea even if it was supported.
> http://issues.apache.org/jira/browse/SOLR-247


To make it clear, I agree that it doesn't make sense to facet on all available fields. I only want to facet on those 300 attributes that are stored together with the fields for full-text searches. A product/document typically has only 5-10 attributes.

I'd like to decide at index time which attributes of a product might be of interest for faceting and store those in dynamic fields named after the attribute, with some kind of prefix or suffix to identify them at query time as facet fields. Exactly the naming convention you mentioned.
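A minimal Python sketch of that index-time convention, assuming a schema with a dynamic field pattern such as facet_* (declared along the lines of `<dynamicField name="facet_*" type="string" indexed="true" stored="true"/>`). The prefix, the to_solr_doc helper, and the name normalization are illustrative choices, not anything Solr mandates; the sketch only builds the document dict and leaves sending it to Solr out.

```python
import re

# Assumed to match a facet_* dynamicField in schema.xml (illustrative).
FACET_PREFIX = "facet_"


def to_solr_doc(product_id, text, facet_attributes):
    """Build a document dict: full-text field as-is, facetable attributes
    under prefixed dynamic field names (hypothetical helper)."""
    doc = {"id": product_id, "text": text}
    for name, value in facet_attributes.items():
        # Normalize the attribute name into safe field-name characters.
        field = FACET_PREFIX + re.sub(r"\W+", "_", name.strip().lower())
        doc[field] = value
    return doc


doc = to_solr_doc("p1", "Compact digital camera",
                  {"Megapixels": "10", "Sensor Size": "1/2.3"})
print(doc)
```

At query time, any field starting with the prefix is a facet candidate, which is what makes a facet.field=facet_* style request meaningful.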

I will have a closer look at SOLR-247 and the supplied patch; it seems like a good starting point to dig deeper into Solr... :o)

Tom



Re: All facet.fields for a given facet.query?

Thomas Traeger-2
In reply to this post by martin.grotzke
Martin Grotzke wrote:

> On Tue, 2007-06-19 at 19:16 +0200, Thomas Traeger wrote:
>  
>> Hi,
>>
>> I'm also just at that point where I think I need a wildcard facet.field
>> parameter (or someone points out another solution for my problem...).
>> Here is my situation:
>>
>> I have many products of different types with totally different
>> attributes. There are currently more than 300 attributes....
>> I use dynamic fields to import the attributes into solr without having
>> to define a specific field for each attribute. Now when I make a query I
>> would like to get back all facet.fields that are relevant for that query.
>>
>> I think it would be really nice if I didn't have to know which facet
>> fields exist at query time, and could instead just import attributes into
>> dynamic fields, get the relevant facets back, and decide in the frontend
>> which to display and how...
>>    
> Do you really need all facets in the frontend?
>  
no, only the subset with matches for the current query.

> Would it be a solution to have a facet ranking in the field definitions,
> and then decide at query time which fields to facet on? This would
> need an additional query parameter like facet.query.count.
>
> E.g. if you have a query with q=foo+AND+prop1:bar+AND+prop2:baz
> and you have fields
> prop1 with facet-ranking 100
> prop2 with facet-ranking 90
> prop3 with facet-ranking 80
> prop4 with facet-ranking 70
> prop5 with facet-ranking 60
>
> then you might decide not to facet on prop1 and prop2, since you already
> have constraints on them, but to facet on prop3 and prop4 if
> facet.query.count is 2.
>
> Just thinking about that... :)
>
> Cheers,
> Martin
>
>  
One step at a time ;o). Ranking the facets will be another problem I
have to solve; the facet counts and the number of matching documents will
be a starting point. Another idea is to use the scores of the documents
returned by the query to compute a score for each facet.field...
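One possible reading of the score idea, as a Python sketch: sum the scores of the returned documents that carry each facet field. It assumes the client requested scores (e.g. fl=*,score) and knows which facet fields each returned document has; the helper is hypothetical and only sees the documents actually returned, not the whole result set.

```python
from collections import defaultdict


def score_facet_fields(docs):
    """docs: iterable of (score, facet_field_names) for the returned documents.

    Returns (field, summed_score) pairs, highest total first.
    """
    totals = defaultdict(float)
    for score, fields in docs:
        for field in fields:
            totals[field] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative data, not real query results.
results = [
    (2.5, ["facet_megapixels", "facet_zoom"]),
    (1.0, ["facet_megapixels"]),
    (0.5, ["facet_color"]),
]
print(score_facet_fields(results))
# [('facet_megapixels', 3.5), ('facet_zoom', 2.5), ('facet_color', 0.5)]
```

Summing favors fields that appear on many strong hits; averaging instead would favor fields concentrated on the very top documents.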

Tom

Re: All facet.fields for a given facet.query?

martin.grotzke
On Wed, 2007-06-20 at 12:59 +0200, Thomas Traeger wrote:
> Martin Grotzke wrote:
> > On Tue, 2007-06-19 at 19:16 +0200, Thomas Traeger wrote:
[...]
> >> I think it would be really nice if I didn't have to know which facet
> >> fields exist at query time, and could instead just import attributes into
> >> dynamic fields, get the relevant facets back, and decide in the frontend
> >> which to display and how...
> >>    
> > Do you really need all facets in the frontend?
> >  
> no, only the subset with matches for the current query.
OK, that's somewhat similar to our requirement, but we want Solr to return
only, e.g., the first 5 relevant facets, rather than handling this
in the frontend.

> > Would it be a solution to have a facet ranking in the field definitions,
> > and then decide at query time which fields to facet on? This would
> > need an additional query parameter like facet.query.count.
[...]
> >  
> One step at a time ;o). Ranking the facets will be another problem I
> have to solve; the facet counts and the number of matching documents will
> be a starting point. Another idea is to use the scores of the documents
> returned by the query to compute a score for each facet.field...
Yep, this is also different for different applications.

I'm also interested in this problem and would like to help solve
it (though I'm really new to Lucene and Solr)...

Cheers,
Martin



Re: All facet.fields for a given facet.query?

Chris Hostetter-3
In reply to this post by martin.grotzke

: > I solve this problem by having metadata stored in my index which tells
: > my custom request handler what fields to facet on for each category ...
: How do you define this metadata?

This might be a good place to start. Note that this message is almost two
years old and predates the open-sourcing of Solr ... the "Servlet" referred
to in this thread is Solr.

http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-p748420.html

...I think I also talked a bit about the metadata documents in my
ApacheCon slides from last year ... but I don't really remember, and I
haven't looked at them in a while...

http://people.apache.org/~hossman/apachecon2006us/


-Hoss


Re: All facet.fields for a given facet.query?

Chris Hostetter-3
In reply to this post by Thomas Traeger-2
: To make it clear, I agree that it doesn't make sense to facet on all
: available fields. I only want to facet on those 300 attributes that are
: stored together with the fields for full-text searches. A
: product/document typically has only 5-10 attributes.
:
: I'd like to decide at index time which attributes of a product might be of
: interest for faceting and store those in dynamic fields named after the
: attribute, with some kind of prefix or suffix to identify them at
: query time as facet fields. Exactly the naming convention you mentioned.

But if the facet fields are different for every document, and they use a
simple dynamicField prefix (like "facet_*" for example), how do you know at
query time which fields to facet on? ... Even if wildcards worked in
facet.field, using facet.field=facet_* would require Solr to compute the
counts for *every* field matching that pattern to find out which ones have
positive counts for the current result set -- there may only be 5 that
actually matter, but it's got to try all 300 of them to find out which 5
those are.

This is where custom request handlers that understand the faceting
"metadata" for your documents become key ... so you can say: "when
querying across the entire collection, only try to facet on category and
manufacturer. If the search is constrained by category, then look up other
facet options to offer based on that category name from our metadata
store," etc...
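The two-stage logic described here might look like this, sketched in Python. CATEGORY_FACETS stands in for the metadata store and is entirely hypothetical, as are the field names; in practice this logic would live in a custom Solr request handler (Java) or read its metadata from documents in the index.

```python
# Hypothetical metadata store: category name -> extra facet fields to offer.
CATEGORY_FACETS = {
    "digital cameras": ["megapixels", "optical_zoom"],
    "printers": ["print_technology", "pages_per_minute"],
}

# Fields worth faceting on for any search across the whole collection.
BASE_FACETS = ["category", "manufacturer"]


def facet_fields_for(category=None):
    """Return the facet.field values to request for one search."""
    if category is None:
        # Unconstrained search: only the cheap, always-relevant facets.
        return list(BASE_FACETS)
    # Constrained by category: add whatever the metadata lists for it.
    # Unknown categories fall back to the base facets.
    return BASE_FACETS + CATEGORY_FACETS.get(category, [])


print(facet_fields_for("digital cameras"))
```

Keeping category in the constrained case lets users drill further into a deep hierarchy; dropping it once a category is chosen is an equally reasonable choice.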



-Hoss


Re: All facet.fields for a given facet.query?

Thomas Traeger-2
Chris Hostetter wrote:

> : To make it clear, I agree that it doesn't make sense to facet on all
> : available fields. I only want to facet on those 300 attributes that are
> : stored together with the fields for full-text searches. A
> : product/document typically has only 5-10 attributes.
> :
> : I'd like to decide at index time which attributes of a product might be of
> : interest for faceting and store those in dynamic fields named after the
> : attribute, with some kind of prefix or suffix to identify them at
> : query time as facet fields. Exactly the naming convention you mentioned.
>
> But if the facet fields are different for every document, and they use a
> simple dynamicField prefix (like "facet_*" for example), how do you know at
> query time which fields to facet on? ... Even if wildcards worked in
> facet.field, using facet.field=facet_* would require Solr to compute the
> counts for *every* field matching that pattern to find out which ones have
> positive counts for the current result set -- there may only be 5 that
> actually matter, but it's got to try all 300 of them to find out which 5
> those are.
I just made a quick test by building a facet query with those 300
attributes. I realized that the facets are built out of the whole index,
not the subset returned by the initial query. Therefore I get a large
number of empty facets, which I simply ignore. In my case the QueryTime
is somewhat higher (of course), but it is still only a few milliseconds.
(wow!!!) :o)

So at this stage of my investigation, and in my use case, I don't have
to worry about performance, even if I use the system in a way that uses
more resources than necessary.
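Ignoring the empty facets can be done on the client. This Python sketch assumes a response from Solr's JSON response writer in its default flat layout for facet_fields, where each field maps to an alternating [term, count, term, count, ...] list; the helper is hypothetical and the sample response is made up for illustration.

```python
import json


def nonempty_facets(response_text):
    """Drop facet fields whose constraints all have zero counts.

    Assumes the flat layout: {"field": [term1, count1, term2, count2, ...]}.
    """
    data = json.loads(response_text)
    facet_fields = data["facet_counts"]["facet_fields"]
    result = {}
    for field, flat in facet_fields.items():
        # Pair up the alternating term/count entries.
        pairs = zip(flat[0::2], flat[1::2])
        positive = [(term, count) for term, count in pairs if count > 0]
        if positive:
            result[field] = positive
    return result


# Made-up sample response, trimmed to the faceting section.
sample = """{"facet_counts": {"facet_queries": {}, "facet_fields": {
  "facet_megapixels": ["10", 4, "8", 1],
  "facet_paper_size": ["A4", 0, "Letter", 0]}}}"""
print(nonempty_facets(sample))  # {'facet_megapixels': [('10', 4), ('8', 1)]}
```

Asking Solr for facet.mincount=1 avoids transferring the zero counts in the first place.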
> This is where custom request handlers that understand the faceting
> "metadata" for your documents become key ... so you can say: "when
> querying across the entire collection, only try to facet on category and
> manufacturer. If the search is constrained by category, then look up other
> facet options to offer based on that category name from our metadata
> store," etc...
Faceting on manufacturers and categories first and then presenting the
corresponding facets might work under some circumstances, but in my case
the category structure is quite deep, detailed and complex. So when
the user enters a query I'd like to say to him: "Look, here are the
manufacturers and categories with matches for your query; choose one if you
want, but maybe there is another one with products that better fit your
needs, or products that you didn't even know about. So maybe you'd like to
filter based on the following attributes." Something like this ;o)

The point is that I currently don't want to know too much about the data;
I just want to feed it into Solr, follow some conventions, and get the most
out of it as quickly as possible. Optimizations can and will take place at
a later time.

I hope to find some time to dig into Solr's SimpleFacets this weekend.

Regards,

Tom

Re: All facet.fields for a given facet.query?

martin.grotzke
In reply to this post by Chris Hostetter-3
On Wed, 2007-06-20 at 12:49 -0700, Chris Hostetter wrote:

> : > I solve this problem by having metadata stored in my index which tells
> : > my custom request handler what fields to facet on for each category ...
> : How do you define this metadata?
>
> This might be a good place to start. Note that this message is almost two
> years old and predates the open-sourcing of Solr ... the "Servlet" referred
> to in this thread is Solr.
>
> http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-p748420.html
>
> ...I think I also talked a bit about the metadata documents in my
> ApacheCon slides from last year ... but I don't really remember, and I
> haven't looked at them in a while...
>
> http://people.apache.org/~hossman/apachecon2006us/
thx, I'll have a look at these resources.

cheers,
martin



Re: All facet.fields for a given facet.query?

Chris Hostetter-3
In reply to this post by Thomas Traeger-2

: I realized that the facets are built out of the whole index,
: not the subset returned by the initial query. Therefore I get a large
: number of empty facets, which I simply ignore. In my case the QueryTime

facet.mincount is a way to tell Solr not to bother giving you those 0
counts ... you will still get the name of the field, though, so that you
know it tried it.

: Faceting on manufacturers and categories first and then presenting the
: corresponding facets might work under some circumstances, but in my case
: the category structure is quite deep, detailed and complex. So when
: the user enters a query I'd like to say to him: "Look, here are the
: manufacturers and categories with matches for your query; choose one if you
: want, but maybe there is another one with products that better fit your
: needs, or products that you didn't even know about. So maybe you'd like to
: filter based on the following attributes." Something like this ;o)

Categories were just an example I used because they tend to be a common use
case ... my point is that deciding which facet qualifies for the
"maybe there is another one with products that better fit your needs" part
of the response requires either computing counts for *every* facet
constraint and then looking at them to see which ones provide good
distribution, or knowing something more about your metadata (i.e. having
stats that show the majority of people who search on the word "canon" want
to facet on "megapixels") .. this is where custom biz logic comes in,
because in a lot of situations computing counts for every possible facet
may not be practical (even if the syntax to request it were easier).
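As a rough illustration of "good distribution", here is one possible heuristic in Python; the rule itself is an assumption, not something Solr provides. It keeps a field only if the field has at least two populated constraints and no single constraint matches every document in the result. Note that it needs counts for every candidate field first, which is exactly the cost being discussed.

```python
def well_distributed(counts, num_found, min_values=2):
    """Hypothetical heuristic: does this field actually split the results?

    counts maps each constraint (term) to its count for the current query;
    num_found is the total number of matching documents.
    """
    nonzero = [c for c in counts.values() if c > 0]
    # Need several populated values, and none may cover every document.
    return len(nonzero) >= min_values and max(nonzero) < num_found


def pick_facets(all_counts, num_found):
    """Keep the fields whose counts pass the heuristic, in input order."""
    return [f for f, counts in all_counts.items()
            if well_distributed(counts, num_found)]


# Illustrative counts for a query with 10 matching documents.
all_counts = {
    "facet_megapixels": {"8": 6, "10": 4},
    "facet_brand": {"Canon": 10},     # a single value: no help narrowing
    "facet_paper_size": {"A4": 0},    # no hits at all
}
print(pick_facets(all_counts, num_found=10))  # ['facet_megapixels']
```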


-Hoss


Re: All facet.fields for a given facet.query?

Yonik Seeley-2
On 6/20/07, Chris Hostetter wrote:
> facet.mincount is a way to tell solr not to bother giving you those 0
> counts ...

An aside: shouldn't that be the default?  All of the people using
facets that I have seen always have to set facet.mincount=1 (or
facet.zeros=false)

-Yonik

Re: All facet.fields for a given facet.query?

Thomas Traeger-2
In reply to this post by Chris Hostetter-3

> : Faceting on manufacturers and categories first and then presenting the
> : corresponding facets might work under some circumstances, but in my case
> : the category structure is quite deep, detailed and complex. So when
> : the user enters a query I'd like to say to him: "Look, here are the
> : manufacturers and categories with matches for your query; choose one if you
> : want, but maybe there is another one with products that better fit your
> : needs, or products that you didn't even know about. So maybe you'd like to
> : filter based on the following attributes." Something like this ;o)
>
> Categories were just an example I used because they tend to be a common use
> case ... my point is that deciding which facet qualifies for the
> "maybe there is another one with products that better fit your needs" part
> of the response requires either computing counts for *every* facet
> constraint and then looking at them to see which ones provide good
> distribution, or knowing something more about your metadata (i.e. having
> stats that show the majority of people who search on the word "canon" want
> to facet on "megapixels") .. this is where custom biz logic comes in,
> because in a lot of situations computing counts for every possible facet
> may not be practical (even if the syntax to request it were easier).
I get your point, but how do you know where additional metadata is of value
if not by just trying? Currently I start with a generic approach to see what
is really in the product data, to get an overview of the quality of the data
and what happens if I use the data in the new search solution. Then I can
decide what needs to be done to optimize the system, i.e. try to reduce the
number of attributes, get marketing to split somewhat generic attributes
into more detailed ones, find a way to display the most relevant facets for
the current query first, and so on...

Tom

Re: All facet.fields for a given facet.query?

Chris Hostetter-3

: I get your point, but how do you know where additional metadata is of value
: if not by just trying? Currently I start with a generic approach to see what

Manpower.

For simple schemas the brute-force "facet on everything" approach can scale
well .. but as soon as you start talking about having hundreds of dynamic
fields where every product might be different, you have to either
accept that you're going to be fighting an uphill performance battle
-- or start explicitly classifying those fields in some way that lets you
know which ones to use in which use cases (or at the very least: which
order to try them in for which use cases, so you can do the most important
ones first and stop when you have some options to give the user).
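The "try them in order and stop early" idea, sketched in Python. count_facet is a hypothetical stand-in for issuing one faceting request for a single field and returning its {term: count} map; the priority order would come from whatever classification you maintain, and the fake counts below are made up.

```python
def first_useful_facets(priority_order, count_facet, wanted=5):
    """Try fields in priority order; stop once `wanted` fields have hits.

    count_facet(field) -> {term: count} is a hypothetical callback that
    stands in for one faceting request per field.
    """
    chosen = []
    for field in priority_order:
        counts = {t: n for t, n in count_facet(field).items() if n > 0}
        if counts:
            chosen.append((field, counts))
            if len(chosen) == wanted:
                break  # enough options for the user; skip the rest
    return chosen


# Usage with made-up counts; 'weight' is never requested.
fake = {"megapixels": {"8": 3}, "color": {}, "zoom": {"3x": 2}, "weight": {"200g": 1}}
calls = []


def count_facet(field):
    calls.append(field)
    return fake.get(field, {})


print(first_useful_facets(["megapixels", "color", "zoom", "weight"], count_facet, wanted=2))
```

One request per field keeps the early exit simple; batching a few facet.field values per request would trade some wasted counting for fewer round trips.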

You can even use the brute-force "facet on everything" approach in Solr to
help you find those patterns for classifying your fields ... you might
even be able to completely automate it ... but I'm guessing you're going
to want to do it in batch on the backend and not in real time every time a
user does a search.
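A sketch of the offline, batch version in Python: given an export of (category, facet fields present) pairs, it keeps, per category, the fields that appear in at least a chosen share of that category's documents. This counts field presence in exported documents rather than running Solr facet requests, the 0.5 threshold is arbitrary, and the helper name is hypothetical; the output could seed the kind of per-category metadata described earlier in the thread.

```python
from collections import Counter, defaultdict


def classify_fields(docs, min_share=0.5):
    """docs: iterable of (category, facet_field_names) from a full export.

    Returns {category: sorted fields present in >= min_share of its docs}.
    """
    per_category = defaultdict(Counter)
    sizes = Counter()
    for category, fields in docs:
        sizes[category] += 1
        # set() so a field repeated within one document counts once.
        per_category[category].update(set(fields))
    return {
        category: sorted(f for f, n in counter.items()
                         if n / sizes[category] >= min_share)
        for category, counter in per_category.items()
    }


export = [("cameras", ["megapixels", "zoom"]),
          ("cameras", ["megapixels"]),
          ("printers", ["ppm"])]
print(classify_fields(export))
```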




-Hoss


Re: All facet.fields for a given facet.query?

Chris Hostetter-3
In reply to this post by Yonik Seeley-2

: > facet.mincount is a way to tell solr not to bother giving you those 0
: > counts ...
:
: An aside: shouldn't that be the default?  All of the people using
: facets that I have seen always have to set facet.mincount=1 (or
: facet.zeros=false)

Hmmm... maybe, but it's a really easy option to turn on, and I think if
facet.mincount didn't default to 0, new users might get confused
when some constraints don't show up ... returning them with a 0 count
makes it clear Solr knows about them, tried them, and found no
intersection with the current results.


-Hoss