Minimum facet length?

project2501
Hi,
  I am exploring the faceted search results of Solr. My query is like this.

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick

If I don't use the prefix, I get back totals for words like 1, a, of, 2, 3, 4:
short letter/number occurrences in my documents. It's not really useful, since
all the documents have some free-floating single-digit numbers.

Is there a way to restrict the word frequency results for a facet based on
the length so I can set it to > 3 or is there a better way?

thanks,
Darren

Re: Minimum facet length?

Shalin Shekhar Mangar
On Thu, Jul 30, 2009 at 9:53 PM, <[hidden email]> wrote:

> Hi,
>  I am exploring the faceted search results of Solr. My query is like this.
>
>
> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
>
> If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4.
> 1 letter/number occurrences in my documents. It's not really useful since
> all the documents have some free floating single-digit numbers.
>
> Is there a way to restrict the word frequency results for a facet based on
> the length so I can set it to > 3 or is there a better way?
>

Yes, you can specify facet.mincount=3 to return only those terms present in
at least 3 documents. On a related note, a tokenized field (such as the text
type in the example schema) will create a large number of unique terms.
Faceting on such a field may not be very useful or efficient. Typically,
faceting is done on untokenized fields (such as the string type).
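A minimal Python sketch of the request being described, assuming the example server URL from the original query; `build_facet_url` is a hypothetical helper, not part of Solr or any client library. Note that facet.mincount is a document-count threshold (terms whose count is at least `mincount` are kept), so it cannot filter on how long a term is.

```python
from urllib.parse import urlencode

# Base URL taken from the original query; adjust for your own server.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_facet_url(field, mincount, limit=500, prefix=None):
    """Build a match-all faceting request (hypothetical helper).

    facet.mincount drops terms whose document count is below `mincount`;
    it says nothing about how many characters a term has.
    """
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
        ("facet.mincount", str(mincount)),
    ]
    if prefix is not None:
        params.append(("facet.prefix", prefix))
    # urlencode percent-encodes reserved characters such as '*' and ':'.
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_facet_url("text", mincount=3))
```

Solr decodes the percent-encoded `q=*:*` back to the match-all query, so the result is equivalent to the hand-written URL plus the extra facet.mincount parameter.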

--
Regards,
Shalin Shekhar Mangar.

Re: Minimum facet length?

Erik Hatcher

On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:

> On Thu, Jul 30, 2009 at 9:53 PM, <[hidden email]> wrote:
>
>> Hi,
>> I am exploring the faceted search results of Solr. My query is like this.
>>
>> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
>>
>> If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4.
>> 1 letter/number occurrences in my documents. It's not really useful since
>> all the documents have some free floating single-digit numbers.
>>
>> Is there a way to restrict the word frequency results for a facet based on
>> the length so I can set it to > 3 or is there a better way?
>>
>
> Yes, you can specify facet.mincount=3 to return only those terms present in
> at least 3 documents. On a related note, a tokenized field (such as text
> type in the example schema) will create a large number of unique terms.
> Faceting on such a field may not be very useful and/or efficient. Typically
> faceting is done on untokenized fields (such as string type).

I think what was meant by "> 3" was that faceting should only return terms
of length greater than 3, not terms with a count greater than 3.

You could copyField your text field to another field, give that field an
analyzer that includes a LengthFilterFactory with a minimum length
specified, and add other analysis tweaks to remove numbers and other stop
words.
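To illustrate the idea, here is a rough Python simulation of what such an analysis chain would do to the copied field before faceting. It is not Solr code: in a real schema the filtering would be configured on the field type of the copyField target, using `solr.LengthFilterFactory` (whose `min`/`max` attributes keep tokens whose length falls in that inclusive range) plus whatever filters you choose for numbers and stop words. The regex tokenizer, the stop-word set, and the sample documents are assumptions for illustration, and the sketch ignores `max`. Because `min` is inclusive, "length greater than 3" corresponds to a minimum of 4.

```python
import re
from collections import Counter

# Assumed stop words; a real schema would typically load these from a file.
STOP_WORDS = {"the", "and", "with", "from", "that", "this"}


def analyze(text, min_len=4):
    """Approximate a tokenizer + lowercase + number/stop/length filter chain."""
    tokens = re.findall(r"\w+", text.lower())
    return [
        t for t in tokens
        if not t.isdigit()          # drop free-floating numbers like 1, 2, 3
        and t not in STOP_WORDS     # drop common stop words
        and len(t) >= min_len       # like a length filter with min="4"
    ]


def facet_counts(docs, min_len=4):
    """Count documents per term, as field faceting does (not term frequency)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(analyze(doc, min_len)))  # set(): one count per document
    return counts


if __name__ == "__main__":
    docs = [
        "Chapter 1: a walk in the woods with 2 friends",
        "Chapter 2: the wicked witch of the woods",
        "Appendix 3 of 4",
    ]
    for term, count in facet_counts(docs).most_common():
        print(term, count)
```

Tokens such as 1, a, and of never reach the index of the copied field, so they cannot appear as facet values, while the original text field stays fully searchable.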

        Erik


Re: Minimum facet length?

Shalin Shekhar Mangar
On Thu, Jul 30, 2009 at 10:35 PM, Erik Hatcher <[hidden email]> wrote:

> I think what was meant by > 3 was if faceting only returned terms of length
> greater than 3, not count.
>

Ah, sorry. I was too fast to reply.

--
Regards,
Shalin Shekhar Mangar.

Re: Minimum facet length?

project2501
In reply to this post by Erik Hatcher
Hi Erik,
  Thanks for the tip. Hmmmm, well that's a good point, or maybe I will
just do the word filtering upfront and store it separately now that I
think about it more.
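A rough Python sketch of that upfront approach, under stated assumptions: the field name `facet_words` is hypothetical and would need to exist in the schema as a multivalued string (untokenized) field, and the letters-only regex is just one way to keep numbers out. Words shorter than `min_len` (4 here, i.e. longer than 3 characters) are dropped before the document is sent to Solr, so a request with facet.field=facet_words would only see the longer words.

```python
import json
import re


def prepare_doc(doc_id, text, min_len=4):
    """Build a document with a hypothetical `facet_words` field for faceting.

    Only runs of ASCII letters a-z (after lowercasing) are kept, so digits
    never become facet terms; words shorter than `min_len` are dropped and
    duplicates are removed while preserving first-seen order.
    """
    words = []
    seen = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if len(word) >= min_len and word not in seen:
            seen.add(word)
            words.append(word)
    return {"id": doc_id, "text": text, "facet_words": words}


if __name__ == "__main__":
    doc = prepare_doc("doc1", "The 3 wicked witches of the West")
    print(json.dumps(doc, indent=2))
```

Keeping the filtered words in an untokenized field also follows the earlier advice that faceting works best on string-type fields.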

Darren

On Thu, 2009-07-30 at 13:05 -0400, Erik Hatcher wrote:

> You could copyField your text field to another field, set the analyzer  
> to include a LengthFilterFactory with a minimum length specified, and  
> also have other analysis tweaks to have numbers and other stop words  
> removed.
>
> Erik
>