Faceted Searching problems

Andre Basse
Hi all,
 
I just installed the nightly build to try the Faceted Searching. After some testing I discovered that some characters are missing in the result XML and that fields with "/" characters are sometimes split into several entries.
 
Example:
<int name="franc">1</int> should be France
<int name="culturefestiv">1</int> should be Culture/Festivals

Please find details below.
 
Original XML
=========
 
<str name="section">Metro</str>
 
<arr name="classification">
<str>Culture/Film</str>
<str>Culture/Festivals</str>
</arr>

<arr name="geoloc">
<str>France</str>
<str>Sydney</str>
</arr>
 
 
 
SOLR response for the query
=====================
(http://192.168.157.128:8983/solr/select/?q=Bellucci&rows=0&facet=true&facet.limit=5&facet.field=section&facet.field=geoloc&facet.field=classification)
 
<response>
 <responseHeader>
  <status>0</status>
  <QTime>518</QTime>
 </responseHeader>
 <result numFound="2" start="0"/>
 <lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
   <lst name="section">
    <int name="metro">2</int>
    <int name="busi">0</int>
    <int name="career">0</int>
    <int name="comput">0</int>
    <int name="domain">0</int>
   </lst>
   <lst name="geoloc">
    <int name="franc">1</int>
    <int name="sydney">1</int>
    <int name="act">0</int>
    <int name="adelaid">0</int>
    <int name="afghanistan">0</int>
   </lst>
   <lst name="classification">
    <int name="cultur">1</int>
    <int name="culturefestiv">1</int>
    <int name="culturefilm">1</int>
    <int name="festiv">1</int>
    <int name="film">1</int>
   </lst>
  </lst>
 </lst>
</response>
 
 
Any help is much appreciated!
 
 
Thanks,
 
Andre
 
 
 


*********************************************************************************
The information contained in this e-mail message and any accompanying files is or may be confidential.  If you are not the intended recipient, any use, dissemination, reliance, forwarding, printing or copying of this e-mail or any attached files is unauthorised. This e-mail is subject to copyright. No part of it should be reproduced, adapted or communicated without the written consent of the copyright owner. If you have received this e-mail in error, please advise the sender immediately by return e-mail, or telephone and delete all copies. Fairfax does not guarantee the accuracy or completeness of any information contained in this e-mail or attached files. Internet communications are not secure, therefore Fairfax does not accept legal responsibility for the contents of this message or attached files.
*********************************************************************************

Re: Faceted Searching problems

Chris Hostetter-3

: I just installed the nightly build to try the Faceted Searching . After
: some testing I discovered that some characters are missing in the result
: XML and that fields with "/" chars are sometimes split into two entries.

I believe what you are encountering is an issue of tokenization (or
analysis) ... you didn't post your schema.xml, but i'm guessing these two
fields have a datatype that is analyzed, right?  Take a look at the
followup posts in this recent thread...

http://www.nabble.com/Error-in-faceted-browsing-tf2267819.html

...i'll try to update the docs for facet.field to make this more obvious.


-Hoss

Re: Faceted Searching problems

Yonik Seeley-2
In reply to this post by Andre Basse
On 9/13/06, Andre Basse <[hidden email]> wrote:
> Example:
> <int name="franc">1</int> should be France
> <int name="culturefestiv">1</int> should be Culture/Festivals

Hi Andre,

Field faceting works over the indexed terms... so you get back what
was indexed (word splitting, lowercasing, stemming, etc...  the
process is not generally reversible).

Perhaps your "classification" field should be of type "string", which is
indexed but not analyzed at all.  If you need some analysis (for example, if
you also want a query of "Festival" to match against
"Culture/Festivals"), then you should index the field a second time as a
non-tokenized (non-analyzed) "string" type.  This can be easily done
with an extra field definition and a copyField statement in the
schema.xml.
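
A minimal sketch of that setup follows. It assumes the existing field is called "classification" and uses an analyzed type named "text", and that "string" is declared as solr.StrField, as in the example schema shipped with Solr. None of this was posted in the thread, and the name "classification_facet" is hypothetical.

```xml
<!-- schema.xml fragment (sketch, not the poster's actual schema) -->
<fields>
  <!-- analyzed copy: used for searching, so "Festival" can match "Culture/Festivals" -->
  <field name="classification" type="text" indexed="true" stored="true" multiValued="true"/>
  <!-- verbatim copy: used only for faceting; the field name is hypothetical -->
  <field name="classification_facet" type="string" indexed="true" stored="false" multiValued="true"/>
</fields>

<!-- copy the raw input value into the facet field at index time -->
<copyField source="classification" dest="classification_facet"/>
```

With something like this in place, the request would facet on the new field (facet.field=classification_facet) while q keeps searching the analyzed one. Documents have to be re-indexed before the new field contains any terms.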

-Yonik

RE: Faceted Searching problems

Andre Basse
In reply to this post by Andre Basse
Sorry, please ignore that email. Problem solved (I should read more mails...)

Thanks to Jeff.

Re: Faceted Searching problems

Erik Hatcher
In reply to this post by Andre Basse
You need to use an untokenized field for facets.  I can see we're
going to get this question frequently now - it was mentioned earlier
today in fact.  You can use a <copyField> to copy the value into an
untokenized field, so that you have one field for searching and one
for facets.

You are obviously using a stemming analyzer, and that is why France  
became franc, etc - just to explain why you are seeing those terms  
listed.
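
For illustration, an analyzer chain along these lines would produce exactly the terms in the response. This is a guess, since the schema.xml was never posted; it is modelled on the "text" type in the example schema of that era, and the attribute values are assumptions.

```xml
<!-- Guess at the field type behind "classification" and "geoloc" (sketch only) -->
<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splits "Culture/Festivals" into "Culture" and "Festivals";
         catenateWords="1" additionally emits "CultureFestivals" -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Porter stemming: "france" becomes "franc", "festivals" becomes "festiv" -->
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldtype>
```

Under that assumption the single value "Culture/Festivals" is indexed as the three terms "cultur", "festiv" and "culturefestiv", which is why one input value shows up as several facet entries.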

        Erik


Re: Faceted Searching problems

Erik Hatcher
In reply to this post by Chris Hostetter-3

On Sep 13, 2006, at 9:37 PM, Chris Hostetter wrote:
> http://www.nabble.com/Error-in-faceted-browsing-tf2267819.html
>
> ...i'll try to update the docs for facet.field to make this more  
> obvious.

Would it ever make sense to generate facets on a tokenized field?  
Maybe the facet implementation could throw an error if the field name  
specified is tokenized?

        Erik

Re: Faceted Searching problems

Yonik Seeley-2
On 9/13/06, Erik Hatcher <[hidden email]> wrote:
> Would it ever make sense to generate facets on a tokenized field?
> Maybe the facet implementation could throw an error if the field name
> specified is tokenized?

I think it probably can make sense...
 - finding the top terms in a full-text field among the documents that
match a query could be useful
 - the analysis could be there just for normalization, such as trimming
whitespace
 - it allows more flexibility on how to represent tags... one may
already have tags in a whitespace-delimited field rather than separate
values in a multi-valued field.
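
As a sketch of the last two points, a field type that only splits on whitespace and lowercases would let a single-valued, space-separated tag string be faceted per tag. The names "tags_ws" and "tags" are hypothetical and not from this thread.

```xml
<!-- Sketch only: light analysis that is still safe to facet on -->
<fieldtype name="tags_ws" class="solr.TextField">
  <analyzer>
    <!-- one token per whitespace-separated tag, no stemming or word splitting -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- normalization only: "Film" and "film" count as the same facet value -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

<field name="tags" type="tags_ws" indexed="true" stored="true"/>
```

A document indexed with tags "Film Festival" would then contribute to the facet values "film" and "festival". The returned values are still the indexed (lowercased) terms, not the original text.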

-Yonik
Re: Faceted Searching problems

Yonik Seeley-2
In reply to this post by Erik Hatcher
On 9/13/06, Erik Hatcher <[hidden email]> wrote:
> You need to use an untokenized field for facets.

At least 3 answers in 5 minutes... we should try synchronized swimming ;-)

-Yonik
RE: Faceted Searching problems

Andre Basse
In reply to this post by Andre Basse

Time to say: Thank you all for your great support!


-Andre


