How to index a facetfield by searching words matching from another Textfield

Xavier
Hi everyone,

I'm a new Solr user, but I used to work with Endeca.

Endeca has a module called "TextTagger" that automatically indexes values into a (multivalued) facet field whenever it finds words from a given word list in another text field of the same document.

I haven't found any discussion of this, or any way to do it, in Solr. Is there one?

Thanks in advance ;)
Re: How to index a facetfield by searching words matching from another Textfield

Em
Hi Xavier,

Sounds like a job for KeepWordFilter!

From the javadocs:
"A TokenFilter that only keeps tokens with text contained in the
required words. This filter behaves like the inverse of StopFilter."

However, you have to provide the word list as a .txt file.

By using copyFields and the KeepWordFilter you are able to achieve what
you want.
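
A minimal schema.xml sketch of that suggestion, under stated assumptions: the names `body`, `tags_facet`, and `text_keep` are hypothetical, a field type called `text` is assumed to exist already, and `keepwords.txt` is assumed to sit in the core's conf directory with one word per line.

```xml
<!-- Sketch only: field and type names here are hypothetical -->
<fieldType name="text_keep" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drops every token that is not listed in keepwords.txt -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- Source text field; the "text" type is assumed to be defined elsewhere -->
<field name="body" type="text" indexed="true" stored="true"/>
<!-- Facet field: receives the same raw text, but only the kept words get indexed -->
<field name="tags_facet" type="text_keep" indexed="true" stored="false" multiValued="true"/>

<copyField source="body" dest="tags_facet"/>
```

Faceting on `tags_facet` should then return only the terms that survived the keep-word filter.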

Kind regards,
Em

On 20.02.2012 17:28, Xavier wrote:

> Hi everyone,
>
> I'm a new Solr user, but I used to work with Endeca.
>
> Endeca has a module called "TextTagger" that automatically indexes
> values into a (multivalued) facet field whenever it finds words from a
> given word list in another text field of the same document.
>
> I haven't found any discussion of this, or any way to do it, in Solr.
> Is there one?
>
> Thanks in advance ;)
Re: How to index a facetfield by searching words matching from another Textfield

Xavier
That's it! Thanks :)

This is the first time I've seen that documentation page (which is really helpful): http://lucidworks.lucidimagination.com/display/solr/Filter+Descriptions#FilterDescriptions-KeepWordsFilter

Now I want to associate a word list with a value of an existing facet.

So I tried combining synonyms and keep words like this:

<fieldType name="text_tag" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonymswords.txt"/>
        <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
    </analyzer>
</fieldType>

It works very well, but my problem now is that I want the synonym filter to return a value containing whitespace and then match it against my keep words (because the values of my facet contain whitespace).

For example, if I see the term 'php', my synonyms file maps it to 'web langage',
and I want to keep the whole phrase 'web langage'.

So my files are:
synonymswords.txt : php=>web langage
keepwords.txt : web langage

The problem is that each word is analyzed separately, and I don't know how to handle the whitespace...
(the synonym filter returns 'web' and 'langage', so neither token matches 'web langage')

I tried to use 'solr.PatternReplaceFilter' (<filter class="solr.PatternReplaceFilter" pattern="_" replacement=" "/>) with '_' as a stand-in for the space character, but I get an error. If you have another tip for me, that would be great :p
Re: How to index a facetfield by searching words matching from another Textfield

Xavier
It seems the documentation has an error: 'Factory' is missing from the class name!

I found that

<filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" "/>

works fine!

In conclusion, I have these files:
synonymswords.txt :
php,mysql,html,css=>web_langage

and

keepwords.txt :
web langage

with this fieldType:

<fieldType name="text_tag" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonymswords.txt"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" "/>
        <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
    </analyzer>
</fieldType>


And it works fine ;)
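
A possible alternative to the underscore placeholder, offered as an untested assumption rather than something verified in this thread: SynonymFilterFactory accepts a `tokenizerFactory` attribute that controls how the entries of the synonyms file are parsed. With `solr.KeywordTokenizerFactory`, each side of a rule should be read as a single token, so the rule could be written with a real space (`php,mysql,html,css=>web langage`) and the PatternReplaceFilterFactory step dropped. The type name `text_tag_kw` is hypothetical.

```xml
<!-- Sketch only: same chain as the fieldType above, without the "_" placeholder.
     Assumes synonymswords.txt contains: php,mysql,html,css=>web langage -->
<fieldType name="text_tag_kw" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- tokenizerFactory applies to the synonyms file, not to the document text -->
        <filter class="solr.SynonymFilterFactory" synonyms="synonymswords.txt"
                tokenizerFactory="solr.KeywordTokenizerFactory"/>
        <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
    </analyzer>
</fieldType>
```

Checking the resulting tokens on the Solr admin analysis page before relying on this would be prudent, since the behavior depends on the Solr version in use.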


But I have another question. My fields are configured like this:

<copyField source="mytext" dest="text_tag_facet" />
<field name="text_tag_facet" type="text_tag" indexed="true" stored="false" multiValued="true"/>

But if I set "stored" to "true", the "text_tag_facet" field of my documents always returns the full original text, not the facet values created (like 'web langage').

How can I get the facet values in the document's stored field?
Re: How to index a facetfield by searching words matching from another Textfield

Erick Erickson
Setting stored="true" simply places a verbatim copy
of the input in the index. Returning that field in
a document simply returns that verbatim copy;
there's no way to do anything else.

The facet *values* you get back in your response should
be what you put in your index, though. Why doesn't that
suffice?
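
To illustrate the distinction: with faceting requested on the field (for example `facet=true&facet.field=text_tag_facet`), the facet section of the XML response lists the indexed terms, while the document's stored field carries the original text. The fragment below is only a sketch of the response shape; the count is a made-up placeholder, not output from this thread.

```xml
<!-- Illustrative shape of a facet response; the count 3 is a placeholder -->
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="text_tag_facet">
      <!-- Indexed term produced by the analyzer chain, not the stored text -->
      <int name="web langage">3</int>
    </lst>
  </lst>
</lst>
```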

BTW, it's best to start a new thread rather than switch
topics mid-stream, see:

http://people.apache.org/~hossman/#threadhijack

Best
Erick


On Tue, Feb 21, 2012 at 8:35 AM, Xavier <[hidden email]> wrote:

> It seems the documentation has an error: 'Factory' is missing from
> the class name!
>
> I found that
>
> <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" "/>
>
> works fine!
>
> In conclusion, I have these files:
> *synonymswords.txt :*
> php,mysql,html,css=>web_langage
>
> and
>
> *keepwords.txt :*
> web langage
>
> with this fieldType:
>
> <fieldType name="text_tag" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>     <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonymswords.txt"/>
>         <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" "/>
>         <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
>     </analyzer>
> </fieldType>
>
>
> And it works fine ;)
>
>
> But I have another question. My fields are configured like this:
>
> <copyField source="mytext" dest="text_tag_facet" />
> <field name="text_tag_facet" type="text_tag" indexed="true" stored="false" multiValued="true"/>
>
> But if I set "stored" to "true", the "text_tag_facet" field of my
> documents always returns the full original text, not the facet values
> created (like 'web langage').
>
> How can I get the facet values in the document's stored field?
Re: How to index a facetfield by searching words matching from another Textfield

Xavier
Thanks for the answer.

I have posted my new question (related to this post) in a new topic ;)

( http://lucene.472066.n3.nabble.com/How-to-merge-an-quot-autofacet-quot-with-a-predefined-facet-td3763988.html )


Best regards