Dutch analyzer in combo with custom stopword list


Dutch analyzer in combo with custom stopword list

nicksnels1
Hi,

I have replaced the English stopwords with Dutch stopwords, and I also
managed to get the Dutch analyzer to work without throwing an error. The
following works:

    <fieldtype name="nametext" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.nl.DutchAnalyzer"/>
    </fieldtype>

But why doesn't the following work?

    <fieldtype name="nametext" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
        <analyzer class="org.apache.lucene.analysis.nl.DutchAnalyzer"/>
      </analyzer>
    </fieldtype>

The problem is that the Dutch analyzer doesn't filter all the stopwords, so
I made an extended stopword list. But the above configuration doesn't work.
How can I make it work? I hope somebody can help me out.

Kind regards,

Nick

Re: Dutch analyzer in combo with custom stopword list

Yonik Seeley
On 6/22/06, Nick Snels <[hidden email]> wrote:

> But why doesn't the following work
>
>     <fieldtype name="nametext" class="solr.TextField">
>       <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StandardFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
>         <analyzer class="org.apache.lucene.analysis.nl.DutchAnalyzer"/>
>       </analyzer>
>     </fieldtype>

An analyzer *is* a tokenizer followed by multiple token filters, so
you can't really put an analyzer in another analyzer.

Probably the right way to handle this is to make a Factory for the
stemmer filter only.
One way is to enhance the existing SnowballPorterFilterFactory in
Solr to make the language configurable, and to allow it to take an
exclusion or protected words list.
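
As a rough sketch of what that could look like in the schema: the
fragment below assumes a Solr version in which
SnowballPorterFilterFactory accepts a "language" attribute and a
"protected" word list (neither was available when this was written),
and "protwords.txt" is a hypothetical file name. Snowball's Dutch
stemmer may not produce exactly the same stems as DutchAnalyzer.

    <fieldtype name="nametext" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- the custom, extended Dutch stopword list -->
        <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
        <!-- assumed attributes; protwords.txt is a hypothetical file of words to leave unstemmed -->
        <filter class="solr.SnowballPorterFilterFactory" language="Dutch" protected="protwords.txt"/>
      </analyzer>
    </fieldtype>

The stemmer is placed last so that stopwords are matched against the
lowercased, unstemmed tokens.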

-Yonik

Re: Dutch analyzer in combo with custom stopword list

nicksnels1
Hi Yonik,

Thanks for the advice. The factory works like a charm!

Kind regards,

Nick
