schema field type doesn't work

Dimitar Ouzounov
Hi everybody,
I added the following fieldtype to schema.xml:


   <fieldtype name="st_numbers" class="solr.TextField">
      <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" />
          <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
   </fieldtype>

I want to index two kinds of strings, for example:

12345678
1234-5678

No matter which of the two strings is stored, I'd like to match it using
either 12345678 or 1234-5678.
Everything works fine except in one case: when 12345678 is stored and I
try to match it using 1234-5678. I must be doing something wrong, maybe in
the schema. Does anyone have any suggestions?
Any help would be greatly appreciated.

Re: schema field type doesn't work

Bertrand Delacretaz
On 3/24/07, Dimitar Ouzounov <[hidden email]> wrote:

> ...I must be doing something wrong, maybe in the schema. Does anyone
> have any suggestions?..

The best way to debug such problems is with the analyzer admin tool:
http://localhost:8983/solr/admin/analysis.jsp

You can try various combinations of analyzers and see what Solr
actually indexes for various values.

HTH,
-Bertrand

Re: schema field type doesn't work

Yonik Seeley
On 3/24/07, Bertrand Delacretaz <[hidden email]> wrote:
> On 3/24/07, Dimitar Ouzounov <[hidden email]> wrote:
>
> > ...I must be doing something wrong, maybe in the schema. Does anyone
> > have any suggestions?..
>
> The best way to debug such problems is with the analyzer admin tool:
> http://localhost:8983/solr/admin/analysis.jsp

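Yep...
Trying the analysis page, one can see that the parts of the numbers (not
just the catenation) are still being generated, and they mess up the
query: a query for 1234-5678 also looks for 1234 and 5678, which a
document indexed as 12345678 doesn't contain.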

So if 123-456 is indexed, and you also want to be able to match parts
of that number (like 123), then you need a query analyzer and an index
analyzer for the field type, and turn off generation of parts for the
query analyzer only.
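A minimal sketch of what that could look like, reusing the st_numbers field type from the original post. The type="index" and type="query" attributes on <analyzer> tell Solr which chain to apply when indexing and which when querying; the exact WordDelimiterFilterFactory attribute values below are one reasonable choice and were not spelled out in the thread.

```xml
<fieldtype name="st_numbers" class="solr.TextField">
  <!-- Index time: keep the parts (1234, 5678) AND the catenation (12345678),
       so both partial-number and whole-number queries can match -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: parts turned off, so only the catenation is emitted
       and a query for 1234-5678 becomes the single token 12345678 -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

With this split, a stored 1234-5678 should be findable by 1234, 5678, 12345678 or 1234-5678; the analysis page is the place to confirm that. Since the index-time analyzer changed, existing documents need to be reindexed before the new tokens show up.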

If you don't want to match parts, then a single analyzer for both
querying and indexing will do, but you have to turn off part generation
explicitly, since it's on by default:
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0" generateNumberParts="0"
                catenateWords="0" catenateNumbers="0" catenateAll="1"/>
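For reference, here is a sketch of the field type from the original post with that filter line dropped in. Only the WordDelimiterFilterFactory attributes differ from the original; a single <analyzer> with no type attribute is applied both at index time and at query time.

```xml
<fieldtype name="st_numbers" class="solr.TextField">
  <!-- One analyzer, used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Parts off, catenation on: 12345678 and 1234-5678 should both
         reduce to the single token 12345678 -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

The trade-off is that a query for just 1234 will no longer match a stored 1234-5678, because that part is never indexed.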

-Yonik

Re: schema field type doesn't work

Dimitar Ouzounov
Thanks a lot! The analyzer admin tool is indeed useful.

On 3/24/07, Yonik Seeley <[hidden email]> wrote:

> If you don't want to match parts, then a single analyzer for both
> query and indexing will do, but explicitly turn off part generation:
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="0" generateNumberParts="0" catenateWords="0"
> catenateNumbers="0" catenateAll="1"/>
>
> -Yonik
>