Significance of Analyzer Class attribute


Rajinimaski
Hi, what is the significance of the analyzer class attribute?


When I specify an analyzer class in the schema, as below, and run
analysis on this field in the analysis page, I can't see the verbose
output for the tokenizer and filters:

<fieldType name="text_chinese" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>


*But if I don't add the analyzer class, I can see the verbose output for
the tokenizer and filters applied.*

<fieldType name="text_chinese" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>

Why can't I see the verbose output in the first case? What happens when I
specify an analyzer class? Does Solr use a default if I do not set the
class attribute on the analyzer tag?



Thanks & Regards
Rajani

Re: Significance of Analyzer Class attribute

iorixxx

> When I specify analyzer class in schema,  something
> like below and do
> analysis on this field in analysis page : I cant  see
> verbose output on
> tokenizer and filters
>
> <fieldType name="text_chinese"
> class="solr.TextField">
>       <analyzer
> class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
>   <tokenizer
> class="solr.SmartChineseSentenceTokenizerFactory"/>
>   <filter
> class="solr.SmartChineseWordTokenFilterFactory"/>
>   </analyzer>
>     </fieldType>
>
>
> *But if i don't add analyzer class, I can see the verbose
> output based on
> token and filters applied.*

The config above is wrong: you cannot combine an analyzer class with tokenizer and filter elements. If you want to use a Lucene analyzer in schema.xml, there should be only the analyzer definition, with no tokenizer or filter inside it.

It is highly recommended to use Solr's charFilter(s), tokenizer, and tokenFilter(s) in schema.xml.
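
For illustration, here is a minimal sketch of the analyzer-only form. It assumes the jar containing SmartChineseAnalyzer is on Solr's classpath, and it reuses the fieldType name from the question:

```xml
<!-- Analyzer-only form: the class attribute names a Lucene Analyzer,
     and no tokenizer or filter children are listed. -->
<fieldType name="text_chinese" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
</fieldType>
```

With this form, the analysis page can only show the analyzer's final output, not the intermediate stages.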




Re: Significance of Analyzer Class attribute

Lance Norskog-2
An Analyzer object is a chain of a Tokenizer and TokenFilters. A text
type definition either uses an analyzer class or describes the
Tokenizer and TokenFilters directly. An Analyzer class creates its
own sequence of a Tokenizer and possibly TokenFilters, hard-coded in
the analyzer class. In schema.xml, you will find text types with
Tokenizer/Filter chains, or with just an Analyzer.

Take the Analyzer class out of the specification.

(In reply to Ahmet Arslan's message of Wed, Jul 25, 2012 at 5:19 AM.)



--
Lance Norskog
[hidden email]

Re: Significance of Analyzer Class attribute

Chris Hostetter-3
In reply to this post by iorixxx

: > When I specify analyzer class in schema,  something
: > like below and do
: > analysis on this field in analysis page : I cant  see
: > verbose output on
: > tokenizer and filters

The reason is that if you use an explicit Analyzer
implementation, the analysis tool doesn't know what the individual phases
of the tokenfilters are -- the Analyzer API doesn't expose that
information (some Analyzers may be monolithic and not made up of
individual TokenFilters).


 : > <fieldType name="text_chinese"
: > class="solr.TextField">
: >       <analyzer
: > class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
: >   <tokenizer
        ...
 
: Above config is somehow wrong. You cannot use both analyzer combined
: with tokenizer and filter altogether. If you want to use lucene analyzer
: in schema.xml there should be only analyzer definition.

Right.  What's happening here is that since a "class" is specified for the
analyzer, it is ignoring the tokenizer+tokenfilters listed.  I've opened a
bug to add better error checking to catch these kinds of configuration
mistakes...

https://issues.apache.org/jira/browse/SOLR-3683
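
To make the behaviour concrete, here is the configuration from the question again, annotated with XML comments. The comments only restate what is described in this thread for the Solr version in use at the time; a version that includes the error checking requested in SOLR-3683 may report a configuration error instead.

```xml
<fieldType name="text_chinese" class="solr.TextField">
  <!-- Because class is set, SmartChineseAnalyzer is used as-is. -->
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
    <!-- Ignored: these two children have no effect here. -->
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>
```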


-Hoss

Re: Significance of Analyzer Class attribute

Rajinimaski
Hi All,

  Thank you for the replies.



--Regards
Rajani


(In reply to Chris Hostetter's message of Fri, Jul 27, 2012 at 9:58 AM.)