Questions about schema.xml

johnmunir

Hi,


Can someone help me understand the meaning of <analyzer type="index"> and <analyzer type="query"> in schema.xml, how they are used, and what I get back when the two are not the same?


For example, given:


<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
   <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
</fieldType>


If I make the entire content of "index" the same as "query" (or the other way around), how will that impact my search?  And why would I not want to make those two blocks the same?


Thanks!!!


-MJ

Re: Questions about schema.xml

Prithu Banerjee
Those two values specify which analyzer is applied at which stage. There
are two kinds. The "index" analyzer processes the input documents to build
the index. The "query" analyzer processes your query. Typically the index
and query analyzers are the same, so that a query produces exactly the
tokens that were created while indexing. But you are free to customize
either analyzer according to your needs.

--
best regards,
Prithu


Re: Questions about schema.xml

johnmunir
Thanks Prithu.


But why would I use different settings for index and query?  I would think that if the settings are not the same for both, search results would be confusing for end users, no?  To illustrate my point (this may be drastic): if I leave out "solr.LowerCaseFilterFactory" in one of the two, then many searches (mixed-case, for example) won't give me any hits.  A more realistic example: if the rules for "solr.WordDelimiterFilterFactory" don't match, again, I could miss hits.  If my understanding is correct, and there is value in using different rules for "query" and "index", I'd like to see a concrete example, a use case I can apply.


-- MJ




Re: Questions about schema.xml

Jack Krupansky-2
In reply to this post by johnmunir
Many token filters will be used 100% identically for both "index" and
"query" analysis, but WordDelimiterFilter is a rare exception. The issue is
that at index time it has the ability to generate multiple tokens at the
same position (the "catenate" options), any of which can be queried, but at
query time it can be problematic to have these "extra" terms (except in some
conditions), so the WDF settings suppress generation of the extra terms.

Another example is synonyms - generate extra terms at index time for greater
precision of searches, but limit the query terms to exclude the "extra"
terms.

That's the reason for the occasional asymmetry between index-time and
query-time analyzers.
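
To make the asymmetry concrete, here is a minimal sketch of a field type whose index and query chains differ only in the two places described above. The name "text_asym" is made up for illustration, and synonyms.txt is the file name from the original post. Treat it as a sketch under those assumptions, not a recommended configuration.

```xml
<!-- Hypothetical field type; "text_asym" is an illustrative name -->
<fieldType name="text_asym" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- index time: expand synonyms so the extra terms are stored in the index -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- index time: catenateWords="1" also emits the joined form of split words -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- query time: no synonym filter and no catenation, so no extra terms -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that the lowercase filter is identical on both sides; only the filters that generate extra terms differ.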

-- Jack Krupansky


Re: Questions about schema.xml

Erick Erickson
And, in fact, you do NOT need to have two. If they are both identical, just
specify one analysis chain with no qualifier, i.e.
<analyzer>
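
A minimal sketch of that single-chain form follows. The field type name "text_simple" is hypothetical, and the filters are a subset of the ones in the original post, chosen only to keep the example short.

```xml
<!-- Hypothetical field type; "text_simple" is an illustrative name -->
<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: this one chain is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```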



Re: Questions about schema.xml

johnmunir

Thank you everyone for your explanations.  So for WordDelimiterFilter, let me see if I got it right.


Given that the out-of-the-box setting for catenateWords is "0" for query but "1" for index, I don't see how this will give me any hits.  That is, if my document has "wi-fi", at index time it will be stored as "wifi".  Well, then at query time if I type "wi-fi" (without quotes) I will be searching for "wi fi" and thus won't get a hit, no?


What about when I *do* quote my search, i.e. I search for "wi-fi" with quotes: what am I sending to the searcher, "wi-fi", "wi fi" or "wifi"?  Again, this is using the default out-of-the-box settings per the above.


The same applies for catenateNumbers.


Btw, I'm looking at this link for the above values: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters


--MJ






Re: Questions about schema.xml

Jack Krupansky-2
The default setting should index BOTH "wi fi" and "wifi". A query for
"wi-fi", with or without quotes, will search for the phrase "wi fi".
Incidentally, that behavior is the "autoGeneratePhraseQueries" feature.
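
As a rough trace of the above, the two WordDelimiterFilterFactory lines from the original schema are repeated here, with comments showing the tokens one would expect for the input "wi-fi" before lowercasing and stemming. Exact token positions depend on the Solr version, so the comments are an expectation to verify on your own install, not authoritative output.

```xml
<!-- index chain: catenateWords="1" -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<!-- "wi-fi" is expected to yield the tokens: wi, fi, wifi -->

<!-- query chain: catenateWords="0" -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<!-- "wi-fi" is expected to yield the tokens: wi, fi
     (searched as the phrase "wi fi" when autoGeneratePhraseQueries="true") -->
```

Because the index holds "wi" and "fi" as adjacent tokens in addition to "wifi", the phrase query matches the document. That is why the mismatch in catenateWords does not cause missed hits here.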

-- Jack Krupansky


Re: Questions about schema.xml

Erick Erickson
You should get familiar with the admin/analysis page. It's invaluable for
understanding _exactly_ what your analysis chain does with various inputs.
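
The same analysis can also be requested over HTTP when the field analysis handler is registered in solrconfig.xml. The line below is a sketch of that registration as it appears, as far as I recall, in the stock example config; the path in the name attribute is whatever you choose. The handler takes request parameters such as analysis.fieldtype, analysis.fieldvalue and analysis.query.

```xml
<!-- solrconfig.xml: registers the field analysis handler at /analysis/field -->
<requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler"/>
```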

Best
Erick

