Problems with WordDelimiterFilterFactory

Problems with WordDelimiterFilterFactory

bernieh
We are having some issues with our Solr parent application not retrieving records as expected.

For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) is not retrieved; if the input query does not include the colon, all is fine. The same happens with queries containing hyphens, with one qualifier: "asia-civilization" (no spaces either side of the hyphen) works fine, whereas "asia - civilization" (spaces either side of the hyphen) doesn't.

Our schema.xml contains the following -

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: [hidden email]
Email: [hidden email]<mailto:[hidden email]>
Website: http://www.deakin.edu.au
<http://www.deakin.edu.au/>Deakin University CRICOS Provider Code 00113B (Vic)

Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone.
Deakin University does not warrant that this email and any attachments are error or virus free


Re: Problems with WordDelimiterFilterFactory

Christian Zambrano
Could you please provide the exact URL of a query where you are
experiencing this problem?
E.g. (not URL-encoded): q=fieldName:"hot and cold: temperatures"
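
Since the example query is shown without URL encoding, here is a minimal Python sketch of how such a query could be percent-encoded before it is sent. The host, port and `/solr/select` path are assumptions about a typical installation, `build_select_url` is a hypothetical helper name, and `fieldName` is just the placeholder from the example.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select handler; adjust to the real installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query: str, base_url: str = SOLR_SELECT_URL) -> str:
    """Return a select URL whose q parameter is percent-encoded."""
    # urlencode escapes the colon and quotes and turns spaces into '+'.
    return base_url + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_select_url('fieldName:"hot and cold: temperatures"'))
```

Logging the URL produced this way makes it easy to see exactly what the application sends to Solr.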

On 10/07/2009 05:32 PM, Bernadette Houghton wrote:

> We are having some issues with our solr parent application not retrieving records as expected.

RE: Problems with WordDelimiterFilterFactory

bernieh
Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:30000601

Either scroll down and click one of the "television broadcasting -- asia" links, or type it in the Quick Search box.


TIA

bern

-----Original Message-----
From: Christian Zambrano [mailto:[hidden email]]
Sent: Thursday, 8 October 2009 9:43 AM
To: [hidden email]
Subject: Re: Problems with WordDelimiterFilterFactory

Could you please provide the exact URL of a query where you are
experiencing this problem?
eg(Not URL encoded): q=fieldName:"hot and cold: temperatures"


Re: Problems with WordDelimiterFilterFactory

Christian Zambrano
Bern,

I am interested in the Solr query, in other words the query that your
system sends to Solr.

Thanks,


Christian

On Oct 7, 2009, at 5:56 PM, Bernadette Houghton <[hidden email]> wrote:

> Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:30000601
>
> Either scroll down and click one of the "television broadcasting --  
> asia" links, or type it in the Quick Search box.
>
>
> TIA
>
> bern

Re: Problems with WordDelimiterFilterFactory

MarkL
In reply to this post by bernieh
Use http://solr-url/solr/admin/analysis.jsp to see how your data is indexed/queried

RE: Problems with WordDelimiterFilterFactory

SandeepTagore
In reply to this post by bernieh
Hi Bern,
I indexed some records containing - and : today using your configuration, and I searched with the following URLs:
http://localhost/solr/select?q=CONTENT:"cold : temperature"
http://localhost/solr/select?q=CONTENT:"cold: temperature"
http://localhost/solr/select?q=CONTENT:"cold :temperature"
http://localhost/solr/select?q=CONTENT:"cold temperature"
and
http://localhost/solr/select?q=CONTENT:"asia - civilization"
http://localhost/solr/select?q=CONTENT:"asia- civilization"
http://localhost/solr/select?q=CONTENT:"asia -civilization"
http://localhost/solr/select?q=CONTENT:"asia civilization"
The results don't differ: every query worked and returned the relevant records.

Regards,
Sandeep

RE: Problems with WordDelimiterFilterFactory

bernieh
In reply to this post by Christian Zambrano
Here's the query and the error -

Oct 09 08:20:17  [debug] [196] Solr query string:    (Asia -- Civilization AND status_i:(2))
Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc
Oct 09 08:20:17  [error] Error on searching: "400" Status: org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- Civilization AND status_i:(2)) ': Encount

Bern

-----Original Message-----
From: Christian Zambrano [mailto:[hidden email]]
Sent: Thursday, 8 October 2009 12:48 PM
To: [hidden email]
Cc: [hidden email]
Subject: Re: Problems with WordDelimiterFilterFactory

Bern,

I am interested in the Solr query, in other words the query that your
system sends to Solr.

Thanks,


Christian


RE: Problems with WordDelimiterFilterFactory

bernieh
Sorry, the last line was truncated -

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 7. Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
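
The exception is raised by the Lucene query parser, before any analyzer or filter from schema.xml sees the text. One common way to avoid this class of error is to backslash-escape the query parser's special characters in the user-supplied part of the query. Below is a minimal Python sketch of that idea. It assumes the application assembles the query string itself; the function names and the way the `status_i` clause is appended are illustrative and not taken from the application's code.

```python
import re

# Characters with special meaning in the classic Lucene query syntax.
# "&" and "|" are only operators when doubled; escaping single ones is a
# simplification made for this sketch.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|])')


def escape_user_text(text: str) -> str:
    """Backslash-escape query-syntax characters in user-supplied text."""
    return _SPECIAL.sub(r"\\\1", text)


def build_query(user_text: str, status: int) -> str:
    """Combine escaped user text with a status clause (illustrative only)."""
    return "({} AND status_i:({}))".format(escape_user_text(user_text), status)


if __name__ == "__main__":
    print(build_query("Asia -- Civilization", 2))
    print(build_query("hot and cold: temperatures", 2))
```

Escaping only keeps the parser from rejecting or misreading the input; whether the escaped text then matches the intended records still depends on the analysis chain.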

-----Original Message-----
From: Bernadette Houghton [mailto:[hidden email]]
Sent: Friday, 9 October 2009 8:22 AM
To: '[hidden email]'
Subject: RE: Problems with WordDelimiterFilterFactory

Here's the query and the error -

Oct 09 08:20:17  [debug] [196] Solr query string:    (Asia -- Civilization AND status_i:(2))
Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc
Oct 09 08:20:17  [error] Error on searching: "400" Status: org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- Civilization AND status_i:(2)) ': Encount

Bern


Re: Problems with WordDelimiterFilterFactory

Patrick Jungermann
Hi Bern,

The problem is the character sequence "--". A query is not allowed to
contain consecutive minus characters. Remove one of them and the query
will be parsed without problems.

Because of this parsing problem, I'd recommend cleaning up the query
before submitting it to the Solr server, replacing each sequence of
minus characters with a single one.
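
The cleanup described here could look like the following Python sketch. It assumes the application can preprocess the query string before sending it to Solr; `collapse_minus_runs` is a hypothetical helper name.

```python
import re

# Two or more consecutive minus characters.
_MINUS_RUN = re.compile(r"-{2,}")


def collapse_minus_runs(query: str) -> str:
    """Replace every run of two or more '-' characters with a single '-'."""
    return _MINUS_RUN.sub("-", query)


if __name__ == "__main__":
    print(collapse_minus_runs("(Asia -- Civilization AND status_i:(2))"))
```

Single hyphens, as in "asia-civilization", are left untouched.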


Regards, Patrick



Bernadette Houghton wrote:

> Sorry, the last line was truncated -
>
> HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 7. Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
>
> -----Original Message-----
> From: Bernadette Houghton [mailto:[hidden email]]
> Sent: Friday, 9 October 2009 8:22 AM
> To: '[hidden email]'
> Subject: RE: Problems with WordDelimiterFilterFactory
>
> Here's the query and the error -
>
> Oct 09 08:20:17  [debug] [196] Solr query string:    (Asia -- Civilization AND status_i:(2))
> Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc
> Oct 09 08:20:17  [error] Error on searching: "400" Status: org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- Civilization AND status_i:(2)) ': Encount
>
> Bern
>
> -----Original Message-----
> From: Christian Zambrano [mailto:[hidden email]]
> Sent: Thursday, 8 October 2009 12:48 PM
> To: [hidden email]
> Cc: [hidden email]
> Subject: Re: Problems with WordDelimiterFilterFactory
>
> Bern,
>
> I am interested on the solr query. In other words, the query that your  
> system sends to solr.
>
> Thanks,
>
>
> Christian
>
> On Oct 7, 2009, at 5:56 PM, Bernadette Houghton <[hidden email]
>  > wrote:
>
>> Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:30000601
>>
>> Either scroll down and click one of the "television broadcasting --  
>> asia" links, or type it in the Quick Search box.
>>
>>
>> TIA
>>
>> bern
>>
>> -----Original Message-----
>> From: Christian Zambrano [mailto:[hidden email]]
>> Sent: Thursday, 8 October 2009 9:43 AM
>> To: [hidden email]
>> Subject: Re: Problems with WordDelimiterFilterFactory
>>
>> Could you please provide the exact URL of a query where you are
>> experiencing this problem?
>> eg(Not URL encoded): q=fieldName:"hot and cold: temperatures"
>>
>> On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
>>> We are having some issues with our solr parent application not  
>>> retrieving records as expected.
>>>
>>> For example, if the input query includes a colon (e.g. hot and  
>>> cold: temperatures), the relevant record (which contains a colon in  
>>> the same place) does not get retrieved; if the input query does not  
>>> include the colon, all is fine.  Ditto if the user searches for a  
>>> query containing hyphens, e.g. "asia - civilization, although with  
>>> the qualifier that something like "asia-civilization" (no spaces  
>>> either side of the hyphen) works fine, whereas "asia -  
>>> civilization" (spaces either side of hyphen) doesn't work.
>>>
>>> Our schema.xml contains the following -
>>>
>>>     <fieldType name="text" class="solr.TextField"  
>>> positionIncrementGap="100">
>>>       <analyzer type="index">
>>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>         <!-- in this example, we will only use synonyms at query time
>>>         <filter class="solr.SynonymFilterFactory"  
>>> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>>         -->
>>>                                 <filter  
>>> class="solr.ISOLatin1AccentFilterFactory"/>
>>>         <filter class="solr.StopFilterFactory" ignoreCase="true"  
>>> words="stopwords.txt"/>
>>>         <filter class="solr.WordDelimiterFilterFactory"  
>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"  
>>> catenateNumbers="1" catenateAll="0"/>
>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>         <filter class="solr.EnglishPorterFilterFactory"  
>>> protected="protwords.txt"/>
>>>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>       </analyzer>
>>>       <analyzer type="query">
>>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>                                 <filter  
>>> class="solr.ISOLatin1AccentFilterFactory"/>
>>>         <filter class="solr.SynonymFilterFactory"  
>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>>         <filter class="solr.StopFilterFactory" ignoreCase="true"  
>>> words="stopwords.txt"/>
>>>         <filter class="solr.WordDelimiterFilterFactory"  
>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"  
>>> catenateNumbers="0" catenateAll="0"/>
>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>         <filter class="solr.EnglishPorterFilterFactory"  
>>> protected="protwords.txt"/>
>>>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>       </analyzer>
>>>     </fieldType>
>>>


RE: Problems with WordDelimiterFilterFactory

bernieh
In reply to this post by MarkL
Thanks for this, marklo; it is a *very* useful page.
bern

-----Original Message-----
From: marklo [mailto:[hidden email]]
Sent: Thursday, 8 October 2009 1:10 PM
To: [hidden email]
Subject: Re: Problems with WordDelimiterFilterFactory


Use http://solr-url/solr/admin/analysis.jsp to see how your data is
indexed/queried

--
View this message in context: http://www.nabble.com/Problems-with-WordDelimiterFilterFactory-tp25795589p25797377.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Problems with WordDelimiterFilterFactory

bernieh
In reply to this post by Patrick Jungermann
Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error, but still doesn't find the right record. I see from marklo's analysis page that solr is still parsing it with a hyphen. Changing this part of our schema.xml -

        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement="" replace="all"
        />

To

        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement=" " replace="all"
        />

i.e. replacing non-alpha chars with a space, looks like it may handle that aspect.
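A minimal sketch of what the two regex variants above do to a single string, written in Python rather than inside Solr. The function name replace_non_lowercase is hypothetical, and this only models the regular expression, not the rest of the analyzer chain. Note that a token filter rewrites the text of each token, so inserting a space inside a token does not by itself split it into two tokens, and that [^a-z] also matches uppercase letters unless the text has already been lowercased.

```python
import re

# Same character class as in the schema snippet: anything that is not a-z.
NON_LOWERCASE = re.compile(r"[^a-z]")

def replace_non_lowercase(token, replacement):
    """Apply the pattern to one token, replacing every match (replace="all")."""
    return NON_LOWERCASE.sub(replacement, token)

print(replace_non_lowercase("asia-civilization", ""))   # asiacivilization
print(replace_non_lowercase("asia-civilization", " "))  # asia civilization
print(replace_non_lowercase("Asia", ""))                # sia (uppercase A is removed)
```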

Regards
Bern

-----Original Message-----
From: Patrick Jungermann [mailto:[hidden email]]
Sent: Friday, 9 October 2009 9:03 AM
To: [hidden email]
Subject: Re: Problems with WordDelimiterFilterFactory

Hi Bern,

the problem is the character sequence "--". A query must not contain two
or more consecutive minus characters. Remove one minus character and the
query will be parsed without problems.

Because of this parsing problem, I'd recommend cleaning up the query
before submitting it to the Solr server, replacing each sequence of minus
characters with a single one.
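Such a cleanup step could look roughly like the following Python sketch. The function name collapse_minus_runs is hypothetical, and the sketch assumes the cleanup runs in the client application before the query string is sent. It only removes the "--" parse error; a remaining single minus may still be read as an operator by the query parser.

```python
import re

# Two or more consecutive minus characters.
MINUS_RUN = re.compile(r"-{2,}")

def collapse_minus_runs(query):
    """Replace each run of consecutive minus characters with a single one."""
    return MINUS_RUN.sub("-", query)

print(collapse_minus_runs("(Asia -- Civilization AND status_i:(2))"))
# (Asia - Civilization AND status_i:(2))
```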


Regards, Patrick



Bernadette Houghton schrieb:

> Sorry, the last line was truncated -
>
> HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 7. Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
>
> -----Original Message-----
> From: Bernadette Houghton [mailto:[hidden email]]
> Sent: Friday, 9 October 2009 8:22 AM
> To: '[hidden email]'
> Subject: RE: Problems with WordDelimiterFilterFactory
>
> Here's the query and the error -
>
> Oct 09 08:20:17  [debug] [196] Solr query string:    (Asia -- Civilization AND status_i:(2))
> Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc
> Oct 09 08:20:17  [error] Error on searching: "400" Status: org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- Civilization AND status_i:(2)) ': Encount
>
> Bern
>
> -----Original Message-----
> From: Christian Zambrano [mailto:[hidden email]]
> Sent: Thursday, 8 October 2009 12:48 PM
> To: [hidden email]
> Cc: [hidden email]
> Subject: Re: Problems with WordDelimiterFilterFactory
>
> Bern,
>
> I am interested in the Solr query. In other words, the query that your
> system sends to Solr.
>
> Thanks,
>
>
> Christian
>
> On Oct 7, 2009, at 5:56 PM, Bernadette Houghton <[hidden email]> wrote:
>
>> Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:30000601
>>
>> Either scroll down and click one of the "television broadcasting --  
>> asia" links, or type it in the Quick Search box.
>>
>>
>> TIA
>>
>> bern
>>
>> -----Original Message-----
>> From: Christian Zambrano [mailto:[hidden email]]
>> Sent: Thursday, 8 October 2009 9:43 AM
>> To: [hidden email]
>> Subject: Re: Problems with WordDelimiterFilterFactory
>>
>> Could you please provide the exact URL of a query where you are
>> experiencing this problem?
>> eg(Not URL encoded): q=fieldName:"hot and cold: temperatures"
>>


Re: Problems with WordDelimiterFilterFactory

Christian Zambrano
Bern,

The only way that could be happening is if you are not using the field
type you described in your original e-mail. The WordDelimiterFilterFactory
token filter should take care of the hyphen.

On 10/08/2009 05:30 PM, Bernadette Houghton wrote:

> Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error, but still doesn't find the right record. I see from marklo's analysis page that solr is still parsing it with a hyphen. Changing this part of our schema.xml -
>
>          <filter class="solr.PatternReplaceFilterFactory"
>                  pattern="([^a-z])" replacement="" replace="all"
>          />
>
> To
>
>          <filter class="solr.PatternReplaceFilterFactory"
>                  pattern="([^a-z])" replacement=" " replace="all"
>          />
>
> i.e. replacing non-alpha chars with a space, looks like it may handle that aspect.
>
> Regards
> Bern

Re: Problems with WordDelimiterFilterFactory

Chantal Ackermann
In reply to this post by bernieh
Hi Bernadette,

Bernadette Houghton schrieb:
> Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error, but still doesn't find the right record. I see from marklo's analysis page that solr is still parsing it with a hyphen. Changing this part of our schema.xml -

That's probably because the hyphen/minus has a special meaning in the
query syntax ("not containing"). Try putting the input in quotes. But I
agree with Christian that the hyphens should have been removed at index
time by the token filters.

cheers
chantal




>
>         <filter class="solr.PatternReplaceFilterFactory"
>                 pattern="([^a-z])" replacement="" replace="all"
>         />
>
> To
>
>         <filter class="solr.PatternReplaceFilterFactory"
>                 pattern="([^a-z])" replacement=" " replace="all"
>         />
>
> i.e. replacing non-alpha chars with a space, looks like it may handle that aspect.
>
> Regards
> Bern

Re: Problems with WordDelimiterFilterFactory

Shalin Shekhar Mangar
In reply to this post by Patrick Jungermann
On Fri, Oct 9, 2009 at 3:33 AM, Patrick Jungermann <[hidden email]> wrote:

> Hi Bern,
>
> the problem is the character sequence "--". A query must not contain two
> or more consecutive minus characters. Remove one minus character and the
> query will be parsed without problems.
>
>
Or you could escape the hyphen character. If you are using SolrJ, use
ClientUtils.escapeQueryChars on the query string.
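
For clients that do not use SolrJ, the same idea can be sketched in Python. This is only an illustration, not a port of ClientUtils.escapeQueryChars: the helper name escape_query_chars is hypothetical, and the character set is an assumption based on the special characters listed in the Lucene query parser syntax. The escaping is applied to the user's text only, so the field syntax added by the application (status_i:(2) here) keeps its meaning.

```python
# Characters treated as special by the Lucene query syntax (assumed set).
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&')

def escape_query_chars(text):
    """Prefix every special character in user-supplied text with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

user_input = "Asia -- Civilization"
query = "(" + escape_query_chars(user_input) + " AND status_i:(2))"
print(query)  # (Asia \-\- Civilization AND status_i:(2))
```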

--
Regards,
Shalin Shekhar Mangar.