"begins with" searches

bernieh
We need to offer "begins with" type searches, e.g. a search for "surname, f" will retrieve "surname, firstname", "surname, f", "surname fm" etc.

Ideally, the user would be able to enter something like "surname f*".

However, wildcards don't work on phrase searches, nor do range searches.

Any suggestions as to how best to search for "begins with" phrases; or, how to best configure solr to support such searches?

TIA
Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: [hidden email]
Email: [hidden email]<mailto:[hidden email]>
Website: http://www.deakin.edu.au
<http://www.deakin.edu.au/>Deakin University CRICOS Provider Code 00113B (Vic)

Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone.
Deakin University does not warrant that this email and any attachments are error or virus free


Re: "begins with" searches

Avlesh Singh
Read up on setting up these kinds of searches here -
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

Cheers
Avlesh


Re: "begins with" searches

Gerald Snyder
In reply to this post by bernieh
Are you using field name suffixes like Blacklight's (xxx_text,
xxx_facet, xxx_string)? With the xxx_string field you can request a
"begins with" search, but you may need different search term
normalization than with an xxx_text search.
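
As a rough illustration of the normalization point (plain Python, not Solr code; the `normalize` and `begins_with` helpers are hypothetical names, and the normalization chosen is only one possibility), a "begins with" match against an untokenized string value only works if the stored value and the search term are normalized the same way:

```python
import re


def normalize(value):
    """Trim, collapse runs of whitespace to one space, and lowercase."""
    return re.sub(r"\s+", " ", value.strip()).lower()


def begins_with(stored, term):
    """True when the normalized stored value starts with the normalized term."""
    return normalize(stored).startswith(normalize(term))


names = ["Surname, Firstname", "Surname, F", "Surname FM", "Othername, F"]

# The search term has different case and a doubled space; normalization hides both.
matches = [n for n in names if begins_with(n, "SURNAME,  f")]
print(matches)  # ['Surname, Firstname', 'Surname, F']
```

Note that "Surname FM" is not matched here because the comma is kept; whether punctuation should be stripped as well is a design decision this sketch does not settle.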


Gerald Snyder
Florida Center for Library Automation



RE: "begins with" searches

bernieh
In reply to this post by Avlesh Singh
Thanks for this suggestion (thanks Gerald also: no, we're not using BlackLight-type prefixes).

I've set up an edgytext fieldType in schema.xml thus -

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
 </analyzer>
 <analyzer type="query">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

And defined a field name thus -

<dynamicField name="*author_mt"  type="edgytext"    indexed="true"  stored="true" multiValued="true"/>

The results are mixed -

* searches such as "surname, f" and "surname, fre" (with quotation marks and commas) work well, retrieving "surname, f", "surname, Fred", "surname, Frederick" etc.
* the same searches without quotation marks don't work, as they get parsed as author_mt:surname + author_mt:firstname; solr reads the query as "author beginning with surname AND author beginning with firstname", which yields nil results.
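
The behaviour in both bullets can be reproduced with a small model of the analyzer chains (a Python sketch, not Solr itself; the function names are made up for illustration, and the model assumes the query parser splits an unquoted query on whitespace before analysis):

```python
def index_terms(value, min_gram=1, max_gram=25):
    """Index-time chain: keyword tokenizer (whole value is one token),
    lowercase filter, then front edge n-grams of length min_gram..max_gram."""
    token = value.lower()
    longest = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, longest + 1)}


def query_term(value):
    """Query-time chain: keyword tokenizer plus lowercase filter, no n-grams."""
    return value.lower()


def matches(stored, query):
    """A stored value matches when the query term equals one of its grams."""
    return query_term(query) in index_terms(stored)


stored = "Surname, Frederick"

# Quoted query: the whole phrase reaches the analyzer as one token.
quoted_hit = matches(stored, "surname, fre")

# Unquoted query: each whitespace-separated piece is analyzed on its own,
# and with AND semantics every piece must match.
pieces = "surname, fre".split()
unquoted_hit = all(matches(stored, piece) for piece in pieces)

print(quoted_hit)    # True
print(unquoted_hit)  # False: "fre" is not a prefix of the whole value
```

In this model a query longer than maxGramSize (25 here) can never match, because no gram that long is indexed.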

Is there an analyser that will strip the whitespace out altogether? Or another alternative?

bern


Re: "begins with" searches

Avlesh Singh
You are right about the parsing of query terms without a double quote
(solrQueryParser's defaultOperator has to be "AND" in your case). For the
problem at hand, two things -

   1. Do you have any reason for not doing a PhraseQuery (query terms
   enclosed in double quotes) on your "edgytext" field? If not then you can
   always enclose your query in double quotes to get expected "begins with"
   matches.
   2. You can always "escape" your query string before passing it to Solr;
   then you wouldn't need to enclose your query term in double quotes. For
   example, the query string surname, fre would, when escaped, become
   surname,\ fre (surname,\+fre once URL-encoded), thereby asking Solr to
   treat it as a single query term. For more details -
   http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Escaping%20Special%20Characters
   If you use SolrJ, the ClientUtils class
   (org.apache.solr.client.solrj.util.ClientUtils) has an escapeQueryChars
   helper for this.
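
For clients that are not written in Java, the escaping in point 2 might be sketched as follows. This is a Python approximation loosely modeled on what SolrJ's ClientUtils helper does; the exact set of characters is an assumption and may differ between Solr versions.

```python
from urllib.parse import quote_plus

# Assumed set of query-syntax special characters; whitespace is handled separately.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text):
    """Prefix every special character and every whitespace character with a backslash."""
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


escaped = escape_query_chars("surname, fre")
print(escaped)  # surname,\ fre

# When the escaped term is placed in a URL, the space is encoded as "+".
url_value = quote_plus(escaped)
print(url_value)
```

The escaped space is what keeps the query parser from splitting the input into two terms.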

Cheers
Avlesh


RE: "begins with" searches

bernieh
Thanks Avlesh. The issue with not doing a phrase query on my "edgytext" field was that my parent application was adding an escape character to the quotation marks, and I was hoping to fix (or rather, work around) at the solr end to save maintenance overhead. But I've done a hack in the parent application to remove those escape chars, and all is working well in that respect.

My next issue relates to how to get the results of the author field to come up in a search across all fields. For example, a search on author:"Houghton, B" (which uses the edgytext) yields 16 documents, but a search on all:"Houghton, B" (which doesn't) yields only 9. I thought the solution should be <copyfield source="*author_mt" dest="all"/> but that doesn't do the trick.

Thanks!

bern

Re: "begins with" searches

Avlesh Singh
>
> My next issue relates to how to get the results of the author field come up
> in a search across all fields. For example, a search on author:"Houghton, B"
> (which uses the edgytext) yields 16 documents, but a search on
> all:"Houghton, B" (which doesn't) yields only 9. I thought the solution
> should be <copyfield source="*author_mt" dest="all"/> but that doesn't do
> the trick.
>

Do you have a field called "all"? How is it set up? Can you post the
schema.xml snippet relating to this field here?
<copyField> is supported for a dynamic field source. <copyField
source="*author_mt" dest="all"/> should work for you as long as you have a
field called "all" defined in your schema. Moreover, for your specific use
case, the "all" field needs to be of type "edgytext".

Cheers
Avlesh


RE: "begins with" searches

bernieh
Here are the "all" code snippets -

   <!-- catchall field, containing all other searchable text fields (implemented
        via copyField further on in this schema) -->
   <field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
.
.
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>all</defaultSearchField>
.
.
   <!-- Copy for ALL search -->
   <copyField source="*_t" dest="*_t_ft"/>
   <copyField source="*_mt" dest="*_mft"/>
   <copyField source="content" dest="all"/>
   <copyField source="*_t" dest="all"/>
   <copyField source="*_mt" dest="all"/>

It sounds from what you say that I'm going to need to change the field type to "edgytext". Which won't achieve the result I want, viz. the current "all" plus the edgytext. Any way to achieve this?

Thanks!
bern


Re: "begins with" searches

Avlesh Singh
>
> It sounds from what you say that I'm going to need to change the field type
> to "edgytext". Which won't achieve the result I want, viz. the current "all"
> plus the edgytext. Any way to achieve this?
>
I guess there is a mismatch of expectations here. A field can be analyzed in
only ONE way. If your field "all" is of type "text", indexing and searching
go through the analyzers (tokenizers and filters) specified ONLY for
the "text" field type. It does not matter whether data from an "edgytext" or
any other field type is being copied into the field.
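
To make the point concrete, here is a small Python model (not Solr code) of the same author value landing in two differently analyzed fields. The "text" analysis below is only a stand-in that splits on non-alphanumerics and lowercases; the real "text" analyzer chain is not shown in this thread, so treat it as an assumption.

```python
import re


def edgytext_terms(value):
    """edgytext index chain: whole value as one token, lowercased, edge n-grams 1..25."""
    token = value.lower()
    return {token[:n] for n in range(1, min(25, len(token)) + 1)}


def text_terms(value):
    """Stand-in "text" chain: lowercase, then split into word tokens."""
    return [w for w in re.split(r"[^0-9a-z]+", value.lower()) if w]


def text_phrase_match(stored, phrase):
    """Phrase match on the "text" side: query words must appear consecutively."""
    doc, query = text_terms(stored), text_terms(phrase)
    return any(doc[i:i + len(query)] == query
               for i in range(len(doc) - len(query) + 1))


authors = ["Houghton, B", "Houghton, Bernadette"]

# author:"Houghton, B" against the edgytext field: a prefix is enough.
via_author = [a for a in authors if "houghton, b" in edgytext_terms(a)]

# all:"Houghton, B" against the text field: "b" must be a whole word.
via_all = [a for a in authors if text_phrase_match(a, "Houghton, B")]

print(via_author)  # ['Houghton, B', 'Houghton, Bernadette']
print(via_all)     # ['Houghton, B']
```

Under these assumptions the copied value is re-analyzed by the destination field's chain, which would explain why the "all" search returns fewer documents than the author search.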

Having said that, converting the "all" field to type "edgytext" should still
work fine. All your regular searches on a text field should also work with
the edgytext field. Isn't that the case?

Cheers
Avlesh

RE: "begins with" searches

bernieh
G'day Avlesh, converting the "all" field to type "edgytext" doesn't work as expected, as the various "text" analysers etc. don't get to work on that field, so I get fewer results than expected. Adding the edgy filter to the text field also yields fewer results. I can work around the issue by setting up a new "beginswith" edgytext field and using copyField to copy the relevant fields into it.
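
A minimal sketch of that workaround, for reference. The field name "beginswith" and the "edgytext" type are the ones discussed in this thread; stored="false" and the choice of *author_mt as the only copied source are assumptions made for illustration.

```xml
<!-- Prefix-search field; reuses the "edgytext" type defined earlier
     in this thread. stored="false" is an assumption. -->
<field name="beginswith" type="edgytext" indexed="true" stored="false"
       multiValued="true"/>

<!-- Copy each field that should support "begins with" matching.
     Only the author fields are shown here; add one line per source. -->
<copyField source="*author_mt" dest="beginswith"/>
```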

But this approach doesn't really suit our parent application's main search screen, which is a single box labelled "quick search". Users will be puzzled as to why a search for beginswith:"Houghton, b" yields 20 results, while a search for "Houghton, b" yields 10. They will also be puzzled as to why "Houghton, b*" won't work as they expect, since people are already familiar with using wildcards. One way around this user perception problem is to get rid of the single search box and set up a series of drop-down boxes for the type of search (begins with, etc.), along with field names. We might have to go there, but the ideal solution from our perspective would be for users to be able to enter terms in the "quick search" box without any field prefix, and have solr go off and search all field names/types.

By the way, our "text" field type config is currently set as -

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Bern


Re: "begins with" searches

Avlesh Singh
>
> G'day Avlesh, converting the "all" field to type "edgytext" doesn't work as
> expected as the various "text" analysers etc don't get to work on that
> field, so I get fewer results than expected. And adding the edgy filter into
> the text field also yields fewer results. I can work around the issue by
> setting up a new "beginswith" edgytext field and using copyField to copy the
> relevant fields into it.
>
You are absolutely right. What you think of as a workaround is actually
the solution!

> But this approach doesn't really suit our parent application's main search
> screen, which is a single box labelled "quick search". Users will be puzzled
> as to why a search for "beginswith:"Houghton, b"" yields 20 results, while a
> search for "Houghton, b" yields 10. And also puzzled as to why "Houghton,
> b*" won't work as they expect - people are already familiar with using
> wildcards. A way to get around this user perception problem is to get rid of
> the single search box and set up a series of drop down boxes for type of
> search (begins with, etc), along with field names. We might have to go
> there, but the ideal solution from our perspective would be for users to be
> able to enter terms in the "quick search" box without any field prefix, and
> have solr go off and search all field names/types.
>
As I said earlier, a field can be analyzed in only ONE way. Requirements like
yours need multiple search capabilities for a single query, and a single
field cannot provide all of them. The solution is to create multiple fields,
each set up with its own analyzers (tokenizers and filters) for indexing and
searching. At query time, an OR query can be run across all such fields (with
a boost on a particular field, if desired). Lucene then automatically ranks
the results based on the hits across those fields.
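
One way to get close to this from a single search box, sketched under assumptions: instead of a hand-written OR query, the dismax query parser can search every field listed in qf and score each document by its best-matching field. The handler name and the boost value below are hypothetical; "all" and "beginswith" are the fields discussed in this thread.

```xml
<!-- solrconfig.xml: hypothetical handler for the "quick search" box -->
<requestHandler name="/quicksearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Search both fields; the 2.0 boost on prefix matches is
         illustrative, not a recommended value. -->
    <str name="qf">all beginswith^2.0</str>
  </lst>
</requestHandler>
```

The input would probably still have to be sent as a quoted phrase (for example "Houghton, b"), since unquoted terms are split on whitespace before they reach the keyword-tokenized field.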

Hope this helps. And sorry for the delayed response.

Cheers
Avlesh

On Fri, Oct 30, 2009 at 3:22 AM, Bernadette Houghton <
[hidden email]> wrote:

> G'day Avlesh, converting the "all" field to type "edgytext" doesn't work as
> expected as the various "text" analysers etc don't get to work on that
> field, so I get less results than expected. And adding the edgy filter into
> the text field also yields less results. I can work around the issue by
> setting up a new "beginswith" edgytext field and using copyfield to copy the
> relevant fields into it.
>
> But this approach doesn't really suit our parent application's main search
> screen, which is a single box labelled "quick search". Users will be puzzled
> as to why a search for "beginswith:"Houghton, b"" yields 20 results, while a
> search for "Houghton, b" yields 10. And also puzzled as to why "Houghton,
> b*" won't work.as they expect - people are already familiar with using
> wildcards. A way to get around this user perception problem is to get rid of
> the single search box and set up a series of drop down boxes for type of
> search (begins with, etc), along with field names. We might have to go
> there, but the ideal solution from our perspective would be for users to be
> able to enter terms in the "quick search" box without any field prefix, and
> have solr go off and search all field names/types.
>
> By the way, our "text" field type config is currently set as -
>
>    <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <!-- in this example, we will only use synonyms at query time
>        <filter class="solr.SynonymFilterFactory"
> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>        -->
>                <filter class="solr.ISOLatin1AccentFilterFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt"/>
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                <filter class="solr.ISOLatin1AccentFilterFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt"/>
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
> Bern
>
>
> -----Original Message-----
> From: Avlesh Singh [mailto:[hidden email]]
> Sent: Thursday, 29 October 2009 12:35 PM
> To: [hidden email]
> Subject: Re: "begins with" searches
>
> >
> > It sounds from what you say that I'm going to need to change the field
> type
> > to "edgytext". Which won't achieve the result I want, viz. the current
> "all"
> > plus the edgytext. Any way to achieve this?
> >
> I guess there is a mismatch of expectations here. A field can be analyzed
> in
> only ONE way. If your field "all" is of type "text", indexing and searching
> would go through the analyzers (tokenizers and filters) specified ONLY for
> the text field. It does not matter if data from a "edgytext" or any other
> field type is being copied into the field.
>
> Having said that converting the "all" field to type "edgytext" should still
> work fine. All your regular searches on a text field should also work with
> the edgytext field. Ain't it like that?
>
> Cheers
> Avlesh
>
> On Thu, Oct 29, 2009 at 2:52 AM, Bernadette Houghton <
> [hidden email]> wrote:
>
> > Here's the "all" code snippets -
> >
> >   <!-- catchall field, containing all other searchable text fields
> > (implemented
> >        via copyField further on in this schema  -->
> >   <field name="all" type="text" indexed="true" stored="false"
> > multiValued="true"/>
> > .
> > .
> > <!-- field for the QueryParser to use when an explicit fieldname is
> absent
> > -->
> >  <defaultSearchField>all</defaultSearchField>
> > .
> > .
> >   <!-- Copy for ALL search -->
> >   <copyField source="*_t" dest="*_t_ft"/>
> >   <copyField source="*_mt" dest="*_mft"/>
> >   <copyField source="content" dest="all"/>
> >   <copyField source="*_t" dest="all"/>
> >   <copyField source="*_mt" dest="all"/>
> >
> > It sounds from what you say that I'm going to need to change the field
> type
> > to "edgytext". Which won't achieve the result I want, viz. the current
> "all"
> > plus the edgytext. Any way to achieve this?
> >
> > Thanks!
> > bern
> >
> > -----Original Message-----
> > From: Avlesh Singh [mailto:[hidden email]]
> > Sent: Wednesday, 28 October 2009 3:30 PM
> > To: [hidden email]
> > Subject: Re: "begins with" searches
> >
> > >
> > > My next issue relates to how to get the results of the author field
> come
> > up
> > > in a search across all fields. For example, a search on
> author:"Houghton,
> > B"
> > > (which uses the edgytext) yields 16 documents, but a search on
> > > all:"Houghton, B" (which doesn't) yields only 9. I thought the solution
> > > should be <copyfield source="*author_mt" dest="all"/> but that doesn't
> do
> > > the trick.
> > >
> >
> > Do you have a field called "all"? How is it set up? Can you post the
> > schema.xml snippet relating to this field here?
> > <copyField> is supported for a dynamic field source. <copyfield
> > source="*author_mt" dest="all"/> should work for you as long as you have
> a
> > field called "all" defined in your schema. Moreover, for your specific
> use
> > case, the "all" field needs to be of type "edgytext".
> >
> > Cheers
> > Avlesh
> >
> > On Wed, Oct 28, 2009 at 9:35 AM, Bernadette Houghton <
> > [hidden email]> wrote:
> >
> > > Thanks Avlesh. The issue with not doing a phrase query on my "edgytext"
> > > field was that my parent application was adding an escape character to
> > the
> > > quotation marks, and I was hoping to fix (or rather, work around) at
> the
> > > solr end to save maintenance overhead. But I've done a hack in the
> parent
> > > application to remove those escape chars, and all is working well in
> that
> > > respect.
> > >
> > > My next issue relates to how to get the results of the author field
> come
> > up
> > > in a search across all fields. For example, a search on
> author:"Houghton,
> > B"
> > > (which uses the edgytext) yields 16 documents, but a search on
> > > all:"Houghton, B" (which doesn't) yields only 9. I thought the solution
> > > should be <copyfield source="*author_mt" dest="all"/> but that doesn't
> do
> > > the trick.
> > >
> > > Thanks!
> > >
> > > bern
> > > -----Original Message-----
> > > From: Avlesh Singh [mailto:[hidden email]]
> > > Sent: Tuesday, 27 October 2009 5:54 PM
> > > To: [hidden email]
> > > Subject: Re: "begins with" searches
> > >
> > > You are right about the parsing of query terms without a double quote
> > > (solrQueryParser's defaultOperator has to be "AND" in your case). For
> the
> > > problem at hand, two things -
> > >
> > >    1. Do you have any reason for not doing a PhraseQuery (query terms
> > >    enclosed in double quotes) on your "edgytext" field? If not then you
> > can
> > >   always enclose your query in double quotes to get expected "begins
> > with"
> > >   matches.
> > >    2. You can always "escape" your query string before passing to Solr;
> > and
> > >    you wouldn't need to pass your query term in double quotes. For
> > exapmle,
> > >   search for the query string - surname, fre when "escaped" would be
> > > converted
> > >   into surname,\+fre thereby asking Solr to treat this as a single
> query
> > > term.
> > >   For more details -
> > >
> > >
> >
> http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Escaping%20Special%20Characters
> > > .
> > >   If you use SolrJ, there is a ClientUtils class somewhere in the
> package
> > >   which has helper functions to achieve query escaping.
> > >
> > > Cheers
> > > Avlesh
> > >
> > > On Tue, Oct 27, 2009 at 9:22 AM, Bernadette Houghton <
> > > [hidden email]> wrote:
> > >
> > > > Thanks for this suggestion (thanks Gerald also: no, we're not using
> > > > BlackLight-type prefixes).
> > > >
> > > > I've set up an edgytext fieldType in schema.xml thus -
> > > >
> > > > <fieldType name="edgytext" class="solr.TextField"
> > > > positionIncrementGap="100">
> > > >  <analyzer type="index">
> > > >   <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >   <filter class="solr.LowerCaseFilterFactory"/>
> > > >   <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> > > > maxGramSize="25" />
> > > >  </analyzer>
> > > >  <analyzer type="query">
> > > >   <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >   <filter class="solr.LowerCaseFilterFactory"/>
> > > >  </analyzer>
> > > > </fieldType>
> > > >
> > > > And defined a field name thus -
> > > >
> > > > <dynamicField name="*author_mt"  type="edgytext"    indexed="true"
> > > >  stored="true" multiValued="true"/>
> > > >
> > > > The results are mixed -
> > > >
> > > > * searches such as "surname, f" and "surname, fre" (with quotations and
> > > > commas) work well, retrieving "surname, f", "surname, Fred", "surname,
> > > > Frederick" etc etc
> > > > * searches such as the above but without quotations don't work too well as
> > > > they get parsed as author_mt:surname + author_mt:firstname, with solr
> > > > reading the query as "author beginning with surname AND author beginning
> > > > with firstname", which yields nil results.
> > > >
> > > > Is there an analyser that will strip the whitespace out altogether? Or
> > > > another alternative?
> > > >
> > > > bern
> > > >
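
For anyone following the thread, here is a minimal Python sketch of what the edgytext field type quoted above appears to do at index and query time. It is a toy simulation, not Solr code: the keyword tokenizer is modelled as "keep the whole value as one token", the lowercase filter as str.lower(), and the edge n-gram filter as "index every leading prefix of 1 to 25 characters". The function names (edge_ngrams, build_index, begins_with) and the sample author values are made up for illustration.

```python
def edge_ngrams(value, min_gram=1, max_gram=25):
    """Return the lowercased leading prefixes of the whole value.

    Assumption: this mirrors KeywordTokenizer + LowerCaseFilter +
    EdgeNGramFilter (minGramSize=1, maxGramSize=25) from the quoted schema.
    """
    token = value.lower()                  # whole value is a single token
    longest = min(max_gram, len(token))    # prefixes never exceed the token
    return [token[:n] for n in range(min_gram, longest + 1)]


def build_index(values):
    """Map every indexed prefix to the set of original values it came from."""
    index = {}
    for value in values:
        for gram in edge_ngrams(value):
            index.setdefault(gram, set()).add(value)
    return index


def begins_with(index, query):
    """Query side: lowercase the whole query, no n-grams, exact term lookup."""
    return sorted(index.get(query.lower(), set()))


# Hypothetical sample data, loosely based on the examples in the thread.
authors = ["surname, f", "surname, Fred", "surname, Frederick", "other, Fred"]
index = build_index(authors)

print(begins_with(index, "surname, fre"))  # whole phrase treated as one term
print(begins_with(index, "fre"))           # what a split-off second term sees
```

Under these assumptions, the whole-phrase lookup for "surname, fre" succeeds because that exact prefix was indexed, while a lone "fre" finds nothing because no stored value starts with it. That is consistent with the unquoted query returning nil results once the parser splits it on whitespace, and with the advice to quote the phrase or escape the space so it reaches the field as a single term.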