[lucy-user] Running query string thru Analyzer?

Gerald Richter
Hi,

I have defined a field in the following way:

    # Tokenize first, then normalize each token (strip accents, fold case).
    my $tokenizer      = Lucy::Analysis::StandardTokenizer->new;
    my $normalizer     = Lucy::Analysis::Normalizer->new( strip_accents => 1, case_fold => 1 );
    my $field_analyzer = Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [ $tokenizer, $normalizer ],
    );
    my $field_type = Lucy::Plan::FullTextType->new( analyzer => $field_analyzer );
    $schema->spec_field( name => 'option_ndx', type => $field_type );

When I now run a query (either a TermQuery or a WildcardQuery) against an indexed document containing "Foo baß", it works as long as I query for "foo", but not when I query for "Foo" or "baß". So I guess I have to run the query string through the same analyzer that the indexer uses.

The question is: how can I do this, or is Lucy able to do this for me?

Thanks & Regards

Gerald

P.S. I am using Lucy 0.42


Re: [lucy-user] Running query string thru Analyzer?

Gerald Richter
Thanks for all the feedback, and sorry for sending the question twice by mistake.

The

    $analyzer->split('Foo');

method was what I was searching for. I have to construct my query from several sources, so I cannot use QueryParser, but I added

    my $normalizer = Lucy::Analysis::Normalizer->new( strip_accents => 1, case_fold => 1 );
    my $tokens     = $normalizer->split($term);

and use $tokens to feed into TermQuery, and it works perfectly.
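
A minimal sketch of this approach, offered as an illustration rather than a definitive implementation. It assumes the 'option_ndx' field from the first message and rebuilds the same analysis chain used at index time. The helper name build_term_query is hypothetical. Unlike the snippet above, it splits with the whole PolyAnalyzer (tokenizer plus normalizer), so multi-word input becomes several terms, which are combined here with an ANDQuery (a design choice, not something the thread prescribes):

```perl
use strict;
use warnings;
use utf8;    # the source below contains a literal "ß"

use Lucy::Analysis::StandardTokenizer;
use Lucy::Analysis::Normalizer;
use Lucy::Analysis::PolyAnalyzer;
use Lucy::Search::TermQuery;
use Lucy::Search::ANDQuery;

# Assumption: this mirrors the analyzer the field was indexed with.
my $analyzer = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [
        Lucy::Analysis::StandardTokenizer->new,
        Lucy::Analysis::Normalizer->new( strip_accents => 1, case_fold => 1 ),
    ],
);

# Hypothetical helper: analyze $text and build a query over $field.
# Returns nothing if analysis yields no tokens.
sub build_term_query {
    my ( $field, $text ) = @_;

    # split() returns an array reference of analyzed token strings.
    my $tokens  = $analyzer->split($text);
    my @queries = map {
        Lucy::Search::TermQuery->new( field => $field, term => $_ )
    } @$tokens;

    return unless @queries;
    return $queries[0] if @queries == 1;

    # Require every analyzed token to match.
    return Lucy::Search::ANDQuery->new( children => \@queries );
}

my $query = build_term_query( 'option_ndx', 'Foo baß' );
```

The resulting $query can be combined with queries from other sources or passed to a searcher's hits method in the usual way.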

 
I had overlooked the split method in the cookbook's custom query parser. It would be great to add it to the documentation of Lucy::Analysis::Analyzer, which was the first place I looked for such a method.

Maybe there are more helpful methods that are not mentioned in the docs and are worth adding...

I am used to reading the source code as documentation, but I do not (yet) understand Clownfish and how it works.

Thanks & Regards

Gerald
 

Re: [lucy-user] Running query string thru Analyzer?

Nick Wellnhofer
On 27/01/2015 06:20, Gerald Richter wrote:
> I had overlooked the split method in the cookbook's custom query parser. It would be great to add it to the documentation of Lucy::Analysis::Analyzer, which was the first place I looked for such a method.

I added the "split" method to Analyzer's documentation for the next major release.

> Maybe there are more helpful methods which are not mentioned in the docs and are worth being added...

Yes, there certainly are. Lucy has taken a rather conservative approach with
regard to its public API.

> I am used to reading the source code as documentation, but I do not (yet) understand Clownfish and how it works.

For documentation purposes, you can simply have a look at the .cfh (Clownfish
header) files in the "core" directory:

https://github.com/apache/lucy/tree/master/core
https://github.com/apache/lucy/blob/master/core/Lucy/Analysis/Analyzer.cfh

But as always, use undocumented methods at your own risk.

Nick