Lucene 2.4 - Searching

Lucene 2.4 - Searching

Karl Heinz Marbaise-3
Hi there,

I'm trying to do something that, from my point of view, should be simple.

I would like to do a search ignoring the case of the stored information
in the index...with the following code:

reader = IndexReader.open(indexDirectory);
           
Searcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer();

// Custom query parser that handles ranges like field:[1 TO 6]
QueryParser parser = new CustomQueryParser(FieldNames.CONTENTS, analyzer);
parser.setAllowLeadingWildcard(true);
parser.setLowercaseExpandedTerms(false);
Query query = parser.parse(queryLine);

TopDocs tmp = searcher.search(query, null, 20, sort);

To be more precise:

I have a field called filename that contains a file name, which
can of course be lowercase, uppercase, or mixed case.

I would like to do the following:

+filename:/*scm*.doc

That should result in getting things like

/...SCMtest.doc
/...scmtest.doc
/...scm.doc
etc.

Maybe someone can give me a hint on how to solve this.

kind regards
Karl Heinz Marbaise
--
SoftwareEntwicklung Beratung Schulung    Tel.: +49 (0) 2405 / 415 893
Dipl.Ing.(FH) Karl Heinz Marbaise        ICQ#: 135949029
Hauptstrasse 177                         USt.IdNr: DE191347579
52146 Würselen                           http://www.soebes.de

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Lucene 2.4 - Searching

Ian Lea
Hi


Sounds like a job for RegexQuery (from the contrib regex package).  If
you can't figure out how to use it, Google will turn up some examples.
To ignore case you can lowercase everything yourself, use an analyzer
that does it, or maybe use a case-insensitive regexp.

Depending on your file names you might want to avoid StandardAnalyzer:
it is likely to split them into several tokens.  KeywordAnalyzer, which
keeps the whole field value as a single token, might be what you want.
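
As a rough illustration of the case-insensitive regexp idea, here is a minimal sketch in plain java.util.regex (it does not use the Lucene RegexQuery API). It translates a wildcard expression into a case-insensitive pattern and tests it against whole file names, much as a regex query would be tested against whole index terms. The class name WildcardToRegex and the sample paths are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: turns a wildcard expression
// into a case-insensitive java.util.regex.Pattern.
final class WildcardToRegex {

    // '*' matches any run of characters, '?' matches exactly one character;
    // everything else is quoted so that '.' and similar stay literal.
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // True if the whole file name matches the wildcard, ignoring case.
    static boolean matches(String wildcard, String filename) {
        return compile(wildcard).matcher(filename).matches();
    }

    public static void main(String[] args) {
        // Sample paths are made up for the demonstration.
        String[] names = {"/docs/SCMtest.doc", "/docs/scm.doc", "/docs/readme.doc"};
        for (String name : names) {
            System.out.println(name + " -> " + matches("/*scm*.doc", name));
        }
    }
}
```

This only works as intended if the file name is indexed as one token, which is the reason for preferring KeywordAnalyzer over StandardAnalyzer here.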


--
Ian.


On Tue, Jan 27, 2009 at 7:29 PM, Karl Heinz Marbaise <[hidden email]> wrote:

> Hi there,
>
> I'm trying to do something that, from my point of view, should be simple.
>
> I would like to do a search ignoring the case of the stored information in
> the index...with the following code:
>
> reader = IndexReader.open(indexDirectory);
>
> Searcher searcher = new IndexSearcher(reader);
> Analyzer analyzer = new StandardAnalyzer();
>
> // Custom query parser that handles ranges like field:[1 TO 6]
> QueryParser parser = new CustomQueryParser(FieldNames.CONTENTS, analyzer);
> parser.setAllowLeadingWildcard(true);
> parser.setLowercaseExpandedTerms(false);
> Query query = parser.parse(queryLine);
>
> TopDocs tmp = searcher.search(query, null, 20, sort);
>
> To be more precise:
>
> I have a field called filename that contains a file name, which can of
> course be lowercase, uppercase, or mixed case.
>
> I would like to do the following:
>
> +filename:/*scm*.doc
>
> That should result in getting things like
>
> /...SCMtest.doc
> /...scmtest.doc
> /...scm.doc
> etc.
>
> Maybe someone can give me a hint on how to solve this.
>
> kind regards
> Karl Heinz Marbaise

Re: Lucene 2.4 - Searching

adb
In reply to this post by Karl Heinz Marbaise-3
Karl Heinz Marbaise wrote:

>
> I have a field called filename that contains a file name, which
> can of course be lowercase, uppercase, or mixed case.
>
> I would like to do the following:
>
> +filename:/*scm*.doc
>
> That should result in getting things like
>
> /...SCMtest.doc
> /...scmtest.doc
> /...scm.doc
> etc.
>
> Maybe someone can give me a hint on how to solve this.

It all comes down to the analyzer you use when you index that field and how you
choose to tokenize it.  If you always want to search case-insensitively, then
you should lowercase the filename when indexing.

If your query parser supports wildcard queries and behaves anything like the
standard QueryParser, it will not put wildcard terms through the analyzer, so
a search for

+filename:/*SCm*.doc

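would then not find anything in an index of lowercased terms, so you need to
make sure all filename field searches are lowercased at some point.  With the
standard QueryParser, leaving setLowercaseExpandedTerms at its default of true
does this for wildcard terms.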
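
To illustrate why both sides need the same normalization, here is a minimal plain-Java sketch (no Lucene classes involved). It uses a case-sensitive wildcard matcher, which stands in for how a wildcard term is compared with index terms, and shows that a mixed-case query only matches a lowercased term once the query is lowercased too. The class name, method names, and sample path are hypothetical.

```java
import java.util.Locale;

// Hypothetical illustration, not Lucene code.
final class LowercasedWildcardSearch {

    // Apply the same normalization at index time and at query time.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Case-sensitive wildcard match over the whole text:
    // '*' matches any run of characters, '?' matches exactly one.
    static boolean wildcardMatch(String pattern, String text) {
        int p = 0, t = 0;
        int star = -1;   // position of the most recent '*' in the pattern
        int mark = 0;    // text position where that '*' currently stops
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star >= 0) {
                // Backtrack: let the last '*' swallow one more character.
                p = star + 1;
                t = ++mark;
            } else {
                return false;
            }
        }
        // Only trailing '*' may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        String indexedTerm = normalize("/docs/SCMtest.doc"); // lowercased at index time
        String rawQuery = "/*SCm*.doc";
        System.out.println("raw query:        " + wildcardMatch(rawQuery, indexedTerm));
        System.out.println("lowercased query: " + wildcardMatch(normalize(rawQuery), indexedTerm));
    }
}
```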

I use a custom analyzer for filenames, which lowercases and tokenizes on
letters, digits, and any custom characters, and my query parser supports custom
analyzers in getFieldQuery().
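
The tokenization described above could look roughly like the following plain-Java sketch. It is not a Lucene Analyzer and not the actual implementation mentioned here; it only assumes that a token is a run of letters, digits, or caller-supplied extra characters, and that everything is lowercased. The class name, the extraTokenChars parameter, and the sample path are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of lowercasing, letter-or-digit tokenization.
final class FilenameTokenizer {

    // Splits a file name into lowercased tokens. A token is a maximal run of
    // letters, digits, or characters listed in extraTokenChars; any other
    // character acts as a separator and is dropped.
    static List<String> tokenize(String filename, String extraTokenChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < filename.length(); i++) {
            char c = filename.charAt(i);
            if (Character.isLetterOrDigit(c) || extraTokenChars.indexOf(c) >= 0) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Without extra characters, '_' separates tokens; with "_" it is kept.
        System.out.println(tokenize("/docs/SCM_Test-2009.doc", ""));
        System.out.println(tokenize("/docs/SCM_Test-2009.doc", "_"));
    }
}
```

With tokens like these, a search for a single lowercased word can hit the file name without any leading wildcard.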

If you want to keep the original filename, then just store the field as well as
index it, then you can get the original back from the Document.
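
The idea of searching on a lowercased value while still returning the original can be sketched without Lucene as a small in-memory map. This is only an analogy for a field that is both indexed (lowercased) and stored (original); the class name StoredFilenameIndex and the sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration: the map key plays the role of the indexed
// (lowercased) term, the list values play the role of the stored originals.
final class StoredFilenameIndex {

    private final Map<String, List<String>> originalsByKey = new LinkedHashMap<>();

    // Index under the lowercased name, but keep the original spelling.
    void add(String originalFilename) {
        String key = originalFilename.toLowerCase(Locale.ROOT);
        originalsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(originalFilename);
    }

    // Look up case-insensitively and return the original spellings.
    List<String> lookup(String query) {
        List<String> hits = originalsByKey.get(query.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        StoredFilenameIndex index = new StoredFilenameIndex();
        index.add("/docs/SCMtest.doc");
        index.add("/docs/scmtest.doc");
        System.out.println(index.lookup("/DOCS/ScmTest.DOC"));
    }
}
```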

Antony

