wildcards in phrase searches

Lee_Gary
Is it possible to use wildcards in a phrase search? For example, if my
object is indexed with the phrase "benchmark properties", is there a way to
specify a phrase search that uses wildcards, such as "benchmar* properties"
or "benchmark pro*"? I have tried WildcardQuery, but it only seems to work
with single terms, not phrases. QueryParser doesn't seem to work either, and
I've tried it with both StandardAnalyzer and KeywordAnalyzer. Perhaps there
is something simple I'm missing here. Has anyone gotten this to work?
 
Thanks in advance,
-G
Re: wildcards in phrase searches

Erik Hatcher

On May 10, 2006, at 1:47 PM, [hidden email] wrote:

> Is it possible to have wildcards in a phrase search? For example, if my
> object is indexed with a phrase "benchmark properties", is there a way to
> specify a phrase search that uses wildcards, such as "benchmar* properties"
> or "benchmark pro*"? I have tried using WildcardQuery, but it doesnt seem to
> work with phrases, only single terms. Using the QueryParser doesnt seem to
> work either, and ive tried it with the StandardAnalyzer and KeywordAnalyzer.
> Perhaps there is something simple Im missing here. Has anyone gotten this to
> work?

It's a fair bit trickier than using any of the stock core classes
like that. But it can be done: nest SpanRegexQuery instances (from the
contrib code) for the wildcarded parts, along with SpanTermQuery
instances for the non-wildcarded parts, inside a SpanNearQuery. Note
that SpanRegexQuery uses regex syntax, not WildcardQuery syntax.
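
Because the wildcarded parts must be written in regex syntax rather than WildcardQuery syntax, a pattern such as "benchmar*" has to become "benchmar.*" before it is handed to a regex-based query. The following is a minimal sketch of that translation in plain Java. The class WildcardToRegex and its method toRegex are hypothetical helpers for illustration, not part of Lucene, and the sketch assumes the java.util.regex dialect.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Lucene): rewrites wildcard syntax as regex syntax. */
final class WildcardToRegex {
    // Characters that carry special meaning in java.util.regex and must be escaped.
    private static final String REGEX_META = "\\.[]{}()+^$|";

    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                regex.append(".*");           // '*' means zero or more characters
            } else if (c == '?') {
                regex.append('.');            // '?' means exactly one character
            } else if (REGEX_META.indexOf(c) >= 0) {
                regex.append('\\').append(c); // keep regex metacharacters literal
            } else {
                regex.append(c);
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRegex("benchmar*"));                           // prints benchmar.*
        System.out.println(Pattern.matches(toRegex("pro*"), "properties")); // prints true
    }
}
```

The resulting string would then be used as the text of the Term passed to the regex query for that position in the phrase.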

        Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

RE: wildcards in phrase searches

Lee_Gary
Thanks for your reply. Is there any sample code that demonstrates how to
use these classes to get the effect I'm looking for?

Thanks

Re: wildcards in phrase searches

Erik Hatcher

On May 10, 2006, at 2:12 PM, [hidden email] wrote:
> Thanks for your reply. Is there any sample code that would demonstrate
> how to use these classes properly to get the desired effect of what im
> looking for?

I am not able to access Lucene's svn at the moment, but here's my  
local copy of the TestSpanRegexQuery:

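import junit.framework.TestCase;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.regex.SpanRegexQuery;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.store.RAMDirectory;

public class TestSpanRegexQuery extends TestCase {
   public void testSpanRegex() throws Exception {
     // Index two small documents in memory.
     RAMDirectory directory = new RAMDirectory();
     IndexWriter writer = new IndexWriter(directory, new SimpleAnalyzer(), true);
     Document doc = new Document();
//    doc.add(new Field("field", "the quick brown fox jumps over the lazy dog", Field.Store.NO, Field.Index.TOKENIZED));
//    writer.addDocument(doc);
//    doc = new Document();
     doc.add(new Field("field", "auto update", Field.Store.NO, Field.Index.TOKENIZED));
     writer.addDocument(doc);
     doc = new Document();
     doc.add(new Field("field", "first auto update", Field.Store.NO, Field.Index.TOKENIZED));
     writer.addDocument(doc);
     writer.optimize();
     writer.close();

     IndexSearcher searcher = new IndexSearcher(directory);
     // Regex syntax: "aut.*" matches terms that start with "aut".
     SpanRegexQuery srq = new SpanRegexQuery(new Term("field", "aut.*"));
     // Only accept matches that end within the first position of the field.
     SpanFirstQuery sfq = new SpanFirstQuery(srq, 1);
//    SpanNearQuery query = new SpanNearQuery(new SpanQuery[] {srq, stq}, 6, true);
     Hits hits = searcher.search(sfq);
     // Only "auto update" has a matching term in first position, so expect one hit.
     assertEquals(1, hits.length());
   }
}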

This is not quite what you're after, but close.  You'll mix
SpanTermQuery and SpanRegexQuery instances inside a SpanNearQuery.
Try out some simple examples like this and see where you get, and
post back if you have issues/questions.
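
To make the intended matching behavior concrete, here is a Lucene-free sketch of what an in-order SpanNearQuery with a slop of 0 would be expected to accept: one clause per phrase position, each clause either an exact term or a regex, all matching at consecutive token positions. The class PhraseWithWildcards is hypothetical and only models the semantics. It assumes each regex must match the whole token, which may differ from the regex implementation your Lucene version plugs in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical model (not Lucene code) of an ordered, zero-slop phrase match with regex clauses. */
final class PhraseWithWildcards {

    /** Returns true if some run of consecutive tokens matches the clauses one-to-one, in order. */
    static boolean matches(List<String> tokens, List<Pattern> clauses) {
        if (clauses.isEmpty()) {
            return false; // an empty phrase matches nothing in this sketch
        }
        for (int start = 0; start + clauses.size() <= tokens.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < clauses.size(); i++) {
                // Assumption: the pattern must cover the entire token.
                if (!clauses.get(i).matcher(tokens.get(start + i)).matches()) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Clause 1 plays the role of a regex clause, clause 2 of an exact-term clause.
        List<Pattern> phrase = Arrays.asList(
                Pattern.compile("benchmar.*"),
                Pattern.compile(Pattern.quote("properties")));

        System.out.println(matches(Arrays.asList("benchmark", "properties"), phrase));       // prints true
        System.out.println(matches(Arrays.asList("benchmark", "of", "properties"), phrase)); // prints false
    }
}
```

In the real query, raising the slop above 0 would also admit the second document, and passing false for the in-order flag would admit reversed terms.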

        Erik




RE: wildcards in phrase searches

Lee_Gary
Thanks, Erik. I was able to get a working solution using the classes
you outlined.
