Range query and a proximity search

ba3
Hi,

I have around 100 documents that have undergone revisions, and I want to find the ones that have undergone more than 40 revisions. The documents are all text based, and the first few lines of each document contain the revision details. For example:


revision 35
This is a document regarding environmental study of .....
...........
..........

There has been 45 instances in past year when the breach had happened .......
...........
..........

The volunteers had to spend close to 15 hours to resolve the issue ..........
.........
...........




I tried the following:

1) A query string, String q = "contents: revision AND [40 TO 50]". The problem was that if the revision number is 30 and the document contains the number 47 elsewhere in its content, the document is still considered a match. Changing the string to "contents:\"revision AND [40 TO 50]\"~5" to specify a proximity search resulted in a run-time error.

2) A multi-phrase query, query.add(new Term("contents","revision")); query.add(new Term("contents","[40 TO 50]")); which did not give the expected result.

In both cases, splitting the range into individual terms, such as "revision AND 41" or "revision AND 42" and so on, yielded proper results.
Could you please give me some pointers on how a range query can be combined with a proximity search?

-- Regards
Ba3
Re: Range query and a proximity search

iorixxx

> Can you please suggest me some pointers as to how a range
> query combined with proximity be done.

Your remedy is ComplexPhraseQueryParser, which uses the SpanQuery family.
https://issues.apache.org/jira/browse/LUCENE-1486
It accepts ranges, ORs, and wildcards inside phrase queries.

Using this new QueryParser, your query will be something like this
(with the default field set to contents):

"revision [40 TO 50]"

If you want to construct your Query programmatically with the Lucene Query API:

Query query = spanNear([contents:revision, spanOr([contents:40, contents:41, ..., contents:50])], 0, true)

Take a look at these Query subclasses:

org.apache.lucene.search.spans.SpanNearQuery
org.apache.lucene.search.spans.SpanOrQuery
org.apache.lucene.search.spans.SpanTermQuery
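
As a rough illustration of why an ordered, zero-slop spanNear avoids the false match described in the original question, here is a plain-Java sketch with no Lucene dependency. The class and method names (RevisionSpanSketch, matchesAnd, matchesNear) are made up for this example, the tokenizer is a crude stand-in for an analyzer, and the numeric range check is a simplification. It only mimics the matching semantics and is not how Lucene implements span queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; nothing in this class is Lucene API.
class RevisionSpanSketch {

    // Lowercase and split on non-alphanumeric characters (a crude stand-in for an analyzer).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if the token is a number between low and high, inclusive.
    static boolean inRange(String token, int low, int high) {
        if (!token.matches("\\d{1,9}")) {
            return false;
        }
        int n = Integer.parseInt(token);
        return n >= low && n <= high;
    }

    // Boolean AND semantics: the keyword anywhere AND an in-range number anywhere.
    static boolean matchesAnd(String text, String keyword, int low, int high) {
        List<String> tokens = tokenize(text);
        boolean hasNumber = false;
        for (String t : tokens) {
            if (inRange(t, low, high)) {
                hasNumber = true;
            }
        }
        return tokens.contains(keyword) && hasNumber;
    }

    // Ordered, slop 0 semantics: an in-range number must directly follow the keyword.
    static boolean matchesNear(String text, String keyword, int low, int high) {
        List<String> tokens = tokenize(text);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(keyword) && inRange(tokens.get(i + 1), low, high)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "revision 35\nThis is a document regarding environmental study. "
                   + "There has been 45 instances in past year.";
        System.out.println(matchesAnd(doc, "revision", 40, 50));   // true: the false positive
        System.out.println(matchesNear(doc, "revision", 40, 50));  // false: 35 follows "revision"
        System.out.println(matchesNear("revision 45\nSome text", "revision", 40, 50)); // true
    }
}
```

The AND variant matches the sample document because 45 appears somewhere in the body, while the ordered variant only looks at the token right after "revision", which is the behaviour the spanNear(..., 0, true) query asks for.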


     

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Range query and a proximity search

ba3
Excellent !!
Thanks for pointing me towards the ComplexPhraseQueryParser.

--Regards
Ba3

Ahmet Arslan wrote
> Your remedy is ComplexPhraseQueryParser that utilizes SpanQuery family.
> https://issues.apache.org/jira/browse/LUCENE-1486