Lucene syntax query matched against a string content

Nilesh Bansal
Hi,

I want to create a function that takes a query string (in Lucene
syntax) and a content string, and returns whether the query matches
the content. For example,

query = +(apache) +(lucene OR httpd)

will match

content = HTTPD by Apache foundation is one of the most popular open
source projects

and will not match

content = Lucene and httpd are projects from same open source foundation

Basically, I need to fill in the contents of the following Java
function. This should be easy to do, but I don't know how. I obviously
don't want to create a dummy lucene index in memory with a single
document and then search for the query against that (for performance
reasons).

public static boolean isRelevant(String luceneQuery, String contents) {
  // TODO fill in
}

Instead of boolean, it could return a relevance score, which will be
zero if the query is not relevant to the document.

Any help will be appreciated.

thanks
Nilesh

--
Nilesh Bansal

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Lucene syntax query matched against a string content

Paul Elschot
Without using a RAMDirectory index it would be necessary to
implement all Scorers used by the query directly on top of the token
stream that normally goes into the index. This is possible, but
Lucene is not designed to do this, so it won't be easy.
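
To give a rough idea of what matching directly against the tokens
could look like, here is a minimal sketch. It does not use Lucene's
Scorer API at all: DirectMatcher and its methods are hypothetical
names, the tokenizer is a naive stand-in for an Analyzer, the query is
built by hand rather than parsed from Lucene syntax, and only
boolean term/AND/OR matching is covered (no phrases, no scoring).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper, not part of Lucene.
final class DirectMatcher {
    private DirectMatcher() {}

    // Naive stand-in for an Analyzer: lowercase, split on non-alphanumerics.
    static Set<String> tokenize(String contents) {
        Set<String> tokens = new HashSet<>();
        for (String token : contents.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Leaf clause: true when the (lowercased) term occurs in the token set.
    static Predicate<Set<String>> term(String text) {
        String normalized = text.toLowerCase(Locale.ROOT);
        return tokens -> tokens.contains(normalized);
    }

    // Every clause is required, like "+a +b".
    @SafeVarargs
    static Predicate<Set<String>> allOf(Predicate<Set<String>>... clauses) {
        return tokens -> Arrays.stream(clauses).allMatch(clause -> clause.test(tokens));
    }

    // At least one clause must match, like "a OR b".
    @SafeVarargs
    static Predicate<Set<String>> anyOf(Predicate<Set<String>>... clauses) {
        return tokens -> Arrays.stream(clauses).anyMatch(clause -> clause.test(tokens));
    }

    static boolean isRelevant(Predicate<Set<String>> query, String contents) {
        return query.test(tokenize(contents));
    }
}
```

With this sketch, the query +(apache) +(lucene OR httpd) would be
written as allOf(term("apache"), anyOf(term("lucene"), term("httpd"))).
It matches the first sample content from the original question and
rejects the second, which never mentions apache.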

But especially for many preparsed queries against a small set of
new documents, this might be nice to have. Still, even for that
case, it would only gain performance over using RAMDirectory
when the queries can be evaluated from the ground up,
sharing as many subqueries as possible. And that is
just the opposite of the top-down way query search is
currently implemented on a prebuilt index.

The basic design for this would be to start from a set of queries
to be 'analyzed' to make them share as many subqueries
as possible, building a query graph.
Then this query graph would be fed the new documents
one by one, resulting in a score for each matching query
that was added to the query graph.
It is possible, but it would be quite a bit of work.
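
A very reduced sketch of that design might look like the following.
Everything in it is an assumption made for illustration: QueryGraph
and its methods are hypothetical names and not Lucene classes, only
term/AND/OR nodes are supported, and each query yields a plain
match/no-match result instead of a score. Equal subqueries are
shared through a canonical key, and each document is evaluated
bottom-up, so every shared node is computed once per document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical class, not part of Lucene.
final class QueryGraph {
    static final class Node {
        final String key;          // canonical form, used to detect shared subqueries
        final String term;         // non-null for leaf nodes only
        final boolean conjunction; // true = AND, false = OR (unused for leaves)
        final List<Node> children;

        private Node(String key, String term, boolean conjunction, List<Node> children) {
            this.key = key;
            this.term = term;
            this.conjunction = conjunction;
            this.children = children;
        }
    }

    // Insertion-ordered: a node can only be created after its children,
    // so iterating in insertion order visits children before parents.
    private final Map<String, Node> nodes = new LinkedHashMap<>();
    private final Map<String, Node> roots = new LinkedHashMap<>();

    Node term(String text) {
        String normalized = text.toLowerCase(Locale.ROOT);
        return intern("t:" + normalized, normalized, false, Collections.emptyList());
    }

    Node and(Node... children) { return composite("and", true, children); }

    Node or(Node... children) { return composite("or", false, children); }

    void register(String queryName, Node root) { roots.put(queryName, root); }

    int nodeCount() { return nodes.size(); }

    // Feeds one document (as a set of lowercased tokens) through the graph
    // and returns the names of the registered queries that match it.
    List<String> match(Set<String> tokens) {
        Map<Node, Boolean> value = new IdentityHashMap<>();
        for (Node node : nodes.values()) {
            boolean result;
            if (node.term != null) {
                result = tokens.contains(node.term);
            } else {
                result = node.conjunction; // AND starts true, OR starts false
                for (Node child : node.children) {
                    boolean childResult = value.get(child);
                    result = node.conjunction ? (result && childResult) : (result || childResult);
                }
            }
            value.put(node, result);
        }
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Node> entry : roots.entrySet()) {
            if (value.get(entry.getValue())) {
                matched.add(entry.getKey());
            }
        }
        return matched;
    }

    private Node composite(String op, boolean conjunction, Node... children) {
        // Sort child keys so that "a OR b" and "b OR a" share one node.
        String key = Arrays.stream(children).map(child -> child.key).sorted()
                .collect(Collectors.joining(",", op + "(", ")"));
        return intern(key, null, conjunction, Arrays.asList(children));
    }

    private Node intern(String key, String term, boolean conjunction, List<Node> children) {
        return nodes.computeIfAbsent(key, k -> new Node(k, term, conjunction, children));
    }
}
```

Registering both +(apache) +(lucene OR httpd) and (lucene OR httpd)
in one graph creates the OR node only once, which is the kind of
sharing described above. All nodes are assumed to come from the same
QueryGraph instance.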

And then someone will come along with the requirement
to match an existing index against such a query graph,
which is not a bad idea either, but it might need yet another
way of collecting the results.

Regards,
Paul Elschot

On Friday 08 February 2008 05:48:08, Nilesh Bansal wrote:

> I want to create a function, which takes in a query string (in lucene
> syntax), and a string as content and returns back if the query matches
> the content or not.

Re: Lucene syntax query matched against a string content

Erick Erickson
In reply to this post by Nilesh Bansal
You might want to check out MemoryIndex before rejecting putting a single
doc in memory and searching against it. It's quite fast, although whether
it'll work in your situation only measurement will tell. It's in contrib
as I remember.
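
For reference, filling in the isRelevant function from the original
question with MemoryIndex could look roughly like this. It is a
sketch under assumptions: the package names are those of recent
Lucene releases (the 2008-era contrib layout differs), the memory and
queryparser modules are assumed to be on the classpath, and the
class name MemoryIndexMatcher, the field name "content" and the
choice of StandardAnalyzer are made up for the example.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

// Hypothetical wrapper class around MemoryIndex.
final class MemoryIndexMatcher {
    // Arbitrary field name; it only has to be the same for query and content.
    private static final String FIELD = "content";

    private MemoryIndexMatcher() {}

    // Returns the relevance score; MemoryIndex.search gives 0.0f for no match.
    static float score(String luceneQuery, String contents) {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // Parse the query with the same analyzer used for the content.
            Query query = new QueryParser(FIELD, analyzer).parse(luceneQuery);

            // Single-document, in-memory index for this one content string.
            MemoryIndex index = new MemoryIndex();
            index.addField(FIELD, contents, analyzer);
            return index.search(query);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid query: " + luceneQuery, e);
        }
    }

    static boolean isRelevant(String luceneQuery, String contents) {
        return score(luceneQuery, contents) > 0.0f;
    }
}
```

If the same query is checked against many documents, parsing it once
and reusing the Query object would avoid repeated parsing; whether
that matters in practice is something to measure.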

Erick

Re: Lucene syntax query matched against a string content

Nilesh Bansal
Excellent. MemoryIndex solves the problem. I didn't know about this
index. Thanks.

-Nilesh

On Feb 8, 2008 8:23 AM, Erick Erickson <[hidden email]> wrote:

> You might want to check out MemoryIndex before rejecting putting a single
> doc in memory and searching against it. It's quite fast, although whether
> it'll work in your situation only measurement will tell. It's in contrib
> as I remember.
>
> Erick

--
Nilesh Bansal
