design: merging resultset from RDBMS with lucene search results

design: merging resultset from RDBMS with lucene search results

spring
Hi,

I have the following scenario:

An RDBMS contains the metadata for the documents (ID, customer number,
doctype, etc.).
Now I want to add full-text search support.

So I will index the documents' content in Lucene and add each document's ID
as a stored field in Lucene.

Now somebody wants to search like this: customer number 1234 AND content
"foo bar".

So I go to Lucene, search for content:"foo bar" and get back a hit list
containing the document IDs.

Now, how do I merge these IDs with the result set of the RDBMS search for
customer number 1234?

1) select ... from ... where customer=1234 and ID in (<lucenes ID list>).

or

2) select ... from ... where customer=1234 and then join both result sets in
the application.

or

3) no idea :)
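
To make options 1 and 2 concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the RDBMS. The table name `documents`, its columns, the sample rows, and the helper names are all made up for illustration, and the list of Lucene hit IDs is simply passed in as a plain list.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the metadata RDBMS (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, customer INTEGER, doctype TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents (id, customer, doctype) VALUES (?, ?, ?)",
        [(1, 1234, "invoice"), (2, 1234, "letter"), (3, 9999, "invoice"), (4, 1234, "memo")],
    )
    return conn


def option1_in_list(conn, customer, lucene_ids, chunk_size=500):
    """Option 1: push the full-text hit IDs into the SQL query as an IN list."""
    ids = list(lucene_ids)
    rows = []
    # Chunk the list because many databases cap the number of bound parameters.
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        sql = (
            "SELECT id, doctype FROM documents "
            f"WHERE customer = ? AND id IN ({placeholders})"
        )
        rows.extend(conn.execute(sql, [customer, *chunk]).fetchall())
    return sorted(rows)


def option2_join_in_app(conn, customer, lucene_ids):
    """Option 2: run the SQL query alone, then intersect with the hit IDs in memory."""
    hit_ids = set(lucene_ids)
    cursor = conn.execute(
        "SELECT id, doctype FROM documents WHERE customer = ?", (customer,)
    )
    return sorted(row for row in cursor if row[0] in hit_ids)
```

Both functions return the same rows. The difference is where the work happens: option 1 sends the ID list to the database, option 2 pulls every row for the customer into the application and filters there.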

What is best practice here?

Thank you.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: design: merging resultset from RDBMS with lucene search results

John Byrne-3
Hi,

You might consider avoiding this problem altogether by simply adding
the metadata to your Lucene index. Lucene can handle untokenized
fields, which are ideal for metadata. It might not be as quick as the
RDB, but you could perhaps optimize by searching the RDB when you
need only metadata, and using Lucene when you need both.
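
As a rough illustration of the idea (this is a toy in-memory index written in Python, not Lucene's API), the sketch below indexes the content word by word and each metadata value as a single exact-match term, then answers a combined query by intersecting posting sets. The class name `ToyIndex` and the field names are invented for the example, and it ANDs individual words rather than matching a phrase.

```python
from collections import defaultdict


class ToyIndex:
    """Tiny in-memory inverted index; a stand-in for illustration, not Lucene's API."""

    def __init__(self):
        # (field, term) -> set of document IDs containing that term
        self.postings = defaultdict(set)

    def add(self, doc_id, content, **metadata):
        # Tokenized field: split the text so each word is searchable.
        for word in content.lower().split():
            self.postings[("content", word)].add(doc_id)
        # Untokenized fields: index the whole value as one exact-match term.
        for field, value in metadata.items():
            self.postings[(field, str(value))].add(doc_id)

    def search(self, content_words, **metadata):
        """AND together content words and exact metadata values."""
        clauses = [("content", word.lower()) for word in content_words]
        clauses += [(field, str(value)) for field, value in metadata.items()]
        if not clauses:
            return set()
        result_sets = [self.postings.get(clause, set()) for clause in clauses]
        return set.intersection(*result_sets)
```

With this layout, "customer 1234 AND content foo bar" becomes a single call such as `index.search(["foo", "bar"], customer=1234)`, so no merge with the database is needed for that query.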

Regards,
JB


RE: design: merging resultset from RDBMS with lucene search results

spring
The metadata is altered quite often and there are millions of documents.
Also, document access is secured by complex SQL statements which Lucene might
not support.
So I don't think this is an option.

> -----Original Message-----
> From: John Byrne [mailto:[hidden email]]
> Sent: Wednesday, 13 February 2008 18:44
> To: [hidden email]
> Subject: Re: design: merging resultset from RDBMS with lucene search results

Re: design: merging resultset from RDBMS with lucene search results

Erick Erickson
In reply to this post by spring
Another possibility is to do it backwards, depending on how expensive the
SQL query is. The idea would be to run your SQL query *first*, then
construct a Lucene Filter from the resulting doc IDs using TermDocs/TermEnum,
and apply that Filter to your Lucene query.

I'd guess (without knowing much about your problem space) that *if* you
can spin through your SQL results and extract all of the unique doc IDs
acceptably quickly, you won't notice the time it takes to construct the
filter.
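
The shape of that approach can be sketched in plain Python. This is a toy model under stated assumptions, not Lucene code: `id_to_docnums` is a hypothetical lookup from your stored ID value to the index's internal document numbers (the role the term dictionary and postings play in a real index), and the filter is a bitset held in an ordinary integer.

```python
def build_filter(allowed_ids, id_to_docnums):
    """Set one bit per internal doc number whose ID is in allowed_ids."""
    bits = 0
    for doc_id in set(allowed_ids):
        # Look up each unique ID once, like seeking a term and walking its postings.
        for docnum in id_to_docnums.get(doc_id, ()):
            bits |= 1 << docnum
    return bits


def filtered_search(hit_docnums, filter_bits):
    """Keep only the full-text hits whose bit is set in the filter."""
    return [docnum for docnum in hit_docnums if (filter_bits >> docnum) & 1]


# Usage with made-up data: IDs 10 and 30 came back from the SQL query,
# and the full-text search matched internal doc numbers 0, 1 and 2.
id_to_docnums = {10: [0], 20: [1], 30: [2]}
allowed = build_filter([10, 30], id_to_docnums)
print(filtered_search([0, 1, 2], allowed))  # prints [0, 2]
```

The cost of building the filter grows with the number of unique IDs the SQL query returns, which is why the expense of that query decides whether this direction is worthwhile.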

Also, search the user list archive for discussions of embedding Lucene in a
database. I confess that the entire discussion flew right over my head, so I
don't know whether it's applicable.

Erick

On Wed, Feb 13, 2008 at 12:21 PM, <[hidden email]> wrote:
