Lucene indexes and relationship

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Lucene indexes and relationship

is_maximum
Hello,
In our application, we have many categories (indexes) in which different
kind of information have been indexed. we provided a facility for our users
to opt their category to search and we also provided a way that they select
more than one category to search, afterwards, we must return back the result
from selected categories in which they are related to each other and discard
other documents which their IDs are not common with the results from other
categories, in other words, an intersection of all results

currently, each category has an ID field and this help us to find which
document to select and which one to discard, in order to implement this, we
defined a HitCollector in which, we collect all the Lucene's document IDs to
load the document later as well as all category ID to find its relationship
with other result. we store these IDs in a java.util.BitSet  structure to
save the memory and also for performance issues.
now this works fine, but it's messy, further more, to find all common
documents we have to compare all the results two by two and this is really
exhaustive and eat up the CPU resources and this deprived us from the
Lucene's speed of searching.

In my opinion, if this simple relationship could be embedded inside the
Lucene Api at low level sections of loading documents, this process would be
done very fast, and I think this feature will make many Lucene's user happy

I want to know, whether this is possible for Lucene developers or not, or it
is in their TO DO list or they are going to provide it in future.

thank you all Lucene developers and producers.

--
Regards,
Mohammad Norouzi
--------------------------
see my blog: http://brainable.blogspot.com/
another in Persian: http://fekre-motefavet.blogspot.com/
--
Regards
Mohammad
Pixelshot
Reply | Threaded
Open this post in threaded view
|

Re: Lucene indexes and relationship

Grant Ingersoll-2
Sounds like faceting, I think.  Have you looked at Solr?

-Grant

On Sep 15, 2007, at 1:15 AM, Mohammad Norouzi wrote:

> Hello,
> In our application, we have many categories (indexes) in which  
> different
> kind of information have been indexed. we provided a facility for  
> our users
> to opt their category to search and we also provided a way that  
> they select
> more than one category to search, afterwards, we must return back  
> the result
> from selected categories in which they are related to each other  
> and discard
> other documents which their IDs are not common with the results  
> from other
> categories, in other words, an intersection of all results
>
> currently, each category has an ID field and this help us to find  
> which
> document to select and which one to discard, in order to implement  
> this, we
> defined a HitCollector in which, we collect all the Lucene's  
> document IDs to
> load the document later as well as all category ID to find its  
> relationship
> with other result. we store these IDs in a java.util.BitSet  
> structure to
> save the memory and also for performance issues.
> now this works fine, but it's messy, further more, to find all common
> documents we have to compare all the results two by two and this is  
> really
> exhaustive and eat up the CPU resources and this deprived us from the
> Lucene's speed of searching.
>
> In my opinion, if this simple relationship could be embedded inside  
> the
> Lucene Api at low level sections of loading documents, this process  
> would be
> done very fast, and I think this feature will make many Lucene's  
> user happy
>
> I want to know, whether this is possible for Lucene developers or  
> not, or it
> is in their TO DO list or they are going to provide it in future.
>
> thank you all Lucene developers and producers.
>
> --
> Regards,
> Mohammad Norouzi
> --------------------------
> see my blog: http://brainable.blogspot.com/
> another in Persian: http://fekre-motefavet.blogspot.com/

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]