Beyond Lucene 2.0 Index Design

Re: Beyond Lucene 2.0 Index Design

Doug Cutting
Marvin Humphrey wrote:
> Can you show us some code or pseudo-code for a BooleanScorer that would
> use impact-sorted posting lists?

Another way to interpret this proposal is index-only: the low-level
indexing APIs should be general enough to permit impact-sorted posting
lists, and perhaps an impact-sorted posting list index implementation
could be provided in the core, but the existing search APIs might not
work well or at all with an impact-sorted index.  Perhaps they could
interoperate at a "weighted filter" level.  There could be a separate
search implementation for impact-sorted indexes, and it could provide
output as a weighted filter, and the document-sorted search
implementation could accept weighted-filter clauses.  Does that make
sense?  It would be a little like span queries: a separate query
resolving engine that interoperates with the standard engine, but does
not replace it.
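
A minimal sketch of how such a weighted-filter hand-off might look, in Java. Every name here (`WeightedFilterSketch`, `WeightedFilter`, `fromImpactSearch`, `search`) is hypothetical and not part of any Lucene API, and adding the filter weight to the document-sorted score is only an assumption made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class WeightedFilterSketch {

    /** Hypothetical exchange format between the two engines: doc id -> weight. */
    public interface WeightedFilter {
        /** Returns the weight for docId, or 0 if the document did not pass the filter. */
        float weight(int docId);
    }

    /** Stand-in for the impact-sorted engine: exposes its (docId, weight) hits as a filter. */
    public static WeightedFilter fromImpactSearch(Map<Integer, Float> hits) {
        return docId -> hits.getOrDefault(docId, 0f);
    }

    /**
     * Stand-in for the document-sorted engine. It walks its own candidates in
     * doc id order and treats the weighted filter as a required clause:
     * documents with zero weight are dropped, the rest get the weight added
     * to their score (an assumed combination rule).
     */
    public static Map<Integer, Float> search(int[] docIdsInOrder, float[] scores,
                                             WeightedFilter clause) {
        Map<Integer, Float> results = new TreeMap<>();
        for (int i = 0; i < docIdsInOrder.length; i++) {
            float w = clause.weight(docIdsInOrder[i]);
            if (w > 0f) {
                results.put(docIdsInOrder[i], scores[i] + w);
            }
        }
        return results;
    }
}
```

The point of the sketch is only the boundary: the document-sorted side never sees how the weights were produced, so the impact-sorted engine can stay a separate implementation.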

Ramblingly,

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Beyond Lucene 2.0 Index Design

Chuck Williams-2

Doug Cutting wrote on 01/12/2007 09:49 AM:

> Marvin Humphrey wrote:
>> Can you show us some code or pseudo-code for a BooleanScorer that
>> would use impact-sorted posting lists?
>
> Another way to interpret this proposal is index-only: the low-level
> indexing APIs should be general enough to permit impact-sorted posting
> lists, and perhaps an impact-sorted posting list index implementation
> could be provided in the core, but the existing search APIs might not
> work well or at all with an impact-sorted index.  Perhaps they could
> interoperate at a "weighted filter" level.  There could be a separate
> search implementation for impact-sorted indexes, and it could provide
> output as a weighted filter, and the document-sorted search
> implementation could accept weighted-filter clauses.  Does that make
> sense?  It would be a little like span queries: a separate query
> resolving engine that interoperates with the standard engine, but does
> not replace it.
I think this makes perfect sense.  Not all applications place an
emphasis on relevance ranking.  E.g., in my current application, the
focus is on categorizing and manipulating large complete result sets.
Lucene as it stands is excellent at that.  For very large indexes in
domains where users are most interested in the top few hits, e.g. web
search, impact-sorted posting lists and partial retrieval have
great value.  Adding this capability to Lucene in a manner that supports
both use cases seems the way to go.  Sufficient flexibility in the
indexing APIs and core implementation(s) so that apps can specify
whether or not they want impact sorting, combined with similar
flexibility in the query engine(s) to provide complete or incremental
partial retrieval, would achieve this.  Weighted-filter clauses could
require the impact-sorted index representation.
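
A rough Java sketch of what partial retrieval over impact-sorted posting lists could look like. It assumes each term's postings are already stored in descending impact order, merges them highest-impact-first into per-document accumulators, and stops after a fixed posting budget. The class, the method names, and the budget parameter are all hypothetical; nothing here is an existing Lucene API or a proposed one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class ImpactSortedSketch {

    /** One posting: a document and the term's precomputed score contribution in it. */
    public static final class Posting {
        final int docId;
        final float impact;

        public Posting(int docId, float impact) {
            this.docId = docId;
            this.impact = impact;
        }
    }

    /**
     * Consumes postings across all lists in descending impact order and stops
     * once maxPostings have been processed. Each inner list must already be
     * sorted by descending impact. A large budget gives complete retrieval;
     * a small one gives an approximate, partial result.
     */
    public static Map<Integer, Float> accumulate(List<List<Posting>> lists, int maxPostings) {
        // Each queue entry is a cursor {listIndex, position}; largest impact first.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> Float.compare(lists.get(b[0]).get(b[1]).impact,
                                    lists.get(a[0]).get(a[1]).impact));
        for (int i = 0; i < lists.size(); i++) {
            if (!lists.get(i).isEmpty()) {
                heads.add(new int[] {i, 0});
            }
        }
        Map<Integer, Float> acc = new HashMap<>();
        int processed = 0;
        while (!heads.isEmpty() && processed < maxPostings) {
            int[] head = heads.poll();
            Posting p = lists.get(head[0]).get(head[1]);
            acc.merge(p.docId, p.impact, Float::sum);
            processed++;
            if (head[1] + 1 < lists.get(head[0]).size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return acc;
    }

    /** Returns up to k doc ids with the highest accumulated scores, best first. */
    public static List<Integer> topK(Map<Integer, Float> acc, int k) {
        List<Map.Entry<Integer, Float>> entries = new ArrayList<>(acc.entrySet());
        entries.sort(Map.Entry.<Integer, Float>comparingByValue().reversed());
        List<Integer> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

With the budget set high enough to exhaust every list, the same loop yields the complete result set, which is one way a single representation might serve both kinds of application.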

Chuck



Re: Beyond Lucene 2.0 Index Design

Paul Elschot
Gentlemen,

On Friday 12 January 2007 21:00, Chuck Williams wrote:

>
> Doug Cutting wrote on 01/12/2007 09:49 AM:
> > Marvin Humphrey wrote:
> >> Can you show us some code or pseudo-code for a BooleanScorer that
> >> would use impact-sorted posting lists?
> >
> > Another way to interpret this proposal is index-only: the low-level
> > indexing APIs should be general enough to permit impact-sorted posting
> > lists, and perhaps an impact-sorted posting list index implementation
> > could be provided in the core, but the existing search APIs might not
> > work well or at all with an impact-sorted index.  Perhaps they could
> > interoperate at a "weighted filter" level.  There could be a separate
> > search implementation for impact-sorted indexes, and it could provide
> > output as a weighted filter, and the document-sorted search
> > implementation could accept weighted-filter clauses.  Does that make
> > sense?  It would be a little like span queries: a separate query
> > resolving engine that interoperates with the standard engine, but does
> > not replace it.
> I think this makes perfect sense.  Not all applications place an
> emphasis on relevance ranking.  E.g., in my current application, the
> focus is on categorizing and manipulating large complete result sets.
> Lucene as it stands is excellent at that.  For very large indexes in
> domains where users are most interested in the top few hits, e.g. web
> search, impact-sorted posting lists and partial retrieval have
> great value.  Adding this capability to Lucene in a manner that supports
> both use cases seems the way to go.  Sufficient flexibility in the
> indexing APIs and core implementation(s) so that apps can specify
> whether or not they want impact sorting, combined with similar
> flexibility in the query engine(s) to provide complete or incremental
> partial retrieval, would achieve this.  Weighted-filter clauses could
> require the impact-sorted index representation.

A weighted filter clause could already be used as a prescored clause
in a boolean query. That makes weighted filters a useful addition to
the current search methods.

Regards,
Paul Elschot
