Control over Lucene Index

Control over Lucene Index

Ralf Bierig

Hi,

In the context of a distributed information retrieval project, we would
like to use Lucene for its indexing capabilities but not for retrieval.
In particular, we would like to populate a Lucene index with the tokens
and statistics already computed by an external indexer, thereby
bypassing the document-based parsing, analysis, and ingestion that
characterise Lucene's standard workflow. Is this possible? That is, can
precomputed statistics be fed into a Lucene index? And can we control
which statistics are associated with each document? (As we will not use
Lucene for retrieval, we do not need to supply the statistics it
requires to perform a search.)

Any help greatly appreciated, many thanks.

Cheers,


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Control over Lucene Index

Grant Ingersoll
You could write a "dummy" Analyzer that supplies the tokens from your
external process. As for statistics, which ones are you interested in?
You could store them in a field alongside the document, or set boost
values on the field or document, though that may be too simple for
your needs.
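
A minimal sketch of the "dummy" Analyzer idea, in plain Java with no Lucene dependency. The names SimpleTokenStream, PrecomputedAnalyzer, and PrecomputedAnalyzerDemo are hypothetical stand-ins, not Lucene classes; a real implementation would extend Lucene's Analyzer and return a TokenStream, whose exact signatures depend on the Lucene version. The point is only that the analyzer ignores the incoming text and replays tokens produced elsewhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a token stream (not Lucene's TokenStream):
// next() returns the next token, or null once the stream is exhausted.
interface SimpleTokenStream {
    String next();
}

// "Dummy" analyzer: ignores the text it is given and replays tokens
// that an external indexer has already produced for each field.
class PrecomputedAnalyzer {
    private final Map<String, List<String>> tokensByField;

    PrecomputedAnalyzer(Map<String, List<String>> tokensByField) {
        this.tokensByField = tokensByField;
    }

    SimpleTokenStream tokenStream(String fieldName, String ignoredText) {
        // Fields with no precomputed tokens yield an empty stream.
        List<String> tokens =
            tokensByField.getOrDefault(fieldName, Collections.<String>emptyList());
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}

class PrecomputedAnalyzerDemo {
    // Collects every token from a stream into a list.
    static List<String> drain(SimpleTokenStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed shape of the external indexer's output: field name -> tokens.
        Map<String, List<String>> external = new HashMap<>();
        external.put("body", Arrays.asList("distribut", "retriev", "index"));
        PrecomputedAnalyzer analyzer = new PrecomputedAnalyzer(external);
        System.out.println(drain(analyzer.tokenStream("body", "raw text is ignored")));
    }
}
```

With this approach Lucene still computes its own term frequencies and positions from the replayed tokens, so it covers precomputed tokens but not precomputed statistics.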

--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244

http://www.cnlp.org 
Voice:  315-443-5484
Fax: 315-443-6886


Re: Control over Lucene Index

Doug Cutting
In reply to this post by Ralf Bierig
You could implement the IndexReader API, then use IndexMerger to write
this in Lucene's format.
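
A rough sketch of the shape of this suggestion, again in plain Java with no Lucene dependency. PostingsReader, ExternalIndexReader, and MergeSketch are hypothetical names invented for illustration, and the one-line-per-term text output is only a placeholder for Lucene's on-disk format. In a real setup the adapter would extend Lucene's IndexReader, and the merge would be driven by Lucene itself; in Lucene releases of that era the public entry point for merging from readers was, as far as I know, IndexWriter.addIndexes(IndexReader[]).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// One posting: a document number and the term's frequency in that document.
final class Posting {
    final int doc;
    final int freq;

    Posting(int doc, int freq) {
        this.doc = doc;
        this.freq = freq;
    }
}

// Hypothetical, heavily reduced reader interface (not Lucene's IndexReader).
interface PostingsReader {
    Iterable<String> terms();             // terms in sorted order
    List<Posting> postings(String term);  // postings in the order they were added
    int maxDoc();                         // one greater than the largest doc number
}

// Adapter exposing statistics computed by an external indexer.
class ExternalIndexReader implements PostingsReader {
    private final TreeMap<String, List<Posting>> postings = new TreeMap<>();
    private int maxDoc = 0;

    // Assumption: the caller adds postings for a term in increasing doc order.
    void add(String term, int doc, int freq) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(new Posting(doc, freq));
        maxDoc = Math.max(maxDoc, doc + 1);
    }

    public Iterable<String> terms() {
        return postings.keySet();
    }

    public List<Posting> postings(String term) {
        return postings.getOrDefault(term, Collections.<Posting>emptyList());
    }

    public int maxDoc() {
        return maxDoc;
    }
}

class MergeSketch {
    // Merge-like consumer: walks the reader term by term and serialises
    // each term as "term df=<docFreq> doc:freq doc:freq ...".
    static List<String> write(PostingsReader reader) {
        List<String> lines = new ArrayList<>();
        for (String term : reader.terms()) {
            List<Posting> ps = reader.postings(term);
            StringBuilder sb = new StringBuilder(term).append(" df=").append(ps.size());
            for (Posting p : ps) {
                sb.append(' ').append(p.doc).append(':').append(p.freq);
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        ExternalIndexReader reader = new ExternalIndexReader();
        reader.add("lucene", 0, 2);
        reader.add("index", 0, 1);
        reader.add("lucene", 1, 1);
        for (String line : write(reader)) {
            System.out.println(line);
        }
    }
}
```

The appeal of the reader route is that the frequencies written out are whatever the adapter reports, so the external indexer's statistics pass through without being recomputed from text.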

Doug
