You could write a "dummy" Analyzer that simply returns the tokens
produced by your external process instead of analyzing the text itself.
As for statistics, which ones are you interested in? You could store
them in a field alongside the document, or set boost values on the
field or document, but that may be too simple for your needs.
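
To make the "dummy" Analyzer idea concrete, here is a rough sketch of the
pattern in plain Java. It deliberately does not use Lucene's classes: the
TokenSource interface, DummyAnalyzer, and the field name "body" are all
hypothetical stand-ins I made up for illustration. In Lucene itself you
would subclass its own Analyzer and TokenStream, and the exact signatures
depend on the version you are running, so treat this only as the shape of
the approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PrecomputedTokens {

    /** Hypothetical stand-in for a pull-style token stream; NOT Lucene's API. */
    interface TokenSource {
        /** Returns the next term, or null once the stream is exhausted. */
        String next();
    }

    /** Replays terms that an external indexer has already produced. */
    static final class ReplayTokenSource implements TokenSource {
        private final Iterator<String> terms;

        ReplayTokenSource(List<String> precomputedTerms) {
            this.terms = precomputedTerms.iterator();
        }

        @Override
        public String next() {
            return terms.hasNext() ? terms.next() : null;
        }
    }

    /** "Dummy" analyzer: does no text analysis, only looks up stored terms. */
    static final class DummyAnalyzer {
        private final Map<String, List<String>> termsByField = new HashMap<>();

        /** Stores a copy of the externally computed terms for one field. */
        void register(String fieldName, List<String> precomputedTerms) {
            termsByField.put(fieldName, new ArrayList<>(precomputedTerms));
        }

        /** The raw text argument is deliberately ignored. */
        TokenSource tokenStream(String fieldName, String ignoredText) {
            List<String> terms = termsByField.get(fieldName);
            return new ReplayTokenSource(terms == null ? new ArrayList<String>() : terms);
        }
    }

    /** Drains a token source into a list, as an indexer would consume it. */
    static List<String> drain(TokenSource source) {
        List<String> out = new ArrayList<>();
        for (String term = source.next(); term != null; term = source.next()) {
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        DummyAnalyzer analyzer = new DummyAnalyzer();
        // Terms here are made-up sample output from an external indexer.
        analyzer.register("body", Arrays.asList("distribut", "inform", "retriev"));
        System.out.println(drain(analyzer.tokenStream("body", "raw text is never parsed")));
    }
}
```

The point of the design is that the indexing side only ever pulls terms
from the stream, so it does not care whether they came from real text
analysis or from a lookup of precomputed output.
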
Ralf Bierig wrote:
>
> Hi,
>
> in the context of a distributed information retrieval project, we
> would like to use Lucene for its indexing capabilities but not for
> retrieval. In particular, we would like to populate a Lucene index
> with the tokens and statistics already computed by an external
> indexer, thereby bypassing the document-based parsing, analysis, and
> ingestion into the index which characterises Lucene's standard
> workflow. Is this possible? That is, is it possible to feed
> precomputed statistics into a Lucene index? And is it possible to
> control which statistics are associated with each document (as we
> will not use Lucene for retrieval, we are not interested in
> complying with the statistics it needs to perform a search)?
>
> Any help greatly appreciated, many thanks.
>
> Cheers,
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
[hidden email]
> For additional commands, e-mail:
[hidden email]
>
>
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886