custom indexing

John Wang-9
Hi:

    Great job on the flex indexing feature! This opens new doors for how an application can customize Lucene for its use cases.

    I have a couple of questions that I brought up before; the answer then was to wait for flex indexing. Now that flex indexing seems to be in good shape, I thought I'd bring them up again:

1) Is it possible to obtain unique term count for a given field, e.g. getUniqueTermCount(String field) on the segment reader?

2) Is it possible to use Lucene's segment/merge mechanism to encode custom segment files, e.g. my own StoredData format, my own forward index for some field, etc.?

Thanks

-John

Re: custom indexing

Michael McCandless-2
1) Yes, in fact you needn't wait for flex for this:
IndexReader.getUniqueTermCount was added in 2.9.  But it will throw
UnsupportedOperationException (UOE) on composite readers
(MultiReader/DirectoryReader).

2) Yes, you can make a Codec that separately maintains your own files,
both on initial flush and on merge.  Make sure your Codec.files()
returns your new files, so IndexFileDeleter doesn't delete them!
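A toy model may help show why Codec.files() matters. This is plain Java, not Lucene code: `ToyCodec`, `deleteUnreferenced`, and the file names are all hypothetical. It only mimics the rule described above, that files the index does not claim get deleted; the real IndexFileDeleter is more involved (it reference-counts files across commit points).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (not Lucene code) of "delete every file no live segment claims". */
public class ToyFileDeleter {

    /** Hypothetical stand-in for a codec: adds the files it owns for a segment to out. */
    interface ToyCodec {
        void files(String segment, Set<String> out);
    }

    /** Reports its terms and postings files but forgets its custom forward-index file. */
    static final ToyCodec FORGETFUL = (segment, out) -> {
        out.add(segment + ".terms");
        out.add(segment + ".postings");
    };

    /** Reports the same files plus the custom one, so the deleter keeps it. */
    static final ToyCodec CAREFUL = (segment, out) -> {
        FORGETFUL.files(segment, out);
        out.add(segment + ".fwd"); // hypothetical custom forward-index file
    };

    /** Removes from dir every file that no live segment's codec lists; returns the removed names. */
    static List<String> deleteUnreferenced(Set<String> dir, List<String> liveSegments, ToyCodec codec) {
        Set<String> referenced = new TreeSet<>();
        for (String segment : liveSegments) {
            codec.files(segment, referenced);
        }
        List<String> deleted = new ArrayList<>();
        for (String name : dir) {
            if (!referenced.contains(name)) {
                deleted.add(name);
            }
        }
        dir.removeAll(deleted);
        return deleted;
    }

    /** One segment "_0" whose codec wrote three files. */
    static Set<String> sampleDirectory() {
        return new TreeSet<>(List.of("_0.terms", "_0.postings", "_0.fwd"));
    }

    public static void main(String[] args) {
        System.out.println("forgetful codec deletes: "
            + deleteUnreferenced(sampleDirectory(), List.of("_0"), FORGETFUL));
        System.out.println("careful codec deletes: "
            + deleteUnreferenced(sampleDirectory(), List.of("_0"), CAREFUL));
    }
}
```

Running main reports `_0.fwd` as deleted for the forgetful codec and nothing for the careful one.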

Mike

On Tue, Jun 15, 2010 at 5:29 PM, John Wang <[hidden email]> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: custom indexing

John Wang-9
Thanks Michael!

For 1), I only see the API to get the unique term count for the entire reader, not for a specific field. Am I looking in the wrong place?

2) Awesome!!! Is there a wiki on flex indexing somewhere?

-John

On Wed, Jun 16, 2010 at 2:37 AM, Michael McCandless <[hidden email]> wrote:

Re: custom indexing

Michael McCandless-2
On Wed, Jun 16, 2010 at 10:30 AM, John Wang <[hidden email]> wrote:
> Thanks Michael!
> For 1), I only see the api to get the uniqueTerms for the entire reader, not
> for a specific field. Am I looking at the wrong place?

Ahh, sorry, I missed that you need it per-field.  Yes, flex now makes it
possible.  If the reader is composite, do this:

  MultiFields.getTerms(reader, field).getUniqueTermCount();

Otherwise, if the reader is definitely a single segment:

  reader.fields().terms(field).getUniqueTermCount();

(But you should null-check the returned Fields, in case the reader has no
fields, and the returned Terms, in case the specified field does not exist.)
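Summing per-segment counts would not give a correct answer for a composite reader, because each segment has its own term dictionary and the same term can occur in several segments. The sketch below is plain Java, not Lucene code, and the segment contents are made-up sample data: it counts distinct terms exactly by k-way merging sorted per-segment term lists with a priority queue. It illustrates the problem only and makes no claim about how MultiFields computes its count.

```java
import java.util.List;
import java.util.PriorityQueue;

public class DistinctTermCount {

    /** Cursor over one segment's sorted term list. */
    private static final class Cursor {
        final List<String> terms;
        int pos;

        Cursor(List<String> terms) {
            this.terms = terms;
        }

        String current() {
            return terms.get(pos);
        }
    }

    /** Naive total: overcounts any term that occurs in more than one segment. */
    static long sumOfSegmentCounts(List<List<String>> segments) {
        long total = 0;
        for (List<String> seg : segments) {
            total += seg.size();
        }
        return total;
    }

    /**
     * Exact distinct count via a k-way merge. Assumes each segment's list is
     * sorted and duplicate-free (uses String order for simplicity).
     */
    static long distinctTermCount(List<List<String>> segments) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> a.current().compareTo(b.current()));
        for (List<String> seg : segments) {
            if (!seg.isEmpty()) {
                queue.add(new Cursor(seg));
            }
        }
        long count = 0;
        String last = null;
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            String term = top.current();
            if (!term.equals(last)) { // first time we see this term across all segments
                count++;
                last = term;
            }
            top.pos++;
            if (top.pos < top.terms.size()) {
                queue.add(top); // re-insert the cursor at its next term
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample data: sorted terms of one field in three segments.
        List<List<String>> segments = List.of(
            List.of("apache", "index", "lucene"),
            List.of("codec", "index"),
            List.of("lucene", "segment"));
        System.out.println("sum of per-segment counts: " + sumOfSegmentCounts(segments));
        System.out.println("distinct terms: " + distinctTermCount(segments));
    }
}
```

With the sample data the naive sum is 7 while the distinct count is 5, since `index` and `lucene` each appear in two segments.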

> 2) Awesome!!! Is there a wiki on flex indexing somewhere?

There's a start at http://wiki.apache.org/lucene-java/FlexibleIndexing

But it doesn't document in detail how to make your own Codec; probably
the simplest way to get started is to look at the core Codecs.

Mike

Re: custom indexing

John Wang-9
Awesome! Thanks Michael!

-John

On Wed, Jun 16, 2010 at 7:53 AM, Michael McCandless <[hidden email]> wrote: