Can TermDocs.skipTo() go backwards

adb
I have a custom TopDocsCollector and need to collect a payload from each final
document hit.  The payload comes from a single term in each hit.

I don't want to fetch the payload during the collect() method, as that would
fetch payloads for hits which may subsequently be bumped from the topDocs, so I
want to fetch it during the topDocs() call.
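
To illustrate the deferral described above, here is a minimal sketch in plain Java rather than real Lucene code. The names DeferredPayloadCollector, Hit and payloadLoader are hypothetical stand-ins: collect() only tracks doc and score in a bounded queue, and the (assumed expensive) payload lookup runs once per surviving hit inside topDocs().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntFunction;

// Hypothetical hit record; not a Lucene class.
final class Hit {
    final int doc;
    final float score;
    byte[] payload; // filled in only for hits that survive to the final top N

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical collector that defers payload fetching until topDocs().
final class DeferredPayloadCollector {
    private final int maxHits;
    // Min-heap on score, so the weakest retained hit is always at the head.
    private final PriorityQueue<Hit> queue =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));

    DeferredPayloadCollector(int maxHits) {
        if (maxHits < 1) {
            throw new IllegalArgumentException("maxHits must be at least 1");
        }
        this.maxHits = maxHits;
    }

    // Called once per matching document; does no payload I/O at all.
    void collect(int doc, float score) {
        if (queue.size() < maxHits) {
            queue.add(new Hit(doc, score));
        } else if (score > queue.peek().score) {
            queue.poll(); // bump the current weakest hit
            queue.add(new Hit(doc, score));
        }
    }

    // Fetches payloads only for the hits that made the final cut,
    // returned in descending score order.
    List<Hit> topDocs(IntFunction<byte[]> payloadLoader) {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        for (Hit hit : hits) {
            hit.payload = payloadLoader.apply(hit.doc);
        }
        return hits;
    }
}
```

With maxHits set to 2 and five collected documents, the loader is invoked twice, not five times, which is the saving the deferral is after.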

I made some performance tests on a simple index of 5M documents.  If I do

TermPositions termPositions = reader.termPositions(term);
termPositions.skipTo(scoreDoc.doc);

it takes up to 282 ms just to make the skipTo.

The javadocs imply that skipTo() can only go forwards and as scoreDocs is in
score order, not docId order, I suppose it's not possible to just use

termPositions.skipTo(scoreDoc.doc);

unless skipTo() can also go backwards.  Can it?  The javadocs imply there is
more than one type of implementation.

If not, I suppose I must re-sort the scoreDocs into docId order and then loop
with termPositions.skipTo(scoreDoc.doc).  The number of hits will typically be
small, so it'll be fast enough.
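
A rough sketch of that re-sort approach, in plain Java with no Lucene dependency. ForwardCursor and ArrayCursor are hypothetical stand-ins for a forward-only postings cursor such as TermPositions, and PayloadFetcher is an illustrative name. The hits stay in score order; only an index array is sorted by docId, so each payload lands back in its score-order slot. The sketch assumes skipTo() always advances past the current entry, which is why it checks the current position before skipping again.

```java
import java.util.Arrays;

// Hypothetical stand-in for a forward-only postings cursor.
interface ForwardCursor {
    // Moves to the first entry beyond the current one whose docId is >= target.
    // Returns false once the postings are exhausted.
    boolean skipTo(int target);
    int doc();
    byte[] payload();
}

// Toy in-memory cursor over ascending docIds; for illustration only.
final class ArrayCursor implements ForwardCursor {
    private final int[] docs;
    private final byte[][] payloads;
    private int pos = -1;

    ArrayCursor(int[] docs, byte[][] payloads) {
        this.docs = docs;
        this.payloads = payloads;
    }

    public boolean skipTo(int target) {
        // Always steps at least one entry forward; never moves backwards.
        while (++pos < docs.length) {
            if (docs[pos] >= target) {
                return true;
            }
        }
        pos = docs.length;
        return false;
    }

    public int doc() { return docs[pos]; }
    public byte[] payload() { return payloads[pos]; }
}

final class PayloadFetcher {
    // hitDocs is in score order. result[i] is the payload for hitDocs[i],
    // or null if the term does not occur in that document.
    static byte[][] fetchInDocOrder(int[] hitDocs, ForwardCursor cursor) {
        Integer[] order = new Integer[hitDocs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Visit the hits in ascending docId order without reordering hitDocs.
        Arrays.sort(order, (a, b) -> Integer.compare(hitDocs[a], hitDocs[b]));

        byte[][] result = new byte[hitDocs.length][];
        int current = -1; // docId the cursor is positioned on, -1 before the first skip
        for (int idx : order) {
            int target = hitDocs[idx];
            if (current < target) {
                if (!cursor.skipTo(target)) {
                    break; // postings exhausted; remaining slots stay null
                }
                current = cursor.doc();
            }
            if (current == target) {
                result[idx] = cursor.payload();
            }
        }
        return result;
    }
}
```

Sorting an index array costs O(n log n) for n hits, which is negligible for a small top-N, and the cursor is only ever asked to move forwards.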

Antony






---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Can TermDocs.skipTo() go backwards

Michael McCandless-2

TermDocs.skipTo() only moves forwards.

Can you use a stored field to retrieve this information, or do you  
really need to store it per-term-occurrence in your docs?

Mike



Re: Can TermDocs.skipTo() go backwards

adb
Michael McCandless wrote:
>
> TermDocs.skipTo() only moves forwards.
>
> Can you use a stored field to retrieve this information, or do you
> really need to store it per-term-occurrence in your docs?

I discussed my use case with Doron earlier and there were two options: either
payloads or stored fields.  In the payload case, a single field (owner) in a
document has multiple unique terms (ownerId), each with a payload (accessId).

Using stored fields I have to store something like

ownerId:accessId
ownerId:accessId
ownerId:accessId

then fetch the stored field for the document and then find the particular
accessId for the owner I am searching for.
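
For illustration, the stored-field lookup might look roughly like this in plain Java. OwnerAccessLookup and findAccessId are hypothetical names, and the sketch assumes the stored values arrive as a String[] (for example from something like Document.getValues("owner")) and that an ownerId never contains ':'.

```java
// Hypothetical helper; not part of Lucene.
final class OwnerAccessLookup {
    // Scans "ownerId:accessId" values and returns the accessId for ownerId,
    // or null if that owner has no entry on this document.
    static String findAccessId(String[] storedValues, String ownerId) {
        String prefix = ownerId + ":";
        for (String value : storedValues) {
            if (value.startsWith(prefix)) {
                return value.substring(prefix.length());
            }
        }
        return null;
    }
}
```

The scan is linear in the number of owners stored on the document, and it comes on top of the cost of loading the stored field itself.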

I was testing the performance implications of each as I understand fetching
stored fields is not optimal and the payload scenario is logically a better fit,
as every owner will have a different accessId for every Document.

What would fit my usage would be something like

byte[] b = doc.getPayload("owner", ownerId);

where, for the given ownerId, I can retrieve the payload I associated with it when I did

doc.add(new Field("owner", ownerId, accessPayload));

but that's not how it works :(

Antony



Re: Can TermDocs.skipTo() go backwards

Michael McCandless-2

Antony Bowesman wrote:

> Michael McCandless wrote:
>> TermDocs.skipTo() only moves forwards.
>> Can you use a stored field to retrieve this information, or do you  
>> really need to store it per-term-occurrence in your docs?
>
> I discussed my use case with Doron earlier and there were two  
> options, either to use payloads or stored fields.

Ahh right, my short term memory failed me ;)  I now remember this  
thread.

> In the payload case, a single field (owner) in a document has  
> multiple unique terms (ownerId), each with a payload (accessId).
>
> Using stored fields I have to store something like
>
> ownerId:accessId
> ownerId:accessId
> ownerId:accessId
>
> then fetch the stored field for the document and then find the  
> particular accessId for the owner I am searching for.
>
> I was testing the performance implications of each as I understand  
> fetching stored fields is not optimal

Yes, though LUCENE-1231 (column stride stored fields) should help this.

> and the payload scenario is logically a better fit, as every owner  
> will have a different accessId for every Document.
>
> What would fit my usage would be something like
>
> byte[] b = doc.getPayload("owner", ownerId);
>
> where, for the given ownerId, I can retrieve the payload I associated  
> with it when I did
>
> doc.add(new Field("owner", ownerId, accessPayload));
>
> but that's not how it works :(

Yeah... payloads don't require/expect that each term would be unique  
in the field, so in general they have to be accessed via the  
TermPositions API.

Mike

Re: Can TermDocs.skipTo() go backwards

adb
Michael McCandless wrote:
>
> Ahh right, my short term memory failed me ;)  I now remember this thread.

Excused :) I expect you have real work to occupy your mind!

> Yes, though LUCENE-1231 (column stride stored fields) should help this.

I see from JIRA that MB has started working on this.  It's marked as 3.0, but
there was some hope for a 2.4 release.  Are there any estimates for when this
might get to a release?  This is an exciting development for me.

Thanks
Antony



Re: Can TermDocs.skipTo() go backwards

Michael McCandless-2

Antony Bowesman wrote:

> Michael McCandless wrote:
>> Ahh right, my short term memory failed me ;)  I now remember this  
>> thread.
>
> Excused :) I expect you have real work to occupy your mind!

Well, understanding how people are pushing Lucene *is* the real  
work ;)  This is exactly how Lucene grows!

>> Yes, though LUCENE-1231 (column stride stored fields) should help  
>> this.
>
> I see from JIRA that MB has started working on this.  It's marked as  
> 3.0, but there was some hope for a 2.4 release.  Are there any  
> estimates for when this might get to a release?  This is an exciting  
> development for me.

I don't think this will make 2.4 -- we're trying to wrap up 2.4 now.  
Michael, any rough ETA on when LUCENE-1231 will be in?

Mike
