[jira] Created: (LUCENE-761) Clone proxStream lazily in SegmentTermPositions


[jira] Created: (LUCENE-761) Clone proxStream lazily in SegmentTermPositions

JIRA jira@apache.org
Clone proxStream lazily in SegmentTermPositions
-----------------------------------------------

                 Key: LUCENE-761
                 URL: http://issues.apache.org/jira/browse/LUCENE-761
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Michael Busch
            Priority: Minor


In SegmentTermPositions the proxStream should be cloned lazily, i.e. the first time nextPosition() is called. The initialization cost of TermPositions is then no higher than that of TermDocs, so there is no longer any reason for Scorers to use TermDocs instead of TermPositions. In fact, all Scorers should use TermPositions, because custom subclasses of existing scorers might want to access payloads, which is only possible via TermPositions. We could go further and merge SegmentTermDocs and SegmentTermPositions into one class and deprecate the TermDocs interface.

I'm going to attach a patch once the payloads feature (LUCENE-755) is committed.
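
To illustrate the idea of lazy cloning, here is a minimal, self-contained sketch. It does not use the real Lucene classes: CountingStream, LazyPositions and LazyCloneDemo are hypothetical stand-ins invented for this example, and the clone counter exists only to make the effect visible. The point is simply that constructing the positions object costs no clone; the clone happens on the first nextPosition() call.

```java
// Hypothetical demo class; not part of Lucene.
class LazyCloneDemo {
    public static void main(String[] args) {
        CountingStream shared = new CountingStream(new int[] {3, 7, 12});
        LazyPositions positions = new LazyPositions(shared);

        System.out.println("clones after construction: " + CountingStream.clones);
        System.out.println("first position: " + positions.nextPosition());
        System.out.println("clones after nextPosition(): " + CountingStream.clones);
    }
}

// Hypothetical stand-in for a cloneable input stream (assumption, not Lucene's API).
class CountingStream implements Cloneable {
    static int clones = 0;       // how many clones have been made so far
    private final int[] data;    // shared between clones
    private int pos = 0;         // read pointer, copied on clone

    CountingStream(int[] data) {
        this.data = data;
    }

    int readInt() {
        return data[pos++];
    }

    @Override
    public CountingStream clone() {
        try {
            clones++;
            return (CountingStream) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: class is Cloneable
        }
    }
}

// Hypothetical positions reader that defers cloning until positions are needed.
class LazyPositions {
    private final CountingStream sharedProxStream; // owned elsewhere, never read directly
    private CountingStream proxStream;             // private clone, created on demand

    LazyPositions(CountingStream sharedProxStream) {
        this.sharedProxStream = sharedProxStream;  // no clone here
    }

    int nextPosition() {
        if (proxStream == null) {
            proxStream = sharedProxStream.clone(); // the clone is paid for only now
        }
        return proxStream.readInt();
    }
}
```

A caller that only iterates documents and never asks for positions would, in this sketch, never trigger the clone at all.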

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Assigned: (LUCENE-761) Clone proxStream lazily in SegmentTermPositions

JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-761?page=all ]

Michael Busch reassigned LUCENE-761:
------------------------------------

    Assignee: Michael Busch



[jira] Commented: (LUCENE-761) Clone proxStream lazily in SegmentTermPositions

JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/LUCENE-761?page=comments#action_12461381 ]
           
Grant Ingersoll commented on LUCENE-761:
----------------------------------------

Hi Michael,

I am not sure I understand why 755 blocks this one.  I would think it would be the other way around, that way we could integrate this into scoring and people could access it seamlessly w/o having to change their query code (except maybe the similarity, as I suggested, or by adding some other interface).  


-Grant



[jira] Commented: (LUCENE-761) Clone proxStream lazily in SegmentTermPositions

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465753 ]

Michael Busch commented on LUCENE-761:
--------------------------------------

Grant,

you are absolutely right, 755 does not block this issue. I wanted to wait before submitting a patch here because 755 and this issue change the same files, so committing this one would have prevented 755 from applying cleanly to the trunk. But since there have been a couple of commits in the last days/weeks and the Payloads API is still under discussion, I may as well submit a patch here now, because I have to update 755 to apply cleanly to the trunk anyway.



Commented: (LUCENE-761) Clone proxStream lazily in SegmentTermPositions

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479835 ]

Grant Ingersoll commented on LUCENE-761:
----------------------------------------

If I understand correctly, all we need on this one is to move line 37 of SegmentTermPositions to line 55, right?


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




Commented: (LUCENE-761) Clone proxStream lazily in SegmentTermPositions

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479876 ]

Michael Busch commented on LUCENE-761:
--------------------------------------

Grant,

you're right, cloning the stream lazily is a simple change, and I think I will do that for now. The benefit is that using a SegmentTermPositions object instead of a SegmentTermDocs in scorers will no longer be more expensive.

However, there might be one drawback. SegmentTermDocs implements the method
   int read(final int[] docs, final int[] freqs)
which is used by TermScorer for better performance. SegmentTermPositions overrides this method and just throws an UnsupportedOperationException. This only becomes a problem if we want to make TermScorer extensible, so that subclasses can make use of payloads. But actually I don't see much benefit in extending TermScorer over just extending Scorer for such a use case. What do you think?
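
The drawback described above can be sketched as follows. This is an illustration only: SimpleTermDocs, SimpleTermPositions and BulkReadDemo are hypothetical classes invented for this example, not the Lucene implementations, and the exception message is made up. The sketch shows a bulk read(docs, freqs) that works on the base class and is rejected by the positions subclass, which is what a scorer relying on bulk reads would run into.

```java
// Hypothetical demo class; not part of Lucene.
class BulkReadDemo {
    public static void main(String[] args) {
        int[] docs = new int[4];
        int[] freqs = new int[4];

        SimpleTermDocs termDocs =
            new SimpleTermDocs(new int[] {1, 5, 9}, new int[] {2, 1, 4});
        System.out.println("bulk read returned " + termDocs.read(docs, freqs) + " entries");

        // Same data, but accessed through the positions-aware subclass.
        SimpleTermDocs termPositions =
            new SimpleTermPositions(new int[] {1, 5, 9}, new int[] {2, 1, 4});
        try {
            termPositions.read(docs, freqs);
        } catch (UnsupportedOperationException e) {
            System.out.println("bulk read rejected: " + e.getMessage());
        }
    }
}

// Hypothetical stand-in for a doc/freq enumerator with a bulk read method.
class SimpleTermDocs {
    private final int[] allDocs;
    private final int[] allFreqs;
    private int next = 0; // index of the next unread posting

    SimpleTermDocs(int[] allDocs, int[] allFreqs) {
        this.allDocs = allDocs;
        this.allFreqs = allFreqs;
    }

    /** Copies as many (doc, freq) pairs as fit into the arrays; returns the count. */
    int read(final int[] docs, final int[] freqs) {
        int n = 0;
        while (n < docs.length && next < allDocs.length) {
            docs[n] = allDocs[next];
            freqs[n] = allFreqs[next];
            n++;
            next++;
        }
        return n;
    }
}

// Hypothetical subclass that refuses bulk reads, mirroring the behaviour described above.
class SimpleTermPositions extends SimpleTermDocs {
    SimpleTermPositions(int[] allDocs, int[] allFreqs) {
        super(allDocs, allFreqs);
    }

    @Override
    int read(final int[] docs, final int[] freqs) {
        // Illustrative message only; not the text used by Lucene.
        throw new UnsupportedOperationException("read(int[], int[]) is not supported here");
    }
}
```

A scorer written against the base type cannot tell from the signature that the bulk path is unavailable, so it would need either a fallback to per-document iteration or a guarantee about which implementation it receives.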



Updated: (LUCENE-761) Clone proxStream lazily in SegmentTermPositions

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-761:
---------------------------------

    Attachment: lucene-761.patch

Here is the simple patch. All unit tests pass. I'll commit this soon if nobody objects...



Resolved: (LUCENE-761) Clone proxStream lazily in SegmentTermPositions

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-761.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 2.2

I just committed this.
