[jira] Created: (LUCENE-2662) BytesHash

[jira] Created: (LUCENE-2662) BytesHash

JIRA jira@apache.org
BytesHash
---------

                 Key: LUCENE-2662
                 URL: https://issues.apache.org/jira/browse/LUCENE-2662
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
    Affects Versions: Realtime Branch
            Reporter: Jason Rutherglen
             Fix For: Realtime Branch


This issue will have the BytesHash separated out from LUCENE-2186

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Updated: (LUCENE-2662) BytesHash

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2662:
-------------------------------------

    Priority: Minor  (was: Major)

[jira] Updated: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2662:
-------------------------------------

    Attachment: LUCENE-2662.patch

We need unit tests and a base implementation as BytesHash is abstract...

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913589#action_12913589 ]

Jason Rutherglen commented on LUCENE-2662:
------------------------------------------

The current hash implementation needs to be separated out of TermsHashPerField.  

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913599#action_12913599 ]

Robert Muir commented on LUCENE-2662:
-------------------------------------

Jason: I am confused... there is no hash impl in TermsHashPerField.

The hashing, term encoding, and so on are the responsibility of the analysis chain (TermToBytesRefAttribute):
{code}
    // Get the text & hash of this term.
    int code = termAtt.toBytesRef(utf8);
{code}

This way, implementations can 'hash-as-they-go', as we do when encoding unicode char[] -> byte[],
or they can simply return BytesRef.hashCode() if they don't have an optimized implementation.
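
A minimal sketch of the 'hash-as-they-go' idea, assuming a simple 31-multiplier hash and BMP-only input. This is not Lucene's UnicodeUtil code: the class `HashAsYouGo`, its `Encoded` holder, and both method names are hypothetical. It encodes a char[] to UTF-8 and folds each output byte into the hash inside the same loop, so the caller never needs a second pass over the bytes.

```java
/** Hypothetical illustration of hashing while encoding; not Lucene's UnicodeUtil. */
final class HashAsYouGo {

  /** Encoded UTF-8 bytes plus the hash that was accumulated while writing them. */
  static final class Encoded {
    final byte[] bytes;
    final int length;
    final int hash;

    Encoded(byte[] bytes, int length, int hash) {
      this.bytes = bytes;
      this.length = length;
      this.hash = hash;
    }
  }

  /** Encodes BMP chars to UTF-8 (surrogate pairs are not handled) and hashes in the same loop. */
  static Encoded encode(char[] text, int len) {
    byte[] out = new byte[len * 3]; // a BMP char needs at most 3 UTF-8 bytes
    int upto = 0;
    int hash = 0;
    for (int i = 0; i < len; i++) {
      int c = text[i];
      int start = upto;
      if (c < 0x80) {
        out[upto++] = (byte) c;
      } else if (c < 0x800) {
        out[upto++] = (byte) (0xC0 | (c >> 6));
        out[upto++] = (byte) (0x80 | (c & 0x3F));
      } else {
        out[upto++] = (byte) (0xE0 | (c >> 12));
        out[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
        out[upto++] = (byte) (0x80 | (c & 0x3F));
      }
      // Fold the bytes just written into the hash; no second pass is needed later.
      for (int j = start; j < upto; j++) {
        hash = 31 * hash + out[j];
      }
    }
    return new Encoded(out, upto, hash);
  }

  /** The fallback: a separate pass over bytes that were already encoded. */
  static int hashBytes(byte[] bytes, int len) {
    int hash = 0;
    for (int i = 0; i < len; i++) {
      hash = 31 * hash + bytes[i];
    }
    return hash;
  }
}
```

Both paths produce the same value for the same bytes; the only difference is that `encode` gets the hash for free while it writes the output.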


[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913622#action_12913622 ]

Jason Rutherglen commented on LUCENE-2662:
------------------------------------------

THPF hashes tokens for use in the indexing RAM buffer and for creating postings, i.e., it looks up term ids from term byte[]s. The hash component is currently interwoven into THPF.

Here are some of the variables used in THPF:

{code}
private int postingsHashSize = 4;
private int postingsHashHalfSize = postingsHashSize/2;
private int postingsHashMask = postingsHashSize-1;
private int[] postingsHash;
{code}

There are also the methods rehashPostings, shrinkHash, and postingEquals, and add(int textStart) contains the lookup.

We'll probably also need to separate out the quick sort implementation in THPF. I'll add that to this issue.
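
A rough sketch of how fields like the ones above typically work together, assuming a power-of-two table with linear probing. It is not the THPF code: the class `IntHashSketch`, the int keys (a stand-in for term bytes), and the `mix` function are all hypothetical. It shows the mask replacing a modulo, the half-size threshold triggering a rehash, and the id-to-key comparison that plays the role of postingEquals.

```java
import java.util.Arrays;

/** Hypothetical sketch of a power-of-two int[] hash; not the THPF implementation. */
final class IntHashSketch {
  private int hashSize = 4;                   // always a power of two
  private int hashHalfSize = hashSize / 2;    // rehash once this many ids are stored
  private int hashMask = hashSize - 1;        // (code & mask) replaces (code % size)
  private int[] hash = filled(hashSize);      // slot -> id, -1 means empty
  private int[] keys = new int[hashHalfSize]; // id -> key (stand-in for term bytes)
  private int count;

  /** Returns the id for key, assigning the next sequential id if the key is new. */
  int add(int key) {
    int pos = findSlot(hash, hashMask, key);
    if (hash[pos] != -1) {
      return hash[pos]; // already present
    }
    if (count == keys.length) {
      keys = Arrays.copyOf(keys, count * 2);
    }
    keys[count] = key;
    hash[pos] = count;
    int id = count++;
    if (count == hashHalfSize) {
      rehash(); // keep the table at most half full so probing stays short
    }
    return id;
  }

  int size() {
    return count;
  }

  /** Probes until it finds the key's slot or an empty slot. Linear probing is a simplification. */
  private int findSlot(int[] table, int mask, int key) {
    int pos = mix(key) & mask;
    while (table[pos] != -1 && keys[table[pos]] != key) {
      pos = (pos + 1) & mask;
    }
    return pos;
  }

  /** Doubles the table and re-inserts every id. */
  private void rehash() {
    int newSize = hashSize * 2;
    int[] newHash = filled(newSize);
    for (int id = 0; id < count; id++) {
      newHash[findSlot(newHash, newSize - 1, keys[id])] = id;
    }
    hash = newHash;
    hashSize = newSize;
    hashHalfSize = newSize / 2;
    hashMask = newSize - 1;
  }

  private static int mix(int key) {
    int h = key * 0x9E3779B9;
    return h ^ (h >>> 16);
  }

  private static int[] filled(int size) {
    int[] table = new int[size];
    Arrays.fill(table, -1);
    return table;
  }
}
```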

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913628#action_12913628 ]

Robert Muir commented on LUCENE-2662:
-------------------------------------

Jason: what I am saying is, if I look at the method in your patch:

public T add(BytesRef bytes)

The first thing it does is compute the hash, but this is already computed in the analysis chain.

Why not have
{code}
public T add(BytesRef bytes, int hashCode)
{code}

and also:
{code}
public T add(BytesRef bytes) {
  return add(bytes, bytes.hashCode());
}
{code}

Then we can avoid computing the hash twice and keep the optimization in UnicodeUtil.


[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913632#action_12913632 ]

Jason Rutherglen commented on LUCENE-2662:
------------------------------------------

Ah, ok. I didn't write this code, I extracted it from LUCENE-2186. It's nice that you reviewed it and found it can be improved. I'll make changes to it shortly, hopefully.

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913636#action_12913636 ]

Simon Willnauer commented on LUCENE-2662:
-----------------------------------------

Jason, can you please hold off on this? I already have newer, different versions of this class with tests etc. I understand that you need the class, but creating all these issues and rushing ahead is rather counterproductive.

@Robert: this class is standalone in this patch and doesn't know about the analysis chain. But thanks for the comments, I will incorporate them.

simon

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913638#action_12913638 ]

Jason Rutherglen commented on LUCENE-2662:
------------------------------------------

Simon, when do you think you'll be posting?

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913642#action_12913642 ]

Simon Willnauer commented on LUCENE-2662:
-----------------------------------------

bq. Simon, when do you think you'll be posting?

Maybe within the next week; I have a busy schedule. But does this patch keep you from doing any work? You shouldn't just pull stuff out of one-month-old patches, especially as you don't even give me time to reply on the original discussion.

Any rush on this?

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913651#action_12913651 ]

Jason Rutherglen commented on LUCENE-2662:
------------------------------------------

It'd be nice to get deletes working, i.e., LUCENE-2655, and move forward in a way that's useful long term. What changes have you made?

[jira] Assigned: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-2662:
---------------------------------------

    Assignee: Simon Willnauer

[jira] Updated: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2662:
------------------------------------

        Fix Version/s: 4.0
    Affects Version/s: 4.0

[jira] Updated: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2662:
------------------------------------

    Attachment: LUCENE-2662.patch

This patch contains a slightly different version of BytesHash (I renamed it to BytesRefHash, but that is open to discussion; while writing this I actually think BytesHash is the better name). BytesRefHash is now final and no longer creates Entry objects. Internally it maintains two integer arrays: one acts as the hash buckets and the other contains the bytes-start offsets in the ByteBlockPool. Each added entry is assigned an increasing ordinal, since this is what Entry is used for in almost all use cases (in CSF though). For TermsHashPerField this is also "native" since it uses the same kind of referencing system.

These changes keep this class as efficient as possible, keeping GC costs low and allowing the JIT to do better optimizations. IMO this class is super performance critical, and since we recently refactored indexing towards parallel arrays, adding another "object" array might not be the way to go anyway.
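
A simplified sketch of the parallel-array layout described above, under several assumptions: it is not the BytesRefHash from the patch, the class `BytesHashSketch` and all of its members are hypothetical, a single growable byte[] stands in for ByteBlockPool, and probing is linear. It keeps one int[] as the hash buckets and one int[] of bytes-start offsets, hands out increasing ordinals, allocates no per-entry objects, and accepts a hash code the caller has already computed.

```java
import java.util.Arrays;

/** Hypothetical sketch of the two-int-array layout; not the patch's BytesRefHash. */
final class BytesHashSketch {
  private byte[] pool = new byte[64];     // stand-in for ByteBlockPool: one growable array
  private int poolUpto;                   // next free offset in the pool
  private int[] bytesStart = new int[4];  // ord -> start offset of its bytes in the pool
  private int[] buckets = newBuckets(8);  // slot -> ord, -1 means empty
  private int count;

  /**
   * Adds bytes with a hash the caller already computed; code must equal hash(b, off, len).
   * Returns the new ord, or -(ord + 1) if the bytes were already present.
   */
  int add(byte[] b, int off, int len, int code) {
    int mask = buckets.length - 1;
    int pos = code & mask;
    while (buckets[pos] != -1) {
      if (equalsAt(buckets[pos], b, off, len)) {
        return -(buckets[pos] + 1);
      }
      pos = (pos + 1) & mask;
    }
    if (poolUpto + len > pool.length) {
      pool = Arrays.copyOf(pool, Math.max(pool.length * 2, poolUpto + len));
    }
    if (count == bytesStart.length) {
      bytesStart = Arrays.copyOf(bytesStart, count * 2);
    }
    System.arraycopy(b, off, pool, poolUpto, len);
    bytesStart[count] = poolUpto;
    poolUpto += len;
    buckets[pos] = count;
    int ord = count++;
    if (count == buckets.length / 2) {
      grow();
    }
    return ord;
  }

  /** Convenience overload that computes the hash itself. */
  int add(byte[] b) {
    return add(b, 0, b.length, hash(b, 0, b.length));
  }

  /** Returns a copy of the bytes stored for ord. */
  byte[] get(int ord) {
    return Arrays.copyOfRange(pool, bytesStart[ord], end(ord));
  }

  int size() {
    return count;
  }

  static int hash(byte[] b, int off, int len) {
    int h = 0;
    for (int i = off; i < off + len; i++) {
      h = 31 * h + b[i];
    }
    return h;
  }

  /** Entries are append-only, so an entry ends where the next one starts. */
  private int end(int ord) {
    return ord + 1 < count ? bytesStart[ord + 1] : poolUpto;
  }

  private boolean equalsAt(int ord, byte[] b, int off, int len) {
    int start = bytesStart[ord];
    if (end(ord) - start != len) {
      return false;
    }
    for (int i = 0; i < len; i++) {
      if (pool[start + i] != b[off + i]) {
        return false;
      }
    }
    return true;
  }

  /** Doubles the buckets, rehashing each entry from its pooled bytes. */
  private void grow() {
    int[] bigger = newBuckets(buckets.length * 2);
    int mask = bigger.length - 1;
    for (int ord = 0; ord < count; ord++) {
      int start = bytesStart[ord];
      int pos = hash(pool, start, end(ord) - start) & mask;
      while (bigger[pos] != -1) {
        pos = (pos + 1) & mask;
      }
      bigger[pos] = ord;
    }
    buckets = bigger;
  }

  private static int[] newBuckets(int size) {
    int[] table = new int[size];
    Arrays.fill(table, -1);
    return table;
  }
}
```

Because the sketch rehashes from the pooled bytes when it grows, a caller-supplied code has to agree with `hash(...)`; storing the codes in a third int[] would lift that restriction at the cost of extra memory.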

I also incorporated Robert's comments - thanks for the review anyway. I guess that is the first step towards factoring it out of TermsHashPerField; the next question is, are we gonna do that in a different issue and get this committed first?

comments / review welcome!!

One more thing: I did not move ByteBlockPool to o.a.l.utils, but I think it belongs there. Thoughts?

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914452#action_12914452 ]

Robert Muir commented on LUCENE-2662:
-------------------------------------

bq. I guess that is the first step towards factoring it out of TermsHashPerField, the next question is are we gonna do that in a different issue and get this committed first?

I think it would be better if this class were used in the patch... I wouldn't commit it by itself unused. It's difficult for people to review its behavior, since it's just a standalone unused thing (for instance, the hashCode thing I brought up).


[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914478#action_12914478 ]

Jason Rutherglen commented on LUCENE-2662:
------------------------------------------

> BytesRefHash is now final and does not create Entry objects anymore

That's good.

> move ByteBlockPool to o.a.l.utils

Sure why not.

> factoring it out of TermsHashPerField, the next question is are we gonna do that in a different issue and get this committed first?

We need to factor it out of THPF, otherwise this patch isn't really useful for committing. Also, it'll get tested through the entirety of the unit tests, i.e., it'll get put through the laundry.

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914486#action_12914486 ]

Simon Willnauer commented on LUCENE-2662:
-----------------------------------------

bq. We need to factor it out of THPF otherwise this patch isn't really useful for committing. Also, it'll get tested through the entirety of the unit tests, ie, it'll get put through the laundry.

Yeah, let's see this as the first baby step towards it. I will move ByteBlockPool to o.a.l.utils and start cutting THPF over to it. We need to do benchmarking in any case, just to make sure JIT doesn't play nasty tricks with us again.

simon

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914521#action_12914521 ]

Jason Rutherglen commented on LUCENE-2662:
------------------------------------------

bq. make sure JIT doesn't play nasty tricks with us again.

What would we do if this happens?

[jira] Commented: (LUCENE-2662) BytesHash

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914621#action_12914621 ]

Michael McCandless commented on LUCENE-2662:
--------------------------------------------

Patch looks good Simon -- some ideas:



  * In the class jdocs, I think we should state that this is basically a
    Map<BytesRef,int>?

  * Maybe we also move ByteBlockPool --> oal.util?

  * Maybe move out the ByteBlockAllocator to its own class (in util)?
    RecyclingByteBlockAllocator?

  * Can we have DocumentsWriter share the ByteBlockAllocator?  (Right
    now it's dup'd code since DW also implements this).

  * Maybe rename ords -> keys?  And hash -> values?  (The key isn't
    really an "ord" (I think?) because it increases by more than 1
    each time... it's more like an address since it references an
    address in the byte-pool space).

  * We should advertise the limits in the jdocs -- limited to <= 2GB
    total byte storage, each key must be <= BLOCK_SIZE-2 in length.

  * Can we have sortedEntries() not allocate a new iterator object?
    Ie, just return the sorted bytesStart int[]?  (This is what's done
    today, and, for term vectors on small docs, this method is pretty
    hot).  And the javadocs for this should be stronger -- it's not
    that the behaviour is undefined after, it's that you must .clear()
    after you're done consuming the sorted entries.
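
The sortedEntries() idea in the last bullet can be sketched as an in-place sort of the int[] itself, with no iterator or other objects allocated. This is only an illustration under assumptions: the class `SortedOrdsSketch` is hypothetical, a byte[][] indexed by entry stands in for the byte pool, and the quicksort and unsigned byte comparison are choices made for the sketch rather than the patch's code.

```java
/** Hypothetical sketch: sort the int[] of entries in place instead of allocating an iterator. */
final class SortedOrdsSketch {

  /** Sorts ords[0..count) by the unsigned byte order of values[ord] and returns the same array. */
  static int[] sort(int[] ords, int count, byte[][] values) {
    quickSort(ords, 0, count - 1, values);
    return ords;
  }

  private static void quickSort(int[] ords, int lo, int hi, byte[][] values) {
    while (lo < hi) {
      byte[] pivot = values[ords[(lo + hi) >>> 1]];
      int i = lo;
      int j = hi;
      while (i <= j) {
        while (compare(values[ords[i]], pivot) < 0) {
          i++;
        }
        while (compare(values[ords[j]], pivot) > 0) {
          j--;
        }
        if (i <= j) {
          int tmp = ords[i];
          ords[i] = ords[j];
          ords[j] = tmp;
          i++;
          j--;
        }
      }
      // Recurse into the smaller part and loop on the larger one to bound stack depth.
      if (j - lo < hi - i) {
        quickSort(ords, lo, j, values);
        lo = i;
      } else {
        quickSort(ords, i, hi, values);
        hi = j;
      }
    }
  }

  /** Lexicographic comparison treating bytes as unsigned. */
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int k = 0; k < n; k++) {
      int diff = (a[k] & 0xFF) - (b[k] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }
}
```

Since the array that comes back is the same one that went in, a hash that sorts its own internal array this way no longer has it in slot order, which is why the caller would have to .clear() once it has consumed the sorted entries.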

