[jira] [Created] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
Enable DocValues by default for every Codec
-------------------------------------------

                 Key: LUCENE-3070
                 URL: https://issues.apache.org/jira/browse/LUCENE-3070
             Project: Lucene - Java
          Issue Type: Task
          Components: Index
    Affects Versions: CSF branch
            Reporter: Simon Willnauer
             Fix For: CSF branch


Currently DocValues are enabled with a wrapper Codec, so each codec that needs DocValues must be wrapped by DocValuesCodec. The DocValues writer and reader should be moved to Codec so that they are enabled by default.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] [Updated] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3070:
--------------------------------

    Attachment: LUCENE-3070.patch

First look at the branch: this is just a really quick hack at a patch that adds docsConsumer() and docsProducer() to Codec, with some renaming to fit (DefaultDocValuesConsumer/Producer), and enables this for all codecs (except preflex).
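
To make the design change concrete, here is a minimal sketch contrasting the wrapper approach with a codec base class that exposes the consumer itself. Every class in it is a hypothetical stand-in written for illustration, not the real Lucene API; only the method name docsConsumer() is taken from the description above.

```java
import java.util.TreeMap;

/** All nested types are hypothetical stand-ins, not Lucene classes. */
final class CodecDesignSketch {

    /** Collects one value per document; stands in for a doc values writer. */
    static final class DocValuesConsumer {
        final TreeMap<Integer, Long> values = new TreeMap<>();

        void add(int docId, long value) {
            values.put(docId, value);
        }
    }

    /** "Before": a codec that only knows about postings. */
    static class PostingsOnlyCodec {
        final String name;

        PostingsOnlyCodec(String name) {
            this.name = name;
        }

        String describe() {
            return name + ": postings";
        }
    }

    /** "Before": doc values exist only if the codec is explicitly wrapped. */
    static final class WrapperCodec extends PostingsOnlyCodec {
        private final PostingsOnlyCodec delegate;

        WrapperCodec(PostingsOnlyCodec delegate) {
            super(delegate.name);
            this.delegate = delegate;
        }

        @Override
        String describe() {
            return delegate.describe() + " + docvalues (via wrapper)";
        }

        DocValuesConsumer docsConsumer() {
            return new DocValuesConsumer();
        }
    }

    /** "After": the base class exposes the consumer, so no wrapping is needed. */
    static class DefaultCodec {
        final String name;

        DefaultCodec(String name) {
            this.name = name;
        }

        String describe() {
            return name + ": postings + docvalues";
        }

        DocValuesConsumer docsConsumer() {
            return new DocValuesConsumer();
        }
    }

    public static void main(String[] args) {
        System.out.println(new WrapperCodec(new PostingsOnlyCodec("standard")).describe());
        System.out.println(new DefaultCodec("standard").describe());
    }
}
```

In the "after" shape, callers never need to know whether a codec was wrapped; the capability is part of the base type.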

All tests pass, but there might be some naming/API issues. Additionally, maybe we should address the TestSort issue: we want to enable this test (without docvalues) on Preflex, so maybe we should have an explicit test for that.

also we need a plain text impl for simpletext :)
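
As a rough idea of what a plain text implementation could look like, the sketch below writes one "docId<TAB>value" line per document and parses it back. The line format and all names are assumptions made for illustration; they are not the SimpleText codec's actual on-disk format.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plain-text encoding of per-document long values. */
final class PlainTextDocValuesSketch {

    /** Writes one "docId<TAB>value" line per document, ordered by docId. */
    static String write(Map<Integer, Long> values) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Long> e : new TreeMap<>(values).entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Parses the text produced by write() back into a docId -> value map. */
    static Map<Integer, Long> read(String text) {
        Map<Integer, Long> values = new TreeMap<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue; // tolerate empty input and blank lines
            }
            String[] parts = line.split("\t");
            values.put(Integer.parseInt(parts[0]), Long.parseLong(parts[1]));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<Integer, Long> values = new TreeMap<>();
        values.put(0, 42L);
        values.put(3, -7L);
        String text = write(values);
        System.out.print(text);
        System.out.println(read(text).equals(values));
    }
}
```

A human-readable format like this trades space and speed for easy debugging, which is the usual point of a text-based codec.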

[jira] [Commented] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029233#comment-13029233 ]

Simon Willnauer commented on LUCENE-3070:
-----------------------------------------

Robert, the patch looks great!

some comments:
 * the simpletext nocommit should be a TODO, IMO
 * for the preflex problem, I think we need to add some infrastructure for testing 4.0 features somehow. I will think about this.
 * one problem here is that our current implementation is somewhat wasteful. Currently, on flush, we pull a FieldsConsumer for every codec used in the indexing session (per DWPT), regardless of whether the field is indexed, so we create some unneeded files if a field is used for docvalues only. The other thing is that we need to somehow reset the FieldInfo#hasDocValues "flag" when we hit a non-aborting exception during indexing before we can actually create the corresponding consumer. That is something we should address in a spin-off issue, I think.
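
One possible way to avoid the unneeded files mentioned in the last point is to create a consumer's output lazily, on the first value it receives. The sketch below illustrates that idea only; it is not what the thread settled on, and the class names and the ".docvalues" file suffix are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical indexing session that creates outputs only on first use. */
final class LazyConsumerSketch {

    /** Records every "file" that was actually created. */
    final List<String> createdFiles = new ArrayList<>();

    /** Per-field consumer; its output is created when the first value arrives. */
    final class FieldConsumer {
        private final String fileName;
        private boolean opened;
        private int count;

        FieldConsumer(String fieldName) {
            this.fileName = fieldName + ".docvalues"; // hypothetical naming
        }

        void add(long value) {
            if (!opened) {
                createdFiles.add(fileName); // stands in for creating the file
                opened = true;
            }
            count++;
        }

        int size() {
            return count;
        }
    }

    FieldConsumer consumerFor(String fieldName) {
        return new FieldConsumer(fieldName);
    }

    public static void main(String[] args) {
        LazyConsumerSketch session = new LazyConsumerSketch();
        FieldConsumer used = session.consumerFor("price");
        session.consumerFor("title"); // never receives a value, so no file
        used.add(42L);
        System.out.println(session.createdFiles);
    }
}
```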

overall I think you should commit the current state and we work from here!


[jira] [Commented] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029237#comment-13029237 ]

Simon Willnauer commented on LUCENE-3070:
-----------------------------------------

One more thing: I think preflex should throw UOE instead of returning null. At some point we should also think about a better name for Source, something like InMemoryDocValues or RamResidentDocValues, something more self-explanatory.
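
The reasoning behind throwing rather than returning null can be sketched as follows: the exception fails at the call site with a clear message, whereas a null tends to surface later as a NullPointerException somewhere else. The interface and class names here are hypothetical stand-ins, not the real codec classes.

```java
/** Hypothetical codecs showing UnsupportedOperationException instead of null. */
final class UnsupportedDocValuesSketch {

    interface Codec {
        Object docsProducer();
    }

    /** A format that can carry doc values. */
    static final class CurrentFormatCodec implements Codec {
        @Override
        public Object docsProducer() {
            return new Object(); // placeholder for a real producer
        }
    }

    /** A legacy format that cannot; it fails loudly instead of returning null. */
    static final class LegacyFormatCodec implements Codec {
        @Override
        public Object docsProducer() {
            throw new UnsupportedOperationException(
                "this legacy format cannot carry doc values");
        }
    }

    /** Probes a codec; callers get an explicit answer rather than a null. */
    static boolean supportsDocValues(Codec codec) {
        try {
            codec.docsProducer();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(supportsDocValues(new CurrentFormatCodec())); // true
        System.out.println(supportsDocValues(new LegacyFormatCodec()));  // false
    }
}
```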

[jira] [Assigned] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-3070:
---------------------------------------

    Assignee: Simon Willnauer

[jira] [Updated] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3070:
------------------------------------

    Attachment: LUCENE-3070.patch

This patch adds UOE to PreFlex codec and makes FieldInfo#docValues transactional to prevent wrong flags if non-aborting exceptions occur.

I also added some random docValues fields to RandomIndexWriter as well as some basic checks to CheckIndex. It's not perfect, but it's a start.
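
For readers unfamiliar with the "transactional" flag idea, here is a minimal sketch of one way it could work: the flag is set tentatively and only becomes visible once the document has been processed successfully; on a non-aborting exception it is rolled back. This is an illustration under assumed names (FieldInfo here is a hypothetical stand-in, and commit()/revert() are invented for the sketch), not the code in the patch.

```java
/** Hypothetical sketch of a flag that is committed or reverted per document. */
final class TransactionalFlagSketch {

    /** Stand-in for per-field metadata; not Lucene's FieldInfo. */
    static final class FieldInfo {
        private boolean committed; // the value callers observe
        private boolean pending;   // tentative value for the document in flight

        void setDocValues() {
            pending = true;
        }

        void commit() {
            committed = pending;
        }

        void revert() {
            pending = committed;
        }

        boolean hasDocValues() {
            return committed;
        }
    }

    /** Indexes one document; failEarly simulates a non-aborting exception. */
    static void indexDocument(FieldInfo info, boolean failEarly) {
        info.setDocValues();
        try {
            if (failEarly) {
                throw new IllegalArgumentException("simulated non-aborting failure");
            }
            info.commit();
        } catch (IllegalArgumentException e) {
            info.revert(); // the flag must not claim doc values that were never written
        }
    }

    public static void main(String[] args) {
        FieldInfo info = new FieldInfo();
        indexDocument(info, true);
        System.out.println(info.hasDocValues()); // false
        indexDocument(info, false);
        System.out.println(info.hasDocValues()); // true
    }
}
```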

[jira] [Commented] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033970#comment-13033970 ]

Robert Muir commented on LUCENE-3070:
-------------------------------------

Seems like it might be a good idea in RandomIndexWriter to sometimes not add docvalues?


[jira] [Commented] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033971#comment-13033971 ]

Simon Willnauer commented on LUCENE-3070:
-----------------------------------------

> Seems like it might be a good idea in RandomIndexWriter to sometimes not add docvalues?

Yeah, though I think we should make this per RIW (RandomIndexWriter) session, not per document, since we already have random DocValues types, so some docs might get docvalues_int_xyz fields and some might get docvalues_float_xyz fields.

[jira] [Updated] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3070:
------------------------------------

    Attachment: LUCENE-3070.patch

New patch: I added random DocValues to updateDocument, and I randomly enable / disable docValues entirely on optimize / commit / getReader, so we get segments that don't have docValues at all, etc. I think I will commit soon if nobody objects.
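
The toggling described here can be sketched with a seeded java.util.Random, so that a failing test run can be reproduced from its seed. The class below is a hypothetical illustration, not RandomIndexWriter itself; the method name switchDoDocValues follows the name used in this thread, and everything else is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical writer wrapper that flips doc values on or off at commit points. */
final class RandomDocValuesToggleSketch {
    private final Random random;
    private boolean doDocValues;

    RandomDocValuesToggleSketch(long seed) {
        this.random = new Random(seed);
        this.doDocValues = random.nextBoolean();
    }

    /** Called at commit-like boundaries so whole segments may lack doc values. */
    void switchDoDocValues() {
        doDocValues = random.nextBoolean();
    }

    boolean addsDocValues() {
        return doDocValues;
    }

    /** Simulates a run of commits and records the flag in effect for each segment. */
    static List<Boolean> simulate(long seed, int commits) {
        RandomDocValuesToggleSketch writer = new RandomDocValuesToggleSketch(seed);
        List<Boolean> perSegment = new ArrayList<>();
        for (int i = 0; i < commits; i++) {
            perSegment.add(writer.addsDocValues());
            writer.switchDoDocValues();
        }
        return perSegment;
    }

    public static void main(String[] args) {
        // The same seed always yields the same sequence of decisions.
        System.out.println(simulate(42L, 8));
        System.out.println(simulate(42L, 8).equals(simulate(42L, 8)));
    }
}
```

Flipping the flag only at commit-like boundaries, rather than per document, is what produces entire segments with no doc values.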

[jira] [Commented] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033977#comment-13033977 ]

Robert Muir commented on LUCENE-3070:
-------------------------------------

Looks good, I think this will help the test coverage a lot.

can you rename swtichDoDocValues to switchDoDocValues? :)

[jira] [Updated] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3070:
------------------------------------

    Attachment: LUCENE-3070.patch

fixed typo - I will commit in a second.

[jira] [Resolved] (LUCENE-3070) Enable DocValues by default for every Codec

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3070.
-------------------------------------

       Resolution: Fixed
    Lucene Fields: [New, Patch Available]  (was: [New])
