[jira] Created: (LUCENE-756) Maintain norms in a single file .nrm

[jira] Created: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
Maintain norms in a single file .nrm
------------------------------------

                 Key: LUCENE-756
                 URL: http://issues.apache.org/jira/browse/LUCENE-756
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Doron Cohen
            Priority: Minor


Non-compound indexes are ~10% faster at indexing and perform about 50% of the IO activity of compound indexes, but their file descriptor footprint is much higher.

By maintaining all field norms in a single .nrm file, we can bound the number of files used by non-compound indexes, and possibly allow more applications to use this format.

More details on the motivation for this in: http://www.nabble.com/potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-tf2826909.html (in particular http://www.nabble.com/Re%3A-potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-p7910403.html).


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Assigned: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-756?page=all ]

Doron Cohen reassigned LUCENE-756:
----------------------------------

    Assignee: Doron Cohen


[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-756?page=all ]

Doron Cohen updated LUCENE-756:
-------------------------------

    Attachment: nrm.patch.txt

The attached patch, nrm.patch.txt, changes field norms maintenance to use a single .nrm file.

The modification is backwards compatible: existing indexes that keep each norm in its own file can still be read, and the first merge creates a single .nrm file.

All tests pass.

No performance degradations were observed as a result of this change, but my tests so far were not very extensive.



[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-756?page=all ]

Doron Cohen updated LUCENE-756:
-------------------------------

    Lucene Fields: [Patch Available]  (was: [New])


[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-756?page=all ]

Doron Cohen updated LUCENE-756:
-------------------------------

    Component/s: Index


[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-756?page=all ]

Doron Cohen updated LUCENE-756:
-------------------------------

    Attachment:     (was: nrm.patch.txt)


[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-756?page=all ]

Doron Cohen updated LUCENE-756:
-------------------------------

    Attachment: nrm.patch.txt

Replacing the patch file (the previous file was garbage: "svn stat" output instead of "svn diff").

A few words on how this patch works:
- A <segment>.nrm file was added.
- addDocument (DocumentWriter) still writes each norm to a separate file, but that happens in memory.
- At merge, all norms are written to a single file.
- CFS now also maintains all norms in a single file.
- The IndexWriter merge decision now considers hasSeparateNorms() not only for CFS but also for the non-compound format.
- SegmentReader.openNorms() still creates ready-to-use/load Norm objects (which read the norms only when needed). But each Norm object is now assigned a normSeek value, which is nonzero if the norm file is <segment>.nrm.
- Existing indexes, created prior to this change, are managed the same way as segments that result from addDocument.
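
As a rough illustration of the idea in the list above (not the code from nrm.patch.txt), the sketch below concatenates per-field norm arrays into one buffer and records where each field's block starts. The class name, the in-memory byte array standing in for <segment>.nrm, and the layout of one norm byte per document per field are assumptions made for the example; the real patch may place the first block at a nonzero position.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the code from nrm.patch.txt.
class SingleNormsFileSketch {
    /** Start position of each field's norms block (field name -> offset). */
    final Map<String, Integer> normSeek = new LinkedHashMap<>();
    /** In-memory stand-in for the contents of <segment>.nrm. */
    final byte[] data;

    SingleNormsFileSketch(Map<String, byte[]> normsPerField, int maxDoc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, byte[]> e : normsPerField.entrySet()) {
            byte[] norms = e.getValue();
            if (norms.length != maxDoc) {
                throw new IllegalArgumentException("expected one norm byte per document");
            }
            // Remember where this field's block begins, then append it.
            normSeek.put(e.getKey(), out.size());
            out.write(norms, 0, norms.length);
        }
        data = out.toByteArray();
    }

    /** Reads the norm byte of one document for one field. */
    byte norm(String field, int doc) {
        return data[normSeek.get(field) + doc];
    }
}
```

With this layout the number of norm files per segment stays at one regardless of how many fields carry norms, which is the bound on open files the issue is after.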

Tests:
- I verified that the (contrib) tests for FieldNormModifier and LengthNormModifier also pass.

Remaining:
- I might add a test.
- more benchmarking?
- update fileFormat document.


[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/LUCENE-756?page=comments#action_12460285 ]
           
Yonik Seeley commented on LUCENE-756:
-------------------------------------

Seems like a good idea... given that norms are read once on-demand, I wouldn't expect anything search related to be slower with this.  Opening a new reader should actually be slightly faster due to fewer files to open.



[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/LUCENE-756?page=comments#action_12460287 ]
           
Yonik Seeley commented on LUCENE-756:
-------------------------------------

> - CFS now also maintains all norms in a single file.

Does this mean a separate file outside the final .cfs files?


[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/LUCENE-756?page=comments#action_12460292 ]
           
Doron Cohen commented on LUCENE-756:
------------------------------------

> Does this mean a separate file outside the final .cfs files?

Oh no - there's a single .nrm file inside the .cfs file (instead of multiple .fN files inside the .cfs file).
As before, only .sN files (separate norm files) are outside of the .cfs file.



[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/LUCENE-756?page=comments#action_12460313 ]
           
Doug Cutting commented on LUCENE-756:
-------------------------------------

Since we're adding a new file, shouldn't we give it a header, so that its format can be revised?  Something like:
  new byte[] {'N','R','M',VERSION}
as the first four bytes.  We might someday decide to change the representation used, e.g., a different one-byte-float format, or permit higher resolution, or compression, or somesuch.
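
A minimal sketch of how such a header could be checked on read, assuming the four-byte layout suggested above. The class name, the method name, and the VERSION value of 1 are made up for illustration; the thread does not give them.

```java
// Hypothetical sketch of a versioned header check; names are illustrative.
class NormsHeaderSketch {
    static final byte VERSION = 1; // assumed value; the thread gives no number
    static final byte[] HEADER = {'N', 'R', 'M', VERSION};

    /** Returns the format version, or throws if the magic bytes are wrong. */
    static int readVersion(byte[] file) {
        if (file.length < HEADER.length
                || file[0] != 'N' || file[1] != 'R' || file[2] != 'M') {
            throw new IllegalStateException("not a norms file");
        }
        // The fourth byte tells a reader which representation follows.
        return file[3];
    }
}
```

A reader that sees an unknown version can then refuse the file or switch decoding, rather than misreading a newer norm representation.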

Also, should we use a constant for ".nrm" extension, so that it's checked at compile-time?


[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/LUCENE-756?page=comments#action_12460316 ]
           
Doron Cohen commented on LUCENE-756:
------------------------------------

Thanks for the comments, Doug.
You're right of course, I will add both the header and the constant.
(That will be either today or a week from now.)


[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-756?page=all ]

Doron Cohen updated LUCENE-756:
-------------------------------

    Attachment: nrm.patch.2.txt

nrm.patch.2.txt:

Updated as Doug suggested:
- The ".nrm" extension is now maintained in a constant.
- The .nrm file now has a 4-byte header.

The fileFormat document is also updated.

Also, I checked again that the seeks for the various field norms are lazy - performed only when bytes are actually read with refill().
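
The lazy behaviour can be pictured with the sketch below, which defers touching a field's norms until they are first requested. It does not model Lucene's input buffering or refill(); the class, the in-memory byte array standing in for the file, and the offset formula (header length plus field index times maxDoc) are assumptions for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of on-demand norm loading; not Lucene's Norm class.
class LazyNormSketch {
    static final int HEADER_LENGTH = 4; // the 4-byte header mentioned above
    private final byte[] file;          // stand-in for the open .nrm file
    private final int normSeek;
    private final int maxDoc;
    private byte[] cached;
    int loads = 0;                      // counts how often the file was read

    LazyNormSketch(byte[] file, int fieldIndex, int maxDoc) {
        this.file = file;
        this.maxDoc = maxDoc;
        // Assumed layout: header, then one block of maxDoc bytes per field.
        this.normSeek = HEADER_LENGTH + fieldIndex * maxDoc;
    }

    /** Reads this field's norms on first use only, then serves the cached copy. */
    byte[] bytes() {
        if (cached == null) {
            cached = Arrays.copyOfRange(file, normSeek, normSeek + maxDoc);
            loads++;
        }
        return cached;
    }
}
```

Constructing the object costs nothing beyond computing the offset, so opening a reader for a segment with many fields does not read any norm bytes up front.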





[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462069 ]

Doron Cohen commented on LUCENE-756:
------------------------------------

I am updating the patch (nrm.patch.3.txt):

- using a single constant for the norms file extension:
  static final String NORMS_EXTENSION = "nrm";
(This is more in line with existing extension constants in the code.)
(As a side comment, there are various extension names (e.g. ".cfs") in the code that are also candidates for factoring out into constants, but this is a separate issue.)

- adding a test - TestNorms
This test verifies that norm values assigned with field.setBoost() are preserved during the life cycle of an index, including adding documents, updating norms values (separate norms), addIndexes(), and optimize.
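
For the extension constant in the first item, a small sketch of how a file name might be derived from it. The helper class and method are hypothetical; only the NORMS_EXTENSION declaration comes from the comment above.

```java
// Hypothetical helper; only NORMS_EXTENSION is taken from the comment above.
class NormsFileNameSketch {
    static final String NORMS_EXTENSION = "nrm";

    /** Builds "<segment>.nrm" from a segment name. */
    static String normsFileName(String segment) {
        return segment + "." + NORMS_EXTENSION;
    }
}
```

Keeping the extension without the leading dot matches the declaration quoted above; a misspelled constant name then fails at compile time, which a misspelled string literal would not.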

All tests pass.
On my side this is ready to go in.



[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-756:
-------------------------------

    Attachment: nrm.patch.3.txt


[jira] Resolved: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved LUCENE-756.
---------------------------------

    Resolution: Fixed

Committed.  Thanks Doron!

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462774 ]

Doron Cohen commented on LUCENE-756:
------------------------------------

Thanks for committing this, Yonik!

It seems the added test TestNorms was not committed..?

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462914 ]

Yonik Seeley commented on LUCENE-756:
-------------------------------------

Hmmm, I actually did an "svn status" to see if there was anything to add too.
Problem is, my current tree is too messy and I missed it.
Thanks for the double-check.

[jira] Reopened: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reopened LUCENE-756:
---------------------------------------


I would like to propose some small improvements to this nice feature.

I've worked out a patch (will attach shortly).  Doron, if you agree
(or once we have iterated on it), I'll commit it!  Thanks.

Proposed changes:

  * Renamed "withNrm()" to "getHasMergedNorms" to be more
    descriptive.  Also changed the field to "hasMergedNorms".

  * Explicitly store "hasMergedNorms" in the segments_N file.

    I think in general we should favor storing things like this
    explicitly instead of relying on IO operations (fileExists).
    We've made great progress lately in reducing such IO operations so
    I'd like to keep that up when possible :)

    I created a new FORMAT_MERGED_NORMS in SegmentInfos for this.  The
    change is fully backwards compatible (old indices work fine).  I
    extended TestBackwardsCompatibility to test this.

    This then has the nice side effect of not having to create the
    fleeting CompoundFileReader in "SegmentInfo.getHasMergedNorms"
    (which was somewhat spooky to me) for indices written to after
    this is committed.  For indices written to before this gets
    committed but after the first version was committed (10 days ago),
    the check is still needed so I've left it in there with a comment.

  * Fixed the TestDoc unit test to create and return SegmentInfo
    instances rather than constructing a new SegmentInfo every time
    (which causes problems whenever we add something to SegmentInfo).
    The test is still correct, and it should hold up better as we
    make further changes to SegmentInfo.
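
A minimal sketch of the second proposal above (store the flag explicitly, and only fall back to a file-existence check for older entries). This is not the code in the attached patch: the class, the constant values, the on-disk layout and the in-memory file set are all hypothetical, chosen only to illustrate the idea of a format-versioned flag with a backwards-compatible fallback.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;

// Hypothetical sketch only; not Lucene's SegmentInfos/SegmentInfo implementation.
class MergedNormsFlagSketch {
    // Assumption for this sketch: newer formats get smaller (more negative) numbers.
    static final int FORMAT_OLD = -1;          // hypothetical: entry carries no flag
    static final int FORMAT_MERGED_NORMS = -2; // hypothetical: entry carries the flag

    // Serialize one segment entry in the given format.
    static byte[] write(int format, String segmentName, boolean hasMergedNorms) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(format);
            out.writeUTF(segmentName);
            if (format <= FORMAT_MERGED_NORMS) {
                // Newer format: record the flag explicitly.
                out.writeBoolean(hasMergedNorms);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the flag back. directoryFiles stands in for the index directory listing.
    static boolean readHasMergedNorms(byte[] entry, Set<String> directoryFiles) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry));
            int format = in.readInt();
            String segmentName = in.readUTF();
            if (format <= FORMAT_MERGED_NORMS) {
                // No directory lookup needed: the answer is stored in the entry.
                return in.readBoolean();
            }
            // Older entry: fall back to checking whether the .nrm file exists.
            return directoryFiles.contains(segmentName + ".nrm");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, entries written in the newer format never need to consult the directory, while entries written in the older format still resolve through the existence check, which is the backwards-compatibility property described above.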


> Maintain norms in a single file .nrm
> ------------------------------------
>
>                 Key: LUCENE-756
>                 URL: https://issues.apache.org/jira/browse/LUCENE-756
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Doron Cohen
>         Assigned To: Doron Cohen
>            Priority: Minor
>         Attachments: index.premergednorms.cfs.zip, index.premergednorms.nocfs.zip, LUCENE-756-Jan16.patch, nrm.patch.2.txt, nrm.patch.3.txt, nrm.patch.txt
>
>
> Non-compound indexes are ~10% faster at indexing and perform about 50% of the IO activity of compound indexes, but their file descriptor footprint is much higher.
> By maintaining all field norms in a single .nrm file, we can bound the number of files used by non-compound indexes, and possibly allow more applications to use this format.
> More details on the motivation for this in: http://www.nabble.com/potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-tf2826909.html (in particular http://www.nabble.com/Re%3A-potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-p7910403.html).

[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-756:
--------------------------------------

    Attachment: LUCENE-756-Jan16.patch
