[jira] Created: (SOLR-1961) Use Lucene's Field Cache To Retrieve Stored Fields From Memory

JIRA jira@apache.org
Use Lucene's Field Cache To Retrieve Stored Fields From Memory
--------------------------------------------------------------

                 Key: SOLR-1961
                 URL: https://issues.apache.org/jira/browse/SOLR-1961
             Project: Solr
          Issue Type: New Feature
          Components: search
    Affects Versions: 1.4
            Reporter: Stephen Bochinski


This allows the user to configure which fields should be field cached in the schema.xml file by adding the following attribute:
fieldCached="true"

Enabling this on a field greatly decreases the time needed to retrieve stored fields. It works on fields containing Bytes, Strings, Integers, Longs, and Floats. Enabling the field cache is applicable in many scenarios, for instance when you have a large amount of text that is indexed but not stored and you only need to retrieve a string or number associated with a document. It's applicable in any case where there are many indexed fields and not too many stored fields being retrieved. The memory consumption is modest compared to the performance gains the field cache brings.
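To make the idea concrete, here is a minimal sketch of why a field cache lookup is cheap: each cached field is held as one in-memory array indexed by document number, so retrieving a value is an array read instead of a stored-field read from disk. The class and method names (FieldCacheSketch, load, get) and the "sku" field are hypothetical illustrations; this is neither the code in the attached patch nor Lucene's FieldCache API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a per-field, per-document in-memory cache. */
public class FieldCacheSketch {
    // One array per cached field; the array index is the document number.
    private final Map<String, String[]> cachedFields = new HashMap<>();

    /** Loads all values of a field once, for example when a searcher is opened. */
    public void load(String field, String[] valuesByDocId) {
        cachedFields.put(field, valuesByDocId.clone());
    }

    /** Returns the cached value, or null if the field is not cached or docId is out of range. */
    public String get(String field, int docId) {
        String[] values = cachedFields.get(field);
        if (values == null || docId < 0 || docId >= values.length) {
            return null;
        }
        return values[docId];
    }

    public static void main(String[] args) {
        FieldCacheSketch cache = new FieldCacheSketch();
        cache.load("sku", new String[] {"A-100", "B-200", "C-300"});
        // Retrieval is a plain array read; no stored-field access is involved.
        System.out.println(cache.get("sku", 1));
    }
}
```

The trade-off is the one the description names: every cached field costs one array slot per document, whether or not that document is ever retrieved.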

Memory consumption is governed by: Number of fields cached * Number of documents *  8 bytes per reference +
        SUM(Number of unique values of the field  * average length of term)
        * 2 (chars use 2 bytes) * String overhead (40 bytes)
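
As a rough worked example of the formula above, the sketch below estimates the footprint of a single cached String field. It rests on one assumption that the formula does not state outright: the 40-byte String overhead is treated as an additive cost per unique value, not as a multiplier. The class name, method name, and index sizes are hypothetical.

```java
/** Hypothetical estimator; one reading of the memory formula, not code from the patch. */
public class FieldCacheMemoryEstimate {
    static final long BYTES_PER_REFERENCE = 8;    // one reference per document
    static final long BYTES_PER_CHAR = 2;         // Java chars are 2 bytes
    static final long STRING_OVERHEAD_BYTES = 40; // assumed additive cost per unique String

    /** Estimated bytes for one cached String field. */
    public static long estimateFieldBytes(long numDocs, long uniqueValues, long avgTermLength) {
        long references = numDocs * BYTES_PER_REFERENCE;
        long strings = uniqueValues * (avgTermLength * BYTES_PER_CHAR + STRING_OVERHEAD_BYTES);
        return references + strings;
    }

    public static void main(String[] args) {
        // Hypothetical index: 1,000,000 documents, 50,000 unique values of 10 chars each.
        long bytes = estimateFieldBytes(1_000_000L, 50_000L, 10L);
        System.out.println(bytes + " bytes"); // prints "11000000 bytes"
    }
}
```

Under these assumptions the per-document references (8,000,000 bytes) dominate the unique values (3,000,000 bytes). For several cached fields, sum the per-field estimates.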


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Updated: (SOLR-1961) Use Lucene's Field Cache To Retrieve Stored Fields From Memory

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Bochinski updated SOLR-1961:
------------------------------------

    Attachment: patch.txt

code changes.

> Use Lucene's Field Cache To Retrieve Stored Fields From Memory
> --------------------------------------------------------------
>
>                 Key: SOLR-1961
>                 URL: https://issues.apache.org/jira/browse/SOLR-1961
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.4
>            Reporter: Stephen Bochinski
>         Attachments: patch.txt
>
>   Original Estimate: 101.5h
>  Remaining Estimate: 101.5h
>

[jira] Updated: (SOLR-1961) Use Lucene's Field Cache To Retrieve Stored Fields From Memory

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Bochinski updated SOLR-1961:
------------------------------------

    Description:
This allows the user to configure which fields should be field cached in the schema.xml file by adding the following attribute:
fieldCached="true"

Enabling this on a field greatly decreases the time needed to retrieve stored fields. It works on fields containing Bytes, Strings, Integers, Longs, and Floats. Enabling the field cache is applicable in many scenarios, for instance when you have a large amount of text that is indexed but not stored and you only need to retrieve a string or number associated with a document. It's applicable in any case where there are many indexed fields and not too many stored fields being retrieved. The memory consumption is modest compared to the performance gains the field cache brings.

Memory consumption is governed by: Number of fields cached * Number of documents *  8 bytes per reference +
        SUM(Number of unique values of the field  * average length of term)  * 2 (chars use 2 bytes) * String overhead (40 bytes)


[jira] Commented: (SOLR-1961) Use Lucene's Field Cache To Retrieve Stored Fields From Memory

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882909#action_12882909 ]

Otis Gospodnetic commented on SOLR-1961:
----------------------------------------

This looks useful.  The 2 new classes should have the standard ASL stuff at the very top.  I also spotted a couple of TODOs from/for somebody named Stuart?


[jira] Issue Comment Edited: (SOLR-1961) Use Lucene's Field Cache To Retrieve Stored Fields From Memory

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882909#action_12882909 ]

Otis Gospodnetic edited comment on SOLR-1961 at 6/26/10 11:13 PM:
------------------------------------------------------------------

This looks useful.  The 2 new classes should have the standard ASL stuff at the very top.  I also spotted a couple of TODOs from/for somebody named Stuart?

Oh, and do you happen to have any unit tests for this?

[jira] Commented: (SOLR-1961) Use Lucene's Field Cache To Retrieve Stored Fields From Memory

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883286#action_12883286 ]

Stephen Bochinski commented on SOLR-1961:
-----------------------------------------

I forgot to remove those TODOs. I'll remove them and add the ASL headers when I update the patch.

I don't have any unit tests written so far. I will write some and include them in the updated patch.

[jira] Updated: (SOLR-1961) Use Lucene's Field Cache To Retrieve Stored Fields From Memory

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Bochinski updated SOLR-1961:
------------------------------------

    Attachment: field_cache.patch

I've added the ASL headers and a unit test. I also removed the TODOs floating around in the code.

[jira] Commented: (SOLR-1961) Use Lucene's Field Cache To Retrieve Stored Fields From Memory

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886265#action_12886265 ]

Uwe Schindler commented on SOLR-1961:
-------------------------------------

Can you remove the useless Exception catch blocks that only call ex.printStackTrace() and let the exceptions bubble up instead? Simply printing stack traces anywhere (stdout) is the wrong thing to do. If you do catch the Exceptions, pass them to log.error so the logging framework logs them correctly.
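
A minimal sketch of the two alternatives suggested here, assuming a hypothetical loadValue method that can fail with an IOException. To stay self-contained it uses java.util.logging as a stand-in for the project's logging framework, so log.log(Level.SEVERE, ...) plays the role of log.error; none of these names come from the patch.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration of propagating or logging an exception. */
public class ExceptionHandlingSketch {
    private static final Logger log =
            Logger.getLogger(ExceptionHandlingSketch.class.getName());

    /** Hypothetical loader that can fail with an IOException. */
    static String loadValue(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated read failure");
        }
        return "value";
    }

    /** Option 1: declare the exception and let it bubble up to the caller. */
    public static String readOrThrow(boolean fail) throws IOException {
        return loadValue(fail);
    }

    /** Option 2: if it must be caught here, hand the throwable to the logger. */
    public static String readOrDefault(boolean fail, String fallback) {
        try {
            return loadValue(fail);
        } catch (IOException e) {
            // The logger records message and stack trace through the configured handlers.
            log.log(Level.SEVERE, "Could not load cached field value", e);
            return fallback;
        }
    }
}
```

Option 1 keeps the failure visible to the caller, which can handle it in one place. Option 2 only makes sense where a fallback value is genuinely acceptable.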

[jira] Updated: (SOLR-1961) Use Lucene's Field Cache To Retrieve Stored Fields From Memory

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Bochinski updated SOLR-1961:
------------------------------------

    Attachment: field_cache.patch

I removed the cases where printing was being done to stdout. I also bubbled up the IOExceptions to a consolidated section. I added a little to the test case as well, to cover more realistic String field values.
