[jira] Created: (SOLR-272) SolrDocument performance testing


[jira] Created: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
SolrDocument performance testing
--------------------------------

                 Key: SOLR-272
                 URL: https://issues.apache.org/jira/browse/SOLR-272
             Project: Solr
          Issue Type: Test
    Affects Versions: 1.3
            Reporter: Ryan McKinley


In 1.3, we added SolrInputDocument -- a temporary class to hold document information.  There is concern that this may be less than ideal performance-wise.

To settle some concerns (mine included) I want to compare a few SolrDocument implementations to make sure we are not doing something crazy.

I implemented a LuceneInputDocument subclass of SolrInputDocument that stores its values directly in a Lucene Document (rather than a Map<String,Collection>).

This is a quick test comparing:
1. Building documents with SolrInputDocument
2. Building documents with LuceneInputDocument (same interface writing directly to Document)
3. Building documents with DocumentBuilder (Solr 1.2, Solr 1.1)





--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-272:
-------------------------------

    Attachment: SOLR-272-SolrDocumentPerformanceTesting.patch

Contains:
* LuceneInputDocument
* changed tests to use this impl (and still pass)
* a simple comparison test (far from a perfect representation)


[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507725 ]

Ryan McKinley commented on SOLR-272:
------------------------------------

Running a test that creates 'n' docs, each with an id, name, and a few subjects, the results are:

[100000] SolrInputDocument:    1841
[100000] LuceneInputDocument:  4258
[100000] DocumentBuilder:      5969
[1000000] SolrInputDocument:    14727
[1000000] LuceneInputDocument:  34369
[1000000] DocumentBuilder:      51604

(running on JDK 1.6, Core 2 Duo 2.3 GHz)

Surprisingly it looks like:

SolrInputDocument -- fastest
LuceneInputDocument - ~2x slower
DocumentBuilder - ~3x slower

I'm sure the documents I'm building aren't a good distribution of what random documents would look like.  But the other styles of document (copy fields, things with default values, etc.) are handled more easily in the already winning SolrInputDocument...




[jira] Issue Comment Edited: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507725 ]

Ryan McKinley edited comment on SOLR-272 at 6/24/07 3:39 PM:
-------------------------------------------------------------

Running a test that creates 'n' docs, each with an id, name, and a few subjects, the results are:

[100000] SolrInputDocument:    1828
[100000] LuceneInputDocument:  2499
[100000] DocumentBuilder:      1746
[1000000] SolrInputDocument:    14162
[1000000] LuceneInputDocument:  19764
[1000000] DocumentBuilder:      17127

(running on JDK 1.6, Core 2 Duo 2.3 GHz)









[jira] Updated: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-272:
-------------------------------

    Attachment: SOLR-272-SolrDocumentPerformanceTesting.patch

Doh.  I was not resetting the timer after each run.
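
The slip described here is easy to make in a hand-rolled benchmark. Below is a minimal sketch of it, not the code from the attached patch: the class name TimerResetSketch, its methods, and the simulated clock are all hypothetical. When the start time is captured once, each later run is charged for the runs before it, which would be consistent with the earlier figures growing from one implementation to the next.

```java
import java.util.Arrays;

final class TimerResetSketch {
    // Simulated clock so the example is deterministic (an assumption for
    // illustration); a real harness would read System.currentTimeMillis().
    static long now = 0;

    // Pretend one benchmark run costs a fixed number of ticks.
    static void run(long cost) {
        now += cost;
    }

    // Buggy: start is captured once, so run k reports the sum of runs 1..k.
    static long[] timeWithoutReset(long[] costs) {
        long[] elapsed = new long[costs.length];
        long start = now;
        for (int i = 0; i < costs.length; i++) {
            run(costs[i]);
            elapsed[i] = now - start;
        }
        return elapsed;
    }

    // Fixed: start is re-read before every run.
    static long[] timeWithReset(long[] costs) {
        long[] elapsed = new long[costs.length];
        for (int i = 0; i < costs.length; i++) {
            long start = now;
            run(costs[i]);
            elapsed[i] = now - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] costs = {10, 20, 30};
        System.out.println(Arrays.toString(timeWithoutReset(costs))); // [10, 30, 60]
        System.out.println(Arrays.toString(timeWithReset(costs)));    // [10, 20, 30]
    }
}
```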


[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507728 ]

Ryan McKinley commented on SOLR-272:
------------------------------------

Running again, this time with a dynamic field added:

[100000] 2074 :: 0.02074 mili/doc :: SolrInputDocument
[100000] 2617 :: 0.02617 mili/doc :: LuceneInputDocument
[100000] 1843 :: 0.01843 mili/doc :: DocumentBuilder
[1000000] 16248 :: 0.016248 mili/doc :: SolrInputDocument
[1000000] 21946 :: 0.021946 mili/doc :: LuceneInputDocument
[1000000] 18618 :: 0.018618 mili/doc :: DocumentBuilder

For n=100000, SolrInputDocument is slightly slower than DocumentBuilder, but for n=1000000 it is slightly faster.  Any thoughts on why?

Same test, running GC every 1000 docs:
if( (i%1000) == 0 ) System.gc();

[100000] 3728 :: 0.03728 mili/doc :: SolrInputDocument
[100000] 3872 :: 0.03872 mili/doc :: LuceneInputDocument
[100000] 3595 :: 0.03595 mili/doc :: DocumentBuilder
[1000000] 33843 :: 0.033843 mili/doc :: SolrInputDocument
[1000000] 39668 :: 0.039668 mili/doc :: LuceneInputDocument
[1000000] 36950 :: 0.036950 mili/doc :: DocumentBuilder
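
A rough sketch of how the GC variant of the timing loop might look. Only the `if( (i%1000) == 0 ) System.gc();` line comes from the comment above; the class name, the buildDoc stand-in, and the output format are assumptions, and the real test builds Solr documents rather than lists of strings. Keep in mind that System.gc() is only a request to the JVM, so the numbers show the cost of asking for collections, not a guaranteed clean heap.

```java
import java.util.ArrayList;
import java.util.List;

final class GcEveryThousandSketch {
    // Accumulates a value from every document so the work is not optimized away.
    static long sink = 0;

    // Hypothetical stand-in for building one document.
    static List<String> buildDoc(int i) {
        List<String> doc = new ArrayList<>();
        doc.add("id=" + i);
        doc.add("name=name" + i);
        return doc;
    }

    // Builds n documents, optionally requesting a GC every 1000 docs,
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeBuild(int n, boolean gcEvery1000) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += buildDoc(i).size();
            if (gcEvery1000 && (i % 1000) == 0) System.gc();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 100_000;
        long ms = timeBuild(n, true);
        System.out.printf("[%d] %d :: %f ms/doc%n", n, ms, (double) ms / n);
    }
}
```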


[jira] Updated: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-272:
-------------------------------

    Attachment: SolrDocumentPerformanceTester.java

Since the LuceneInputDocument is an obvious loser, I removed that from the test.

I also:
* removed Random from the mix -- makes the tests inconsistent
* test simple and complex docs.
  > simple is just the id
  > complex is id + name + dynamic field + 10 subjects, the subjects each have a copyField to 'text'

With this test, the SolrInputDocument wins every time:  

[100000] 2043 :: 0.02043 mili/doc :: SolrInputDocument - true
[100000] 2193 :: 0.02193 mili/doc :: DocumentBuilder - true
[1000000] 15815 :: 0.015815 mili/doc :: SolrInputDocument - true
[1000000] 19223 :: 0.019223 mili/doc :: DocumentBuilder - true
[10000000] 6228 :: 0.000623 mili/doc :: SolrInputDocument - false
[10000000] 17263 :: 0.001726 mili/doc :: DocumentBuilder - false
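
The two document shapes described above can be sketched as follows. The field names `name`, `cat_s` (standing in for the dynamic field) and `subject` are assumptions, since the comment does not list them, and the copyField to 'text' is a schema setting that this sketch does not model. Values are derived from the loop index rather than from Random, so every run builds identical documents.

```java
import java.util.ArrayList;
import java.util.List;

final class TestDocShapes {
    // One field as a name/value pair.
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // "simple": just the id.
    static List<Field> simpleDoc(int i) {
        List<Field> doc = new ArrayList<>();
        doc.add(new Field("id", Integer.toString(i)));
        return doc;
    }

    // "complex": id + name + one dynamic field + 10 subjects.
    static List<Field> complexDoc(int i) {
        List<Field> doc = simpleDoc(i);
        doc.add(new Field("name", "name " + i));
        doc.add(new Field("cat_s", "category " + (i % 5))); // hypothetical dynamic field name
        for (int s = 0; s < 10; s++) {
            doc.add(new Field("subject", "subject " + s));
        }
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(simpleDoc(7).size());  // 1
        System.out.println(complexDoc(7).size()); // 13
    }
}
```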




[jira] Updated: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-272:
------------------------------

    Attachment: SolrInputDoc.patch

> With this test, the SolrInputDocument wins every time

Not once you correct the bugs ;-)

- copyField was not being done in the SolrInputDocument version
- setField was being used for the multiValued field instead of addField, resulting in fewer fields.
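
To see why the second bug produces fewer fields, consider this simplified model. It is not the SolrInputDocument source: the class TinyDoc is hypothetical and assumes only that setField replaces a field's values while addField appends to them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TinyDoc {
    private final Map<String, List<Object>> fields = new HashMap<>();

    // Replaces whatever values the field already had.
    void setField(String name, Object value) {
        List<Object> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    // Appends to the field's existing values.
    void addField(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    int valueCount(String name) {
        List<Object> values = fields.get(name);
        return values == null ? 0 : values.size();
    }

    public static void main(String[] args) {
        TinyDoc viaSet = new TinyDoc();
        TinyDoc viaAdd = new TinyDoc();
        for (int s = 0; s < 10; s++) {
            viaSet.setField("subject", "subject " + s);
            viaAdd.addField("subject", "subject " + s);
        }
        System.out.println(viaSet.valueCount("subject")); // 1
        System.out.println(viaAdd.valueCount("subject")); // 10
    }
}
```

A benchmark that calls setField ten times therefore hands one value to the indexing step instead of ten, which makes that path look cheaper than it is.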

I modified the schema (didn't work out of the box) and removed everything that didn't have to do with the fields in the document (partially because copyField wasn't implemented).

On my P4, SolrInputDocument comes in at 14% slower.  I don't know how it would be with all the copyField and dynamicField stuff in there.  There are certainly scenarios where it could be faster, since it can do a single lookup for a multivalued field.




[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508426 ]

Yonik Seeley commented on SOLR-272:
-----------------------------------

Note that my current fix to toDocument() for copyField isn't complete since the previous implementation allowed copyField from an undefined field in the schema.

It might be cleaner just to use a field that isn't indexed or stored, but that would be a slight backward incompatibility.
Might be OK since I don't know if anyone has ever used that feature.  Thoughts?



[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508429 ]

Yonik Seeley commented on SOLR-272:
-----------------------------------

Ugh... never mind.
I ran "svn up" on a different directory than the one I patched, and hence got an older version.


[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508432 ]

Yonik Seeley commented on SOLR-272:
-----------------------------------

OK, I'm on the correct version now.
Once I changed setField to addField, SolrInputDocument was still slower by 12%.
They are both almost 5 times as slow with the default schema (all the copyField, required, and default value checking I assume).


[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508478 ]

Ryan McKinley commented on SOLR-272:
------------------------------------

Just so we are on the same page...  Are you using SolrDocumentPerformanceTester.java with the changes from r551060 (trunk)?  

On my machine (core 2 duo) the SolrInputDocument is consistently faster.  Are we running the same tests?  




[jira] Updated: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-272:
------------------------------

    Attachment: SolrDocumentPerformanceTester.java

Attaching the modified test program I used.
I modified it to accept separate counts, and do separate runs for the different implementations.
For example, 100000 0 0 and 0 0 10000
This was to avoid any GC effects carrying over from one implementation to another, and to avoid HotSpot optimizing for one path and then having a different implementation switch to a different path.

The SolrInputDocument builder also needed that change from setField to addField to be equivalent.
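
A sketch of the command-line convention described above, where each argument is the document count for one implementation and a zero skips it. The argument order, the class name, the labels, and the placeholder workload are all assumptions; the attached SolrDocumentPerformanceTester.java is the authoritative version.

```java
final class SeparateRunsSketch {
    // Placeholder labels; which implementation each position maps to is assumed.
    static final String[] LABELS = {"impl A", "impl B", "impl C"};

    // Parses one count per implementation; missing arguments default to 0.
    static int[] parseCounts(String[] args) {
        int[] counts = new int[LABELS.length];
        for (int i = 0; i < counts.length && i < args.length; i++) {
            counts[i] = Integer.parseInt(args[i]);
        }
        return counts;
    }

    // Placeholder workload standing in for building one document.
    static int buildOne(int impl, int i) {
        return ("id" + i).length() + impl;
    }

    public static void main(String[] args) {
        int[] counts = parseCounts(args);
        for (int impl = 0; impl < counts.length; impl++) {
            if (counts[impl] == 0) continue;   // skip this implementation entirely
            long start = System.nanoTime();    // timer starts fresh for each run
            long sink = 0;
            for (int i = 0; i < counts[impl]; i++) {
                sink += buildOne(impl, i);
            }
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("[" + counts[impl] + "] " + ms + " ms :: "
                + LABELS[impl] + " (sink=" + sink + ")");
        }
    }
}
```

Running it once as `java SeparateRunsSketch 100000 0 0` and then again with the count in a different position gives each implementation its own JVM, so neither garbage left by one run nor JIT decisions made for it can affect the other.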


[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508600 ]

Yonik Seeley commented on SOLR-272:
-----------------------------------

FYI, I tested on both P4 and Athlon with Java 6 -server.
Of course this is still rather academic since I don't expect this to be a bottleneck in indexing.


[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508603 ]

Ryan McKinley commented on SOLR-272:
------------------------------------

ok, now i'm getting the same results as you.  thanks.


[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508614 ]

Yonik Seeley commented on SOLR-272:
-----------------------------------

An alternate way to do SolrDocument would be to only add a Collection if there were multiple values... something along the lines of:

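import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;

public class SolrDocument2 {
  // A field maps to a bare value while it has one value, and to a
  // Collection only once a second value is added.  Note that a single
  // value that is itself a Collection is indistinguishable from
  // multiple values in this scheme.
  private final HashMap<String,Object> _fields = new HashMap<String,Object>();

  public SolrDocument2() {
  }

  public Collection<String> getFieldNames() {
    return _fields.keySet();
  }

  public void clear() {
    _fields.clear();
  }

  /**
   * Removes all values for this field; returns true if it had a value
   */
  public boolean removeFields(String name) {
    return _fields.remove( name ) != null;
  }

  public void setField(String name, Object value) {
    _fields.put(name, value);
  }

  @SuppressWarnings("unchecked")
  public void addField(String name, Object value)
  {
    Object existing = _fields.get(name);
    if (existing == null) {
      // first value: store it directly, no Collection needed
      _fields.put(name, value);
      return;
    }

    if (existing instanceof Collection) {
      // already multi-valued: append in place
      ((Collection<Object>)existing).add(value);
    } else {
      // second value: promote the single value to a Collection
      // (the original put() overwrote the existing value here)
      Collection<Object> c = new ArrayList<Object>();
      c.add(existing);
      c.add(value);
      _fields.put(name, c);
    }
  }

  /**
   * returns the first value for this field
   */
  public Object getFieldValue(String name) {
    Object v = _fields.get( name );
    if (!(v instanceof Collection)) return v;  // null or a single value
    Collection<?> c = (Collection<?>)v;
    if (c.size() > 0 ) {
      return c.iterator().next();
    }
    return null;
  }

  /**
   * Get the value(s) for a given field... a Collection, or an Object
   */
  public Object getFieldValues(String name) {
    return _fields.get( name );
  }
}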




[jira] Updated: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-272:
-------------------------------

    Attachment: SolrInputDoc.patch

This is an alternative version of SolrDocument that only creates Collections for multivalued fields. The one big difference from Yonik's suggestion above is that it returns a Collection<Object> from getFieldValues() even for a single-valued field.
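
As a minimal sketch of the behaviour described above (not the code in SolrInputDoc.patch): values are stored bare until a second value arrives, yet getFieldValues() always hands back a Collection<Object>. The class name SingleOrMultiDocument and the use of Collections.singletonList are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, for illustration only; not taken from the patch.
class SingleOrMultiDocument {
  // A field maps to a bare value, or to a Collection once it has several values.
  private final Map<String,Object> fields = new HashMap<String,Object>();

  @SuppressWarnings("unchecked")
  public void addField(String name, Object value) {
    Object existing = fields.get(name);
    if (existing == null) {
      fields.put(name, value);                 // first value: store it bare
    } else if (existing instanceof Collection) {
      ((Collection<Object>) existing).add(value);
    } else {
      Collection<Object> c = new ArrayList<Object>();
      c.add(existing);
      c.add(value);
      fields.put(name, c);                     // second value: promote to a Collection
    }
  }

  @SuppressWarnings("unchecked")
  public Collection<Object> getFieldValues(String name) {
    Object v = fields.get(name);
    if (v == null) return null;
    if (v instanceof Collection) return (Collection<Object>) v;
    return Collections.singletonList(v);       // wrap single values on the way out
  }
}
```

The wrapper is only allocated when a single-valued field is read, so building a document of single-valued fields still avoids one Collection per field.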

Running the perf test for 1M docs 5 times for each implementation:

[1000000] SolrInputDocument:   9992  9827  9823  9854  9948
[1000000] SolrInputDocument2:  9636  9719  9699  9807  9729
[1000000] DocumentBuilder:     8866  8818  8946  8812  8953

To be honest, I'm not sure the complexity of dealing with a Map<String,Object> (where the Object may be a collection or not) is worth the marginal speedup.  I suppose if the docs are all single valued it would be a more substantial difference.
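
For a rough sense of "marginal", the five runs per implementation can be averaged. The sketch below only does arithmetic on the figures quoted in this comment; the class and method names are made up for illustration, and the units are whatever the tester reported.

```java
// Hypothetical helper, for illustration only: summarises the timings quoted above.
class PerfSummary {
  static final long[] SOLR_INPUT_DOCUMENT  = {9992, 9827, 9823, 9854, 9948};
  static final long[] SOLR_INPUT_DOCUMENT2 = {9636, 9719, 9699, 9807, 9729};
  static final long[] DOCUMENT_BUILDER     = {8866, 8818, 8946, 8812, 8953};

  // Arithmetic mean of the runs.
  static double mean(long[] runs) {
    double sum = 0;
    for (long r : runs) sum += r;
    return sum / runs.length;
  }

  // Percentage by which candidate is faster than baseline (positive means faster).
  static double percentFaster(double baseline, double candidate) {
    return 100.0 * (baseline - candidate) / baseline;
  }
}
```

With these figures the means come to 9888.8, 9718.0 and 8879.0, so SolrInputDocument2 is under 2% faster than SolrInputDocument, while DocumentBuilder is roughly 10% faster.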


[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508890 ]

Yonik Seeley commented on SOLR-272:
-----------------------------------

> To be honest, I'm not sure the complexity of dealing with a Map<String,Object> (where the Object may be a
> collection or not) is worth the marginal speedup.

I'm not sure either, but one reason the speedup is marginal is that it's not the bottleneck (other things are taking more time, like dynamic copy-field checking... I've never checked that code to see if it could be optimized, but things are quite a bit faster when all the dynamic fields are removed).

SolrInputDocument could similarly be sped up by getting rid of the Map for boosts.
One could either store a bare value, or a BoostedValue.

class BoostedValue {
  float boost;
  Object value;
}
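
One way the bare-value-or-BoostedValue idea could look, as a sketch under stated assumptions rather than anything from the attached patches: the BoostAwareFields class, its method names, and the choice to treat a boost of 1.0 as "no boost" are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Same shape as the class above, with a constructor added for convenience.
class BoostedValue {
  final float boost;
  final Object value;

  BoostedValue(float boost, Object value) {
    this.boost = boost;
    this.value = value;
  }
}

// Hypothetical holder: one Map for values, no second Map for boosts.
class BoostAwareFields {
  private final Map<String,Object> fields = new HashMap<String,Object>();

  void setField(String name, Object value, float boost) {
    // Assumption: 1.0 means "no boost", so only boosted values pay for a wrapper.
    fields.put(name, boost == 1.0f ? value : new BoostedValue(boost, value));
  }

  Object getValue(String name) {
    Object v = fields.get(name);
    return (v instanceof BoostedValue) ? ((BoostedValue) v).value : v;
  }

  float getBoost(String name) {
    Object v = fields.get(name);
    return (v instanceof BoostedValue) ? ((BoostedValue) v).boost : 1.0f;
  }
}
```

The trade-off is the same as with the single-or-Collection map: fewer allocations for the common case, at the cost of an instanceof check on every read.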




[jira] Commented: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508993 ]

Yonik Seeley commented on SOLR-272:
-----------------------------------

> The one big difference from Yonik's suggestion above is that it returns a Collection<Object> from getFieldValues() even for a single-valued field

That's a good change as it leads to simpler client code.
I think that getFieldValue() should perhaps return the raw entry (an Object or a Collection<Object>) for those (like the indexer) who would want the most efficient access.
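
To illustrate why raw access might matter to a caller such as the indexer, here is a sketch of consuming a raw entry without any wrapper being allocated for single values. RawEntryConsumer and indexRawEntry are hypothetical names, and the List<String> sink merely stands in for whatever the indexer would actually do with each value.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for indexer-side code; not part of Solr.
class RawEntryConsumer {
  // Handles a raw entry that is null, a bare value, or a Collection of values.
  // Returns the number of values consumed.
  static int indexRawEntry(String name, Object raw, List<String> sink) {
    if (raw == null) return 0;
    if (raw instanceof Collection) {
      int n = 0;
      for (Object v : (Collection<?>) raw) {
        sink.add(name + "=" + v);
        n++;
      }
      return n;
    }
    sink.add(name + "=" + raw);   // single value: used directly, nothing allocated
    return 1;
  }
}
```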



[jira] Resolved: (SOLR-272) SolrDocument performance testing

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley resolved SOLR-272.
--------------------------------

    Resolution: Fixed
