[jira] Created: (LUCENE-671) Hashtable based Document

13 messages

[jira] Created: (LUCENE-671) Hashtable based Document

Sebastian Nagel (Jira)
Hashtable based Document
------------------------

                 Key: LUCENE-671
                 URL: http://issues.apache.org/jira/browse/LUCENE-671
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index, Search
    Affects Versions: 2.0.0, 1.9
            Reporter: Chris
            Priority: Minor


I've attached a Document based on a hashtable and a performance test case. It performs better in most cases (all but enumeration by my measurement), but likely uses a larger memory footprint. The Document testcase will fail since it accesses the "fields" variable directly and gets confused when it's not the list it expected it to be.
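
The trade-off described above (faster lookup by name, slower enumeration, more memory) can be sketched roughly as follows. This is not the attached HashDocument.java. The names SketchField, ListDocumentSketch, HashDocumentSketch and DocumentLookupDemo are hypothetical stand-ins, and the choice of a LinkedHashMap holding one list per field name is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a stored field; not Lucene's Field class.
final class SketchField {
    final String name;
    final String value;

    SketchField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// List-backed document: lookup by name scans the fields in order.
class ListDocumentSketch {
    private final List<SketchField> fields = new ArrayList<>();

    void add(SketchField field) { fields.add(field); }

    // Returns the first value stored under the name, or null if absent.
    String get(String name) {
        for (SketchField field : fields) {
            if (field.name.equals(name)) return field.value;
        }
        return null;
    }

    // Enumeration is a plain copy of the backing list.
    List<SketchField> fields() { return new ArrayList<>(fields); }
}

// Hash-backed document: lookup by name is a single map probe.
class HashDocumentSketch {
    private final Map<String, List<SketchField>> byName = new LinkedHashMap<>();

    void add(SketchField field) {
        byName.computeIfAbsent(field.name, key -> new ArrayList<>()).add(field);
    }

    // Returns the first value stored under the name, or null if absent.
    String get(String name) {
        List<SketchField> hits = byName.get(name);
        return hits == null ? null : hits.get(0).value;
    }

    // Enumeration has to flatten the per-name lists, and fields come back
    // grouped by name rather than in strict insertion order.
    List<SketchField> fields() {
        List<SketchField> all = new ArrayList<>();
        for (List<SketchField> group : byName.values()) all.addAll(group);
        return all;
    }
}

class DocumentLookupDemo {
    public static void main(String[] args) {
        HashDocumentSketch doc = new HashDocumentSketch();
        doc.add(new SketchField("title", "Lucene"));
        doc.add(new SketchField("body", "search library"));
        System.out.println(doc.get("title"));
        System.out.println(doc.fields().size());
    }
}
```

The extra map and per-name lists are where the larger memory footprint would come from, and any code that reaches into a list-typed "fields" variable directly would break against a layout like this one.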

If nothing else, we would be interested in at least being able to extend Document, which is currently declared final. (Anyone know the performance gains from declaring a class final?) Currently we have to maintain a copy of Lucene in which methods and classes are definalized and overridden.

There are other classes as well that could be declared non-final (Fieldable comes to mind), since it's possible to make changes for project-specific situations in those as well, but that's off-topic.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Updated: (LUCENE-671) Hashtable based Document

Sebastian Nagel (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-671?page=all ]

Chris updated LUCENE-671:
-------------------------

    Attachment: HashDocument.java


[jira] Updated: (LUCENE-671) Hashtable based Document

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-671?page=all ]

Chris updated LUCENE-671:
-------------------------

    Attachment: TestBenchDocuments.java


Re: [jira] Created: (LUCENE-671) Hashtable based Document

Grant Ingersoll
In reply to this post by Sebastian Nagel (Jira)
FYI: Fieldable is an interface. Field is indeed final, but
AbstractField implements almost everything in Fieldable and is not
final, so if you want to customize, just be aware of the small issues
with the Document.getFields() method versus Document.getFieldables().

-Grant
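
A minimal sketch of the arrangement Grant describes, using stand-in types rather than the real Lucene classes. Everything ending in "Sketch" is hypothetical, and the behavior shown for getFields (returning only instances of the final Field-like class while getFieldables returns every match) is an assumption about what the "small issues" refer to, not a statement of Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the Fieldable interface.
interface FieldableSketch {
    String name();
    String stringValue();
}

// Stand-in for AbstractField: non-final, implements the shared parts.
abstract class AbstractFieldSketch implements FieldableSketch {
    private final String name;

    protected AbstractFieldSketch(String name) { this.name = name; }

    public String name() { return name; }
}

// Stand-in for Field: final, so it cannot be subclassed.
final class FieldSketch extends AbstractFieldSketch {
    private final String value;

    FieldSketch(String name, String value) {
        super(name);
        this.value = value;
    }

    public String stringValue() { return value; }
}

// A project-specific field built on the non-final abstract class.
class TimestampFieldSketch extends AbstractFieldSketch {
    private final long millis;

    TimestampFieldSketch(String name, long millis) {
        super(name);
        this.millis = millis;
    }

    public String stringValue() { return Long.toString(millis); }
}

class DocumentSketch {
    private final List<FieldableSketch> fields = new ArrayList<>();

    void add(FieldableSketch field) { fields.add(field); }

    // Returns every field with the given name, whatever its concrete type.
    List<FieldableSketch> getFieldables(String name) {
        List<FieldableSketch> result = new ArrayList<>();
        for (FieldableSketch field : fields) {
            if (field.name().equals(name)) result.add(field);
        }
        return result;
    }

    // Assumed behavior: only FieldSketch instances are returned, so
    // custom field types are silently left out of this view.
    List<FieldSketch> getFields(String name) {
        List<FieldSketch> result = new ArrayList<>();
        for (FieldableSketch field : fields) {
            if (field.name().equals(name) && field instanceof FieldSketch) {
                result.add((FieldSketch) field);
            }
        }
        return result;
    }
}

class FieldableDemo {
    public static void main(String[] args) {
        DocumentSketch doc = new DocumentSketch();
        doc.add(new FieldSketch("date", "2006-09-14"));
        doc.add(new TimestampFieldSketch("date", 1158246000000L));
        System.out.println(doc.getFieldables("date").size());
        System.out.println(doc.getFields("date").size());
    }
}
```

Under that assumption, code that customizes fields through the abstract class would want to read them back through the Fieldable-typed accessors.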


On Sep 14, 2006, at 11:19 AM, Chris (JIRA) wrote:

> There are other classes as well that could be declared non-final
> (Fieldable comes to mind), since it's possible to make changes for
> project-specific situations in those as well, but that's off-topic.

--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org

Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886





Re: [jira] Created: (LUCENE-671) Hashtable based Document

Mike Klaas
In reply to this post by Sebastian Nagel (Jira)
On 9/14/06, Chris (JIRA) <[hidden email]> wrote:
> If nothing else we would be interested in at least being able to extend Document, which is currently declared final. (Anyone know the performance gains on declaring a class final?)

According to this, not much:
http://www-128.ibm.com/developerworks/java/library/j-jtp1029.html

-Mike


[jira] Commented: (LUCENE-671) Hashtable based Document

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-671?page=comments#action_12434769 ]
           
Chris commented on LUCENE-671:
------------------------------

After some digging: http://www-128.ibm.com/developerworks/java/library/j-jtp1029.html

If these classes are declared final for performance, it might be worth reconsidering. I know of at least one other development group that has to maintain their own Lucene tree for the same reason. (Both of us have had to make changes in FieldsWriter to store extra information about the field.)

re: (Fieldable comes to mind)
Yup, I meant Field, and I'll look into AbstractField. Thanks, Mike.


[jira] Commented: (LUCENE-671) Hashtable based Document

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-671?page=comments#action_12434782 ]
           
Doug Cutting commented on LUCENE-671:
-------------------------------------

The final declaration is not for performance. It is to keep folks from thinking, if they subclass Document, that instances of their subclass will be returned to them in search results. To make Documents fully subclassable, one would need to make their serialization extensible.
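
The point above can be illustrated with a toy round trip. This is only a sketch under stated assumptions: PlainDocumentSketch, CustomDocumentSketch and ToyIndexSketch are hypothetical stand-ins, and the "index" here simply copies name/value pairs, which is far simpler than Lucene's real stored-field format. What it shows is that once a document is written out as fields and rebuilt on retrieval, the rebuilt object has the base type, whatever subclass was added.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a document: just named string values.
class PlainDocumentSketch {
    protected final Map<String, String> stored = new LinkedHashMap<>();

    void put(String name, String value) { stored.put(name, value); }

    String get(String name) { return stored.get(name); }
}

// A subclass carrying state the index knows nothing about.
class CustomDocumentSketch extends PlainDocumentSketch {
    String extra = "never written to the index";
}

// Toy index: writes only the name/value pairs and rebuilds a plain
// document on retrieval. The subclass type and its extra state are lost.
class ToyIndexSketch {
    private final List<Map<String, String>> rows = new ArrayList<>();

    // Stores a copy of the document's fields and returns its id.
    int add(PlainDocumentSketch doc) {
        rows.add(new LinkedHashMap<>(doc.stored));
        return rows.size() - 1;
    }

    // Rebuilds a document from the stored fields.
    PlainDocumentSketch document(int id) {
        PlainDocumentSketch rebuilt = new PlainDocumentSketch();
        rows.get(id).forEach(rebuilt::put);
        return rebuilt;
    }
}

class RoundTripDemo {
    public static void main(String[] args) {
        ToyIndexSketch index = new ToyIndexSketch();
        CustomDocumentSketch original = new CustomDocumentSketch();
        original.put("title", "Hashtable based Document");
        int id = index.add(original);

        PlainDocumentSketch hit = index.document(id);
        System.out.println(hit.get("title"));
        System.out.println(hit instanceof CustomDocumentSketch);
    }
}
```

Returning the subclass would require the index to record which class to instantiate and how to restore its extra state, which is the extensible serialization referred to above.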


[jira] Commented: (LUCENE-671) Hashtable based Document

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-671?page=comments#action_12434791 ]
           
Karl Wettin commented on LUCENE-671:
------------------------------------


Cutting:
> To make Documents fully subclassable, one would need to make their serialization extensible.

I find this a bit strange considering RAMDirectory was not made serializable until a few months ago. But then it might just have been something preemptive. Or perhaps people serialize documents without adding them to the index? That too sounds quite fishy.

I'm all for definalizing Term and Document as this is something required for my issue 550 index.


Re: [jira] Commented: (LUCENE-671) Hashtable based Document

Robert Engels
Doug is not talking about Java serialization; he is talking about the
general serialization used to store a Document in an index.

On Sep 14, 2006, at 3:00 PM, Karl Wettin (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/LUCENE-671?page=comments#action_12434791 ]
>
> Karl Wettin commented on LUCENE-671:
> ------------------------------------
>
>
> Cutting:
>> To make Documents fully subclassable, one would need to make their
>> serialization extensible.
>
> I find this a bit strange considering RAMDirectory was not made
> serializable until a few months ago. But then it might just have
> been something preemptive. Or perhaps people serialize documents
> without adding them to the index? That too sounds quite fishy.
>
> I'm all for definalizing Term and Document as this is something  
> required for my issue 550 index.

Re: [jira] Commented: (LUCENE-671) Hashtable based Document

Karl Wettin
On Thu, 2006-09-14 at 15:13 -0500, robert engels wrote:
>
> Doug is not talking about java serialization, he is talking about  
> general serialization used to store a Document in an Index.

Ahh, I see. That makes much more sense.



Re: [jira] Created: (LUCENE-671) Hashtable based Document

Simon Willnauer
In reply to this post by Mike Klaas
> > If nothing else we would be interested in at least being able to extend Document, which is currently declared final. (Anyone know the performance gains on declaring a class final?)


Some of the classes are declared final because of performance issues in
VM versions before 1.4, but not all of them. Document, for instance,
was declared final for the reason described here:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200204.mbox/%3c20020411181942.32093.qmail@...%3e

I can remember other discussions about this topic, but I can't find them
in the list archive anymore.

Just my 2 cents.


simon


[jira] Commented: (LUCENE-671) Hashtable based Document

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-671?page=comments#action_12434958 ]
           
Chris commented on LUCENE-671:
------------------------------

> It is to keep folks from thinking, if they subclass Document, that instances of their subclass will be returned to them in search results. To make Documents fully subclassable, one would need to make their serialization extensible.

Ahhh, that makes sense to me, and I think providing a method for informing the rest of Lucene which versions of various classes to use is probably more trouble than it's worth. We'll just maintain our own tree then.

Thanks




[jira] Resolved: (LUCENE-671) Hashtable based Document

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved LUCENE-671.
------------------------------------

    Resolution: Won't Fix

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

