Vector

Vector

Karl Wettin-3
There are a couple of Vectors in the code. Is it really necessary to
use this expensive thread-safe artifact from the dark ages?

What I really don't get is the need for it in the index package, as
IndexWriter is already synchronized. And how often do you modify the
clauses of a BooleanQuery from multiple threads?!

Even Sun recommends Collections.synchronizedList over Vector when thread
safety is an issue.

I believe that replacing them with Linked- and ArrayLists could save a whole
bunch of ticks under heavy load.
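
For readers unfamiliar with the wrapper mentioned above, here is a minimal sketch of how Collections.synchronizedList is typically used. It assumes Java 5+ generics (the code under discussion used raw types), and the class and method names are made up for illustration; nothing here is taken from Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch; the class and method names are not from Lucene.
final class SynchronizedListSketch {

    // Wrap an ArrayList only where per-call locking is really wanted.
    static List<String> newSharedList() {
        return Collections.synchronizedList(new ArrayList<String>());
    }

    // Individual calls such as add() are synchronized by the wrapper, but
    // iteration is a compound operation: the Collections javadoc requires
    // the caller to hold the list's lock for the whole traversal.
    static int totalLength(List<String> shared) {
        int total = 0;
        synchronized (shared) {
            for (String s : shared) {
                total += s.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> clauses = newSharedList();
        clauses.add("title:lucene");
        clauses.add("body:vector");
        System.out.println(totalLength(clauses));
    }
}
```

Where a list is only ever touched by one thread, or is already guarded by an outer lock, a plain ArrayList would do and the wrapper can be dropped.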


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Vector

Erik Hatcher
+1

Does anyone have any numbers on the performance differences on such a  
refactoring?  I reckon it wouldn't be that hard to put together a  
reasonably representative dataset and test before/after.  Who's game?

        Erik - the new dad (again :)!





Re: Vector

Karl Wettin-3
On Sat, 2006-05-06 at 03:28 -0400, Erik Hatcher wrote:
> On May 6, 2006, at 2:29 AM, karl wettin wrote:
>
> > There are a couple of Vector:s in the code. Is it really necessary to
> > use this expensive thread safe artifact from the dark ages?
>
> +1
>
> Does anyone have any numbers on the performance differences on such a  
> refactoring?  I reckon it wouldn't be that hard to put together a  
> reasonably representative dataset and test before/after.  Who's game?

I'm already at it, but in my branch. Can patch up the SVN version with
my changes. I'll leave the test to someone else :)

The question is what does and does not need to be synchronized. I take it
nothing does, but I'm not sure.

> Erik - the new dad (again :)!

Congratulations!



Re: Vector

Erik Hatcher
On May 6, 2006, at 3:40 AM, karl wettin wrote:

> The question is what does and does not need to be synchronized. I take it
> nothing does, but I'm not sure.

Well, we used to have this hot shot committer named Brian Goetz, but  
he's too busy being an expert on synchronization and low-level Java  
details that personally make my head hurt.  Maybe he could find it an  
interesting case study to do a little nuts and bolts analysis of the  
Lucene codebase and see what tweaks make sense and just get a test  
suite going to hammer it on all our before/after scenarios.

Whatcha think, Brian?! :)

        Erik



Re: Vector

Yonik Seeley
In reply to this post by Karl Wettin-3
On 5/6/06, karl wettin <[hidden email]> wrote:
> There are a couple of Vector:s in the code. Is it really necessary to
> use this expensive thread safe artifact from the dark ages?

I've wondered that myself ... seeing "Vector" in the code does hurt my
eyes a little :-)
It's just one of those things that's never the highest priority I guess.

I think in many/most of these places it's unnecessary to have a
synchronized collection at all.  For example, the one in Document
will be used often:
  List fields = new Vector();

Since the reference type is actually "List" it looks like the use of a
synchronized collection is deliberate.  Can someone think why this is
needed?
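
To make the point about the reference type concrete, here is a small sketch. SketchDocument is a hypothetical stand-in, not Lucene's Document, and it assumes Java 5+ generics. Because the field is declared as List, only the `new` expression would have to change to move from Vector to ArrayList; callers are unaffected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical stand-in for a Document-like class; not Lucene's Document.
final class SketchDocument {
    // Declared as List, so only the constructor argument decides the
    // concrete implementation.
    private final List<String> fields;

    SketchDocument(List<String> backing) {
        this.fields = backing;
    }

    void add(String field) {
        fields.add(field);
    }

    int fieldCount() {
        return fields.size();
    }

    String field(int i) {
        return fields.get(i);
    }

    public static void main(String[] args) {
        SketchDocument legacy = new SketchDocument(new Vector<String>());
        SketchDocument proposed = new SketchDocument(new ArrayList<String>());
        for (SketchDocument d : new SketchDocument[] { legacy, proposed }) {
            d.add("title");
            d.add("body");
            System.out.println(d.fieldCount() + " " + d.field(0));
        }
    }
}
```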

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Re: Vector

Otis Gospodnetic-2
I think most of the Vector (or Hashtable) references are leftovers from the pre-Java Collections era, that's all.
I doubt we'd be able to get much juice out of a move to unsynchronized Java Collections, although I'd like to see them for the same reason as Yonik.

Otis







Re: Vector

Karl Wettin-3
On Sat, 2006-05-06 at 08:55 -0700, Otis Gospodnetic wrote:
> I doubt we'd be able to get much juice out of move
> unsynchronized Java Collections

I might be the only one here that counts every wasted tick? :)

But I don't think the synchronization is the big thief. A LinkedList
could do the job more efficiently when the collection is used only for
iteration.

Somewhat off topic, but I've started looking into porting Lucene to
J2ME (that leaves me with only pre-JCF collections and no floats). I
have absolutely no idea what to use it for, but I imagine something along
the lines of distributed collaborative filtering could be fun.




Re: Vector

Tatu Saloranta
In reply to this post by Karl Wettin-3
--- karl wettin <[hidden email]> wrote:
...
> Even Sun recommends Collection.synchronizedList over
> Vector when thread
> safty is an issue.
>
> I belive that replaced with Linked- and ArrayLists
> it could save a whole
> bunch of ticks at heavy load.

Changing Vectors systematically to ArrayList would be
the most sensible thing to do. Except for huge Lists,
or Lists with lots of changes in the middle, LinkedLists
have very little to offer over ArrayLists (even iteration
is likely to be as fast or faster with ArrayList, although
I haven't tested this with a micro-benchmark).
This is because they add the Entry object overhead, for
both memory use and access.
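
For anyone who wants to try the iteration comparison themselves, here is a rough timing sketch. It is not a rigorous micro-benchmark (no warm-up control, no statistics), the list size and round count are arbitrary, and the names are invented for illustration; results will vary by JVM and hardware.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Rough timing sketch, not a rigorous benchmark. Names and sizes are
// arbitrary choices for illustration.
final class IterationSketch {

    static List<Integer> fill(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Iterate the whole list and return the sum so the loop cannot be
    // discarded as dead code.
    static long sum(List<Integer> list) {
        long total = 0;
        for (int value : list) {
            total += value;
        }
        return total;
    }

    // Time several full iterations of the given list.
    static long timeNanos(List<Integer> list, int rounds) {
        long start = System.nanoTime();
        long sink = 0;
        for (int r = 0; r < rounds; r++) {
            sink += sum(list);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) { // uses the result; never true for these inputs
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 100000;
        List<Integer> array = fill(new ArrayList<Integer>(), n);
        List<Integer> linked = fill(new LinkedList<Integer>(), n);
        System.out.println("ArrayList  ns: " + timeNanos(array, 50));
        System.out.println("LinkedList ns: " + timeNanos(linked, 50));
    }
}
```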

-+ Tatu +-




Re: Vector

Murat Yakici
In reply to this post by Karl Wettin-3

We did this two years ago, for Lucene 1.2.
The major challenges were floating point and a few classes.
(We used this package: http://mywebpages.comcast.net/ohommes/MathFP/)
As you said, J2ME has a very limited API for data structures.

I'm not sure there is a need for all the classes of Lucene to be ported to
J2ME. That would put quite a bit of unnecessary overhead on the mobile
device.

Murat

Re: Vector

Murat Yakici
In reply to this post by Erik Hatcher
Hi,

A few months ago, I did some benchmarking on file reading operations, not
on the collection classes, I'm afraid. I wrote a couple of lines to see
which method of reading files would perform better than the others (XP,
1 GB memory, P4 3.19 GHz, Java 1.4.2). I experimented on a TREC collection.
The attached file contains the average readings in milliseconds. The same
thing can be done for Collections.

I thought this might be useful for the next releases of Lucene.

Murat

Attachment: AllIOperformanceBench_.xls (32K)

J2ME (was: Vector)

Karl Wettin-3
In reply to this post by Murat Yakici
On Sun, 2006-05-07 at 12:55 +0100, [hidden email] wrote:

> > Some what off topic, but I've started looking in to porting Lucene to
> > J2ME (that leaves me with only pre-JCF collections and no floats). I
> > have absolutely no idea what to use it for, but imagine something in the
> > lines of distributed collaborate filtering could be fun.
>
> We have done this two years ago, for Lucene 1.2.


I would love to take a look at the code if it is available.



Re: J2ME (was: Vector)

Murat Yakici
Hi,

That was part of a project. While I was working on it, there were
plans to make it publicly available. I don't know what happened after I
left.

Well, in the simplest sense there are three steps to follow:
  - Remove double/float numbers and replace them with functions from
    the library I mentioned before (there is also a library on
    java.net; it might be the same one, though).
  - Replace all non-J2ME classes with equivalent ones (either
    Vector/Hashtable or your own).
    As far as I remember, there are no file operations in MIDP 2.0. You
    may want to check that. So you can't write an index to the file
    system, because there is no file system API! The trick is to write
    the Lucene index files to the DB provided (RMS).
  - Check that the replaced class methods perform exactly the
    same operations as before.
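
To illustrate the idea behind the first step, here is a minimal sketch of integer fixed-point arithmetic standing in for float. This is not the MathFP API; the Fixed16 class, its 16.16 format, and its method names are assumptions made for this example, and it also assumes that the long type is available on the target platform.

```java
// Hypothetical 16.16 fixed-point helpers; this is not the MathFP API.
final class Fixed16 {
    static final int FRACTION_BITS = 16;
    static final int ONE = 1 << FRACTION_BITS;

    // Convert a whole number to fixed point.
    static int fromInt(int value) {
        return value << FRACTION_BITS;
    }

    // Build a fixed-point value from a ratio without using float/double.
    static int fromRatio(int numerator, int denominator) {
        return (int) (((long) numerator << FRACTION_BITS) / denominator);
    }

    // Multiply through a 64-bit intermediate so the product does not
    // overflow before it is shifted back down.
    static int mul(int a, int b) {
        return (int) (((long) a * (long) b) >> FRACTION_BITS);
    }

    // Divide by pre-scaling the dividend in 64 bits.
    static int div(int a, int b) {
        return (int) (((long) a << FRACTION_BITS) / b);
    }

    // Drop the fractional bits (rounds toward negative infinity).
    static int toInt(int fixed) {
        return fixed >> FRACTION_BITS;
    }

    public static void main(String[] args) {
        int boost = fromRatio(3, 2); // represents 1.5
        int score = fromInt(4);      // represents 4.0
        System.out.println(toInt(mul(boost, score)));
    }
}
```

The third step above then amounts to checking that values computed this way stay close enough to the original floating-point results for the scoring to behave the same.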

Having said that, I suspect the effort is not worthwhile, unless you are
in a hurry to develop a product or something (see JSR-62 on Personal
Profile, which is finished; communicator-type devices such as the
P900/P910 and 9500 do have the profile). In addition, MIDP 3.0 is in
progress and nobody knows what's coming out of it; it's kind of unstable
at the moment.

Murat

Re: Vector

Karl Wettin-3
In reply to this post by Karl Wettin-3
On Sat, 2006-05-06 at 09:40 +0200, karl wettin wrote:
>
> There are a couple of Vector:s in the code. Is it really
> necessary to use this expensive thread safe artifact from the dark
> ages?

> The question is what needs and not needs to be synchronized. I take it
> nothing needs to, but I'm not sure.

Does anybody know what needs to be synchronized and what does not?
Should I summarize the uses and post them here for discussion?



Re: Vector

Yonik Seeley
On 5/9/06, karl wettin <[hidden email]> wrote:
> Did anybody know what needs to be synchronized and what does not need to
> be synchronized?

Needs to be considered on a case-by-case basis IMO.

>Should I summarize the uses and post it here for
> discussion?

Sure!

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Re: J2ME

DM Smith
In reply to this post by Karl Wettin-3
I have an application I'd like to move to J2ME which uses Lucene for
creating and searching indexes. I can get by with just the search
capabilities.


Re: Vector

Karl Wettin-3
In reply to this post by Yonik Seeley
On Tue, 2006-05-09 at 13:21 -0400, Yonik Seeley wrote:
> > Did anybody know what needs to be synchronized and what
> > does not need to
> > be synchronized?
>
> Needs to be considered on a case-by-case basis IMO.
>
> > Should I summarize the uses and post it here for
> > discussion?

> Sure!

Here we go.
I think it is safe to go with ArrayList on all but RAMFile.
The IndexWriter is already synchronized, right?


StandardTokenizer:
  private java.util.Vector jj_expentries


Document:
  List fields = new Vector();


IndexWriter:
  addIndexes(IndexReader[])
    Vector segmentsToDelete
  deleteFiles(Vector)
    Vector deletable
  deleteSegments(Vector)
    Vector deletable
  mergeSegments(int, int)
    Vector segmentsToDelete
  readDeleteableFiles()
    Vector results


SegmentMerger:
  private Vector readers

 
TermVectorWriter:
  terms = new Vector();


MultiFieldQueryParser:
  get(AnyKindOf)Query
    Vector clauses


QueryParser:
  private java.util.Vector jj_expentries
  Query(String)
    Vector clauses
  getFieldQuery(String, String)
    Vector v; // tokens from stream
 
 
BooleanQuery:
  private Vector clauses


BooleanQuery.BooleanWeight:
  private Vector weights


Hits:
  private Vector hitDocs


MultiPhraseQuery:
  private Vector positions


PhraseQuery:
  private Vector positions
  private Vector terms


RAMFile:
  private Vector buffers






Re: Vector

Karl Wettin-3
On Fri, 2006-05-12 at 16:01 +0200, karl wettin wrote:
> > Needs to be considered on a case-by-case basis IMO.
> >
> > > Should I summarize the uses and post it here for
> > > discussion?

Lacking the view of someone who actually knows for sure,
here are my comments.


> StandardTokenizer:
>   private java.util.Vector jj_expentries

If the standard analyzer is not thread safe, does this make sense? It looks
generated to my eyes. I did not look at the code.

>
> Document:
>   List fields = new Vector();

ArrayList. People don't add fields from multiple threads?

> IndexWriter:
>   addIndexes(IndexReader[])
>     Vector segmentsToDelete
>   deleteFiles(Vector)
>     Vector deletable
>   deleteSegments(Vector)
>     Vector deletable
>   mergeSegments(int, int)
>     Vector segmentsToDelete
>   readDeleteableFiles()
>     Vector results

Located in methods. Vector makes no sense at all?

>
> SegmentMerger:
>   private Vector readers

The IndexWriter keeps this instance synchronized?

>  
> TermVectorWriter:
>   terms = new Vector();

The IndexWriter keeps this instance synchronized?

>
> MultiFieldQueryParser:
>   get(AnyKindOf)Query
>     Vector clauses


In methods. Vector makes no sense? Query parser is not thread safe, does
this have to be?

>
> QueryParser:
>   private java.util.Vector jj_expentries
>   Query(String)
>     Vector caluses
>   getFieldQuery(String, String)
>     Vector v; // tokens from stream


In methods. Vector makes no sense? Query parser is not thread safe, does
this have to be?
 
> BooleanQuery:
>   private Vector clauses

People don't build a query from multiple threads?

>
> BooleanQuery.BooleanWeight:
>   private Vector weights

People don't build a query from multiple threads?

> Hits:
>   private Vector hitDocs

Parallel hits collection might require this?

> MultiPhraseQuery:
>   private Vector positions

Unsure.

> PhraseQuery:
>   private Vector positions
>   private Vector terms

People don't build a query from multiple threads?


> RAMFile:
>   private Vector buffers

The IndexWriter keeps this instance synchronized?
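
The "located in methods" cases above can be sketched as follows. The names are hypothetical and this is not Lucene code; it assumes Java 5+ (generics and java.util.concurrent). A list that is created, filled, and read inside a single call is reachable only by the calling thread, so an ArrayList is enough even when many threads call the method at once. This holds only while the reference does not escape, so methods that take or hand out the collection still need a case-by-case look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch; names are hypothetical and not taken from Lucene.
final class LocalListSketch {

    // The list never leaves this call, so only the calling thread can
    // reach it and no synchronization is needed.
    static int countSegmentsToDelete(int segments) {
        List<String> segmentsToDelete = new ArrayList<String>();
        for (int i = 0; i < segments; i++) {
            segmentsToDelete.add("_" + i);
        }
        return segmentsToDelete.size();
    }

    // Call the method from several threads at once; each call builds its
    // own private list. Returns the sum of all per-call sizes.
    static int runConcurrently(int threads, final int segments) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<Future<Integer>>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return countSegmentsToDelete(segments);
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 1000));
    }
}
```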




Re: Vector

Karl Wettin-3
On Sat, 2006-05-13 at 01:38 +0200, karl wettin wrote:
> here are my comments.

It seems to run fine on Lists. But I have only been running it on an
issue 550 index, so I have no idea how a Directory-based index
will take it.

No test case, no nothing. I just tried it in a live environment for a couple
of minutes.

