Optimise Indexing time using lucene..


Optimise Indexing time using lucene..

lucene4varma
Hi all,

I am new to Lucene and am using it for text search in my web application; for that I need to index records from a database.
We are using a JDBC directory to store the indexes. The problem is that when I start indexing the records for the first time, it takes a huge amount of time. Following is the code for indexing.

rs = st.executeQuery(); // returns 2 million records
while (rs.next()) {
    // create a Java object from the row ...
    // index the object into the JDBC directory ...
}
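Fleshed out, the loop looks roughly like this (a sketch against the Lucene 2.x-era API used in this thread; the query, column, and field names are hypothetical):

```java
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

class RecordIndexer {
    // Index every row returned by the query into the given writer.
    static void indexAll(Statement st, IndexWriter writer) throws Exception {
        // Hypothetical query and schema
        ResultSet rs = st.executeQuery("SELECT id, body FROM records");
        while (rs.next()) {
            Document doc = new Document();
            doc.add(new Field("id", rs.getString("id"),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("body", rs.getString("body"),
                    Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
        rs.close();
    }
}
```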

The above process takes a huge amount of time for 2 million records: approximately 3-4 business days.
Can anyone please suggest an approach by which I could cut down this time?

Thanks in advance,
Varma

Re: Optimise Indexing time using lucene..

Mathieu Lecarme
lucene4varma wrote:

> Hi all,
>
> [original question snipped: indexing 2 million database records into a
> JDBC directory takes 3-4 business days]
A JDBC directory is not a good idea; it is only useful when you need a
central repository.
Use a large maxBufferedDocs in your IndexWriter.
With a large amount of data you will hit a bottleneck somewhere: database
reading, index writing, RAM for buffered docs, maybe CPU.
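A minimal sketch of that setup, assuming the Lucene 2.x-era API contemporary with this thread (modern versions configure this through IndexWriterConfig instead); the index path is hypothetical:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Write to a local filesystem directory instead of a JDBC directory.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/tmp/myindex"),  // hypothetical path
                new StandardAnalyzer(),
                true /* create a new index */);

        // Buffer many documents in RAM before each flush to disk;
        // the old default (10) forces far too many tiny segment writes.
        writer.setMaxBufferedDocs(10000);

        // ... loop over the ResultSet, writer.addDocument(doc) per row ...

        writer.optimize();  // merge segments once at the end
        writer.close();
    }
}
```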
If reading from the database is the slow part and you are in a hurry, you
can shard the index across multiple computers and, when all shards are
finished, merge the indexes, with champagne.
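Merging the per-machine shards can be sketched like this (again assuming the Lucene 2.x-era API; the shard paths are hypothetical):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ShardMerger {
    public static void main(String[] args) throws Exception {
        // Each machine indexed its own slice of the rows into a local index.
        Directory[] shards = {
                FSDirectory.getDirectory("/data/shard1"),  // hypothetical paths
                FSDirectory.getDirectory("/data/shard2"),
                FSDirectory.getDirectory("/data/shard3"),
        };

        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/data/merged"),
                new StandardAnalyzer(),
                true /* create a new index */);
        writer.addIndexes(shards);  // copies and merges all shard segments
        writer.close();
    }
}
```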

M.
