the efficiency of creating indexes

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

the efficiency of creating indexes

治江 王
As i know, the time effciency of creating index is non-linearity with the size of documents. For example, if the size of indexes is 1G, the time cost is 2 hours, If the size of indexes is 10G, the time cost may be 30 hours. Who can tell me what is the reason? Any tips will be appreciated.


      ___________________________________________________________
  好玩贺卡等你发,邮箱贺卡全新上线!
http://card.mail.cn.yahoo.com/
Reply | Threaded
Open this post in threaded view
|

RE: the efficiency of creating indexes

Fang_Li
Did you try?
The cost of index merging grows when indexes are getting bigger.
Try to limit the max document size in a segment by setting setMaxMergeDocs in IndexWriter.

-----Original Message-----
From: 治江 王 [mailto:[hidden email]]
Sent: Monday, February 16, 2009 1:49 PM
To: [hidden email]
Subject: the efficiency of creating indexes

As i know, the time effciency of creating index is non-linearity with the size of documents. For example, if the size of indexes is 1G, the time cost is 2 hours, If the size of indexes is 10G, the time cost may be 30 hours. Who can tell me what is the reason? Any tips will be appreciated.


      ___________________________________________________________
  好玩贺卡等你发,邮箱贺卡全新上线!
http://card.mail.cn.yahoo.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: the efficiency of creating indexes

Michael McCandless-2

If not for merging, I believe indexing is simply linear.

Merging adds only a logarithmic (in total index size) cost.

Using as large an IndexWriter RAM buffer as you can will minimize the  
amount of merging.  (Also increasing mergeFactor, or decreasing  
maxMergeMB/Docs, but these will impact search performance as well).

Mike

emc.com wrote:

> Did you try?
> The cost of index merging grows when indexes are getting bigger.
> Try to limit the max document size in a segment by setting  
> setMaxMergeDocs in IndexWriter.
>
> -----Original Message-----
> From: 治江 王 [mailto:[hidden email]]
> Sent: Monday, February 16, 2009 1:49 PM
> To: [hidden email]
> Subject: the efficiency of creating indexes
>
> As i know, the time effciency of creating index is non-linearity  
> with the size of documents. For example, if the size of indexes is  
> 1G, the time cost is 2 hours, If the size of indexes is 10G, the  
> time cost may be 30 hours. Who can tell me what is the reason? Any  
> tips will be appreciated.
>
>
>      ___________________________________________________________
>  好玩贺卡等你发,邮箱贺卡全新上线!
> http://card.mail.cn.yahoo.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]