Seeing what's occupying all the space in the index


RE: Seeing what's occupying all the space in the index

Rob Staveley (Tom)
Luke shows the same total index size, and yes, it appears to list all the
files. There are 997 of them, which makes them tough to count using that
interface with Cygwin/X.
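
Counting the files and seeing which kinds take up the space can also be done outside Luke. The following is only a minimal sketch using the plain JDK (no Lucene calls); the class name IndexDirSummary and the method summarize are made up for illustration, and the directory to inspect is assumed to be passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: tallies the files in a directory by extension.
class IndexDirSummary {

    /** Returns {file count, total bytes} per extension for the regular files in dir. */
    static Map<String, long[]> summarize(Path dir) throws IOException {
        Map<String, long[]> byExt = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files such as "segments" and "deletable" have no extension.
                String ext = dot < 0 ? "(none)" : name.substring(dot);
                long[] tally = byExt.computeIfAbsent(ext, k -> new long[2]);
                tally[0]++;                   // number of files with this extension
                tally[1] += Files.size(file); // bytes used by those files
            }
        }
        return byExt;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long files = 0;
        long bytes = 0;
        for (Map.Entry<String, long[]> e : summarize(dir).entrySet()) {
            System.out.printf("%-8s %6d files %15d bytes%n",
                    e.getKey(), e.getValue()[0], e.getValue()[1]);
            files += e.getValue()[0];
            bytes += e.getValue()[1];
        }
        System.out.printf("%-8s %6d files %15d bytes%n", "total", files, bytes);
    }
}
```

Run against the index directory, this prints one line per extension plus a total, so the file count can be compared with what Luke reports. Each norms file (.f1, .f2, and so on) shows up as its own extension.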

> Also, you may want to see if you have any stale locks or the like that is
> preventing you from doing an optimize.

No lock files, but optimize() is failing now with the following error:
        java.io.IOException: Cannot overwrite: /mnt/sdb1/index/index-1/_2lhqi.fnm

Also, using Doug Cutting's suggestion, I see several compound file contents
as follows:

--------8<--------
$JAVA_HOME/bin/java -Xmx128M org.apache.lucene.index.IndexReader ~/dat/indexd/index-1/_2lhqh.cfs
_1168u.f1: 0 bytes
_1168u.f10: 0 bytes
_1168u.f11: 0 bytes
_1168u.f12: 0 bytes
_1168u.f13: 0 bytes
_1168u.f14: 0 bytes
_1168u.f15: 0 bytes
_1168u.f16: 0 bytes
_1168u.f17: 0 bytes
_1168u.f18: 0 bytes
_1168u.f19: 0 bytes
_1168u.f2: 0 bytes
_1168u.f20: 0 bytes
_1168u.f21: 0 bytes
_1168u.f22: 0 bytes
_1168u.f23: 0 bytes
_1168u.f24: 0 bytes
_1168u.f25: 0 bytes
_1168u.f26: 0 bytes
_1168u.f27: 0 bytes
_1168u.f28: 0 bytes
_1168u.f29: 0 bytes
_1168u.f3: 0 bytes
_1168u.f30: 0 bytes
_1168u.f31: 0 bytes
_1168u.f32: 0 bytes
_1168u.f33: 0 bytes
_1168u.f34: 1052419072 bytes
_1168u.f4: 0 bytes
_1168u.f5: 0 bytes
_1168u.f6: 0 bytes
_1168u.f7: 0 bytes
_1168u.f8: 0 bytes
_1168u.f9: 0 bytes
_1168u.fdt: 0 bytes
_1168u.fdx: 0 bytes
_1168u.fnm: 0 bytes
_1168u.frq: 0 bytes
_1168u.prx: 0 bytes
_1168u.tii: 0 bytes
_1168u.tis: 0 bytes
--------8<--------

That presumably isn't healthy.
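
With several compound files to check, listings like the one above could be screened mechanically rather than by eye. This is a rough sketch that only parses the text in the "name: N bytes" form shown above and counts the zero-length entries; the class name CfsListingCheck and its methods are invented for illustration, and it does not open the index itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: screens a saved compound-file listing.
class CfsListingCheck {

    // Matches lines such as "_1168u.f34: 1052419072 bytes".
    private static final Pattern ENTRY = Pattern.compile("^(\\S+): (\\d+) bytes$");

    /** Parses listing text into entry name -> size in bytes, ignoring other lines. */
    static Map<String, Long> parse(String listing) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (String line : listing.split("\\R")) {
            Matcher m = ENTRY.matcher(line.trim());
            if (m.matches()) {
                sizes.put(m.group(1), Long.parseLong(m.group(2)));
            }
        }
        return sizes;
    }

    /** Counts the entries reported as zero bytes long. */
    static long countEmpty(Map<String, Long> sizes) {
        return sizes.values().stream().filter(size -> size == 0L).count();
    }

    public static void main(String[] args) {
        // A few lines copied from the listing above, as sample input.
        String listing = String.join("\n",
                "_1168u.f33: 0 bytes",
                "_1168u.f34: 1052419072 bytes",
                "_1168u.fdt: 0 bytes",
                "_1168u.frq: 0 bytes");
        Map<String, Long> sizes = parse(listing);
        System.out.println(countEmpty(sizes) + " of " + sizes.size() + " entries are empty");
        sizes.forEach((name, size) -> {
            if (size > 0) {
                System.out.println(name + " holds " + size + " bytes");
            }
        });
    }
}
```

A listing where the stored fields (.fdt), frequencies (.frq) and positions (.prx) all come out empty while a single norms entry holds everything would stand out immediately in this summary.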

-----Original Message-----
From: Grant Ingersoll [mailto:[hidden email]]
Sent: 26 May 2006 21:27
To: [hidden email]
Subject: Re: Seeing what's occupying all the space in the index

It kind of sounds like those files are corrupted, but I can't say for sure.
When you look in Luke at your index (the one with all the files, not the new
one) do you see all the documents you would expect to see with values that
seem reasonable?  Also, in Luke, you can see a listing of all the files it
thinks are in the index, do they match with what you see via a file listing
on the command line?

Also, you may want to see if you have any stale locks or the like that is
preventing you from doing an optimize.

Rob Staveley (Tom) wrote:

> Indexing 55648 documents in a new clean directory, I see only .cfs
> files (+ deletable + segments). Disk usage is 65K for all of these,
> which means that each message takes ~1K of index space, rather than the
> more than 10K it takes in my 99GB index.
>
> Bearing in mind that the large index has > 5 million Lucene documents
> indexed in it now, do you reckon I can merge the .fdt, .prx and .frq
> into a compound index?
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:[hidden email]]
> Sent: 26 May 2006 18:38
> To: [hidden email]
> Subject: Re: Seeing what's occupying all the space in the index
>
>> Can you try a smaller sample in a clean directory and see what size
>> it is (so that it doesn't take as long to index)?
>
--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244

http://www.cnlp.org
Voice:  315-443-5484
Fax: 315-443-6886

