Index File structure, in particular TermInfo


Wolfgang Täger
Hi,

I'm using the Java version of Lucene 1.4.3.

To solve some particular problems, I'm trying to read the .cfs file
directly, from outside the Java framework.
However, reading the .tis file is turning out to be difficult:

I tried to follow
http://lucene.apache.org/java/docs/fileformats.html 

and successfully read the first entries, but then ran into a problem. I
then found in the source code (TermInfosWriter) that SkipDelta
is sometimes omitted. After fixing this, there is apparently still
another problem, which occurs after several hundred entries.
It looks like ProxDelta is missing too in these cases.

However, I couldn't find anything about this in the source.

Therefore, my question is whether there are exceptions to the scheme
given on the fileformats page:

TermInfos --> <TermInfo>^TermCount
TermInfo --> <Term, DocFreq, FreqDelta, ProxDelta, SkipDelta>
Term --> <PrefixLength, Suffix, FieldNum>
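
A minimal sketch of decoding one TermInfo entry under this scheme, written in Python because the goal is to read the file from outside Java. It rests on three assumptions that should be checked against the 1.4.3 sources: integers use the VInt encoding (seven data bits per byte, low-order group first, high bit set when another byte follows); the Suffix length counts characters rather than bytes, with the characters stored in Java's modified UTF-8; and SkipDelta is present only when DocFreq >= SkipInterval. The names read_vint, read_chars, read_term_info and skip_interval are illustrative and not part of Lucene, and the .tis header is not parsed here.

```python
import io


def read_vint(stream):
    """Decode one VInt/VLong: 7 data bits per byte, low-order group first."""
    result = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a VInt")
        byte = raw[0]
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:  # high bit clear: this was the last byte
            return result
        shift += 7


def read_chars(stream, count):
    """Read `count` characters (not bytes), 1 to 3 bytes each.

    Assumption: Java-style modified UTF-8, so no 4-byte sequences occur.
    """
    chars = []
    for _ in range(count):
        b0 = stream.read(1)[0]
        if b0 & 0x80 == 0:            # 0xxxxxxx: one byte
            code = b0
        elif b0 & 0xE0 == 0xC0:       # 110xxxxx 10xxxxxx: two bytes
            b1 = stream.read(1)[0]
            code = ((b0 & 0x1F) << 6) | (b1 & 0x3F)
        else:                         # 1110xxxx 10xxxxxx 10xxxxxx: three bytes
            b1, b2 = stream.read(2)
            code = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
        chars.append(chr(code))
    return "".join(chars)


def read_term_info(stream, previous_text, skip_interval):
    """Decode <Term, DocFreq, FreqDelta, ProxDelta, SkipDelta> for one entry."""
    prefix_length = read_vint(stream)   # characters shared with previous term
    suffix_length = read_vint(stream)   # Suffix is a String: length, then chars
    suffix = read_chars(stream, suffix_length)
    field_num = read_vint(stream)
    doc_freq = read_vint(stream)
    freq_delta = read_vint(stream)
    prox_delta = read_vint(stream)
    # Assumption: SkipDelta is written only when DocFreq >= SkipInterval.
    skip_delta = read_vint(stream) if doc_freq >= skip_interval else None
    return {
        "text": previous_text[:prefix_length] + suffix,
        "field_num": field_num,
        "doc_freq": doc_freq,
        "freq_delta": freq_delta,
        "prox_delta": prox_delta,
        "skip_delta": skip_delta,
    }
```

Entries are read in sequence, passing each decoded term text as previous_text for the next call and the SkipInterval value from the file header as skip_interval. If the Suffix length were treated as a byte count, the first term containing a non-ASCII character would shift every later field, which could look like a missing ProxDelta.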



Note: I'm reading the .tis file, not the .tii file, at the moment, but maybe this is related.

Thanks,

Wolfgang