Lucene File Formats web page

Ivan Vasilev-2
Hi guys,

The File Formats web page
(http://lucene.apache.org/java/2_3_0/fileformats.html) has a section
describing the segments file, where we read:

Segments --> Format, Version, NameCounter, ...
...
Format is -1 as of Lucene 1.4 and -3
(SegmentInfos.FORMAT_SINGLE_NORM_FILE) as of Lucene 2.1.
...

In my opinion the last sentence is incomplete. I think it
should have an addition, something like this:
1) "and -4 (SegmentInfos.FORMAT_SHARED_DOC_STORE) as of Lucene 2.3."
or like this:
2) "and, as of Lucene 2.3, -4 (SegmentInfos.FORMAT_SHARED_DOC_STORE) in
cases where shared stored fields and term vectors exist, or -3
(SegmentInfos.FORMAT_SINGLE_NORM_FILE) when no sharing exists."

So my question is: which of the two suggestions is correct?
According to my tests with our IndexRecoverer tool, 1) is correct.
I should mention what this tool does: it creates a segments
file from a given set of segments. I have not run many tests yet, but
in one case I created a segments file for a set of segments that
does not contain any shared data. When I use
Format=SegmentInfos.FORMAT_SINGLE_NORM_FILE, Luke cannot open it using
Lucene 2.3, but when I use Format=SegmentInfos.FORMAT_SHARED_DOC_STORE, it
opens it correctly.
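
To check which Format value a given segments file actually carries, something like the following rough sketch could help. It is not part of Lucene or of the IndexRecoverer tool: the class, method, and constant names are made up for illustration. It assumes that Format is the first field of the file (as the "Segments --> Format, Version, NameCounter, ..." layout suggests) and that it is stored as a 4-byte int, high-order byte first. Only the three values mentioned in this thread are recognized.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical helper (not a Lucene class): inspects the Format field of a segments file.
class SegmentsFormatPeek {
    // Values quoted in this thread; the constant names here are illustrative.
    static final int LUCENE_1_4_FORMAT = -1;         // as of Lucene 1.4
    static final int FORMAT_SINGLE_NORM_FILE = -3;   // as of Lucene 2.1
    static final int FORMAT_SHARED_DOC_STORE = -4;   // as of Lucene 2.3

    // Assumption: Format is the first field, a 4-byte int written high byte first.
    // ByteBuffer is big-endian by default, which matches that assumption.
    static int formatOf(byte[] firstFourBytes) {
        return ByteBuffer.wrap(firstFourBytes, 0, 4).getInt();
    }

    // Maps a Format value to a readable label; anything else is reported as unknown.
    static String describe(int format) {
        switch (format) {
            case LUCENE_1_4_FORMAT:
                return "-1 (Lucene 1.4 format)";
            case FORMAT_SINGLE_NORM_FILE:
                return "-3 (FORMAT_SINGLE_NORM_FILE)";
            case FORMAT_SHARED_DOC_STORE:
                return "-4 (FORMAT_SHARED_DOC_STORE)";
            default:
                return format + " (not covered by this sketch)";
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] header = new byte[4];
        if (args.length == 1) {
            // Read the first four bytes of the segments file given on the command line.
            try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
                in.readFully(header);
            }
        } else {
            // No file given: fall back to the four bytes that encode -4.
            header = new byte[] {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFC};
        }
        System.out.println("Format = " + describe(formatOf(header)));
    }
}
```

Running it with the path to a segments file as the only argument prints the Format value found there, which makes it easy to compare what the tool wrote against what Lucene 2.3 itself writes for the same index.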

Best Regards,
Ivan


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Lucene File Formats web page

Michael McCandless-2

Whoops, you are correct: the file formats doc is out of date.

It should be #1.

I'll fix it -- thank you for raising it!

Mike
