Lucene Indexing

Lucene Indexing

Sairaj Sunil
Hi all,
Can you tell me the exact indexing algorithm used by Lucene, or give some
links to documents that describe it?
Thanks in advance.
--
Sairaj Sunil

Re: Lucene Indexing

Rajiv2
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html


On 1/24/07, Sairaj Sunil <[hidden email]> wrote:
>
> Hi all,
> Can you tell me the exact indexing algorithm used by Lucene. or give some
> links to the documents that describe the algorithm used by lucene
> Thanks in advance
> --
> Sairaj Sunil
>
>

Re: Lucene Indexing

Sairaj Sunil
Hi,
I was asking what exactly the inverted indexing strategy used for storing
the index is. Is the index data structure batch-based, B-tree-based, or
segment-based?


On 1/25/07, Rajiv Roopan <[hidden email]> wrote:

>
>
> http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html


--
Sairaj Sunil

RE: Lucene Indexing

Damien McCarthy
This document should contain the information you need:

http://lucene.sourceforge.net/talks/inktomi/

Damien.
-----Original Message-----
From: Sairaj Sunil [mailto:[hidden email]]
Sent: 26 January 2007 03:22
To: [hidden email]
Subject: Re: Lucene Indexing

Hi
I was asking what exactly is the inverted indexing strategy used for storing
the index. Is it batch-based index/b-tree based/segment-based data structure
that is used as an index data structure.




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Lucene Indexing

Sairaj Sunil
I went through that document. It says that Lucene's indexing algorithm is
incremental. So, can I say that it uses a combination of segment-based and
B-tree-based strategies? If I am wrong, please correct me.

On 1/26/07, Damien McCarthy <[hidden email]> wrote:

>
> This document should contain the information you need :
>
> http://lucene.sourceforge.net/talks/inktomi/
>
> Damien.


--
Sairaj Sunil
II Mtech(CS)
SSSIHL
Prashanthi Nilayam

Re: Lucene Indexing

Grant Ingersoll-2
I don't believe there is any b-tree strategy in Lucene.  I would say
that it is segment based, I guess, in that it indexes documents in
memory based on your merge factors and then flushes to disk; at the
end you can choose to merge the segments together via optimize().  I
find it to have a structure similar to that described in section 8.2
of "Modern Information Retrieval" by Baeza-Yates et al., with a fair
number of improvements for storing terms, positions and frequency
info, etc. in compact form.
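
To make the segment-based idea concrete, here is a rough toy model in Python. It is only a sketch of the general technique, not Lucene's code: the class `ToySegmentIndex`, its parameters `max_buffered_docs` and `merge_factor`, and the "merge everything once there are enough segments" policy are all assumptions made for illustration. Segments here are plain in-memory dictionaries, whereas Lucene writes them to disk and uses a more refined merge policy.

```python
from collections import defaultdict


class ToySegmentIndex:
    """Toy model of segment-based inverted indexing (not Lucene's code)."""

    def __init__(self, max_buffered_docs=2, merge_factor=3):
        self.max_buffered_docs = max_buffered_docs  # hypothetical flush threshold
        self.merge_factor = merge_factor            # hypothetical merge trigger
        self.buffer = []      # (doc_id, text) pairs waiting in memory
        self.segments = []    # each segment maps term -> sorted list of doc ids
        self.next_doc_id = 0

    def add_document(self, text):
        """Buffer a document; flush once the buffer is full."""
        self.buffer.append((self.next_doc_id, text))
        self.next_doc_id += 1
        if len(self.buffer) >= self.max_buffered_docs:
            self.flush()

    def flush(self):
        """Invert the buffered documents into a new segment."""
        if not self.buffer:
            return
        postings = defaultdict(list)
        for doc_id, text in self.buffer:
            for term in set(text.lower().split()):
                postings[term].append(doc_id)
        self.segments.append(dict(postings))
        self.buffer = []
        # Simplified policy: merge everything once enough segments pile up.
        if len(self.segments) >= self.merge_factor:
            self._merge_all()

    def _merge_all(self):
        """Combine the posting lists of all segments into one segment."""
        merged = defaultdict(list)
        for segment in self.segments:
            for term, doc_ids in segment.items():
                merged[term].extend(doc_ids)
        self.segments = [{t: sorted(ids) for t, ids in merged.items()}]

    def optimize(self):
        """Flush pending documents and collapse the index to one segment."""
        self.flush()
        if len(self.segments) > 1:
            self._merge_all()

    def search(self, term):
        """Look the term up in every segment; buffered docs are not visible."""
        hits = []
        for segment in self.segments:
            hits.extend(segment.get(term.lower(), []))
        return sorted(hits)
```

A query has to consult every segment, which is why collapsing them into one can make searching cheaper, and no B-tree is involved at any point: each segment is just a term-to-postings mapping that is never updated in place.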

References you might find useful:
http://lucene.apache.org/java/docs/fileformats.html
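
As a small illustration of the "compact form" point, the sketch below shows one common technique: storing a posting list as gaps between ascending document ids, with each gap written as a variable-length integer (seven data bits per byte, low-order bits first, high bit set when more bytes follow). The function names are made up for this example, and the layout is a simplification; the real files also interleave frequency and position data, so treat the file formats page as the authority.

```python
def encode_vint(value):
    """Encode a non-negative int, 7 bits per byte, low-order bits first."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_vints(data):
    """Decode a byte string holding a sequence of variable-length ints."""
    values = []
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(value)
            value = 0
            shift = 0
    return values


def encode_doc_ids(doc_ids):
    """Store ascending doc ids as gaps, each gap as a variable-length int."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        out += encode_vint(doc_id - previous)
        previous = doc_id
    return bytes(out)


def decode_doc_ids(data):
    """Rebuild the doc ids by summing the decoded gaps."""
    doc_ids = []
    current = 0
    for gap in decode_vints(data):
        current += gap
        doc_ids.append(current)
    return doc_ids
```

Because the gaps between neighbouring document ids are usually small, most of them fit in a single byte, which is where the space saving comes from.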

If you feel like helping w/ this part of the docs, I would love some  
help on https://issues.apache.org/jira/browse/LUCENE-765

Probably the best way to know at this point is to trace through the  
code.

-Grant

On Jan 26, 2007, at 5:11 AM, Sairaj Sunil wrote:

> I went through that document. It mentions about the Lucene's Indexing
> algorithm that it uses incremental algorithm. So, can i say that it uses a
> combination of segment-based and b-tree based strategies. If i am wrong
> please correct me.

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


