Lucene Data Structures

Lucene Data Structures

Prafulla Kiran
Hi Everybody,

Could someone please explain the actual data structures Lucene uses to
store the postings lists in the index? I see classes called
MultiLevelSkipListReader and MultiLevelSkipListWriter. Is Lucene using
multi-level skip lists behind the scenes to maintain the index? I want
to understand exactly which data structures hold the index and the
postings lists, so that I can work out the complexity of reading from
them and decide whether Lucene will scale to my application's
requirements. Any pointers to the data structures Lucene uses would be
appreciated.

TIA,
Prafulla

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
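
For readers unfamiliar with the idea behind the question: skip data lets a
reader jump ahead in a term's sorted list of document IDs without visiting
every entry. The Java sketch below only illustrates that idea with a single
in-memory skip level. The names SkipPostings, advance, and skipInterval are
made up for this example, and it is not Lucene's MultiLevelSkipListReader or
its on-disk format.

```java
import java.util.Arrays;

/** Sorted doc-ID list with one level of skip pointers (illustrative only). */
final class SkipPostings {
    private final int[] docIds;      // ascending document IDs for one term
    private final int skipInterval;  // hypothetical: one skip entry per this many postings
    private final int[] skipDocs;    // skipDocs[i] == docIds[i * skipInterval]

    SkipPostings(int[] sortedDocIds, int skipInterval) {
        this.docIds = sortedDocIds.clone();
        this.skipInterval = skipInterval;
        int n = (docIds.length + skipInterval - 1) / skipInterval;
        this.skipDocs = new int[n];
        for (int i = 0; i < n; i++) {
            skipDocs[i] = docIds[i * skipInterval];
        }
    }

    /**
     * Returns the index of the first posting at or after position `from`
     * whose doc ID is >= target, or size() if there is none.
     */
    int advance(int from, int target) {
        int pos = from;
        // Follow skip entries while the next block still starts below target.
        int block = pos / skipInterval + 1;
        while (block < skipDocs.length && skipDocs[block] < target) {
            pos = block * skipInterval;
            block++;
        }
        // Finish with a short linear scan inside the block.
        while (pos < docIds.length && docIds[pos] < target) {
            pos++;
        }
        return pos;
    }

    int docAt(int index) { return docIds[index]; }

    int size() { return docIds.length; }

    /** Intersects two postings lists (an AND query), leapfrogging with advance(). */
    static int[] intersect(SkipPostings a, SkipPostings b) {
        int[] out = new int[Math.min(a.size(), b.size())];
        int n = 0, i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int da = a.docAt(i), db = b.docAt(j);
            if (da == db) { out[n++] = da; i++; j++; }
            else if (da < db) { i = a.advance(i, db); }
            else { j = b.advance(j, da); }
        }
        return Arrays.copyOf(out, n);
    }
}
```

With one skip entry per skipInterval postings, advance() looks at roughly
n / skipInterval skip entries plus at most skipInterval postings in the worst
case. Stacking further, sparser levels on top of the first is what brings the
cost of a long jump closer to logarithmic.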

Re: Lucene Data Structures

Grant Ingersoll-2
http://lucene.apache.org/java/2_4_0/fileformats.html

On Dec 15, 2008, at 12:15 AM, Prafulla Kiran wrote:


--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

Re: Lucene Data Structures

Prafulla Kiran
Well, I have seen this link many times before. It doesn't really explain
the data structures side of things. Perhaps I should have asked my question
this way:
"What data structures does Lucene use to read the posting lists
from the index?"
My guess is that a hash table is used to look up the postings of
each term, with the term as the key and a multi-level skip list as the
value.
Please correct me if I am wrong.

Regards,
Prafulla
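
For comparison with the hash-table guess above: the file-formats page linked
earlier in the thread describes the term dictionary as a file of terms kept in
sorted order, plus a smaller index holding every indexInterval-th term that is
meant to be read into memory. A rough in-memory Java sketch of that kind of
lookup follows. The class name SparseTermIndex and its methods are invented
for illustration and are not Lucene classes, and the real files also carry
per-term pointers into the postings that are left out here.

```java
import java.util.Arrays;

/** Sorted term list with a sparse index over every N-th term (illustrative only). */
final class SparseTermIndex {
    private final String[] terms;       // all terms, sorted ascending
    private final int indexInterval;    // hypothetical: every N-th term is indexed
    private final String[] indexTerms;  // indexTerms[i] == terms[i * indexInterval]

    SparseTermIndex(String[] sortedTerms, int indexInterval) {
        this.terms = sortedTerms.clone();
        this.indexInterval = indexInterval;
        int n = (terms.length + indexInterval - 1) / indexInterval;
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = terms[i * indexInterval];
        }
    }

    /** Returns the position of `term` in the sorted dictionary, or -1 if absent. */
    int lookup(String term) {
        // Binary search the small index to find the block that could hold the term.
        int k = Arrays.binarySearch(indexTerms, term);
        // On a miss binarySearch returns -(insertionPoint) - 1, so the candidate
        // block is the index entry just before the insertion point.
        int block = (k >= 0) ? k : -k - 2;
        if (block < 0) {
            return -1; // term sorts before the first dictionary entry
        }
        // Scan at most indexInterval terms inside that block.
        int start = block * indexInterval;
        int end = Math.min(start + indexInterval, terms.length);
        for (int i = start; i < end; i++) {
            int c = terms[i].compareTo(term);
            if (c == 0) return i;
            if (c > 0) break; // passed the place where the term would be
        }
        return -1;
    }
}
```

Under these assumptions a lookup costs a binary search over about
n / indexInterval index entries followed by a scan of at most indexInterval
terms, and the sorted order also allows range and prefix scans, which a plain
hash table would not.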


Re: Lucene Data Structures

Erick Erickson
I question whether you *can* make this decision based upon the data
structure being used. I can code such that *any* data structure you care
to name will not perform well under some conditions <G>.

Not to mention the other characteristics of a search engine that can get in
the way of even the most efficient structure.

The only way I have ever gained confidence is to create a representative
dataset and measure. Which also has its perils, but....

Although I'll gladly admit that no amount of clever programming can
make up for a fundamentally flawed architecture.

But there are some pretty bright people coding all this up. You might get
more comfortable by looking at some of the success stories on the website.

But in the end, it's a "best guess" kind of thing. Perhaps you could explain
what you plan to do and folks with more experience than me might be able
to offer insights...

Best
Erick
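
As a rough illustration of the "create a representative dataset and measure"
advice, here is a minimal Java timing sketch. Everything in it is a stand-in:
the synthetic postings, the query mix, and the LookupBenchmark class are
hypothetical, and a real test would run actual Lucene queries against an index
built from data shaped like the production data.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.IntUnaryOperator;

final class LookupBenchmark {
    /** Builds a strictly ascending array of n synthetic doc IDs (stand-in for real data). */
    static int[] syntheticPostings(int n, long seed) {
        Random rnd = new Random(seed);
        int[] ids = new int[n];
        int doc = 0;
        for (int i = 0; i < n; i++) {
            doc += 1 + rnd.nextInt(10); // gaps of 1..10 keep the list ascending
            ids[i] = doc;
        }
        return ids;
    }

    /** Runs `lookup` over all queries and returns the median elapsed nanoseconds per run. */
    static long medianNanos(IntUnaryOperator lookup, int[] queries, int rounds) {
        long[] samples = new long[rounds];
        int sink = 0;
        for (int r = -3; r < rounds; r++) { // negative rounds are untimed warm-up
            long t0 = System.nanoTime();
            for (int q : queries) {
                sink += lookup.applyAsInt(q);
            }
            long dt = System.nanoTime() - t0;
            if (r >= 0) samples[r] = dt;
        }
        if (sink == 42) System.out.print(""); // use the result so the loop is not optimized away
        Arrays.sort(samples);
        return samples[rounds / 2];
    }

    public static void main(String[] args) {
        int[] postings = syntheticPostings(1_000_000, 7L);
        int[] queries = new Random(11L)
                .ints(10_000, 0, postings[postings.length - 1])
                .toArray();
        long ns = medianNanos(q -> Arrays.binarySearch(postings, q), queries, 9);
        System.out.println("median ns per 10k lookups: " + ns);
    }
}
```

The warm-up rounds and the median are there because single JVM timings are
noisy; the numbers only mean something if the dataset and the query mix
resemble the real workload.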

On Tue, Dec 16, 2008 at 12:12 AM, Prafulla Kiran <[hidden email]> wrote:
