Where are indexes stored and where to store indexes

Bryan Woliner
I know that this is a really basic question, but once you index segment(s),
where is the index stored?

On a related note, I read in numerous emails to the list that you can search
more than one index at the same time if they are in the same location when
you start Tomcat. Where is the correct location (or type of location) to
store these indexes? Based on the fact that you need to create new db and
segment directories each time you do a crawl, it seems like you would have
to move indexes after they are created if you want multiple indexes in the
same location.

Thanks for the help,
Bryan

Re: Where are indexes stored and where to store indexes

Bryan Woliner
An update to my question:

1. I found where the index is located, so never mind on that one.

2. In terms of using bin/nutch merge -- the wiki indicates that the correct
syntax is something like this:

bin/nutch merge index segments/*

However, that seems to suggest that all of your indexes need to be in
the same segments directory in order to merge them.

What is the best practice for merging indexes that are in different
segment directories? Do you have to copy all of your segments to the
same segments directory first? Do you need to use mergesegs before you
call merge? (It doesn't seem likely that this is the case.)

Thanks,
Bryan
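
A note on the `segments/*` syntax quoted above: the shell expands that glob into a list of paths before `bin/nutch` ever sees it, so the command receives individual segment paths rather than a single directory. The sketch below only demonstrates that expansion and never runs Nutch. The directory names `crawl-a` and `crawl-b`, the segment names, and the helper `show_merge_command` are made up for illustration, and whether `merge` accepts segments from different parent directories is an assumption to confirm against the usage message of your Nutch version.

```shell
#!/bin/sh
# Build a throwaway layout with two separate crawls (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/crawl-a/segments/seg1" "$demo/crawl-a/segments/seg2" \
         "$demo/crawl-b/segments/seg3"

# Print the command line that the shell would hand to bin/nutch.
# By the time this function runs, the globs are already expanded,
# so $* holds one path per segment.
show_merge_command() {
    echo "bin/nutch merge index $*"
}

# Expand the globs relative to the demo directory, inside a subshell
# so the caller's working directory is left alone.
merge_cmd=$(cd "$demo" && show_merge_command crawl-a/segments/* crawl-b/segments/*)
echo "$merge_cmd"

rm -rf "$demo"
```

The printed line lists all three segment paths after `index`, which suggests that copying segments into one directory may not be necessary as long as every segment path ends up on the command line.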





Re: Where are indexes stored and where to store indexes

Michael Ji
As I understand it (from investigating the code and data
structures), the Nutch search engine needs a "segments"
directory, which contains the index information
(under "/index/200..") and the raw data (under
"content/", "parse_data", etc.).

I believe the target segments should stay in the
same place (where Tomcat was launched), while the segments
being merged can be somewhere else. After a
successful merge, the segments that were merged
become empty (sorry if I have this wrong, but I
remember seeing Null in my test run), or at least
unused.

Michael Ji,
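
The layout described in this reply can be checked with a small script. This is a sketch under the assumption that each segment directory holds subdirectories named `index`, `content`, and `parse_data`, as the reply describes; the function name `list_indexed_segments` and the sample segment names are hypothetical, and the real layout may differ between Nutch versions.

```shell
#!/bin/sh
# Print the name of each segment under the given segments directory that
# has an "index" subdirectory, i.e. a segment that appears to be indexed.
list_indexed_segments() {
    for seg in "$1"/*/; do
        # Skip the unexpanded pattern when the directory has no entries.
        [ -d "$seg" ] || continue
        if [ -d "${seg}index" ]; then
            basename "$seg"
        fi
    done
}

# Throwaway example: seg1 has an index, seg2 only has fetched content.
demo=$(mktemp -d)
mkdir -p "$demo/segments/seg1/index" "$demo/segments/seg1/content" \
         "$demo/segments/seg1/parse_data" "$demo/segments/seg2/content"

indexed=$(list_indexed_segments "$demo/segments")
echo "$indexed"

rm -rf "$demo"
```

Running a check like this against the segments directory that Tomcat is started from is one way to see which segments the search application could actually find an index for.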
