Solr 1.4 Replication index directories

Solr 1.4 Replication index directories

mark angelillo
Hi,

We're using the new replication and it's working pretty well. There's  
one detail I'd like to get some more information about.

As the replication works, it creates versions of the index in the data  
directory. Originally we had index/, but now there are dated versions  
such as index.20100127044500/, which are the replicated versions.

Each copy is sized in the vicinity of 65G. With our current hard drive  
it's fine to have two around, but 3 gets a little dicey. Sometimes  
we're finding that the replication doesn't always clean up after  
itself. I would like to understand this better, or to not have this  
happen. It could be a configuration issue.

Some more specific questions:

- Is it safe to remove the index/ directory (that doesn't have the  
date on it)? I think I tried this once and the whole thing broke,  
however maybe something else was wrong at the time.

- Is there a way to know which one is the current one? (I'm looking at  
the file index.properties, and it seems to be correct, but sometimes  
there's a newer version in the directory, which later is removed)

- Could it be that the index does not finish replicating in the poll  
interval I give it? What happens if, say there's a poll interval X and  
replicating the index happens to take longer than X sometimes. (Our  
current poll interval is 45 minutes, and every time I'm watching it it  
completes in time.)

Thanks in advance
Mark

Re: Solr 1.4 Replication index directories

Otis Gospodnetic-2
Answers below.




----- Original Message ----

> From: mark angelillo <[hidden email]>
>
> Hi,
>
> We're using the new replication and it's working pretty well. There's one detail
> I'd like to get some more information about.
>
> As the replication works, it creates versions of the index in the data
> directory. Originally we had index/, but now there are dated versions such as
> index.20100127044500/, which are the replicated versions.
>
> Each copy is sized in the vicinity of 65G. With our current hard drive it's fine
> to have two around, but 3 gets a little dicey. Sometimes we're finding that the
> replication doesn't always clean up after itself. I would like to understand
> this better, or to not have this happen. It could be a configuration issue.
>
> Some more specific questions:
>
> - Is it safe to remove the index/ directory (that doesn't have the date on it)?
> I think I tried this once and the whole thing broke, however maybe something
> else was wrong at the time.

No, that's the real, live index, you don't want to remove that one.

> - Is there a way to know which one is the current one? (I'm looking at the file
> index.properties, and it seems to be correct, but sometimes there's a newer
> version in the directory, which later is removed)

I think the "index" one is always current, no?  If not, I imagine the admin replication page will tell you, or even the Statistics page.
e.g.
reader :  SolrIndexReader{this=46a55e,r=ReadOnlySegmentReader@46a55e,segments=1}
readerDir :  org.apache.lucene.store.NIOFSDirectory@/mnt/solrhome/cores/foo/data/index

> - Could it be that the index does not finish replicating in the poll interval I
> give it? What happens if, say there's a poll interval X and replicating the
> index happens to take longer than X sometimes. (Our current poll interval is 45
> minutes, and every time I'm watching it it completes in time.)


I think only 1 replication will/should be happening at a time.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

Re: Solr 1.4 Replication index directories

mark angelillo
Thanks, Otis. Responses inline.


>> Hi,
>>
>> We're using the new replication and it's working pretty well.  
>> There's one detail
>> I'd like to get some more information about.
>>
>> As the replication works, it creates versions of the index in the  
>> data
>> directory. Originally we had index/, but now there are dated  
>> versions such as
>> index.20100127044500/, which are the replicated versions.
>>
>> Each copy is sized in the vicinity of 65G. With our current hard  
>> drive it's fine
>> to have two around, but 3 gets a little dicey. Sometimes we're  
>> finding that the
>> replication doesn't always clean up after itself. I would like to  
>> understand
>> this better, or to not have this happen. It could be a  
>> configuration issue.
>>
>> Some more specific questions:
>>
>> - Is it safe to remove the index/ directory (that doesn't have the  
>> date on it)?
>> I think I tried this once and the whole thing broke, however maybe  
>> something
>> else was wrong at the time.
>
> No, that's the real, live index, you don't want to remove that one.


Yeah... I tried it once and remember things breaking.

However, nothing in this directory has been modified for over a week  
(since the last replication initialization). And I'm still sitting on  
130GB of data for what is only 65GB on the master.



>
>> - Is there a way to know which one is the current one? (I'm looking  
>> at the file
>> index.properties, and it seems to be correct, but sometimes there's  
>> a newer
>> version in the directory, which later is removed)
>
> I think the "index" one is always current, no?  If not, I imagine  
> the admin replication page will tell you, or even the Statistics page.
> e.g.
> reader :  SolrIndexReader{this=46a55e,r=ReadOnlySegmentReader@46a55e,segments=1}
> readerDir :  org.apache.lucene.store.NIOFSDirectory@/mnt/solrhome/cores/foo/data/index


reader :  SolrIndexReader{this=5c3aef1,r=ReadOnlyDirectoryReader@5c3aef1,refCnt=1,segments=9}
readerDir :  org.apache.lucene.store.NIOFSDirectory@/home/solr/solr_1.4/solr/data/index.20100127044500




>
>> - Could it be that the index does not finish replicating in the  
>> poll interval I
>> give it? What happens if, say there's a poll interval X and  
>> replicating the
>> index happens to take longer than X sometimes. (Our current poll  
>> interval is 45
>> minutes, and every time I'm watching it it completes in time.)
>
>
> I think only 1 replication will/should be happening at a time.

Whew, that's comforting.


Re: Solr 1.4 Replication index directories

Noble Paul നോബിള്‍  नोब्ळ्-2
The index.20100127044500/ directory is a temp directory. It should have
been cleaned up if there was no problem in replication (check the logs
to see whether there was one). If there is a problem, the temp directory
is used as the new index directory and the old one is no longer used. At
any given point only one directory is used for the index. Check the
replication dashboard to see which one it is. Everything else can be
deleted.
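
For anyone who wants to script that check, here is a rough Python sketch
that lists the index directories which do not appear to be in use. It is
not part of Solr. It assumes that index.properties in the data directory
contains a line of the form index=<directory name>, and that when the file
is missing the plain index/ directory is the live one. The function names
active_index_dir and unused_index_dirs are made up for this example.

```python
from pathlib import Path


def active_index_dir(data_dir: Path) -> str:
    """Return the directory name recorded in index.properties.

    Assumption: the file holds a line such as
    `index=index.20100127044500`. If the file or the key is absent,
    fall back to the plain `index` directory.
    """
    props = data_dir / "index.properties"
    if props.is_file():
        for raw in props.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            # Skip comment lines and anything that is not key=value.
            if line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            if key.strip() == "index" and value.strip():
                return value.strip()
    return "index"


def unused_index_dirs(data_dir: Path) -> list:
    """Names of index/ and index.* directories other than the active one."""
    active = active_index_dir(data_dir)
    return sorted(
        p.name
        for p in data_dir.iterdir()
        if p.is_dir()
        and (p.name == "index" or p.name.startswith("index."))
        and p.name != active
    )
```

Calling unused_index_dirs(Path("/home/solr/solr_1.4/solr/data")) only
returns names and deletes nothing. Compare the result with the readerDir
shown on the Statistics page or the replication dashboard before removing
any directory by hand.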

On Fri, Jan 29, 2010 at 6:03 AM, mark angelillo <[hidden email]> wrote:

> Thanks, Otis. Responses inline.
>
>
>>> Hi,
>>>
>>> We're using the new replication and it's working pretty well. There's one
>>> detail
>>> I'd like to get some more information about.
>>>
>>> As the replication works, it creates versions of the index in the data
>>> directory. Originally we had index/, but now there are dated versions
>>> such as
>>> index.20100127044500/, which are the replicated versions.
>>>
>>> Each copy is sized in the vicinity of 65G. With our current hard drive
>>> it's fine
>>> to have two around, but 3 gets a little dicey. Sometimes we're finding
>>> that the
>>> replication doesn't always clean up after itself. I would like to
>>> understand
>>> this better, or to not have this happen. It could be a configuration
>>> issue.
>>>
>>> Some more specific questions:
>>>
>>> - Is it safe to remove the index/ directory (that doesn't have the date
>>> on it)?
>>> I think I tried this once and the whole thing broke, however maybe
>>> something
>>> else was wrong at the time.
>>
>> No, that's the real, live index, you don't want to remove that one.
>
>
> Yeah... I tried it once and remember things breaking.
>
> However nothing in this directory has been modified for over a week (since
> the last replication initialization). And I'm still sitting on 130GB of data
> for what is only 65GB on the master
>
>
>
>>
>>> - Is there a way to know which one is the current one? (I'm looking at
>>> the file
>>> index.properties, and it seems to be correct, but sometimes there's a
>>> newer
>>> version in the directory, which later is removed)
>>
>> I think the "index" one is always current, no?  If not, I imagine the
>> admin replication page will tell you, or even the Statistics page.
>> e.g.
>> reader :
>>  SolrIndexReader{this=46a55e,r=ReadOnlySegmentReader@46a55e,segments=1}
>> readerDir :
>>  org.apache.lucene.store.NIOFSDirectory@/mnt/solrhome/cores/foo/data/index
>
>
> reader :
> SolrIndexReader{this=5c3aef1,r=ReadOnlyDirectoryReader@5c3aef1,refCnt=1,segments=9}
> readerDir :
> org.apache.lucene.store.NIOFSDirectory@/home/solr/solr_1.4/solr/data/index.20100127044500
>
>
>
>
>>
>>> - Could it be that the index does not finish replicating in the poll
>>> interval I
>>> give it? What happens if, say there's a poll interval X and replicating
>>> the
>>> index happens to take longer than X sometimes. (Our current poll interval
>>> is 45
>>> minutes, and every time I'm watching it it completes in time.)

You can keep a very small pollInterval and it is OK. If a replication
is in progress, no new replication will be initiated until the current
one completes.
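
To picture that behaviour, here is a toy Python model of a poller that
skips a poll tick while a replication is still running. It is only an
illustration of the idea, not Solr's implementation. The Poller class and
its poll method are hypothetical names.

```python
import threading


class Poller:
    """Toy model of a slave poller: at most one replication at a time."""

    def __init__(self, replicate):
        self._replicate = replicate      # callable doing the (slow) copy
        self._busy = threading.Lock()    # held while a replication runs

    def poll(self) -> bool:
        """Handle one poll tick; return True if a replication was run."""
        # Non-blocking acquire: if a replication is already in progress,
        # this tick is skipped instead of starting a second copy.
        if not self._busy.acquire(blocking=False):
            return False
        try:
            self._replicate()
            return True
        finally:
            self._busy.release()
```

In this model a short poll interval costs nothing beyond the skipped
ticks, since a tick that arrives during a long copy returns immediately.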
>>
>>
>> I think only 1 replication will/should be happening at a time.
>
> Whew, that's comforting.
>
>



--
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com