Tlog vs. buffer + softcommit.

Bing Hua
Hello,

I'm a bit confused about the purpose of Transaction Logs (Update Logs) in Solr.

My understanding is that when an update request comes in, the new item is first put in the RAM buffer as well as the T-Log. After a soft commit, the new item becomes searchable but is not yet hard committed to stable storage. Configuring the soft commit interval to 1 sec achieves NRT.

Then what exactly is the T-Log doing in this scenario? Why is it there, and under what circumstances is it being cleared?

I searched for online documentation without success, and I'm now trying to work it out from the source code. Any hints would be appreciated.

Thanks,
Bing

Re: Tlog vs. buffer + softcommit.

Yonik Seeley-2-2
On Thu, Aug 9, 2012 at 5:39 PM, Bing Hua <[hidden email]> wrote:
> I'm a bit confused about the purpose of Transaction Logs (Update Logs) in
> Solr.
>
> My understanding is that when an update request comes in, the new item is
> first put in the RAM buffer as well as the T-Log. After a soft commit, the
> new item becomes searchable but is not yet hard committed to stable
> storage. Configuring the soft commit interval to 1 sec achieves NRT.
>
> Then what exactly is the T-Log doing in this scenario?

It serves realtime-get, for when even 1 second isn't acceptable (i.e.
you need a guarantee of getting the latest version of a document):
http://searchhub.org/dev/2011/09/07/realtime-get/
It also allows a peer to ask "give me the list of the last update
events you know about".
You can also "kill -9" the server and Solr will automatically recover
from the log.
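
For reference, realtime-get is exposed through a request handler. A minimal sketch of how it is typically registered in solrconfig.xml follows, assuming the stock RealTimeGetHandler class and the conventional /get path (neither is stated in this thread, so check them against your own config):

```xml
<!-- Assumption: handler name "/get" and the omitHeader default mirror the
     stock example config; they are not taken from this thread. -->
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>
```

With something like this in place, a request such as /get?id=mydoc (mydoc is a hypothetical document id) should return the latest version of that document, even if it has not yet been soft or hard committed. This relies on the update log being enabled.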

> what circumstances is it being cleared?

A new log file is created every time a hard commit is done. Old log
files are removed once newer log files contain enough entries to
satisfy the lookback requirements of what I call "peersync" in
SolrCloud (currently ~100 updates IIRC).

-Yonik
http://lucidimagination.com

Re: Tlog vs. buffer + softcommit.

Bing Hua
Thanks for the information. It definitely helps a lot. UpdateLog has numDeletesToKeep = 1000 and numRecordsToKeep = 100, so these are probably what you're referring to.

However, while I was indexing, the total size of the TLogs kept increasing. That doesn't sound like there's a cap on the number of documents? Also, is there an introduction to peersync online?

Re: Tlog vs. buffer + softcommit.

Yonik Seeley-2-2
On Fri, Aug 10, 2012 at 11:19 AM, Bing Hua <[hidden email]> wrote:
> Thanks for the information. It definitely helps a lot. UpdateLog has
> numDeletesToKeep = 1000 and numRecordsToKeep = 100, so these are probably
> what you're referring to.
>
> However, while I was indexing, the total size of the TLogs kept
> increasing. That doesn't sound like there's a cap on the number of
> documents?

No, there is no cap.  That's why the following is in solrconfig.xml:

     <autoCommit>
       <maxTime>15000</maxTime>
       <openSearcher>false</openSearcher>
     </autoCommit>

That causes a hard commit every 15 seconds without opening a new
searcher (i.e. you still retain control over exactly when the searcher
view changes if you want).
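
Putting the settings from this thread together, an update handler section might look roughly like the sketch below. The 1 second autoSoftCommit comes from the original question and the 15 second autoCommit from the snippet above; the enclosing updateHandler element and the updateLog directory property are assumptions based on the stock example config, not something confirmed here.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Enables the transaction log. The dir property is an assumed default. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit every 15 s: flushes to stable storage and starts a new
       tlog file, without changing the searcher view. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit every 1 s: makes new documents searchable (NRT) without
       writing them to stable storage. -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

The split matters for tlog size: only the hard commit rolls over to a new log file, so a soft commit interval alone would leave the current log growing.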

> Also for peersync, can I find some intro online?

Nothing yet - but the idea is pretty simple... sync up with peers by
getting recent updates if possible.  If that fails, we get in sync by
copying over a full index.

-Yonik
http://lucidworks.com

Re: Tlog vs. buffer + softcommit.

Bing Hua
I remember that I did set the 15 sec autocommit and still saw the TLogs growing without bound. But it sounds like, in theory, they should not if I index at a constant rate. I'll probably try it again sometime.

As for peersync, I think SolrCloud now uses push replication rather than pull. It makes sense to keep some TLogs around so that peers can sync up.

Thanks,
Bing