Discarding HLog files

alakshman
I had a question about how data is written to the HLogs. Each column family that makes up a table has its own on-disk representation. However, there is only one HLog for all tables. This means that on every write, the individual HMemcaches for each column family in the row mutation are updated, but the entire row is written to the HLog.

Now, when a column family's HMemcache is flushed, is a token written to the HLog indicating that the column family for this table has been flushed? There may be other column families which have not yet been flushed. Since we seem to write entire rows to the HLog, how can one tell that a log file contains only flushed entries without scanning the entire file? Is a sequential scan unavoidable to determine whether an HLog can be deleted once it has been rolled?

Please explain.

Thanks
Avinash

Re: Discarding HLog files

Jim Kellerman
On Sat, 2007-07-14 at 12:49 -0700, alakshman wrote:
> I had a question about how stuff is being written to the HLogs. Each column
> family that makes up a table has its own on disk representation. However
> there is only one HLog for all tables.

This isn't quite true. There is one HLog per HRegionServer.

> Which means on every write, the
> individual HMemcache's for each column family in the row mutation are
> updated but the entire row is written to the HLog.

Also not quite true. The entire row is not written, only the changes are
written to the HLog.
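
To illustrate the two corrections above, here is a minimal sketch of a
single log shared by every region on one server, where only the changed
cells are appended. It is not HBase's actual code: the names LogEntry,
ServerLog and append, and the choice of one entry per changed cell, are
assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class LogEntry:
    seq: int       # monotonically increasing sequence number
    region: str    # region the change belongs to
    row: str
    column: str    # "family:qualifier"
    value: bytes


@dataclass
class ServerLog:
    """One append-only log shared by all regions on a region server."""
    entries: list = field(default_factory=list)
    _seq: count = field(default_factory=lambda: count(1))

    def append(self, region, row, changes):
        """Log only the changed cells of a row (hypothetical layout)."""
        written = []
        for column, value in changes.items():
            entry = LogEntry(next(self._seq), region, row, column, value)
            self.entries.append(entry)
            written.append(entry)
        return written


log = ServerLog()
log.append("users,,1", "row-a", {"info:name": b"avinash"})
log.append("orders,,7", "row-z", {"items:count": b"3", "meta:ts": b"42"})
```

Edits for different tables and regions interleave in the same log, and
the sequence number gives them a single total order.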

> Now when a column family's HMemcache is flushed a token is written to HLog
> indicating that the column family for this table has been flushed ? There
> may be other column families which have not yet been flushed. Since we seem
> to write the entire rows to the HLog how can one tell that the log file has
> only flushed entities w/o a scan of the entire file ?

When the memcache is flushed, it happens on a per-region basis. That is,
all the changes that apply to that region (all changed columns) are
written to disk. After the changes are flushed, a flushcache-complete
entry is written to the log, indicating that all changes older than this
flush id can be ignored.
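
As a rough sketch of what the flushcache-complete marker buys you, the
function below picks out the edits for one region that are newer than
that region's latest flush marker. The tuple layout and the name
entries_to_replay are hypothetical, and the sketch assumes the marker
carries its own sequence number.

```python
def entries_to_replay(log, region):
    """Return the edits for `region` that are not yet covered by a flush.

    `log` is a list of tuples in write order (hypothetical layout):
      ("edit", region, seq, payload)
      ("flush-complete", region, flush_seq)
    """
    last_flushed = 0
    pending = []
    for record in log:
        kind, rec_region = record[0], record[1]
        if rec_region != region:
            continue
        if kind == "flush-complete":
            last_flushed = max(last_flushed, record[2])
        else:
            pending.append(record)
    # Edits older than the flush id are already in an on-disk file.
    return [r for r in pending if r[2] > last_flushed]


log = [
    ("edit", "r1", 1, "a"),
    ("edit", "r2", 2, "b"),
    ("edit", "r1", 3, "c"),
    ("flush-complete", "r1", 4),
    ("edit", "r1", 5, "d"),
]
r1_pending = entries_to_replay(log, "r1")
r2_pending = entries_to_replay(log, "r2")
```

Here r1 only needs the edit written after its flush, while r2, which has
not flushed, still needs its single edit.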

HLog maintains a couple of in-memory structures: one records, for each
region, the last flushed sequence number, and the other maps flush ids
to output files.

When the log is rolled, HLog determines the oldest outstanding sequence
number (the oldest sequence number that has not been flushed) and knows
that it can discard all the files with sequence numbers older than the
oldest outstanding change. No scan of the log files is needed.
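
The bookkeeping described above can be sketched roughly as follows. This
is an illustration under stated assumptions, not the real HLog class:
RollingLog and its method names are made up, each closed file is keyed
by the highest sequence number it contains, and a region with no flush
yet is treated as flushed up to just before its first edit.

```python
class RollingLog:
    def __init__(self):
        self.last_flushed = {}  # region -> highest seq known to be on disk
        self.closed_files = {}  # highest seq in a closed file -> file name

    def record_edit(self, region, seq):
        # Before its first edit a region has nothing outstanding in the log.
        self.last_flushed.setdefault(region, seq - 1)

    def flush_complete(self, region, flush_seq):
        self.last_flushed[region] = flush_seq

    def roll(self, file_name, highest_seq):
        """Close the current file and return the names that may be deleted."""
        self.closed_files[highest_seq] = file_name
        if not self.last_flushed:
            return []
        # Oldest change that some region might still need from the log.
        oldest_outstanding = min(self.last_flushed.values()) + 1
        removable = sorted(s for s in self.closed_files if s < oldest_outstanding)
        return [self.closed_files.pop(s) for s in removable]


log = RollingLog()
log.record_edit("r1", 1)
log.record_edit("r2", 2)
first = log.roll("hlog.1", 2)    # both regions still have unflushed edits
log.flush_complete("r1", 3)
log.record_edit("r2", 4)
second = log.roll("hlog.2", 4)   # r2 has not flushed, so hlog.1 must stay
log.flush_complete("r2", 5)
third = log.roll("hlog.3", 5)    # every edit in hlog.1 is now on disk
```

The decision uses only the two in-memory maps, which is why no
sequential scan of the files is required. The rule is conservative: a
file may be kept longer than strictly necessary, but never deleted early.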

If a region server crashes, the master determines which regions the
region server was serving and has the hlog split into a separate part
for each region, and leaves the hlog in a special location. When the
master reassigns the region, part of starting up a region includes
processing any log entries that were not flushed (HRegion looks for an
old log file in the special location). Once the outstanding log entries
have been processed, the region can be brought online.
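
A small sketch of the split-and-replay step, again with made-up names
(split_log, recover_region) and a simplified record layout of
(region, seq, payload); the real on-disk format and the special
location are not modelled here.

```python
from collections import defaultdict


def split_log(records):
    """Group a crashed server's log records by region, keeping write order."""
    per_region = defaultdict(list)
    for region, seq, payload in records:
        per_region[region].append((seq, payload))
    return dict(per_region)


def recover_region(region_records, last_flushed_seq):
    """Return the payloads to re-apply before the region goes online."""
    return [payload for seq, payload in region_records if seq > last_flushed_seq]


crashed_log = [("r1", 1, "a"), ("r2", 2, "b"), ("r1", 3, "c")]
parts = split_log(crashed_log)
r1_replay = recover_region(parts["r1"], last_flushed_seq=1)
```

Each region then only has to read its own part of the old log, no matter
which server it is reassigned to.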

> Is the sequential scan
> unavoidable to determine if the HLog can be deleted when it is rolled away ?
>
> Please explain.
>
> Thanks
> Avinash
--
Jim Kellerman, Senior Engineer; Powerset
[hidden email]