[jira] Created: (HADOOP-1528) HClient for multple-tables


Hudson (Jira)
HClient for multple-tables
--------------------------

                 Key: HADOOP-1528
                 URL: https://issues.apache.org/jira/browse/HADOOP-1528
             Project: Hadoop
          Issue Type: Task
          Components: contrib/hbase
    Affects Versions: 0.14.0
            Reporter: James Kennedy


I have an app that needs to access multiple HBase tables concurrently.  The current HClient can only have one table open at a time even though it caches region servers of multiple tables as they are looked up.

This means that my application layer must open multiple HClients, one per table, perhaps caching those HClients in a pool to reuse them (and their cached table data) as appropriate.

or

Shall I write an HClient patch that makes the HClient  multi-table thread-safe?

Jim's suggestion is to implement an HClient singleton (call it HClientManager?) that does the actual caching/resync of root/meta regions.  Individual HClients will still be one talbe, one update row at a time but will rely on the singleton for the cached table info.  We want HClients to be created and disposed as fast as possible with a minium of meta lookups.

Jim, what about non-root/meta regions?  Shouldn't they be cached and refreshed via the singleton as well?  A region split/resync may still occur during an HClient session, so does the HClientManager need to be able to notify the corresponding HClients in that event?
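
A minimal sketch of the shared-cache idea under discussion, assuming a process-wide manager that caches region locations per table and lets a client invalidate an entry after a split. RegionCacheManager, its methods, and the string-valued locations are hypothetical illustrations, not code from HClient or from any patch.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical process-wide cache of table name -> region server locations.
final class RegionCacheManager {
    private static final RegionCacheManager INSTANCE = new RegionCacheManager();

    private final Map<String, List<String>> locations = new ConcurrentHashMap<>();

    private RegionCacheManager() {}

    static RegionCacheManager get() { return INSTANCE; }

    // Returns cached locations; the (expensive) meta lookup runs only on a miss.
    List<String> locate(String table, Function<String, List<String>> metaLookup) {
        return locations.computeIfAbsent(table, metaLookup);
    }

    // Called by a client that noticed a split or move; the next locate() re-reads meta.
    void invalidate(String table) {
        locations.remove(table);
    }
}

class RegionCacheDemo {
    public static void main(String[] args) {
        RegionCacheManager cache = RegionCacheManager.get();
        int[] lookups = {0};
        Function<String, List<String>> meta = t -> {
            lookups[0]++;
            return List.of("server-1");
        };

        cache.locate("users", meta);   // miss: performs the lookup
        cache.locate("users", meta);   // hit: served from the cache
        cache.invalidate("users");     // e.g. after a region split
        cache.locate("users", meta);   // miss again
        System.out.println("meta lookups: " + lookups[0]);
    }
}
```

In this sketch the clients pull fresh data after invalidating; whether the manager should instead push notifications to clients is the open question raised above.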

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HADOOP-1528:
----------------------------------

    Summary: HClient for multiple tables  (was: HClient for multple-tables)


[jira] Updated: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HADOOP-1528:
----------------------------------

    Description:
I have an app that needs to access multiple HBase tables concurrently.  The current HClient can only have one table open at a time even though it caches region servers of multiple tables as they are looked up.

This means that my application layer must open multiple HClients, one per table, perhaps caching those HClients in a pool to reuse them (and their cached table data) as appropriate.

or

Shall I write an HClient patch that makes the HClient  multi-table thread-safe?

Jim's suggestion is to implement an HClient singleton (call it HClientManager?) that does the actual caching/resync of root/meta regions.  Individual HClients will still be one table, one update row at a time but will rely on the singleton for the cached table info.  We want HClients to be created and disposed as fast as possible with a minimum of meta lookups.

Jim, what about non-root/meta regions?  Shouldn't they be cached and refreshed via the singleton as well?  A region split/resync may still occur during an HClient session, so does the HClientManager need to be able to notify the corresponding HClients in that event?

  was:
I have an app that needs to access multiple HBase tables concurrently.  The current HClient can only have one table open at a time even though it caches region servers of multiple tables as they are looked up.

This means that my application layer must open multiple HClients, one per table, perhaps caching those HClients in a pool to reuse them (and their cached table data) as appropriate.

or

Shall I write an HClient patch that makes the HClient  multi-table thread-safe?

Jim's suggestion is to implement an HClient singleton (call it HClientManager?) that does the actual caching/resync of root/meta regions.  Individual HClients will still be one talbe, one update row at a time but will rely on the singleton for the cached table info.  We want HClients to be created and disposed as fast as possible with a minium of meta lookups.

Jim, what about non-root/meta regions?  Shouldn't they be cached and refreshed via the singleton as well?  A region split/resync may still occur during an HClient session, so does the HClientManager need to be able to notify the corresponding HClients in that event?



[jira] Commented: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508568 ]

James Kennedy commented on HADOOP-1528:
---------------------------------------

I decided to write a new equivalent of HClient using two classes: HConnection and HTable. Together they split up the original HClient. HConnection takes care of administrative functions like create/delete/disable/enableTable(), etc., caches all table info and region connections, and serves out HTables via openTable() or createTable().

HTable is a lighter-weight client that allows scan/update of a single HBase table. It uses its parent HConnection to initialize any region server proxies, etc.

The HConnection is NOT a singleton. I figured that within a single app, a user may need to access multiple HBase clusters.  So instead I made HConnection maintain a static <Configuration, HConnection> map with static getters that return the HConnection for a given Configuration.  The default configuration is the one on the classpath, but it is possible to get HConnections based on any other Configuration.  HConnection statically preserves those connections for the application's lifetime, and since its constructor is private, it is not possible to instantiate an unregistered HConnection.

The chief advantage of this HConnection-HTable pattern is that one can have multiple concurrent transactions on multiple tables that share a single HBase "connection".
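
The registry described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: ClusterKey stands in for Hadoop's Configuration, SketchConnection and SketchTable stand in for HConnection and HTable, and the masterAddress field is invented for the example. The map key needs value-based equals()/hashCode() so that two equal configurations resolve to the same connection.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Configuration, with value-based equality.
final class ClusterKey {
    final String masterAddress;

    ClusterKey(String masterAddress) { this.masterAddress = masterAddress; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ClusterKey
                && ((ClusterKey) o).masterAddress.equals(masterAddress);
    }

    @Override
    public int hashCode() { return masterAddress.hashCode(); }
}

// Lightweight per-table handle that shares its parent connection.
final class SketchTable {
    final SketchConnection connection;
    final String name;

    SketchTable(SketchConnection connection, String name) {
        this.connection = connection;
        this.name = name;
    }
}

final class SketchConnection {
    // One connection per distinct key, kept for the lifetime of the application.
    private static final Map<ClusterKey, SketchConnection> REGISTRY =
            new ConcurrentHashMap<>();

    final ClusterKey key;

    // Private constructor: callers cannot create an unregistered connection.
    private SketchConnection(ClusterKey key) { this.key = key; }

    // Creates the connection on first request, then always returns the same one.
    static SketchConnection forConfiguration(ClusterKey key) {
        return REGISTRY.computeIfAbsent(key, SketchConnection::new);
    }

    SketchTable openTable(String name) {
        return new SketchTable(this, name);
    }
}
```

Two tables opened from the same SketchConnection hold a reference to the same object, which is the sharing the pattern is meant to provide.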

I'll post a patch when I've tested some more. Right now this code presumes the HADOOP-1531 patch is applied and I'm trying to avoid code tangle... it would be great if HADOOP-1521 got applied soon unless you guys reject it.



[jira] Commented: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508622 ]

Jim Kellerman commented on HADOOP-1528:
---------------------------------------

See Michael's and my comments on HADOOP-1531

Why not have one HConnection object for each HBase instance?

Since the HConnection object is managing region-to-server mappings, I guess it makes sense to cache all the server information in the connection object rather than just the root/meta information as I suggested previously.

My original thinking was that since what you call HTable is associated with a single table, it made sense to cache the information for that table there instead of in the connection object. This way, when you are done with that table, its cache will go away when the HTable object goes away.

If you maintain the region-to-server cache for all the open tables for an HBase instance in the HConnection, then there should probably be a close method on HTable so it can tell the connection to drop the information for that table.
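
One way to read this suggestion, sketched with hypothetical names (CachingConnection, TableHandle, dropTable) rather than the real classes: the connection owns the per-table cache, and closing a table handle asks the connection to forget that table's entries.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical connection that owns the region-to-server cache for every open table.
final class CachingConnection {
    private final Map<String, List<String>> regionServers = new ConcurrentHashMap<>();

    TableHandle openTable(String table) {
        // Placeholder for the real meta lookup that would populate the cache.
        regionServers.putIfAbsent(table, List.of("server-for-" + table));
        return new TableHandle(this, table);
    }

    void dropTable(String table) { regionServers.remove(table); }

    boolean isCached(String table) { return regionServers.containsKey(table); }
}

// Per-table handle; close() releases the cached data held on its behalf.
final class TableHandle implements AutoCloseable {
    private final CachingConnection connection;
    private final String table;

    TableHandle(CachingConnection connection, String table) {
        this.connection = connection;
        this.table = table;
    }

    @Override
    public void close() {
        connection.dropTable(table);
    }
}
```

Note that in this simple form, closing one handle discards the cache even if another handle for the same table is still in use.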


[jira] Commented: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508646 ]

James Kennedy commented on HADOOP-1528:
---------------------------------------

If by "HBase instance" we mean a Configuration object pointing to a specific HMaster, then one HConnection per HBase instance is how I've implemented it.  The static methods in HConnection ensure that HConnections are initialized as needed based on the given Configuration object.  After that, any requests for an HConnection are keyed by Configuration. Repeated requests always get the same HConnection for a given Configuration.

Imagine using a single HConnection for many transactions across many (sometimes the same) tables.  You don't want to "close" an HTable because another transaction on the same table may execute moments afterwards... the caching has been centralized to avoid repeated meta lookups.

Only when an HTable times out accessing an HRegion and needs to findRegion() does it call HConnection.closeTable(), which essentially clears the cache for that table.  This is followed by a synchronized HConnection.getTableServers() call, which updates the cache with the new regions.
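
The recovery path described here might be sketched like this. RefreshingConnection, RegionLookup, and relocate are assumptions made for illustration; only closeTable() and getTableServers() are names taken from the comment, and their bodies are guesses at the behaviour, not the patch's code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup against the meta region.
interface RegionLookup {
    List<String> findServers(String table);
}

final class RefreshingConnection {
    private final Map<String, List<String>> cache = new HashMap<>();
    private final RegionLookup lookup;

    RefreshingConnection(RegionLookup lookup) { this.lookup = lookup; }

    // Clears whatever is cached for the table.
    synchronized void closeTable(String table) {
        cache.remove(table);
    }

    // Synchronized so concurrent callers do not repeat the same meta lookup.
    synchronized List<String> getTableServers(String table) {
        List<String> servers = cache.get(table);
        if (servers == null) {
            servers = lookup.findServers(table);
            cache.put(table, servers);
        }
        return servers;
    }

    // What a table handle would do after timing out against a region.
    List<String> relocate(String table) {
        closeTable(table);
        return getTableServers(table);
    }
}
```

Ordinary calls to getTableServers() are served from the cache; only relocate() forces a fresh lookup.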






[jira] Commented: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508691 ]

Jim Kellerman commented on HADOOP-1528:
---------------------------------------

> If by "HBase instance" we mean a Configuration object pointing to a specific HMaster, then one
> HConnection per HBase instance is how i've implemented it.

Yes, this is what I meant.

> Imagine using a single HConnection for many transactions across many (sometimes the same) tables.
> You don't want to "close" an HTable because another transaction on the same table may execute
> moments afterwards... the caching has been centralized to avoid repeated meta lookups.

Yes. However, an application usually knows when it is done accessing a table. If the table's region-to-server map were in the HTable object, then that cache would be dumped when the client garbage collected the HTable object. However, since we are talking about caching the region-to-server information in the HConnection object, the client needs the ability to say "ok I am done with this table now" so that the HConnection can drop all that cached information. For very large tables, this cache could consume a great amount of memory, and I would much rather go through the work of re-opening a table than risk an OutOfMemoryError.

> Only when an HTable times out accessing an HRegion and needs to findRegion(), does it call a
> HConnection.closeTable() which essentially clears the cache for that table. This is followed by a
> synchronized HConnection.getTableServers() method which will update the cache with the new
> regions.

Yes, this is definitely needed, but see my explanation above for why we should have an HTable.close method.




[jira] Commented: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508703 ]

James Kennedy commented on HADOOP-1528:
---------------------------------------

I see what you're saying.

I'll add a close() method to HTable; that's not a problem.





[jira] Updated: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HADOOP-1528:
----------------------------------

    Attachment: HConnection.patch

Submitted a patch with four new files:
HConnection
HTable
TableNotFoundException
TestHConnection

and additions of equals() and hashCode() to Configuration.

Assuming you guys like this code:

I'm not sure what the best way to integrate this would be. HConnection and HTable repeat a lot of the code that is in HClient.  Should we maintain them separately? If we replace HClient we'll need to refactor a lot of places where it is used. If that's the way to go, I would suggest modifying HClient to be a facade for HConnection/HTable to ditch the repeated code, then deprecating it and slowly replacing it with HConnection wherever we see it.
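
A rough sketch of the facade idea, with hypothetical class names (NewConnection, NewTable, LegacyClient) and an invented get method standing in for the real operations. The old single-table entry point keeps its shape but delegates to the new classes and is marked deprecated.

```java
// Hypothetical replacement classes.
final class NewTable {
    private final String name;

    NewTable(String name) { this.name = name; }

    // Placeholder for a real read; returns a string that identifies table and row.
    String get(String row) { return name + "/" + row; }
}

final class NewConnection {
    NewTable openTable(String name) { return new NewTable(name); }
}

/** Old-style client kept as a thin facade so existing callers keep compiling. */
@Deprecated
final class LegacyClient {
    private final NewConnection connection = new NewConnection();
    private NewTable current; // the old API allowed one open table at a time

    void openTable(String name) { current = connection.openTable(name); }

    String get(String row) {
        if (current == null) {
            throw new IllegalStateException("no table open");
        }
        return current.get(row);
    }
}
```

With this shape the duplicated logic lives only in the new classes, and call sites can migrate away from the facade one at a time.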




[jira] Assigned: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy reassigned HADOOP-1528:
-------------------------------------

    Assignee: James Kennedy


[jira] Assigned: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy reassigned HADOOP-1528:
-------------------------------------

    Assignee: Jim Kellerman  (was: James Kennedy)

Assigning to Jim for comment.
Also I'll be on vacation for 3 weeks.


[jira] Commented: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515052 ]

Jim Kellerman commented on HADOOP-1528:
---------------------------------------

James,

I've been meaning to look at this but haven't had time yet. However, it is next on my list after HADOOP-1516.

> HClient for multiple tables
> ---------------------------
>
>                 Key: HADOOP-1528
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1528
>             Project: Hadoop
>          Issue Type: Task
>          Components: contrib/hbase
>    Affects Versions: 0.14.0
>            Reporter: James Kennedy
>            Assignee: Jim Kellerman
>         Attachments: HConnection.patch
>
>
> I have an app that needs to access multiple HBase tables concurrently.  The current HClient can only have one table open at a time even though it caches region servers of multiple tables as they are looked up.
> This means that my application layer must open multiple HClients, one per table, perhaps caching those HClients in a pool to reuse them (and their cached table data) as appropriate.
> or
> Shall I write an HClient patch that makes the HClient  multi-table thread-safe?
> Jim's suggestion is to implement an HClient singleton (call it HClientManager?) that does the actual caching/resync of root/meta regions.  Individual HClients will still be one table, one update row at a time but will rely on the singleton for the cached table info.  We want HClients to be created and disposed as fast as possible with a minimum of meta lookups.
> Jim, what about non-root/meta regions, shouldn't they be cached and refreshed via the singleton also?  It may still be possible that a region split/resync will occur during on HClient session so does the HClientManager need to be able to notify the corresponding HClients in that event?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515061 ]

James Kennedy commented on HADOOP-1528:
---------------------------------------

No problem. I'm just concerned that HClient will evolve so much that this patch will be out of date, if it isn't already.  It will take some work to mirror recent improvements in HClient in the HConnection/HTable classes, although most methods are copied to them as is.

Since I'll be away for 3 weeks, I won't be able to post an updated patch and/or implement the above HClient facade for a while.

[jira] Commented: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515063 ]

Jim Kellerman commented on HADOOP-1528:
---------------------------------------

Don't worry about creating a new patch. I can work off the existing one which at least has the architectural view established.

HClient has changed a lot and changes even more for HADOOP-1516, but it shouldn't be hard to adapt what you have.

[jira] Work started: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)
In reply to this post by Hudson (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-1528 started by Jim Kellerman.

[jira] Commented: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516448 ]

Jim Kellerman commented on HADOOP-1528:
---------------------------------------

The implementation of this feature will have two phases:

Phase 1:
- Implement new classes
 - HConnection - manages connections to multiple tables and instances
 - HTable - used by clients to manipulate data in a single table
 - HBaseAdmin - used to perform administrative functions
- Reimplement HClient in terms of HConnection/HTable/HBaseAdmin. HClient will be deprecated

Phase 2:
- Modify other HBase components to use HConnection/HTable/HBaseAdmin instead of HClient
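
As a rough illustration of the Phase 1 split (this is not the code in patch.txt; SharedConnection, TableHandle, and their methods are hypothetical stand-ins for HConnection and HTable), a shared connection object could own the region-location cache while per-table handles stay cheap to create and dispose:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only: these are NOT the classes from patch.txt.
class MultiTableSketch {

    /** Stand-in for HConnection: one shared cache of table name -> region server location. */
    static final class SharedConnection {
        private final Map<String, String> regionCache = new ConcurrentHashMap<>();
        private final AtomicInteger metaLookups = new AtomicInteger();

        /** Returns the cached location, doing a (simulated) meta lookup only on a cache miss. */
        String locate(String tableName) {
            return regionCache.computeIfAbsent(tableName, name -> {
                metaLookups.incrementAndGet();
                return "regionserver-for-" + name; // placeholder for a real meta scan
            });
        }

        /** Drops a cached entry, e.g. after a region split, so the next call looks it up again. */
        void invalidate(String tableName) {
            regionCache.remove(tableName);
        }

        int metaLookupCount() {
            return metaLookups.get();
        }
    }

    /** Stand-in for HTable: a cheap per-table handle that borrows the shared cache. */
    static final class TableHandle {
        private final SharedConnection connection;
        private final String tableName;

        TableHandle(SharedConnection connection, String tableName) {
            this.connection = connection;
            this.tableName = tableName;
        }

        String currentServer() {
            return connection.locate(tableName);
        }
    }

    public static void main(String[] args) {
        SharedConnection conn = new SharedConnection();
        TableHandle users = new TableHandle(conn, "users");
        TableHandle orders = new TableHandle(conn, "orders");
        TableHandle usersAgain = new TableHandle(conn, "users");

        users.currentServer();
        orders.currentServer();
        usersAgain.currentServer(); // served from the shared cache, no new lookup

        System.out.println("meta lookups: " + conn.metaLookupCount());
    }
}
```

In this sketch three handles over two tables trigger only two simulated meta lookups, which is the "minimum of meta lookups" goal from the issue description. A region split would be handled here by calling invalidate() on the shared object rather than by notifying each handle.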


[jira] Updated: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)
In reply to this post by Hudson (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1528:
----------------------------------

    Attachment: patch.txt

patch for phase 1

[jira] Updated: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)
In reply to this post by Hudson (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1528:
----------------------------------

        Fix Version/s: 0.15.0
    Affects Version/s:     (was: 0.14.0)
                       0.15.0
               Status: Patch Available  (was: In Progress)

patch.txt contains the changes for phase 1 of this issue. It works in a local environment; submitting it to try to get a +1 from Hudson.

[jira] Commented: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516860 ]

Hadoop QA commented on HADOOP-1528:
-----------------------------------

-1, build or testing failed

2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12362940/patch.txt against trunk revision r561603.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/494/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/494/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

[jira] Updated: (HADOOP-1528) HClient for multiple tables

Hudson (Jira)
In reply to this post by Hudson (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1528:
----------------------------------

    Status: In Progress  (was: Patch Available)

Fix test errors
