[jira] Created: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
Provide capability for client specified time stamps in HBase
------------------------------------------------------------

                 Key: HADOOP-1538
                 URL: https://issues.apache.org/jira/browse/HADOOP-1538
             Project: Hadoop
          Issue Type: New Feature
          Components: contrib/hbase
    Affects Versions: 0.14.0
            Reporter: Jim Kellerman
            Priority: Minor
             Fix For: 0.14.0


Currently all records stored in HBase are given a time stamp derived from the current time. It should be possible for the client to specify the time stamp.

For example, if a web crawler is storing page contents in HBase, it should be able to set the time stamp to the crawl time, which is not necessarily 'now'.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Work started: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-1538 started by Jim Kellerman.

[jira] Assigned: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-1538:
-------------------------------------

    Assignee: Jim Kellerman

[jira] Commented: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511188 ]

Jim Kellerman commented on HADOOP-1538:
---------------------------------------

User-specified time stamps will be per-update, just as automatically generated time stamps are.

If columns need different user-specified time stamps, they will have to be written in separate updates. Once we have batched updates (HADOOP-1468), this should not add much overhead.
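
As a rough illustration of the per-update rule, here is a minimal in-memory sketch. The classes Update and Cell, the column names, and the time stamp values are hypothetical and are not the HBase client API; the sketch only shows that one time stamp, given at commit, covers every column in the update, so a column needing a different time stamp goes in its own update.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration; not the HBase client API.
class PerUpdateStampSketch {

    // One stored version of a column value.
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        Cell(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // An update gathers column values for one row. The time stamp is given
    // once, at commit, and applies to every column in the update.
    static final class Update {
        private final String row;
        private final Map<String, String> columns = new LinkedHashMap<>();

        Update(String row) {
            this.row = row;
        }

        Update put(String column, String value) {
            columns.put(column, value);
            return this;
        }

        List<Cell> commit(long timestamp) {
            List<Cell> cells = new ArrayList<>();
            for (Map.Entry<String, String> e : columns.entrySet()) {
                cells.add(new Cell(row, e.getKey(), timestamp, e.getValue()));
            }
            return cells;
        }
    }

    public static void main(String[] args) {
        // Two columns that share one time stamp: a single update.
        List<Cell> shared = new Update("com.example/index.html")
                .put("contents:", "<html>...</html>")
                .put("anchor:home", "Home")
                .commit(1183000000000L);

        // A column that needs a different time stamp: a separate update.
        List<Cell> separate = new Update("com.example/index.html")
                .put("mime:", "text/html")
                .commit(1183000500000L);

        for (Cell c : shared) {
            System.out.println(c.column + " @ " + c.timestamp);
        }
        for (Cell c : separate) {
            System.out.println(c.column + " @ " + c.timestamp);
        }
    }
}
```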


[jira] Commented: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511199 ]

Jim Kellerman commented on HADOOP-1538:
---------------------------------------

If you can insert a value with a specified time stamp, you should also be able to scan for values with that time stamp. This is not hard: scanners already take a time stamp, but today it is manufactured on the server side. We just need to generate it on the client side and pass it across the API.
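
A minimal sketch of a scan that takes its time stamp from the caller. TimestampScanSketch and its methods are hypothetical, not the HBase scanner API. It also assumes that a scan at time stamp T returns, for each row, the newest version stamped at or before T; the comment above does not spell out the exact matching rule, so treat that as an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of one column; not the HBase scanner API.
class TimestampScanSketch {

    // row -> (time stamp -> value), rows kept in sorted order.
    private final Map<String, TreeMap<Long, String>> rows = new TreeMap<>();

    void put(String row, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(timestamp, value);
    }

    // The caller supplies the scan time stamp. Assumed rule: return the
    // newest version of each row stamped at or before that time.
    Map<String, String> scan(long timestamp) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<Long, String>> e : rows.entrySet()) {
            Map.Entry<Long, String> version = e.getValue().floorEntry(timestamp);
            if (version != null) {
                result.put(e.getKey(), version.getValue());
            }
        }
        return result;
    }

    // Without an explicit time stamp, the client (not the server) picks 'now'.
    Map<String, String> scan() {
        return scan(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        TimestampScanSketch table = new TimestampScanSketch();
        table.put("a", 100L, "v1");
        table.put("a", 200L, "v2");
        table.put("b", 300L, "w1");
        System.out.println(table.scan(150L));
        System.out.println(table.scan(300L));
    }
}
```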

[jira] Commented: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511262 ]

Jim Kellerman commented on HADOOP-1538:
---------------------------------------

User-specified time stamps are supplied on commit. If no time stamp is supplied, the current time is used, as it is today.
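
A small sketch of that commit behaviour, using the crawler case from the issue description. CommitStampSketch and its method names are hypothetical and model a single column in memory; they are not the HBase client API. The only point illustrated is the pair of commit forms: one takes the caller's time stamp, the other falls back to the current time.

```java
import java.util.TreeMap;

// Hypothetical in-memory model of one column's versions; not the HBase API.
class CommitStampSketch {

    // time stamp -> value
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // Commit with a caller-supplied time stamp, e.g. the crawl time.
    long commit(String value, long timestamp) {
        versions.put(timestamp, value);
        return timestamp;
    }

    // Commit without a time stamp: fall back to the current time.
    long commit(String value) {
        return commit(value, System.currentTimeMillis());
    }

    // Value stored at exactly this time stamp, or null if there is none.
    String at(long timestamp) {
        return versions.get(timestamp);
    }

    // Value with the highest time stamp, or null if nothing is stored.
    String latest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        CommitStampSketch contents = new CommitStampSketch();
        long crawlTime = 1183000000000L; // illustrative crawl time, not 'now'
        contents.commit("<html>crawled page</html>", crawlTime);
        long stamped = contents.commit("<html>fresh page</html>");
        System.out.println("explicit: " + crawlTime + ", default: " + stamped);
    }
}
```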

[jira] Commented: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511263 ]

Jim Kellerman commented on HADOOP-1538:
---------------------------------------

Does it make sense to have time stamped deletes?

Currently, a delete operation erases every value for the specified column whose time stamp is less than 'now', which effectively erases all the values for the column.

If a time stamp is specified on an update that contains a delete, the delete would 'erase' all versions of the column whose time stamp is less than or equal to the update time stamp.
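
The proposed rule can be sketched as follows. TimestampedDeleteSketch is a hypothetical in-memory model of one column, not HBase code, and it physically removes the affected versions for simplicity; a real store could instead just mark them as deleted.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one column's versions; not HBase code.
class TimestampedDeleteSketch {

    // time stamp -> value
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Erase every version whose time stamp is <= the given time stamp.
    // Returns how many versions were removed.
    int delete(long timestamp) {
        NavigableMap<Long, String> doomed = versions.headMap(timestamp, true);
        int removed = doomed.size();
        doomed.clear(); // clearing the view removes the entries from 'versions'
        return removed;
    }

    // No time stamp given: delete as of 'now'.
    int delete() {
        return delete(System.currentTimeMillis());
    }

    int versionCount() {
        return versions.size();
    }

    String get(long timestamp) {
        return versions.get(timestamp);
    }

    public static void main(String[] args) {
        TimestampedDeleteSketch column = new TimestampedDeleteSketch();
        column.put(100L, "v1");
        column.put(200L, "v2");
        column.put(300L, "v3");
        int removed = column.delete(200L);
        System.out.println("removed " + removed + ", left " + column.versionCount());
    }
}
```

With this rule, a delete stamped with a cutoff date clears everything collected up to that date and leaves newer versions in place.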

[jira] Commented: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511266 ]

stack commented on HADOOP-1538:
-------------------------------

One possible use case would have a resource-constrained user running delete+timestamp to clear all data collected before a certain date to make room for the new.

[jira] Updated: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1538:
----------------------------------

    Status: Patch Available  (was: In Progress)

[jira] Updated: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1538:
----------------------------------

    Attachment: patch.txt

Works in my environment. Make sure Hudson agrees.

[jira] Commented: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511318 ]

Hadoop QA commented on HADOOP-1538:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12361452/patch.txt applied and successfully tested against trunk revision r554144.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/382/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/382/console

[jira] Updated: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1538:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.

[jira] Commented: (HADOOP-1538) Provide capability for client specified time stamps in HBase

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511400 ]

Hudson commented on HADOOP-1538:
--------------------------------

Integrated in Hadoop-Nightly #150 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/150/])
