Is there any way to use an HDFS file as a circular buffer?

Is there any way to use an HDFS file as a circular buffer?

Wukang Lin
Hi all,
   Is there any way to use an HDFS file as a circular buffer? I mean, if I set a quota on a directory in HDFS and write data to a file in that directory continuously, then once the quota is exceeded, I could redirect the writer and write the data from the beginning of the file automatically.

Re: Is there any way to use an HDFS file as a circular buffer?

Bertrand Dechoux
Regardless of the quota, you can't write over an existing file (except by removing the file and adding a new file with the same name). But if you are only worried about the quota, you simply need to rotate your files. However, I don't know of any 'hdfs logrotate'-like tool.

Bertrand




Re: Is there any way to use an HDFS file as a circular buffer?

Niels Basjes
In reply to this post by Wukang Lin

A circular file on HDFS is not possible.

Some ways around this limitation:
- Create a series of files and delete the oldest file when you have too many.
- Put the data into an HBase table and do something similar.
- Use a completely different technology like MongoDB, which has built-in support for a circular buffer (capped collections).

Niels
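The first workaround (a fixed-size ring of rotated files) can be sketched as below. This is a minimal illustration using the local filesystem as a stand-in for HDFS; the function name and file layout are made up for the example, and on a real cluster the same loop would go through the Hadoop FileSystem API or `hdfs dfs` commands instead of `os` calls.

```python
import os

def rotate_write(dir_path, data, max_files=5, prefix="part-"):
    """Write `data` to a new numbered file, then delete the oldest
    files so at most `max_files` remain: a crude circular buffer."""
    existing = sorted(f for f in os.listdir(dir_path) if f.startswith(prefix))
    # Next sequence number follows the highest existing one.
    next_idx = int(existing[-1][len(prefix):]) + 1 if existing else 0
    # Zero-padded names keep lexicographic order == chronological order.
    with open(os.path.join(dir_path, f"{prefix}{next_idx:08d}"), "wb") as fh:
        fh.write(data)
    # Prune everything older than the newest `max_files` files.
    existing = sorted(f for f in os.listdir(dir_path) if f.startswith(prefix))
    for old in existing[:-max_files]:
        os.remove(os.path.join(dir_path, old))
```

With `max_files=5`, writing eight chunks leaves only the five newest files on disk, which is the behaviour the original question wants from a quota-bounded directory.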


Re: Is there any way to use an HDFS file as a circular buffer?

Wukang Lin
Hi Niels and Bertrand,
    Thank you for your great advice.
    In our scenario, we need to store a steady stream of binary data into circular storage; throughput and concurrency are the most important indicators. The first way seems workable, but as HDFS is not friendly to small files, this approach may not be smooth enough. HBase is good, but not appropriate for us, for both throughput and storage reasons. MongoDB is quite good for web applications, but likewise not suitable for our scenario.
    We need a distributed storage system with high throughput, HA, load balancing, and security. Maybe it would act much like HBase, managing a lot of small files (HFiles) as one large region. Perhaps we should develop it ourselves.

Thank you.
Lin Wukang




Re: Is there any way to use an HDFS file as a circular buffer?

shekhar sharma
Use a CEP tool like Esper or Storm and you will be able to achieve that. I can give you more input if you can provide more details of what you are trying to achieve.
Regards,
Som Shekhar Sharma
+91-8197243810





Re: Is there any way to use an HDFS file as a circular buffer?

Wukang Lin
Hi Shekhar,
    Thank you for your reply. As far as I know, Storm is a distributed computing framework, but what we need is a storage system where high throughput and concurrency matter. We have thousands of devices, and each device produces a steady stream of binary data. The space for every device is fixed, so it should reuse the space on disk. So how can Storm or Esper achieve that?

Many Thanks
Lin Wukang






Re: Is there any way to use an HDFS file as a circular buffer?

Adam Faris
If every device can send its information as an 'event', you could use a publish-subscribe messaging system like Apache Kafka (http://kafka.apache.org/). Kafka is designed to self-manage its storage by retaining only the last 'n' events of data, acting like a circular buffer. Each device would publish its binary data to Kafka, and Hadoop would act as a subscriber to Kafka by consuming events. If you need a scheduler to make Hadoop process the Kafka events, look at Azkaban, as it supports both scheduling and job dependencies (http://azkaban.github.io/azkaban2/).

Remember that Hadoop is batch processing, so reports won't happen in real time. If you need to run reports in real time, watch the Samza project, which uses YARN and Kafka to process real-time streaming data (http://incubator.apache.org/projects/samza.html).
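The "last n events" behaviour described above is set through Kafka's broker retention properties. A rough sketch, with the values purely illustrative (the property names are from Kafka's standard broker configuration):

```properties
# server.properties: keep roughly the newest 1 GiB per partition,
# written in 128 MiB log segments; oldest segments are deleted first.
log.segment.bytes=134217728
log.retention.bytes=1073741824
# Time-based cap as a backstop (7 days).
log.retention.hours=168
```

Because deletion happens a whole segment at a time, smaller `log.segment.bytes` gives a tighter approximation of a true circular buffer at the cost of more files per partition.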



Re: Is there any way to use an HDFS file as a circular buffer?

Sandy Ryza
Hi Lin,

It might be worth checking out Apache Flume, which was built for highly parallel ingest into HDFS.

-Sandy
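For reference, a Flume agent for this kind of device ingest is wired up as source -> channel -> sink in a properties file. The sketch below is an assumption-laden minimal example (the agent name, port, and HDFS path are invented for illustration), not a recommended production config:

```properties
# Devices send events to an Avro source; a durable file channel
# buffers them; an HDFS sink writes 128 MiB files per day directory.
agent1.sources = r1
agent1.channels = c1
agent1.sinks = k1

agent1.sources.r1.type = avro
agent1.sources.r1.bind = 0.0.0.0
agent1.sources.r1.port = 41414
agent1.sources.r1.channels = c1

agent1.channels.c1.type = file

agent1.sinks.k1.type = hdfs
agent1.sinks.k1.channel = c1
agent1.sinks.k1.hdfs.path = /data/devices/%Y-%m-%d
agent1.sinks.k1.hdfs.fileType = DataStream
agent1.sinks.k1.hdfs.rollSize = 134217728
agent1.sinks.k1.hdfs.rollCount = 0
agent1.sinks.k1.hdfs.rollInterval = 0
```

Rolling by size only (rollCount and rollInterval disabled) keeps files large, which sidesteps the small-files concern raised earlier in the thread; the circular/quota behaviour would still need a separate cleanup of old day directories.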

