Uncompressing SEQ files from cmdline

brainstorm-2-2
How can I easily uncompress a file downloaded from HDFS? Does anyone
have a Java snippet for this?

SEQ^F^Yorg.apache.hadoop.io.Text!org.apache.nutch.crawl.CrawlDatum^@^@^@^@^@^@���^?^NGy�\~~K�^\!^W^@^@^@<^@^@^@^_^^http://-jackal.deviantarrt.com/

Re: Uncompressing SEQ files from cmdline

Dennis Kubes-2
Sequence files may or may not be compressed, but either way they are a
binary format, not text.  You would need to use an MR job to write the
values out with TextOutputFormat.
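
For a quick look without running a job, the header at the start of the
file can be decoded with plain java.io. What follows is a minimal sketch,
assuming the version 6 header layout that the sample in the question
appears to use: the bytes SEQ, a version byte, length-prefixed key and
value class names, then two compression flags. The names SeqHeaderPeek,
readHeader and readShortString are made up for illustration and are not
part of Hadoop. It only reports what the header says; it does not decode
any records.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Hadoop class: peeks at a SequenceFile header.
class SeqHeaderPeek {

    /** Header fields that precede the records in a SequenceFile. */
    static final class Header {
        int version;
        String keyClass;
        String valueClass;
        boolean compressed;
        boolean blockCompressed;
        String codecClass; // stays null when the file is not compressed
    }

    // Reads a length-prefixed UTF-8 string. Assumption: the length fits in
    // one byte (0..127), which is the case for ordinary class names.
    static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        if (len < 0) {
            throw new IOException("multi-byte length prefix not handled in this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    static Header readHeader(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile: missing SEQ magic");
        }
        Header h = new Header();
        h.version = in.readUnsignedByte();
        if (h.version != 6) {
            // Other versions lay the header out differently; out of scope here.
            throw new IOException("this sketch only handles version 6, got " + h.version);
        }
        h.keyClass = readShortString(in);
        h.valueClass = readShortString(in);
        h.compressed = in.readBoolean();
        h.blockCompressed = in.readBoolean();
        if (h.compressed) {
            h.codecClass = readShortString(in);
        }
        return h;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of a file already copied out of HDFS.
        try (InputStream in = new FileInputStream(args[0])) {
            Header h = readHeader(in);
            System.out.println("version:          " + h.version);
            System.out.println("key class:        " + h.keyClass);
            System.out.println("value class:      " + h.valueClass);
            System.out.println("compressed:       " + h.compressed);
            System.out.println("block compressed: " + h.blockCompressed);
            System.out.println("codec:            " + h.codecClass);
        }
    }
}
```

If the control characters in the posted sample are read literally (^F as
version 6, followed by NUL bytes for both flags), that file would be
reported as not compressed, so the unreadable output comes from the binary
record format rather than from compression.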

Dennis

brainstorm wrote:
> How can I easily uncompress a file downloaded from HDFS? Does anyone
> have a Java snippet for this?
>
> SEQ^F^Yorg.apache.hadoop.io.Text!org.apache.nutch.crawl.CrawlDatum^@^@^@^@^@^@���^?^NGy�\~~K�^\!^W^@^@^@<^@^@^@^_^^http://-jackal.deviantarrt.com/

Re: Uncompressing SEQ files from cmdline

Andrzej Białecki-2
Dennis Kubes wrote:
> Sequence files may or may not be compressed, but either way they are a
> binary format, not text.  You would need to use an MR job to write the
> values out with TextOutputFormat.

See also this issue: https://issues.apache.org/jira/browse/HADOOP-175


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com