parse-zip Nutch 2.x compatibility?

parse-zip Nutch 2.x compatibility?

Michael Chen
Dear all,

I was trying to parse .xml.gz sitemaps with Nutch 2.x, but couldn't
build the parse-zip plugin. parse-ext, parse-swf and feed also failed to
build. It seems to be a known issue (NUTCH-874) and is marked for
version 2.5.

Is there a workaround for parsing gzipped files? Is the porting of these
plugins under active development?

Thank you!

Michael

Re: parse-zip Nutch 2.x compatibility?

Michael Chen
Maybe this could be done with processGzippedXML() from Crawler-Commons? Is that possible?
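
To make the idea concrete, here is a minimal sketch of decompressing a .xml.gz sitemap and pulling out its <loc> URLs using only the JDK (java.util.zip.GZIPInputStream plus the built-in DOM parser). It does not use the Nutch plugin API or Crawler-Commons, so it says nothing about how parse-zip or processGzippedXML() actually work; the class name GzippedSitemapSketch and the methods extractLocs and gzip are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch or Crawler-Commons.
class GzippedSitemapSketch {

    /** Decompresses a gzip stream and returns the text of every <loc> element. */
    public static List<String> extractLocs(InputStream gzipped) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Sitemaps come from untrusted hosts, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        try (InputStream xml = new GZIPInputStream(gzipped)) {
            Document doc = factory.newDocumentBuilder().parse(xml);
            // "*" matches <loc> regardless of which namespace the sitemap declares.
            NodeList locs = doc.getElementsByTagNameNS("*", "loc");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                urls.add(locs.item(i).getTextContent().trim());
            }
            return urls;
        }
    }

    /** Gzips a string in memory; only here so the demo needs no file on disk. */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String sitemap = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>http://example.com/a</loc></url>"
                + "<url><loc>http://example.com/b</loc></url>"
                + "</urlset>";
        List<String> urls = extractLocs(new ByteArrayInputStream(gzip(sitemap)));
        urls.forEach(System.out::println);
    }
}
```

In a real crawl the InputStream would wrap the fetched .xml.gz content instead of an in-memory string, and the extracted URLs would still have to be injected back into the crawl by some other means.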

Thanks,

Michael

