Hi Michael,
The zip from Hasan isn't the working version of the parser. He tried to
upgrade some of my old code from the old net.nutch.* style package names to
the org.apache.* ones, but I had already performed that work. The working
version of the parser is the 5th attachment, titled
"parse-rss-srcbin-incl-path.zip", uploaded on 17/Apr/05 at 9:42 PM. This
was discussed back in April/May on the Nutch list, so you may have missed
that conversation. Here is a direct link to the parser that I uploaded:
http://issues.apache.org/jira/secure/attachment/19661/parse-rss-srcbin-incl-path.zip
Here is a link to a page where you can see the different attachments and
upload dates:
http://issues.apache.org/jira/secure/ManageAttachments.jspa?id=31220
One thing to note is that my plugin predates Andrzej's updates to the
protocol plugins and the parser code, so it may need to be updated to work
with the latest Nutch SVN. I currently manage the parse-rss plugin in a
local CVS repository of mine, so I will download the latest Nutch SVN, check
whether the plugin still works with it, and take care of an updated patch if
one is needed.
Another thing to note is that Andrzej was working with me on getting this
plugin included in the source, but that was before he left for vacation, so
we may have to wait until he gets back before we make any progress on
committing it...
Cheers,
Chris
On 7/27/05 8:42 AM, "Michael Nebel (JIRA)" <[hidden email]> wrote:
> [ http://issues.apache.org/jira/browse/NUTCH-30?page=comments#action_12316928 ]
>
> Michael Nebel commented on NUTCH-30:
> ------------------------------------
>
> I loaded the latest sources from svn yesterday and tried to integrate this
> plugin (I used the zip from Hasan). I found:
>
> - getParse throws a ParseException, which the current getParse signature
>   does not declare
> - the call to new ParseData needs a new parameter "ParseStatus"
>
> My fixes are far from perfect (I just identified the problems by now), so I'm
> not creating a patch. :-(
>
>> rss feed parser
>> ---------------
>>
>> Key: NUTCH-30
>> URL: http://issues.apache.org/jira/browse/NUTCH-30
>> Project: Nutch
>> Type: Improvement
>> Components: fetcher
>> Reporter: Stefan Groschupf
>> Assignee: Chris A. Mattmann
>> Priority: Minor
>> Attachments: RSSParserPatch.txt, RSS_Parser.zip, parse-rss-1.0-040605.zip,
>> parse-rss-patch.txt, parse-rss-srcbin-incl-path.zip, parse-rss.zip,
>> parseRss.zip
>>
>> A simple RSS feed parser supporting:
>> RSS and Atom:
>> + version 0.3
>> + version 0.9
>> + version 1.0
>> + version 2.0
>> Conversion between the different RSS versions is done via XSLT.
>> The xslt was contributed by Frank Henze - Thanks!
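For anyone unfamiliar with the XSLT approach mentioned in the issue description, here is a minimal sketch of how a feed can be normalized by applying a stylesheet with the standard javax.xml.transform API. The class name FeedXsltNormalizer and the toy stylesheet are hypothetical illustrations only; they are not Frank Henze's actual stylesheet or the parse-rss plugin's code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies an XSLT stylesheet to a feed document so that
// different feed versions can be mapped onto one common output format.
public class FeedXsltNormalizer {

    // Toy stylesheet (illustrative only): copies an RSS channel title into a <feed> element.
    public static final String TOY_STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/rss\">"
      + "<feed><title><xsl:value-of select=\"channel/title\"/></title></feed>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String normalize(String feedXml, String stylesheet)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet into a Transformer.
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
        // Run the transformation and collect the result as a string.
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(feedXml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String rss = "<rss version=\"2.0\"><channel><title>Nutch News</title></channel></rss>";
        System.out.println(normalize(rss, TOY_STYLESHEET));
    }
}
```

A real converter would need one template set per source version (RSS 0.9x, 1.0, 2.0, Atom 0.3), which is where keeping the mapping in XSLT rather than in Java code pays off.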
______________________________________________
Chris A. Mattmann
[hidden email]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
_________________________________________________
Jet Propulsion Laboratory Pasadena, CA
Office: 171-266B Mailstop: 171-246
_______________________________________________________
Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.