[jira] Commented: (NUTCH-30) rss feed parser

[jira] Commented: (NUTCH-30) rss feed parser

David Eric Pugh (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-30?page=comments#action_12316928 ]

Michael Nebel commented on NUTCH-30:
------------------------------------

I loaded the latest sources from SVN yesterday and tried to integrate this plugin (I used the zip from Hasan). I found:

- the plugin's getParse throws a ParseException, which the new getParse signature no longer declares
- the call to new ParseData needs an additional "ParseStatus" parameter

My fixes are far from perfect (so far I have only identified the problems), so I'm not creating a patch. :-(
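Both findings point to the same API shift: if, as the report suggests, getParse no longer declares a checked ParseException, a plugin has to report failure through a status object carried in its ParseData instead of throwing. The sketch below illustrates only that pattern. ParseStatus, ParseData, Parser, FeedParserSketch and extractTitle are hypothetical, stripped-down stand-ins written for this example. They are not Nutch's actual classes, constructors or signatures.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical, stripped-down stand-ins for illustration only;
// these are NOT Nutch's real ParseStatus, ParseData or Parser types.
final class ParseStatus {
    static final int SUCCESS = 1;
    static final int FAILED = 2;

    final int code;
    final String message; // null on success

    ParseStatus(int code, String message) {
        this.code = code;
        this.message = message;
    }

    boolean isSuccess() {
        return code == SUCCESS;
    }
}

final class ParseData {
    final ParseStatus status;
    final String title;

    // The status is a required constructor argument (the new "ParseStatus" parameter).
    ParseData(ParseStatus status, String title) {
        this.status = Objects.requireNonNull(status, "status");
        this.title = title;
    }
}

interface Parser {
    // No checked ParseException: failures travel inside the returned ParseData.
    ParseData getParse(byte[] content);
}

// Hypothetical plugin showing the migration pattern.
final class FeedParserSketch implements Parser {
    @Override
    public ParseData getParse(byte[] content) {
        try {
            String title = extractTitle(content);
            return new ParseData(new ParseStatus(ParseStatus.SUCCESS, null), title);
        } catch (IllegalArgumentException e) {
            // Where the old code threw a ParseException, report a failed status instead.
            return new ParseData(new ParseStatus(ParseStatus.FAILED, e.getMessage()), "");
        }
    }

    // Hypothetical helper: naive lookup of the first <title> element's text.
    private static String extractTitle(byte[] content) {
        String xml = new String(content, StandardCharsets.UTF_8);
        int start = xml.indexOf("<title>");
        int end = xml.indexOf("</title>");
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("no <title> element found");
        }
        return xml.substring(start + "<title>".length(), end);
    }
}
```

Returning a status rather than throwing lets the caller keep a record of why a document failed to parse without having to handle a checked exception at every call site.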

> rss feed parser
> ---------------
>
>          Key: NUTCH-30
>          URL: http://issues.apache.org/jira/browse/NUTCH-30
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>     Reporter: Stefan Groschupf
>     Assignee: Chris A. Mattmann
>     Priority: Minor
>  Attachments: RSSParserPatch.txt, RSS_Parser.zip, parse-rss-1.0-040605.zip, parse-rss-patch.txt, parse-rss-srcbin-incl-path.zip, parse-rss.zip, parseRss.zip
>
> A simple RSS feed parser supporting
> RSS and Atom:
> + version 0.3
> + version 0.9
> + version 1.0
> + version 2.0
> Conversion between the different RSS versions is done via XSLT.
> The XSLT was contributed by Frank Henze - Thanks!
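The description says the different feed versions are converted via XSLT. As an illustration only, the sketch below drives such a conversion with the JDK's javax.xml.transform API. The stylesheet is a hypothetical toy, not Frank Henze's: it maps RSS 0.91/0.92/2.0-style documents (an rss/channel/item layout) and Atom 0.3 entries (namespace http://purl.org/atom/ns#) onto one common items shape, and it leaves out RSS 1.0's RDF-based layout. The class FeedNormalizer and its method normalize are also hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FeedNormalizer {
    // Hypothetical toy stylesheet, NOT the one contributed to NUTCH-30.
    // It maps RSS 0.91/0.92/2.0 <item> titles and Atom 0.3 <entry> titles
    // (namespace http://purl.org/atom/ns#) onto one <items> document.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:a='http://purl.org/atom/ns#' exclude-result-prefixes='a'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<items>"
        + "<xsl:for-each select='/rss/channel/item | /a:feed/a:entry'>"
        + "<item><xsl:value-of select='title | a:title'/></item>"
        + "</xsl:for-each>"
        + "</items>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to one feed document and returns the normalized XML.
    static String normalize(String feedXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(feedXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("feed normalization failed", e);
        }
    }
}
```

Compiling the stylesheet on every call keeps the sketch short. A real parser would more likely compile it once with TransformerFactory.newTemplates and reuse the resulting Templates object across documents.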

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Commented: (NUTCH-30) rss feed parser

chrismattmann
Hi Michael,

The zip from Hasan isn't the working version of the parser. He tried to
upgrade some of my old code from the old net.nutch.* style package names to
the org.apache.* ones, but I had already done that work. The working
version of the parser is the 5th attachment, titled
"parse-rss-srcbin-incl-path.zip", uploaded on 17/Apr/05 at 9:42 PM. This
was discussed back in April/May on the Nutch list, so you may have missed
that conversation. Here is a direct link to the parser that I uploaded:

http://issues.apache.org/jira/secure/attachment/19661/parse-rss-srcbin-incl-path.zip

Here is a link to a page where you can see the different attachments and
upload dates:

http://issues.apache.org/jira/secure/ManageAttachments.jspa?id=31220

One thing to note is that my plugin predates Andrzej's updates to the
protocol plugins and the parser code, so it may need to be updated to work
with the latest Nutch SVN. I manage the parse-rss plugin in a local CVS
repository, so I will download the latest Nutch SVN, check whether the plugin
works with it, and prepare an updated patch if one is needed.

Another thing to note is that Andrzej was working with me on getting this
plugin included in the source, but that was before he left for vacation, so
we may have to wait until he gets back before we make any progress on
committing it...


Cheers,
  Chris




______________________________________________
Chris A. Mattmann
[hidden email]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
 
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246

_______________________________________________________
 
Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.
 
 




RE: [jira] Commented: (NUTCH-30) rss feed parser

chrismattmann
In reply to this post by David Eric Pugh (Jira)
Hi Folks,
 
  In response to Michael's comment, I've gone ahead and uploaded a working,
updated patch and source distribution for the parse-rss plugin. The latest
patch and source work against Andrzej's new protocol and parsing APIs. The
patch was made against the latest SVN as of 73005.

The patch and source distro are zipped up in the file: parse-rss-73005.zip.
Here is a direct link:
http://issues.apache.org/jira/secure/attachment/12311475/parse-rss-73005.zip


Thanks!

Cheers,
  Chris Mattmann

> -----Original Message-----
> From: Michael Nebel (JIRA) [mailto:[hidden email]]
> Sent: Wednesday, July 27, 2005 8:42 AM
> To: [hidden email]
> Subject: [jira] Commented: (NUTCH-30) rss feed parser