[jira] Created: (NUTCH-196) lib-xml and lib-log4j plugins

Michael Gibney (Jira)
lib-xml and lib-log4j plugins
-----------------------------

         Key: NUTCH-196
         URL: http://issues.apache.org/jira/browse/NUTCH-196
     Project: Nutch
        Type: Improvement
    Versions: 0.8-dev    
    Reporter: Andrzej Bialecki
 Assigned to: Andrzej Bialecki  


Many places in Nutch use XML. Parsing XML using the JDK API is painful. I propose to add one (or more) library plugins with JDOM, DOM4J, Jaxen, etc. This should simplify the current deployment, and help plugin writers to use the existing API.
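
To illustrate the verbosity referred to here, the following is a minimal sketch that uses only the JDK's built-in DOM API (javax.xml.parsers and org.w3c.dom). The class name JdkDomSketch, the method pluginNames, and the sample XML are hypothetical and are not taken from the Nutch code base.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical example class; not part of Nutch.
public class JdkDomSketch {

    /** Returns the trimmed text of every <name> element, in document order. */
    public static List<String> pluginNames(String xml) {
        try {
            // Three objects are needed before any parsing happens.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // NodeList is not Iterable, so an index loop is required.
            NodeList nodes = doc.getElementsByTagName("name");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent().trim());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap the checked exceptions so callers get a single failure type.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String xml = "<plugins>"
            + "<plugin><name>lib-xml</name></plugin>"
            + "<plugin><name>lib-log4j</name></plugin>"
            + "</plugins>";
        System.out.println(pluginNames(xml)); // prints [lib-xml, lib-log4j]
    }
}
```

Libraries such as JDOM or DOM4J expose the same data through ordinary Java collections, which is the kind of convenience a shared library plugin would make available to plugin writers.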

Similarly, many plugins use log4j. Either we add it to the /lib, or we could create a lib-log4j plugin.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (NUTCH-196) lib-xml and lib-log4j plugins

Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-196?page=comments#action_12364840 ]

Doug Cutting commented on NUTCH-196:
------------------------------------

I think we should try to limit Nutch's core code to a single XML parser and a single logging API.  I chose those built into the JDK when starting out.  Do you propose that we move core code to different APIs?  Apache's commons-logging might be a better choice for a standard logging API.

[jira] Commented: (NUTCH-196) lib-xml and lib-log4j plugins

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-196?page=comments#action_12364861 ]

Andrzej Bialecki  commented on NUTCH-196:
-----------------------------------------

I don't think it's necessary for the core to use anything other than the standard XML APIs. I specifically meant the plugin environment.

Sometimes it's not possible to select a single API, e.g. the RSS parser uses JDOM as a dependency of FeedParser; the PDF parser uses log4j as a dependency of PDFBox, etc. All I'm proposing is to avoid putting the same libraries in many places, and to make it easier for plugin developers to use other, more flexible APIs than the core XML API (we already have the necessary libraries in the tree, but currently they are not reusable).

[jira] Updated: (NUTCH-196) lib-xml and lib-log4j plugins

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-196?page=all ]

Jerome Charron updated NUTCH-196:
---------------------------------

    Attachment: NUTCH-196.lib-log4j.patch

My two cents, in the form of a patch that:
  * provides a lib-log4j plugin (based on log4j 1.2.11)
  * removes the log4j jars from the parse-rss, clustering-carrots2 and parse-pdf plugins
  * adds a dependency on the lib-log4j plugin to parse-rss, clustering-carrots2 and parse-pdf

Please note that the unit tests pass, but I didn't test this patch with the affected plugins (parse-rss, clustering-carrots2 and parse-pdf) in a real-life case.
So if someone has the chance to run such functional tests, please confirm whether or not everything works with this patch, so that I can commit it or revise it.

Jérôme


[jira] Commented: (NUTCH-196) lib-xml and lib-log4j plugins

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-196?page=comments#action_12366066 ]

ilango gurusamy commented on NUTCH-196:
---------------------------------------

Hi Andrzej,
What improvements or features do you feel could go into this plugin?

ilango

[jira] Commented: (NUTCH-196) lib-xml and lib-log4j plugins

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-196?page=comments#action_12366487 ]

Jerome Charron commented on NUTCH-196:
--------------------------------------

lib-log4j committed: http://svn.apache.org/viewcvs?rev=378011&view=rev

[jira] Commented: (NUTCH-196) lib-xml and lib-log4j plugins

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-196?page=comments#action_12366613 ]

Jerome Charron commented on NUTCH-196:
--------------------------------------

lib-commons-httpclient committed: http://svn.apache.org/viewcvs?rev=378214&view=rev

[jira] Commented: (NUTCH-196) lib-xml and lib-log4j plugins

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-196?page=comments#action_12366618 ]

Jerome Charron commented on NUTCH-196:
--------------------------------------

lib-nekohtml committed: http://svn.apache.org/viewcvs?rev=378219&view=rev

[jira] Closed: (NUTCH-196) lib-xml and lib-log4j plugins

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-196?page=all ]
     
Jerome Charron closed NUTCH-196:
--------------------------------

    Fix Version: 0.8-dev
     Resolution: Fixed

Added a lib-xml that gathers many xml libraries previously used in parse-rss.
(http://svn.apache.org/viewcvs?rev=389716&view=rev)


Re: [jira] Closed: (NUTCH-196) lib-xml and lib-log4j plugins

Andrzej Białecki-2
Jerome Charron (JIRA) wrote:

>      [ http://issues.apache.org/jira/browse/NUTCH-196?page=all ]
>      
> Jerome Charron closed NUTCH-196:
> --------------------------------
>
>     Fix Version: 0.8-dev
>      Resolution: Fixed
>
> Added a lib-xml that gathers many xml libraries previously used in parse-rss.
> (http://svn.apache.org/viewcvs?rev=389716&view=rev)
>  

Thanks!

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com