[jira] Created: (NUTCH-169) remove static NutchConf


Nick Burch (Jira)
remove static NutchConf
-----------------------

         Key: NUTCH-169
         URL: http://issues.apache.org/jira/browse/NUTCH-169
     Project: Nutch
        Type: Improvement
    Reporter: Stefan Groschupf
    Priority: Critical
     Fix For: 0.8-dev


Removing the static NutchConf.get is required for a set of improvements and new features.
+ it allows better integration of Nutch in J2EE or other systems.
+ it allows managing Nutch from a web-based GUI (a kind of Nutch appliance), which will improve usability and increase user acceptance of Nutch
+ it allows changing configuration properties at runtime
+ it allows implementing NutchConf as an abstract class or interface, to provide configuration value sources other than XML files. (community request)
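
To make the last two points concrete, here is a minimal sketch of what instance-based configuration behind an interface could look like. It is not the Nutch API: `ConfSource`, `MapConfSource` and `Crawler` are hypothetical names invented for this illustration, and the property key and its default are used only as example values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types exist in Nutch.
class ConfSketch {

  /** A configuration value source; an XML file would be just one implementation. */
  interface ConfSource {
    String get(String name, String defaultValue);
    void set(String name, String value);
  }

  /** In-memory source, standing in for a database, a GUI back end, etc. */
  static class MapConfSource implements ConfSource {
    private final Map<String, String> values = new HashMap<>();

    public String get(String name, String defaultValue) {
      return values.getOrDefault(name, defaultValue);
    }

    public void set(String name, String value) {
      values.put(name, value);
    }
  }

  /** A component that receives its configuration instead of calling a static getter. */
  static class Crawler {
    private final ConfSource conf;

    Crawler(ConfSource conf) {
      this.conf = conf;
    }

    int threads() {
      // Read on every call, so a runtime change is picked up.
      return Integer.parseInt(conf.get("fetcher.threads.fetch", "10"));
    }
  }

  public static void main(String[] args) {
    ConfSource a = new MapConfSource();
    ConfSource b = new MapConfSource();
    b.set("fetcher.threads.fetch", "50");

    // Two independently configured crawlers can live in the same JVM.
    System.out.println(new Crawler(a).threads());
    System.out.println(new Crawler(b).threads());

    // Changing a property at runtime is visible on the next read.
    a.set("fetcher.threads.fetch", "20");
    System.out.println(new Crawler(a).threads());
  }
}
```

With a static getter, the two crawlers above would necessarily share one configuration, which is what makes embedding in a container or driving from a GUI awkward.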


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Stefan Groschupf updated NUTCH-169:
-----------------------------------

    Attachment: nutchConf.patch

The patch was created by Marko Bauhardt with some help from me, so full credit to Marko!
It removes all access to NutchConf via the static method 'get'. To do so, some APIs were changed to pass an instance of NutchConf down the call stack, so that it is available in all objects that need it.
For performance reasons the PluginRepository is now cached in the NutchConf. The repository is not serialized; it is re-instantiated as soon as it is requested.
The complete test suite passes with this patch; only Jerome's new HTTP protocol code still needs to be ported to the new NutchConf API.
Jerome mentioned that he will do this. (Thanks)
It would be great if we could get this patch into svn soon, since this is just the first step toward a Nutch GUI.
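
The caching behaviour described above (an expensive object held by the configuration, not serialized, rebuilt on first request) can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `Conf` and `Repository` are hypothetical stand-ins, and plain Java serialization is used only to show the effect of a `transient` field; the actual patch may transport its configuration differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical sketch; Conf and Repository are not Nutch classes.
class ConfCacheSketch {

  /** Stand-in for an expensive object such as a plugin repository. */
  static class Repository {
  }

  /** Configuration object that doubles as a per-instance cache. */
  static class Conf implements Serializable {
    private static final long serialVersionUID = 1L;

    private final HashMap<String, String> props = new HashMap<>();

    // transient: skipped by serialization, so it is null after deserialization.
    private transient Repository repository;

    void set(String name, String value) {
      props.put(name, value);
    }

    String get(String name) {
      return props.get(name);
    }

    synchronized Repository getRepository() {
      if (repository == null) {
        repository = new Repository(); // built lazily on first request
      }
      return repository;
    }
  }

  /** Serializes and deserializes a Conf, as shipping it to another JVM would. */
  static Conf roundTrip(Conf conf) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(conf);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (Conf) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Conf conf = new Conf();
    conf.set("plugin.folders", "plugins");
    Repository first = conf.getRepository();

    // Repeated requests on the same conf return the cached instance.
    System.out.println(first == conf.getRepository());

    // After a round trip the properties survive, the repository is rebuilt.
    Conf copy = roundTrip(conf);
    System.out.println(copy.get("plugin.folders"));
    System.out.println(copy.getRepository() != first);
  }
}
```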
 


[jira] Commented: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12362334 ]

Stefan Groschupf commented on NUTCH-169:
----------------------------------------

I forgot to mention that this is only a first version, meant for discussion and to give Jerome the changed API. It is not the final version!!!
 


fetch of XXX failed with: java.lang.ClassCastException: java.util.ArrayList

Gal Nitzan

Hi,

I traced it to ParseData line 147.

      UTF8.writeString(out, (String) e.getKey());
      UTF8.writeString(out, (String) e.getValue());


It seems that the Set-Cookie key comes with an ArrayList value?
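
A small self-contained reproduction of this kind of failure, with one possible defensive workaround. This is only a sketch: the map, the `asString` helper and the comma-joining rule are assumptions made for illustration, not the ParseData code and not necessarily how the problem was fixed in Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual ParseData implementation.
class HeaderValueSketch {

  /** Flattens a metadata value that may be a String or a List of Strings. */
  static String asString(Object value) {
    if (value instanceof List) {
      StringBuilder joined = new StringBuilder();
      for (Object item : (List<?>) value) {
        if (joined.length() > 0) {
          joined.append(", ");
        }
        joined.append(item);
      }
      return joined.toString();
    }
    return String.valueOf(value);
  }

  public static void main(String[] args) {
    Map<String, Object> metadata = new LinkedHashMap<>();
    metadata.put("Content-Type", "text/html");
    // A header that occurs more than once ends up as a list of values.
    metadata.put("Set-Cookie", new ArrayList<>(Arrays.asList("a=1", "b=2")));

    for (Map.Entry<String, Object> e : metadata.entrySet()) {
      // A blind (String) e.getValue() would throw ClassCastException
      // for the Set-Cookie entry; asString handles both shapes.
      System.out.println(e.getKey() + ": " + asString(e.getValue()));
    }
  }
}
```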






Re: fetch of XXX failed with: java.lang.ClassCastException: java.util.ArrayList

Doug Cutting-2
Gal Nitzan wrote:
> I traced it to ParseData line 147.
>
>       UTF8.writeString(out, (String) e.getKey());
>       UTF8.writeString(out, (String) e.getValue());
>
>
>  it seems that Set-Cookie key comes with a ArrayList value?

I think that was fixed yesterday by Andrzej.

http://svn.apache.org/viewcvs?rev=367251&view=rev

Doug

[jira] Updated: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Jerome Charron updated NUTCH-169:
---------------------------------

    Attachment: NutchConf.Http.060111.patch

Attached is the patch for http related classes (lib-http, protocol-http and protocol-httpclient).
Pfou, Stefan, it was a huge amount of work, since a lot of code was static and used the static NutchConf!!!
;-)

But it is ok and it works (with a patch to the Fetcher that I will submit right after this).
Please note that it is a raw version and probably needs a full review after commit.


[jira] Updated: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Jerome Charron updated NUTCH-169:
---------------------------------

    Attachment: NutchConf.Fetcher.060111.patch

Same as the one provided in Stefan's patch, plus the Fetcher now sets the NutchConf on the protocol.
I am not sure this is the right way: it might be better for the ProtocolFactory to set the NutchConf on the protocols.
???
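
The alternative raised here, letting the factory inject the configuration, might look roughly like this. The sketch uses `java.util.Properties` as a stand-in for NutchConf, and `Configurable`, `HttpProtocol` and this `ProtocolFactory` are simplified hypothetical types, not the real Nutch classes or signatures.

```java
import java.util.Properties;

// Hypothetical sketch; simplified stand-ins for the classes under discussion.
class FactorySketch {

  /** Anything that can be handed a configuration. */
  interface Configurable {
    void setConf(Properties conf);

    Properties getConf();
  }

  static class HttpProtocol implements Configurable {
    private Properties conf;

    public void setConf(Properties conf) {
      this.conf = conf;
    }

    public Properties getConf() {
      return conf;
    }
  }

  /** The factory is created with a conf and injects it into what it hands out. */
  static class ProtocolFactory {
    private final Properties conf;

    ProtocolFactory(Properties conf) {
      this.conf = conf;
    }

    Configurable getProtocol(String url) {
      if (!url.startsWith("http:")) {
        throw new IllegalArgumentException("no protocol for " + url);
      }
      HttpProtocol protocol = new HttpProtocol();
      protocol.setConf(conf); // callers such as a fetcher need not do this
      return protocol;
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    ProtocolFactory factory = new ProtocolFactory(conf);
    Configurable protocol = factory.getProtocol("http://example.com/");
    System.out.println(protocol.getConf() == conf);
  }
}
```

The design point is that every caller of the factory gets a configured protocol, so no caller can forget the `setConf` step.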


[jira] Commented: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12362393 ]

Stefan Groschupf commented on NUTCH-169:
----------------------------------------

Great! Thanks a lot, Jerome!!! We will continue to fix some smaller bugs we introduced, as well as the JobConf-related issues, and hopefully can provide a first 'real' version soon. Feedback is already welcome, though.



[jira] Updated: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Jerome Charron updated NUTCH-169:
---------------------------------

    Attachment: NutchConf.RegexURLFilter.060111.patch

This patch is a merge of the version provided in Stefan's patch and the last changes committed by Doug (use JDK regexp).


[jira] Commented: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12362396 ]

Jerome Charron commented on NUTCH-169:
--------------------------------------

Notes:

1. The more I look at this solution, the more I think the major problem will be ensuring that a changed conf is propagated to all the impacted classes.
2. Should all properties be dynamic?
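
The propagation concern in note 1 can be illustrated with a small sketch: a class that copies a value out of the configuration at construction time never sees a later change, while one that reads through the configuration on each use does. The class names are hypothetical, `java.util.Properties` stands in for NutchConf, and the property key and default are example values only.

```java
import java.util.Properties;

// Hypothetical sketch of the stale-value problem; not Nutch code.
class PropagationSketch {

  /** Copies the value once; later conf changes are not seen. */
  static class EagerReader {
    private final int timeout;

    EagerReader(Properties conf) {
      this.timeout = Integer.parseInt(conf.getProperty("http.timeout", "10000"));
    }

    int timeout() {
      return timeout;
    }
  }

  /** Keeps the conf and reads on every call; changes are seen immediately. */
  static class LazyReader {
    private final Properties conf;

    LazyReader(Properties conf) {
      this.conf = conf;
    }

    int timeout() {
      return Integer.parseInt(conf.getProperty("http.timeout", "10000"));
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    EagerReader eager = new EagerReader(conf);
    LazyReader lazy = new LazyReader(conf);

    conf.setProperty("http.timeout", "5000"); // runtime change

    System.out.println(eager.timeout()); // still the value seen at construction
    System.out.println(lazy.timeout());  // the updated value
  }
}
```

Reading on every use costs a lookup and a parse each time, which is one reason the second question (whether every property needs to be dynamic) matters.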


[jira] Commented: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12362438 ]

Andrzej Bialecki  commented on NUTCH-169:
-----------------------------------------

Overall, good work! I have some comments regarding the details:

* I wonder what is the performance impact of this patch - in many places, where previously we used the static methods on classes initialized once per JVM lifetime, now we instantiate multiple instances of heavyweight objects like NutchConf, PluginRepository etc... I guess we'll see. ;-)

* the use of the CACHE field in filters (e.g. in QueryFilters, URLFilters, IndexingFilters) and factories (e.g. ProtocolFactory) troubles me, because there is very little chance we ever benefit from using this CACHE - please note that now e.g. QueryFilters are instantiated and discarded many times during one task, so caching filter instances doesn't help because the CACHE is discarded too. Perhaps the caching of instances of QueryFilters inside NutchConf (like you do now with PluginRepository) could solve this.

* there are some spurious or duplicate import statements, this needs to be cleaned up.

* there is one very strange import from Ant, in Content.java. This needs to be removed.

* there is one use of the old deprecated API getExtentens() (I know, the original code used that, but it's a good moment to replace it).

* please observe the coding style (whitespace and formatting). Nutch uses the Sun Coding Style. The patch is somewhat sloppy in this regard, there are missing or superfluous spaces (especially where the "static" qualifier was removed), non-aligned indents, commented out old code, strange line breaks on short lines, etc. Even if this is not essential for the functionality, it is still important for further maintenance, so please clean this up.

* for overridden setConf/getConf, is there any point to add the non-javadoc comments? I suggest to skip them altogether, they only clutter the source. The methods are obvious, and the javadoc will be copied from the interface javadocs.

Other than that, looks great. :-)
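
The CACHE concern from the second bullet can be sketched as follows: a cache held in a short-lived facade disappears with it, whereas a cache held by the long-lived configuration object survives. All names here (`Conf`, `getObject`, `LoadedFilters`, the two facade classes) are hypothetical and chosen for illustration; they are not the API of the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; not the Nutch filter classes.
class FilterCacheSketch {

  /** Long-lived configuration with a generic object cache keyed by name. */
  static class Conf {
    private final Map<String, Object> objects = new HashMap<>();

    @SuppressWarnings("unchecked")
    synchronized <T> T getObject(String key, Supplier<T> factory) {
      Object cached = objects.get(key);
      if (cached == null) {
        cached = factory.get(); // fill the cache on first request
        objects.put(key, cached);
      }
      return (T) cached;
    }
  }

  /** Stand-in for an expensive-to-load set of filter plugins. */
  static class LoadedFilters {
  }

  /** Variant 1: the cache is an instance field, so it dies with each facade. */
  static class FiltersWithOwnCache {
    private LoadedFilters cache;

    LoadedFilters filters() {
      if (cache == null) {
        cache = new LoadedFilters();
      }
      return cache;
    }
  }

  /** Variant 2: the facade is cheap; the loaded filters live in the conf. */
  static class FiltersWithConfCache {
    private final LoadedFilters filters;

    FiltersWithConfCache(Conf conf) {
      this.filters = conf.getObject("filters", LoadedFilters::new);
    }

    LoadedFilters filters() {
      return filters;
    }
  }

  public static void main(String[] args) {
    // Each new facade reloads: the two results are different objects.
    System.out.println(
        new FiltersWithOwnCache().filters() == new FiltersWithOwnCache().filters());

    // Facades created from the same conf share one loaded instance.
    Conf conf = new Conf();
    System.out.println(
        new FiltersWithConfCache(conf).filters() == new FiltersWithConfCache(conf).filters());
  }
}
```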


[jira] Commented: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12362447 ]

Stefan Groschupf commented on NUTCH-169:
----------------------------------------

>I wonder what is the performance impact of this patch - in many places, where previously we used the static methods on classes initialized once per JVM
>lifetime, now we instantiate multiple instances of heavyweight objects like NutchConf, PluginRepository etc... I guess we'll see. ;-)
The plan is to instantiate the NutchConf only once for each tool process, so only once when you start the fetch tool. This one instance is then passed down the complete call stack. If you see that the concept is broken, let us know; it is so much code that we may have overlooked various things. A very specific case is the JobConf in a distributed environment, since the JobConf needs to be instantiated again on each tasktracker (once per JVM).

Since the PluginRepository is now cached in the NutchConf, it should also be instantiated only once per JVM. So theoretically we instantiate NutchConf and PluginRepository as often as we did before; that is why we changed so many APIs to pass the NutchConf instance down to all objects that need it.
As mentioned, we may have missed something.


>* the use of the CACHE field in filters (e.g. in QueryFilters, URLFilters, IndexingFilters) and factories (e.g. ProtocolFactory) troubles me, because there is
>very little chance we ever benefit from using this CACHE - please note that now e.g. QueryFilters are instantiated and discarded many times during one
>task, so caching filter instances doesn't help because the CACHE is discarded too. Perhaps the caching of instances of QueryFilters inside NutchConf (like
>you do now with PluginRepository) could solve this.

Oh, that is definitely a mistake. If we agree in general that we can also use the NutchConf as a cache, I would say that is a very good suggestion. If others agree, we will change this.

>* there are some spurious or duplicate import statements, this needs to be cleaned up.
There are millions of unused import statements in every class; we can clean them up with a single keystroke, but this will touch a lot of classes. Should we do that with this patch as well?

>* there is one very strange import from Ant, in Content.java. This needs to be removed.
A mistake, we will remove it.

>* there is one use of the old deprecated API getExtentens() (I know, the original code used that, but it's a good moment to replace it).
Will be fixed.

>* please observe the coding style (whitespace and formatting). Nutch uses the Sun Coding Style. The patch is somewhat sloppy in this regard, there are
>missing or superfluous spaces (especially where the "static" qualifier was removed), non-aligned indents, commented out old code, strange line breaks on
>short lines, etc. Even if this is not essential for the functionality, it is still important for further maintenance, so please clean this up.
Funny: we found that the coding style in some places does not conform to the Sun standard, but we kept it exactly as it was and used spaces to match the existing code style and formatting.

>* for overridden setConf/getConf, is there any point to add the non-javadoc comments? I suggest to skip them altogether, they only clutter the source. The
>methods are obvious, and the javadoc will be copied from the interface javadocs.
We will fix this also.

Thanks for your comments.


[jira] Updated: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Stefan Groschupf updated NUTCH-169:
-----------------------------------

    Attachment: NutchConf.367837.patch

Next preview of the NutchConf patch. This patch still needs a lot of cleanup, but we would like to share it already, in the hope of getting more feedback and people to test it.
We now use the NutchConf as a cache, e.g. for QueryFilters.
We also patched all JobConf-related code in a similar manner to NutchConf. There are some cases where I am not that happy with the result, but we tried to change as little as possible, and we can do some more refactoring later.
We tested this patch and it looks like everything is running, but since the change is so complex we may need to run some more tests; please help with that.



[jira] Commented: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12363114 ]

Andrzej Bialecki  commented on NUTCH-169:
-----------------------------------------

I noticed only minor issues in this patch, here's the list (all line numbers below refer to the line numbers in the patch file, not in the original files):

* a general comment: plugins now implement NutchConfigurable, which means that you had to add two new methods, looking exactly the same, to many classes. That's why the NutchConfigured class was created. I suggest replacing "implements NutchConfigurable" with "extends NutchConfigured" where appropriate.

* 1094: I think a better place to set the current config on a protocol instance is inside the ProtocolFactory.getProtocol() - because now the factory itself is instantiated with an instance of nutchConf, so it keeps a reference to that config.

* 1256: what is this constructor for? I think only the public constructor is used.

* 1311: please replace getExtentens() with getExtensions()

* 1346, 1375: these classes should be static, I think

* 1542, 1570: should be static

* 1546: which file? neither NutchDocumentTokenizer.java nor NutchDocumentAnalyzer.java are generated.

* 1903,8476,10136: I wonder, shouldn't we cache these in nutchConf?

* 3154, 3650: what's the point of this line? it was already determined that there is nothing useful there... this line exists also in other similar facades.

* 3627: typo, should be indexingFilters.

* 3718, 6514: I think it would be better to create filters once, and keep them around.

* 5020: is this an intentional change??

* 6467: I think this change is an error.

* 6651: I don't understand this comment...

* 6777: should be static

* 7045: shouldn't we store these filters too, like all other filters, in nutchConf?

* 7132: I think we can cache CLIENT in nutchConf too.

* 7782: either we should remove this, or use caching in nutchConf.

* 10638: local variable overshadows a superclass variable.
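
The first comment in the list above, about implementing an interface versus extending a base class, comes down to the following trade-off. This is a simplified sketch: `Configurable` and `Configured` are hypothetical stand-ins for the NutchConfigurable / NutchConfigured pair mentioned in that comment, and `java.util.Properties` stands in for NutchConf; the real classes may differ in their constructors and methods.

```java
import java.util.Properties;

// Hypothetical sketch; simplified stand-ins, not the Nutch classes.
class ConfiguredSketch {

  interface Configurable {
    void setConf(Properties conf);

    Properties getConf();
  }

  /** Base class that implements the two accessors once. */
  static class Configured implements Configurable {
    private Properties conf;

    Configured(Properties conf) {
      setConf(conf);
    }

    public void setConf(Properties conf) {
      this.conf = conf;
    }

    public Properties getConf() {
      return conf;
    }
  }

  /** Interface route: every plugin repeats the same field and two methods. */
  static class FilterA implements Configurable {
    private Properties conf;

    public void setConf(Properties conf) {
      this.conf = conf;
    }

    public Properties getConf() {
      return conf;
    }
  }

  /** Base-class route: no accessor boilerplate, but a forwarding constructor. */
  static class FilterB extends Configured {
    FilterB(Properties conf) {
      super(conf);
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();

    FilterA a = new FilterA();
    a.setConf(conf);

    FilterB b = new FilterB(conf);

    System.out.println(a.getConf() == conf);
    System.out.println(b.getConf() == conf);
  }
}
```

The base class removes the duplicated accessors but uses up the single superclass slot and, in this sketch, requires a constructor that takes the configuration, which matters if plugins are created without arguments.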

I noticed also millions of places with bad formatting, introduced by this patch. Please fix this issue, otherwise it looks very sloppy  - here's a non-exhaustive list of badly formatted places that I spotted:

* 1337 and following, inside CommonGrams.java: spurious whitespace, bad formatting

* 1510-1539, 1748-1772, 1796, 1896-1907, 2556-2582, 2880, 3124-3160, 3207, 3211-3217, 4295,
4405,4657,4848,5343,5493,6566,6806-6822,6872,7295,7404,7441,7503,7540,7644,7680,7720,
7859,7896,7964,7977,8011,8049,8214,8226,8244,8280,8456,8471,9045,9162,9227,9261,9323,9342,
9380,9403,9580,9627,9677,9702,9779,9816,9820,9863,9871,9944,9961,10045,10130,10394,10415,
11079,11129: inconsistent indenting, should be 2 spaces. Some missing whitespace.

* 1613, 1629, 1691, 2262, 2515, 2687, 2861, 3244, 3510, 3774, 3929, 4010, 4157, 4273,
4491,6578,6831,6840,6867,6900,6932,6956,6972,6981,7045,7065,7084,7140,7169,7357,7477,
7882,9171,9195,9204,9765,9910: whitespace

* 1659, 2905,7404,7503,7540,7644,7683,7728,7890,7972,8015,8053,8151,8393,8471,8614,8741,8799,9005,
9049,9221,9342,9403,9589,9627,9702,9779,9820,9871,9961,10130,10410,10672,10765,
11129: non-javadoc generated comments should be removed

* 2241, 2498, 2534-2538, 4244,6176,6798,7096,7256-7264: junk

All of this stuff would eventually have to be cleaned up by someone, so why not keep high standards from the start and not let it in? I prefer to put on my "whitespace police" hat now and ravage you some, so that you clean it up before we commit it. Thank you for your co-operation ;-)


[jira] Commented: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12363116 ]

Stefan Groschupf commented on NUTCH-169:
----------------------------------------

Thanks, we will fix this in the beginning of next week.


[jira] Updated: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Marko Bauhardt updated NUTCH-169:
---------------------------------

    Attachment: NutchConf.370854.patch

> * a general comment: plugins now implement NutchConfigurable, which means that you had to add two new methods,
> looking exactly the same, to many classes. That's why the NutchConfigured class was created.
> I suggest replacing "implements NutchConfigurable" with "extends NutchConfigured" where appropriate.

Either way I have to implement two methods: with the interface I have to implement setConf and getConf, and with the base class I have to override a constructor and setConf.
Since in many cases a constructor wouldn't be helpful, I decided to use the interface.
In general it might make sense to have an interface or an abstract class that has just a configure method and nothing more.
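
A minimal sketch of the two options discussed here, for comparison. The names NutchConfigurable, NutchConfigured, setConf and getConf come from this thread; the SketchConf stand-in, the two filter classes and the "filter.max.length" property are hypothetical and only serve the illustration, so this is not the code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the configuration object; not the real NutchConf API.
class SketchConf {
  private final Map<String, String> props = new HashMap<>();
  void set(String name, String value) { props.put(name, value); }
  int getInt(String name, int defaultValue) {
    String value = props.get(name);
    return value == null ? defaultValue : Integer.parseInt(value);
  }
}

// Interface variant: every implementor writes setConf and getConf itself.
interface NutchConfigurable {
  void setConf(SketchConf conf);
  SketchConf getConf();
}

// Base-class variant: the two methods are written once and inherited.
class NutchConfigured implements NutchConfigurable {
  private SketchConf conf;
  NutchConfigured(SketchConf conf) { setConf(conf); }
  public void setConf(SketchConf conf) { this.conf = conf; }
  public SketchConf getConf() { return conf; }
}

// Option A (hypothetical plugin): implements the interface directly.
// Works with a no-arg constructor, but repeats the accessor code.
class InterfaceFilter implements NutchConfigurable {
  private SketchConf conf;
  private int maxLength;
  public void setConf(SketchConf conf) {
    this.conf = conf;
    this.maxLength = conf.getInt("filter.max.length", 100);
  }
  public SketchConf getConf() { return conf; }
  boolean accept(String url) { return url.length() <= maxLength; }
}

// Option B (hypothetical plugin): extends the base class.
// Needs a constructor that takes the configuration, plus a setConf override
// whenever values have to be read from the configuration.
class InheritingFilter extends NutchConfigured {
  private int maxLength;
  InheritingFilter(SketchConf conf) { super(conf); }
  @Override
  public void setConf(SketchConf conf) {
    super.setConf(conf);
    this.maxLength = conf.getInt("filter.max.length", 100);
  }
  boolean accept(String url) { return url.length() <= maxLength; }
}
```

In this sketch both variants end up with two members to write per plugin, which is the trade-off described above: the base class saves the accessor code but imposes a constructor.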

> * 1094: I think a better place to set the current config on a protocol instance is inside the ProtocolFactory.getProtocol()
> because now the factory itself is instantiated with an instance of nutchConf, so it keeps a reference to that config.

Done.
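
A rough sketch of what this change amounts to, assuming a factory that is constructed with the configuration and hands it to every protocol it returns. ProtocolFactory and getProtocol are named in the review comment; FactoryConf, ConfigurableProtocol and HttpSketchProtocol are hypothetical stand-ins, and the real plugin lookup is left out.

```java
// Hypothetical stand-in for the configuration object; not the real NutchConf API.
class FactoryConf {
}

// Hypothetical minimal protocol contract for this sketch.
interface ConfigurableProtocol {
  void setConf(FactoryConf conf);
  FactoryConf getConf();
}

// Hypothetical protocol implementation.
class HttpSketchProtocol implements ConfigurableProtocol {
  private FactoryConf conf;
  public void setConf(FactoryConf conf) { this.conf = conf; }
  public FactoryConf getConf() { return conf; }
}

class ProtocolFactory {
  // The factory keeps the configuration it was created with.
  private final FactoryConf conf;

  ProtocolFactory(FactoryConf conf) { this.conf = conf; }

  // Callers no longer set the configuration themselves; the factory does it
  // before returning the instance. The url parameter is unused here because
  // choosing a plugin by URL scheme is omitted from the sketch.
  ConfigurableProtocol getProtocol(String url) {
    ConfigurableProtocol protocol = new HttpSketchProtocol();
    protocol.setConf(conf);
    return protocol;
  }
}
```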

> * 1256: what is this constructor for? I think only the public constructor is used.

My mistake, fixed


> * 1311: please replace getExtentens() with getExtensions()
> * 1346, 1375: these classes should be static, I think
> * 1542, 1570: should be static

Done.

> * 1903,8476,10136: I wonder, shouldn't we cache these in nutchConf?

1903: Done.
8476, 10136: we cache the PluginRepository and not the extensions themselves. From my point of view, caching or object recycling should in general be done by the tools that use the extensions / objects, rather than caching each object inside itself.

> * 3154, 3650: what's the point of this line? it was already determined that there is nothing useful there... this line exists also in other similar facades.

If the cache is empty, I fill it inside the if condition at line 3150. This line then gets the freshly cached values and assigns them to the field.
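
The pattern described here can be sketched as follows. This is only an illustration under assumptions: CachingConf with getObject/setObject, the FilterFacade class and the filter names are hypothetical and do not reproduce the patched facades.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical configuration object that can also hold cached objects.
class CachingConf {
  private final Map<String, Object> objects = new HashMap<>();
  Object getObject(String key) { return objects.get(key); }
  void setObject(String key, Object value) { objects.put(key, value); }
}

// Hypothetical facade that caches its filter list in the configuration.
class FilterFacade {
  private static final String CACHE_KEY = FilterFacade.class.getName();
  // Counts the expensive loads, only to make the caching visible.
  static int loadCount = 0;
  private final String[] indexingFilters;

  FilterFacade(CachingConf conf) {
    // Fill the cache only if nothing is stored for this configuration yet.
    if (conf.getObject(CACHE_KEY) == null) {
      conf.setObject(CACHE_KEY, loadFilters());
    }
    // Read the (possibly freshly cached) value back and assign it to the field.
    // This is the line that looks redundant but is needed on both paths.
    this.indexingFilters = (String[]) conf.getObject(CACHE_KEY);
  }

  // Stand-in for the expensive plugin lookup.
  private static String[] loadFilters() {
    loadCount++;
    return new String[] {"basic", "more"};
  }

  String[] getFilters() { return indexingFilters; }
}
```

Because the cache lives in the configuration instance rather than in a static field, two facades built from the same configuration share one filter list, while a second configuration gets its own.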


> * 3627: typo, should be indexingFilters.
Done.



> * 3718, 6514: I think it would be better to create filters once, and keep them around.
> * 5020: is this an intentional change??
> * 6467: I think this change is an error.
> * 6651: I don't understand this comment...
> * 6777: should be static
> * 7045: shouldn't we store these filters too, like all other filters, in nutchConf?
> * 7132: I think we can cache CLIENT in nutchConf too.

Fixed.

> * 7782: either we should remove this, or use caching in nutchConf.

Done, I removed this.

> * 10638: local variable overshadows a superclass variable.

Done.


> * 1337 and following, inside CommonGrams.java: spurious whitespace, bad formatting
> * 1510-1539, 1748-1772, 1796, 1896-1907, 2556-2582, 2880, 3124-3160, 3207, 3211-3217, 4295,
> 4405,4657,4848,5343,5493,6566,6806-6822,6872,7295,7404,7441,7503,7540,7644,7680,7720,
> 7859,7896,7964,7977,8011,8049,8214,8226,8244,8280,8456,8471,9045,9162,9227,9261,9323,9342,
> 9380,9403,9580,9627,9677,9702,9779,9816,9820,9863,9871,9944,9961,10045,10130,10394,10415,
> 11079,11129: inconsistent indenting, should be 2 spaces. Some missing whitespace.
> * 1613, 1629, 1691, 2262, 2515, 2687, 2861, 3244, 3510, 3774, 3929, 4010, 4157, 4273,
> 4491,6578,6831,6840,6867,6900,6932,6956,6972,6981,7045,7065,7084,7140,7169,7357,7477,
> 7882,9171,9195,9204,9765,9910: whitespace
> * 1659, 2905,7404,7503,7540,7644,7683,7728,7890,7972,8015,8053,8151,8393,8471,8614,8741,8799,9005,
> 9049,9221,9342,9403,9589,9627,9702,9779,9820,9871,9961,10130,10410,10672,10765,
> 11129: non-javadoc generated comments should be removed
> * 2241, 2498, 2534-2538, 4244,6176,6798,7096,7256-7264: junk

I have done my best to get all of this fixed.


I also fixed some other problems besides the ones you mentioned. The test suite, the crawl process and the search run successfully for me, both locally and on NDFS.
Still, it is a really big change, so please test it again.

Thanks, Marko


> remove static NutchConf
> -----------------------
>
>          Key: NUTCH-169
>          URL: http://issues.apache.org/jira/browse/NUTCH-169
>      Project: Nutch
>         Type: Improvement
>     Reporter: Stefan Groschupf
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments: NutchConf.367837.patch, NutchConf.370854.patch, NutchConf.Fetcher.060111.patch, NutchConf.Http.060111.patch, NutchConf.RegexURLFilter.060111.patch, nutchConf.patch


[jira] Assigned: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Andrzej Bialecki  reassigned NUTCH-169:
---------------------------------------

    Assign To: Andrzej Bialecki

> remove static NutchConf
> -----------------------
>
>          Key: NUTCH-169
>          URL: http://issues.apache.org/jira/browse/NUTCH-169
>      Project: Nutch
>         Type: Improvement
>     Reporter: Stefan Groschupf
>     Assignee: Andrzej Bialecki
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments: NutchConf.367837.patch, NutchConf.370854.patch, NutchConf.Fetcher.060111.patch, NutchConf.Http.060111.patch, NutchConf.RegexURLFilter.060111.patch, nutchConf.patch


[jira] Commented: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12364544 ]

Andrzej Bialecki  commented on NUTCH-169:
-----------------------------------------

This patch looks good! If there are no further objections, I'll test it and commit it within the next 12 hours.

> remove static NutchConf
> -----------------------
>
>          Key: NUTCH-169
>          URL: http://issues.apache.org/jira/browse/NUTCH-169
>      Project: Nutch
>         Type: Improvement
>     Reporter: Stefan Groschupf
>     Assignee: Andrzej Bialecki
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments: NutchConf.367837.patch, NutchConf.370854.patch, NutchConf.Fetcher.060111.patch, NutchConf.Http.060111.patch, NutchConf.RegexURLFilter.060111.patch, nutchConf.patch


[jira] Closed: (NUTCH-169) remove static NutchConf

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]
     
Andrzej Bialecki  closed NUTCH-169:
-----------------------------------

    Resolution: Fixed

Patches applied, with some changes (mostly whitespace related). Thank you!

> remove static NutchConf
> -----------------------
>
>          Key: NUTCH-169
>          URL: http://issues.apache.org/jira/browse/NUTCH-169
>      Project: Nutch
>         Type: Improvement
>     Reporter: Stefan Groschupf
>     Assignee: Andrzej Bialecki
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments: NutchConf.367837.patch, NutchConf.370854.patch, NutchConf.Fetcher.060111.patch, NutchConf.Http.060111.patch, NutchConf.RegexURLFilter.060111.patch, nutchConf.patch


Re: [jira] Commented: (NUTCH-169) remove static NutchConf

Sami Siren
In reply to this post by Nick Burch (Jira)
Andrzej Bialecki (JIRA) wrote:
>     [ http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12364544 ]
>
> Andrzej Bialecki  commented on NUTCH-169:
> -----------------------------------------
>
> This patch looks good! If there are no further objections, I'll test it and commit it within the next 12 hours.

I guess I am a bit late with this...

The goal of these changes is very good, but I don't like the idea of
duplicating identical code (implementing the interface NutchConfigurable)
instead of using inheritance (extending NutchConfigured) in so many places.

--
  Sami Siren