[jira] Created: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
ports are hardcoded or random
-----------------------------

         Key: NUTCH-99
         URL: http://issues.apache.org/jira/browse/NUTCH-99
     Project: Nutch
        Type: Bug
    Versions: 0.8-dev    
    Reporter: Stefan Groschupf
    Priority: Critical
     Fix For: 0.8-dev


Ports of the tasktracker are random, and the port of the datanode is hardcoded to 7000 as the starting port.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-99?page=all ]

Stefan Groschupf updated NUTCH-99:
----------------------------------

    Attachment: port_patch.txt

This patch makes the ports of the datanode and tasktracker configurable in nutch-default.xml.
I changed as little code as possible so that the patch can be applied to the sources quickly, since this is a critical issue for me.



> ports are hardcoded or random
> -----------------------------
>
>          Key: NUTCH-99
>          URL: http://issues.apache.org/jira/browse/NUTCH-99
>      Project: Nutch
>         Type: Bug
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments: port_patch.txt
>
> Ports of the tasktracker are random, and the port of the datanode is hardcoded to 7000 as the starting port.

[jira] Updated: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-99?page=all ]

Stefan Groschupf updated NUTCH-99:
----------------------------------

    Attachment: port_patch_02.txt

I noticed there are no tests for the ndfs and mapreduce trackers, so the test suite still ran successfully after patching the sources. But my manual tests today failed.
So the last patch is just for the trash. Sorry!
Attached is a patch that at least passes my manual tests.
The port for Jetty is now configurable as well; the code already tried to load a value from the configuration file, but the node in nutch-default.xml was missing.
Besides that, I tidied up the default ports a bit.
I suggest the following ports:
ndfs namenode 7000
ndfs data node 7010
mr job tracker 7020
mr job tracker info port 7025
mr task tracker output 7030
mr task tracker report 7040

Now all of these ports should be individually configurable.




> ports are hardcoded or random
> -----------------------------
>
>          Key: NUTCH-99
>          URL: http://issues.apache.org/jira/browse/NUTCH-99
>      Project: Nutch
>         Type: Bug
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments: port_patch.txt, port_patch_02.txt
>
> Ports of the tasktracker are random, and the port of the datanode is hardcoded to 7000 as the starting port.

[jira] Commented: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12331220 ]

Doug Cutting commented on NUTCH-99:
-----------------------------------

I like  the cleanup of the port numbers.  And removing the use of random port numbers may make some network administrators happy.  But switching from random to fixed ports in the TaskTracker means that only a single task tracker can be run at a time.  Currently I frequently find it useful to debug things by running multiple task trackers on a single box.

So we need to either loop, trying a range of port numbers, or switch back to random allocation, or both (since random allocations may collide).

According to the IANA, we should be able to randomly allocate stuff in 49152-65535.  But that still could make folks upset who wish to set up restrictive firewalls.
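A minimal sketch of the "both" option described above: pick a random port in the IANA dynamic range 49152-65535 and retry when an allocation collides. The class and method names (DynamicPortPicker, bindRandom) are hypothetical and are not taken from the Nutch sources or the attached patches.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.util.Random;

/** Hypothetical helper for illustration only; not part of Nutch. */
class DynamicPortPicker {
    // IANA dynamic/private port range, as quoted in the comment above.
    static final int MIN_DYNAMIC_PORT = 49152;
    static final int MAX_DYNAMIC_PORT = 65535;

    /**
     * Binds a server socket to a randomly chosen port in the dynamic range.
     * A collision (BindException) triggers another random pick, up to maxAttempts.
     */
    static ServerSocket bindRandom(Random random, int maxAttempts) throws IOException {
        int rangeSize = MAX_DYNAMIC_PORT - MIN_DYNAMIC_PORT + 1;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int port = MIN_DYNAMIC_PORT + random.nextInt(rangeSize);
            try {
                return new ServerSocket(port);
            } catch (BindException e) {
                // Random allocations may collide, so log and pick again.
                System.err.println("Port " + port + " is not available: " + e.getMessage());
            }
        }
        throw new BindException("No free port found after " + maxAttempts + " random attempts");
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket socket = bindRandom(new Random(), 20)) {
            System.out.println("Bound to port " + socket.getLocalPort());
        }
    }
}
```

Random picks keep several task trackers on one box from fighting over a single port, but the chosen port is unpredictable, which is the firewall concern raised above.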




[jira] Commented: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12331224 ]

Stefan Groschupf commented on NUTCH-99:
---------------------------------------

OK, makes sense.
Do you prefer command line args for the ports for this 'let's search for a port' code?
I personally would prefer command line args.


[jira] Commented: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12331225 ]

Doug Cutting commented on NUTCH-99:
-----------------------------------

What command line would you add this to?  I think this should simply start at the default port (e.g., 7030) and loop trying port+1 until BindException is not thrown.  A message should be logged for each failure.
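The loop suggested above could look roughly like the following sketch. It is an illustration under assumptions, not the code from any of the attached patches: the class name PortProbe, the method bindFirstFree, and the maxAttempts limit are all hypothetical.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

/** Hypothetical helper for illustration only; not the code from the patch. */
class PortProbe {

    /**
     * Tries startPort, startPort + 1, ... and returns the first socket that binds.
     * Only BindException triggers a retry; any other failure propagates to the caller.
     */
    static ServerSocket bindFirstFree(int startPort, int maxAttempts) throws IOException {
        BindException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            int port = startPort + i;
            if (port > 65535) {
                break; // ran past the highest valid port number
            }
            try {
                return new ServerSocket(port);
            } catch (BindException e) {
                // Log each failure, then move on to port + 1.
                System.err.println("Could not bind port " + port + ": " + e.getMessage());
                last = e;
            }
        }
        BindException failure = new BindException("No free port found starting at " + startPort);
        if (last != null) {
            failure.initCause(last);
        }
        throw failure;
    }

    public static void main(String[] args) throws IOException {
        // Two "trackers" on one box: the second call skips the port held by the first.
        try (ServerSocket first = bindFirstFree(7030, 100);
             ServerSocket second = bindFirstFree(7030, 100)) {
            System.out.println("first=" + first.getLocalPort()
                    + " second=" + second.getLocalPort());
        }
    }
}
```

Bounding the number of attempts is a design choice added here so that the loop cannot run past the valid port range.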

[jira] Updated: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-99?page=all ]

Stefan Groschupf updated NUTCH-99:
----------------------------------

    Attachment: port_patch_03.txt

As discussed, the tasktracker now iterates until it finds a free port, starting with a configurable port from nutch-default.xml. Failures are logged.
Only ports higher than 49152 are used.
I hope this patch meets the requirements to be committed.

> ports are hardcoded or random
> -----------------------------
>
>          Key: NUTCH-99
>          URL: http://issues.apache.org/jira/browse/NUTCH-99
>      Project: Nutch
>         Type: Bug
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments: port_patch.txt, port_patch_02.txt, port_patch_03.txt
>
> Ports of the tasktracker are random, and the port of the datanode is hardcoded to 7000 as the starting port.

[jira] Commented: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12356853 ]

Stefan Groschupf commented on NUTCH-99:
---------------------------------------

Is there anything I can improve so that one of the developers commits this patch to svn?
Thanks in advance if someone with svn write access can commit this.




[jira] Commented: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12357291 ]

Doug Cutting commented on NUTCH-99:
-----------------------------------

I cannot get patch on linux to accept this. The absolute DOS paths seem to cause problems.  Can you please regenerate this with relative paths?  Generating it on linux would also be preferable, as patch also has problems with EOL differences.

Also, ndfs.datanode.port would be a better name for that property.

And catching Exception is overkill.  This should be java.net.BindException, no?


[jira] Commented: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12357409 ]

Stefan Groschupf commented on NUTCH-99:
---------------------------------------

I'm not sure what you mean by "catching Exception is overkill".
If the attempt to open a server on this port fails, an exception is thrown, and I have to catch it so that I can iterate to a higher port number and try again instead of exiting the method because a BindException was thrown.

 

[jira] Updated: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-99?page=all ]

Stefan Groschupf updated NUTCH-99:
----------------------------------

    Attachment:  port_patch_04.txt

+ renamed the property as requested
+ regenerated the patch with relative paths on a Unix-based system :-)
+ as commented, the exception catching is necessary and is also done in other existing code that iterates to find a free port.

THANKS DOUG FOR TAKING THE TIME to get this into the sources!

> ports are hardcoded or random
> -----------------------------
>
>          Key: NUTCH-99
>          URL: http://issues.apache.org/jira/browse/NUTCH-99
>      Project: Nutch
>         Type: Bug
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments:  port_patch_04.txt, port_patch.txt, port_patch_02.txt, port_patch_03.txt
>
> Ports of the tasktracker are random, and the port of the datanode is hardcoded to 7000 as the starting port.

[jira] Commented: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12357614 ]

Piotr Kosiorowski commented on NUTCH-99:
----------------------------------------

I think Doug meant that we should have:
} catch (BindException e) {
instead of  generic:
} catch (Exception e) {

And I agree with this suggestion.

If this change is OK with you, Stefan, I can make this one small change myself and commit it, as I am starting to use the mapreduce branch.
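To illustrate why the narrower catch matters, here is a small sketch with hypothetical names (NarrowCatchDemo, classify) that are not from the patch. It separates a port collision, where trying port+1 makes sense, from other failures such as an out-of-range port number, which a generic catch (Exception e) inside a retry loop would silently swallow.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

/** Hypothetical demo for illustration only; not part of Nutch. */
class NarrowCatchDemo {

    /** Classifies the outcome of one bind attempt on the given port. */
    static String classify(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return "bound";
        } catch (BindException e) {
            // The port could not be bound (typically already in use): retrying on port + 1 is reasonable.
            return "retry";
        } catch (IOException | IllegalArgumentException e) {
            // Not a port collision (for example a port number above 65535):
            // retrying on a higher port cannot help, so this should surface as an error.
            return "fail";
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket held = new ServerSocket(0)) { // 0 lets the OS pick a free port
            System.out.println("busy port:    " + classify(held.getLocalPort()));
        }
        System.out.println("invalid port: " + classify(70000));
    }
}
```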



[jira] Commented: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12357616 ]

Stefan Groschupf commented on NUTCH-99:
---------------------------------------

SURE! That is absolutely OK with me!
Thanks a lot, Piotr!!!!



[jira] Commented: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12357617 ]

Doug Cutting commented on NUTCH-99:
-----------------------------------

Sounds good.  We should also probably note in the config property descriptions that these port numbers are the first in a range that will be tried.


[jira] Closed: (NUTCH-99) ports are hardcoded or random

Sebastian Nagel (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-99?page=all ]
     
Piotr Kosiorowski closed NUTCH-99:
----------------------------------

    Resolution: Fixed

Patch committed. Thanks Stefan.

