Problems with a new Installation of Nutch


Tom Landvoigt
Hello,

I hope someone can help me.

I installed Nutch on two Amazon EC2 machines. Everything seems fine, but I can't put data into HDFS.

I formatted the namenode and started HDFS with start-all.sh.

All Java processes start properly, but when I try to copy something into HDFS with hadoop fs -put, I get this:

nutch@bla:/nutch/search> ./bin/hadoop fs -put /tmp/hadoop-nutch-tasktracker.pid blub

put: Protocol not available
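For reference, the steps described above can be sketched as a small bash script: format the namenode, start all daemons, then copy a file in. This is a minimal sketch. It assumes it is run from the Nutch directory (/nutch/search in the prompt above) on the master. The helpers run() and setup_and_put() and the DRY_RUN switch are hypothetical, added for illustration. They are not part of Nutch or Hadoop.

```bash
#!/usr/bin/env bash
# Sketch of the sequence described above, run from the Nutch directory
# (/nutch/search) on the master. run() and DRY_RUN are hypothetical
# helpers: with DRY_RUN set, commands are printed instead of executed.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_and_put() {
  local src=$1 dest=$2
  # WARNING: formatting wipes the HDFS namespace (it may ask Y/N first).
  run ./bin/hadoop namenode -format
  # Starts the HDFS and MapReduce daemons on the master and the slaves.
  run ./bin/start-all.sh
  # Copies a local file into HDFS; a relative destination such as "blub"
  # resolves under /user/<user>, e.g. /user/nutch/blub.
  run ./bin/hadoop fs -put "$src" "$dest"
}
```

DRY_RUN=1 setup_and_put /tmp/hadoop-nutch-tasktracker.pid blub prints the three commands without running them. Leaving DRY_RUN unset runs them for real.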

 

DATA NODE LOG on the master:

2009-12-04 12:50:15,566 INFO  http.HttpServer - Version Jetty/5.1.4
2009-12-04 12:50:15,582 INFO  util.Credential - Checking Resource aliases
2009-12-04 12:50:16,483 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@e45b5e
2009-12-04 12:50:16,614 INFO  util.Container - Started WebApplicationContext[/static,/static]
2009-12-04 12:50:16,882 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@1284fd4
2009-12-04 12:50:16,883 INFO  util.Container - Started WebApplicationContext[/logs,/logs]
2009-12-04 12:50:17,827 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@39c8c1
2009-12-04 12:50:17,849 INFO  util.Container - Started WebApplicationContext[/,/]
2009-12-04 12:50:18,485 INFO  http.SocketListener - Started SocketListener on 0.0.0.0:50075
2009-12-04 12:50:18,485 INFO  util.Container - Started org.mortbay.jetty.Server@36527f
2009-12-04 12:54:20,745 ERROR datanode.DataNode - DatanodeRegistration(10.224.113.210:50010, storageID=DS-1135263253-10.224.113.210-50010-1259926637370, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:315)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
    at java.lang.Thread.run(Thread.java:636)
2009-12-04 12:54:20,746 ERROR datanode.DataNode - DatanodeRegistration(10.224.113.210:50010, storageID=DS-1135263253-10.224.113.210-50010-1259926637370, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:315)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
    at java.lang.Thread.run(Thread.java:636)
2009-12-04 12:54:20,747 ERROR datanode.DataNode - DatanodeRegistration(10.224.113.210:50010, storageID=DS-1135263253-10.224.113.210-50010-1259926637370, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:315)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
    at java.lang.Thread.run(Thread.java:636)
2009-12-04 12:54:20,747 ERROR datanode.DataNode - DatanodeRegistration(10.224.113.210:50010, storageID=DS-1135263253-10.224.113.210-50010-1259926637370, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:315)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
    at java.lang.Thread.run(Thread.java:636)

NAME NODE LOG

2009-12-04 12:50:11,539 INFO  http.HttpServer - Version Jetty/5.1.4
2009-12-04 12:50:11,573 INFO  util.Credential - Checking Resource aliases
2009-12-04 12:50:12,488 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@19fe451
2009-12-04 12:50:12,565 INFO  util.Container - Started WebApplicationContext[/static,/static]
2009-12-04 12:50:12,891 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@1570945
2009-12-04 12:50:12,891 INFO  util.Container - Started WebApplicationContext[/logs,/logs]
2009-12-04 12:50:13,569 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@11410e5
2009-12-04 12:50:13,582 INFO  util.Container - Started WebApplicationContext[/,/]
2009-12-04 12:50:13,613 INFO  http.SocketListener - Started SocketListener on 0.0.0.0:50070
2009-12-04 12:50:13,613 INFO  util.Container - Started org.mortbay.jetty.Server@173ec72

SECONDARY NAMENODE LOG

2009-12-04 12:50:19,163 INFO  http.HttpServer - Version Jetty/5.1.4
2009-12-04 12:50:19,207 INFO  util.Credential - Checking Resource aliases
2009-12-04 12:50:20,365 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@174d93a
2009-12-04 12:50:20,454 INFO  util.Container - Started WebApplicationContext[/static,/static]
2009-12-04 12:50:21,396 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@31f2a7
2009-12-04 12:50:21,396 INFO  util.Container - Started WebApplicationContext[/logs,/logs]
2009-12-04 12:50:21,533 INFO  servlet.XMLConfiguration - No WEB-INF/web.xml in file:/mnt/nutch/nutch-1.0/webapps/secondary. Serving files and default/dynamic servlets only
2009-12-04 12:50:22,206 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@383118
2009-12-04 12:50:22,785 INFO  util.Container - Started WebApplicationContext[/,/]
2009-12-04 12:50:22,787 INFO  http.SocketListener - Started SocketListener on 0.0.0.0:50090
2009-12-04 12:50:22,787 INFO  util.Container - Started org.mortbay.jetty.Server@297ffb
2009-12-04 12:50:22,787 WARN  namenode.SecondaryNameNode - Checkpoint Period   :3600 secs (60 min)
2009-12-04 12:50:22,787 WARN  namenode.SecondaryNameNode - Log Size Trigger    :67108864 bytes (65536 KB)
2009-12-04 12:55:23,908 WARN  namenode.SecondaryNameNode - Checkpoint done. New Image Size: 1056

HADOOP LOG

2009-12-04 12:54:20,708 WARN  hdfs.DFSClient - DataStreamer Exception: java.io.IOException: Unable to create new block.
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2722)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

2009-12-04 12:54:20,709 WARN  hdfs.DFSClient - Error Recovery for block blk_5506837520665828594_1002 bad datanode[0] nodes == null
2009-12-04 12:54:20,709 WARN  hdfs.DFSClient - Could not get block locations. Source file "/user/nutch/blub/hadoop-nutch-tasktracker.pid" - Aborting...

DATA NODE LOG on the slave

2009-12-04 12:49:49,433 INFO  http.HttpServer - Version Jetty/5.1.4
2009-12-04 12:49:49,438 INFO  util.Credential - Checking Resource aliases
2009-12-04 12:49:50,288 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@e45b5e
2009-12-04 12:49:50,357 INFO  util.Container - Started WebApplicationContext[/static,/static]
2009-12-04 12:49:50,555 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@2016b0
2009-12-04 12:49:50,555 INFO  util.Container - Started WebApplicationContext[/logs,/logs]
2009-12-04 12:49:50,816 INFO  util.Container - Started org.mortbay.jetty.servlet.WebApplicationHandler@118278a
2009-12-04 12:49:50,820 INFO  util.Container - Started WebApplicationContext[/,/]
2009-12-04 12:49:50,849 INFO  http.SocketListener - Started SocketListener on 0.0.0.0:50075
2009-12-04 12:49:50,849 INFO  util.Container - Started org.mortbay.jetty.Server@b02928
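Nearly everything above is Jetty startup at INFO level. The entries that matter are the ERROR and WARN lines. The four DataXceiver ERRORs on the master's datanode and the DFSClient WARNs in the Hadoop log all carry the timestamp 12:54:20, when the put aborted. The slave's datanode log shows only startup messages. Below is a minimal bash sketch for pulling such entries out of log files, optionally limited to one timestamp prefix. The function name is hypothetical, and the sketch assumes the usual one-entry-per-line log4j layout seen above. Stack-trace lines are not printed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the ERROR and WARN entries from the
# given log files (or stdin), keeping those whose line starts with the
# timestamp prefix in $1 (pass "" to keep all of them).
# Assumes log4j lines like "2009-12-04 12:54:20,745 ERROR logger - msg";
# stack-trace lines that follow an entry are not printed.
log_problems() {
  local prefix=$1
  shift
  grep -h -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]+ (ERROR|WARN) ' "$@" \
    | awk -v p="$prefix" 'p == "" || index($0, p) == 1'
}
```

For example, log_problems '2009-12-04 12:54:20' logs/*.log lists the problem entries from that second, assuming the logs sit in the installation's logs/ directory. With an empty prefix it also shows the SecondaryNameNode WARN lines, which here are only informational.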

 

HADOOP SITE XML

<property>
  <name>fs.default.name</name>
  <value>hdfs://(yes here is the right ip):9000</value>
  <description>
    The name of the default file system. Either the literal string
    "local" or a host:port for NDFS.
  </description>
</property>

<!-- Specifies where the JobTracker (which coordinates the MapReduce jobs) can be found. -->
<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://(here too):9001</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>

<!-- Specifies how many map tasks may run at the same time -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>
    define mapred.map tasks to be number of slave hosts
  </description>
</property>

<!-- Specifies how many reduce tasks may run at the same time -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>
    define mapred.reduce tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1500m</value>
</property>

<property>
  <name>mapred.jobtracker.restart.recover</name>
  <value>true</value>
</property>

<!-- The following settings specify where the Hadoop file system stores its files on the disk of each instance. -->
<property>
  <name>dfs.name.dir</name>
  <value>/nutch/filesystem/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/nutch/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/nutch/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/nutch/filesystem/mapreduce/local</value>
</property>

<!-- Specifies how many replicas of a file must exist in the file system for it to be available. Initially 1 -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
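When comparing the two machines, it can help to print the value a node's copy of this file sets for a single property. Below is a rough bash/awk sketch. It assumes the one-element-per-line layout shown above and is not a real XML parser. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of property $1 from a
# hadoop-site.xml given as $2 (or on stdin), assuming one element per
# line as in the file above. Line-based matching, not a real XML parser.
hadoop_prop() {
  local name=$1
  shift
  awk -v want="<name>${name}</name>" '
    index($0, want) { found = 1; next }   # entered the wanted property
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit                                # first match only
    }
    /<\/property>/ { found = 0 }          # left that property
  ' "$@"
}
```

For the file above, hadoop_prop dfs.replication conf/hadoop-site.xml would print 2. The conf/ location is an assumption about where the file lives. Running the same call on both instances is a quick way to confirm they carry identical settings.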

 

I hope someone can help me.

Thanks
Tom


Re: Problems with a new Installation of Nutch

MilleBii
Did you check the web interface? It gives a lot of information, and you can even browse the file system.

Try hadoop fs -ls: what does it give you?
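Both checks can be run from the master. The namenode's web interface listens on port 50070, according to the SocketListener line in the NAME NODE LOG, on the host named in fs.default.name. Below is a minimal bash sketch that derives that address. The helper name is hypothetical, and it assumes fs.default.name has the hdfs://host:port form used in the configuration above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the namenode web UI address from an
# fs.default.name value such as hdfs://10.224.113.210:9000.
# 50070 is the port the namenode's Jetty listener reported in its log.
namenode_ui_url() {
  local uri=$1
  local host=${uri#hdfs://}   # strip the scheme
  host=${host%%:*}            # drop ":port" and anything after it
  host=${host%%/*}            # drop a trailing path if there was no port
  echo "http://${host}:50070/"
}
```

For example, namenode_ui_url hdfs://10.224.113.210:9000 prints http://10.224.113.210:50070/. That address is taken from the master's datanode log and is used here only as an example. On the command line, ./bin/hadoop fs -ls / lists the HDFS root, and ./bin/hadoop dfsadmin -report shows the datanodes the namenode knows about.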

2009/12/4, Tom Landvoigt <[hidden email]>:



--
-MilleBii-

RE: Problems with a new Installation of Nutch

Tom Landvoigt
Hi,

I don't have Tomcat on this system because I don't want to use the web search. But if Hadoop needs it (which I don't think it does), I will install it.

nutch@ip-10-224-113-210:/nutch/search> ./bin/hadoop fs -ls /
Found 1 items
-rw-r--r--   2 nutch supergroup          0 2009-12-04 14:04 /url.txt
nutch@ip-10-224-113-210:/nutch/search>

I get a normal listing, but the file is empty.
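In hadoop fs -ls output, the fifth column is the file size in bytes, so empty files like /url.txt above can be picked out mechanically. Below is a small sketch. It assumes the column layout shown in this listing: permissions, replication, owner, group, size, date, time, path. The function name is hypothetical, and paths containing spaces are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "hadoop fs -ls" output on stdin and print the
# paths of regular files whose size (5th column) is 0.
# Assumes the column layout shown above and paths without spaces.
zero_length_files() {
  # Files start with "-", directories with "d"; header lines such as
  # "Found 1 items" fail this test and are skipped.
  awk '$1 ~ /^-/ && $5 == 0 { print $8 }'
}
```

For the listing above, ./bin/hadoop fs -ls / | zero_length_files would print /url.txt.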

-----Original Message-----
From: MilleBii [mailto:[hidden email]]
Sent: Friday, 4 December 2009 15:06
To: [hidden email]
Subject: Re: Problems with a new Installation of Nutch


Re: Problems with a new Installation of Nutch

MilleBii
I'm not aware that Hadoop uses Tomcat; I think it uses Jetty instead. The nodes communicate via HTTP, so you need some kind of web server, and for monitoring, the web interface is the best way.
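The logs in the first message are consistent with the Jetty point. Each quoted daemon logs "Version Jetty/5.1.4" at startup, followed by the port its embedded web server listens on: 50070 for the namenode, 50075 for the datanodes, and 50090 for the secondary namenode. These status pages therefore need no separate Tomcat installation. Below is a minimal bash sketch that extracts that port from a daemon log, so the page can then be opened in a browser or fetched with curl. The function name is hypothetical, and it assumes the SocketListener line format shown in the logs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the port(s) a daemon's embedded Jetty server
# reports in log lines such as
#   ... INFO  http.SocketListener - Started SocketListener on 0.0.0.0:50075
# Reads the given log files, or stdin when none are given.
jetty_ports() {
  sed -n 's/.*Started SocketListener on [0-9.]*:\([0-9][0-9]*\).*/\1/p' "$@"
}
```

Run on the namenode's log, it should print 50070, matching the NAME NODE LOG above.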

2009/12/4, Tom Landvoigt <[hidden email]>:

> Hi,
>
> I don't have tomcat on this system because I don't want to use the
> websearch. But if it is necessary for hadoop what I don’t think I will
> install it.
>
> nutch@ip-10-224-113-210:/nutch/search> ./bin/hadoop fs -ls /
> Found 1 items
> -rw-r--r--   2 nutch supergroup          0 2009-12-04 14:04 /url.txt
> nutch@ip-10-224-113-210:/nutch/search>
>
> I get the normal answer but the file is empty.
>
> -----Original Message-----
> From: MilleBii [mailto:[hidden email]]
> Sent: Freitag, 4. Dezember 2009 15:06
> To: [hidden email]
> Subject: Re: Problems with a new Installation of Nutch
>
> Did you check with the web interface ? It gives a lot of info you can
> even browse the file system.
>
> Try hadoop fs -ls to see what it gives you ?
>
>
>
> --
> -MilleBii-
>


--
-MilleBii-
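Tom's listing shows /url.txt with a size of 0. That looks consistent with the namenode creating the file entry while the client failed to write any block ("Unable to create new block" in the client log). The small Python 3 sketch below spots such files in a longer listing. It parses lines in the format shown above: permissions, replication, owner, group, size, date, time, path. That format is taken from this thread's output. Reading a "-" in the replication column as a directory is an assumption, and the helper names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class LsEntry(NamedTuple):
    permissions: str
    replication: Optional[int]  # None when the column shows "-" (directories)
    owner: str
    group: str
    size: int
    date: str
    time: str
    path: str


def parse_ls_line(line: str) -> Optional[LsEntry]:
    """Parse one line of `hadoop fs -ls` output.

    Returns None for lines that are not entries, such as "Found 1 items".
    The path is the last field and is allowed to contain spaces.
    """
    parts = line.strip().split(None, 7)
    if len(parts) != 8 or parts[0][0] not in "-d":
        return None
    perms, repl, owner, group, size, date, time, path = parts
    return LsEntry(perms, None if repl == "-" else int(repl),
                   owner, group, int(size), date, time, path)


def empty_files(ls_output: str) -> List[str]:
    """Return the paths of regular files whose reported size is 0 bytes."""
    entries = (parse_ls_line(line) for line in ls_output.splitlines())
    return [e.path for e in entries
            if e is not None and e.permissions.startswith("-") and e.size == 0]


# Output copied from this thread.
sample = """Found 1 items
-rw-r--r--   2 nutch supergroup          0 2009-12-04 14:04 /url.txt"""
print(empty_files(sample))  # prints ['/url.txt']
```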

RE: Problems with a new Installation of Nutch

Tom Landvoigt
Hi,

Does anyone know which packages I have to install on SUSE to get Nutch running?

I have another Nutch installation where everything works fine, so I copied that whole installation over. The other machine is also SUSE Linux, but it is 64-bit, and I didn't install it myself.

I still get the same problem.
So far I have installed the following packages:
Tomcat 6
OpenJDK devel 1.6
Sun Java devel 1.6
Ant 1.7

That's all for today.

Hope someone can help.

Tom
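Both OpenJDK 1.6 and Sun Java 1.6 are installed here, so it may be worth confirming which JVM the Hadoop scripts actually start. bin/hadoop uses JAVA_HOME, which is normally exported in conf/hadoop-env.sh. The Python 3 sketch below runs $JAVA_HOME/bin/java -version, which prints to stderr. It then classifies the vendor by looking for the strings "OpenJDK", "Java HotSpot" or "Java(TM)". Those marker strings are an assumption about what these JVMs typically print, and the function names are hypothetical.

```python
import os
import re
import subprocess
from typing import Optional


def classify_jvm(version_output: str) -> str:
    """Guess the JVM vendor from the text printed by `java -version`.

    Assumption: OpenJDK builds print "OpenJDK", Sun's JDK prints
    "Java(TM)" / "Java HotSpot(TM)".
    """
    if "OpenJDK" in version_output:
        return "OpenJDK"
    if "Java HotSpot" in version_output or "Java(TM)" in version_output:
        return "Sun"
    return "unknown"


def java_version(version_output: str) -> Optional[str]:
    """Extract the quoted version, e.g. 1.6.0_17 from: java version "1.6.0_17"."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


def describe_java_home(java_home: str) -> str:
    """Run $JAVA_HOME/bin/java -version and summarise which JVM it is."""
    java = os.path.join(java_home, "bin", "java")
    try:
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
    except OSError as exc:  # e.g. the binary does not exist
        return f"{java}: could not run ({exc})"
    output = result.stderr + result.stdout  # -version writes to stderr
    return f"{java}: {classify_jvm(output)} {java_version(output) or '?'}"


if __name__ == "__main__":
    # Use the same JAVA_HOME that conf/hadoop-env.sh exports.
    home = os.environ.get("JAVA_HOME")
    print(describe_java_home(home) if home else "JAVA_HOME is not set")
```

Compare the result on the working 64-bit machine and on the EC2 instances. If they differ, that is one more difference between the two setups to rule out.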

-----Original Message-----
From: MilleBii [mailto:[hidden email]]
Sent: Friday, 4 December 2009 17:31
To: [hidden email]
Subject: Re: Problems with a new Installation of Nutch

As far as I know, Hadoop doesn't use Tomcat; I think it uses its own
embedded Jetty instead. The nodes communicate partly over HTTP, so you
need some kind of web server, and the web interface is the best way to
monitor the cluster.
