Running the Crawl without using bin/nutch inside a Scala program


Running the Crawl without using bin/nutch inside a Scala program

Sailaja Dhiviti
Hi,
        I am trying to run the crawl inside a Scala program without using the bin/nutch command. I am setting all of the environment variables that nutch.sh sets when a crawl runs through the bin/nutch command, and then calling Crawl.main(params). I get the following error:

Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)
Here is the code I am trying to write:
import scala.io.Source

// Run each line of the classpath script through bash.
// Note: every line runs in its own short-lived child process.
for (line <- Source.fromFile("/root/classpaths.sh").getLines) {
  val cmd = Array("bash", "-c", line)
  val checkingCrawl = Runtime.getRuntime().exec(cmd)
  checkingCrawl.waitFor() // wait for each command to finish
}

// Then invoke the crawl in-process.
val params = Array("urls", "-dir", "insidejava", "-depth", "1")
org.apache.nutch.crawl.Crawl.main(params)
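
I am not sure the exec calls even help, though: as far as I can tell, a child bash process cannot export variables back into the JVM that spawned it. A minimal check (my own sketch, not part of the program above):

val p = Runtime.getRuntime().exec(Array("bash", "-c", "export FOO=bar"))
p.waitFor()
println(System.getenv("FOO")) // prints null: the export dies with the child

So I suspect the classpath this program runs under never actually changes.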



Contents of classpaths.sh:

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m

# check envvars which might override default args
if [ "$NUTCH_HEAPSIZE" != "" ]; then
  #echo "run with heapsize $NUTCH_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$NUTCH_HEAPSIZE""m"
  #echo $JAVA_HEAP_MAX
fi

# CLASSPATH initially contains $NUTCH_CONF_DIR, or defaults to $NUTCH_HOME/conf
CLASSPATH=${NUTCH_CONF_DIR:=$NUTCH_HOME/conf}
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar

# so that filenames w/ spaces are handled correctly in loops below
IFS=

# for developers, add plugins, job & test code to CLASSPATH
if [ -d "$NUTCH_HOME/build/plugins" ]; then
  CLASSPATH=${CLASSPATH}:$NUTCH_HOME/build
fi
if [ -d "$NUTCH_HOME/build/test/classes" ]; then
  CLASSPATH=${CLASSPATH}:$NUTCH_HOME/build/test/classes
fi

if [ $IS_CORE == 0 ]
then
  for f in $NUTCH_HOME/build/nutch-*.job; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  # for releases, add Nutch job to CLASSPATH
  for f in $NUTCH_HOME/nutch-*.job; do
    CLASSPATH=${CLASSPATH}:$f;
  done
else
  CLASSPATH=${CLASSPATH}:$NUTCH_HOME/build/classes
fi
# add plugins to classpath
if [ -d "$NUTCH_HOME/plugins" ]; then
  CLASSPATH=${NUTCH_HOME}:${CLASSPATH}
fi
# add libs to CLASSPATH
for f in $NUTCH_HOME/lib/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

for f in $NUTCH_HOME/lib/jetty-ext/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

# setup 'java.library.path' for native-hadoop code if necessary
JAVA_LIBRARY_PATH=''
if [ -d "${NUTCH_HOME}/build/native" -o -d "${NUTCH_HOME}/lib/native" ]; then
  JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} org.apache.hadoop.util.PlatformName | sed -e 's/ /_/g'`

  if [ -d "$NUTCH_HOME/build/native" ]; then
    JAVA_LIBRARY_PATH=${HADOOP_HOME}/build/native/${JAVA_PLATFORM}/lib
  fi

  if [ -d "${NUTCH_HOME}/lib/native" ]; then
    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${NUTCH_HOME}/lib/native/${JAVA_PLATFORM}
    else
      JAVA_LIBRARY_PATH=${NUTCH_HOME}/lib/native/${JAVA_PLATFORM}
    fi
  fi
fi

# restore ordinary behaviour
unset IFS

# default log directory & file
if [ "$NUTCH_LOG_DIR" = "" ]; then
  NUTCH_LOG_DIR="$NUTCH_HOME/logs"
fi
if [ "$NUTCH_LOGFILE" = "" ]; then
  NUTCH_LOGFILE='hadoop.log'
fi
NUTCH_OPTS="$NUTCH_OPTS -Dhadoop.log.dir=$NUTCH_LOG_DIR"
NUTCH_OPTS="$NUTCH_OPTS -Dhadoop.log.file=$NUTCH_LOGFILE"

if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
  NUTCH_OPTS="$NUTCH_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi
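
If I understand correctly, a script like this only takes effect when it is sourced by the same shell that then starts the JVM, so that $CLASSPATH and $NUTCH_OPTS are visible at launch. A sketch of that variant, reusing the variables the script defines (untested, and the child's output is not consumed here):

val launch = "source /root/classpaths.sh && " +
  "exec \"$JAVA\" $JAVA_HEAP_MAX $NUTCH_OPTS -classpath \"$CLASSPATH\" " +
  "org.apache.nutch.crawl.Crawl urls -dir insidejava -depth 1"
Runtime.getRuntime().exec(Array("bash", "-c", launch)).waitFor()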


Contents of hadoop.log:

2009-07-27 18:48:55,345 INFO  crawl.Crawl - crawl started in: insidejava
2009-07-27 18:48:55,347 INFO  crawl.Crawl - rootUrlDir = urls
2009-07-27 18:48:55,347 INFO  crawl.Crawl - threads = 10
2009-07-27 18:48:55,347 INFO  crawl.Crawl - depth = 1
2009-07-27 18:48:55,779 INFO  crawl.Injector - Injector: starting
2009-07-27 18:48:55,780 INFO  crawl.Injector - Injector: crawlDb: insidejava/crawldb
2009-07-27 18:48:55,781 INFO  crawl.Injector - Injector: urlDir: urls
2009-07-27 18:48:55,781 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2009-07-27 18:48:55,974 WARN  mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-07-27 18:49:19,685 WARN  plugin.PluginRepository - Plugins: not a file: url. Can't load plugins from: jar:file:/nutch-1.0/crawler/nutch-1.0.job!/plugins
2009-07-27 18:49:19,686 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2009-07-27 18:49:19,686 INFO  plugin.PluginRepository - Registered Plugins:
2009-07-27 18:49:19,686 INFO  plugin.PluginRepository - NONE
2009-07-27 18:49:19,686 INFO  plugin.PluginRepository - Registered Extension-Points:
2009-07-27 18:49:19,686 INFO  plugin.PluginRepository - NONE
2009-07-27 18:49:19,689 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
        at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:122)
        at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:57)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)

    How can I solve this issue? If anyone has an idea, please reply.

Thanks in advance.

----Sailaja




Re: Running the Crawl without using bin/nutch inside a Scala program

Doğacan Güney
On Mon, Jul 27, 2009 at 16:47, Sailaja Dhiviti <[hidden email]> wrote:

> Hi,
> I am trying to run the crawl inside a Scala program without using the bin/nutch command, calling Crawl.main(params), and I get "Exception in thread "main" java.io.IOException: Job failed!" [...]
>
> How can I solve this issue? If anyone has an idea, please reply.

I think $nutch/build/plugins is not in your classpath, but I am not sure.
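
If you want to check from inside the program, one quick sketch (untested) is to ask the configuration where it will look for plugins; NutchConfiguration reads nutch-default.xml and nutch-site.xml from the classpath, so this also shows whether conf/ is visible at all:

import org.apache.nutch.util.NutchConfiguration

val conf = NutchConfiguration.create()
println(conf.get("plugin.folders")) // by default the relative path "plugins"

If that resolves into the .job file (the "Can't load plugins from: jar:file:...!/plugins" warning in your log suggests it does), setting plugin.folders in conf/nutch-site.xml to the absolute path of an unpacked plugins directory might help.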




--
Doğacan Güney

RE: Running the Crawl without using bin/nutch inside a Scala program

Sailaja Dhiviti
$nutch/build/plugins is in my classpath, but it is still showing the error. Is there any other approach to run the crawl without using bin/nutch?
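
For example, would something along these lines be reasonable? It builds the classpath directly in the program and launches Crawl in a fresh JVM, so the classpath is guaranteed to apply (just a sketch; the paths are from my installation and untested):

import java.io.File

val nutchHome = "/nutch-1.0"
val jars = new File(nutchHome + "/lib").listFiles
  .filter(_.getName.endsWith(".jar"))
  .map(_.getPath)
  .toList
// conf/ first so nutch-site.xml is found; build/ is assumed to contain
// the unpacked plugins directory of a dev checkout
val cp = (nutchHome + "/conf" :: nutchHome + "/build" :: jars)
  .mkString(File.pathSeparator)
val cmd = Array("java", "-cp", cp,
  "org.apache.nutch.crawl.Crawl", "urls", "-dir", "insidejava", "-depth", "1")
Runtime.getRuntime().exec(cmd).waitFor()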
-----Original Message-----
From: Doğacan Güney [mailto:[hidden email]]
Sent: Monday, July 27, 2009 7:32 PM
To: [hidden email]
Subject: Re: Running the Crawl without using bin/nutch inside a Scala program

On Mon, Jul 27, 2009 at 16:47, Sailaja Dhiviti <[hidden email]> wrote:

> Hi,
> I am trying to run the crawl inside a Scala program without using the bin/nutch command, calling Crawl.main(params), and I get "Exception in thread "main" java.io.IOException: Job failed!" [...]
>
> How can I solve this issue? If anyone has an idea, please reply.

I think $nutch/build/plugins is not in your classpath, but I am not sure.

--
Doğacan Güney
