nutch internet crawling help

NIDHI MALIK

Hello,

I am having a problem using Nutch to crawl data from the web. I have
configured nutch-site.xml and nutch-default.xml, but the message "HTTP 407
error authentication failure" is still displayed. I have also set
the http_proxies.

I have also tried wget. When I run a local crawl, the following message is
displayed:

------------------------------
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /home/nidhi/Nutch_Installation/nutch-0.8.1/logs/hadoop.log (Permission denied)
        at java.io.FileOutputStream.openAppend(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
        at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
        at org.apache.log4j.Logger.getLogger(Logger.java:104)
        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
        at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:209)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:351)
        at org.apache.nutch.crawl.Injector.<clinit>(Injector.java:40)



------------------------------


Can anyone please suggest a solution?


Thanks