|
Hi,
I have been trying to run a program that takes the first 10 hits of a Nutch query and writes the parse text of the respective URLs to separate files. The code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.searcher.Hit;
    import org.apache.nutch.searcher.HitDetails;
    import org.apache.nutch.searcher.Hits;
    import org.apache.nutch.searcher.NutchBean;
    import org.apache.nutch.searcher.Query;
    import java.io.*;

    public class SearchApp {

        private static final int NUM_HITS = 10;

        public static void main(String[] args) throws IOException {

            if (args.length == 0) {
                String usage = "Usage: SearchApp query";
                System.err.println(usage);
                System.exit(-1);
            }

            Configuration conf = new Configuration();
            NutchBean bean = new NutchBean(conf);
            Query query = Query.parse(args[0], conf);
            Hits hits = bean.search(query, NUM_HITS);

            for (int i = 0; i < hits.getLength(); i++) {
                Hit hit = hits.getHit(i);
                HitDetails details = bean.getDetails(hit);
                String s = new String();
                s = bean.getParseText(details).getText();
                // write s to a file
                try {
                    FileWriter outf = new FileWriter(args[0] + i);
                    PrintWriter outp = new PrintWriter(outf);
                    outp.println(s);
                } catch (IOException e) {
                    System.out.println("\nCould not write to file");
                }
            }
        }
    }

I compiled the code after making some additions to the CLASSPATH variable, but when I ran it in the terminal it showed an error like 'plugins.folder not defined'. How can I solve this problem? |
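Apart from the configuration error asked about, the posted loop never closes or flushes its PrintWriter, so the buffered parse text may never reach the file. A minimal sketch of the file-writing step, assuming Java 7 or later for try-with-resources; the class name HitTextWriter and the method writeText are hypothetical and not part of Nutch:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HitTextWriter {

    /** Writes the text to the file; closing the writer flushes its buffer. */
    static void writeText(Path file, String text) throws IOException {
        // try-with-resources closes the writer even if println fails.
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            out.println(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for bean.getParseText(details).getText().
        Path file = Files.createTempFile("hit", ".txt");
        writeText(file, "parse text of hit 0");
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

In SearchApp the same pattern would wrap the FileWriter for args[0] + i inside the loop.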
|
Hi,
I assume that you are probably running this program in Eclipse or some other IDE. Either way, you need to include the "path-to-nutch/conf" directory in your classpath; otherwise the configuration files are not found and parsed on start-up. "plugin.folders" is a key from "nutch-default.xml" or "nutch-site.xml".

Hope this helps,
Martin

On Feb 5, 2008 8:35 AM, devj <[hidden email]> wrote:
> [...]
> So, I compiled the code after making some additions to the CLASSPATH
> variable. But when I ran it in the terminal, it showed an error like
> 'plugins.folder not defined'.
> How can I solve this problem?
>
> --
> View this message in context:
> http://www.nabble.com/Urgent-help-reqd.....plz-tp15284835p15284835.html
> Sent from the Nutch - User mailing list archive at Nabble.com. |
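One way to check whether the conf directory is really visible on the classpath is to ask the class loader for the two configuration files by name. This is only a diagnostic sketch in plain Java; the class name ConfOnClasspath and the method locate are hypothetical, and it has to be run with the same classpath as the failing program:

```java
import java.net.URL;

public class ConfOnClasspath {

    /** Returns the URL the class loader resolves for the name, or null if it is not visible. */
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        // File names taken from the thread; a null result means the conf
        // directory is not on the classpath of this JVM.
        for (String name : new String[] {"nutch-default.xml", "nutch-site.xml"}) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

If both files resolve but the error persists, the classpath is probably not the cause and the configuration object itself is worth checking.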
|
Hi,
I am trying to run this program from a bash terminal. I added the $NUTCH_HOME/conf folder to the classpath as you suggested, but it still does not run. Here is the error text:

08/02/05 21:29:30 INFO searcher.NutchBean: opening indexes in crawl/indexes
Exception in thread "main" java.lang.IllegalArgumentException: plugin.folders is not defined
    at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
    at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:71)
    at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:95)
    at org.apache.nutch.searcher.QueryFilters.<init>(QueryFilters.java:57)
    at org.apache.nutch.searcher.IndexSearcher.init(IndexSearcher.java:79)
    at org.apache.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:63)
    at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:140)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:106)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:84)
    at SearchApp.main(SearchApp.java:22)
|
|
If running in Eclipse, do an ant build and add the build/nutch-1.0-dev folder to the classpath, then edit that entry and exclude everything within that folder except the plugins directory. If running outside of Eclipse, you will need to include the parent of the plugins folder in the classpath.

Dennis |
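The advice to put the parent of the plugins folder on the classpath suggests that a relative folder name such as plugins is looked up as a classpath resource. The sketch below illustrates that kind of lookup in plain Java; it is an assumption about the mechanism, not Nutch's actual code, and the class name PluginFolderLookup and the method resolve are hypothetical:

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

public class PluginFolderLookup {

    /**
     * Resolves a folder name: absolute paths are used directly, relative
     * names are first looked up as classpath resources, then as plain paths.
     * Returns null when nothing usable is found.
     */
    static File resolve(String name) {
        File dir = new File(name);
        if (dir.isAbsolute()) {
            return dir.isDirectory() ? dir : null;
        }
        URL url = PluginFolderLookup.class.getClassLoader().getResource(name);
        if (url != null && "file".equals(url.getProtocol())) {
            try {
                return new File(url.toURI());
            } catch (URISyntaxException | IllegalArgumentException e) {
                return null;
            }
        }
        // Fall back to a path relative to the working directory.
        return dir.isDirectory() ? dir : null;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "plugins";
        System.out.println(name + " -> " + resolve(name));
    }
}
```

Run with the program's classpath, a null result for plugins would mean that neither the classpath nor the working directory contains such a folder.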
|
Hi,
Since I am running it in the terminal (outside of Eclipse, which I haven't installed, by the way), I added the parent of the plugins folder, which is $NUTCH_HOME, to the classpath. But the problem is still there.
|
|
The conf directory would need to be in the classpath. You would have a nutch-site.xml file, among others, in the conf directory. That file would need to specify the plugin.folders property with a value of plugins. Or you would need to have the nutch-default.xml file in the conf directory, which by default has the correct value for plugin.folders.

The parent of the plugins folder would need to be in your classpath. You would also need to specify a searcher.dir property pointing either to the absolute path of the parent of your indexes directory or to a directory that contains search-servers.txt.

Please give more info about your layout, the conf directory, the files in it, and the version of Nutch you are using.

Dennis |
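To see what a given nutch-site.xml or nutch-default.xml actually says about plugin.folders or searcher.dir, the file can be read with the JDK's XML parser. The sketch assumes the usual configuration/property/name/value layout of these files; the class name SiteXmlLookup, the method lookup, and the sample value /path/to/crawl are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteXmlLookup {

    /** Returns the value of the first property whose name matches the key, or null. */
    static String lookup(InputStream xml, String key) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if (key.equals(text(prop, "name"))) {
                return text(prop, "value");
            }
        }
        return null;
    }

    /** Trimmed text of the first child element with the given tag, or null. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Inline sample standing in for conf/nutch-site.xml.
        String sample = "<configuration>"
                + "<property><name>plugin.folders</name><value>plugins</value></property>"
                + "<property><name>searcher.dir</name><value>/path/to/crawl</value></property>"
                + "</configuration>";
        for (String key : new String[] {"plugin.folders", "searcher.dir"}) {
            InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
            System.out.println(key + " = " + lookup(in, key));
        }
    }
}
```

Pointing lookup at a FileInputStream for the real file instead of the inline sample would show the values your installation defines.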
|
Hi,
I am using version 0.9 of Nutch. The layout is:

$NUTCH_HOME: /media/sda1/linux/java/nutch-0.9
conf folder: /media/sda1/linux/java/nutch-0.9/conf (contains the XML files)
plugins: /media/sda1/linux/java/nutch-0.9/plugins

The conf directory is in the classpath, and the plugin.folders property is fine: I have assigned the absolute path of the plugins folder to it.

The parent of the plugins folder, i.e. the Nutch home folder, is in the classpath. I have specified the absolute path of my crawl directory (media/sda1/linux/java/nutch-0.9/bigcrawl) in the searcher.dir property in the nutch-site.xml file.

What else am I supposed to do?

P.S. When I try to run it, the first line is:
08/02/06 00:13:09 INFO searcher.NutchBean: opening merged index in crawl/index
Is crawl/index the right thing, or should it show the absolute path to the search directory, or bigcrawl/index?
|
|
You have not added nutch-default.xml and nutch-site.xml to your Configuration object. Adding the following two lines to your code should solve the problem:

    conf.addDefaultResource("nutch-default.xml");
    conf.addDefaultResource("nutch-site.xml");

Regards,
Susam Pal

On Feb 6, 2008 12:17 AM, devj <[hidden email]> wrote:
> [...]
> What else am I supposed to do?
>
> --
> View this message in context: http://www.nabble.com/Urgent-help-reqd.....plz-tp15284835p15296837.html
> Sent from the Nutch - User mailing list archive at Nabble.com. |
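Why a freshly constructed configuration object knows nothing about plugin.folders can be shown with a toy model. The sketch uses java.util.Properties from the JDK, not Hadoop's Configuration class, so it only illustrates the idea of layering a site file over a defaults file; the class name LayeredConfigDemo, the method layered, and the value /path/to/crawl are hypothetical:

```java
import java.util.Properties;

public class LayeredConfigDemo {

    /** Builds a view in which site values override defaults. */
    static Properties layered(Properties defaults, Properties site) {
        // Keys missing from the merged table fall back to the defaults.
        Properties merged = new Properties(defaults);
        merged.putAll(site);
        return merged;
    }

    public static void main(String[] args) {
        // Nothing loaded: the key is simply absent.
        Properties empty = new Properties();
        System.out.println("empty:   " + empty.getProperty("plugin.folders"));

        // Stand-ins for nutch-default.xml and nutch-site.xml.
        Properties defaults = new Properties();
        defaults.setProperty("plugin.folders", "plugins");
        Properties site = new Properties();
        site.setProperty("searcher.dir", "/path/to/crawl");

        Properties conf = layered(defaults, site);
        System.out.println("layered: " + conf.getProperty("plugin.folders"));
        System.out.println("layered: " + conf.getProperty("searcher.dir"));
    }
}
```

The two addDefaultResource calls quoted above play the role of loading the defaults and site tables in this model.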
|
Good catch, Susam. Instead of including the files directly through the add methods, an easier way would be this:

    Configuration conf = NutchConfiguration.create();

Dennis |
|
In reply to this post by Susam Pal
Hi,
So it finally worked. Thanks, Susam, for the two lines; now I know what the conf object is for. But now there is another problem: I keep getting a class-not-found error for 'org/apache/commons/cli/ParseException'. I downloaded the Commons source package, built it, and added the necessary jar file to the classpath. Strangely enough, it still doesn't work!
|
|
All of the jar files you need, including the Commons CLI one, are already in the lib directory. If you have a complete Nutch install, an easier way to run your class might be:

bin/nutch CLASSNAME

When you do that, the nutch shell script sets up the environment for you, including conf, plugins, lib, etc. You would still have to add the nutch-default and nutch-site files to your configuration, or use NutchConfiguration within your code.

Dennis
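For anyone hitting the same class-not-found problem, here is a minimal sketch that lists each classpath entry and checks whether a given class can be loaded. It assumes nothing about Nutch itself and uses only standard JDK calls; the class name ClasspathCheck and its isLoadable helper are hypothetical names made up for this illustration.

```java
import java.io.File;

public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class from the error message in this thread
        String name = args.length > 0 ? args[0] : "org.apache.commons.cli.ParseException";

        // Print every classpath entry and whether it actually exists on disk
        String classpath = System.getProperty("java.class.path");
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue; // an empty entry stands for the current directory
            }
            String status = new File(entry).exists() ? "ok      " : "MISSING ";
            System.out.println(status + entry);
        }

        System.out.println(name + (isLoadable(name) ? " is loadable" : " is NOT on the classpath"));
    }
}
```

Run it with the same CLASSPATH as SearchApp, for example: java ClasspathCheck org.apache.commons.cli.ParseException. An entry reported as MISSING (a mistyped jar path, say) would be one possible reason a jar you added is not being picked up.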
|
In reply to this post by Dennis Kubes-2
Hi,
This time there is a new problem. I have this error:

umakant@umakant-desktop:/media/sdb9/IRNLP.dont.touch$ java SearchApp Kurt
08/02/07 21:33:58 INFO searcher.NutchBean: opening indexes in crawl/indexes
08/02/07 21:33:58 WARN plugin.PluginRepository: Plugins: not a file: url. Can't load plugins from: jar:file:/media/sdb9/IRNLP.dont.touch/nutch-0.9/build/nutch-0.9.job!/plugins
08/02/07 21:33:58 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
08/02/07 21:33:58 INFO plugin.PluginRepository: Registered Plugins:
08/02/07 21:33:58 INFO plugin.PluginRepository: NONE
08/02/07 21:33:58 INFO plugin.PluginRepository: Registered Extension-Points:
08/02/07 21:33:58 INFO plugin.PluginRepository: NONE
Exception in thread "main" java.lang.RuntimeException: org.apache.nutch.searcher.QueryFilter not found.
    at org.apache.nutch.searcher.QueryFilters.<init>(QueryFilters.java:60)
    at org.apache.nutch.searcher.IndexSearcher.init(IndexSearcher.java:79)
    at org.apache.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:63)
    at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:140)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:106)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:84)
    at SearchApp.main(SearchApp.java:24)

The classpath variable is:
:/media/sdb9/IRNLP.dont.touch/lucene-2.2.0/lucene-demos-2.2.0.jar:/media/sdb9/IRNLP.dont.touch/lucene-2.2.0/lucene-core-2.2.0.jar:/media/sdb9/IRNLP.dont.touch/hadoop-0.14.4/hadoop-0.14.4-core.jar:/media/sdb9/IRNLP.dont.touch/nutch-0.9/build/classes:/media/sdb9/IRNLP.dont.touch/hadoop-0.14.4/lib/commons-logging-1.0.4.jar:/media/sdb9/IRNLP.dont.touch/hadoop-0.14.4/lib/log4j-1.2.13.jar:/media/sdb9/IRNLP.dont.touch/nutch-0.9/conf:/media/sdb9/IRNLP.dont.touch/nutch-0.9/nutch-0.9.jar:/media/sdb9/IRNLP.dont.touch/nutch-0.9/build/nutch-0.9.job:
|
|
Currently you can't load the plugins from a jar file (the job file is a jar file). You would need to unzip the job file, or just its plugins directory, into your classpath. I am currently working on a patch to allow plugins to be loaded from a jar file; ETA is less than a week.

Dennis

devj wrote:
> Dennis Kubes-2 wrote:
>> Good catch Susam. Instead of including the files directly through add
>> methods, an easier way would be this:
>>
>> Configuration conf = NutchConfiguration.create();
>>
>> Dennis
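Since a job file is an ordinary jar, the unzip step Dennis describes can be done with any zip tool. As a minimal sketch in Java, using only the standard java.util.zip API: the class name ExtractPlugins and its extract method are hypothetical, and the assumption that the plugin files sit under a top-level plugins/ entry is taken from the jar:file:...nutch-0.9.job!/plugins URL in the warning above.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ExtractPlugins {

    /**
     * Copies every entry under "plugins/" from the archive into destDir,
     * keeping the "plugins/..." layout. Returns the number of files written.
     */
    public static int extract(File archive, File destDir) throws IOException {
        Path root = destDir.toPath().toAbsolutePath().normalize();
        int count = 0;
        try (ZipFile zip = new ZipFile(archive)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.getName().startsWith("plugins/")) {
                    continue; // skip classes, lib jars and everything else
                }
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "plugins/../../x" that would land outside destDir
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: ExtractPlugins <job-or-jar-file> <target-dir>");
            System.exit(1);
        }
        int n = extract(new File(args[0]), new File(args[1]));
        System.out.println("Extracted " + n + " plugin files into " + args[1]);
    }
}
```

After running something like java ExtractPlugins nutch-0.9/build/nutch-0.9.job some/dir, the directory some/dir (the parent of the extracted plugins folder) would go on the classpath in place of the .job file, in line with the earlier advice in this thread to include the parent of the plugins folder.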
|
Hi,
I compiled my SearchApp code into the nutch directory and ran it from there, and it worked! Thanks, Dennis and Susam!
|
