to find if a url is present in the nutch master index


Rozina Sorathia


Hi,

I tried writing a class that calls the public Page getPage(String url) method of DistributedWebDBReader, and I passed it the Lucene website's URL.

On debugging, I find that it goes into a sleep loop in the constructor, as follows:

public DistributedWebDBReader(NutchFileSystem nfs, File root) throws IOException, FileNotFoundException {
    // Get the current db from the given nutchfs.  It consists
    // of a bunch of directories full of files.
    this.root = root;
    this.dbDir = new File(new File(root, "standard"), "webdb");

    // Wait until the webdb is complete, by waiting till a given
    // file exists.
    File dirIsComplete = new File(dbDir, "dbIsComplete");
    while (! nfs.exists(dirIsComplete)) {
        try {

        } catch (InterruptedException ie) {

It sleeps every time in the above loop. Can anyone tell me what the problem is?
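
Going only by the constructor quoted above, the loop keeps waiting until a file named "dbIsComplete" exists under <root>/standard/webdb, so one thing worth checking is whether that marker file is actually present under the root directory being passed in. The following is a minimal sketch of such a check, assuming the db lives on the local filesystem (the quoted code goes through NutchFileSystem instead). The class name WebDbMarkerCheck and its methods are hypothetical and not part of Nutch.

```java
import java.io.File;

// Hypothetical helper, not part of Nutch: mirrors the path logic of the
// quoted constructor using plain java.io.File on the local filesystem.
public class WebDbMarkerCheck {

    // Builds <root>/standard/webdb/dbIsComplete, as in the quoted constructor.
    static File markerFor(File root) {
        File dbDir = new File(new File(root, "standard"), "webdb");
        return new File(dbDir, "dbIsComplete");
    }

    // True if the marker file the constructor waits for is present.
    static boolean isComplete(File root) {
        return markerFor(root).exists();
    }

    public static void main(String[] args) {
        // Assumption: the first argument is the same root directory that
        // would be handed to the reader's constructor.
        File root = new File(args.length > 0 ? args[0] : ".");
        File marker = markerFor(root);
        if (isComplete(root)) {
            System.out.println("Marker found: " + marker.getPath());
        } else {
            System.out.println("Marker missing: " + marker.getPath());
        }
    }
}
```

If the marker is reported missing, the root path may point at the wrong directory, or the db may not have been completely written. Either case would be consistent with the loop never exiting.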

What I actually want to do is check whether a given URL is present in Nutch's master index.



 Thanks and regards,

Rozina Sorathia,

[hidden email]



Phone No. (020) 5652 5000 

Ext. :2206