to find if a url is present in the nutch master index



Rozina Sorathia

 

Hi,

I wrote a class that calls the public Page getPage(String url) method of DistributedWebDBReader, passing it the URL of the Lucene website.

While debugging, I found that the DistributedWebDBReader constructor gets stuck in the following sleep loop:

public DistributedWebDBReader(NutchFileSystem nfs, File root) throws IOException, FileNotFoundException {
        //
        // Get the current db from the given nutchfs.  It consists
        // of a bunch of directories full of files.
        //
        this.root = root;
        this.dbDir = new File(new File(root, "standard"), "webdb");

        //
        // Wait until the webdb is complete, by waiting till a given
        // file exists.
        //
        File dirIsComplete = new File(dbDir, "dbIsComplete");
        while (! nfs.exists(dirIsComplete)) {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException ie) {
            }
        }

 

Execution never leaves this loop; it just keeps sleeping. Can anyone tell me what the problem is?
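For reference, the loop only exits once the file <root>/standard/webdb/dbIsComplete exists, so one way to narrow this down is to print that path for the root being passed in and see whether the file is there. The following is a minimal sketch, not Nutch code: the class name MarkerCheck and the method markerFor are hypothetical, and it assumes the db sits on the local filesystem (it uses plain java.io.File rather than NutchFileSystem).

```java
import java.io.File;

// Hypothetical helper, not part of Nutch.
class MarkerCheck {

    // Builds the same path the constructor polls:
    // <root>/standard/webdb/dbIsComplete
    static File markerFor(File root) {
        File dbDir = new File(new File(root, "standard"), "webdb");
        return new File(dbDir, "dbIsComplete");
    }

    public static void main(String[] args) {
        // Assumption: the db root is given as the first argument,
        // defaulting to the current directory.
        File root = new File(args.length > 0 ? args[0] : ".");
        File marker = markerFor(root);
        System.out.println(marker.getPath()
                + (marker.exists() ? " exists" : " is missing"));
    }
}
```

If the file is reported missing, the root passed to the constructor may not be the directory that actually contains standard/webdb, or the db may not have been completely written.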

What I actually want to do is check whether a given URL is present in Nutch's master index.

 

 

Thanks and regards,

Rozina Sorathia,

[hidden email]

 

 

Phone No. (020) 5652 5000 

Ext. :2206