Can I generate nutch index without crawling?

scott green
Hi,

I am debugging the Nutch searcher and wondering: can I generate a Nutch
index without crawling? If so, can you give me some hints? Thanks.

Re: Can I generate nutch index without crawling?

Sean Dean-3
What exactly are you looking to do?
 
If you don't crawl for anything, then what data are you looking to index?
 
You can certainly take a Nutch segment that someone else has crawled and then index it yourself, on your own machines.

Re: Can I generate nutch index without crawling?

The Golden Condor !
If you have pre-downloaded pages on the local file system that you'd
like to index, the process is not that simple. You'd need to change
the fetcher code so that it opens and loads a local file instead of
opening an HTTP socket. Be careful not to simply switch the download
protocol to file://: while that protocol is supported, you will get
undesirable results for in-link and anchor text information.
Substituting http://blah with file://path/to/blah makes Nutch believe
that file://path/to/blah is the URL of the page, which is likely not
what you want.
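
To make the idea concrete, here is a minimal sketch of the lookup such a modified fetcher would need: the page keeps its original http:// URL, and only the bytes come from disk. Everything in it is an assumption for illustration. LocalPageLoader is a hypothetical helper, not a Nutch class, and the <mirrorRoot>/<host>/<path> layout is just one possible way to store the pre-downloaded pages. Wiring it into the actual fetcher is not shown.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Nutch): resolves a page's original
// http:// URL to a pre-downloaded copy on the local file system.
class LocalPageLoader {
    private final Path mirrorRoot;

    LocalPageLoader(Path mirrorRoot) {
        this.mirrorRoot = mirrorRoot.toAbsolutePath().normalize();
    }

    // Assumed layout: http://host/some/page.html -> <mirrorRoot>/host/some/page.html
    Path toLocalPath(String url) {
        URI uri = URI.create(url);
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (path.endsWith("/")) {
            path += "index.html"; // assumed name for directory-style URLs
        }
        Path resolved = mirrorRoot.resolve(uri.getHost())
                .resolve(path.substring(1))
                .normalize();
        // Refuse paths that escape the mirror directory via "..".
        if (!resolved.startsWith(mirrorRoot)) {
            throw new IllegalArgumentException("URL escapes mirror root: " + url);
        }
        return resolved;
    }

    // Returns the stored bytes; the caller keeps using the http:// URL
    // as the page's identity.
    byte[] load(String url) throws IOException {
        return Files.readAllBytes(toLocalPath(url));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("mirror");
        Path page = root.resolve("example.com").resolve("docs").resolve("a.html");
        Files.createDirectories(page.getParent());
        Files.write(page, "<html>hello</html>".getBytes(StandardCharsets.UTF_8));

        String url = "http://example.com/docs/a.html";
        byte[] content = new LocalPageLoader(root).load(url);
        System.out.println(url + " -> " + content.length + " bytes read from disk");
    }
}
```

Because the URL recorded for the page stays http://example.com/docs/a.html, links between pages still point at the names other pages use for them.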

Cheers,

TAA

Re: Can I generate nutch index without crawling?

scott green
In reply to this post by Sean Dean-3
Hi,

My requirement is only for debugging. Let some pseudocode show exactly what I need:

//File content = new File(....)
String[] content = new String[]{".....", "......."};
debugTool.generateIndex(content);


Re: Can I generate nutch index without crawling?

Enis Soztutar
Nutch uses Lucene for indexing. You can use the Lucene API to create an
index from any content:
1. Open an IndexWriter.
2. Create documents.
3. Add the documents to the index.
4. Close the IndexWriter.
5. Optionally, open the index with an IndexReader.


For details, see the Lucene API documentation or the book Lucene in Action.
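
The five steps above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the exact code for this thread: it uses the API of a recent Lucene release (8.x or 9.x, lucene-core on the classpath), which differs from the Lucene versions bundled with Nutch in 2007. The class name DebugIndexSketch, the generateIndex method (named after the pseudocode earlier in the thread) and the field name "content" are made up for illustration. The Nutch searcher expects the fields that Nutch's own indexer writes, so an index built this way may need matching fields before the searcher can use it; that part is not shown.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Illustrative only: builds a small in-memory Lucene index from strings.
class DebugIndexSketch {

    static Directory generateIndex(String[] contents) throws IOException {
        Directory dir = new ByteBuffersDirectory(); // in-memory; use FSDirectory for disk
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());

        // Step 1: open an index writer (step 4, closing it, happens when
        // the try-with-resources block exits and commits the changes).
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            for (String text : contents) {
                // Step 2: create a document with one analyzed, stored field.
                Document doc = new Document();
                doc.add(new TextField("content", text, Field.Store.YES));
                // Step 3: add the document to the index.
                writer.addDocument(doc);
            }
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Directory dir = generateIndex(new String[] {
            "Nutch uses Lucene for indexing",
            "a second page used only for debugging"
        });

        // Step 5: open the index with a reader and run a simple term query.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            System.out.println("documents: " + reader.numDocs());
            IndexSearcher searcher = new IndexSearcher(reader);
            // StandardAnalyzer lowercases tokens, so the term is lowercase.
            int hits = searcher.count(new TermQuery(new Term("content", "lucene")));
            System.out.println("hits for 'lucene': " + hits);
        }
    }
}
```

To write the index to disk instead, FSDirectory.open(path) can replace the in-memory directory.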