Fetcher File Error 404 when crawling through file system


Bruno-2

Hi,

I am trying to configure a recent Nutch (0.8+) to fetch
directly from the file system instead of over HTTP, which is fairly slow. The
fetcher hits a 404 - File not found (see below). When I copy the
file:/// URL into lynx, the file is found without any problems.

2006-09-15 10:29:57,739 INFO  fetcher.Fetcher - fetching
file:///mnt/smbfs/hollywood/projects/Telstra/Keystone\ -\
Leapfrog/Keystone/Architecture/Archives/info.txt
2006-09-15 10:29:57,746 INFO  fetcher.Fetcher - fetch of
file:///mnt/smbfs/hollywood/projects/Telstra/Keystone\ -\
Leapfrog/Keystone/Architecture/Archives/info.txt failed with:
org.apache.nutch.protocol.file.FileError: File Error: 404

Has anybody run into a similar problem or, better, found a resolution?

Cheers, Bruno

