invalid urls


Edward Quick

Hi,
 
When I run a crawl on our intranet (which runs on a Lotus Notes Domino server, hence the strange URLs), I get back a few error messages, most of them in the formats below.
 
fetch of http://planetba.baplc.com/general/aptrix/aptrix.nsf/AttachmentsByTitle/VideoJavaScript/$FILE/)){this.addVariable( failed with: java.lang.IllegalArgumentException: Invalid uri 'http://planetba.baplc.com/general/aptrix/aptrix.nsf/AttachmentsByTitle/VideoJavaScript/$FILE/)){this.addVariable(': escaped absolute path not valid
 
fetch of http://planetba.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home%5CPeople+%26+Training%5CAircraft+Maintenance+Training+%E2%80%93+A320+Single+Aisle+Family failed with: Http code=500, url=http://planetba.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home%5CPeople+%26+Training%5CAircraft+Maintenance+Training+%E2%80%93+A320+Single+Aisle+Family
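As a rough way to tell the two cases apart, here is a minimal sketch that checks whether each URL is syntactically valid. It uses the standard java.net.URI parser, which is an assumption on my part and may not be the same parser Nutch uses when fetching; the class and method names (UrlSyntaxCheck, isValidUri) are made up for illustration. The first URL is rejected because of the '{' character, while the second one parses, which would suggest the 500 comes from the server rather than from the URL syntax.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: checks URL syntax with java.net.URI.
public class UrlSyntaxCheck {

    // Returns true if java.net.URI can parse the string, false otherwise.
    static boolean isValidUri(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two URLs from the error messages above.
        String first = "http://planetba.baplc.com/general/aptrix/aptrix.nsf/"
                + "AttachmentsByTitle/VideoJavaScript/$FILE/)){this.addVariable(";
        String second = "http://planetba.baplc.com/general/aptrix/apteng.nsf/Content/"
                + "Engineering+Home%5CPeople+%26+Training%5CAircraft+Maintenance+Training"
                + "+%E2%80%93+A320+Single+Aisle+Family";

        System.out.println("first parses:  " + isValidUri(first));
        System.out.println("second parses: " + isValidUri(second));
    }
}
```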
 
Is there anything I can configure in Nutch to handle these URLs rather than filtering them out? They do appear to be legitimate pages.
 
Thanks for any help.
 
Rgds,
 
Ed.