Getting round bad behaviour in Lotus Domino


J S-18
Hi,

I'm crawling the Intranet at work, which runs on a Lotus Domino server. When
you request some URLs on the Intranet, Domino returns a status code 400, then
appends ?OpenDocument to the end of the URL, and the GET on this comes back
with a status code 200.

The problem is obviously with Domino, but I don't think it's something I can
fix easily (as it's not in my department). Nutch correctly thinks the URL
doesn't work so misses it out. However, I wondered if I could tailor the
code in Nutch to say (in pseudo code):

if (rc == 400) {
    // Domino rejected the bare URL: retry once with ?OpenDocument appended
    url = url + "?OpenDocument";
    rc = getURL(url);
}
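
To make the idea concrete, here is a rough standalone sketch of that retry in plain Java. It is not Nutch code: the class name DominoRetry, the StatusFetcher interface and the method names are all made up for illustration, and it only uses java.net.HttpURLConnection from the JDK. It also assumes that a URL which already has a query string should be left alone.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Nutch: illustrates the 400 -> ?OpenDocument retry.
class DominoRetry {

    /** Hypothetical abstraction: returns the HTTP status code of a GET on the URL. */
    interface StatusFetcher {
        int status(String url) throws IOException;
    }

    /** Appends ?OpenDocument unless the URL already carries a query string (assumption). */
    static String withOpenDocument(String url) {
        return url.contains("?") ? url : url + "?OpenDocument";
    }

    /**
     * Returns the URL to crawl: the original one, or its ?OpenDocument variant
     * if the original answered 400 and the variant answered 200.
     */
    static String resolve(String url, StatusFetcher fetcher) throws IOException {
        int rc = fetcher.status(url);
        if (rc != HttpURLConnection.HTTP_BAD_REQUEST) {
            return url; // anything other than 400 is handled as usual
        }
        String retry = withOpenDocument(url);
        if (!retry.equals(url) && fetcher.status(retry) == HttpURLConnection.HTTP_OK) {
            return retry;
        }
        return url; // the retry did not help, so keep the original URL
    }

    /** Real fetcher built on the JDK's HttpURLConnection. */
    static int httpStatus(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DominoRetry <url> [<url> ...]
        for (String url : args) {
            System.out.println(resolve(url, DominoRetry::httpStatus));
        }
    }
}
```

Passing the fetcher in as a parameter means the retry logic can be tried out with a fake in place of the real server. Inside Nutch, the equivalent check would have to live wherever the HTTP response code is evaluated.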

Could anyone point me to the relevant java file which I would need to update
to achieve this?

Many thanks,

JS.



Re: Getting round bad behaviour in Lotus Domino

Stefan Groschupf-2
Have a look at the HTTP protocol plugin.
HTH
Stefan
(In reply to J S's message of 18.06.2005, 08:32.)

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net



Re: Getting round bad behaviour in Lotus Domino

J S-18
Stefan,

Thanks for your reply. That was a great help.

Cheers,

JS.
