problem with nutch 0.7 and text file

problem with nutch 0.7 and text file

Marc DELERUE-2
I use Nutch 0.7 for better recognition of XLS files. But now I get this error, which I didn't get under version 0.6:

 

http://XXX.XXX.XXX.XXX/tostaky.txt org.apache.nutch.util.mime.MimeTypeException: Invalid Sub Type plain; charset=iso-8859-1

I get the same kind of error with HTML files: ...org.apache.nutch.util.mime.MimeTypeException: Invalid Sub Type html: charset=iso-8859-1.

 

Has anybody run into the same problem, and if so, how can I 'repair' my crawl?

Thank you all

Regards.

 

Marc Delerue


 

\_@< plop !

 

Re: problem with nutch 0.7 and text file

Jérôme Charron
>
> http://XXX.XXX.XXX.XXX/tostaky.txt org.apache.nutch.util.mime.MimeTypeException:
> Invalid Sub Type plain; charset=iso-8859-1
>
> I get the same kind of error with html files :
> ...org.apache.nutch.util.mime.MimeTypeException: Invalid Sub Type html:
> charset=iso-8859-1.
>
> Did somebody meet the same problem, and in this case, how to do to
> 'repair' my crawl.


This is due to a bug in the patch I submitted for content-type detection.
I will fix it as soon as possible. Could you please create an issue in JIRA?
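
For illustration only: the error messages suggest that the subtype being validated still carries the "; charset=..." parameter from the Content-Type header. A minimal sketch of stripping such parameters before the MIME type is parsed could look like the following. This is not the actual Nutch patch; the class name ContentTypeSketch and the method stripParameters are hypothetical and only show the general idea.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: reduces a Content-Type header
// value to its bare "type/subtype" form before any MIME type parsing.
class ContentTypeSketch {

    /** Returns the part of the header value before the first ';', trimmed and lower-cased. */
    static String stripParameters(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameters such as charset follow the first semicolon.
        int semi = contentType.indexOf(';');
        String bare = (semi >= 0) ? contentType.substring(0, semi) : contentType;
        return bare.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(stripParameters("text/plain; charset=iso-8859-1")); // prints text/plain
        System.out.println(stripParameters("text/html; charset=iso-8859-1"));  // prints text/html
    }
}
```

With the parameters removed, the subtype handed to a MIME type parser would be "plain" or "html" rather than "plain; charset=iso-8859-1".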

Jerome

--
http://motrech.free.fr/
http://frutch.free.fr/