Searching with Ö and Ä?

J B-2
Hello,

Is there anyone who can help me configure Nutch so that I can use it for
Swedish or German websites containing characters like "ö" and "ä"? Crawling
and indexing seem to work fine; it's just the searching that goes wrong.
When I enter a search string like "Köln", knowing that it appears in the
text, the result page says that there are no matching results, and the "ö" is
replaced by random characters...

I have searched the docs and the web, but I can't find the answer to my
problem.

Best regards,

Jon

P.S. Sorry if two versions of this message reached the list, I am quite new
to this...


Re: Searching with Ö and Ä?

Andrzej Białecki-2
J B wrote:

> Hello,
>
> Is there anyone who can help me configure Nutch so that I can use it for
> Swedish or German websites containing characters like "ö" and "ä"?
> Crawling and indexing seem to work fine; it's just the searching that
> goes wrong. When I enter a search string like "Köln", knowing that it
> appears in the text, the result page says that there are no matching
> results, and the "ö" is replaced by random characters...
>
> I have searched the docs and the web, but I can't find the answer to my
> problem.


The characters are not random: they correspond to the URL-encoding of a
UTF-8 encoding of Latin-1 characters, whereas they should be the
URL-encoding of a UTF-8 encoding of UTF-8 characters.

;-)

For the US-Ascii range each of the above gives the same result, but for
all other characters it gives wrong results.
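
To make the mismatch concrete, here is a minimal Java sketch of one way such "random" characters can arise. It assumes the UTF-8 bytes of the query are misread as Latin-1 somewhere along the way and then URL-encoded again as UTF-8. The class and method names are hypothetical and are not part of Nutch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Simulates a component that reads UTF-8 bytes as if they were Latin-1
    // (an assumption about where the mismatch happens).
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // URL-encodes the UTF-8 bytes of the given string.
    static String urlEncodeUtf8(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String query = "K\u00f6ln"; // "Köln", written with an escape to avoid source-encoding issues

        // Correct: URL-encoding of the UTF-8 bytes of the original text.
        System.out.println(urlEncodeUtf8(query)); // K%C3%B6ln

        // Wrong: the UTF-8 bytes were misread as Latin-1 before re-encoding.
        System.out.println(urlEncodeUtf8(misreadUtf8AsLatin1(query))); // K%C3%83%C2%B6ln
    }
}
```

The "K", "l" and "n" survive unchanged in both cases, which matches the observation that the US-ASCII range is unaffected.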

Please make sure that you set the page encoding to UTF-8 in your JSPs and
HTML pages, and preferably set the same encoding as the default character
encoding somewhere in the configuration of your servlet engine. As the old
hands say: "choose UTF-8 and stick to it religiously".

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

RE: Searching with Ö and Ä?

Chirag Chaman
Jon,

You'll need to set encoding to UTF-8.
We don't use the default Nutch JSP pages, so I'm not sure if they have it or
not, but here's the simplified process.

1. Make sure your JSP files have something like this at the top:
<%@ page contentType="text/html; charset=utf-8" pageEncoding="utf-8" %>

2. Your Tomcat server.xml should have this attribute (URIEncoding="UTF-8") on the Connector:
     <Connector port="80"
               maxThreads="250" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               connectionTimeout="15000" disableUploadTimeout="180000"
               URIEncoding="UTF-8" useBodyEncodingForURI="false" />

This should take care of it.
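
As a rough illustration of what the URIEncoding setting changes, the Java sketch below decodes the same percent-encoded query value once as UTF-8 and once as ISO-8859-1. It uses the standard java.net.URLDecoder rather than Tomcat's own decoding code, so treat it as an approximation of the behaviour, not a copy of it. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryDecodingDemo {

    // Decodes a percent-encoded query value using the named charset.
    static String decode(String rawQueryValue, String charsetName) {
        try {
            return URLDecoder.decode(rawQueryValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // What a browser sends for "Köln" when the search page is served as UTF-8.
        String raw = "K%C3%B6ln";

        // Decoded as UTF-8, the two bytes C3 B6 become the single character "ö".
        System.out.println(decode(raw, "UTF-8").equals("K\u00f6ln")); // true

        // Decoded as ISO-8859-1, each byte becomes its own character,
        // so the search term no longer matches the indexed text.
        System.out.println(decode(raw, "ISO-8859-1").equals("K\u00c3\u00b6ln")); // true
    }
}
```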

Regards,
CC

--------------------------------------------
Filangy, Inc.
Interested in Improving Search? Join our Team!
http://filangy.com/jointheteam.jsp 





RE: Searching with Ö and Ä?

J B-2
Chirag and Andrzej,

It worked, thank you very much for your help!

Best regards,

Jon

P.S. I'm really sorry about the duplicate messages; I didn't think the first
two had reached the list...
