Ampersand issue making me go nuts!

Marcus Herou
Hi, turning to the mailing list since I cannot find any similar case by
googling.

We have trouble searching when the "&" character is included in the search
query, for example description:"h&m", "m&m", etc.

The server setup looks like this.

We run apache-solr-1.3.0-RC2.war on all machines (same issues with earlier
versions, so it seems to be unrelated to the version).

server1 = Master, write-only.
java version "1.6.0_03"
Linux 2.6.22-14-server #1 SMP Tue Feb 12 08:27:05 UTC 2008 i686 GNU/Linux
feisty

replicaX = Read-only, gets an optimized index from the master early every
morning.
Linux 2.6.20-16-server #2 SMP Tue Dec 18 05:52:19 UTC 2007 i686 GNU/Linux
edgy - yes I know, old as hell...

Fresh replica:
Linux 2.6.24-19-server #1 SMP Fri Jul 11 21:50:43 UTC 2008 x86_64 GNU/Linux
hardy

The strange thing is that the query works on both my local machine(s) and on
the master.

I thought it could be some Linux locale issue or similar, since I had related
issues in the past with non-ASCII chars (UTF-8 issues). So instead of
inspecting every aspect of the machine I thought a good idea would be to
first try the known working index on a fresh box.

We copied the master index to a fresh Linux box with a fresh Tomcat
installation (the others are using Resin) but we get the same disappointing
results as with the first slave's Resin! Just to mention, UTF-8 encoding
errors are most probably not the issue, since we are hosting webapps which
use UTF-8 and they work perfectly.

OK... next test: I created a little test case which uses plain Lucene to
read and search in the index; this works perfectly on all machines.
Next test: try the SolrServer Java client. Same results as with a
command-line GET: it works on the master but not on the slaves.
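
One thing worth ruling out when testing with a plain HTTP GET: a raw "&" in a
URL separates parameters, so it has to be percent-encoded inside the q
parameter. The sketch below only illustrates that effect with Python's
standard library; the base URL (host, port, path) is an assumption for
illustration and does not come from this thread.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical select URL; host, port and path are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(query):
    """Return a select URL with the query string percent-encoded."""
    return SOLR_SELECT + "?" + urlencode({"q": query})


# Unencoded: "&" is read as a parameter separator, so q is cut short.
raw = SOLR_SELECT + '?q=description:"h&m"'
# Encoded: "&" becomes %26 and the whole query stays inside q.
encoded = build_query_url('description:"h&m"')

raw_params = parse_qs(raw.split("?", 1)[1])
encoded_params = parse_qs(encoded.split("?", 1)[1])

print(raw_params)
print(encoded_params)
```

If the encoded form fails on one server but works on another, as it does here,
the URL handling is unlikely to be the culprit.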

Finally I downloaded the index to my laptop and reran the same tests there,
where they work perfectly.


My conclusion is that something must be going on when the webapp interprets
the input from the clients, and that it is affected by the server setup
in some way. Oh, and by the way, other searches work just fine; non-ASCII
chars like Swedish åäö etc. work like a charm.

Help is really appreciated.

Kindly

//Marcus

Re: Ampersand issue making me go nuts!

Yonik Seeley-2
On Sun, Sep 7, 2008 at 7:50 AM, Marcus Herou <[hidden email]> wrote:
> The strange thing is that the query works on both my local machine(s) and on
> the master.

- Check and ensure that the schema is the same on all the systems.
- Check to see that Solr is reading the index you think it is by doing
a simple query on the ID (or anything else) to try and retrieve the
document that contains "m&m" (or simply check the index version of
both via the admin stats page)

If the first two don't shed some light, then try the admin analysis
page with "m&m" on both the master and slave to see if it works
differently (it shouldn't if the schema is the same).

-Yonik
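
The first check above (same schema everywhere) can be sketched as a checksum
comparison. This is a minimal sketch, assuming each machine's schema file has
already been copied to a local path; the function names and the example paths
are hypothetical, not part of Solr.

```python
import hashlib
import sys
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def group_by_digest(paths):
    """Map each digest to the list of paths whose content has that digest."""
    groups = {}
    for path in paths:
        groups.setdefault(file_digest(path), []).append(str(path))
    return groups


if __name__ == "__main__":
    # Example (hypothetical paths): python compare.py master.xml slave1.xml
    groups = group_by_digest(sys.argv[1:])
    if len(groups) > 1:
        print("Schemas differ:")
    for digest, members in groups.items():
        print(digest[:12], members)
```

More than one group in the result means at least one machine runs a different
schema than the others.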

Re: Ampersand issue making me go nuts!

Marcus Herou
Hi Yonik.

I am so embarrassed! The schema files were totally different. Actually the
slave was the master back in the day and had a WordDelimiterFilterFactory
configured. I wouldn't be too surprised if words are split on "&" in that
one. As I said, embarrassingly simple... But good! You can't imagine how many
hours I've put into this, so thanks again! A 1 minute solution :)

Kindly

//Marcus
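
For anyone finding this thread later, here is a rough illustration of why such
a filter would break "h&m". It is a simplified emulation that splits on
non-alphanumeric characters, not Solr's actual WordDelimiterFilterFactory
(which has many more options); both function names are made up for the
example.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace only, so "h&m" stays a single token."""
    return text.split()


def delimiter_split_tokens(text):
    """Additionally split each token on runs of non-alphanumeric characters."""
    tokens = []
    for word in text.split():
        # [\W_]+ matches anything that is not a Unicode letter or digit.
        tokens.extend(part for part in re.split(r"[\W_]+", word) if part)
    return tokens


print(whitespace_tokens("h&m"))
print(delimiter_split_tokens("h&m"))
print(delimiter_split_tokens("åäö m&m"))
```

With the second behaviour on one machine and not on the others, the same query
can match on the master and miss on the slave, while words like åäö are
unaffected.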






On Sun, Sep 7, 2008 at 3:10 PM, Yonik Seeley <[hidden email]> wrote:

> On Sun, Sep 7, 2008 at 7:50 AM, Marcus Herou <[hidden email]> wrote:
> > The strange thing is that the query works on both my local machine(s) and on
> > the master.
>
> - Check and ensure that the schema is the same on all the systems.
> - Check to see that Solr is reading the index you think it is by doing
> a simple query on the ID (or anything else) to try and retrieve the
> document that contains "m&m" (or simply check the index version of
> both via the admin stats page)
>
> If the first two don't shed some light, then try the admin analysis
> page with "m&m" on both the master and slave to see if it works
> differently (it shouldn't if the schema is the same).
>
> -Yonik
>



--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[hidden email]
http://www.tailsweep.com/
http://blogg.tailsweep.com/