Query with literal quote character: 6'2"

Query with literal quote character: 6'2"

Walter Underwood, Netflix
We have a movie with this title: 6'2"

I can get that string indexed, but I can't get it through the query
parser and into DisMax. It goes through the analyzers fine. I can
run the analysis tool in the admin interface and get a match with
that exact string.

These variants don't work:

6'2"
6'2\"
6\'2\"

Any ideas? I'm still running 1.1. Been a bit busy to plan the upgrade.

wunder

Re: Query with literal quote character: 6'2"

Yonik Seeley-2
On Feb 7, 2008 12:24 PM, Walter Underwood <[hidden email]> wrote:

> We have a movie with this title: 6'2"
>
> I can get that string indexed, but I can't get it through the query
> parser and into DisMax. It goes through the analyzers fine. I can
> run the analysis tool in the admin interface and get a match with
> that exact string.
>
> These variants don't work:
>
> 6'2"
> 6'2\"
> 6\'2\"
>
> Any ideas? I'm still running 1.1. Been a bit busy to plan the upgrade.

I confirmed this behavior in trunk with the following query:
http://localhost:8983/solr/select?qt=dismax&q=6'2"&debugQuery=on&qf=cat&pf=cat

The result is that the double quote is dropped:
+DisjunctionMaxQuery((cat:6'2)~0.01) DisjunctionMaxQuery((cat:6'2)~0.01)

This seems like it's a bug (rather than by design), but I could be
wrong... Hoss?

-Yonik

RE: Query with literal quote character: 6'2"

Lance Norskog-2
In reply to this post by Walter Underwood, Netflix
Some people loathe UTF-8 and do all of their text in XML entities. This
might work better for your punctuation needs.  But it still won't help you
with Prince :)

Re: Query with literal quote character: 6'2"

hossman
In reply to this post by Yonik Seeley-2

: I confirmed this behavior in trunk with the following query:
: http://localhost:8983/solr/select?qt=dismax&q=6'2"&debugQuery=on&qf=cat&pf=cat
:
: The result is that the double quote is dropped:
: +DisjunctionMaxQuery((cat:6'2)~0.01) DisjunctionMaxQuery((cat:6'2)~0.01)
:
: This seems like it's a bug (rather than by design), but I could be
: wrong... Hoss?

It was by design ... but it could be handled better.  the idea is that if
the input has balanced quotes (ie: an even number) then leave them alone
so they are dealt with as phrase delimiters.  If there is an odd number,
strip them out since we don't know whether they are a mistake (ie: an
unclosed phrase) or intended to be literal.
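
A rough Python sketch of the rule described above, for illustration only. It is not the Solr source, and the function name is made up for this example:

```python
def strip_unbalanced_quotes(query: str) -> str:
    """Leave balanced double quotes alone; drop all of them when the count is odd."""
    if query.count('"') % 2 == 0:
        # Even count: quotes stay and act as phrase delimiters.
        return query
    # Odd count: no way to tell a typo from a literal, so strip every quote.
    return query.replace('"', "")


print(strip_unbalanced_quotes('6\'2"'))
print(strip_unbalanced_quotes('"band of brothers"'))
```

With the movie title from the original post, the lone double quote is removed, which matches the `cat:6'2` seen in the debug output.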

auto-escaping them probably would have been a better way to go (ie: let
the analyzer decide whether or not to strip them) ... i'm not sure why i
didn't do that in the first place (I think at the time the lucene
QueryParser didn't deal with escaped quotes very well)

the thing to keep in mind, is that even if it did escape them, this still
wouldn't work if the user input were...

             the 6'2" man dating the 5'3" woman

...because it would assume the even number of double-quote characters means
that   " man dating the 5'3"  is a phrase.  i remember spending a day
going over query logs trying to figure out a good set of heuristic rules
for guessing when quote characters in user input should be interpreted as
phrase delims vs "inch" markers before a coworker smacked me and made me
realize it was a fairly intractable problem and simple rules would be
easier to understand anyway.
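
To make the ambiguity concrete, here is a small Python sketch that pairs double quotes left to right, the way a phrase-aware parser plausibly would. The helper name is hypothetical and the regex is a simplification, not how the Lucene QueryParser is implemented:

```python
import re


def quoted_phrases(query: str) -> list:
    """Return the text between each left-to-right pair of double quotes."""
    return re.findall(r'"([^"]*)"', query)


q = 'the 6\'2" man dating the 5\'3" woman'
print(quoted_phrases(q))
```

The only "phrase" found is the text between the two inch marks, which is exactly the misreading described above.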

FYI: this is all happening in
SolrPluginUtils.stripUnbalancedQuotes(CharSequence) which
DisMaxRequestHandler calls before passing the string to
DisjunctionMaxQueryParser.



-Hoss

Re: Query with literal quote character: 6'2"

Walter Underwood, Netflix
How about the query parser respecting backslash escaping? I need
free-text input, no syntax at all. Right now, I'm escaping every
Lucene special character in the front end. I just figured out that
it breaks for colon, can't search for "12:01" with "12\:01".

wunder

Re: Query with literal quote character: 6'2"

Walter Underwood, Netflix
In reply to this post by Lance Norskog-2
Huh? Queries come in through URL parameters and this is all ASCII
anyway. Even in XML, entities and UTF-8 decode to the same characters
after parsing.

The glyph formerly known as Prince belongs in the private use area,
of course.

wunder

Re: Query with literal quote character: 6'2"

hossman
In reply to this post by Walter Underwood, Netflix

: How about the query parser respecting backslash escaping? I need

one of the original design decisions was "no user escaping" ... be able
to take in raw query strings from the user with only '+' '-' and '"'
treated as special characters ... if you allow backslash escaping of those
characters, then by definition '\' becomes a special character too.

: free-text input, no syntax at all. Right now, I'm escaping every
: Lucene special character in the front end. I just figured out that
: it breaks for colon, can't search for "12:01" with "12\:01".

yeah ... your '\' character is being taken literally.  you shouldn't do
any escaping if you hand off to dismax.
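
A Python sketch of why front-end escaping backfires here, assuming dismax escapes every special character except '+', '-' and '"' before parsing. The character list below is an assumption made for illustration, not taken from the Solr source:

```python
# Assumed set of characters that get auto-escaped; '+', '-' and '"' are
# deliberately left out because dismax keeps them as operators.
ESCAPED = set('\\!():^[]{}~*?')


def partial_escape(query: str) -> str:
    """Prefix each character in ESCAPED with a backslash."""
    return "".join("\\" + c if c in ESCAPED else c for c in query)


print(partial_escape("12:01"))    # raw user input
print(partial_escape("12\\:01"))  # input already escaped by the front end
```

In the second call the front end's backslash is itself escaped, so the parser ends up looking for a literal backslash followed by a literal colon.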

the right thing to do is probably to expose more of the "query parsing"
stuff as options for the handler ... let people configure it with what
characters should be escaped, and what should be left alone.  We should
also stop using the static utility methods for things like partial
escaping and unbalanced quote stripping and start using helper methods
that subclasses can override.


-Hoss

RE: Query with literal quote character: 6'2"

Fuad Efendi
In reply to this post by Yonik Seeley-2
I have the same kind of queries working correctly on my site.

It's probably because I am using URL Escaping:
http://www.tokenizer.org/?q=6%272%22
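
For reference, a short Python sketch of the percent-encoding in that URL, using only the standard library. Note that decoding gives back exactly the original characters, so the encoding governs how the string travels over HTTP rather than what the query parser receives:

```python
from urllib.parse import quote, unquote

raw = "6'2\""
encoded = quote(raw, safe="")   # percent-encode everything that is not unreserved
print(encoded)

# The server-side decode restores the identical string.
print(unquote(encoded) == raw)
```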



Re: Query with literal quote character: 6'2"

Walter Underwood, Netflix
In reply to this post by hossman
Our users can blow up the parser without special characters.

  AND THE BAND PLAYED ON
  TO HAVE AND HAVE NOT

Lower-casing in the front end avoids that.
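
A minimal Python sketch of that kind of front-end workaround. It lower-cases only the words that look like boolean operators, which is a narrower variant than lower-casing the whole query; the helper name and the operator list are assumptions for illustration:

```python
OPERATOR_WORDS = {"AND", "OR", "NOT"}


def defuse_operators(query: str) -> str:
    """Lower-case bare AND/OR/NOT so they are treated as ordinary words."""
    return " ".join(
        word.lower() if word in OPERATOR_WORDS else word
        for word in query.split()
    )


print(defuse_operators("AND THE BAND PLAYED ON"))
print(defuse_operators("TO HAVE AND HAVE NOT"))
```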

We have auto-complete on titles, so there are plenty
of chances to inadvertently use special characters:

  Romeo + Juliet
  Airplane!
  Shrek (Widescreen)

We also have people type "--" for a dash in titles.

wunder

RE: Query with literal quote character: 6'2"

Fuad Efendi
This query works just fine: http://www.tokenizer.org/?q=Romeo+%2B+Juliet

%2B is the URL-encoded representation of +
It shows, for instance, [Romeo & Juliet] in the output.
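
A short Python sketch of the form-style encoding used in that URL, with the standard library only. It also shows what happens when the plus sign is left unencoded:

```python
from urllib.parse import quote_plus, unquote_plus

title = "Romeo + Juliet"
encoded = quote_plus(title)      # spaces become '+', a literal '+' becomes %2B
print(encoded)
print(unquote_plus(encoded))     # round-trips to the original title

# An unencoded '+' in a query string is decoded as a space.
print(unquote_plus("Romeo+Juliet"))
```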


RE: Query with literal quote character: 6'2"

Fuad Efendi
I forgot to mention: the default operator is AND; DisMax.
Without URL-encoding some queries will show exceptions even with dismax.

Re: Query with literal quote character: 6'2"

hossman
In reply to this post by Walter Underwood, Netflix
: Our users can blow up the parser without special characters.
:
:   AND THE BAND PLAYED ON
:   TO HAVE AND HAVE NOT

Grrr... yeah, i'd forgotten about that problem.  I was hoping LUCENE-682
could solve that (by "unregistering" AND/OR/NOT as operators) but that
issue is fairly dead in the water since the performance difference was
fairly significant.

DisMaxQueryParser should really just have its own grammar instead of the
hacks i put in to subclass QueryParser.

: We have auto-complete on titles, so the there are plenty
: of chances to inadvertently use special characters:
:
:   Romeo + Juliet
:   Airplane!
:   Shrek (Widescreen)
:
: We also have people type "--" for a dash in titles.

Only the '+' and '-' characters are special in those examples ... the
others will be treated as literal characters by dismax.  but like i said
we could patch dismax to have an option containing the list of characters to
auto-escape that would default to the current hardcoded list ... for your
use case you could have all the special characters (including '+' and '-')
... still wouldn't solve your quote problem though -- that's where
we'd need hooks for subclasses to override the quote stripping.
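
One way the proposal could look, sketched in Python purely as an illustration. Every class, method and attribute name here is hypothetical, the default character list is an assumption, and nothing like this existed in Solr at the time of the thread:

```python
class QueryPreprocessor:
    """Hypothetical pre-parse hooks: a configurable escape list plus quote handling."""

    # Assumed default list; '+', '-' and '"' stay special.
    escape_chars = '\\!():^[]{}~*?'

    def escape(self, query: str) -> str:
        return "".join("\\" + c if c in self.escape_chars else c for c in query)

    def handle_quotes(self, query: str) -> str:
        # Default behaviour: strip all quotes when their count is odd.
        if query.count('"') % 2 == 0:
            return query
        return query.replace('"', "")

    def prepare(self, query: str) -> str:
        return self.escape(self.handle_quotes(query))


class FreeTextPreprocessor(QueryPreprocessor):
    """Variant for pure free-text input: no operators, quotes kept as literals."""

    escape_chars = QueryPreprocessor.escape_chars + '+-"'

    def handle_quotes(self, query: str) -> str:
        return query  # keep quotes; escape() neutralizes them instead


print(QueryPreprocessor().prepare('6\'2"'))
print(FreeTextPreprocessor().prepare('6\'2"'))
print(FreeTextPreprocessor().prepare("Romeo + Juliet"))
```

The default class drops the lone quote, while the subclass passes it through escaped so that the analyzer gets to decide what to do with it.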


-Hoss

RE: Query with literal quote character: 6'2"

Fuad Efendi
Try this query with asterisk *

http://192.168.1.5:18080/apache-solr-1.2.0/select/?q=*&version=2.2&start=0&rows=10&indent=on


Response:
HTTP Status 400 - Query parsing error: Cannot parse '*': '*' or '?' not
allowed as first character in WildcardQuery

--------------------------------------------------------------------------------

type Status report

message Query parsing error: Cannot parse '*': '*' or '?' not allowed as
first character in WildcardQuery

description The request sent by the client was syntactically incorrect
(Query parsing error: Cannot parse '*': '*' or '?' not allowed as first
character in WildcardQuery).


--------------------------------------------------------------------------------

Apache Tomcat/6.0.13



I tried to discuss it last year... It shouldn't be HTTP 400. I do not have
such problems probably because I encapsulate * (using front-end layer) into
Item_name:"*"

I encapsulate user input into:
Item_name:"Romeo+Juliet" AND category:"books"
Item_name:"Romeo+Juliet"+category:"books"

Of course I use URL encoding before calling SOLR...


-Fuad

RE: Query with literal quote character: 6'2"

Fuad Efendi
This is what appears in Address Bar of IE:
http://localhost:8080/apache-solr-1.2.0/select/?q=item_name%3A%22Romeo%2BJuliet%22%2Bcategory%3A%22books%22&version=2.2&start=0&rows=10&indent=on

Input was:
item_name:"Romeo+Juliet"+category:"books"

Another input which works just fine: item_name:"6'2\"" (user input was just
6'2")
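
A Python sketch of this front-end approach: wrap the user's text in a fielded phrase for the standard handler, escaping only what must be escaped inside a quoted phrase, then URL-encode the result. The helper name is made up, and the escaping rule (backslash and double quote only) is an assumption about what a phrase needs:

```python
from urllib.parse import quote


def phrase_query(field: str, text: str) -> str:
    """Build field:"text", backslash-escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


q = phrase_query("item_name", "6'2\"")
print(q)
print(quote(q, safe=""))

# Encoding the two-clause input reproduces the q= value from the address bar.
print(quote('item_name:"Romeo+Juliet"+category:"books"', safe=""))
```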

It is not a bug/problem of SOLR. SOLR can't be exposed directly to end
users. For handling user input and generating a SOLR-specific query, use
something... So I don't really understand why we need HTTP caching
support in SOLR if we can't use it without a "front-end" (off-topic; I use
HTTP caching at the front-end, and don't use SOLR's HTTP caching at all,
especially because it can't reply to a request with an If-Modified-Since
header).


RE: Query with literal quote character: 6'2"

hossman
In reply to this post by Fuad Efendi


: http://192.168.1.5:18080/apache-solr-1.2.0/select/?q=*&version=2.2&start=0&rows=10&indent=on

That's using the standard request handler, right? ... that's a much different
discussion -- when using standard you must of course be aware of the
syntax and the special characters ... Walter and i have specifically been
talking about dismax which attempts to protect you from these things.

: description The request sent by the client was syntactically incorrect
: (Query parsing error: Cannot parse '*': '*' or '?' not allowed as first
: character in WildcardQuery).

: I tried to discuss it last year... It shouldn't be HTTP 400. I do not have

why not?  the "client" sent a bad request (with a query string that is
"malformed syntax" according to the contract of the standard request handler)



-Hoss

RE: Query with literal quote character: 6'2"

hossman
In reply to this post by Fuad Efendi

: It is not a bug/problem of SOLR. SOLR can't be exposed directly to end
: users. For handling user input and generating SOLR-specific query, use

while i agree that you don't want to expose your end users directly to
Solr (largely for security reasons) that doesn't mean you *must*
preprocess user entered strings before handing them to dismax ... dismax's
whole goal is to make it possible for apps to not have to worry about
sanitizing user-entered query strings.

: something... So that I don't really understand why do we need HTTP caching
: support at SOLR if we can't use it without "front-end" (off-topic; I use

yeah ... this is *WAY* off topic ... but the short answer is: the issues
are orthogonal.  whether or not you let "humans using web browsers" talk to
Solr directly doesn't change the fact that Solr should be "well
behaved" regarding HTTP -- which includes outputting response headers useful
for HTTP caching, and understanding incoming request headers related to
HTTP caching ... just because you expect some application to sit between
your end user browsers and Solr doesn't exclude the possibility of having
an HTTP cache sitting between that application and Solr.
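
As a generic illustration of "understanding incoming request headers related to HTTP caching", here is a Python sketch of an If-Modified-Since check. It is not Solr code; the function name and the timestamp are made up for the example:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the client's copy, else 200."""
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    return 304 if last_modified <= since else 200


# Hypothetical "index last changed" time.
index_changed = datetime(2008, 2, 5, 3, 50, 0, tzinfo=timezone.utc)
print(conditional_status("Tue, 05 Feb 2008 03:50:00 GMT", index_changed))
print(conditional_status("Mon, 04 Feb 2008 03:50:00 GMT", index_changed))
print(conditional_status(None, index_changed))
```

A cache sitting between an application and the search server can use the 304 answer to keep serving its stored copy without transferring the body again.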

-Hoss

Socket exception

Sundar Sankaranarayanan
Hi All,
       I have been using Solr for a couple of months now and am very
satisfied with it. My Solr on the dev environment runs on a Windows box with
1 gig of memory, and solr.war is deployed on JBoss 4.05. While
investigating a "Solr commit not working sometimes" issue in our
application, I found that the server was sometimes throwing a
"socket exception : connection refused", and whenever this happened
the commit/optimize did not function properly. I am not sure why
this is happening: when the box was also used to deploy the application, we
never got the issue, but now that it is being used solely as a Solr
server, we are getting this. Any ideas / suggestions to solve this are
appreciated.


Thanks and Regards
Sundar

P.S : The stack trace for the same :::


2008-02-06 17:10:08,101 [STDERR:152] ERROR  - Feb 6, 2008 5:10:08 PM org.apache.solr.core.SolrException log
SEVERE: java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:168)
 at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:747)
 at org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:777)
 at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:115)
 at org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:712)
 at org.apache.coyote.Request.doRead(Request.java:418)
 at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:284)
 at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:404)
 at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:299)
 at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:192)
 at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
 at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
 at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
 at java.io.InputStreamReader.read(InputStreamReader.java:167)
 at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2972)
 at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
 at org.xmlpull.mxp1.MXParser.parseAttribute(MXParser.java:2026)
 at org.xmlpull.mxp1.MXParser.parseStartTag(MXParser.java:1799)
 at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1259)
 at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
 at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1078)
 at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:298)
 at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
 at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
 at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
 at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
 at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
 at org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
 at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
 at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
 at org.apache.tomcat.util.net.MasterSlaveWorkerThread.run(MasterSlaveWorkerThread.java:112)
 at java.lang.Thread.run(Thread.java:595)

RE: Query with literal quote character: 6'2"

Fuad Efendi
In reply to this post by hossman
> while i agree that you don't want to expose your end users directly to
> Solr (largely for security reasons) that doesn't mean you *must*
> preprocess user entered strings before handing them to dismax ... dismax's
> whole goal is to make it possible for apps to not have to worry about
> sanitizing user-entered query strings.

I am using org.apache.solr.client.solrj.SolrQuery to preprocess user entered
strings.
And I am using dismax & facets:

INFO: /select facet.limit=100&wt=xml&rows=100&start=0&facet=true&facet.mincount=1&q=Romeo%2BJuliet&fl=id,item_name,category,price,url,host,country&qt=dismax&version=2.2&facet.field=country&facet.field=host&facet.field=category&fq=category:"armani"&hl=true 0 1943

(catalina.out file of SOLR,
http://www.tokenizer.org/armani/price.htm?q=Romeo%2bJuliet from production)

As you can see, + sign is properly encoded in URL: %2B
Unfortunately, DISMAX queries via CONSOLE do not support that. Fortunately,
SOLRJ does.

(Sorry for the mistake in my previous email: it was a direct SOLR request via
the admin console with the "standard" handler.)

===============
About https://issues.apache.org/jira/browse/SOLR-127

- We do not need this!!!!!!!!!!!

Simply add a request parameter
http.header="If-Modified-Since: Tue, 05 Feb 2008 03:50:00 GMT", let SOLR
respond via a standard XML message "Not Modified", and avoid using
400/500/304!!!

Let others manage "Reverse-Proxy" via PHP, HTTPD, Tomcat+Spring, etc.; SOLR
can use exact "last-modified" timestamp from the index.


I am going to comment on SOLR-127...





RE: Query with literal quote character: 6'2"

Fuad Efendi
> (catalina.out file of SOLR,
> http://www.tokenizer.org/armani/price.htm?q=Romeo%2bJuliet 
> from production)
> ...
> ... DISMAX queries via CONSOLE do not support
> that...

Oops... another mistake, sorry.
http://192.168.1.5:18080/apache-solr-1.2.0/select?indent=on&version=2.2&q=Romeo%2BJuliet&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl=


Anyway, I can't understand where the problem is?!! Everything works fine with
dismax/standard/escaping/encoding. Can we use the AND operator with dismax, by
the way? I think: no. And 6'2" works just as prescribed:
http://www.tokenizer.org/shimano/price.htm?q=6'2%22

Re: Query with literal quote character: 6'2"

Yonik Seeley-2
On Feb 7, 2008 6:35 PM, Fuad Efendi <[hidden email]> wrote:
> Anyway I can't understand where is the problem?!! Everything works fine with
> dismax/standard/escaping/encoding.

> Can we use AND operator with dismax by
> the way?

No.

> I think: no. And 6'2" works just as prescribed:

Not really... it depends on the analyzer.  If the index analyzer for
the field ends up stripping off the trailing quote anyway, then the
dismax query (which also dropped the quote) will match documents.
That's why you don't see any issues.
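
A toy Python sketch of that effect. Both "analyzers" are invented stand-ins rather than real Solr analyzers: one trims trailing double quotes from each token, the other keeps tokens exactly as typed:

```python
def stripping_analyzer(text):
    """Toy analyzer: lowercase, split on whitespace, trim trailing double quotes."""
    return [token.rstrip('"') for token in text.lower().split()]


def keeping_analyzer(text):
    """Toy analyzer: lowercase and split on whitespace, punctuation preserved."""
    return text.lower().split()


title = '6\'2"'
dismax_query = title.replace('"', "")   # what is left after the lone quote is dropped

for analyze in (stripping_analyzer, keeping_analyzer):
    indexed = set(analyze(title))
    matches = all(term in indexed for term in analyze(dismax_query))
    print(analyze.__name__, matches)
```

With the quote-trimming analyzer the indexed term and the query term are the same, so the dropped quote goes unnoticed; with the punctuation-preserving one they differ and the title is not found.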

-Yonik