merely a suggestion: schema.xml validator or better schema validation logging

32 messages

Re: merely a suggestion: schema.xml validator or better schema validation logging

Yonik Seeley-2
On 3/3/07, Ryan McKinley <[hidden email]> wrote:
> Is there enough general interest in having error response codes to
> change the standard web.xml config to let the SolrDispatchFilter
> handle /select?

/select should already use HTTP error codes, right?

-Yonik

Re: merely a suggestion: schema.xml validator or better schema validation logging

Ryan McKinley
On 3/3/07, Yonik Seeley <[hidden email]> wrote:
> On 3/3/07, Ryan McKinley <[hidden email]> wrote:
> > Is there enough general interest in having error response codes to
> > change the standard web.xml config to let the SolrDispatchFilter
> > handle /select?
>
> /select should already use HTTP error codes, right?
>

I see what's happening... I ran into this while writing the
SolrDispatchFilter, and it had me stumped for a while.

The SolrServlet passes along the status code from a SolrException.
This works great if you throw a SolrException with a 'valid' HTTP
status code (400, etc).  But MANY of the SolrExceptions use a status
code '1'.  Then it depends on the servlet container what is actually
sent to the client.  I know resin and jetty do different things.  In
the SolrDispatchFilter, I send an HTTP status code 500 if the
SolrException status is less than 100.
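
The fallback described above can be sketched roughly as follows. This is an illustrative helper only, not the actual SolrDispatchFilter code; the class and method names (StatusCodes, toHttpStatus) are made up for the example.

```java
/** Hypothetical helper; the names are illustrative and not part of Solr. */
final class StatusCodes {
    private StatusCodes() {}

    /**
     * Maps a SolrException-style code to something safe to send as an
     * HTTP status. Codes below 100 (such as the "1" discussed above) are
     * not valid HTTP statuses, so they are reported as 500.
     */
    static int toHttpStatus(int solrExceptionCode) {
        return solrExceptionCode < 100 ? 500 : solrExceptionCode;
    }
}
```

With a mapping like this the response no longer depends on how a particular servlet container treats an out-of-range status.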

Re: merely a suggestion: schema.xml validator or better schema validation logging

Ryan McKinley
In reply to this post by Yonik Seeley-2
/update does send 200 even if there was an error.

After SOLR-173 we may want to change the default solrconfig to map
/update so that everything has a consistent error format.


On 3/3/07, Yonik Seeley <[hidden email]> wrote:
> On 3/3/07, Ryan McKinley <[hidden email]> wrote:
> > Is there enough general interest in having error response codes to
> > change the standard web.xml config to let the SolrDispatchFilter
> > handle /select?
>
> /select should already use HTTP error codes, right?
>
> -Yonik
>

Re: merely a suggestion: schema.xml validator or better schema validation logging

Ryan McKinley
In reply to this post by Jed Reynolds-2
For anyone not on the dev list, I just posted:
http://issues.apache.org/jira/browse/SOLR-179

So that it is not lost, I also posted Otis' bug report:
http://issues.apache.org/jira/browse/SOLR-180

Re: merely a suggestion: schema.xml validator or better schema validation logging

Yonik Seeley-2
In reply to this post by Ryan McKinley
On 3/3/07, Ryan McKinley <[hidden email]> wrote:
> But MANY of the SolrExceptions use a status
> code '1'.

Hmmm, I did an audit of the exceptions before we entered the incubator, and
I thought I caught all the ones that generated anything outside the 400
and 500 range and could be thrown during a query (most of the "1" return
codes had to do with schema or config parsing, I think).

Any I missed should be fixed.

> Then it depends on the servlet container what is actually
> sent to the client.  I know resin and jetty do different things.  In
> the SolrDispatchFilter, I send an HTTP status code 500 if the
> SolrException status is less than 100.

That sounds fine.  I didn't realize it could even vary by container.

-Yonik

Re: merely a suggestion: schema.xml validator or better schema validation logging

Ryan McKinley
On 3/3/07, Yonik Seeley <[hidden email]> wrote:

> On 3/3/07, Ryan McKinley <[hidden email]> wrote:
> > But MANY of the SolrExceptions use a status
> > code '1'.
>
> Hmmm, I did an audit of the exceptions before we entered the incubator, and
> I thought I caught all the ones that generated anything out of the 400
> and 500 range
> and could be thrown during a query (most of the "1" return codes had
> to do with schema or config parsing I think).
>
> Any I missed should be fixed.
>

I clearly overstated the case with "MANY" - and you are right, none
are reachable from /select, so I must be off base about the /select
response code stuff.

A quick search shows:
  IndexSchema.java - 3, "1" status codes
  DirectUpdateHandler.java - 2, "2" status codes
  UpdateHandler.java - 2, "1" status codes

Everything else has 500, 400, or 503.

Re: merely a suggestion: schema.xml validator or better schema validation logging

Jed Reynolds-2
In reply to this post by Chris Hostetter-3
Chris Hostetter wrote:
> : I almost didn't notice the exception fly by because there's soooo much
> : log output, and I can see why I might not have noticed. Yay for
>  

> you should even be able to configure it to put WARNING and SEVERE
> messages in a separate log file.
>  

Certainly! I learned to reconfigure tomcat's logging when I was doing my
Nutch deployment. I'm very likely going to reconfigure my logging.

> I've been thinking that a Servlet that doesn't depend on any special Solr
> code (so it will work even if SolrCore isn't initialized) but registers a
> log handler and records the last N messages from Solr above a certain
> level would be handy to refer people to when they are having issues and
> aren't overly comfortable with log files.
>  

Yeah, like a ring buffer for the last x warning|severe messages.
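
A ring buffer of this kind could be sketched as a java.util.logging Handler, roughly as below. This is only a sketch under assumptions: the class name RecentProblemsHandler and its snapshot() method are invented for illustration, and nothing like it is claimed to exist in Solr.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

/** Hypothetical handler: remembers the last N records at or above a level. */
final class RecentProblemsHandler extends Handler {
    private final int capacity;
    private final Deque<LogRecord> recent = new ArrayDeque<>();

    RecentProblemsHandler(int capacity, Level threshold) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        setLevel(threshold); // isLoggable() below honours this threshold
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (record == null || !isLoggable(record)) {
            return; // below the threshold, so not worth remembering
        }
        if (recent.size() == capacity) {
            recent.removeFirst(); // buffer is full: drop the oldest record
        }
        recent.addLast(record);
    }

    @Override
    public void flush() {
        // nothing is buffered for output, so there is nothing to flush
    }

    @Override
    public void close() {
        // no external resources are held
    }

    /** Returns a copy of the retained records, oldest first. */
    synchronized List<LogRecord> snapshot() {
        return new ArrayList<>(recent);
    }
}
```

A servlet or admin page could attach one instance with Logger.addHandler() at startup and render snapshot() on request; whether to attach it to the root logger or to a Solr-specific logger is a deployment choice the sketch does not settle.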

I'm pretty used to looking at apache log files.  Some errors pointing
out configuration or operational failure (like running out of file
descriptors) on the admin and status pages would be helpful because I
think that some people are probably going to check those pages first,
possibly because they're deving and not necessarily watching logs. I'd
still use Solr even if it didn't have a logging servlet, tho ;-)

Jed

Re: merely a suggestion: schema.xml validator or better schema validation logging

Walter Underwood, Netflix
In reply to this post by Chris Hostetter-3
On 3/3/07 1:43 PM, "Chris Hostetter" <[hidden email]> wrote:

> : Right now, Solr accesses the DOM as needed (at runtime) to fetch
> : information. There isn't much up-front checking beyond the XML
> : parser.
>
> bingo, and adding more upfront checking is hard for at least two reasons I
> can think of...
>
> 1) keeping a DTD up to date is a pain as new features are added
> 2) the way some options are passed to pluggable classes makes it impossible
> to validate (ie: tokenizers, caches, etc...)

I was thinking of translating the config file into internal config
properties when it was read, and logging Solr-specific errors then.
Things like "I can't load this class" are pretty easy at that point.
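
The "I can't load this class" check at read time could look roughly like this. It is a sketch only: ConfigClassCheck and findUnloadable are hypothetical names, the input is assumed to be fully qualified class names already pulled out of the config, and a real implementation would have to resolve names the same way Solr itself does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical up-front check; not part of Solr. */
final class ConfigClassCheck {
    private ConfigClassCheck() {}

    /**
     * Tries to load every class name found in the config and returns one
     * readable message per name that cannot be loaded.
     */
    static List<String> findUnloadable(List<String> classNames) {
        List<String> problems = new ArrayList<>();
        ClassLoader loader = ConfigClassCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // 'false' means: load the class but do not run its static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                problems.add("cannot load class '" + name + "': " + e);
            }
        }
        return problems;
    }
}
```

Collecting all problems before reporting, rather than failing on the first one, lets a single startup log entry list every bad class name in the file.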

DTDs are inadequate and XML Schema is horrid, plus the error messages
from either would not be particularly useful.

wunder



Re: merely a suggestion: schema.xml validator or better schema validation logging

Chris Hostetter-3
:
: > : Right now, Solr accesses the DOM as needed (at runtime) to fetch
: > : information. There isn't much up-front checking beyond the XML
: > : parser.

: I was thinking of translating the config file into internal config
: properties when it was read, and logging Solr-specific errors then.
: Things like "I can't load this class" are pretty easy at that point.

Most of that work is done right now when the solrconfig.xml and schema.xml
are loaded ... any missing classes should be logged as errors
immediately.

I'm actually having a hard time thinking of what kinds of "just in time"
DOM walking are delayed until request time ... all of the field names are
already known, the analyzers are built, the requesthandlers and
responsewriters all exist and have been initialized ... what stuff isn't
checked until a request comes in?



-Hoss


Re: merely a suggestion: schema.xml validator or better schema validation logging

Walter Underwood, Netflix
On 3/4/07 3:01 PM, "Chris Hostetter" <[hidden email]> wrote:

> I'm actually having a hard time thinking of what kinds of "just in time"
> DOM walking are delayed until request time ... all of the field names are
> already known, the analyzers are built, the requesthandlers and
> responsewriters all exist and have been initialized ... what stuff isn't
> checked until a request comes in?

I had <mm> (minimum match) blow up at query time with a number
format exception (this is from memory).

I had a silent error that I can't remember the details of, but it
was something like putting the <str> for boost functions outside
the <lst>. It didn't blow up, but it was a nonsense config that
was accepted.

wunder


Re: merely a suggestion: schema.xml validator or better schema validation logging

Chris Hostetter-3

: I had <mm> (minimum match) blow up at query time with a number
: format exception (this is from memory).

That's a RequestHandler-specific request param that can also be specified
as a default/invariant/appended init param ... I'm not sure that SolrCore
could do much to validate that when parsing the solrconfig.xml.
DisMaxRequestHandler could possibly throw an exception from its init
method if it sees a param it recognizes but can't parse ... but that's a
dangerous road to go down ... what if I want to subclass
DisMaxRequestHandler and change the format of the "mm" param?

One thing you could do to ensure that your RequestHandler configuration
makes sense without waiting for an error generated by a request is to put
in some explicit cache warming as part of the firstSearcher listener that
hits each configured requestHandler with the minimal amount of input you
expect ...  then you'll see an error in your log immediately.


: I had a silent error that I can't remember the details of, but it
: was something like putting the <str> for boost functions outside
: the <lst>. It didn't blow up, but it was a nonsense config that
: was accepted.

Again, there's nothing erroneous about having a <str> outside of a <lst>
when specifying the init params of a RequestHandler as far as SolrCore is
concerned ... it has no idea what types of init params the RequestHandler
wants ... and the StandardRequestHandler could complain if it sees
any top-level init params which aren't "defaults", "invariants" or
"appended" ... but again: what if I subclass
StandardRequestHandler and I want to add some custom init param to
determine behavior in my subclass?


-Hoss


Re: merely a suggestion: schema.xml validator or better schema validation logging

Ryan McKinley
> : I had a silent error that I can't remember the details of, but it
> : was something like putting the <str> for boost functions outside
> : the <lst>. It didn't blow up, but it was a nonsense config that
> : was accepted.
>
> Again, there's nothing erroneous about having a <str> outside of a <lst>
> when specifying the init params of a RequestHandler as far as SolrCore is
> concerned ... it has no idea what types of init params the RequestHandler
> wants ... and the StandardRequestHandler could complain if it sees
> any top-level init params which aren't "defaults", "invariants" or
> "appended" ... but again: what if I subclass
> StandardRequestHandler and I want to add some custom init param to
> determine behavior in my subclass?
>
>

One trick I have used elsewhere is to output the loaded config and
compare it to the initialization config - if they are different, there
may be a problem.

We could pretty easily add a utility method like this to
RequestHandlerBase and let RequestHandlers 'validate' their config in
init() - It would not be an automatic thing that applies to every
request handler, but adding some validation to DisMaxRequestHandler
and StandardRequestHandler would take care of most problems
(especially for beginners).
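
One possible shape for such an opt-in 'validate' utility is sketched below. It covers only the validation half of the idea (flagging top-level init params a handler does not recognize), not the dump-and-compare trick, and the names InitArgCheck and unrecognized are hypothetical rather than an existing RequestHandlerBase method.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical utility; not an existing Solr API. */
final class InitArgCheck {
    private InitArgCheck() {}

    /**
     * Returns the configured top-level init param names that the handler
     * does not recognize, in sorted order. Assumes all names are non-null.
     */
    static Set<String> unrecognized(Collection<String> configuredNames,
                                    Collection<String> knownNames) {
        Set<String> unknown = new TreeSet<>(configuredNames);
        unknown.removeAll(knownNames);
        return unknown;
    }
}
```

Because the handler passes in its own set of known names, a subclass that adds a custom init param can extend that set before calling the check, and a handler that never calls it keeps today's permissive behavior.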

ryan