[jira] Created: (SOLR-713) Differentiated request logging

[jira] Created: (SOLR-713) Differentiated request logging

Nick Burch (Jira)
Differentiated request logging
------------------------------

                 Key: SOLR-713
                 URL: https://issues.apache.org/jira/browse/SOLR-713
             Project: Solr
          Issue Type: Improvement
          Components: search
    Affects Versions: 1.3
            Reporter: Lars Kotthoff
            Priority: Minor


Currently the complete query string is logged for all search requests. When the query string is large, the logs become hard to read. Worse, in a sharded setup with faceting, the query string sent during the facet count refinement phase contains the IDs of all documents for which facet counts are requested. This can easily amount to several GB of logs over the course of a day when the number of facets is large.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-713) Differentiated request logging

Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-713:
-------------------------------

    Attachment: SOLR-713.patch

Attaching an initial attempt at this, which logs the query, highlight, and facet parameters for every query at INFO level and the whole query string at FINE level, to reduce log noise.

In the long term, the parameters logged at INFO for each request should probably be configurable in solrconfig.xml, but that is better tackled as a separate issue.

[jira] Commented: (SOLR-713) Differentiated request logging

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627271#action_12627271 ]

Hoss Man commented on SOLR-713:
-------------------------------

+1 on the goal.
-1 on the patch.

I'm really opposed to SolrCore having special behavior for certain params, regardless of how good the intentions are.

A few alternate suggestions...

1) Let's start using a special logger for the particular log call involved (at the end of SolrCore.execute) instead of the normal SolrCore logger. That way people can configure it separately from other INFO level actions in SolrCore if they want. This can be done independently of, and in combination with, the other ideas.

2) We could consider eliminating this log call from SolrCore.execute altogether, and require that individual RequestHandlers take on the burden. That way, use cases like shard refinement requests could choose to log themselves differently.

3) Leave the log call in, but put in a check for a new value in req.getContext() after the handler.handleRequest call which influences the logging behavior in some way, such as specifying a list of param names that should be left out of the log message because they are too verbose.

4) Add a configuration option indicating a size N. While iterating over the list of params, if any one param contains one or more values such that its resulting string length is more than N characters, truncate the value with "...(M)", where M is the total number of characters that would be output if truncation didn't happen.

I think no matter what we do #1 is a good idea.  I'm also a fan of #4 because: it keeps a standard log message about every request that for "simple" cases will always have all the params even if a RequestHandler is buggy/sneaky; won't be too verbose in the complex case; won't silently hide info.
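To make suggestion #4 concrete, here is a minimal sketch of the truncation rule. It is not taken from the attached patch or from Solr itself: the class ParamTruncator, its method names, and the plain Map used to stand in for the request params are all hypothetical. It also assumes the limit N is applied to each value separately, which is only one possible reading of the suggestion.

```java
import java.util.Map;

/** Hypothetical helper (not a Solr class): caps each logged param value at maxLen characters. */
final class ParamTruncator {
    private ParamTruncator() {}

    /**
     * Returns the value unchanged if it fits within maxLen characters.
     * Otherwise returns its first maxLen characters followed by "...(M)",
     * where M is the full, untruncated length of the value.
     */
    static String truncate(String value, int maxLen) {
        if (value.length() <= maxLen) {
            return value;
        }
        return value.substring(0, maxLen) + "...(" + value.length() + ")";
    }

    /** Builds a "name=value&name=value" string with every value capped (no URL encoding). */
    static String toLogString(Map<String, String[]> params, int maxLen) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            for (String value : entry.getValue()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(entry.getKey()).append('=').append(truncate(value, maxLen));
            }
        }
        return sb.toString();
    }
}
```

Because the "(M)" suffix records the original length, a reader of the log can still tell that something was cut and how much, which matches the "won't silently hide info" point above.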


Also: even if we change nothing else, we should probably put all of this logging work inside a test that the INFO level is even turned on for the logger being used, so we don't waste StringBuilder cycles when people have disabled the logging.
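A rough sketch of how suggestion #1 and the level check could fit together, using java.util.logging. The class RequestLog and the logger name are assumptions made for illustration, not existing Solr names, and the Supplier argument is just one way to defer building the message.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical wrapper (not a Solr class) around a dedicated request logger. */
final class RequestLog {
    // Assumed logger name; a separate name lets administrators configure
    // request logging independently of the other SolrCore INFO messages.
    static final Logger LOG = Logger.getLogger("org.apache.solr.core.SolrCore.Request");

    private RequestLog() {}

    /**
     * Logs the request line at INFO. The message is only built when INFO is
     * enabled, so no StringBuilder work is done when logging is turned off.
     * Returns true if a message was logged.
     */
    static boolean logRequest(Supplier<String> messageBuilder) {
        if (!LOG.isLoggable(Level.INFO)) {
            return false;
        }
        LOG.info(messageBuilder.get());
        return true;
    }
}
```

A caller would pass the expensive string building as a lambda, for example RequestLog.logRequest(() -> buildParamString(req)), where buildParamString is a placeholder for whatever code assembles the log line.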

[jira] Commented: (SOLR-713) Differentiated request logging

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627290#action_12627290 ]

Lars Kotthoff commented on SOLR-713:
------------------------------------

> 3) leave the log call in, but put in a check for a new value in req.getContext() after the handler.handleRequest call which influences the logging behavior in some way, such as specifying a list of param names that should be left out of the log message because they are too verbose.

How about configuring this in solrconfig.xml/solr.xml? This could even be implemented in a way that lets you specify the log level at which each parameter is logged.
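One way the per-parameter level idea might look in code, as a sketch only. The class LeveledParams, the paramLevels map, and the default level are hypothetical; nothing here reflects an existing solrconfig.xml option, and how the map would be populated from configuration is left open.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Level;

/** Hypothetical helper (not a Solr class): picks which params to log at a given logger level. */
final class LeveledParams {
    private LeveledParams() {}

    /**
     * Returns the params that should appear when the logger is set to activeLevel.
     * paramLevels maps a param name to the level it is logged at; params with no
     * entry use defaultLevel. A param logged at FINE is therefore hidden while
     * the logger is at INFO and shown once the logger is lowered to FINE.
     */
    static Map<String, String> visibleAt(Map<String, String> params,
                                         Map<String, Level> paramLevels,
                                         Level defaultLevel,
                                         Level activeLevel) {
        Map<String, String> visible = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            Level paramLevel = paramLevels.getOrDefault(entry.getKey(), defaultLevel);
            if (paramLevel.intValue() >= activeLevel.intValue()) {
                visible.put(entry.getKey(), entry.getValue());
            }
        }
        return visible;
    }
}
```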

[jira] Commented: (SOLR-713) Differentiated request logging

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627580#action_12627580 ]

Hoss Man commented on SOLR-713:
-------------------------------

I guess the distinction I was trying to draw is between the Solr administrator configuring a max on how much they care to see logged when the values for any param name get really long, vs. a RequestHandler specifying which param names don't need to be logged because they aren't particularly useful for debugging.

If in the latter case it makes sense to have the list of params configured in solrconfig.xml, then it should be part of that RequestHandler's config. (The wazbat param for one handler isn't necessarily the same as the wazbat param for another handler, and you might care about logging the id params for one instance of SearchHandler, but not for others.)
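The per-handler point can be illustrated with a small sketch. Everything in it is hypothetical: the class HandlerLogFilter, the handler names "/select" and "/shard", and the idea of keying the exclusion list by handler name are assumptions for the example, not an existing Solr mechanism.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not a Solr class): each handler has its own list of params to skip. */
final class HandlerLogFilter {
    private final Map<String, Set<String>> excludedByHandler;

    HandlerLogFilter(Map<String, Set<String>> excludedByHandler) {
        this.excludedByHandler = excludedByHandler;
    }

    /** Builds the log string for one request, omitting params excluded for that handler only. */
    String toLogString(String handlerName, Map<String, String> params) {
        Set<String> excluded =
            excludedByHandler.getOrDefault(handlerName, Collections.<String>emptySet());
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (excluded.contains(entry.getKey())) {
                continue; // too verbose or not useful for this particular handler
            }
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(entry.getKey()).append('=').append(entry.getValue());
        }
        return sb.toString();
    }
}
```

With "ids" excluded only for a hypothetical "/shard" handler, the same request params would still be logged in full for "/select", which is the distinction drawn above between handler instances.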
