[jira] Created: (SOLR-37) Add additional configuration options for Highlighting


[jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Nick Burch (Jira)
Add additional configuration options for Highlighting
-----------------------------------------------------

                 Key: SOLR-37
                 URL: http://issues.apache.org/jira/browse/SOLR-37
             Project: Solr
          Issue Type: Improvement
          Components: search
            Reporter: Andrew May


As discussed in the mailing list, I've been looking at adding additional configuration options for highlighting.
I've made quite a few changes to the properties for highlighting:

Properties that can be set on request, or in solrconfig.xml at the top level:
  highlight (true/false)
  highlightFields
Properties that can be set in solrconfig.xml at the top level or per-field
  formatter (simple/gradient)
  formatterPre (preTag for simple formatter)
  formatterPost (postTag for simple formatter)
  formatterMinFgCl (min foreground colour for gradient formatter)
  formatterMaxFgCl (max foreground colour for gradient formatter)
  formatterMinBgCl (min background colour for gradient formatter)
  formatterMaxBgCl (max background colour for gradient formatter)
  fragsize (if <=0 use NullFragmenter, otherwise use GapFragmenter with this value)

I've added variables for these values to CommonParams, plus there's a fields Map<String,CommonParams> that is parsed from nested NamedLists (i.e. a lst named "fields", with a nested lst for each field).

Here's a sample of how you can mix and match properties in solrconfig.xml:

  <requestHandler name="hl" class="solr.StandardRequestHandler" >
    <str name="formatter">simple</str>
    <str name="formatterPre">&lt;i></str>
    <str name="formatterPost">&lt;/i></str>
    <str name="highlightFields">title,authors,journal</str>
    <int name="fragsize">0</int>
    <lst name="fields">
      <lst name="abstract">
        <str name="formatter">gradient</str>
        <str name="formatterMinBgCl">#FFFF99</str>
        <str name="formatterMaxBgCl">#FF9900</str>
        <int name="fragsize">30</int>
        <int name="maxSnippets">2</int>
      </lst>
      <lst name="authors">
        <str name="formatterPre">&lt;strong></str>
        <str name="formatterPost">&lt;/strong></str>
      </lst>
    </lst>
  </requestHandler>

I've created HighlightingUtils to handle most of the parameter parsing. The highlighting is still done in SolrPluginUtils, and the doStandardHighlighting() method still has the same signature, but the other highlighting methods have had to be changed (because highlighters are now created per highlighted field).

I'm not particularly happy with the code to pull parameters from CommonParams, first checking the field then falling back, e.g.:
         String pre = (params.fields.containsKey(fieldName) && params.fields.get(fieldName).formatterPre != null) ?
               params.fields.get(fieldName).formatterPre :
                  params.formatterPre != null ? params.formatterPre : "<em>";
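
The field-then-global-then-default chain above could be factored into one small helper. This is only a minimal sketch: `HlParams` and `firstNonNull` are hypothetical names standing in for the patched CommonParams, not code from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the patched CommonParams; only the members needed here.
final class HlParams {
    String formatterPre;                                  // null means "not set at this level"
    final Map<String, HlParams> fields = new HashMap<>(); // per-field overrides, keyed by field name

    // Returns the first non-null candidate, or null if every candidate is null.
    @SafeVarargs
    static <T> T firstNonNull(T... candidates) {
        for (T c : candidates) {
            if (c != null) return c;
        }
        return null;
    }

    // Per-field value first, then the top-level value, then the hard-coded default.
    String formatterPre(String fieldName) {
        HlParams perField = fields.get(fieldName);
        return firstNonNull(
            perField == null ? null : perField.formatterPre,
            formatterPre,
            "<em>");
    }
}
```

Each additional property would still need its own accessor in this form, which is part of why a generic lookup by name looks attractive.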

I've removed support for a custom formatter - just choosing between simple/gradient. Probably that's a bad decision, but I wanted an easy way to choose between the standard formatters without having to invent a generic way of supplying arguments for the constructor. Perhaps there should be formatterType=simple/gradient and formatterClass=... which overrides formatterType if set at a lower level - with the formatterClass having to have a zero-args constructor? Note: gradient is actually SpanGradientFormatter.

I'm not sure I properly understand how Fragmenters work, so supplying fragsize to GapFragmenter where >0 (instead of what was a default of 50) may not make sense.


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] Updated: (SOLR-37) Add additional configuration options for Highlighting

Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/SOLR-37?page=all ]

Andrew May updated SOLR-37:
---------------------------

    Attachment: patch

Changes to CommonParams, SolrPluginUtils, plus new HighlightingUtils



Re: [jira] Updated: (SOLR-37) Add additional configuration options for Highlighting

Yonik Seeley
Thanks Andrew!

I still think that being able to configure how a field is highlighted
on a per-query basis is almost as basic as being able to ask for which
fields to highlight or which fields to return.  Regardless, per-query
overriding can be added later, but I think it should be kept in mind
at least while designing the configuration.

A couple of points about the config interface:
I think the proliferation of parameter names may be confusing either
in solrconfig.xml or in query args.

I think we should be using a convention like namespaces, much like
java property files do (think what a property file would be like
without that).

Parameter names like "formatter" or "fields" are pretty confusing if
you don't know the context is highlighting.... people could easily
think formatter specifies the output format (XML, JSON, etc), and
could very easily think that "fields" were the stored fields to
return.

Also, parameters like formatterPre, formatterPost, formatterMinFgCl,
etc aren't global... they only apply to specific highlighter
formatters, and you have to understand a lot about those formatters to
understand which apply.  Some hierarchy could be added to reflect that
also.

If we put more of this info into our parameter names, it would ease
the burden of understanding for new users (and experienced ones too
perhaps)

So query args could be:

hl.formatter=simple
hl.fragsize=100
hl.simple.pre=<em>
hl.simple.post=</em>
hl.color.minBg=#FFFF99
hl.color.maxBg=#FF9900

And solrconfig.xml config could be the simple form

<str name="hl.formatter">simple</str>
<int name="hl.fragsize">100</int>

OR, maybe it could be more structured and put the hierarchy in the XML:
<lst name="hl">
  <str name="formatter">simple</str>
  <int name="fragsize">100</int>
  <lst name="simple">
    <str name="pre">&lt;em></str>
    <str name="post">&lt;/em></str>
  </lst>
</lst>

In either XML config schema, it makes sense to try and keep it easy to
figure out how to override it via a query param (simple name mapping).

Of course I've only addressed global defaults, not per-field defaults,
but the same style could be used.

After going through this exercise and thinking it through, I think I'm
coming back to the same preference that Mike Klaas had from the list
of examples: field properties.  That way, if any other parameters need
per-field config, someone doesn't end up re-inventing the wheel all
over again in a non-consistent manner.  It's also relatively easy to
explain to someone.

http://www.nabble.com/Support-for-custom-Fragmenter-when-Highlighting-tf1962395.html#a5386994
: : #model things as properties on fields (with f. being the field namespace)
: :
: : f.foo.fragsize=0
: : f.bar.fragsize=1000
: : f.*.fragsize=100   #the default
:
: I like this option the best, though the wildcard specification might get out of hand.
:
: There could be a top-level namespace:
: hl.fragsize = 100 #default
: And field-level overrides precisely matching the top-level general params:
: f.foo.hl.fragsize = 0

Plugins could then do something like: getFieldProperty("title","hl.formatter")
and get the built-in standard mechanism for checking a hierarchy of
places this property could be defined (handler defaults, handler field
specific config, query defaults, query field specific config).
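
A lookup of that kind might be sketched as follows. Everything here is hypothetical: the class name, the flat `f.<field>.<name>` key layout, and above all the precedence order (most specific first, with query values beating handler values), which this thread has not settled.

```java
import java.util.Map;

// Hypothetical sketch: flat property maps keyed by names such as
// "hl.fragsize" (general) and "f.title.hl.fragsize" (field-specific).
final class FieldProperties {
    private final Map<String, String> handlerDefaults;
    private final Map<String, String> queryParams;

    FieldProperties(Map<String, String> handlerDefaults, Map<String, String> queryParams) {
        this.handlerDefaults = handlerDefaults;
        this.queryParams = queryParams;
    }

    // Assumed precedence: query field-specific, query general,
    // handler field-specific, handler general. Returns null when nothing is set.
    String getFieldProperty(String field, String name) {
        String fieldKey = "f." + field + "." + name;
        String[] candidates = {
            queryParams.get(fieldKey),
            queryParams.get(name),
            handlerDefaults.get(fieldKey),
            handlerDefaults.get(name),
        };
        for (String value : candidates) {
            if (value != null) return value;
        }
        return null;
    }
}
```

With this ordering a general query parameter overrides a field-specific handler default; swapping the second and third candidates gives the opposite policy.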

-Yonik


Re: [jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Mike Klaas
In reply to this post by Nick Burch (Jira)
On 7/21/06, Andrew May (JIRA) <[hidden email]> wrote:

> As discussed in the mailing list, I've been looking at adding additional configuration options for highlighting.
> I've made quite a few changes to the properties for highlighting:

Cool -- this is definitely useful.

> Properties that can be set on request, or in solrconfig.xml at the top level:
>   highlight (true/false)
>   highlightFields
> Properties that can be set in solrconfig.xml at the top level or per-field
>   formatter (simple/gradient)
>   formatterPre (preTag for simple formatter)
>   formatterPost (postTag for simple formatter)
>   formatterMinFgCl (min foreground colour for gradient formatter)
>   formatterMaxFgCl (max foreground colour for gradient formatter)
>   formatterMinBgCl (min background colour for gradient formatter)
>   formatterMaxBgCl (max background colour for gradient formatter)
>   fragsize (if <=0 use NullFragmenter, otherwise use GapFragmenter with this value)

I agree with Yonik that the number of request handler parameters has
gotten a bit too large for a flat namespace (the tipping point was
probably the introduction of three highlighting parameters--so you can
blame me <g>).

> I've created HighlightingUtils to handle most of the parameter parsing, but the hightlighting is still done in SolrPluginUtils and the doStandardHighlighting() method still has the same signature, but the other highlighting methods have had to be changed (because highlighters are now created per highlighted field).

Highlighting has definitely grown to the point where it should be in
its own class, and I could see the justification for giving it its own
package, given that there are now at least three supporting classes
for Solr, and this could easily grow.

> I'm not particularly happy with the code to pull parameters from CommonParams, first checking the field then falling back, e.g.:
>          String pre = (params.fields.containsKey(fieldName) && params.fields.get(fieldName).formatterPre != null) ?
>                params.fields.get(fieldName).formatterPre :
>                   params.formatterPre != null ? params.formatterPre : "<em>";

Parameter handling is a little clunky as it is, as CommonParams was
added after general parameter handling, so it wasn't as tightly
integrated as it should be.

Currently:
  (static) CommonParams contains parameter defaults
  instance of CommonParams contains the defaults for a given request handler
  instance of SolrQueryRequest contains passed-in query parameters
  (static) method of SolrPluginUtils can be used to retrieve parameters, as follows:
          SolrPluginUtils.getParam(req, "param", commonParams.<defaultValue>)

This isn't ideal and, as you've demonstrated, becomes an absolute
disaster once you try to tack on per-field parameters.  There could be
some sort of CommonParams-like class which parses the default
parameters as it does currently (adding logic for per-field
overrides), plus an additional method which accepts a SolrQueryRequest
and adds any per-query overrides.  The result would be an instance of
some Parameter class which can be queried to determine the global and
per-field settings in one central location.

Designing and implementing this carefully will be time-consuming but
important, I think.  Without it, it is more difficult to accept useful
contributions like yours.

> I've removed support for a custom formatter - just choosing between simple/gradient. Probably that's a bad decision, but I wanted an easy way to choose between the standard formatters without having to invent a generic way of supplying arguments for the constructor. Perhaps there should be formatterType=simple/gradient and formatterClass=... which overrides formatterType if set at a lower level - with the formatterClass having to have a zero-args constructor? Note: gradient is actually SpanGradientFormatter.

It isn't worth supporting formatterClass without making it more
general.  I'd like to see an extension which allows the specification
of a class and an array of constructor args.

ex.
<class name="formatter"
classname="org.apache.lucene.search.highlight.SimpleHTMLFormatter"
/>

<class name="formatter"
classname="org.apache.lucene.search.highlight.SimpleHTMLFormatter">
  <arr>
      <str>&lt;strong></str>
      <str>&lt;/strong></str>
  </arr>
</class>
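
On the Java side, such an element could be turned into an object with plain reflection. The sketch below is hypothetical (`ConfiguredClassFactory` is an invented name) and assumes every constructor argument is a String, which is all the `<str>` examples above need. It is exercised with `java.lang.StringBuilder` so that it runs without Lucene on the classpath.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: instantiate a class by name, passing the configured
// <str> values to a public constructor taking that many String parameters.
final class ConfiguredClassFactory {
    static Object create(String className, List<String> args) {
        try {
            Class<?> clazz = Class.forName(className);
            Class<?>[] paramTypes = new Class<?>[args.size()];
            Arrays.fill(paramTypes, String.class);
            Constructor<?> ctor = clazz.getConstructor(paramTypes);
            return ctor.newInstance(args.toArray());
        } catch (ReflectiveOperationException e) {
            // Unknown class, no matching constructor, or the constructor itself failed.
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] unused) {
        Object sb = create("java.lang.StringBuilder", Arrays.asList("<strong>"));
        System.out.println(sb);
    }
}
```

An empty argument list selects the zero-argument constructor, which covers the first form of the element. Non-String constructor arguments would need type information in the config.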

> I'm not sure I properly understand how Fragmenters work, so supplying fragsize to GapFragmenter where >0 (instead of what was a default of 50) may not make sense.

Passing to the constructor makes sense, as it sets the fragmentSize of
the base SimpleFragmenter class (note the default is 100; 50 is the
position gap which indicates when to start a new fragment).

Hopefully I'll have time in a few days to look at the patch and give
you some feedback.

Thanks,
-Mike

Re: [jira] Updated: (SOLR-37) Add additional configuration options for Highlighting

Andrew May
In reply to this post by Yonik Seeley
Yonik Seeley wrote:
> Thanks Andrew!
>
> I still think that being able to configure how a field is highlighted
> on a per-query basis is almost as basic as being able to ask for which
> fields to highlight or which fields to return.  Regardless, per-query
> overriding can be added later, but I think it should be kept in mind
> at least while designing the configuration.
>

I'm not sure I agree, as I think that it's unlikely to have many different variations on
how the highlighting should be configured, and multiple handlers can be configured to cope
with a small number of variations. But that's getting into "matters of opinion", and I'm
the Solr newbie...

One question that would need addressing if these configuration options were supported via
the request is whether top-level settings on the request override per-field settings.

The other big question is the exact syntax, and without a clear idea for that I didn't
want to tackle it.

> A couple of points about the config interface:
> I think the proliferation of parameter names may be confusing either
> in solrconfig.xml or in query args.
>
> I think we should be using a convention like namespaces, much like
> java property files do (think what a property file would be like
> without that).
>
> Parameter names like "formatter" or "fields" are pretty confusing if
> you don't know the context is highlighting.... people could easily
> think formatter specifies the output format (XML, JSON, etc), and
> could very easily think that "fields" were the stored fields to
> return.
>

Point taken - although I couldn't think of a better name for "fields" apart from something
verbose like "perFieldConfiguration". Of course it could just be "f" as kind of suggested
below.

> Also, parameters like formatterPre, formatterPost, formatterMinFgCl,
> etc aren't global... they only apply to specific highlighter
> formatters, and you have to understand a lot about those formatters to
> understand which apply.  Some hierarchy could be added to reflect that
> also.
>
> If we put more of this info into our parameter names, it would ease
> the burden of understanding for new users (and experienced ones too
> perhaps)
>
> So query args could be:
>
> hl.formatter=simple
> hl.fragsize=100
> hl.simple.pre=<em>
> hl.simple.post=</em>
> hl.color.minBg=#FFFF99
> hl.color.maxBg=#FF9900
>
> And solrconfig.xml config could be the simple form
>
> <str name="hl.formatter">simple</str>
> <int name="hl.fragsize">100</int>
>
> OR, maybe it could be more structured and put the hierarchy in the XML:
> <lst name="hl">
>  <str name="formatter">simple</str>
>  <int name="fragsize">100</int>
>  <lst name="simple">
>    <str name="pre">&lt;em></str>
>    <str name="post">&lt;/em></str>
>  </lst>
> </lst>
>

I think that starts to get a bit confusing if you're already in a nested list of lists for
a per-field configuration.

> In either XML config schema, it makes sense to try and keep it easy to
> figure out how to override it via a query param (simple name mapping).
>
> Of course I've only addressed global defaults, not per-field defaults,
> but the same style could be used.
>
> After going through this exercise and thinking it through, I think I'm
> coming back to the same preference that Mike Klaas had from the list
> of examples: field properties.  That way, if any other parameters need
> per-field config, someone doesn't end up re-inventing the wheel all
> over again in a non-consistent manner.  It's also relatively easy to
> explain to someone.
>
> http://www.nabble.com/Support-for-custom-Fragmenter-when-Highlighting-tf1962395.html#a5386994 
>
> : : #model things as properties on fields (with f. being the field namespace)
> : :
> : : f.foo.fragsize=0
> : : f.bar.fragsize=1000
> : : f.*.fragsize=100   #the default
> :
> : I like this option the best, though the wildcard specification might get out of hand.
> :
> : There could be a top-level namespace:
> : hl.fragsize = 100 #default
> : And field-level overrides precisely matching the top-level general params:
> : f.foo.hl.fragsize = 0
>
> Plugins could then do something like:
> getFieldProperty("title","hl.formatter")
> and get the built-in standard mechanism for checking a hierarchy of
> places this property could be defined (handler defaults, handler field
> specific config, query defaults, query field specific config).
>

It would be good to have a generic way for checking values set at
field/handler/request/default levels, but what stopped me from doing this was the way
CommonParams parses the NamedList it's given and sets a number of variables for the values
it understands.

It would be a lot easier if CommonParams just kept the NamedList and we pull values from
that by name - but that was more radical a change than I wanted to make.

I'll be honest - I only have so much patience for working on configuration issues (I can
never find a good solution for this kind of thing), and I'm reluctant to spend any more
time on this unless I feel there is a consensus amongst the people on this mailing list,
so that I can come up with a patch that is likely to be accepted.

I guess what it comes down to is that we need a clear specification of whether there are
going to be namespaces or something similar in query parameters, and also how per-field
query parameters should look. And then how this maps to configuration in solrconfig.xml
and how that is stored and accessed in the code.

We could take something like "." in a query parameter name to be equivalent to nested
lists in solrconfig.xml, so "foo.bar.fubar=..." on the query string is comparable to
<lst name="foo">
   <lst name="bar">
      <str name="fubar">...</str>
   </lst>
</lst>

But if you also use "." as a way of introducing grouping between properties, then you
could get a lot of nesting - e.g. specifying the pre tag for a simple formatter for the
title field might be "f.title.hl.simple.pre=<em>" which is a bit more nesting than I'd be
comfortable with (or am prepared to type out!).

If instead we separated on ":" and used "." just as part of a naming convention, then
"f:title:hl.simple.pre=<em>" would be equivalent to:

<lst name="f">
   <lst name="title">
      <str name="hl.simple.pre">&lt;em></str>
   </lst>
</lst>

If something is specified globally, there's no wildcards, just a top level property.
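
To make the ":" proposal concrete, here is a minimal sketch of turning such query parameter names into nested maps (standing in for nested NamedLists). `NestedParams` is a hypothetical name, and plain `java.util` maps are used only so that the example is self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: ":" separates nesting levels, "." is just part of a name.
final class NestedParams {
    // Stores value under the path given by key, creating nested maps as needed.
    @SuppressWarnings("unchecked")
    static void put(Map<String, Object> root, String key, String value) {
        String[] path = key.split(":");
        Map<String, Object> current = root;
        for (int i = 0; i < path.length - 1; i++) {
            Object child = current.get(path[i]);
            if (!(child instanceof Map)) {
                // Missing level (or a plain value in the way): put a nested map there.
                child = new LinkedHashMap<String, Object>();
                current.put(path[i], child);
            }
            current = (Map<String, Object>) child;
        }
        current.put(path[path.length - 1], value);
    }
}
```

A global setting such as "hl.fragsize" contains no ":", so it lands directly in the top-level map, while "f:title:hl.simple.pre" ends up two levels down.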

How does that sound?

That leaves the issue of what to do with CommonParams. If we don't test the types in
solrconfig.xml when Solr starts then it introduces the possibility of type errors occurring
at runtime. Perhaps we could keep everything in the named list, but define the known typed
properties elsewhere (e.g. an Enum) and validate when the handler is initialised?
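
A rough sketch of what that validation could look like. The enum name, the two parameter keys and the use of a plain Map in place of a NamedList are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical registry of known highlighting parameters and their expected types.
enum HlParam {
    FORMATTER("hl.formatter", String.class),
    FRAGSIZE("hl.fragsize", Integer.class);

    final String key;
    final Class<?> type;

    HlParam(String key, Class<?> type) {
        this.key = key;
        this.type = type;
    }

    // Throws if a known parameter is present with the wrong type; unknown keys are ignored.
    static void validate(Map<String, Object> config) {
        for (HlParam p : values()) {
            Object value = config.get(p.key);
            if (value != null && !p.type.isInstance(value)) {
                throw new IllegalArgumentException(
                    p.key + " must be " + p.type.getSimpleName()
                        + " but was " + value.getClass().getSimpleName());
            }
        }
    }
}
```

Calling `HlParam.validate(...)` when the handler is initialised would reject a string where an integer is expected at startup rather than at query time.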

Sigh. I just don't know how to write short technical emails.

-Andrew

Re: [jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Andrew May
In reply to this post by Mike Klaas
Mike Klaas wrote:
> Highlighting has definitely grown to the point where it should be in
> its own class, and I could see the justification for giving it its own
> package, given that there are now at least three supporting classes
> for Solr, and this could easily grow.
>

I was a bit nervous about doing anything as radical as repackaging, as I don't really have
a good grasp for how Solr fits together yet.

> It isn't worth supporting formatterClass without making it more
> general.  I'd like to see an extension which allows the specification
> of a class and an array of constructor args.
>
> ex.
> <class name="formatter"
> classname="org.apache.lucene.search.highlight.SimpleHTMLFormatter"
> />
>
> <class name="formatter"
> classname="org.apache.lucene.search.highlight.SimpleHTMLFormatter">
>  <arr>
>      <str>&lt;strong></str>
>      <str>&lt;/strong></str>
>  </arr>
> </class>
>

Sounds like a good idea. Do you think there's still value in being able to choose between
simple/gradient with defined properties for the arguments? Unless someone wants to use a
custom formatter that's going to be easier to configure without consulting the javadoc for
those formatters (although I'm not sure I understand the Gradient formatter as it just
seems to put out the same background colour for everything - which may be a bug or
misunderstanding in what I've done).

>> I'm not sure I properly understand how Fragmenters work, so supplying
>> fragsize to GapFragmenter where >0 (instead of what was a default of
>> 50) may not make sense.
>
> Passing to the constructor makes sense, as it sets the fragmentSize of
> the base SimpleFragmenter class (note the default is 100; 50 is the
> position gap which indicates when to start a new fragment).
>

OK - I did mess this up then! Do you think the incrementThreshold in GapFragmenter (which
is what I was actually setting) is something that needs to be configurable as well?

-Andrew

Re: [jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Mike Klaas
On 7/21/06, Andrew May <[hidden email]> wrote:
> Mike Klaas wrote:

> > [snip general config object constructor]

> Sounds like a good idea. Do you think there's still value in being able to choose between
> simple/gradient with defined properties for the arguments? Unless someone wants to use a
> custom formatter that's going to be easier to configure without consulting the javadoc for
> those formatters (although I'm not sure I understand the Gradient formatter as it just
> seems to put out the same background colour for everything - which may be a bug or
> misunderstanding in what I've done).

I think there might be value in providing less explicit Highlighter
configuration and instead providing a set of intuitive options from
which we can construct various configurations.  Another point of
configuration is the Scorer -- right now the default QueryScorer is
used.  But you can also specify an IndexReader+fieldName to the
QueryScorer constructor to augment fragment scores with IDF values for
the terms, which is necessary to see a difference in Gradient
formatting.  It would be nice to hide all those details from the user.

Perhaps the formatter spec could just be a single string:

formatter="simple" # use default pre/post
formatter = "simple;<em>;</em>"
formatter = "gradient;#FFFFFF;#FFFFFF;#FFFFFF;#FFFFFF"
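
Parsing such a spec is a one-line split. The sketch below is hypothetical (`FormatterSpec` is an invented name) and only separates the formatter type from its arguments. Mapping the arguments onto an actual formatter is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for specs such as "simple;<em>;</em>".
final class FormatterSpec {
    final String type;          // e.g. "simple" or "gradient"
    final List<String> args;    // remaining ";"-separated values, possibly empty

    FormatterSpec(String type, List<String> args) {
        this.type = type;
        this.args = args;
    }

    static FormatterSpec parse(String spec) {
        // Limit -1 keeps trailing empty arguments, e.g. "simple;<b>;".
        String[] parts = spec.split(";", -1);
        List<String> args = Arrays.asList(parts).subList(1, parts.length);
        return new FormatterSpec(parts[0].trim(), args);
    }
}
```

One caveat with this format: an argument that itself contains ";" (an HTML entity such as `&lt;`, for instance) would be split in the wrong place, so some escaping rule would be needed.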

> > Passing to the constructor makes sense, as it sets the fragmentSize of
> > the base SimpleFragmenter class (note the default is 100; 50 is the
> > position gap which indicates when to start a new fragment).
> >
>
> OK - I did mess this up then! Do you think the incrementThreshold in GapFragmenter (which
> is what I was actually setting) is something that needs to be configurable as well?

I don't think so.  The most sensible thing to do if you are
constructing a per-field Highlighter is to use the
positionIncrementGap of the field's Analyzer.  But the current default
seems to do fine.

-Mike

Re: [jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Andrew May
In reply to this post by Nick Burch (Jira)
Just an FYI - I messed up the patch somehow - one of the files had been reverted to an
earlier version, so what I attached doesn't actually compile.

Given the discussion about config, it doesn't really seem worth attaching a fixed patch,
but let me know if that would be useful.

-Andrew

Re: [jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Yonik Seeley
In reply to this post by Mike Klaas
On 7/21/06, Mike Klaas <[hidden email]> wrote:
> I think there might be value in providing less explicit Highlighter
> configuration and instead provide a set of intuitive options from
> which we can construct various configurations.

Sounds good... if we do it well, pretty much no one will need a custom
highlighter.
In the very rare case they did, a custom query handler could always be
a fallback.

> Another point of
> configuration is the Scorer -- right now the default QueryScorer is
> used.  But you can also specify an IndexReader+fieldName to the
> QueryScorer constructor to augment fragment scores with IDF values for
> the terms, which is necessary to see a difference in Gradient
> formatting.  It would be nice to hide all those details from the user.

Agree.

> Perhaps the formatter spec could just be a single string:
>
> formatter="simple" # use default pre/post
> formatter = "simple;<em>;</em>"
> formatter = "gradient;#FFFFFF;#FFFFFF;#FFFFFF;#FFFFFF"

That goes back to the different ways to specify config. This is a
variant of having more complex query args with a secondary parse (and
escaping rules, etc).  I'm undecided on whether this is better than
separate properties.  It is more compact at least, and works OK when
the number of parameters is limited.

-Yonik

Re: [jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Andrew May
In reply to this post by Mike Klaas
Mike Klaas wrote:

>
> I think there might be value in providing less explicit Highlighter
> configuration and instead provide a set of intuitive options from
> which we can construct various configurations.  Another point of
> configuration is the Scorer -- right now the default QueryScorer is
> used.  But you can also specify an IndexReader+fieldName to the
> QueryScorer constructor to augment fragment scores with IDF values for
> the terms, which is necessary to see a difference in Gradient
> formatting.  It would be nice to hide all those details from the user.
>

Well, that explains why I wasn't getting any variations when using the gradient formatter.
Also, creating the QueryScorer with the fieldName prevents highlights from appearing where
they shouldn't - e.g. when searching for +title:management +journal:quarterly I was seeing
"Management" highlighted in my journal field as well as the title.

Is there much of an extra cost in using a scorer created with the IndexReader and fieldName?

-Andrew

Re: [jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Mike Klaas
On 7/21/06, Andrew May <[hidden email]> wrote:

> Well, that explains why I wasn't getting any variations when using the gradient formatter.
> Also, creating the QueryScorer with the fieldName prevents highlights from appearing where
> they shouldn't - e.g. when searching for +title:management +journal:quarterly I was seeing
> "Management" highlighted in my journal field as well as the title.

Interesting--I wasn't aware that it affected that.

> Is there much of an extra cost in using a scorer created with the IndexReader and fieldName?

I don't expect so, but I'm not entirely sure.  Regardless, I think the
additional functionality (and field-specific highlighting) is
sufficient reason to enable it by default.  If term lookup is slow,
that can be fixed in the future with a cache.

-Mike

Re: [jira] Updated: (SOLR-37) Add additional configuration options for Highlighting

Yonik Seeley
In reply to this post by Andrew May
On 7/21/06, Andrew May <[hidden email]> wrote:

> I guess what it comes down to is that we need a clear specification of whether there are
> going to be namespaces or something similar in query parameters, and also how per-field
> query parameters should look. And then how this maps to configuration in solrconfig.xml
> and how that is stored and accessed in the code.
>
> We could take something like "." in a query parameter name to be equivalent to nested
> lists in solrconfig.xml, so "foo.bar.fubar=..." on the query string is comparable to
> <lst name="foo">
>    <lst name="bar">
>       <str name="fubar">...</str>
>    </lst>
> </lst>
>
> But if you also use "." as a way of introducing grouping between properties, then you
> could get a lot of nesting - e.g. specifying the pre tag for a simple formatter for the
> title field might be "f.title.hl.simple.pre=<em>" which is a bit more nesting than I'd be
> comfortable with (or am prepared to type out!).
>
> If instead we separated on ":" and used "." just as part of a naming convention, then
> "f:title:hl.simple.pre=<em>" would be equivalent to:
>
> <lst name="f">
>    <lst name="title">
>       <str name="hl.simple.pre">&lt;em></str>
>    </lst>
> </lst>
>
> If something is specified globally, there's no wildcards, just a top level property.
>
> How does that sound?

Most of the time, config won't be per-field, so that should be kept
simple (so I agree with the no wildcards part).

I'm not sure I like the mix of ':' and '.', but we could just leave
everything as '.' and special case per-field overrides so that the XML
looks like what you have above.

The other alternative is just to collapse everything:
<name="hl.fragsize">
<name="f.title.hl.fragsize">

A one-to-one mapping of queryargs to solrconfig.xml params also has
the bonus of being easier to explain to people.
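
As an illustration of how the nested and collapsed forms relate, here is a small Java sketch that flattens nested lists into dotted names. It uses plain java.util maps as a stand-in for nested NamedLists; the class name DottedKeys and the map-based representation are assumptions for the example only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collapses nested maps into "a.b.c" style parameter names.
final class DottedKeys {
    static Map<String, String> flatten(Map<String, ?> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        collect("", nested, flat);
        return flat;
    }

    private static void collect(String prefix, Map<String, ?> node, Map<String, String> out) {
        for (Map.Entry<String, ?> e : node.entrySet()) {
            // Join the path so far and the current name with a dot.
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                collect(key, child, out);
            } else {
                out.put(key, String.valueOf(value));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> title = new LinkedHashMap<>();
        title.put("hl.fragsize", "0");
        Map<String, Object> f = new LinkedHashMap<>();
        f.put("title", title);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("f", f);
        System.out.println(flatten(root));
    }
}
```

With this mapping, a nested list f / title containing hl.fragsize and a flat entry named f.title.hl.fragsize end up as the same key, which is what makes the one-to-one explanation possible.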

Anyway, thanks for your input! I know this has widened in scope and
gotten a little off the path of what you were trying to achieve.  The
difficulties lie in Solr's configuration growing pains.  (Spring or
HiveMind Solr 2.0 anyone ;-)


-Yonik

Re: [jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Yonik Seeley
In reply to this post by Mike Klaas
On 7/21/06, Mike Klaas <[hidden email]> wrote:
> There could be some
> sort of CommonParams-like class which parses the default parameters,
> like it does currently (but adding logic for per-field overrides), but
> having an addition method which accepts a SolrQueryRequest and adds
> any per-query overrides, finally outputting an instance of some
> Parameter class which can be queried to determine the global and
> per-field settings in one central location.

I think that is a good idea in general.  It also can provide more of a
proper interface behind which we can optimize parameter
getting/parsing, which can become important as the number of
parameters grows.
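
A rough Java sketch of such a central parameter object follows. The class name LayeredParams, the "f.<field>.<name>" key convention and the lookup order (per-field name first, then the global name, each checked in the request before the configured defaults) are assumptions chosen for the example, not a description of existing Solr code.

```java
import java.util.Map;

// Hypothetical parameter object: request values layered over configured defaults,
// with per-field overrides resolved in one place.
final class LayeredParams {
    private final Map<String, String> request;
    private final Map<String, String> defaults;

    LayeredParams(Map<String, String> request, Map<String, String> defaults) {
        this.request = request;
        this.defaults = defaults;
    }

    // Global lookup: the request wins over the configured defaults.
    String get(String name) {
        String value = request.get(name);
        return value != null ? value : defaults.get(name);
    }

    // Per-field lookup: "f.<field>.<name>" first, then the plain name.
    String getFieldParam(String field, String name) {
        String value = get("f." + field + "." + name);
        return value != null ? value : get(name);
    }

    public static void main(String[] args) {
        LayeredParams p = new LayeredParams(
                Map.of("hl.fragsize", "50"),
                Map.of("hl.fragsize", "100", "f.title.hl.fragsize", "0"));
        System.out.println(p.getFieldParam("title", "hl.fragsize"));
        System.out.println(p.getFieldParam("abstract", "hl.fragsize"));
    }
}
```

One consequence of this particular order is that a per-field default in the configuration beats a global value on the request; whether that is desirable is a design decision the sketch does not settle.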

-Yonik

Re: [jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Chris Hostetter-3
In reply to this post by Yonik Seeley

: > I think there might be value in providing less explicit Highlighter
: > configuration and instead provide a set of intuitive options from
: > which we can construct various configurations.
:
: Sounds good... if we do it well, pretty much no one will need a custom
: highlighter.
: In the very rare case they did, a custom query handler could always be
: a fallback.

+1, +1



-Hoss


Re: [jira] Created: (SOLR-37) Add additional configuration options for Highlighting

Andrew May
In reply to this post by Mike Klaas
Mike Klaas wrote:

> On 7/21/06, Andrew May <[hidden email]> wrote:
>
>> Well, that explains why I wasn't getting any variations when using the
>> gradient formatter.
>> Also, creating the QueryScorer with the fieldName prevents highlights
>> from appearing where
>> they shouldn't - e.g. when searching for +title:management
>> +journal:quarterly I was seeing
>> "Management" highlighted in my journal field as well as the title.
>
> Interesting--I wasn't aware that it affected that.
>

I found that there are downsides to supplying the fieldName to the QueryScorer - it means
you can't search one field and highlight another - I have a combined field for
"title/keywords/abstract", but only title is displayed on the search results page, so I
just want to highlight "title", which doesn't work if the QueryScorer is constructed that
way. Likewise, if you've built a summary field for display in search results, but you
search other fields you won't be able to highlight the summary.

I guess it's another thing for the list of things that need to be configurable. Perhaps
"hl.exact=true/false"?
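
To show the intended effect of such a flag without depending on Lucene, here is a small, self-contained Java model. The class ExactHighlightTerms, the FieldTerm type and the method name are hypothetical; the real behaviour would come from how the QueryScorer is constructed, which this sketch only imitates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the proposed hl.exact flag: which query terms would be
// highlighted in a given field.
final class ExactHighlightTerms {
    // A query term together with the field it was searched in.
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    static List<String> termsToHighlight(List<FieldTerm> queryTerms, String targetField, boolean exact) {
        List<String> out = new ArrayList<>();
        for (FieldTerm t : queryTerms) {
            // exact=true: only terms searched in this field; exact=false: every query term.
            if (!exact || t.field.equals(targetField)) {
                out.add(t.text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FieldTerm> query = Arrays.asList(
                new FieldTerm("title", "management"),
                new FieldTerm("journal", "quarterly"));
        System.out.println(termsToHighlight(query, "journal", true));
        System.out.println(termsToHighlight(query, "summary", false));
    }
}
```

With exact=true the journal field only gets "quarterly", while a display-only summary field gets nothing; with exact=false the summary field can still be highlighted with both terms.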

-Andrew

[jira] Updated: (SOLR-37) Add additional configuration options for Highlighting

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/SOLR-37?page=all ]

Andrew May updated SOLR-37:
---------------------------

    Attachment: patch

New patch to configure Highlighting using new SolrParams.

Parameters:
hl - turn highlighting on/off
hl.fl - list of fields to be highlighted, either as a single parameter (e.g. hl.fl=title,keywords) or multiple parameters (hl.fl=title&hl.fl=keywords)
hl.snippets - maximum number of highlight snippets (default = 1)
hl.fragsize - fragment size for highlighting, 0 -> NullFragmenter (default = 50)
hl.formatter - value of either simple or gradient (default = simple)
hl.simple.pre - simple formatter pre tag (default = <em>)
hl.simple.post - simple formatter post tag (default = </em>)
hl.gradient.minFg - gradient formatter min foreground colour
hl.gradient.maxFg - gradient formatter max foreground colour
hl.gradient.minBg - gradient formatter min background colour
hl.gradient.maxBg - gradient formatter max background colour

All values apart from hl & hl.fl can be specified per field. e.g. f.title.hl.fragsize=0.
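
For illustration, the two ways of passing hl.fl could be normalised into one field list along these lines. This is a sketch only: the class name HighlightFields, splitting on commas alone, and dropping duplicates are assumptions, and the patch may handle these cases differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns the raw hl.fl values into a single list of field names.
final class HighlightFields {
    static List<String> parse(String[] values) {
        List<String> fields = new ArrayList<>();
        if (values == null) {
            return fields;
        }
        for (String value : values) {
            // Each value may itself be a comma-separated list.
            for (String name : value.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty() && !fields.contains(trimmed)) {
                    fields.add(trimmed);
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // hl.fl=title,keywords
        System.out.println(parse(new String[] {"title,keywords"}));
        // hl.fl=title&hl.fl=keywords
        System.out.println(parse(new String[] {"title", "keywords"}));
    }
}
```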

All the highlighting code is now in HighlightingUtils rather than SolrPluginUtils. Seems like I've ended up with one big doHighlighting method that does all the work - not sure that's a good thing, but things ended up this way when I started creating highlighters per field.

> Add additional configuration options for Highlighting
> -----------------------------------------------------
>
>                 Key: SOLR-37
>                 URL: http://issues.apache.org/jira/browse/SOLR-37
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Andrew May
>         Attachments: patch, patch
>
>
> As discussed in the mailing list, I've been looking at adding additional configuration options for highlighting.
> I've made quite a few changes to the properties for highlighting:
> Properties that can be set on request, or in solrconfig.xml at the top level:
>   highlight (true/false)
>   highlightFields
> Properties that can be set in solrconfig.xml at the top level or per-field
>   formatter (simple/gradient)
>   formatterPre (preTag for simple formatter)
>   formatterPost (postTag for simple formatter)
>   formatterMinFgCl (min foreground colour for gradient formatter)
>   formatterMaxFgCl (max foreground colour for gradient formatter)
>   formatterMinBgCl (min background colour for gradient formatter)
>   formatterMaxBgCl (max background colour for gradient formatter)
>   fragsize (if <=0 use NullFragmenter, otherwise use GapFragmenter with this value)
> I've added variables for these values to CommonParams, plus there's a fields Map<String,CommonParams> that is parsed from nested NamedLists (i.e. a lst named "fields", with a nested lst for each field).
> Here's a sample of how you can mix and match properties in solrconfig.xml:
>   <requestHandler name="hl" class="solr.StandardRequestHandler" >
>     <str name="formatter">simple</str>
>     <str name="formatterPre">&lt;i></str>
>     <str name="formatterPost">&lt;/i></str>
>     <str name="highlightFields">title,authors,journal</str>
>     <int name="fragsize">0</int>
>     <lst name="fields">
>       <lst name="abstract">
>         <str name="formatter">gradient</str>
>         <str name="formatterMinBgCl">#FFFF99</str>
>         <str name="formatterMaxBgCl">#FF9900</str>
>         <int name="fragsize">30</int>
>         <int name="maxSnippets">2</int>
>       </lst>
>       <lst name="authors">
>         <str name="formatterPre">&lt;strong></str>
>         <str name="formatterPost">&lt;/strong></str>
>       </lst>
>     </lst>
>   </requestHandler>
> I've created HighlightingUtils to handle most of the parameter parsing, but the hightlighting is still done in SolrPluginUtils and the doStandardHighlighting() method still has the same signature, but the other highlighting methods have had to be changed (because highlighters are now created per highlighted field).
> I'm not particularly happy with the code to pull parameters from CommonParams, first checking the field then falling back, e.g.:
>          String pre = (params.fields.containsKey(fieldName) && params.fields.get(fieldName).formatterPre != null) ?
>                params.fields.get(fieldName).formatterPre :
>                   params.formatterPre != null ? params.formatterPre : "<em>";
> I've removed support for a custom formatter - just choosing between simple/gradient. Probably that's a bad decision, but I wanted an easy way to choose between the standard formatters without having to invent a generic way of supplying arguments for the constructor. Perhaps there should be formatterType=simple/gradient and formatterClass=... which overrides formatterType if set at a lower level - with the formatterClass having to have a zero-args constructor? Note: gradient is actually SpanGradientFormatter.
> I'm not sure I properly understand how Fragmenters work, so supplying fragsize to GapFragmenter where >0 (instead of what was a default of 50) may not make sense.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] Commented: (SOLR-37) Add additional configuration options for Highlighting

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/SOLR-37?page=comments#action_12430318 ]
           
Yonik Seeley commented on SOLR-37:
----------------------------------

ping test  (4:24 EDT, 8/24/2006)


[jira] Commented: (SOLR-37) Add additional configuration options for Highlighting

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/SOLR-37?page=comments#action_12430345 ]
           
Mike Klaas commented on SOLR-37:
--------------------------------

Thanks for the patch!

A few comments:

1) The patch uses absolute paths, which makes it difficult to apply.  Please generate patches using 'svn diff > mypatch.diff' at the root level of a solr trunk checkout.

2) I don't believe that it is necessary to add the constructors to GapFragmenter--the existing constructors from SimpleFragmenter are equivalent.

3) The default FRAGSIZE should be 100 to conform to Lucene's Highlighter default (it would be nice if this was exposed so we could use it)

4) Might it be worth providing sensible defaults for the gradient values so users can try hl.formatter=gradient without further configuration?

5) There are a few constructions of this form:
+      String param = getParams(request).getFieldParam(fieldName, SNIPPETS);
+      //the default of zero should never be used, as SNIPPETS is defined in the defaults
+      return (param != null) ? Integer.parseInt(param) : 0;

where getParams returns a SolrParams instance which already has defaults for this parameter.  Surely providing a default is unnecessary, and shouldn't null also be impossible due to the DefaultSolrParams construction?  Inlining these (in the calling method) to something like

SolrParams p = getParams(request);
int maxSnippets = Integer.parseInt(p.getFieldParam(fieldName, SNIPPETS));

would be cleaner and save some object construction costs.

6) The last time this patch was discussed it was mentioned that there are tradeoffs in using field-specific idf values for highlighting scoring.  One downside is that they must be read. Another is that terms are only highlighted in the fields they are searched in, which may be desirable behaviour, but limits the usefulness of the hl.fl parameter.  I'm not sure what the best approach is.

7) The attached patch contains no tests, and further, though I have not applied the patch due to 1), I'm skeptical that the testsuite passes since the parameter names weren't changed in src/test/o/a/s/HighlighterTest.java.  The latter is easy enough to fix, but new tests should be included before this patch is committed.

Thanks again for the patch!



[jira] Commented: (SOLR-37) Add additional configuration options for Highlighting

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/SOLR-37?page=comments#action_12430503 ]
           
Andrew May commented on SOLR-37:
--------------------------------

Mike,
1) Was using SVN support in Eclipse and it doesn't let you control this - I've installed the command line version now.
2) If GapFragmenter just has a default constructor it's not possible to pass the fragsize (constructors not inherited)?
3) My mistake - I think this is a legacy of me originally confusing fragsize and GapFragmenter's increment threshold.
4) Perhaps - I wasn't sure what sensible defaults were, and I can't seem to get the gradient formatter to do anything useful - but I can't see an obvious bug in my code to construct it.
5) That was me being overly defensive, and I've changed it now. I considered adding methods like getFieldInt(), getFieldBool to SolrParams, so there were field versions of all the non-field methods - but decided against it.
6) I was wondering whether a hl.exact flag might be useful - if false (the default?) the QueryScorer wouldn't be created with the field/maxTermWeight. I'm not sure there's any point using the gradient formatter unless you supply the field name/maxTermWeight, so that might ignore this setting.
7) Sorry - will fix and add additional tests.
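
On point 2, the Java rule in question is that constructors are not inherited, so a subclass has to declare any constructor it wants to offer. A minimal sketch, using the stand-in classes BaseFragmenter and GapLikeFragmenter (hypothetical names, not the Lucene or Solr classes; the default of 100 mirrors the figure quoted earlier in the thread):

```java
// Stand-in for a base class with a default and a sized constructor.
class BaseFragmenter {
    private final int fragmentSize;

    BaseFragmenter() {
        this(100);
    }

    BaseFragmenter(int fragmentSize) {
        this.fragmentSize = fragmentSize;
    }

    int getFragmentSize() {
        return fragmentSize;
    }
}

// Stand-in for the subclass: without the int constructor declared here,
// "new GapLikeFragmenter(30)" would not compile.
class GapLikeFragmenter extends BaseFragmenter {
    GapLikeFragmenter() {
        super();
    }

    GapLikeFragmenter(int fragmentSize) {
        super(fragmentSize);
    }

    public static void main(String[] args) {
        System.out.println(new GapLikeFragmenter(30).getFragmentSize());
        System.out.println(new GapLikeFragmenter().getFragmentSize());
    }
}
```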

One thing I don't like about HighlightingUtils as it stands is the lack of state. When highlighting multiple fields, getParams() is called many times, each time constructing a DefaultSolrParams (although I don't think there's a big overhead to doing this). If we're not specifying the field when creating the QueryScorer then this could be reused for multiple fields. We could possibly re-use the highlighter instance as well.


Re: [jira] Commented: (SOLR-37) Add additional configuration options for Highlighting

Yonik Seeley-2
On 8/25/06, Andrew May (JIRA) <[hidden email]> wrote:
> 5) That was me being overly defensive, and I've changed it now. I considered adding methods like getFieldInt(), getFieldBool to SolrParams, so there were field versions of all the non-field methods - but decided against it.

Yeah, I didn't know if field parameters were going to be used enough
to warrant all the repetitive methods like

  getFieldInt(field, param) => Integer
  getFieldInt(field, param, default) => int

I'm not against them either though...
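
For reference, the two variants could look roughly like this in Java. FieldParams is a hypothetical stand-in backed by a plain map rather than the real SolrParams, and the "f.<field>.<param>" key convention is assumed from the examples earlier in the thread.

```java
import java.util.Map;

// Hypothetical stand-in for a params class with typed per-field getters.
final class FieldParams {
    private final Map<String, String> params;

    FieldParams(Map<String, String> params) {
        this.params = params;
    }

    // Per-field value if present, otherwise the global value (may be null).
    String getFieldParam(String field, String param) {
        String value = params.get("f." + field + "." + param);
        return value != null ? value : params.get(param);
    }

    // Returns null when the parameter is not set anywhere.
    Integer getFieldInt(String field, String param) {
        String value = getFieldParam(field, param);
        return value == null ? null : Integer.valueOf(value);
    }

    // Returns the supplied default when the parameter is not set anywhere.
    int getFieldInt(String field, String param, int def) {
        String value = getFieldParam(field, param);
        return value == null ? def : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        FieldParams p = new FieldParams(Map.of("hl.snippets", "1", "f.title.hl.snippets", "3"));
        System.out.println(p.getFieldInt("title", "hl.snippets"));
        System.out.println(p.getFieldInt("body", "hl.fragsize", 100));
    }
}
```

The boxed variant lets the caller distinguish "not set" from any real value, at the price of a null check; the primitive variant pushes the default into the call site.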

> One thing I don't like about HighlightingUtils as it stands is the lack of state. When highlighting multiple fields, getParams() is called many times, each time constructing a DefaultSolrParams (although I don't think there's a big overhead to doing this). If we're not specifying the field when creating the QueryScorer then this could be reused for multiple fields. We could possibly re-use the highlighter instance as well.

Construction of DefaultSolrParams is very cheap (it just sets two pointers).
The part to worry about is the additional cost of every lookup... 6
per getFieldParam() plus a parseInt() for an integer param.

But, since DEFAULT is private, I'm fine with it since we can always
change or optimize it later if it becomes a problem.  Right now, I
imagine the big bottleneck to highlighting is analysis or the lucene
highlighter.


-Yonik