Solr 3.1 back compat


Grant Ingersoll-2
As part of https://issues.apache.org/jira/browse/SOLR-2080, I'd like to rework the SpellCheckComponent just a bit to be more generic. I think I can maintain the URL APIs (i.e. &spellcheck.*) in a back compatible way, but I would like to change some of the Java classes a bit, namely SolrSpellChecker and related classes, so that they are reusable and reflect the commonality of the solutions. The way I see it, spell checking, auto suggest and related search suggestions are all just suggestions. We have much of the framework for this in place, except that a few things at the Java level are named after spell checking. I know we generally don't worry too much about Java interfaces in Solr, but this seems like one area where people do sometimes write their own. The changes will mostly be renaming commonalities from "spellcheck" to "suggester" (or something similar), so I don't see it as particularly hard to make the change, but it would require some code changes. What do people think? My other option would be to factor out as much commonality as possible into helper classes, but that doesn't feel as clean.

-Grant
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Solr 3.1 back compat

Robert Muir
On Mon, Oct 25, 2010 at 9:42 PM, Grant Ingersoll <[hidden email]> wrote:
> As part of https://issues.apache.org/jira/browse/SOLR-2080, I'd like to rework the SpellCheckComponent just a bit to be more generic. I think I can maintain the URL APIs (i.e. &spellcheck.*) in a back compatible way, but I would like to change some of the Java classes a bit, namely SolrSpellChecker and related classes, so that they are reusable and reflect the commonality of the solutions. The way I see it, spell checking, auto suggest and related search suggestions are all just suggestions. We have much of the framework for this in place, except that a few things at the Java level are named after spell checking. I know we generally don't worry too much about Java interfaces in Solr, but this seems like one area where people do sometimes write their own. The changes will mostly be renaming commonalities from "spellcheck" to "suggester" (or something similar), so I don't see it as particularly hard to make the change, but it would require some code changes. What do people think? My other option would be to factor out as much commonality as possible into helper classes, but that doesn't feel as clean.
>
>

Almost certainly not what you are looking for, but I'm gonna complain
anyway from my experience of trying to write a Solr spellchecker
recently.
Note: I didn't take the time to actually learn these APIs in depth,
so maybe I'm completely off-base, but this is what it looked like to
me:

I felt the entire framework in Solr is built around the idea of "take
stuff from one field in an index, shove it into another field of an
index", but my spellchecker doesn't need any of this.

Configuring it for different fields is a pain in the ass, if you have
many, but really the field could and should be a query-time parameter.

The spellchecking APIs have a weird response format, "Map<Token,
LinkedHashMap<String, Integer>>", which really just means you can only
provide text and docfreq, but I wanted to return the score, too... so
for now it just gets discarded.
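
To make the complaint above concrete, here is a minimal, self-contained sketch. It is not Solr code: String stands in for Token, and the Suggestion class, toLegacy and demo are hypothetical names invented for illustration. It only shows that a text -> docfreq map has no slot for a score, so any score the checker computed is lost on conversion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScoredSuggestionSketch {

    /** Hypothetical richer suggestion: text, document frequency, and a score. */
    static final class Suggestion {
        final String text;
        final int docFreq;
        final float score;

        Suggestion(String text, int docFreq, float score) {
            this.text = text;
            this.docFreq = docFreq;
            this.score = score;
        }
    }

    /**
     * Flattens scored suggestions into the "original term -> (text -> docfreq)"
     * shape. String replaces Token here so the sketch has no dependencies.
     * The score has nowhere to go in this shape, so it is dropped.
     */
    static Map<String, LinkedHashMap<String, Integer>> toLegacy(
            Map<String, List<Suggestion>> scored) {
        Map<String, LinkedHashMap<String, Integer>> legacy = new LinkedHashMap<>();
        for (Map.Entry<String, List<Suggestion>> entry : scored.entrySet()) {
            LinkedHashMap<String, Integer> byText = new LinkedHashMap<>();
            for (Suggestion s : entry.getValue()) {
                byText.put(s.text, s.docFreq);
            }
            legacy.put(entry.getKey(), byText);
        }
        return legacy;
    }

    /** Builds made-up sample data and converts it to the legacy shape. */
    static Map<String, LinkedHashMap<String, Integer>> demo() {
        List<Suggestion> forTypo = new ArrayList<>();
        forTypo.add(new Suggestion("solr", 120, 0.92f));
        forTypo.add(new Suggestion("solar", 45, 0.61f));
        Map<String, List<Suggestion>> scored = new LinkedHashMap<>();
        scored.put("solrr", forTypo);
        return toLegacy(scored);
    }

    public static void main(String[] args) {
        // Prints {solrr={solr=120, solar=45}}; the 0.92 and 0.61 scores are gone.
        System.out.println(demo());
    }
}
```

A response type that carried something like the hypothetical Suggestion above would let a later stage rank on score instead of on docfreq alone.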

We are still using Token everywhere, again, which is bad news if we
want to do more complex things later... it would really make
sense to switch to the attributes API if this stuff needs to be
flexible.

Even the input that comes into the spellchecker in
getSuggestions(SpellingOptions options) is just Tokens, which is
pretty limiting. For instance, I think it makes way more sense for a
spellchecker API to take a Query and return corrected Query objects; in
my situation I could give better results, but the Solr APIs stop me.
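
As a rough sketch of what a "query in, corrected queries out" API might look like: everything below is hypothetical and is not an existing Solr or Lucene interface. SimpleQuery is a toy stand-in for a real Query class, and QueryCorrector, ScoredQuery and DictionaryCorrector are invented names. The point is only that the checker sees the whole query at once and can attach a score to each rewrite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryCorrectionSketch {

    /** Toy stand-in for a parsed query: just an ordered list of terms. */
    static final class SimpleQuery {
        final List<String> terms;

        SimpleQuery(String... terms) {
            this.terms = Arrays.asList(terms);
        }

        @Override
        public String toString() {
            return String.join(" ", terms);
        }
    }

    /** A corrected query together with the score the checker gave it. */
    static final class ScoredQuery {
        final SimpleQuery query;
        final float score;

        ScoredQuery(SimpleQuery query, float score) {
            this.query = query;
            this.score = score;
        }
    }

    /** Hypothetical query-level API: whole query in, corrected queries out. */
    interface QueryCorrector {
        List<ScoredQuery> correct(SimpleQuery original);
    }

    /** Toy implementation backed by a fixed table of term replacements. */
    static final class DictionaryCorrector implements QueryCorrector {
        private final Map<String, String> fixes;

        DictionaryCorrector(Map<String, String> fixes) {
            this.fixes = fixes;
        }

        @Override
        public List<ScoredQuery> correct(SimpleQuery original) {
            String[] corrected = new String[original.terms.size()];
            int changed = 0;
            for (int i = 0; i < corrected.length; i++) {
                String term = original.terms.get(i);
                String fix = fixes.get(term);
                corrected[i] = (fix != null) ? fix : term;
                if (fix != null) {
                    changed++;
                }
            }
            List<ScoredQuery> out = new ArrayList<>();
            if (changed > 0) {
                // Toy score: the fraction of terms that were left untouched.
                float score = 1.0f - (float) changed / corrected.length;
                out.add(new ScoredQuery(new SimpleQuery(corrected), score));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Map<String, String> fixes = new HashMap<>();
        fixes.put("serch", "search");
        QueryCorrector corrector = new DictionaryCorrector(fixes);
        for (ScoredQuery sq : corrector.correct(new SimpleQuery("solr", "serch"))) {
            System.out.println(sq.query + " (score " + sq.score + ")");
        }
    }
}
```

A real implementation would work on actual Query objects and could use context between terms; the dictionary lookup here is only a placeholder for that logic.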

Apparently the whole Collator thing is designed to "do this for me",
but I have my own ideas (since my impl is new and different), only I'm
not able to implement them... I don't know how the hell it could be
doing this since I can't return the score.

I realize I could have completely discarded all the spellchecking
APIs, written a ton of code/re-invented wheels, and probably gotten
what I wanted, but I just wimped out and committed a shitty
spellchecker instead.


Re: Solr 3.1 back compat

Chris Hostetter-3
In reply to this post by Grant Ingersoll-2

: As part of https://issues.apache.org/jira/browse/SOLR-2080, I'd like to
: rework the SpellCheckComponent just a bit to be more generic.  I think I
: can maintain the URL APIs (i.e. &spellcheck.*) in a back compatible way,
: but I would like to change some of the Java classes a bit, namely
        ...
: level are named after spell checking.  I know we generally don't worry
: too much about Java interfaces in Solr, but this seems like one area
: where people do sometimes write their own.  The changes will be mostly

Go for it.

Solr 3.1 will be a major new release compared to 1.4.  We shouldn't go out
of our way to break compatibility for no reason, but if it allows us to
add new functionality I wouldn't hesitate -- especially if it's just an
internal Java API change and not an end user HTTP API change.

Even for HTTP API (or response structure) changes, don't be shy about
changing things if you think it really improves stuff -- the only
hesitation I would have is in changes that are subtle and not entirely
obvious -- it's better to break back compat in a way that causes an
immediate and obvious failure than to change things in a way that only
breaks compat in *some* cases.  (i.e.: if you want to change the response
structure from spellcheck, change it significantly enough that old parsing
code won't ever work -- don't change it just a little bit so it seems like
it's working initially, but in non-trivial cases data is missing.)
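
A small sketch of the difference between a subtle and a loud break, under made-up assumptions: the "suggestions" and "corrections" keys and the response shapes below are purely illustrative and are not Solr's actual spellcheck response structure. parseOld plays the part of existing client code that nobody has updated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LoudBreakSketch {

    /** Old client parser: reads plain strings listed under "suggestions". */
    static List<String> parseOld(Map<String, Object> response) {
        List<String> out = new ArrayList<>();
        // Iterating a null list throws NullPointerException if the key is gone.
        for (Object item : (List<?>) response.get("suggestions")) {
            if (item instanceof String) {
                out.add((String) item);
            }
            // Anything that is not a String is skipped without complaint.
        }
        return out;
    }

    /** Subtle change: same key, but one entry is now a map with a score. */
    static Map<String, Object> subtleChange() {
        Map<String, Object> scoredEntry = new LinkedHashMap<>();
        scoredEntry.put("text", "solar");
        scoredEntry.put("score", 0.61);
        List<Object> items = new ArrayList<>();
        items.add("solr");
        items.add(scoredEntry);
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("suggestions", items);
        return response;
    }

    /** Loud change: the top-level key is renamed, so old code cannot find it. */
    static Map<String, Object> loudChange() {
        List<Object> items = new ArrayList<>();
        items.add("solr");
        items.add("solar");
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("corrections", items);
        return response;
    }

    public static void main(String[] args) {
        // Looks like it works, but "solar" is silently missing.
        System.out.println("subtle: " + parseOld(subtleChange()));
        try {
            parseOld(loudChange());
        } catch (NullPointerException e) {
            // Fails on the very first response, which is hard to overlook.
            System.out.println("loud: old parser failed immediately");
        }
    }
}
```

With the subtle change the old parser keeps returning a plausible but incomplete list; with the loud change it fails on the first call, which is the outcome argued for above.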

-Hoss
