[GitHub] lucene-solr pull request #328: SOLR-12034

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr pull request #328: SOLR-12034

barrotsteindev
GitHub user tballison opened a pull request:

    https://github.com/apache/lucene-solr/pull/328

    SOLR-12034

    First draft of SOLR-12034 -- not ready for committing. Some non-flaky tests are now failing.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tballison/lucene-solr jira/SOLR-12034

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/328.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #328
   
----
commit a118f5f9a0e3206d87e62394924d18bbf3b94fd3
Author: tballison <tallison@...>
Date:   2018-02-26T16:27:47Z

    SOLR-12034 -- first pass

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr issue #328: SOLR-12034

barrotsteindev
Github user romseygeek commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
 
    Hi @tballison , are you still interested in pushing this one?  I can help out, as I'd like to get this committed so that we can move forward with LUCENE-8497 and remove the `MultiTermAwareComponent` interface.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [GitHub] lucene-solr issue #328: SOLR-12034

Erick Erickson
Just skimming here and on my way to the airport. The original purpose
of MultiTermAwareComponent was to be able to, for instance, search
wildcards without having to, say, lowercase the term on the client side
(remember all the questions like "Why does a search for 'powerful' succeed
but not 'Power*' ?"). I'm assuming this change won't alter that behavior....


On Mon, Oct 1, 2018 at 1:06 AM romseygeek <[hidden email]> wrote:

>
> Github user romseygeek commented on the issue:
>
>     https://github.com/apache/lucene-solr/pull/328
>
>     Hi @tballison , are you still interested in pushing this one?  I can help out, as I'd like to get this committed so that we can move forward with LUCENE-8497 and remove the `MultiTermAwareComponent` interface.
>
>
> ---
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr issue #328: SOLR-12034

barrotsteindev
In reply to this post by barrotsteindev
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
 
    @romseygeek , y, happy to fix/update this.  I'll take a look later today.
   
    Part of the reason I gave up on this is that I didn't like the changes I had to make at the Lucene level.  It felt like I was screwing up the elegant Lucene-level API.  Any recommendations?
   
    Also, @dsmiley recommended I move the Lucene-level modifications to another issue.  Are you ok if these go in as one, or should I open up a separate PR for the Lucene-level mods?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr issue #328: SOLR-12034

barrotsteindev
In reply to this post by barrotsteindev
Github user romseygeek commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
 
    The lucene-level changes are there so that you can take an existing CustomAnalyzer and tweak it, if I'm reading correctly.  Can you instead add a constructor to CustomAnalyzer.Builder that takes a CustomAnalyzer?  I think that would simplify things.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [GitHub] lucene-solr issue #328: SOLR-12034

Alan Woodward-3
In reply to this post by Erick Erickson
See LUCENE-8497 for more details.  Mayya would like to replace the marker interface with type-safe methods on CharFilterFactory and TokenFilterFactory

> On 1 Oct 2018, at 16:15, Erick Erickson <[hidden email]> wrote:
>
> Just skimming here and on my way to the airport. The original purpose
> of MultiTermAwareComponent was to be able to, for instance, search
> wildcards without having to, say, lowercase the term on the client side
> (remember all the questions like "Why does a search for 'powerful' succeed
> but not 'Power*' ?"). I'm assuming this change won't alter that behavior....
>
>
> On Mon, Oct 1, 2018 at 1:06 AM romseygeek <[hidden email]> wrote:
>>
>> Github user romseygeek commented on the issue:
>>
>>   https://github.com/apache/lucene-solr/pull/328
>>
>>   Hi @tballison , are you still interested in pushing this one?  I can help out, as I'd like to get this committed so that we can move forward with LUCENE-8497 and remove the `MultiTermAwareComponent` interface.
>>
>>
>> ---
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr issue #328: SOLR-12034

barrotsteindev
In reply to this post by barrotsteindev
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
 
    Wow...long time since I've visited this code.  Now I think I recall...the ugliness that I don't like imposing on the CustomAnalyzer's API is that it holds its own ResourceLoader and applies it when the user calls, e.g. `withTokenizer(class/classname, params)`, `add(Token|Char)Filter(class/classname, params)`.  
   
    In Solr, the charfilter, tokenizer, tokenfilter factories are fully built with resources loaded by `FieldTypePluginLoader`'s  `loader` a (`SolrResourceLoader`) in `readAnalyzer(Node node)` one by one...I think (???), and _then_ they are added to the `CustomAnalyzer`.
   
    I also see in `ManagedIndexSchema`, that there's `postReadInform()` which calls `informResourceLoaderAwareObjectsInChain`, which then loads the resources.
   
    So, when I break the API in CustomAnalyzer and make public, e.g. `addTokenFilter(TokenFilterFactory factory)`, there's an unused private variable `ResourceLoader loader`, which feels ugly...a user could both specify a resource loader in `Builder`'s initializer and then pass in fully loaded components that would bypass that resource loader.  This smells bad to me...
   
    Any recommendations?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr issue #328: SOLR-12034

barrotsteindev
In reply to this post by barrotsteindev
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
 
    If you can't make Solr work with the current public CustomAnalyzer API, please keep it as is and use your own TokenizerChain in Solr. Please don't make CustomAnalyzer unmodifiable or add access to internal fields! In fact, it's a minimum amount of code behind the 3 lists of factories that build up the Analyzer that does not justify cluttering Lucene's API (like the horrible MultiTermAwareComponent added by Solr).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr issue #328: SOLR-12034

barrotsteindev
In reply to this post by barrotsteindev
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
 
    Custom Analyzer was just added to have an easy-to-use Builder-Like API for Analyzers. It was not meant to replace SOlr's (although it would be nice, but it's impossible as I figured out at that time, too). Solr is based on modifiable classes and XML, not builders...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr issue #328: SOLR-12034

barrotsteindev
In reply to this post by barrotsteindev
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
 
    I'll leave this open a bit for discussion in case someone can think of a solution.  My intuition is in accord with @uschindler 's judgment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr issue #328: SOLR-12034

barrotsteindev
In reply to this post by barrotsteindev
Github user romseygeek commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
 
    I think we can proceed with LUCENE-8497 anyway by adding a new method to TokenizerChain that only uses multi-term aware filters.  Feel free to close this one @tballison .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr issue #328: SOLR-12034

barrotsteindev
In reply to this post by barrotsteindev
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
 
    Thank you, @romseygeek , for thinking of this PR.  I'm closing it because I don't want to wreck the API of CustomAnalyzer.Builder().


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] lucene-solr pull request #328: SOLR-12034

barrotsteindev
In reply to this post by barrotsteindev
Github user tballison closed the pull request at:

    https://github.com/apache/lucene-solr/pull/328


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]