[jira] [Commented] (LUCENE-2309) Fully decouple IndexWriter from analyzers

Michael Gibney (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066684#comment-13066684 ]

Robert Muir commented on LUCENE-2309:
-------------------------------------

{quote}
I'm happy to explore those other consumers and strive to provide a user-friendly API to limit bugs. But I'm not getting the impression you like the concept at all.
{quote}

That's totally not it, but I do like having an "Analyzer" as the central API for analysis that everything uses: IndexWriter, QueryParser, MoreLikeThis, synonym parsing, spell checking, or wherever we need it...

I think if we want this to take AttributesConsumer or whatever, then that's cool: Analyzer returns this instead of TokenStream, and we fix all these consumers to consume the more general API.

I just want to make sure that all consumers, not just IndexWriter, use the same consistent API. This way, like today, someone declares FooAnalyzer, uses it everywhere, and analysis is consistent everywhere.
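
To make the "declare once, use everywhere" point concrete, here is a rough sketch. It does not use Lucene's real classes: SimpleAnalyzer, FooAnalyzer, indexTerms and queryTerms are all hypothetical stand-ins, chosen only to show that an index-time consumer and a query-time consumer stay in agreement when both go through the same analysis entry point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SharedAnalyzerSketch {

    /** Hypothetical stand-in for a central analysis API; NOT Lucene's Analyzer class. */
    interface SimpleAnalyzer {
        List<String> analyze(String fieldName, String text);
    }

    /** Toy "FooAnalyzer": lowercases, then splits on anything that is not a letter or digit. */
    static final class FooAnalyzer implements SimpleAnalyzer {
        @Override
        public List<String> analyze(String fieldName, String text) {
            List<String> terms = new ArrayList<>();
            for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!piece.isEmpty()) {
                    terms.add(piece);
                }
            }
            return terms;
        }
    }

    /** Hypothetical index-time consumer: asks the shared analyzer for the terms to store. */
    static List<String> indexTerms(SimpleAnalyzer analyzer, String field, String docText) {
        return analyzer.analyze(field, docText);
    }

    /** Hypothetical query-time consumer: asks the same analyzer for the terms to look up. */
    static List<String> queryTerms(SimpleAnalyzer analyzer, String field, String queryText) {
        return analyzer.analyze(field, queryText);
    }

    public static void main(String[] args) {
        SimpleAnalyzer analyzer = new FooAnalyzer();
        List<String> indexed = indexTerms(analyzer, "body", "Decouple IndexWriter, fully!");
        List<String> queried = queryTerms(analyzer, "body", "INDEXWRITER");
        System.out.println("indexed: " + indexed);
        System.out.println("queried: " + queried);
        // Because both sides used the same analyzer, the query term matches an indexed term.
        System.out.println("match:   " + indexed.containsAll(queried));
    }
}
```

If the two consumers each did their own normalization, the uppercase query here could easily miss the indexed term; that is the inconsistency a single shared API is meant to avoid.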



> Fully decouple IndexWriter from analyzers
> -----------------------------------------
>
>                 Key: LUCENE-2309
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2309
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: LUCENE-2309.patch
>
>
> IndexWriter only needs an AttributeSource to do indexing.
> Yet, today, it interacts with Field instances, holds a private
> analyzer, invokes analyzer.reusableTokenStream, and has to deal with a
> wide variety of cases (it's not analyzed; it is analyzed but it's a
> Reader or a String; it's pre-analyzed).
> I'd like to have IW only interact with attr sources that already
> arrived with the fields.  This would be a powerful decoupling -- it
> means others are free to make their own attr sources.
> They need not even use any of Lucene's analysis impls; eg they can
> integrate to other things like [OpenPipeline|http://www.openpipeline.org].
> Or make something completely custom.
> LUCENE-2302 is already a big step towards this: it makes IW agnostic
> about which attr is "the term", and only requires that it provide a
> BytesRef (for flex).
> Then I think LUCENE-2308 would get us most of the remaining way -- ie, if the
> FieldType knows the analyzer to use, then we could simply create a
> getAttrSource() method (say) on it and move all the logic IW has today
> onto there.  (We'd still need existing IW code for back-compat).
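
The shape proposed in the description above might look roughly like the sketch below. Everything here is an assumption made for illustration: TermSource is a heavily reduced stand-in for an attribute source (it only yields term bytes), FieldTypeSketch is a hypothetical field type, and its getAttrSource() merely echoes the method name floated in the description. None of this is the attached patch or Lucene's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class AttrSourceSketch {

    /** Hypothetical, reduced stand-in for an attribute source: yields term bytes until exhausted. */
    interface TermSource {
        /** Returns the next term as UTF-8 bytes, or null when no terms remain. */
        byte[] nextTermBytes();
    }

    /** Hypothetical field type that knows how to turn a raw value into a TermSource. */
    static final class FieldTypeSketch {
        private final Function<String, List<String>> tokenizer;

        FieldTypeSketch(Function<String, List<String>> tokenizer) {
            this.tokenizer = tokenizer;
        }

        /** The analyzed / not-analyzed decision lives here, not in the indexer. */
        TermSource getAttrSource(String value) {
            Iterator<String> it = tokenizer.apply(value).iterator();
            return () -> it.hasNext() ? it.next().getBytes(StandardCharsets.UTF_8) : null;
        }
    }

    /** Toy indexer: it sees only a TermSource, never an analyzer, a Reader, or a raw String. */
    static List<String> index(TermSource source) {
        List<String> terms = new ArrayList<>();
        for (byte[] term = source.nextTermBytes(); term != null; term = source.nextTermBytes()) {
            terms.add(new String(term, StandardCharsets.UTF_8));
        }
        return terms;
    }

    public static void main(String[] args) {
        // "Analyzed" field: lowercase and split on whitespace.
        FieldTypeSketch analyzed = new FieldTypeSketch(
                v -> Arrays.asList(v.trim().toLowerCase(Locale.ROOT).split("\\s+")));
        // "Not analyzed" field: the whole value is a single term.
        FieldTypeSketch notAnalyzed = new FieldTypeSketch(Collections::singletonList);

        System.out.println(index(analyzed.getAttrSource("Fully Decouple IndexWriter")));
        System.out.println(index(notAnalyzed.getAttrSource("LUCENE-2309")));
    }
}
```

The indexer has a single code path; the various field cases are handled by whoever builds the source, which could just as well be a custom pipeline outside Lucene's analysis implementations.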
