Chris Male commented on LUCENE-2309:
But I don't think QueryParser should take PerFieldAnalyzerWrapper while IndexWriter takes per-field analyzers; I think that's confusing.
Yes it is.
I think Chris only started with the indexer as an example to show that it works. Of course we can rewrite all other consumers to use this new API, including BaseTSTestCase.
Well, if that's the direction here, then we should describe the JIRA issue differently: something like "abstract away TokenStream API".
I don't think what I've implemented in the patch is so different to what has been discussed in this issue earlier. I did consider opening another issue, but I thought this JIRA issue captured the conceptual issue quite well.
Because it just looks to me as if IndexWriter works off a different analysis API than other analysis consumers, and I don't like that.
I'm happy to explore those other consumers and strive to provide a user-friendly API to limit bugs. But I'm not getting the impression you like the concept at all.
> Fully decouple IndexWriter from analyzers
> Key: LUCENE-2309
> URL: https://issues.apache.org/jira/browse/LUCENE-2309
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
> Attachments: LUCENE-2309.patch
> IndexWriter only needs an AttributeSource to do indexing.
> Yet, today, it interacts with Field instances, holds a private
> analyzer, invokes analyzer.reusableTokenStream, and has to deal with a
> wide variety of cases (it's not analyzed; it is analyzed but it's a Reader
> or a String; it's pre-analyzed).
> I'd like to have IW only interact with attr sources that already
> arrived with the fields. This would be a powerful decoupling -- it
> means others are free to make their own attr sources.
> They need not even use any of Lucene's analysis impls; eg they can
> integrate to other things like [OpenPipeline|http://www.openpipeline.org].
> Or make something completely custom.
> LUCENE-2302 is already a big step towards this: it makes IW agnostic
> about which attr is "the term", and only requires that it provide a
> BytesRef (for flex).
> Then I think LUCENE-2308 would get us most of the remaining way -- ie, if the
> FieldType knows the analyzer to use, then we could simply create a
> getAttrSource() method (say) on it and move all the logic IW has today
> onto there. (We'd still need existing IW code for back-compat).
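
To make the idea in the description above concrete, here is a minimal plain-Java sketch of an indexer that only ever sees a per-field term source and never an analyzer. It does not use Lucene's classes: `TermSource`, `FieldType`, `getAttrSource`, `ToyIndexWriter` and the two field types are hypothetical names chosen for illustration (only `getAttrSource()` echoes the method name floated in the issue), and the whitespace/lowercase "analysis" is an assumption standing in for a real analyzer chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class DecoupledIndexingSketch {

    /** Hypothetical stand-in for an attribute source: it only yields terms. */
    interface TermSource {
        Iterator<String> terms();
    }

    /** Hypothetical field type that knows how to produce its own term source. */
    interface FieldType {
        TermSource getAttrSource(String value);
    }

    /** Analyzed text (assumed toy analysis): lowercase, then split on whitespace. */
    static final FieldType ANALYZED = value -> () ->
            Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+")).iterator();

    /** Not analyzed: the whole value is emitted as a single term. */
    static final FieldType NOT_ANALYZED = value -> () ->
            Collections.singletonList(value).iterator();

    /** Toy indexer: it consumes term sources and has no notion of an analyzer. */
    static final class ToyIndexWriter {
        private final List<String> postings = new ArrayList<>();

        void addField(String name, TermSource source) {
            for (Iterator<String> it = source.terms(); it.hasNext();) {
                postings.add(name + ":" + it.next());
            }
        }

        List<String> postings() {
            return postings;
        }
    }

    /** Indexes one analyzed and one non-analyzed field through the same code path. */
    static List<String> demo() {
        ToyIndexWriter writer = new ToyIndexWriter();
        writer.addField("body", ANALYZED.getAttrSource("Hello Lucene World"));
        writer.addField("id", NOT_ANALYZED.getAttrSource("Doc-42"));
        return writer.postings();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the "not analyzed", "analyzed" and (by extension) "pre-analyzed" cases all collapse into one `addField` path, which is the kind of simplification the description seems to be aiming for; whether other consumers such as the query parser should share the same entry point is the open question in the comments above.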