making schema.xml nicer to read/use

Yonik Seeley-2
What do people think of leaving off "Factory" from tokenizer and
token filter class names in schema.xml?
Let the user say what filter they want, not necessarily how to get it.

So you would say "LowerCaseFilter" instead of "LowerCaseFilterFactory".

Implementation details:
  - Specification of a Factory would still be accepted (100% backward compat).
  - If the specified class name isn't found, or doesn't implement
    TokenFilterFactory, then a factory will be searched for by appending
    "Factory" to the class name.
  - If no factory can be found, an attempt will be made to construct
    one dynamically (the easiest way would be a generic factory that
    works via reflection).  People could then use simple filters w/o
    writing a factory for them.
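A minimal Java sketch of the lookup order described above. It assumes stand-in TokenStream and TokenFilterFactory interfaces (not the real Lucene/Solr classes), and the names resolve() and ReflectiveFilterFactory are hypothetical. In step 3 the filter's (TokenStream) constructor is resolved once, when the generic factory is built, so each create() call only pays for Constructor.newInstance().

```java
import java.lang.reflect.Constructor;

// Sketch of the proposed schema.xml lookup order. TokenStream and
// TokenFilterFactory are minimal stand-ins, not the Lucene/Solr types.
public class FilterResolution {

    public interface TokenStream {}

    public interface TokenFilterFactory {
        TokenStream create(TokenStream input);
    }

    // A filter that ships with an explicit factory (the conventional case).
    public static class LowerCaseFilter implements TokenStream {
        public final TokenStream input;
        public LowerCaseFilter(TokenStream input) { this.input = input; }
    }

    public static class LowerCaseFilterFactory implements TokenFilterFactory {
        public TokenStream create(TokenStream input) { return new LowerCaseFilter(input); }
    }

    // A filter with no factory at all.
    public static class SimpleFilter implements TokenStream {
        public final TokenStream input;
        public SimpleFilter(TokenStream input) { this.input = input; }
    }

    // Generic factory: resolves the (TokenStream) constructor once, then reuses it.
    public static class ReflectiveFilterFactory implements TokenFilterFactory {
        private final Constructor<? extends TokenStream> ctor;

        public ReflectiveFilterFactory(Class<? extends TokenStream> filterClass) {
            try {
                this.ctor = filterClass.getConstructor(TokenStream.class);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(filterClass + " has no (TokenStream) constructor", e);
            }
        }

        public TokenStream create(TokenStream input) {
            try {
                return ctor.newInstance(input);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Returns null instead of throwing when the class does not exist.
    private static Class<?> tryLoad(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    private static TokenFilterFactory instantiate(Class<?> factoryClass) {
        try {
            return (TokenFilterFactory) factoryClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Resolve a class name from schema.xml following the proposed order.
    public static TokenFilterFactory resolve(String name) {
        Class<?> c = tryLoad(name);
        // 1. The name already denotes a factory (100% backward compatible).
        if (c != null && TokenFilterFactory.class.isAssignableFrom(c)) return instantiate(c);
        // 2. Otherwise look for name + "Factory".
        Class<?> f = tryLoad(name + "Factory");
        if (f != null && TokenFilterFactory.class.isAssignableFrom(f)) return instantiate(f);
        // 3. No factory found: wrap the filter class itself via reflection.
        if (c != null && TokenStream.class.isAssignableFrom(c)) {
            return new ReflectiveFilterFactory(c.asSubclass(TokenStream.class));
        }
        throw new IllegalArgumentException("No filter or factory found for " + name);
    }

    public static void main(String[] args) {
        String p = FilterResolution.class.getName() + "$"; // nested classes use binary names
        System.out.println(resolve(p + "LowerCaseFilter").getClass().getSimpleName()); // LowerCaseFilterFactory
        System.out.println(resolve(p + "SimpleFilter").getClass().getSimpleName());    // ReflectiveFilterFactory
    }
}
```

The nested classes are addressed by their binary names (Outer$Inner) only so the example fits in one file; in a real schema.xml the attribute would be an ordinary package-qualified class name.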

It's not currently high on my priority list, but I thought I would
throw this out for comments before I forgot about it.

-Yonik

Re: making schema.xml nicer to read/use

Mike Klaas
On 8/22/06, Yonik Seeley <[hidden email]> wrote:

> What do people think of leaving off "Factory" from tokenizers and
> token filters in schema.xml?
> Let the user say what filter they want, not necessarily how to get it.
>
> So you would say "LowerCaseFilter" instead of "LowerCaseFilterFactory".
>
> Implementation details:
>   - Specification of a Factory would still be accepted (100% backward compat)
>   - if the specified class name isn't found, or doesn't implement
> TokenFilterFactory then a factory will be searched for by appending
> "Factory" to the class name.
>  - if no factory can be found, an attempt will be made to construct
> one dynamically (easiest would be to create a generic factory that
> works via reflection).  People could use simple filters w/o creating a
> factory for it.

+0

This change would definitely improve the flow of schema.xml, at the
expense of making object instantiation somewhat more obscure.  I
envision people creating a custom filter and not realizing that
initialization costs can be reduced by writing a corresponding
Factory in the same package.  But this shouldn't be too much of an
issue if it is well-documented in the sample schema.xml.
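To make the point concrete, here is a sketch of a custom filter with a hand-written companion factory. The class names and the hasCompanionFactory() helper are hypothetical and the interfaces are stand-ins; the helper only checks whether a "Factory"-suffix lookup would find a companion class next to the filter.

```java
// Sketch: a custom filter plus its companion factory. The "Factory" suffix and
// same-package placement are what a suffix-based lookup would rely on.
// TokenStream and TokenFilterFactory are hypothetical stand-ins.
public class CustomFilterExample {
    public interface TokenStream {}

    public interface TokenFilterFactory {
        TokenStream create(TokenStream input);
    }

    // A user's custom filter.
    public static class MyFilter implements TokenStream {
        public final TokenStream input;
        public MyFilter(TokenStream input) { this.input = input; }
    }

    // The companion factory: a plain constructor call, no reflection per instance.
    public static class MyFilterFactory implements TokenFilterFactory {
        public TokenStream create(TokenStream input) { return new MyFilter(input); }
    }

    // Would appending "Factory" to this filter's class name find a TokenFilterFactory?
    public static boolean hasCompanionFactory(Class<?> filterClass) {
        try {
            Class<?> f = Class.forName(filterClass.getName() + "Factory");
            return TokenFilterFactory.class.isAssignableFrom(f);
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasCompanionFactory(MyFilter.class)); // true
        System.out.println(hasCompanionFactory(String.class));   // false: java.lang.StringFactory does not exist
    }
}
```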

regards,
-Mike

Re: making schema.xml nicer to read/use

Bertrand Delacretaz
In reply to this post by Yonik Seeley-2
On 8/22/06, Yonik Seeley <[hidden email]> wrote:

> ...So you would say "LowerCaseFilter" instead of "LowerCaseFilterFactory"...

I like the idea, as long as it's made sufficiently transparent.

For example by logging a warning when a Factory is not found and the
component is created by reflection. Or maybe only the first time this
happens for a given Filter, so as not to log too much.
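One way to log only the first time per Filter, sketched with java.util.logging and a concurrent set of already-reported class names. warnOnce() is a hypothetical helper, not an existing Solr method.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Sketch: warn when a filter has no factory and is built by reflection,
// but only the first time for each filter class name.
public class ReflectionWarnings {
    private static final Logger LOG = Logger.getLogger(ReflectionWarnings.class.getName());

    // Thread-safe set of names we have already warned about.
    private static final Set<String> WARNED = ConcurrentHashMap.newKeySet();

    // Returns true if a warning was logged, i.e. this is the first time for the name.
    public static boolean warnOnce(String filterClassName) {
        // Set.add returns false when the name is already present, so repeats stay silent.
        if (WARNED.add(filterClassName)) {
            LOG.warning("No factory found for " + filterClassName
                + "; constructing it via reflection. Consider adding "
                + filterClassName + "Factory.");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        warnOnce("com.example.MyFilter"); // logs a warning
        warnOnce("com.example.MyFilter"); // silent
    }
}
```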

-Bertrand

Re: making schema.xml nicer to read/use

Chris Hostetter-3
In reply to this post by Yonik Seeley-2

:  - if no factory can be found, an attempt will be made to construct
: one dynamically (easiest would be to create a generic factory that
: works via reflection).  People could use simple filters w/o creating a
: factory for it.

I think i mentioned this before ... my opinion depends on what the
performance impacts are -- if reflection costs are "high" because of class
resolution, but instantiation times are roughly the same, then i'm for it
because we can resolve the Class once at startup; but if the performance
difference is still significant, i vote we force people who
want to mix and match custom Filters/Tokenizers to write Factories for
them -- it doesn't penalize people who have custom Analyzers, those don't
require Factories, but if you want to mix and match you should be able to
whip up a two line factory ... hell, we can provide some code to do it
automatically (and run it on the Lucene jar every time we update it).
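The two cases distinguished above can be measured roughly: direct construction (what an explicit factory does), reflective construction with the Constructor resolved once, and reflective construction that re-resolves the class and constructor on every call. This is a hedged sketch with a stand-in filter class; naive System.nanoTime() loops are only indicative, and a proper harness such as JMH would be needed for trustworthy numbers.

```java
import java.lang.reflect.Constructor;

// Rough timing sketch: is reflection slow because of lookup (which can be done
// once at startup) or because of each instantiation? Types are stand-ins.
public class ReflectionCost {
    public interface TokenStream {}

    public static class MyFilter implements TokenStream {
        public final TokenStream input;
        public MyFilter(TokenStream input) { this.input = input; }
    }

    // Keeps results reachable so the JIT cannot discard the work entirely.
    public static Object sink;

    // n direct constructor calls, as an explicit factory would make.
    public static long timeDirect(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) sink = new MyFilter(null);
        return System.nanoTime() - start;
    }

    // n reflective constructions; if lookupEachTime, class and constructor are re-resolved per call.
    public static long timeReflective(int n, boolean lookupEachTime) {
        try {
            String name = MyFilter.class.getName();
            Constructor<?> cached = MyFilter.class.getConstructor(TokenStream.class);
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                Constructor<?> c = lookupEachTime
                    ? Class.forName(name).getConstructor(TokenStream.class)
                    : cached;
                sink = c.newInstance((Object) null); // one null argument, not a null array
            }
            return System.nanoTime() - start;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int n = 200_000;
        for (int round = 0; round < 3; round++) { // later rounds run after some JIT warm-up
            System.out.printf("direct=%dus cachedCtor=%dus lookupEachCall=%dus%n",
                timeDirect(n) / 1_000,
                timeReflective(n, false) / 1_000,
                timeReflective(n, true) / 1_000);
        }
    }
}
```

If the cached-constructor column lands close to the direct one while the per-call lookup column is much larger, that would support resolving the Class once at startup; if the cached column is still far behind, explicit factories would be the safer requirement.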

what i'd really hate to see happen is needing a FAQ
item about how "slow" Solr is at indexing docs, with the answer being:
"Don't rely on the built-in reflection mechanism to build your analyzers;
create explicit Factories for each Tokenizer/Filter"  ... I'd hate for
Solr to have reflection-based Analyzer construction that winds up like
the Lucene "Hits" class -- overused and the source of countless complaints
about performance.

Ah yes, here's the old discussion...

http://www.nabble.com/foo-tf1737025.html#a4720545



-Hoss