Lifecycle of a TokenFilter from TokenFilterFactory

Em
Hello list,

I ran into a bug in a TokenFilter: it only works if the
TokenFilterFactory creates a fresh instance, and it seems that
TokenFilters are somehow reused for more than one request.

So, if your TokenFilterFactory has a logging statement in its
create() method, you see that log entry only now and again, but not on
every request.

Is this a bug in Solr 4.0-BETA or is this expected behaviour?
If it is expected, what could be wrong with the TokenFilter?

Kind regards,
Em

Re: Lifecycle of a TokenFilter from TokenFilterFactory

Mikhail Khludnev
Hello,

Analyzers are reused. An Analyzer consists of a Tokenizer and several
TokenFilters. Check the source of org.apache.lucene.analysis.Analyzer and
pay attention to reuseStrategy.
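
To illustrate why a log statement in create() fires only now and again, here
is a minimal sketch in plain Java. It does not use any Lucene or Solr classes:
Chain, CountingFactory and ReusingAnalyzer are hypothetical stand-ins, and the
per-thread cache is only an assumed simplification of what a reuse strategy
does.

```java
final class ReuseSketch {
    /** Hypothetical stand-in for a Tokenizer plus its TokenFilters. */
    static final class Chain {
        String input;

        void setInput(String text) {
            this.input = text;
        }
    }

    /** Hypothetical stand-in for a TokenFilterFactory; counts create() calls. */
    static final class CountingFactory {
        int createCalls = 0;

        Chain create() {
            createCalls++;
            return new Chain();
        }
    }

    /** Hypothetical analyzer that caches one chain per thread (an assumption). */
    static final class ReusingAnalyzer {
        private final CountingFactory factory;
        private final ThreadLocal<Chain> cached = new ThreadLocal<>();

        ReusingAnalyzer(CountingFactory factory) {
            this.factory = factory;
        }

        Chain tokenStream(String text) {
            Chain chain = cached.get();
            if (chain == null) {
                // First use on this thread: the factory is asked for a new chain.
                chain = factory.create();
                cached.set(chain);
            }
            // Every later use only swaps the input of the cached chain.
            chain.setInput(text);
            return chain;
        }
    }

    /** Runs the given number of requests on the calling thread. */
    static int createCallsAfter(int requests) {
        CountingFactory factory = new CountingFactory();
        ReusingAnalyzer analyzer = new ReusingAnalyzer(factory);
        for (int i = 0; i < requests; i++) {
            analyzer.tokenStream("doc " + i);
        }
        return factory.createCalls;
    }

    public static void main(String[] args) {
        // Five requests on one thread lead to a single create() call.
        System.out.println(createCallsAfter(5));
    }
}
```

In this model a new chain is only created when a thread uses the analyzer for
the first time, which would match seeing the create() log on some requests
only.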

Best regards

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>

Re: Lifecycle of a TokenFilter from TokenFilterFactory

Em
Hi Mikhail,

Thanks for your feedback.

If so, how can I write unit tests that respect the reuse strategy?
What's the recommended approach when creating custom Tokenizers and TokenFilters?

Kind regards,
Em

Re: Lifecycle of a TokenFilter from TokenFilterFactory

Mikhail Khludnev
It's not clear what you want to achieve. I don't always create custom
TokenStreams, but when I do, I use Lucene's own implementations as a
prototype to start from.

Re: Lifecycle of a TokenFilter from TokenFilterFactory

Em
That's exactly the way I do it when I have to write some custom stuff.

My problem is that I do not know how to exercise an Analyzer's reuse
feature in a unit test to see what happens when, for example, a
TokenFilter instance is reused.

Some TokenFilter prototypes I've seen are stateful and do not reset
their state as necessary in order to be reused. This problem only
shows up when I deploy those filters to Solr and index or search for some
documents (which does not always call create() on the
TokenFilterFactory). However, I need to be able to catch such problems
in unit tests instead of noticing them after a deployment to Solr.

So my question is:
How can I (unit-)test a TokenFilter with an Analyzer that reuses the
same TokenFilter instance for more than one input TokenStream?
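
The kind of failure described above can be reproduced without Solr by pushing
two inputs through the same filter instance. The following is a minimal sketch
in plain Java, not Lucene code: DedupFilter, FixedDedupFilter and analyze are
hypothetical names, and reset(String) only mimics the idea of resetting a
stream before each new input.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ResetSketch {
    /** Hypothetical stateful filter: drops tokens it has already seen. */
    static class DedupFilter {
        protected final Set<String> seen = new HashSet<>();
        private String[] tokens = new String[0];
        private int pos = 0;

        /** Called before each new input, as a reusing analyzer would do. */
        void reset(String text) {
            tokens = text.isEmpty() ? new String[0] : text.split(" ");
            pos = 0;
            // BUG: 'seen' is not cleared, so state leaks into the next input.
        }

        /** Returns the next unseen token, or null at the end of the input. */
        String next() {
            while (pos < tokens.length) {
                String token = tokens[pos++];
                if (seen.add(token)) {
                    return token;
                }
            }
            return null;
        }
    }

    /** Same filter, but reset() also clears the per-input state. */
    static final class FixedDedupFilter extends DedupFilter {
        @Override
        void reset(String text) {
            super.reset(text);
            seen.clear();
        }
    }

    /** Feeds one input through an existing filter instance. */
    static List<String> analyze(DedupFilter filter, String text) {
        filter.reset(text);
        List<String> out = new ArrayList<>();
        for (String t = filter.next(); t != null; t = filter.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        DedupFilter buggy = new DedupFilter();
        System.out.println(analyze(buggy, "a b a")); // [a, b]
        System.out.println(analyze(buggy, "a c"));   // [c]: "a" is lost on reuse

        DedupFilter fixed = new FixedDedupFilter();
        System.out.println(analyze(fixed, "a b a")); // [a, b]
        System.out.println(analyze(fixed, "a c"));   // [a, c]
    }
}
```

The first input gives the same result for both variants, so a test that
creates a fresh filter per input would pass. Only the second pass over the
same instance exposes the missing reset.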

Kind regards,
Em

Re: Lifecycle of a TokenFilter from TokenFilterFactory

Mikhail Khludnev
OK, I think I get what you are looking for. Extend SolrTestCaseJ4 (there
are plenty of samples in the codebase). Obtain a request via req(), get the
schema from it with getSchema(), then call getAnalyzer() or
getQueryAnalyzer() and ask for analysis via
org.apache.lucene.analysis.Analyzer.tokenStream(String, Reader).
You'll find your filters cached in the IndexSchema analyzers.

Let me know if it helps.
