how to reuse a tokenStream?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

how to reuse a tokenStream?

Li Li
I want to analyzer a text twice so that I can get some statistic
information from this text
                                TokenStream tokenStream=null;
                Analyzer wa=new WhitespaceAnalyzer();
                try {
                        tokenStream = wa.reusableTokenStream(fieldName, reader);
                        while(tokenStream.incrementToken()){
                                //do something
                        }
                } catch (IOException e1) {
                        e1.printStackTrace();
                }

                try {
                                                tokenStream.reset();
                        //tokenStream = wa.reusableTokenStream(fieldName, reader);
                        while(tokenStream.incrementToken()){
                                                                //not here
                                System.out.println("resuable");

                        }
                } catch (IOException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                }

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: how to reuse a tokenStream?

Erick Erickson
What is the problem you're seeing? Maybe a stack trace?
You haven't told us what the incorrect behavior is.

Best
Erick

On Fri, May 28, 2010 at 12:52 AM, Li Li <[hidden email]> wrote:

> I want to analyzer a text twice so that I can get some statistic
> information from this text
>                                TokenStream tokenStream=null;
>                Analyzer wa=new WhitespaceAnalyzer();
>                try {
>                        tokenStream = wa.reusableTokenStream(fieldName,
> reader);
>                        while(tokenStream.incrementToken()){
>                                //do something
>                        }
>                } catch (IOException e1) {
>                        e1.printStackTrace();
>                }
>
>                try {
>                                                tokenStream.reset();
>                        //tokenStream = wa.reusableTokenStream(fieldName,
> reader);
>                        while(tokenStream.incrementToken()){
>                                                                //not here
>                                System.out.println("resuable");
>
>                        }
>                } catch (IOException e) {
>                        // TODO Auto-generated catch block
>                        e.printStackTrace();
>                }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: how to reuse a tokenStream?

Li Li
I want to implement an analyzer which use WhitespaceAnalyzer first
then my tokenFilter. But my filter need not global information of
token such as how many times a token occur. So in  tokenStream method.
I iterate the tokenStream to get all the things I need. Then
pass this information to my own TokenFilter. But my tokenfilter's
incrementToken method will always get nothing from the TokenStream
because it's already consumed.
public MyAnalyzer{
...
         Analyzer wa=new WhitespaceAnalyzer();
         public TokenStream tokenStream(String fieldName, Reader reader) {
               TokenStream tokenStream=null;

               try {
                       tokenStream = wa.reusableTokenStream(fieldName, reader);
                       while(tokenStream.incrementToken()){
                               //do something
                       }
               } catch (IOException e1) {
                       e1.printStackTrace();
               }
/*
               try {
                       tokenStream.reset();
                       //tokenStream =
wa.reusableTokenStream(fieldName, reader);
                       while(tokenStream.incrementToken()){
                               //it don't come here, which means the
tokenStream is not resuable.
                               System.out.println("resuable");
                       }
               } catch (IOException e) {
                       e.printStackTrace();
               }
*/
              MyFilter stream = new MyFilter(tokenStream);
              return stream;
      }
}

2010/5/28 Erick Erickson <[hidden email]>:

> What is the problem you're seeing? Maybe a stack trace?
> You haven't told us what the incorrect behavior is.
>
> Best
> Erick
>
> On Fri, May 28, 2010 at 12:52 AM, Li Li <[hidden email]> wrote:
>
>> I want to analyzer a text twice so that I can get some statistic
>> information from this text
>>                                TokenStream tokenStream=null;
>>                Analyzer wa=new WhitespaceAnalyzer();
>>                try {
>>                        tokenStream = wa.reusableTokenStream(fieldName,
>> reader);
>>                        while(tokenStream.incrementToken()){
>>                                //do something
>>                        }
>>                } catch (IOException e1) {
>>                        e1.printStackTrace();
>>                }
>>
>>                try {
>>                                                tokenStream.reset();
>>                        //tokenStream = wa.reusableTokenStream(fieldName,
>> reader);
>>                        while(tokenStream.incrementToken()){
>>                                                                //not here
>>                                System.out.println("resuable");
>>
>>                        }
>>                } catch (IOException e) {
>>                        // TODO Auto-generated catch block
>>                        e.printStackTrace();
>>                }
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: how to reuse a tokenStream?

Adriano Crestani-2
Hi,

If the reader supports reset, try to also reset it.

Anyway, if in your final code, you are really using WhitespaceAnalyzer, I
guess you should use WhitespaceTokenizer instead of WhitespaceAnalyzer:

WhitespaceTokenizer tokenizer = new WhitespaceTokenizer(reader);
// consume tokens
reader.reset();
tokenizer.reset(reader);
// consume it again

Best Regards,
Adriano Crestani

On Sat, May 29, 2010 at 1:43 AM, Li Li <[hidden email]> wrote:

> I want to implement an analyzer which use WhitespaceAnalyzer first
> then my tokenFilter. But my filter need not global information of
> token such as how many times a token occur. So in  tokenStream method.
> I iterate the tokenStream to get all the things I need. Then
> pass this information to my own TokenFilter. But my tokenfilter's
> incrementToken method will always get nothing from the TokenStream
> because it's already consumed.
> public MyAnalyzer{
> ...
>         Analyzer wa=new WhitespaceAnalyzer();
>         public TokenStream tokenStream(String fieldName, Reader reader) {
>               TokenStream tokenStream=null;
>
>               try {
>                       tokenStream = wa.reusableTokenStream(fieldName,
> reader);
>                       while(tokenStream.incrementToken()){
>                               //do something
>                       }
>               } catch (IOException e1) {
>                       e1.printStackTrace();
>               }
> /*
>               try {
>                       tokenStream.reset();
>                       //tokenStream =
> wa.reusableTokenStream(fieldName, reader);
>                       while(tokenStream.incrementToken()){
>                                //it don't come here, which means the
> tokenStream is not resuable.
>                                System.out.println("resuable");
>                       }
>               } catch (IOException e) {
>                        e.printStackTrace();
>               }
> */
>              MyFilter stream = new MyFilter(tokenStream);
>              return stream;
>      }
> }
>
> 2010/5/28 Erick Erickson <[hidden email]>:
> > What is the problem you're seeing? Maybe a stack trace?
> > You haven't told us what the incorrect behavior is.
> >
> > Best
> > Erick
> >
> > On Fri, May 28, 2010 at 12:52 AM, Li Li <[hidden email]> wrote:
> >
> >> I want to analyzer a text twice so that I can get some statistic
> >> information from this text
> >>                                TokenStream tokenStream=null;
> >>                Analyzer wa=new WhitespaceAnalyzer();
> >>                try {
> >>                        tokenStream = wa.reusableTokenStream(fieldName,
> >> reader);
> >>                        while(tokenStream.incrementToken()){
> >>                                //do something
> >>                        }
> >>                } catch (IOException e1) {
> >>                        e1.printStackTrace();
> >>                }
> >>
> >>                try {
> >>                                                tokenStream.reset();
> >>                        //tokenStream = wa.reusableTokenStream(fieldName,
> >> reader);
> >>                        while(tokenStream.incrementToken()){
> >>                                                                //not
> here
> >>                                System.out.println("resuable");
> >>
> >>                        }
> >>                } catch (IOException e) {
> >>                        // TODO Auto-generated catch block
> >>                        e.printStackTrace();
> >>                }
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [hidden email]
> >> For additional commands, e-mail: [hidden email]
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>