Retrieving Tokens

Retrieving Tokens

rishabh9
Hi,

I have created my own Tokenizer and I am indexing documents with it.

I wanted to know if there is a way to retrieve the tokens (created by my
custom tokenizer) from the index.
Do we have to modify the code to get these tokens?

Regards,
Rishabh

Re: Retrieving Tokens

Yonik Seeley-2
On Dec 19, 2007 10:59 AM, Rishabh Joshi <[hidden email]> wrote:
> I have created my own Tokenizer and I am indexing documents with it.
>
> I wanted to know if there is a way to retrieve the tokens (created by my
> custom tokenizer) from the index.

If you want the tokens in the index, see the Luke request handler.
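For example (assuming the standard example setup on port 8983 and a field
named "content"; adjust both for your own schema), a request like

  http://localhost:8983/solr/admin/luke?fl=content&numTerms=100

will list the top indexed terms for that field.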

If you want the tokens for a specific document, it's more
complicated... Lucene maintains an *inverted* index: terms point to
documents, so by default there is no way to ask for all of the terms
in a certain document. One could ask Lucene to store the terms for
certain fields (called term vectors), but that requires extra space in
the index, and Solr doesn't yet have a way to ask that they be
retrieved.
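If you really do need per-document tokens, one workaround is to mark the
field with termVectors="true" in schema.xml and read the vectors directly
through the Lucene API, outside of Solr. A minimal sketch against the
Lucene 2.x API (the index path, field name, and document number below are
illustrative, not from your setup):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.store.FSDirectory;

public class DumpTermVectors {
    public static void main(String[] args) throws Exception {
        // Open the index directly; the path here is just a placeholder.
        IndexReader reader =
            IndexReader.open(FSDirectory.getDirectory("/path/to/solr/data/index"));
        int docNum = 0;  // internal Lucene document number
        // Works only for fields indexed with termVectors="true".
        TermFreqVector tfv = reader.getTermFreqVector(docNum, "content");
        if (tfv != null) {
            String[] terms = tfv.getTerms();          // tokens produced by your Tokenizer
            int[] freqs = tfv.getTermFrequencies();   // per-document frequency of each token
            for (int i = 0; i < terms.length; i++) {
                System.out.println(terms[i] + " : " + freqs[i]);
            }
        }
        reader.close();
    }
}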

What are you trying to do with the tokens?

-Yonik

Re: Retrieving Tokens

rishabh9
> What are you trying to do with the tokens?

Yonik, we wanted a tokenizer that would tokenize the content of a document
as per our requirements and then store the resulting tokens in the index, so
that we could retrieve those tokens at search time for further processing in
our application.

Regards,
Rishabh

Re: Retrieving Tokens

Erick Erickson
I think what Yonik is asking for is a higher-level answer.
*Why* do you want to process the tokens later? What is the
use case you're trying to satisfy?

Best
Erick

Re: Retrieving Tokens

Eswar K
Yonik/Erick,

We are building a custom search that is done in two parts, executed at
different points in time. In the first step we want to tokenize the
information and store it; at a later point we want to retrieve those tokens
for further processing and then store the result back into the index. This
processed information is what we want users to be able to search on.

Regards,
Eswar
