document retrieval, nested field and HTMLStripStandardTokenizerFactory

Vinci
Hi all,

I am developing a JSON interface for Solr and have some questions:
1. Can I limit the number of returned documents in the config file, so that a misconfiguration cannot pull down the server?
2. How can I retrieve a document by its unique key, for result view purposes? And how can I apply an XSLT transformation to it?
3. Can I use nested fields in a document, like this?
<field name="a">
   <field name ="bid">0000</field>
   <field name ="btext">1111</field>
</field>
4. Does HTMLStripStandardTokenizerFactory do the same thing as solr.HTMLStripWhitespaceTokenizerFactory, differing only in their target?
And can I use HTMLStripStandardTokenizerFactory with a TokenizerFactory that extends BaseTokenizerFactory?
5. If I use HTMLStripStandardTokenizerFactory, do I need to escape the HTML characters in the field element?

Thank you,
Vinci

Re: document retrieval, nested field and HTMLStripStandardTokenizerFactory

hossman

: 1. Can I limit the number of returned documents in the config file, so that
: a misconfiguration cannot pull down the server?

You can configure it with an invariant value in your requestHandler config
... so it won't matter how many the client asks for, they'll get the
number you pick (or fewer if there aren't that many) ... but there is no
way to let them pick a value and merely "limit" it.
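
A minimal sketch of how such an invariant behaves. The handler name `/select`, the class `solr.SearchHandler`, the `wt` default and the value 50 are illustrative assumptions, not taken from this thread. The embedded XML mirrors an `invariants` list in solrconfig.xml; the Python only models the precedence (defaults, then request parameters, then invariants) and is not Solr's own code.

```python
import xml.etree.ElementTree as ET

# Hypothetical solrconfig.xml fragment: the handler name and the value 50
# are illustrative assumptions.
SOLRCONFIG_FRAGMENT = """
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
  </lst>
  <lst name="invariants">
    <int name="rows">50</int>
  </lst>
</requestHandler>
"""

def read_params(handler, list_name):
    """Return the name/value pairs of one <lst> section of a handler."""
    section = handler.find("lst[@name='%s']" % list_name)
    if section is None:
        return {}
    return {child.get("name"): child.text for child in section}

def effective_params(request_params):
    """Model the precedence: defaults < request parameters < invariants."""
    handler = ET.fromstring(SOLRCONFIG_FRAGMENT)
    merged = dict(read_params(handler, "defaults"))
    merged.update(request_params)
    merged.update(read_params(handler, "invariants"))
    return merged

# A client asking for 100000 rows still ends up with the invariant value.
print(effective_params({"q": "*:*", "rows": "100000"}))
```

Note that this is an override, not a cap: a client asking for 10 rows would also get the invariant value, which matches the caveat above.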

: 2. How can I retrieve a document by its unique key, for result view purposes?

make your uniqueKey field searchable.
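
Once the uniqueKey field is searchable, fetching one document is an ordinary query on that field. A rough sketch of building such a request follows; the base URL, the field name `id` and the key value are assumptions for illustration, and `escape_term` is a hypothetical helper that backslash-escapes characters with special meaning in the Lucene query syntax.

```python
from urllib.parse import urlencode

# Characters with special meaning in the Lucene query syntax.
_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_term(value):
    """Backslash-escape query syntax characters and whitespace in a term."""
    return "".join(
        "\\" + ch if ch in _SPECIAL or ch.isspace() else ch for ch in value
    )

def lookup_url(base_url, key_field, key_value):
    """Build a select URL that asks for one document by key, as JSON."""
    params = {"q": "%s:%s" % (key_field, escape_term(key_value)), "wt": "json"}
    return base_url + "?" + urlencode(params)

# Base URL and field name are assumptions; adjust to your own setup.
print(lookup_url("http://localhost:8983/solr/select", "id", "doc:42"))
```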

: And how can I apply an XSLT transformation to it?

http://wiki.apache.org/solr/XsltResponseWriter   ?
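
As a sketch of what that page describes: the XSLT response writer is selected with `wt=xslt` and the stylesheet is named with `tr`. The base URL, the query and the stylesheet name `example.xsl` below are assumptions; the stylesheet has to exist in the `xslt` directory of your Solr configuration.

```python
from urllib.parse import urlencode

def xslt_url(base_url, query, stylesheet):
    """Build a select URL whose response is run through an XSLT stylesheet."""
    params = {"q": query, "wt": "xslt", "tr": stylesheet}
    return base_url + "?" + urlencode(params)

# Base URL, query and stylesheet name are illustrative assumptions.
print(xslt_url("http://localhost:8983/solr/select", "id:42", "example.xsl"))
```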

: 3. Can I use nested fields in a document like this?

nope.
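
One possible workaround, which is not something proposed in this thread, is to flatten the nested structure on the client before indexing. The sketch below assumes an underscore-joined naming scheme (`a_bid`, `a_btext`); the scheme and the function name are hypothetical, and the resulting field names would still have to be declared in your schema.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into one level of underscore-joined field names."""
    flat = {}
    for name, value in doc.items():
        full_name = prefix + name
        if isinstance(value, dict):
            # Recurse, carrying the parent name along as a prefix.
            flat.update(flatten(value, full_name + "_"))
        else:
            flat[full_name] = value
    return flat

# The nested example from the question becomes two flat fields.
print(flatten({"a": {"bid": "0000", "btext": "1111"}}))
```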

: 4. Does HTMLStripStandardTokenizerFactory do the same thing as
: solr.HTMLStripWhitespaceTokenizerFactory, differing only in their target?
: And can I use HTMLStripStandardTokenizerFactory with a TokenizerFactory that
: extends BaseTokenizerFactory?

the html stripping happens prior to true "tokenization" ... so the
difference is one is based on the StandardTokenizer, one uses the
WhitespaceTokenizer ... if you want to use a different tokenizer you just
have to write a new factory for your Tokenizer that wraps the reader .. if
you look at the source of the existing ones it's pretty straightforward.

(Hmm... maybe we should add a ReaderWrapperFactory that can be optionally
specified using a <reader> config inside an <analyzer> config?)
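
A conceptual sketch of the "strip first, then tokenize" idea, in Python rather than the Java a real factory would be written in. None of this is Solr's implementation: `strip_html` stands in for the reader wrapper, `str.split` for the WhitespaceTokenizer, and the `\w+` regex is only a crude stand-in for the StandardTokenizer.

```python
import re
from html.parser import HTMLParser

class _TextOnly(HTMLParser):
    """Collect character data and replace each tag with a space."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(text):
    """Stage 1: remove markup before any tokenizer sees the text."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)

def whitespace_tokens(text):
    """Stage 2, variant A: split on whitespace only."""
    return strip_html(text).split()

def word_tokens(text):
    """Stage 2, variant B: keep runs of word characters, drop punctuation."""
    return re.findall(r"\w+", strip_html(text))

sample = "<b>Hello,</b> world!"
print(whitespace_tokens(sample))
print(word_tokens(sample))
```

The stripping stage is shared; only the second stage differs, which is why swapping in another tokenizer only needs a new wrapper around the same stripped input.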

: 5. If I use HTMLStripStandardTokenizerFactory, do I need to escape the HTML
: characters in the field element?

you mean when sending Solr your data using the XmlUpdateRequestHandler?
... yes.  XML is the message format, your HTML is the data, the data has
to be properly XML escaped no matter what it is.
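
A small sketch of that escaping step on the client side, using the Python standard library. The field names `id` and `content` and the helper name `solr_add_xml` are assumptions for illustration; the point is only that the HTML payload is XML-escaped before it goes into the `<field>` element.

```python
from xml.sax.saxutils import escape, quoteattr

def solr_add_xml(fields):
    """Build an <add> message with every field value XML-escaped."""
    parts = ["<add><doc>"]
    for name, value in fields.items():
        # quoteattr adds the surrounding quotes and escapes the attribute;
        # escape handles &, < and > in the element content.
        parts.append("<field name=%s>%s</field>" % (quoteattr(name), escape(value)))
    parts.append("</doc></add>")
    return "".join(parts)

# The HTML markup arrives at Solr as data, not as part of the XML message.
print(solr_add_xml({"id": "42", "content": "<p>Tom & Jerry</p>"}))
```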




-Hoss


Re: document retrieval, nested field and HTMLStripStandardTokenizerFactory

Vinci
Hi hossman,

Thank you for your reply. A question about the searchable field: is declaring the field as indexed in the schema enough to make it searchable? (Assume I wrote my schema based on the default one.)

Thank you,
Vinci

Re: document retrieval, nested field and HTMLStripStandardTokenizerFactory

Otis Gospodnetic-2
In reply to this post by Vinci
For a field to be searchable it has to be indexed (and not just stored).
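
A quick way to check this for the uniqueKey field, sketched against a hypothetical schema.xml fragment loosely modeled on the example schema (the field names and types are assumptions). The check only looks at an explicit `indexed` attribute on the field; when the attribute is absent the setting comes from the field type, which this sketch does not inspect.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema.xml fragment; field names and types are assumptions.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.1">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="false" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""

def unique_key_is_indexed(schema_xml):
    """Return True/False from the explicit indexed attribute of the uniqueKey
    field, or None when the attribute is not set on the field itself."""
    schema = ET.fromstring(schema_xml)
    key_name = schema.findtext("uniqueKey").strip()
    for field in schema.iter("field"):
        if field.get("name") == key_name:
            indexed = field.get("indexed")
            return None if indexed is None else indexed == "true"
    raise ValueError("uniqueKey field %r is not declared" % key_name)

print(unique_key_is_indexed(SCHEMA_FRAGMENT))
```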

Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Vinci <[hidden email]>
To: [hidden email]
Sent: Thursday, March 27, 2008 4:43:02 AM
Subject: Re: document retrieval, nested field and HTMLStripStandardTokenizerFactory

