truncate string field type

Zahra Aminolroaya
I want to truncate the values in my string field because of the field's
byte-length limit. I wrote the following in my schema:


<fieldType name="string" class="solr.StrField" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TruncateTokenFilterFactory" prefixLength="32700"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TruncateTokenFilterFactory" prefixLength="32700"/>
  </analyzer>
</fieldType>

However, I found that StrField (string) does not support specifying an
analyzer. In addition, prefixLength in TruncateTokenFilterFactory cannot be
set to more than 1000.

I want the same behavior as string. Do you think it is reasonable to use the
"text_general" field type with solr.KeywordTokenizerFactory as its tokenizer
to get that behavior? Would I lose any features?

If I use text_general, truncation is not needed.
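
For reference, here is a rough sketch of the kind of type I have in mind. It is only an illustration: the type name "string_keyword" is made up, and prefixLength="1000" simply reflects the upper bound I ran into, which I have not confirmed elsewhere. solr.TextField accepts an analyzer, and KeywordTokenizerFactory emits the whole value as a single token, so the value is not split into words.

```xml
<!-- Hypothetical type name; a TextField that keeps the whole value as one token -->
<fieldType name="string_keyword" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- Emits the entire input as a single token (no word splitting) -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Keeps only the first prefixLength characters of that token;
         the value 1000 is an assumption taken from the limit described above -->
    <filter class="solr.TruncateTokenFilterFactory" prefixLength="1000"/>
  </analyzer>
</fieldType>
```

One difference I am aware of is that solr.TextField does not support docValues the way StrField does, so sorting and faceting on such a field may behave differently and would need to be checked.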





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: truncate string field type

Alexandre Rafalovitch
Did you look into UpdateRequestProcessors?

There is a truncate one there.
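
A minimal sketch of what that could look like in solrconfig.xml, assuming the processor in question is solr.TruncateFieldUpdateProcessorFactory. The chain name "truncate-strings" and the maxLength value are illustrative only. Note that maxLength counts characters, not bytes, so a value well below the roughly 32K byte limit leaves headroom for multi-byte UTF-8 text.

```xml
<!-- Hypothetical chain name; truncates values before they reach the index -->
<updateRequestProcessorChain name="truncate-strings">
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <!-- Apply to every field whose type is StrField -->
    <str name="typeClass">solr.StrField</str>
    <!-- Maximum length in characters (illustrative value, not a recommendation) -->
    <int name="maxLength">10000</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain can be selected per request with the update.chain=truncate-strings parameter. Because the processor runs before indexing, the stored value is truncated as well, not just the indexed term.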

Regards,
    Alex

On Sun, Jul 8, 2018, 12:44 AM Zahra Aminolroaya, <[hidden email]>
wrote:

> I want to truncate my string field type due to its number of bytes limit. I
> wrote the following in my schema:
>
>
> <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
>   <analyzer type="index">
>       <tokenizer class="solr.KeywordTokenizerFactory"/>
>       <filter class="solr.TruncateTokenFilterFactory"
> prefixLength="32700"/>
>    </analyzer>
>    <analyzer type="query">
>       <tokenizer class="solr.KeywordTokenizerFactory"/>
>       <filter class="solr.TruncateTokenFilterFactory"
> prefixLength="32700"/>
>    </analyzer>
> </fieldType>
>
> However, I found that StrField (string) does not support specifying an
> analyzer. Besides, prefixLength in TruncateTokenFilterFactory could not be
> more than 1000.
>
> I want to have the same application of string. Do you think it is
> reasonable
> to use  "text_general" field type with solr.KeywordTokenizerFactory filter
> to have the same application? Do I lose any feature?
>
> If I use text_general, it is not needed to truncate.
Re: truncate string field type

Erick Erickson
Why do you want to add such long strings to your index in the first
place? They are almost useless for search; you want a tokenized field
(text_general is a good place to start) if you want to search for
words within the string.

"The number of bytes limit" is 32K or so, right? What do you want to
do with the data going in there?

There may be good reasons, but I've seen confusion around strings in the past.

Best,
Erick

On Sat, Jul 7, 2018 at 11:12 PM, Alexandre Rafalovitch
<[hidden email]> wrote:

> Did you look into UpdateRequestProcessors?
>
> There is a truncate one there.
>
> Regards,
>     Alex
Re: truncate string field type

Zahra Aminolroaya
Thanks, Alexandre and Erick. Erick, I want to search a field with a regular
expression, but a Solr text field tokenizes the document, so the regular
expression results would not be valid. I want Solr not to tokenize my
document, even though I will lose some terms by using the Solr string type.



Re: truncate string field type

Alexandre Rafalovitch
Are you sure Solr is the right tool for you? Regexp searches are really a
last-resort approach in this domain.

I suggest that you rethink your actual business case (share it here)
to benefit from tokenization, or see whether other tools are a better fit.

As it is, you are using a drill to hammer nails...

Regards,
    Alex

On Tue, Jul 10, 2018, 2:44 AM Zahra Aminolroaya, <[hidden email]>
wrote:

> Thanks Alexandre and Erick. Erick I want to use my regular expression to
> search a field and Solr text field token the document, so the regular
> expression result will not be valid. I want Solr not to token my doc,
> although I will lose some terms using solr string.
Re: truncate string field type

Zahra Aminolroaya
Suppose I want to search for "l(i|a)*on k(i|e)*ng", with a space between the
two words. I want Solr to retrieve only exact matches in which these two
words, or their variants, are adjacent. If I use a text field type, each of
these words is treated as a separate token, so Solr may return other results
too. However, we have strict customers who need only exact matches, if any
are available, and nothing more!


