Solr 8.5.2 indexing issue

gnandre
Hi,

I have the following document which fails to get indexed.

{
        "asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",
        "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"
}

I am not sure what is so special about the content in the reference_url
field.

The reference_url field is defined as follows in the schema:
<field name="reference_url" type="string" stored="true" indexed="false" multiValued="false"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

It throws the following error.

Status: {"data":{"responseHeader":{"status":400,"QTime":18},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.IndexOutOfBoundsException"],"msg":"Exception writing document id add-ons:576deefef7453a9189aa039b66500eb2 to the index; possible analysis error.","code":400}},"status":400,"config":{"method":"POST","transformRequest":[null],"transformResponse":[null],"jsonpCallbackParam":"callback","headers":{"Content-type":"application/json","Accept":"application/json, text/plain, */*","X-Requested-With":"XMLHttpRequest"},"data":"[{\n        \"asset_id\":\"add-ons:576deefef7453a9189aa039b66500eb2\",\n        \"reference_url\":\"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html\"}]","url":"add-ons/update","params":{"wt":"json","_":1593304427428,"commitWithin":1000,"overwrite":true},"timeout":10000},"statusText":"Bad Request","xhrStatus":"complete"}
(The response also carried a "resource" object that repeats the request body above one character per key.)

Re: Solr 8.5.2 indexing issue

Erick Erickson
How are you sending this to Solr? I just tried 8.5, submitting that doc through the admin UI and it works fine.
I defined “asset_id” with the same type as your reference_url field.

And does the log on the Solr node that tries to index this give any more info?
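
To take the admin UI out of the picture, the same update can be replayed directly over HTTP. The following is a minimal Python sketch, not a definitive client. The request path (add-ons/update), the commitWithin and overwrite parameters, and the document come from the error output in the original post; the base URL http://localhost:8983/solr and the helper name build_update_request are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# The document from the original post, unchanged.
DOC = {
    "asset_id": "add-ons:576deefef7453a9189aa039b66500eb2",
    "reference_url": "modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html",
}


def build_update_request(docs, base_url="http://localhost:8983/solr", collection="add-ons"):
    """Build (but do not send) a JSON update request.

    base_url is an assumption; the collection name and query parameters
    mirror the request shown in the reported error output.
    """
    params = urllib.parse.urlencode(
        {"wt": "json", "commitWithin": 1000, "overwrite": "true"}
    )
    url = f"{base_url}/{collection}/update?{params}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_update_request([DOC])
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # A 400 response carries the Solr error JSON in its body.
        print(err.code, err.read().decode("utf-8"))
```

If this reproduces the 400 response, the problem lies in the schema rather than in how the client sends the document.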

Best,
Erick

> On Jun 27, 2020, at 10:45 PM, gnandre <[hidden email]> wrote:
>
> {
>        "asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",
>
> "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"}


Re: Solr 8.5.2 indexing issue

gnandre
It seems that the issue is not with the reference_url field itself. There is
a copy field that has reference_url as its source and another field,
url_path, as its destination.
The url_path field has the following field type definition.

  <fieldType name="url_path_text" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.PatternTokenizerFactory"
                 pattern="(https?://(www\.([^/]+)?)?|/([^/]+\.[^/]+$)?|\.?organization\.[^/]+|[?#].*$)"
                 group="-1"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" protected="protect.txt"
              preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose"/>
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_en.txt"
              ignoreCase="true" expand="false"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"
              language="English"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" protected="protect.txt"
              preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose"/>
      <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"
              language="English"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
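
As a rough check of what the index-time tokenizer hands to the filters, here is a small Python sketch that approximates group="-1", where the pattern acts as a delimiter and empty pieces are dropped (as with Java's String.split). It uses Python's re module rather than Lucene's PatternTokenizer, so treat it as an approximation; the function name split_like_group_minus_one is made up for this example.

```python
import re

# Pattern copied verbatim from the index-time PatternTokenizerFactory above.
URL_PATTERN = re.compile(
    r"(https?://(www\.([^/]+)?)?|/([^/]+\.[^/]+$)?|\.?organization\.[^/]+|[?#].*$)"
)

REFERENCE_URL = (
    "modeling-a-high-speed-backplane-part-3-4-port-s-parameters-"
    "to-differential-tdr-and-tdt.html"
)


def split_like_group_minus_one(text):
    """Treat every pattern match as a delimiter and keep the non-empty text between matches."""
    tokens = []
    position = 0
    for match in URL_PATTERN.finditer(text):
        if match.start() > position:
            tokens.append(text[position:match.start()])
        position = match.end()
    if position < len(text):
        tokens.append(text[position:])
    return tokens


if __name__ == "__main__":
    for token in split_like_group_minus_one(REFERENCE_URL):
        print(token)
```

For this reference_url value the pattern matches nothing (there is no scheme, slash, "organization." segment, "?" or "#"), so the sketch yields the whole value as a single token. All splitting on hyphens and digits then happens in WordDelimiterGraphFilterFactory and the filters after it.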

If I remove SynonymGraphFilterFactory and FlattenGraphFilterFactory from
the field type definition above, indexing works; otherwise it throws the
same error (IndexOutOfBoundsException).
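
One way to narrow this down further may be Solr's field analysis handler (/analysis/field), which runs a value through a field type's analyzer chain without writing to the index. The Python sketch below only builds the request URL. The analysis.fieldtype and analysis.fieldvalue parameters belong to that handler; the base URL, the collection name "add-ons" (taken from the update path in the error output), and the helper name build_field_analysis_url are assumptions. Whether the handler surfaces the same exception is not something I have verified.

```python
import urllib.parse


def build_field_analysis_url(field_value, field_type="url_path_text",
                             base_url="http://localhost:8983/solr",
                             collection="add-ons"):
    """Build a URL for the field analysis handler (base_url is an assumption)."""
    params = urllib.parse.urlencode({
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": field_value,
        "wt": "json",
    })
    return f"{base_url}/{collection}/analysis/field?{params}"


if __name__ == "__main__":
    print(build_field_analysis_url(
        "modeling-a-high-speed-backplane-part-3-4-port-s-parameters-"
        "to-differential-tdr-and-tdt.html"
    ))
```

Opening the printed URL in a browser, or fetching it with curl, should show the token stream after each stage of the chain if the analysis succeeds.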

On Sun, Jun 28, 2020 at 9:06 AM Erick Erickson <[hidden email]>
wrote:

> How are you sending this to Solr? I just tried 8.5, submitting that doc
> through the admin UI and it works fine.
> I defined “asset_id” with as the same type as your reference_url field.
>
> And does the log on the Solr node that tries to index this give any more
> info?
>
> Best,
> Erick
>
> > On Jun 27, 2020, at 10:45 PM, gnandre <[hidden email]> wrote:
> >
> > {
> >        "asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",
> >
> >
> "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"}
>
>