metatag.description while index data

polu.amar
Hi Nutch Team,

We are trying to crawl websites that are in Korean and Japanese. While
indexing the data into Solr we get the error below. Kindly suggest how
to resolve it.

Versions:
nutch: 1.14
solr: 6.63
jdk: 1.8
zookeeper: 3.35

Error from hadoop.log:
--------------------
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/samplecore: ERROR: [doc=https://abc.koria.kr/] multiple values encountered for non multiValued field metatag.description: [ Korea 공식 사이트입니다.,  Korea 공식 사이트입니다.]
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/samplecore: ERROR: [doc=https://abc.koria.kr/] multiple values encountered for non multiValued field metatag.description: [ Korea 공식 사이트입니다.,  Korea 공식 사이트입니다.]

2018-08-30 18:40:05,152 ERROR indexer.IndexingJob - Indexer:
java.io.IOException: Job failed!

common errors:
unknown field 'digest'
multiValued field metatag.description:

I followed the instructions on this wiki page:
https://wiki.apache.org/nutch/IndexMetatags

Added to my nutch-site.xml:
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|parse-(html|tika|text|msexcel|msword|mspowerpoint|pdf)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|parse-(html|tika|metatags|msword|msexcel|pdf)|index-(basic|anchor|more|metadata)|language-identifier</value>
</property>

<!-- Used only if plugin parse-metatags is enabled. -->
<property>
<name>metatags.names</name>
<value>*</value>
<description> Names of the metatags to extract, separated by ','.
  Use '*' to extract all metatags. Prefixes the names with 'metatag.'
  in the parse-metadata. For instance to index description and keywords,
  you need to activate the plugin index-metadata and set the value of the
  parameter 'index.parse.md' to 'metatag.description,metatag.keywords'.
</description>
</property>

<property>
  <name>index.parse.md</name>
  <value>*</value>
  <description>
  Comma-separated list of keys to be taken from the parse metadata to
generate fields.
  Can be used e.g. for 'description' or 'keywords' provided that these
values are generated
  by a parser (see parse-metatags plugin)
  </description>
</property>

Added to the Solr schema.xml:

<fields>
....
<!-- fields for the metatags plugin -->
<field name="metatag.description" type="text" stored="true" indexed="true"/>
<field name="metatag.keywords" type="text" stored="true" indexed="true"/>
...
</fields>

Thanks and Regards,

*Amarnath Polu*

Re: metatag.description while index data

BlackIce
Try making these fields multiValued, like so:

<!-- fields for the metatags plugin -->
<field name="metatag.description" type="text" stored="true" indexed="true"
multiValued="true"/>
<field name="metatag.keywords" type="text" stored="true" indexed="true"
multiValued="true"/>
...
</fields>
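
After editing the schema, one way to check whether the running core actually sees the field as multiValued is to ask Solr's Schema API for the field definition. Below is a minimal Python sketch, not a definitive recipe. It assumes the core URL from the error message (http://localhost:8983/solr/samplecore) and that the Schema API read endpoint is reachable on this install. The helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Core URL taken from the error message in the original post (assumption:
# it is also where the Schema API is served).
SOLR_CORE_URL = "http://localhost:8983/solr/samplecore"


def field_url(core_url, field_name):
    """Build the Schema API URL that describes a single field."""
    quoted = urllib.parse.quote(field_name, safe="")
    # showDefaults=true also reports properties inherited from the field type.
    return f"{core_url.rstrip('/')}/schema/fields/{quoted}?wt=json&showDefaults=true"


def is_multivalued(schema_response):
    """Return True only if the returned field definition has multiValued set to true."""
    field = schema_response.get("field", {})
    return field.get("multiValued") is True


def fetch_field(core_url, field_name):
    """Fetch the live field definition (requires a running Solr)."""
    with urllib.request.urlopen(field_url(core_url, field_name)) as resp:
        return json.load(resp)
```

With Solr running, passing the result of `fetch_field(SOLR_CORE_URL, "metatag.description")` to `is_multivalued` shows whether the change has been picked up. If it returns False after the edit, the core is probably still using the old schema.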

Re: metatag.description while index data

polu.amar
I am still facing the same issue after making the suggested changes. Any
clue, please?

Amarnath

Re: metatag.description while index data

BlackIce
Sorry if this seems trivial, but did you reload the collection and/or
restart Solr?
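
For reference, a core can be reloaded without a full restart through the CoreAdmin API. Below is a rough Python sketch, assuming a standalone core named samplecore on localhost:8983 as in the error message. If this is a SolrCloud setup (ZooKeeper is listed in the versions), the collection reads its config from ZooKeeper, so the edited schema would have to be uploaded there first and the collection reloaded through the Collections API instead. The function names are hypothetical.

```python
import urllib.parse
import urllib.request

# Base URL assumed from the error message in the original post.
SOLR_BASE_URL = "http://localhost:8983/solr"


def core_reload_url(base_url, core):
    """CoreAdmin RELOAD request for a standalone core."""
    query = urllib.parse.urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def collection_reload_url(base_url, collection):
    """Collections API RELOAD request for a SolrCloud collection."""
    query = urllib.parse.urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def send_reload(url):
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(url) as resp:
        return resp.status
```

For example, `send_reload(core_reload_url(SOLR_BASE_URL, "samplecore"))` issues the reload against a running standalone Solr. Re-run the Nutch index step afterwards.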
