Solr integration in nutch-1.1dev

Markus Jelsma
Hi,


I've got a copy of the nutch-2010-05-11_04-34-41 nightly build because I need
Tika to parse JPEG images, and that should be in 1.1 as I read somewhere [1].

First I fetch only a single HTML page and send it to Solr as I did with 1.0,
but it fails now. Here's what Solr thinks of the request:


---------------
May 17, 2010 2:25:32 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values encountered for non multiValued copy field id: <URL HERE>
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:260)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:94)
        at org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:162)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)
---------------


Well, this obviously is wrong. Although I am still using the old 1.0
schema.xml, the id field isn't multiValued in the nightly build's schema.xml
file either.
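
For anyone wanting to double-check a schema.xml by script rather than by eye, here is a minimal sketch in Python. It assumes the usual Solr layout of `<field name="..." multiValued="..."/>` and `<copyField source="..." dest="..."/>` elements. The function name `inspect_field` and the sample schema are made up for illustration and are not taken from any Nutch or Solr distribution. A missing multiValued attribute is treated as false here, which is an assumption of the sketch (a field type or an old schema version can change the effective default).

```python
import xml.etree.ElementTree as ET


def inspect_field(schema_xml: str, field_name: str) -> dict:
    """Report whether a field is declared multiValued and which copyField rules target it."""
    root = ET.fromstring(schema_xml.strip())
    multi_valued = None  # None means no <field> with this name was found
    for field in root.iter("field"):
        if field.get("name") == field_name:
            # Assumption of this sketch: an absent attribute counts as false.
            multi_valued = field.get("multiValued", "false").lower() == "true"
    # Collect the sources of every copyField rule whose destination is this field.
    sources = [cf.get("source") for cf in root.iter("copyField")
               if cf.get("dest") == field_name]
    return {"multiValued": multi_valued, "copy_sources": sources}


# Hypothetical schema fragment, invented for this example only.
SAMPLE_SCHEMA = """
<schema name="example" version="1.1">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="url" type="string" stored="true" indexed="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <copyField source="url" dest="id"/>
</schema>
"""

if __name__ == "__main__":
    print(inspect_field(SAMPLE_SCHEMA, "id"))
```

Running it against the real schema.xml (read from disk and passed in as a string) would show both whether id is multiValued and whether any copyField rule writes into it, which is the combination the Solr error message complains about.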

Below are Nutch's relevant log lines:


---------------
2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: content dest: content
2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: site dest: site
2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: title dest: title
2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: host dest: host
2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: segment dest: segment
2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: boost dest: boost
2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: digest dest: digest
2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: id
2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: url
2010-05-17 14:25:31,821 INFO  collection.CollectionManager - Instantiating CollectionManager
2010-05-17 14:25:31,822 INFO  collection.CollectionManager - initializing CollectionManager
2010-05-17 14:25:31,849 INFO  collection.CollectionManager - file has1 elements
2010-05-17 14:25:32,474 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:74)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
---------------

Because I still use my old 1.0 configuration files, I get the following warning
from Nutch, but it doesn't look like it's related to the Solr integration:

---------------
2010-05-17 14:34:11,529 WARN  conf.Configuration - DEPRECATED: hadoop-site.xml
found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use
core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of
core-default.xml, mapred-default.xml and hdfs-default.xml respectively
---------------

Did I just stumble upon a regression in 1.1dev, and should I file a bug, or
could something else be spoiling the fun?



[1]: http://lucene.472066.n3.nabble.com/Adding-jpeg-parser-to-nutch-td710135.html

Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Solr integration in nutch-1.1dev

Julien Nioche-4
Hi Markus,

This was solved last week and the fix is in the trunk of the SVN repository.
The nightly build has only just been fixed after the move to the TLP, so the
version you are using does not have the fix yet. Check
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ to get the latest
build, or check it out from SVN.

J.
--
DigitalPebble Ltd
http://www.digitalpebble.com

Re: Solr integration in nutch-1.1dev

Markus Jelsma
Hello Julien,


I picked today's build from your URL but the problem persists as reported
earlier. Any more ideas on how to tackle this?


Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

RE: Solr integration in nutch-1.1dev

Brian Tingle
Update the solr schema.xml so that it allows multiple values for that field?

RE: Solr integration in nutch-1.1dev

Markus Jelsma
Hi Brian,

 

Thanks for your reply. But as can be seen in the stack trace, it's the ID field of a document. It cannot be set to accommodate multiple values, and it wouldn't make sense either. The ID field should contain the URL of the fetched and parsed content. Also, you can clearly see the mapping in the included Nutch logs: it maps the URL field to Solr's ID field as well as mapping the URL to the URL field, which doesn't make sense, but it's still the example schema and mapping configuration. Also, I can't imagine that multiple values for a URL field in Nutch itself would make any sense at all: how would a piece of content on a distinct URL have more than one URL?
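
To make the failure mode concrete, here is a toy model in Python of how a single-valued id could end up with two values. It is only an illustration of the kind of check the error message describes, not Solr's actual code. The names `build_document` and `SingleValuedFieldError`, the example URL, and the url-to-id copy rule are all hypothetical; whether such a rule exists in the schema used here is not known from the logs.

```python
class SingleValuedFieldError(ValueError):
    """Raised when a copy rule would give a single-valued field a second value."""


def build_document(incoming, copy_rules, multi_valued):
    """Toy model: store the incoming values, then apply schema-side copy rules.

    incoming     -- dict of field name -> value, as sent by the indexer
    copy_rules   -- list of (source, dest) pairs, standing in for copyField rules
    multi_valued -- set of field names that may hold more than one value
    """
    out = {}
    for name, value in incoming.items():
        out.setdefault(name, []).append(value)
    for source, dest in copy_rules:
        if source not in incoming:
            continue
        # The destination already holds a value and may not hold a second one.
        if dest in out and dest not in multi_valued:
            raise SingleValuedFieldError(
                f"multiple values encountered for non multiValued copy field {dest}")
        out.setdefault(dest, []).append(incoming[source])
    return out


if __name__ == "__main__":
    # The indexer mapping in the logs sends the URL as both id and url.
    doc = {"id": "http://example.com/", "url": "http://example.com/"}
    print(build_document(doc, copy_rules=[], multi_valued=set()))
    try:
        # A hypothetical url -> id copy rule adds a second value to id.
        build_document(doc, copy_rules=[("url", "id")], multi_valued=set())
    except SingleValuedFieldError as err:
        print("rejected:", err)
```

In this model the document is accepted when no copy rule targets id, and rejected as soon as one does, even though both values are the same URL.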

 

Do you or anybody else have an idea how to solve this mystery? I'm also not getting much from Nutch's logs; they don't mention anything except that sending the data over to a Solr instance failed.

 

Cheers,
 
-----Original message-----
From: Brian Tingle <[hidden email]>
Sent: Tue 25-05-2010 20:47
To: [hidden email]; Markus Jelsma <[hidden email]>;
Subject: RE: Solr integration in nutch-1.1dev

Update the solr schema.xml so that it allows multiple values for that field?

|-----Original Message-----
|From: Markus Jelsma [mailto:[hidden email]]
|Sent: Tuesday, May 25, 2010 4:49 AM
|To: [hidden email]
|Subject: Re: Solr integration in nutch-1.1dev
|
|Hello Julien,
|
|
|I picked today's build from your URL but the problem persists as reported
|earlier. Any more ideas on how to tackle this?
|
|
|Cheers,
|
|On Monday 17 May 2010 15:50:55 Julien Nioche wrote:
|> Hi Markus,
|>
|> This has been solved last week and is in the trunk of the SVN repository.
|> The nightly build has just been fixed after the move to the TLP so the
|> version you are using does not have the fix yet. Check
|> http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ to get the latest
|> build or check it out from SVN
|>
|> J.
|>
|> > Hi,
|> >
|> >
|> > I've got a copy of the nutch-2010-05-11_04-34-41 nightly build because i
|> > need
|> > Tika to parse JPEG images and that would be in 1.1 as i read somewhere
|> > [1].
|> >
|> > First i fetch only a single HTML page and send it to Solr as i did with
|> > 1.0 but it fails now. Here's what Solr thinks of the request:
|> >
|> >
|> > ---------------
|> > May 17, 2010 2:25:32 PM org.apache.solr.common.SolrException log
|> > SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values
|> > encountered for non multiValued copy field id: <URL HERE>
|> >        at
|> >
|org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:26
|> >0) at
|> >
|> >
|org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateP
|> >rocessorFactory.java:60) at
|> >
|> >
|org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateP
|> >rocessorFactory.java:94) at
|> > org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:162)
|> >        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
|> >        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
|> >        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
|> >        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
|> >        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
|> >        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
|> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
|> >        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
|> >        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
|> >        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
|> >        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
|> >        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
|> >        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
|> >        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
|> >        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
|> >        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
|> >        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
|> >        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
|> >        at java.lang.Thread.run(Thread.java:619)
|> > ---------------
|> >
|> >
|> > Well, this obviously is wrong. Although i am still using the old 1.0
|> > schema.xml, it still isn't multiValued in the nightly build's schema.xml
|> > file.
|> >
|> > Below Nutch's relevant log lines:
|> >
|> >
|> > ---------------
|> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: content dest: content
|> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: site dest: site
|> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: title dest: title
|> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: host dest: host
|> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: segment dest: segment
|> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: boost dest: boost
|> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: digest dest: digest
|> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
|> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: id
|> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: url
|> > 2010-05-17 14:25:31,821 INFO  collection.CollectionManager - Instantiating CollectionManager
|> > 2010-05-17 14:25:31,822 INFO  collection.CollectionManager - initializing CollectionManager
|> > 2010-05-17 14:25:31,849 INFO  collection.CollectionManager - file has1 elements
|> > 2010-05-17 14:25:32,474 WARN  mapred.LocalJobRunner - job_local_0001
|> > org.apache.solr.common.SolrException: Bad Request
|> >
|> > Bad Request
|> >
|> > request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
|> >        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
|> >        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
|> >        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
|> >        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
|> >        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:74)
|> >        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
|> >        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
|> >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
|> >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
|> > ---------------
|> >
|> > Because I still use my old 1.0 configuration files I get the following
|> > warning from Nutch, but it doesn't look like it's related to the Solr
|> > integration:
|> >
|> > ---------------
|> > 2010-05-17 14:34:11,529 WARN  conf.Configuration - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
|> > ---------------
|> >
|> > Did I just stumble upon a regression in 1.1dev and should I file a bug,
|> > or could something else spoil the fun?
|> >
|> >
|> >
|> > [1]: http://lucene.472066.n3.nabble.com/Adding-jpeg-parser-to-nutch-td710135.html
|> >
|> > Cheers,
|> >
|> > Markus Jelsma - Technisch Architect - Buyways BV
|> > http://www.linkedin.com/in/markus17
|> > 050-8536620 / 06-50258350
|>
|
|Markus Jelsma - Technisch Architect - Buyways BV
|http://www.linkedin.com/in/markus17
|050-8536620 / 06-50258350


RE: Solr integration in nutch-1.1dev

Brian Tingle
I think I had the same problem. I just checked my schema.xml, and it looks like I simply commented out the copyField from url to id:

<!-- copyField source="url" dest="id"/ -->



RE: Solr integration in nutch-1.1dev

Markus Jelsma
Hi Brian,

 

Again, thanks for the help. I have looked up the schema file from trunk and from the 1.0 tag using web SVN. It seems you are right, although I cannot confirm it yet; I will return to work tomorrow. Anyway, the solrindex-mapping configuration in 1.1-dev does not show anything odd related to the ID field. It does, however, have a copyField from url to url, which makes no sense to me. The suspect is the schema.xml from the 1.0 tag: it contains a copyField directive from URL to ID, which disappeared from trunk some time ago. According to SVN, 1.1-dev introduced the solrindex-mapping configuration, which already maps the URL to the ID field. Because I still have the old schema.xml file in my Solr instance, Solr would of course copy the URL into an ID field that is already occupied.
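
The suspected conflict can be sketched as a toy model in Python. The (source, dest) pairs are copied from the SolrMappingReader lines in the Nutch log earlier in this thread, and the copyField from url to id is the one quoted from the 1.0 schema.xml. Everything else is an assumption made for illustration: the function names `map_fields` and `apply_copy_fields`, the sample page, and the simplified rule for rejecting a second value. This is not Solr's or Nutch's actual code.

```python
# Toy model of the failure; NOT Solr's or Nutch's actual implementation.

# (source, dest) pairs as reported by SolrMappingReader in the Nutch log.
NUTCH_MAPPING = [
    ("content", "content"), ("site", "site"), ("title", "title"),
    ("host", "host"), ("segment", "segment"), ("boost", "boost"),
    ("digest", "digest"), ("tstamp", "tstamp"),
    ("url", "id"), ("url", "url"),
]

# copyField directive from the old 1.0 schema.xml, as quoted in this thread.
OLD_SCHEMA_COPY_FIELDS = [("url", "id")]


def map_fields(nutch_doc, mapping):
    """Nutch side (hypothetical helper): build the document sent to Solr."""
    solr_doc = {}
    for source, dest in mapping:
        if source in nutch_doc:
            solr_doc.setdefault(dest, []).append(nutch_doc[source])
    return solr_doc


def apply_copy_fields(solr_doc, copy_fields, multi_valued=frozenset()):
    """Solr side (hypothetical helper): apply copyField directives and
    reject a second value for any destination that is not multiValued."""
    result = {name: list(values) for name, values in solr_doc.items()}
    for source, dest in copy_fields:
        for value in solr_doc.get(source, []):
            if result.get(dest) and dest not in multi_valued:
                raise ValueError(
                    "multiple values encountered for non multiValued "
                    f"copy field {dest}: {value}"
                )
            result.setdefault(dest, []).append(value)
    return result


# Made-up sample page; only the fields present here are mapped.
page = {"url": "http://example.com/", "title": "Example"}
sent = map_fields(page, NUTCH_MAPPING)

# Without a url -> id copyField the document is accepted as sent.
accepted = apply_copy_fields(sent, [])

# With the old 1.0 schema, id would receive the URL a second time.
try:
    apply_copy_fields(sent, OLD_SCHEMA_COPY_FIELDS)
    rejected = False
except ValueError as err:
    rejected = True
    message = str(err)
```

In this model the id field is filled once by the Nutch mapping and a second time by the schema's copyField, which matches the shape of the error Solr logged.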

 

I'd bet that's the issue here. If so, it would be best to check all relevant new configuration files next time instead of assuming that the schema.xml file wouldn't change. I'll get back to this tomorrow, and thanks for the useful pointer!

 

Cheers,


 


Re: Solr integration in nutch-1.1dev

Markus Jelsma
Confirmed! It was the old schema.xml file. Next time I'd better check for
differences :)
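
One way to check for such differences is to compare the copyField directives of two schema files. The sketch below uses Python's standard xml.etree.ElementTree. The two schema fragments are hypothetical, heavily trimmed stand-ins rather than the real Nutch 1.0 and trunk files, and the helper name `copy_fields` is likewise made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed schema fragments for illustration only; the real
# Nutch schema.xml files contain many more fields and types.
OLD_SCHEMA = """
<schema name="nutch" version="1.1">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="url" type="string" stored="true" indexed="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <copyField source="url" dest="id"/>
</schema>
"""

NEW_SCHEMA = """
<schema name="nutch" version="1.1">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="url" type="string" stored="true" indexed="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def copy_fields(schema_xml):
    """Return the set of (source, dest) pairs of all copyField elements."""
    root = ET.fromstring(schema_xml)
    return {(el.get("source"), el.get("dest")) for el in root.iter("copyField")}


old_pairs = copy_fields(OLD_SCHEMA)
new_pairs = copy_fields(NEW_SCHEMA)

# Directives present only in the old file are candidates for removal
# before pointing a newer Nutch build at the same Solr instance.
removed = old_pairs - new_pairs
added = new_pairs - old_pairs

for source, dest in sorted(removed):
    print(f"only in old schema: copyField {source} -> {dest}")
for source, dest in sorted(added):
    print(f"only in new schema: copyField {source} -> {dest}")
```

To compare real files, the two strings could be replaced by the contents of the deployed schema.xml and the one shipped with the new build.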

On Tuesday 25 May 2010 21:38:45 Markus Jelsma wrote:

> Hi Brian,
>
>  
>
> Again, thanks for the help. I have looked up the schema file from the trunk
>  and 1.0 tag using web svn. It seems you are right, although I cannot
>  confirm it yet; I will return to work tomorrow. Anyway, the
>  solrindex-mapping configuration in 1.1-dev does not show any weird stuff
>  related to the ID field. It does, however, have a copyField from url to
>  url which makes no sense to me. The suspect is the copyField directive in
>  the schema.xml from the 1.0 tag: it contains a copyField directive from
>  URL to ID which disappeared in trunk some time ago. 1.1-dev, according to
>  svn, introduced the solrindex-mapping configuration which already maps the
>  URL to the ID field and because I have the old schema.xml file in my Solr
>  instance, it would, of course, copyField to an already occupied ID field.
>
>  
>
> I'd bet that's the issue here and if so, perhaps it would be best to
>  investigate all new relevant configuration files the next time instead of
>  assuming the schema.xml file wouldn't change. Back on this tomorrow and
>  thanks for the useful pointer!
>
>  
>
> Cheers,
>
>
>  
> -----Original message-----
> From: Brian Tingle <[hidden email]>
> Sent: Tue 25-05-2010 21:11
> To: [hidden email];
> Subject: RE: Solr integration in nutch-1.1dev
>
> I think I had the same problem, I just checked my schema.xml ... it looks
>  like I just commented out the copyField source="url" dest="id"
>
> <!-- copyField source="url" dest="id"/ -->
>
> |-----Original Message-----
> |From: Markus Jelsma [mailto:[hidden email]]
> |Sent: Tuesday, May 25, 2010 12:04 PM
> |To: [hidden email]
> |Subject: RE: Solr integration in nutch-1.1dev
> |
> |Hi Brian,
> |
> |
> |
> |Thanks for your reply. But as can be seen in the stack trace, it's the ID
> |field of a document. It cannot be set to accommodate multiple values and
> |it wouldn't make sense either. The ID field should contain the URL of the
> |fetched and parsed content. Also, you can clearly see the mapping in the
> |included Nutch logs; it maps the URL field to Solr's ID field as well as
> |mapping the URL to the URL field, which doesn't make sense, but it's still
> |the example schema and mapping configuration. Also, I couldn't imagine
> |that multiple values for a URL field in Nutch itself make any sense at
> |all; how would a piece of content on a distinct URL have more than one URL?
> |
> |
> |
> |Do you or anybody else have an idea to solve this mystery? I'm also not
> |getting much from Nutch's logs; they don't mention anything else except
> |that sending the data over to a Solr instance failed.
> |
> |
> |
> |Cheers,
> |
> |-----Original message-----
> |From: Brian Tingle <[hidden email]>
> |Sent: Tue 25-05-2010 20:47
> |To: [hidden email]; Markus Jelsma <[hidden email]>;
> |Subject: RE: Solr integration in nutch-1.1dev
> |
> |Update the solr schema.xml so that it allows multiple values for that
> | field?
> |
> ||-----Original Message-----
> ||From: Markus Jelsma [mailto:[hidden email]]
> ||Sent: Tuesday, May 25, 2010 4:49 AM
> ||To: [hidden email]
> ||Subject: Re: Solr integration in nutch-1.1dev
> ||
> ||Hello Julien,
> ||
> ||
> ||I picked today's build from your URL but the problem persists as reported
> ||earlier. Any more ideas on how to tackle this?
> ||
> ||
> ||Cheers,
> ||
> ||On Monday 17 May 2010 15:50:55 Julien Nioche wrote:
> ||> Hi Markus,
> ||>
> ||> This has been solved last week and is in the trunk of the SVN
> ||> repository. The nightly build has just been fixed after the move to the
> ||> TLP so the version you are using does not have the fix yet. Check
> ||> http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ to get the
> ||> latest build or check it out from SVN
> ||>
> ||> J.
> ||>
> ||> > Hi,
> ||> >
> ||> >
> ||> > I've got a copy of the nutch-2010-05-11_04-34-41 nightly build
> ||> > because I need Tika to parse JPEG images and that would be in 1.1 as
> ||> > I read somewhere [1].
> ||> >
> ||> > First I fetch only a single HTML page and send it to Solr as I did
> ||> > with 1.0 but it fails now. Here's what Solr thinks of the request:
> ||> >
> ||> >
> ||> > ---------------
> ||> > May 17, 2010 2:25:32 PM org.apache.solr.common.SolrException log
> ||> > SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values
> ||> > encountered for non multiValued copy field id: <URL HERE>
> ||> >        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:260)
> ||> >        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
> ||> >        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:94)
> ||> >        at org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:162)
> ||> >        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
> ||> >        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
> ||> >        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> ||> >        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> ||> >        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> ||> >        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> ||> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> ||> >        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> ||> >        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> ||> >        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> ||> >        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> ||> >        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> ||> >        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> ||> >        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> ||> >        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> ||> >        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
> ||> >        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> ||> >        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> ||> >        at java.lang.Thread.run(Thread.java:619)
> ||> > ---------------
> ||> >
> ||> >
> ||> > Well, this obviously is wrong. Although I am still using the old 1.0
> ||> > schema.xml, it still isn't multiValued in the nightly build's
> ||> > schema.xml file.
> ||> >
> ||> > Below Nutch's relevant log lines:
> ||> >
> ||> >
> ||> > ---------------
> ||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: content dest: content
> ||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: site dest: site
> ||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: title dest: title
> ||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: host dest: host
> ||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: segment dest: segment
> ||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: boost dest: boost
> ||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: digest dest: digest
> ||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> ||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: id
> ||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: url
> ||> > 2010-05-17 14:25:31,821 INFO  collection.CollectionManager - Instantiating CollectionManager
> ||> > 2010-05-17 14:25:31,822 INFO  collection.CollectionManager - initializing CollectionManager
> ||> > 2010-05-17 14:25:31,849 INFO  collection.CollectionManager - file has1 elements
> ||> > 2010-05-17 14:25:32,474 WARN  mapred.LocalJobRunner - job_local_0001
> ||> > org.apache.solr.common.SolrException: Bad Request
> ||> >
> ||> > Bad Request
> ||> >
> ||> > request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
> ||> >        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
> ||> >        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
> ||> >        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> ||> >        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> ||> >        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:74)
> ||> >        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
> ||> >        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> ||> >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> ||> >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> ||> > ---------------
> ||> >
> ||> > Because I still use my old 1.0 configuration files, I get the
> ||> > following warning from Nutch, but it doesn't look related to the
> ||> > Solr integration:
> ||> >
> ||> > ---------------
> ||> > 2010-05-17 14:34:11,529 WARN  conf.Configuration - DEPRECATED: hadoop-site.xml
> ||> > found in the classpath. Usage of hadoop-site.xml is deprecated.
> ||> > Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to
> ||> > override properties of core-default.xml, mapred-default.xml and
> ||> > hdfs-default.xml respectively
> ||> > ---------------
> ||> >
> ||> > Did I just stumble upon a regression in 1.1dev that I should file a
> ||> > bug for, or could something else be spoiling the fun?
> ||> >
> ||> >
> ||> >
> ||> > [1]: http://lucene.472066.n3.nabble.com/Adding-jpeg-parser-to-nutch-td710135.html
> ||> >
> ||> > Cheers,
> ||> >
> ||> > Markus Jelsma - Technisch Architect - Buyways BV
> ||> > http://www.linkedin.com/in/markus17
> ||> > 050-8536620 / 06-50258350
>

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350