DIH for TikaEntityProcessor

DIH for TikaEntityProcessor

Martin Frank Hansen (MHQ)

Hi,

I am trying to import documents from a file system into Solr using the DataImportHandler, but I keep getting the following errors:

Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
         at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
         at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
         ... 9 more

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
         at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
         at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
         ... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
         ... 6 more
Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
         ... 9 more

My data-config file looks as follows:

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
      <entity name="files" processor="FileListEntityProcessor" baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true" rootEntity="false" dataSource="bin" onError="skip">
        <field column="fileAbsolutePath" name="id" />

        <entity name="read_file"
                processor="TikaEntityProcessor"
                url="${files.fileAbsolutePath}">
          <field column="text" name="content" />
        </entity>
      </entity>
  </document>
</dataConfig>

In the schema I basically have two fields:

<field name="Id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

Any help is appreciated.

Martin Frank Hansen

Protection of your personal data is important to us. Here you can read KMD’s Privacy Policy outlining how we process your personal data.

Please note that this message may contain confidential information. If you have received this message by mistake, please inform the sender of the mistake by sending a reply, then delete the message from your system without making, distributing or retaining any copies of it. Although we believe that the message and any attachments are free from viruses and other errors that might affect the computer or it-system where it is received and read, the recipient opens the message at his or her own risk. We assume no responsibility for any loss or damage arising from the receipt or use of this message.

Re: DIH for TikaEntityProcessor

Martin Frank Hansen (MHQ)

Hi again,

Can anybody help me? Any suggestions as to why I am getting the error below?

Martin Frank Hansen, Senior Data Analyst

Data, IM & Analytics

Lautrupparken 40-42, DK-2750 Ballerup
E-mail [hidden email]  Web www.kmd.dk
Mobile +4525571418

From: Martin Frank Hansen (MHQ)
Sent: 10 October 2018 10:15
To: solr-user <[hidden email]>
Subject: DIH for TikaEntityProcessor

Re: DIH for TikaEntityProcessor

Kamuela Lau
Hi,

I was unable to reproduce the error you got with the information provided.
Below are the data-config.xml and managed-schema fields I used; the
data-config is mostly the same as yours. (I think that FileListEntityProcessor
doesn't actually require a dataSource, so I think it's safe to put
dataSource="null" on the outer entity.)

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
      <entity name="files" processor="FileListEntityProcessor"
              baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
              rootEntity="false" dataSource="bin" onError="skip">
        <field column="fileAbsolutePath" name="id"/>
        <entity name="read_file" processor="TikaEntityProcessor"
                url="${files.fileAbsolutePath}">
          <field column="text" name="text"/>
        </entity>
      </entity>
  </document>
</dataConfig>

And from the managed schema:
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <!-- docValues are enabled by default for long type so we don't need to index the version field  -->
    <field name="_version_" type="plong" indexed="false" stored="false"/>
    <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
    <field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

When I had field column="text" name="content", the documents were still
indexed, but the text/content was not (as I had no content field in the
schema).
I used the default config, and Solr version 7.5.0; I was able to import the
data just fine (I also tested with .*DOC). Is there any other information
you can provide that can help me reproduce this error?
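
To make the field-name point concrete: the `name` attribute of each DIH `<field>` is the target Solr field, so it has to match a field defined in the schema, and Solr field names are case-sensitive. A minimal, untested sketch of a consistent pair, assuming the two fields from the original post are kept but the schema's `Id` is renamed to `id` so that it matches the mapping:

```xml
<!-- data-config.xml: column = value produced by the entity, name = target Solr field -->
<field column="fileAbsolutePath" name="id" />
<field column="text" name="text" />

<!-- managed-schema: names match the mappings above exactly, including case -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
```

With `stored="false"` the extracted text is searchable but is not returned in query results; the schema above used `stored="true"` for `text`.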
On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) <[hidden email]>
wrote:

Re: DIH for TikaEntityProcessor

Kamuela Lau
Also, just wondering, have you tried specifying dataSource="bin" for
read_file?
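
Applied to the original config, that suggestion would bind the inner entity explicitly to the `BinFileDataSource` named `bin`, which supplies each file as a byte stream (`java.io.InputStream`), the type the stack trace shows TikaEntityProcessor trying to cast to. An untested sketch; only the `dataSource` attribute differs from the original post:

```xml
<entity name="read_file"
        processor="TikaEntityProcessor"
        dataSource="bin"
        url="${files.fileAbsolutePath}">
  <!-- field mapping left as in the original post -->
  <field column="text" name="content" />
</entity>
```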

On Fri, Oct 12, 2018 at 6:38 PM Kamuela Lau <[hidden email]> wrote:

Re: DIH for TikaEntityProcessor

Alexandre Rafalovitch
In reply to this post by Martin Frank Hansen (MHQ)
Solr ships with a DIH Tika example that seems 90% identical to yours. Can you
get that to run? If it works, then you can focus on the 10% difference.

Perhaps it is the explicit dataSource="null" on the outer entity? Or maybe
format="text" on the inner one.

Regards,
     Alex
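
For illustration, here is a minimal sketch of the original data-config with both of Alexandre's suggestions applied: dataSource="null" on the outer entity (FileListEntityProcessor only enumerates files, so it does not need to read through a data source) and format="text" on the inner TikaEntityProcessor entity. The base directory, file pattern, and field names are copied from the original post; whether either change on its own removes the ClassCastException is not established at this point in the thread.

```xml
<dataConfig>
  <!-- Binary file source: the stack trace shows TikaEntityProcessor expects an InputStream -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer entity only lists files, so no data source is attached -->
    <entity name="files" processor="FileListEntityProcessor" dataSource="null"
            baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
            rootEntity="false" onError="skip">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: Tika extracts plain text from each listed file -->
      <entity name="read_file" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```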



Re: DIH for TikaEntityProcessor

Martin Frank Hansen (MHQ)
In reply to this post by Kamuela Lau
Hi Kamuela,

Thanks for your answer.

I still get the same error, so I will try the tech-products example to see whether it works there, as Alexandre suggested in the mail above.

Martin Frank Hansen,

-----Original Message-----
From: Kamuela Lau <[hidden email]>
Sent: 12 October 2018 11:38
To: [hidden email]
Subject: Re: DIH for TikaEntityProcessor

Hi,

I was unable to reproduce the error that you got with the information provided.
Below are the data-config.xml and managed-schema fields I used. The data-config is mostly the same as yours (I think that the outer FileListEntityProcessor entity doesn't actually require a dataSource, so it should be safe to put dataSource="null" there):

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
      <entity name="files" processor="FileListEntityProcessor"
              baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
              rootEntity="false" dataSource="bin" onError="skip">
        <field column="fileAbsolutePath" name="id"/>
        <entity name="read_file" processor="TikaEntityProcessor"
                url="${files.fileAbsolutePath}">
          <field column="text" name="text"/>
        </entity>
      </entity>
  </document>
</dataConfig>

And from the managed schema:
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <!-- docValues are enabled by default for long type so we don't need to index the version field  -->
    <field name="_version_" type="plong" indexed="false" stored="false"/>
    <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
    <field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

When I had field column="text" name="content", the documents were still indexed, but the text/content was not (as I had no content field in the schema).
I used the default config and Solr 7.5.0, and I was able to import the data just fine (I also tested with .*DOC). Is there any other information you can provide that would help me reproduce this error?
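
Kamuela's observation about name="content" can be addressed from either side. A minimal sketch, assuming the schema is as posted in the original message: either point Tika's text column at a field the schema already defines, or declare a content field. Solr field names are case-sensitive, so if the schema really declares Id as posted, it will not receive values mapped to id. The stored="true" on the new field is an assumption, chosen only so the extracted text can be inspected in query results.

```xml
<!-- Option 1 (data-config.xml): map Tika's "text" column to the existing "text" field -->
<field column="text" name="text"/>

<!-- Option 2 (managed-schema): keep name="content" in data-config and declare that field -->
<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>
```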

Re: DIH for TikaEntityProcessor

Martin Frank Hansen (MHQ)
In reply to this post by Kamuela Lau
You sir just made my day!!!

It worked!!! Thanks a million!


Martin Frank Hansen,

-----Original Message-----
From: Kamuela Lau <[hidden email]>
Sent: 12 October 2018 11:41
To: [hidden email]
Subject: Re: DIH for TikaEntityProcessor

Also, just wondering: have you tried specifying dataSource="bin" for read_file?
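
This is the change that resolved the problem. Applied to the original data-config, it amounts to the sketch below; only the dataSource="bin" attribute on read_file is new. The stack trace shows TikaEntityProcessor casting what its data source returned to java.io.InputStream and receiving a java.io.InputStreamReader instead, which suggests read_file was resolved to a character-oriented source rather than BinFileDataSource. Naming the binary source explicitly removes that ambiguity; the thread does not show why the implicit default differed in Martin's setup.

```xml
<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
            rootEntity="false" dataSource="bin" onError="skip">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Bind Tika to the binary source explicitly so it receives an InputStream, not a Reader -->
      <entity name="read_file" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" dataSource="bin">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```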


Re: DIH for TikaEntityProcessor

Kamuela Lau
Glad to help :)

On Fri, Oct 12, 2018 at 21:10, Martin Frank Hansen (MHQ) <[hidden email]> wrote:

> You sir just made my day!!!
>
> It worked!!! Thanks a million!
>
>
> Martin Frank Hansen,
>
> -----Original Message-----
> From: Kamuela Lau <[hidden email]>
> Sent: 12 October 2018 11:41
> To: [hidden email]
> Subject: Re: DIH for TikaEntityProcessor
>
> Also, just wondering: have you tried specifying dataSource="bin" for
> read_file?
>
> On Fri, Oct 12, 2018 at 6:38 PM Kamuela Lau <[hidden email]> wrote:
>
> > Hi,
> >
> > I was unable to reproduce the error that you got with the information
> > provided.
> > Below are the data-config.xml and managed-schema fields I used; the
> > data-config is mostly the same (I think that BinFileDataSource doesn't
> > actually require a dataSource, so I think it's safe to put
> > dataSource="null"):
> >
> > <dataConfig>
> >   <dataSource name="bin" type="BinFileDataSource"/>
> >   <document>
> >       <entity name="files" processor="FileListEntityProcessor"
> > baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
> > rootEntity="false" dataSource="bin" onError="skip">
> >         <field column="fileAbsolutePath" name="id"/>
> >         <entity name="read_file" processor="TikaEntityProcessor"
> > url="${files.fileAbsolutePath}">
> >           <field column="text" name="text"/>
> >         </entity>
> >       </entity>
> >   </document>
> > </dataConfig>
> >
> > And from the managed schema:
> >     <field name="id" type="string" indexed="true" stored="true"
> > required="true" multiValued="false" />
> >     <!-- docValues are enabled by default for long type so we don't
> > need to index the version field  -->
> >     <field name="_version_" type="plong" indexed="false" stored="false"/>
> >     <field name="_root_" type="string" indexed="true" stored="false"
> > docValues="false" />
> >     <field name="text" type="text_general" indexed="true" stored="true"
> > multiValued="true"/>
> >
> > When I had field column="text" name="content", the documents were
> > still indexed, but the text/content was not (as I had no content field
> > in the schema).
> > I used the default config, and Solr version 7.5.0; I was able to
> > import the data just fine (I also tested with .*DOC). Is there any
> > other information you can provide that can help me reproduce this error?
> >
> >
> >
> >
> > On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) <[hidden email]>
> > wrote:
> >
> >> Hi again,
> >>
> >>
> >>
> >> Can anybody help me? Any suggestions to why I am getting the error
> below?
> >>
> >>
> >>
> >>
> >>
> >> *Martin Frank Hansen*, Senior Data Analytiker
> >>
> >> Data, IM & Analytics
> >>
> >> [image: cid:image001.png@01D383C9.6C129A60]
> >>
> >>
> >> Lautrupparken 40-42, DK-2750 Ballerup E-mail [hidden email]  Web
> >> www.kmd.dk Mobil +4525571418
> >>
> >>
> >>
> >> *Fra:* Martin Frank Hansen (MHQ)
> >> *Sendt:* 10. oktober 2018 10:15
> >> *Til:* solr-user <[hidden email]>
> >> *Emne:* DIH for TikaEntityProcessor
> >>
> >>
> >>
> >> Hi,
> >>
> >>
> >>
> >> I am trying to read documents from a file system into Solr, using
> >> dataimporthandler but keep getting the following errors:
> >>
> >>
> >>
> >> Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>          at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
> >>          at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
> >>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
> >>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
> >>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
> >>          at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
> >>          at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
> >>          at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
> >>          at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
> >>          at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
> >>          at java.lang.Thread.run(Thread.java:748)
> >> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>          at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
> >>          at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
> >>          ... 9 more
> >>
> >> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>          at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
> >>          at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
> >>          at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
> >>          at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
> >>          at java.lang.Thread.run(Thread.java:748)
> >> Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
> >>          at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
> >>          at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
> >>          ... 4 more
> >> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>          at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
> >>          at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
> >>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
> >>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
> >>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
> >>          ... 6 more
> >> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>          at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
> >>          at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
> >>          ... 9 more
> >>
> >>
> >>
> >>
> >>
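The root cause in both traces is that an `InputStreamReader` reached code expecting an `InputStream`. In `java.io` these types are unrelated (`InputStreamReader` is a `Reader`, not an `InputStream`), so such a cast can never succeed. The sketch below reproduces the same `ClassCastException` in plain Java. It assumes, based only on the message at `TikaEntityProcessor.java:132`, that the processor casts whatever its data source returns to `InputStream`; `readerSource` and `binarySource` are hypothetical stand-ins, not Solr code. For reference, `BinFileDataSource` is a binary data source that returns an `InputStream`, whereas character data sources return a `Reader`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class CastDemo {

    // Hypothetical stand-in for a character-oriented data source:
    // it hands back an InputStreamReader rather than raw bytes.
    static Object readerSource(byte[] bytes) {
        return new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
    }

    // Hypothetical stand-in for a binary data source such as BinFileDataSource:
    // it hands back an InputStream.
    static Object binarySource(byte[] bytes) {
        return new ByteArrayInputStream(bytes);
    }

    // Mimics a consumer that casts the data source result to InputStream,
    // which is what the failing frame in the trace appears to do.
    static String tryAsStream(Object data) {
        try {
            InputStream in = (InputStream) data;
            return "ok: got " + in.getClass().getSimpleName();
        } catch (ClassCastException e) {
            return "ClassCastException: " + data.getClass().getName()
                    + " is not an InputStream";
        }
    }

    public static void main(String[] args) {
        byte[] doc = "dummy".getBytes(StandardCharsets.UTF_8);
        System.out.println(tryAsStream(readerSource(doc)));
        System.out.println(tryAsStream(binarySource(doc)));
    }
}
```

Running `main` prints the `ClassCastException` line for the reader-style source and `ok: got ByteArrayInputStream` for the binary one, which suggests checking which data source the Tika entity actually reads through.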
> >> My data-config file looks as follows:
> >>
> >>
> >>
> >> <dataConfig>
> >>   <dataSource name="bin" type="BinFileDataSource" />
> >>   <document>
> >>       <entity name="files" processor="FileListEntityProcessor"
> >>               baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
> >>               rootEntity="false " dataSource="bin" onError="skip">
> >>         <field column="fileAbsolutePath" name="id" />
> >>
> >>         <entity
> >>          name="read_file"
> >>          processor="TikaEntityProcessor"
> >>          url="${files.fileAbsolutePath}"
> >>          >
> >>           <field column="text" name="content" />
> >>         </entity>
> >>       </entity>
> >>   </document>
> >> </dataConfig>
> >>
> >>
> >>
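One thing worth checking in this config: the only data source is named `bin`, and the outer `files` entity names it, but the inner `read_file` entity, which runs `TikaEntityProcessor`, has no `dataSource` attribute. If that entity is not reading through the `BinFileDataSource`, it would plausibly receive a `Reader` instead of an `InputStream`, which matches the exception. A commonly suggested change is to add `dataSource="bin"` to `read_file`. The sketch below is a hypothetical helper (not part of Solr) that parses a data-config with the JDK's DOM parser and reports, for each `TikaEntityProcessor` entity, which data source it names and whether that source is declared with type `BinFileDataSource`. It assumes the config is well-formed XML and only inspects attributes; it does not model how DIH resolves defaults.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TikaSourceCheck {

    // Parses an XML string; wraps checked parser exceptions in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: one report line per TikaEntityProcessor entity.
    static List<String> check(String dataConfigXml) {
        Document doc = parse(dataConfigXml);

        // Map each <dataSource> name ("" if unnamed) to its type attribute.
        Map<String, String> sourceTypes = new HashMap<>();
        NodeList sources = doc.getElementsByTagName("dataSource");
        for (int i = 0; i < sources.getLength(); i++) {
            Element ds = (Element) sources.item(i);
            sourceTypes.put(ds.getAttribute("name"), ds.getAttribute("type"));
        }

        List<String> report = new ArrayList<>();
        NodeList entities = doc.getElementsByTagName("entity");
        for (int i = 0; i < entities.getLength(); i++) {
            Element e = (Element) entities.item(i);
            // Accept both the short name and a fully qualified class name.
            if (!e.getAttribute("processor").trim().endsWith("TikaEntityProcessor")) {
                continue;
            }
            String name = e.getAttribute("name");
            String ds = e.getAttribute("dataSource"); // DOM returns "" when absent
            if (ds.isEmpty()) {
                report.add(name + ": no dataSource attribute");
            } else if (!"BinFileDataSource".equals(sourceTypes.get(ds))) {
                report.add(name + ": dataSource '" + ds + "' is not a BinFileDataSource");
            } else {
                report.add(name + ": uses BinFileDataSource '" + ds + "'");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Trimmed-down version of the config in the message.
        String config = "<dataConfig>"
                + "<dataSource name=\"bin\" type=\"BinFileDataSource\"/>"
                + "<document>"
                + "<entity name=\"files\" processor=\"FileListEntityProcessor\" dataSource=\"bin\">"
                + "<entity name=\"read_file\" processor=\"TikaEntityProcessor\""
                + " url=\"${files.fileAbsolutePath}\"/>"
                + "</entity></document></dataConfig>";
        check(config).forEach(System.out::println);
    }
}
```

For the config as posted, this prints `read_file: no dataSource attribute`; with `dataSource="bin"` added to `read_file`, it reports `read_file: uses BinFileDataSource 'bin'`.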
> >> And in the Schema I basically have two fields:
> >>
> >>
> >>
> >> <field name="Id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
> >> <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
> >>
> >>
> >>
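Solr field names are case-sensitive, so it may also be worth comparing the `name` targets in the data-config with the schema: the config maps `fileAbsolutePath` to `id` and Tika's `text` column to `content`, while the schema excerpt declares `Id` and `text`. This is separate from the `ClassCastException`, but once that is resolved Solr would likely reject the documents (unknown field, or missing required `Id`) unless the full schema defines those names elsewhere. The sketch below is a hypothetical helper (not a Solr API) that collects the `name` attribute of every plain `<field>` element in both XML fragments and lists config targets the schema does not declare. It ignores `dynamicField` and `copyField`, so treat its output as a hint rather than a verdict.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FieldNameCheck {

    // Parses an XML string; wraps checked parser exceptions in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Collects the "name" attribute of every <field> element, in document order.
    static Set<String> fieldNames(Document doc) {
        Set<String> names = new LinkedHashSet<>();
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            names.add(((Element) fields.item(i)).getAttribute("name"));
        }
        return names;
    }

    // Hypothetical helper: config field targets that the schema does not declare.
    // String comparison is exact, i.e. case-sensitive like Solr field names.
    static List<String> missingInSchema(String dataConfigXml, String schemaXml) {
        Set<String> schema = fieldNames(parse(schemaXml));
        List<String> missing = new ArrayList<>();
        for (String target : fieldNames(parse(dataConfigXml))) {
            if (!schema.contains(target)) {
                missing.add(target);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String config = "<dataConfig><document>"
                + "<entity name=\"files\"><field column=\"fileAbsolutePath\" name=\"id\"/>"
                + "<entity name=\"read_file\"><field column=\"text\" name=\"content\"/></entity>"
                + "</entity></document></dataConfig>";
        // The schema excerpt, wrapped in a root element so it parses on its own.
        String schema = "<schema>"
                + "<field name=\"Id\" type=\"string\" indexed=\"true\" stored=\"true\" required=\"true\" multiValued=\"false\"/>"
                + "<field name=\"text\" type=\"text_general\" indexed=\"true\" stored=\"false\" multiValued=\"true\"/>"
                + "</schema>";
        System.out.println(missingInSchema(config, schema));
    }
}
```

With the two fragments from the message, `main` prints `[id, content]`.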
> >> Any help is appreciated.
> >>
> >>
> >>
> >>
> >>
> >> *Martin Frank Hansen*
> >>
> >>
> >>
> >> Protection of your personal data is important to us. Here you can
> >> read KMD’s Privacy Policy <http://www.kmd.net/Privacy-Policy>
> >> outlining how we process your personal data.
> >>
> >> Please note that this message may contain confidential information.
> >> If you have received this message by mistake, please inform the
> >> sender of the mistake by sending a reply, then delete the message
> >> from your system without making, distributing or retaining any copies
> >> of it. Although we believe that the message and any attachments are
> >> free from viruses and other errors that might affect the computer or
> >> it-system where it is received and read, the recipient opens the
> >> message at his or her own risk.
> >> We assume no responsibility for any loss or damage arising from the
> >> receipt or use of this message.
> >>
> >
>