Data Import Handler


DINSD | SPAutores
Hi

Based on this document, there are two ways to index documents on the Solr platform: https://lucidworks.com/post/indexing-with-solrj/

Quote:
"Two popular methods of indexing existing data are the Data Import Handler (DIH) and Tika (Solr Cell)/ExtractingRequestHandler"

Now that DIH has been discontinued and is only supported as a community package, are there any other options?

Best regards
Rui Pimentel


DINSD - Departamento de Informática / SPA Digital
      
Please consider the environment before printing this email

This electronic mail transmission including any attachment hereof, contains information that is PRIVATE, CONFIDENTIAL and PROTECTED FROM DISCLOSURE, and it is only for the use of the person and the e-mail address above indicated. If you have received this electronic mail transmission in error, please destroy it and notify us immediately through the telephone number  +351 21 359 44 00 or at the e-mail address:  [hidden email]
 

Re: Data Import Handler

Alexandre Rafalovitch
Solr now has a package manager, and DIH is one of the packages. This reflects the fact that its development cycle is not locked to Solr's, and it reduces the size of the core download. Tika may be heading the same way, as running Tika inside the Solr process could cause memory issues with complex PDFs.

In terms of other ways to pre-process and load data into Solr, there are things like:

Other commercial solutions also exist, such as StreamSets:

And, of course, you can always roll your own with SolrJ.

Regards,
  Alex.
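
For readers who want a starting point for the roll-your-own route, here is a minimal sketch. It does not use SolrJ (which is a Java library); instead it uses only the Python standard library to post documents to Solr's JSON update handler over HTTP, which is the same idea at a lower level. The host, port, collection name, field names, and the helper names `build_update_request` and `index_documents` are all assumptions made for illustration, not something from this thread.

```python
import json
import urllib.parse
import urllib.request


def build_update_request(base_url, collection, docs, commit=True):
    """Build a POST request for Solr's JSON update handler.

    base_url and collection are placeholders; adjust them to your setup.
    docs is an iterable of dicts, one dict per Solr document.
    """
    params = {"wt": "json"}
    if commit:
        # Ask Solr to commit right after this update so the docs are searchable.
        params["commit"] = "true"
    url = "{}/solr/{}/update?{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(collection),
        urllib.parse.urlencode(params),
    )
    # The update handler accepts a JSON array of documents.
    body = json.dumps(list(docs)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_documents(base_url, collection, docs, timeout=30):
    """Send the documents to Solr and return the parsed JSON response."""
    request = build_update_request(base_url, collection, docs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Assuming a Solr instance is listening on localhost:8983 with a collection called `books`, calling `index_documents("http://localhost:8983", "books", [{"id": "1", "title": "Hello"}])` would send one document and commit it. For large imports it is usually better to send documents in batches and commit once at the end.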



(In reply to the message from DINSD | SPAutores of Thu, 15 Oct 2020 at 10:08.)


Re: Data Import Handler

DINSD | SPAutores

Hi Alexandre,

The information you provided is very useful.

Many thanks and best regards
Rui Pimentel


(In reply to Alexandre Rafalovitch's message of 2020-10-15 15:33.)