Importing a csv file encapsulated by " creates a large copyField field of all fields combined.


rhys J
I am trying to import a csv file to my solr core.

It looks like this:

"user_id","name","email","client","classification","default_client","disabled","dm_password","manager"
"A2M","Art Morse","[hidden email]","Morse
Moving","Morse","","X","blue0show",""
"ABW","Amy Wiedner","[hidden email]","Pyramid","","","
","shawn",""
"J2P","Joan Padal","[hidden email]","Berger","","","
","skew3cues",""
"ALB","Anna Bachman","[hidden email]","Berger","","","
","wary#scan",""
"B1B","Bridget Baker","[hidden email]","Reliable","","","
","laps,hear",""
"B1K","Bev Klein"," ","Nor-Cal","",""," ","pipe3hour",""
"B1L","Beverly Leonard","[hidden email]","Reliable","","","
","gail6copy",""
"CMD","Christal Davis","[hidden email]","SMMoving","","","
","risk-pair",""
"BEB","Bob Barnum","[hidden email]","Berger","",""," ","mets=pol",""

I have set up the schema via the API, and have all the fields that are
listed on the top line of the csv file.

When I finish the import, it returns no errors. But when I go to look at
the schema, it has created two fields in the managed-schema file:

<field
name="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_"
type="text_general"/>

and

 <copyField
source="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_"
dest="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager__str"
maxChars="256"/>

Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

Alexandre Rafalovitch
What command do you use to get the file into Solr? My guess is that you
are somehow not hitting the correct handler. Perhaps you are sending
it to the extract handler (designed for PDF, MS Word, etc.) rather than
the correct CSV handler.

Solr ships with examples of how to index CSV files.
See for example:
https://github.com/apache/lucene-solr/blob/master/solr/example/films/README.txt#L39
Also reference documentation:
https://lucene.apache.org/solr/guide/8_1/uploading-data-with-index-handlers.html

Regards,
   Alex.


Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

rhys J
I am using this command:

curl 'http://localhost:8983/solr/users/update/csv?commit=true&separator=%09&encapsulator=%20&escape=\&stream.file=/tmp/users.csv'


Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

Shawn Heisey
On 10/21/2019 11:24 AM, rhys J wrote:
> I am using this command:
>
> curl 'http://localhost:8983/solr/users/update/csv?commit=true&separator=%09&encapsulator=%20&escape=\&stream.file=/tmp/users.csv'

The sequence %20 is a URL encoding of a space. If you intend the
encapsulator character to be a double quote, you should be using %22
instead.

The sequence %09 is a tab character, sometimes known as Ctrl-I.  Your
CSV looks like it's using a comma, which is %2C instead.
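These encodings are easy to verify with Python's standard urllib, as a quick illustration:

```python
from urllib.parse import quote

# Percent-encodings of the characters in question:
print(quote(" "))   # %20 - space (what the command actually sent)
print(quote("\t"))  # %09 - tab
print(quote('"'))   # %22 - double quote (the intended encapsulator)
print(quote(","))   # %2C - comma (the file's actual separator)
```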

The defaults for the CSV import should be a double quote for
encapsulation and a comma for a separator, with \ as the escape
character ... so perhaps you should just leave those parameters off of
the URL.
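Put together, a corrected query string (a sketch, reusing the core name and file path from the thread) would spell out the right encodings; Python's urlencode handles the percent-encoding:

```python
from urllib.parse import urlencode

# Explicit parameters: comma separator, double-quote encapsulator.
# Since these match the handler defaults described above, the
# separator and encapsulator parameters could also just be omitted.
params = {
    "commit": "true",
    "separator": ",",
    "encapsulator": '"',
    "stream.file": "/tmp/users.csv",
}
url = "http://localhost:8983/solr/users/update/csv?" + urlencode(params)
print(url)
```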

Thanks,
Shawn

Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

rhys J
Thank you, that worked perfectly. I can't believe I didn't notice the
separator was a tab.