|
Is this possible in DataImportHandler?

I want the following XML to all collapse into one Author field:

<AuthorList CompleteYN="Y">
  <Author ValidYN="Y">
    <LastName>Sørlie</LastName>
    <ForeName>T</ForeName>
    <Initials>T</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Perou</LastName>
    <ForeName>C M</ForeName>
    <Initials>CM</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Tibshirani</LastName>
    <ForeName>R</ForeName>
    <Initials>R</Initials>
  </Author>
...

So my XPATH is like |
|
Sorry, hit send too soon. Continued the email below.

On 4/30/12 4:46 PM, "Twomey, David" <[hidden email]> wrote:

>
>Is this possible in DataImportHandler
>
>I want the following XML to all collapse into one multi-valued Author field
>
><AuthorList CompleteYN="Y">
>  <Author ValidYN="Y">
>    <LastName>Sørlie</LastName>
>    <ForeName>T</ForeName>
>    <Initials>T</Initials>
>  </Author>
>  <Author ValidYN="Y">
>    <LastName>Perou</LastName>
>    <ForeName>C M</ForeName>
>    <Initials>CM</Initials>
>  </Author>
>  <Author ValidYN="Y">
>    <LastName>Tibshirani</LastName>
>    <ForeName>R</ForeName>
>    <Initials>R</Initials>
>  </Author>
>...
>
>So my XPATH is like
>xpath="/MedlineCitationSet/MedlineCitation/AuthorList/??"
>commonField="true" />
> |
|
Answering my own question: I think I can do this by writing a script that
concatenates the LastName, ForeName and Initials and adding that to
xpath = /AuthorList/Author. Yes?

On 4/30/12 4:49 PM, "Twomey, David" <[hidden email]> wrote:

>Sorry hit send too soon. Continued the email below
>
>On 4/30/12 4:46 PM, "Twomey, David" <[hidden email]> wrote:
>
>>
>>Is this possible in DataImportHandler
>>
>>I want the following XML to all collapse into one multi-valued Author
>>field
>>
>><AuthorList CompleteYN="Y">
>>  <Author ValidYN="Y">
>>    <LastName>Sørlie</LastName>
>>    <ForeName>T</ForeName>
>>    <Initials>T</Initials>
>>  </Author>
>>  <Author ValidYN="Y">
>>    <LastName>Perou</LastName>
>>    <ForeName>C M</ForeName>
>>    <Initials>CM</Initials>
>>  </Author>
>>  <Author ValidYN="Y">
>>    <LastName>Tibshirani</LastName>
>>    <ForeName>R</ForeName>
>>    <Initials>R</Initials>
>>  </Author>
>>...
>>
>>So my XPATH is like
>>xpath="/MedlineCitationSet/MedlineCitation/AuthorList/??"
>>commonField="true" />
>>
> |
|
Hi David,
I think you should add this option: flatten=true

and then could you try to use this XPath:

/MedlineCitationSet/MedlineCitation/AuthorList/Author

See here for the description:

http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

I don't think that the commonField option is needed here; I think you should remove it.

Ludovic.
Jouve
France. |
|
Ludovic,
Thanks for your help. I tried your suggestion but it didn't work for Authors.

Below are 3 snippets: from data-config.xml, from the XML file, and from the XML response from the DB.

Data-config:

<entity name="medlineFiles"
        processor="XPathEntityProcessor"
        url="${medlineFileList.fileAbsolutePath}"
        forEach="/MedlineCitationSet/MedlineCitation"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
        logTemplate=" processing ${medlineFileList.fileAbsolutePath}"
        logLevel="info"
        flatten="true"
        stream="true">
  <field column="pmid" xpath="/MedlineCitationSet/MedlineCitation/PMID" commonField="true" />
  <field column="journal_name" xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/Title" commonField="true" />
  <field column="title" xpath="/MedlineCitationSet/MedlineCitation/Article/ArticleTitle" commonField="true" />
  <field column="abstract" xpath="/MedlineCitationSet/MedlineCitation/Article/Abstract/AbstractText" commonField="true" />
  <field column="author" xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" commonField="false" />
  <field column="year" xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/JournalIssue/PubDate/Year" commonField="true" />
</entity>

XML Snippet for Author:

<AuthorList CompleteYN="Y">
  <Author ValidYN="Y">
    <LastName>Malathi</LastName>
    <ForeName>K</ForeName>
    <Initials>K</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Xiao</LastName>
    <ForeName>Y</ForeName>
    <Initials>Y</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Mitchell</LastName>
    <ForeName>A P</ForeName>
    <Initials>AP</Initials>
  </Author>
</AuthorList>

Response from SOLR:

<arr name="author">
  <str></str>
  <str></str>
  <str></str>
  <str></str>
  <str></str>
  <str></str>
  <str></str>
  <str></str>
  <str></str>
  <str></str>
  <str></str>
  <str></str>
  <str></str>
  <str></str>
</arr>
<str name="journal_name">Journal of cancer research and clinical oncology</str>

Thanks
David

On 5/1/12 8:05 AM, "lboutros" <[hidden email]> wrote:

>Hi David,
>
>I think you should add this option : flatten=true
>
>and then could you try to use this XPath :
>
>/MedlineCitationSet/MedlineCitation/AuthorList/Author
>
>see here for the description :
>
>http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>
>I don't think that the commonField option is needed here, I think you
>should suppress it.
>
>Ludovic.
>
>-----
>Jouve
>France.
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3952812.html
>Sent from the Solr - User mailing list archive at Nabble.com. |
|
Hi David,
What do you want to do with the 'commonField' option? Could you post the part of the schema for the author field, please? Is the author field stored?

Ludovic.
Jouve
France. |
|
Is what I want even possible with XPathEntityProcessor?
It sort of works now - I didn't realize the "flatten" attribute is an attribute of field instead of entity. BUT it's still not what I would like.

The XML looks like below, and it's nested within /MedlineCitationSet/MedlineCitation/Article/

<AuthorList CompleteYN="Y">
  <Author ValidYN="Y">
    <LastName>Starremans</LastName>
    <ForeName>Patrick G J F</ForeName>
    <Initials>PG</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>van der Kemp</LastName>
    <ForeName>Annemiete W C M</ForeName>
    <Initials>AW</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Knoers</LastName>
    <ForeName>Nine V A M</ForeName>
    <Initials>NV</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>van den Heuvel</LastName>
    <ForeName>Lambertus P W J</ForeName>
    <Initials>LP</Initials>
  </Author>
</AuthorList>

What I would like to see in the index author field is

<author>Starremans PG, Van der Kemp AW, etc </author>

Note: "LastName Initials", no forename.

When I set the XPath like this

<field column="author" xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" flatten="true" />

I get this in the index

<arr name="author">
  <str>Starremans Patrick G J F PG</str>
  <str>Van der Kemp Annemiete W C M AW</str>
  .
  .
</arr>

Note: the forename field is included.

My author field in the schema.xml is

<field name="author" type="textgen" indexed="true" stored="true" multiValued="true" required="false"/>

So is this even possible with XPathEntityProcessor?

Thanks
David

On 5/3/12 8:40 AM, "lboutros" <[hidden email]> wrote:

>Hi David,
>
>what do you want to do with the 'commonField' option ?
>
>Is it possible to have the part of the schema for the author field please ?
>Is the author field stored ?
>
>Ludovic.
>
>-----
>Jouve
>France.
>--
>View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959097.html
>Sent from the Solr - User mailing list archive at Nabble.com. |
|
ok, not that easy :)
I did not test it myself, but it seems that you could use XSL preprocessing with the 'xsl' option in your XPathEntityProcessor:

http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

You could transform the author part as you wish and then import the author field with your current configuration.

Ludovic.
Jouve
France. |
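A minimal sketch of what such a preprocessing stylesheet could look like, assuming XSLT 1.0 and the element names from the XML posted earlier in the thread (AuthorList, Author, LastName, Initials). It has not been run against DIH. It copies the document unchanged except that each Author element is rewritten to contain only "LastName Initials" as text, so the existing author xpath would then pick up one value per author. It also drops the ValidYN attribute from Author.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Identity template: copy every node and attribute unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for authors: keep only "LastName Initials" as text.
       ForeName is dropped. normalize-space trims the trailing blank
       left behind when an Author has no Initials element. -->
  <xsl:template match="AuthorList/Author">
    <Author>
      <xsl:value-of select="normalize-space(concat(LastName, ' ', Initials))"/>
    </Author>
  </xsl:template>
</xsl:stylesheet>
```

With the Starremans sample this should give Author elements such as "Starremans PG" and "van der Kemp AW". If a single comma-separated string is wanted rather than a multi-valued field, a template matching AuthorList with an xsl:for-each over Author would be one way to build it.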
|
The XPath implementation in DIH is very minimal: it is tuned for
speed, not features. The XSL option lets you do everything you could want, with a slower engine.

On Thu, May 3, 2012 at 7:30 AM, lboutros <[hidden email]> wrote:

> ok, not that easy :)
>
> I did not test it myself but it seems that you could use an XSL
> preprocessing with the 'xsl' option in your XPathEntityProcessor :
>
> http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>
> You could transform the author part as you wish and then import the author
> field with your actual configuration.
>
> Ludovic.
>
> -----
> Jouve
> France.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959397.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Lance Norskog
[hidden email] |
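For illustration, here is a sketch of how the entity from earlier in the thread might be wired to the XSL option. The xsl attribute is the option named above. The stylesheet path /path/to/medline-authors.xsl is a hypothetical placeholder, and the sketch assumes the stylesheet has already rewritten each Author element to plain "LastName Initials" text. Under that assumption the author field should need neither flatten nor commonField. Only two of the fields are shown, and this is untested.

```xml
<!-- Sketch only: the xsl path below is a hypothetical placeholder. -->
<entity name="medlineFiles"
        processor="XPathEntityProcessor"
        url="${medlineFileList.fileAbsolutePath}"
        forEach="/MedlineCitationSet/MedlineCitation"
        xsl="/path/to/medline-authors.xsl">
  <field column="pmid"
         xpath="/MedlineCitationSet/MedlineCitation/PMID" />
  <!-- Assumes the stylesheet left plain text inside each Author element. -->
  <field column="author"
         xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" />
</entity>
```

The stream="true" attribute from the original entity is left out here because it is not clear from this thread how streaming interacts with XSL preprocessing. That is worth checking before running this on large Medline files.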
