Index relational XML with DataImportHandler

Tobias Berg
Hi,

I'm trying to index a set of stores and their articles. I have two
XML files, one that contains the data of the stores and one that contains
the articles for each store. I'm using DIH with XPathEntityProcessor to process
the file containing the stores, and with a nested entity I try to get all
articles that belong to the specific store. The problem I encounter is
that every store gets the same articles.

For testing purposes I've stripped down the XML files to only include IDs.
The store file (StoresTest.xml) looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Stores><Store><Id>0102</Id></Store><Store><Id>0104</Id></Store></Stores>

The Store-Articles relations file (StoreArticlesTest.xml) looks like this:

<?xml version="1.0" encoding="utf-8"?>
<StoreArticles>
<Store StoreId="0102"><ArticleId>18004</ArticleId></Store>
<Store StoreId="0104"><ArticleId>17004</ArticleId><ArticleId>10004</ArticleId></Store>
</StoreArticles>

And my dih-config file looks like this:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="store"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/Stores/Store"
            url="../../../data/StoresTest.xml"
            transformer="TemplateTransformer">
      <field column="id" xpath="/Stores/Store/Id" />
      <entity name="storearticle"
              processor="XPathEntityProcessor"
              stream="true"
              forEach="/StoreArticles"
              url="../../../data/StoreArticlesTest.xml"
              transformer="LogTransformer"
              logTemplate="Processing ${store.id}" logLevel="info"
              rootEntity="true">
        <field column="store_articles_txt"
               xpath="/StoreArticles/Store[@StoreId='${store.id}']/ArticleId" />
      </entity>
    </entity>
  </document>
</dataConfig>

The result I get in Solr is this:

<response>
<lst name="responseHeader">...</lst>
<result name="response" numFound="2" start="0">
<doc>
<str name="id">0102</str>
<arr name="store_articles_txt">
<str>18004</str>
</arr>
</doc>
<doc>
<str name="id">0104</str>
<arr name="store_articles_txt">
<str>18004</str>
</arr>
</doc>
</result>
</response>

As you can see, both stores get the article of the first store. I would have
expected the second store to have two articles: 17004 and 10004.

In the log messages printed by LogTransformer I can see that each store.id
is processed, but somehow only the articles for the first store are picked
up.
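
As a sanity check of the XPath expression itself, here is a minimal sketch (not part of the original post) that evaluates the same expression with the JDK's javax.xml.xpath API instead of DIH's XPathEntityProcessor. The class name StoreArticleXPathDemo and the method articlesFor are made up for illustration, and the XML is the content of StoreArticlesTest.xml without the XML declaration. It is an assumption that the JDK's full XPath 1.0 engine and DIH's streaming XPath subset agree on this expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StoreArticleXPathDemo {
    // Content of StoreArticlesTest.xml from the post (XML declaration omitted).
    static final String STORE_ARTICLES =
        "<StoreArticles>"
        + "<Store StoreId=\"0102\"><ArticleId>18004</ArticleId></Store>"
        + "<Store StoreId=\"0104\"><ArticleId>17004</ArticleId>"
        + "<ArticleId>10004</ArticleId></Store>"
        + "</StoreArticles>";

    // Hypothetical helper: substitutes the store ID by hand and evaluates
    // the expression with the JDK XPath engine (not Solr's XPathRecordReader).
    static List<String> articlesFor(String storeId) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            String expr = "/StoreArticles/Store[@StoreId='" + storeId + "']/ArticleId";
            NodeList nodes = (NodeList) xpath.evaluate(
                expr,
                new InputSource(new StringReader(STORE_ARTICLES)),
                XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent());
            }
            return ids;
        } catch (Exception e) {
            // Wrap checked XPath exceptions so callers need no throws clause.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("0102 -> " + articlesFor("0102"));
        System.out.println("0104 -> " + articlesFor("0104"));
    }
}
```

With the store ID substituted by hand, the expression selects 18004 for store 0102 and 17004 and 10004 for store 0104, which suggests the expression is fine and the problem lies in how ${store.id} is substituted.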

Any ideas?

/Tobias Berg

Re: Index relational XML with DataImportHandler

iorixxx
What happens when you set rootEntity="false" on <entity name="store">?
What is your uniqueKey in schema.xml?

Re: Index relational XML with DataImportHandler

Tobias Berg
My uniqueKey in schema.xml is id. I've tried adding pk="id" to the store
entity but it makes no difference.

The result is the same if I set rootEntity="false" on the store entity.
However, I enabled debug and verbose output in the DataImportHandler and
noticed a slight change in how the nested queries are executed. Below is
the output with rootEntity="true":

<response>
<lst name="responseHeader">...</lst>
<lst name="initArgs">...</lst>
<str name="command">full-import</str>
<str name="mode">debug</str>
<arr name="documents"/>
<lst name="verbose-output">
<lst name="entity:store">
<lst name="document#1">
<str name="query">../../../data/StoresTest.xml</str>
<str name="time-taken">0:0:0.1</str>
<str>----------- row #1-------------</str>
<str name="id">0102</str>
<str name="$forEach">/Stores/Store</str>
<str>---------------------------------------------</str>
<lst name="entity:storearticle">
<str name="query">../../../data/StoreArticlesTest.xml</str>
<str name="query">../../../data/StoreArticlesTest.xml</str>
<str name="time-taken">0:0:0.1</str>
<str name="time-taken">0:0:0.1</str>
<str>----------- row #1-------------</str>
<arr name="store_articles_txt">
<str>18004</str>
</arr>
<str name="$forEach">/StoreArticles</str>
<str>---------------------------------------------</str>
<lst name="transformer:LogTransformer">
<str>---------------------------------------------</str>
<arr name="store_articles_txt">
<str>18004</str>
</arr>
<str name="$forEach">/StoreArticles</str>
<str>---------------------------------------------</str>
</lst>
</lst>
</lst>
<lst name="document#2">
<str>----------- row #1-------------</str>
<str name="id">0104</str>
<str name="$forEach">/Stores/Store</str>
<str>---------------------------------------------</str>
<lst name="entity:storearticle">
<str name="query">../../../data/StoreArticlesTest.xml</str>
<str name="query">../../../data/StoreArticlesTest.xml</str>
<str name="query">../../../data/StoreArticlesTest.xml</str>
<str name="query">../../../data/StoreArticlesTest.xml</str>
<str name="time-taken">0:0:0.0</str>
<str name="time-taken">0:0:0.0</str>
<str name="time-taken">0:0:0.0</str>
<str name="time-taken">0:0:0.0</str>
<str>----------- row #1-------------</str>
<arr name="store_articles_txt">
<str>18004</str>
</arr>
<str name="$forEach">/StoreArticles</str>
<str>---------------------------------------------</str>
<lst name="transformer:LogTransformer">
<str>---------------------------------------------</str>
<arr name="store_articles_txt">
<str>18004</str>
</arr>
<str name="$forEach">/StoreArticles</str>
<str>---------------------------------------------</str>
</lst>
</lst>
</lst>
<lst name="document#3"/>
</lst>
</lst>
<str name="status">idle</str>
<str name="importResponse">Configuration Re-loaded sucessfully</str>
<lst name="statusMessages">...</lst>
<str name="WARNING">...</str>
</response>

And with rootEntity="false":

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">40</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">import-test-articles-config.xml</str>
</lst>
</lst>
<str name="command">full-import</str>
<str name="mode">debug</str>
<arr name="documents"/>
<lst name="verbose-output">
<lst name="entity:store">
<str name="query">../../../data/StoresTest.xml</str>
<str name="query">../../../data/StoresTest.xml</str>
<str name="time-taken">0:0:0.10</str>
<str name="time-taken">0:0:0.10</str>
<str>----------- row #1-------------</str>
<str name="id">0102</str>
<str name="$forEach">/Stores/Store</str>
<str>---------------------------------------------</str>
<lst name="entity:storearticle">
<lst name="document#1">
<str name="query">../../../data/StoreArticlesTest.xml</str>
<str name="time-taken">0:0:0.0</str>
<str>----------- row #1-------------</str>
<arr name="store_articles_txt">
<str>18004</str>
</arr>
<str name="$forEach">/StoreArticles</str>
<str>---------------------------------------------</str>
<lst name="transformer:LogTransformer">
<str>---------------------------------------------</str>
<arr name="store_articles_txt">
<str>18004</str>
</arr>
<str name="$forEach">/StoreArticles</str>
<str>---------------------------------------------</str>
</lst>
</lst>
<lst name="document#2"/>
</lst>
<str>----------- row #2-------------</str>
<str name="id">0104</str>
<str name="$forEach">/Stores/Store</str>
<str>---------------------------------------------</str>
<lst name="entity:storearticle">
<lst name="document#2">
<str name="query">../../../data/StoreArticlesTest.xml</str>
<str name="query">../../../data/StoreArticlesTest.xml</str>
<str name="time-taken">0:0:0.0</str>
<str name="time-taken">0:0:0.0</str>
<str>----------- row #1-------------</str>
<arr name="store_articles_txt">
<str>18004</str>
</arr>
<str name="$forEach">/StoreArticles</str>
<str>---------------------------------------------</str>
<lst name="transformer:LogTransformer">
<str>---------------------------------------------</str>
<arr name="store_articles_txt">
<str>18004</str>
</arr>
<str name="$forEach">/StoreArticles</str>
<str>---------------------------------------------</str>
</lst>
</lst>
<lst name="document#3"/>
</lst>
</lst>
</lst>
<str name="status">idle</str>
<str name="importResponse">Configuration Re-loaded sucessfully</str>
<lst name="statusMessages">...</lst>
<str name="WARNING">...</str>
</response>

I'm not very familiar with the verbose output, but it seems that with
rootEntity="true" one query is made to retrieve the stores, and then two
and four queries are made for the nested store-article entity (for the first
and second store respectively). With rootEntity="false", two queries are made
to retrieve the stores, and then one and two queries for the nested
store-article entity. It seems odd that both cases produce multiple queries
for the second store, but maybe that's expected?

Anyway, although the queries differ, the result is the same.

/Tobias

2012/7/22 Ahmet Arslan <[hidden email]>

> What happens when you set <entity name="store" rootEntity="false" ?
> What is your uniqueKey in schema.xml?

Re: Index relational XML with DataImportHandler

Alexandre Rafalovitch
In reply to this post by Tobias Berg
I am still struggling with nested DIH myself, but I notice that your
correlation condition is at the field level (@StoreId='${store.id}').
Were you planning to repeat it for each field definition?

Have you tried putting it instead in the forEach section?

Alternatively, maybe you need to use $skipDoc as in the Wikipedia
import example?

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Sat, Jul 21, 2012 at 1:34 PM, Tobias Berg <[hidden email]> wrote:

> Hi,
>
> I'm trying to index a set of stores and their articles. I have two
> XML-files, one that contains the data of the stores and one that contains
> articles for each store. I'm using DIH with XPathEntityProcessor to process
> the file containing the store, and using a nested entity I try to get all
> articles that belongs to the specific store. The problem I encounter is
> that every store gets the same articles.

Re: Index relational XML with DataImportHandler

Tobias Berg
The ArticleId field is the only field in the correlation file, so I just
need to get that one working.

I tried putting the condition in the forEach section. If I hardcode a value,
like 0104, it works, but it doesn't work with the variable. I haven't looked
at the source code yet, but maybe forEach doesn't support variables? That
could be a nice patch :)

I thought about $skipDoc but can't figure out how I would use it, since I
do want to add the field; it's just that it picks up the wrong value. Do you
have something in mind for how to use it in my use case?

I'll take a look at the source code to see whether it could be a bug.

/Tobias

2012/7/22 Alexandre Rafalovitch <[hidden email]>

> I am still struggling with nested DIH myself, but I notice that your
> correlation condition is on the field level (@StoreId='${store.id}).
> Were you planning to repeat it for each field definition?
>
> Have you tried putting it instead in the forEach section?
>
> Alternatively, maybe you need to use $skipDoc as in the Wikipedia
> import example?

Re: Index relational XML with DataImportHandler

Tobias Berg
OK, I found the problem by digging in the source code. Whether it is a bug or
"works as designed" I don't know, but the cause is the point at which the
variable ${store.id} is translated.

The translation is made in the method initXpathReader() with these lines:

          String xpath = field.get(XPATH);
          xpath = context.replaceTokens(xpath);  // <-- variable substitution
          xpathReader.addField(field.get(DataImporter.COLUMN),
                  xpath,
                  Boolean.parseBoolean(field.get(DataImporter.MULTI_VALUED)),
                  flags);
        }

The line xpath = context.replaceTokens(xpath); translates the variable
to its actual value. initXpathReader() is called in the init() method, but
only once for each entity definition:

      if (xpathReader == null)
          initXpathReader();

This means that the first time initXpathReader() is called, ${store.id} is
translated to 0102 (the id of the first store). When the next store id is
encountered, the xpathReader is already initialized, so initXpathReader() is
not called and the xpath expression is not updated with the new store id.

There are a bunch of other things happening in initXpathReader(), so I'm
not sure it's safe to just remove the null check. But looking at
SqlEntityProcessor, the translation of the variables in the query string is
performed in the getRow() method and not in the init method, so I think
that either the null check should be removed or the translation of the xpath
expression should be moved so that it is performed each time.
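
The behaviour described above can be illustrated with a small standalone sketch. It is not Solr code: the class TokenResolutionDemo, its methods, and the simplified replaceTokens() are hypothetical stand-ins, and the only thing modelled is the difference between substituting ${store.id} once behind a null check and substituting it again for every parent row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenResolutionDemo {
    static final String XPATH_TEMPLATE =
        "/StoreArticles/Store[@StoreId='${store.id}']/ArticleId";

    // Hypothetical stand-in for context.replaceTokens(); only knows ${store.id}.
    static String replaceTokens(String template, String storeId) {
        return template.replace("${store.id}", storeId);
    }

    // Resolve the template once and reuse it, like the null-checked
    // initXpathReader() call described in the post.
    static List<String> resolveOnce(List<String> storeIds) {
        List<String> used = new ArrayList<>();
        String cachedXpath = null; // plays the role of the xpathReader field
        for (String storeId : storeIds) {
            if (cachedXpath == null) {
                cachedXpath = replaceTokens(XPATH_TEMPLATE, storeId);
            }
            used.add(cachedXpath);
        }
        return used;
    }

    // Resolve the template again for every parent row, as proposed in the post.
    static List<String> resolvePerRow(List<String> storeIds) {
        List<String> used = new ArrayList<>();
        for (String storeId : storeIds) {
            used.add(replaceTokens(XPATH_TEMPLATE, storeId));
        }
        return used;
    }

    public static void main(String[] args) {
        List<String> stores = Arrays.asList("0102", "0104");
        System.out.println("once:    " + resolveOnce(stores));
        System.out.println("per row: " + resolvePerRow(stores));
    }
}
```

In the resolve-once variant the second store is still matched against StoreId 0102, which is consistent with both documents ending up with article 18004. In the per-row variant the second expression uses 0104.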

/Tobias
