XPathProcessor foreach not working properly inside another entity

penela
Hi!

What I'm trying to do is read RSS URLs from my own MySQL DB and use them as the url endpoint for indexing the feed articles (mixing the db and rss core DIH examples to some extent).

My data-config looks like this:
<dataConfig>
    <dataSource type="URLDataSource" name="rss-ds" />
    <dataSource type="JdbcDataSource" name="db-ds" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1/tapmeme" />
    <document>
        <entity name="feed" dataSource="db-ds" query="SELECT TAPmeme.urls.urlID as 'id', TAPmeme.urls.url as 'entity-name', TAPmeme.urls.source as 'url' FROM TAPmeme.urls">

            <entity name="rss" dataSource="rss-ds"
                    pk="link"
                    url="${feed.url}"
                    processor="XPathEntityProcessor"
                    forEach="/rss/channel | /rss/channel/item"
                    transformer="DateFormatTransformer">

                <field column="source" xpath="/rss/channel/title" commonField="true" />
                <field column="source-link" xpath="/rss/channel/link" commonField="true" />
                <field column="subject" xpath="/rss/channel/subject" commonField="true" />

                <field column="title" xpath="/rss/channel/item/title" />
                <field column="link" xpath="/rss/channel/item/link" />
                <field column="description" xpath="/rss/channel/item/description" />
                <field column="creator" xpath="/rss/channel/item/creator" />
                <field column="item-subject" xpath="/rss/channel/item/subject" />
                <field column="date" xpath="/rss/channel/item/date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
            </entity>
        </entity>
    </document>
</dataConfig>

(The table schema is a bit messed up, with wrongly named keys after too much testing, but that shouldn't be an issue here.)

The problem is that forEach does not work properly in this nested setup (it works when I use only the URLDataSource): only the first article of each RSS feed gets indexed.

Any ideas?

Thanks!

Re: XPathProcessor foreach not working properly inside another entity

penela
After a better-targeted search of the forum, I've found this solution by Noble Paul:
http://lucene.472066.n3.nabble.com/DIH-Http-input-bug-problem-with-two-level-RSS-walker-tp491046p491047.html

Using rootEntity="false" in the outer entity seems to make it work as expected.
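
For anyone landing here later, this is roughly what the change looks like. It is a sketch based on the config from my first post, not a tested drop-in: the only new piece is the rootEntity="false" attribute on the outer "feed" entity, and the field list is shortened for brevity. As I understand it, DIH creates one Solr document per row of the root entity, which by default is the entity directly under <document>. With rootEntity="false" on "feed", the nested "rss" entity is treated as the root instead, so each row produced by forEach becomes its own document.

```xml
<dataConfig>
    <dataSource type="URLDataSource" name="rss-ds" />
    <dataSource type="JdbcDataSource" name="db-ds" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1/tapmeme" />
    <document>
        <!-- rootEntity="false": rows of "feed" no longer map to Solr documents;
             the entity directly below ("rss") is treated as the root entity instead -->
        <entity name="feed" dataSource="db-ds" rootEntity="false"
                query="SELECT TAPmeme.urls.urlID as 'id', TAPmeme.urls.url as 'entity-name', TAPmeme.urls.source as 'url' FROM TAPmeme.urls">

            <!-- Unchanged from the original config; remaining fields omitted here -->
            <entity name="rss" dataSource="rss-ds"
                    pk="link"
                    url="${feed.url}"
                    processor="XPathEntityProcessor"
                    forEach="/rss/channel | /rss/channel/item"
                    transformer="DateFormatTransformer">
                <field column="source" xpath="/rss/channel/title" commonField="true" />
                <field column="title" xpath="/rss/channel/item/title" />
                <field column="link" xpath="/rss/channel/item/link" />
            </entity>
        </entity>
    </document>
</dataConfig>
```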

Thanks!