help with dataimport delta query

Joel Nylund
Hi, I have Solr all working nicely, except I'm trying to get deltas to
work on my data import handler.

Here is a simplification of my data import config. I have a table
called "Book" which has categories; I'm doing subqueries for the
category info and calling a JavaScript helper. This all works
perfectly for the regular query.

I added these lines for the delta stuff:

deltaImportQuery="SELECT f.id,f.title
                  FROM Book f
                  f.id='${dataimporter.delta.job_jobs_id}'"
deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate > '${dataimporter.last_index_time}'"  >
               
Basically I'm trying to get the rows whose lastModifiedDate is newer than
the last index (or delta index).

I run:
http://localhost:8983/solr/dataimport?command=delta-import

And it says in logs:

Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport
INFO: Starting Delta Import
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=delta-import} status=0 QTime=0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: category
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: category rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: category rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: category
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: item
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: item rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: item rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: item
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:0.21

But the browser says no documents were added/modified (even though one
record in the db is a match).

Is there a way to turn on debugging so I can see the queries the DIH is
sending to the db?

Any other ideas of what I could be doing wrong?

thanks
Joel


<document name="doc">
  <entity name="item"
          query="SELECT f.id, f.title
                 FROM Book f
                 WHERE f.inMyList=1"
          deltaImportQuery="SELECT f.id,f.title
                            FROM Book f
                            f.id='${dataimporter.delta.job_jobs_id}'"
          deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate > '${dataimporter.last_index_time}'">
    <field column="id" name="id" />
    <field column="title" name="title" />
    <entity name="category"
            transformer="script:SplitAndPrettyCategory"
            query="select fc.bookId, group_concat(cr.name) as categoryName,
                   from BookCat fc
                   where fc.bookId = '${item.id}' AND
                   group by fc.bookId">
      <field column="categoryType" name="categoryType" />
    </entity>
  </entity>
</document>


Re: help with dataimport delta query

Joel Nylund
Got to love it when Yahoo thinks your own mail is spam. Does anyone have
any ideas how to get logging to work with 1.4?

I went to the admin panel and set all logging to FINEST.

In my Jetty stdout I see no SQL for any of the data import handler
runs. I see:

Nov 23, 2009 9:26:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 6
Nov 23, 2009 9:26:32 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity category with URL: jdbc:mysql://localhost/feeddb
Nov 23, 2009 9:26:32 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 5


But no SQL. From looking at the source, it looks like it should be
logging the SQL if I'm in debug mode.

Any ideas? I think I am losing my mind.

My full import works, but the delta does nothing.

thanks
Joel



On Nov 23, 2009, at 2:49 PM, Joel Nylund wrote:

> Hi, I have Solr all working nicely, except I'm trying to get deltas
> to work on my data import handler

Re: help with dataimport delta query

Noble Paul നോബിള്‍  नोब्ळ्-2
I guess the field names do not match. In the deltaQuery you are
selecting the field id, and in the deltaImportQuery you use the field as
${dataimporter.delta.job_jobs_id}.

I guess it should be ${dataimporter.delta.id}
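
For reference, here is a minimal sketch of how the item entity might look with that rename applied. Two further changes are assumptions not confirmed anywhere in this thread: the deltaImportQuery is given a WHERE keyword before its filter, and the fm alias in the deltaQuery is taken to mean the Book table itself. The nested category entity is left out to keep the sketch short.

```xml
<document name="doc">
  <entity name="item"
          query="SELECT f.id, f.title
                 FROM Book f
                 WHERE f.inMyList=1"
          deltaQuery="SELECT id
                      FROM Book
                      WHERE inMyList=1
                        AND lastModifiedDate &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="SELECT f.id, f.title
                            FROM Book f
                            WHERE f.id='${dataimporter.delta.id}'">
    <!-- deltaQuery selects the column "id", so deltaImportQuery refers to
         that key as ${dataimporter.delta.id}; the name after "delta." has
         to match a column returned by deltaQuery. -->
    <!-- Assumptions: the added WHERE keyword and the dropped "fm." alias. -->
    <field column="id" name="id" />
    <field column="title" name="title" />
  </entity>
</document>
```

The point of the sketch is only the pairing: whatever column the deltaQuery returns is the name to use after `dataimporter.delta.` in the deltaImportQuery.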

On Tue, Nov 24, 2009 at 1:19 AM, Joel Nylund <[hidden email]> wrote:

> Hi, I have Solr all working nicely, except I'm trying to get deltas to work
> on my data import handler



--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: help with dataimport delta query

Joel Nylund
Thanks, that was it. Well, really this part:

${dataimporter.delta.job_jobs_id}

I thought the jobs_id was part of the DIH, but I guess it was just the example, duh!

thanks
Joel


--- On Tue, 11/24/09, Noble Paul നോബിള്‍  नोब्ळ् <[hidden email]> wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् <[hidden email]>
> Subject: Re: help with dataimport delta query
> To: [hidden email]
> Date: Tuesday, November 24, 2009, 12:15 AM
> I guess the field names do not match. In the deltaQuery you are
> selecting the field id, and in the deltaImportQuery you use the field as
> ${dataimporter.delta.job_jobs_id}.
>
> I guess it should be ${dataimporter.delta.id}