DIH and UpdateRequestProcessor#finish

DIH and UpdateRequestProcessor#finish

Erik Hatcher
Shouldn't DIH, I presume in either SolrWriter or DataImportHandler,  
call processor.finish()?

Maybe DataImportHandler should subclass ContentStreamHandlerBase,  
which calls #finish already.  This would mean we implement a new  
ContentStreamLoader.  This would allow DIH to hand the streams off as  
either data sources or data to entities, right?  This is where we want  
to head with Tika integration into DIH, methinks.

Thoughts?

        Erik

Re: DIH and UpdateRequestProcessor#finish

Erik Hatcher
And just to fill in the blanks that I missed before, DataImportHandler  
currently does handle a single content stream.  One stream is pretty  
much all I've ever used, but there can be more than one and it would  
seem rude for a handler to ignore them.

I still think subclassing ContentStreamHandlerBase and doing the work  
in ContentStreamLoader#load is the best way to go.
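
To make the shape of that idea concrete, here is a minimal, self-contained sketch of a base handler that visits every content stream, hands each one to a loader, and calls finish() in a finally block. It does not use the real Solr classes: Processor, StreamLoader, RecordingProcessor, handle and run are hypothetical stand-ins invented for illustration, and streams are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamHandlerSketch {
    /** Hypothetical stand-in for an update processor; only add and finish matter here. */
    interface Processor {
        void processAdd(String doc);
        void finish();
    }

    /** Hypothetical stand-in for a per-stream loader. */
    interface StreamLoader {
        void load(String stream, Processor processor);
    }

    /** Base-handler logic: visit every stream, then always call finish(). */
    static void handle(List<String> streams, StreamLoader loader, Processor processor) {
        try {
            for (String stream : streams) {
                loader.load(stream, processor);
            }
        } finally {
            // Runs even if a loader throws, so no code path can skip finish().
            processor.finish();
        }
    }

    /** Records calls so their order can be inspected. */
    static class RecordingProcessor implements Processor {
        final List<String> events = new ArrayList<>();
        public void processAdd(String doc) { events.add("add:" + doc); }
        public void finish() { events.add("finish"); }
    }

    /** Runs the handler with a loader that treats each line of a stream as one document. */
    public static List<String> run(List<String> streams) {
        RecordingProcessor processor = new RecordingProcessor();
        StreamLoader lineLoader = (stream, proc) -> {
            for (String line : stream.split("\n")) {
                proc.processAdd(line);
            }
        };
        handle(streams, lineLoader, processor);
        return processor.events;
    }

    public static void main(String[] args) {
        // Two streams: the first carries two documents, the second carries one.
        System.out.println(run(Arrays.asList("a\nb", "c")));
    }
}
```

With two streams, the recorded events are the three adds followed by a single finish, so no stream is ignored and finish() is never skipped.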

Mainly I was just curious about UpdateRequestProcessor#finish though,  
which DIH currently does not call (as far as I can see).

        Erik


Re: DIH and UpdateRequestProcessor#finish

Noble Paul നോബിള്‍  नोब्ळ्-2
On Sat, Aug 1, 2009 at 6:56 AM, Erik Hatcher wrote:
> Shouldn't DIH, I presume in either SolrWriter or DataImportHandler, call
> processor.finish()?
Soon after the commit, DIH should call finish.
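
A rough sketch of that ordering, assuming a processor that defers some end-of-request work to finish(). Nothing here is the actual DIH or Solr code: SummaryProcessor and runImport are hypothetical names, and the "summary" is just a string appended to a log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FinishAfterCommitSketch {
    /** Hypothetical processor that only reports its totals when finish() is called. */
    static class SummaryProcessor {
        private int adds = 0;
        final List<String> log = new ArrayList<>();

        void processAdd(String doc) { adds++; }
        void processCommit() { log.add("commit"); }
        void finish() { log.add("finish: " + adds + " adds"); }
    }

    /** Hypothetical import loop: add every document, commit, then finish. */
    public static List<String> runImport(List<String> docs, boolean callFinish) {
        SummaryProcessor processor = new SummaryProcessor();
        try {
            for (String doc : docs) {
                processor.processAdd(doc);
            }
            processor.processCommit();
        } finally {
            // The step in question: without it, the deferred work never happens.
            if (callFinish) {
                processor.finish();
            }
        }
        return processor.log;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c");
        System.out.println(runImport(docs, true));   // commit, then the finish summary
        System.out.println(runImport(docs, false));  // commit only, summary is lost
    }
}
```

When finish() is skipped, the log ends at the commit and the deferred summary is never produced, which is the kind of gap the question is about.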
>
> Maybe DataImportHandler should subclass ContentStreamHandlerBase, which
> calls #finish already.  This would mean we implement a new
> ContentStreamLoader.  This would allow DIH to hand the streams off as either
> data sources or data to entities, right?  This is where we want to head with
> Tika integration into DIH, methinks.
If you wish to handle 'push' data, DIH already has a
ContentStreamDataSource. I guess Tika integration would be easy with
that.
>
> Thoughts?
>
>        Erik
>
>



--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com