Need some DIH Entity Processor development advice...

We have a situation where data comes from several long-running queries
hitting multiple relational databases. Other data comes in as fixed-width
text file feeds, etc. All of this has to be joined, denormalized and made
into nice Solr documents. I've been wanting to use DIH as it seems to already
provide 90% of what we need. The rest can come in the form of custom
Transformers and Entity Processors that I can write.

One big need is to have disk-backed caches. For instance, a child entity that
pulls back millions of rows will beat up the db using a regular
SqlEntityProcessor, whereas the CachedSqlEntityProcessor puts everything in
memory in a HashMap, so it will only scale to a point. For fixed-width text
files, there don't seem to be any cached implementations at all.
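
To make "disk-backed cache" concrete, here is a minimal sketch of the idea
using plain files only. It is not the Lucene-based cache described in this
post and it uses no DIH classes; the class name DiskRowCacheSketch, the
one-file-per-key layout and the "one row per line" format are all assumptions
made purely for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

/** Hypothetical disk-backed row cache: one small file per key, one row per line. */
class DiskRowCacheSketch {
    private final Path dir;

    DiskRowCacheSketch() {
        try {
            // Rows live on disk, so heap use does not grow with the row count.
            dir = Files.createTempDirectory("dih-cache-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one serialized row under the key. Rows must not contain newlines. */
    void add(String key, String row) {
        try {
            Files.write(fileFor(key), Collections.singletonList(row),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns all rows stored under the key, or an empty list if there are none. */
    List<String> get(String key) {
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return Collections.emptyList();
        }
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hex-encodes the key so any key maps to a safe file name (short keys only). */
    private Path fileFor(String key) {
        StringBuilder name = new StringBuilder("k");
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            name.append(String.format("%02x", b & 0xff));
        }
        return dir.resolve(name.toString());
    }
}
```

Note that nothing here ever deletes the temporary directory. Finding the right
place to do that is exactly the clean-up problem this post is about.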

I've written a custom Entity Processor that creates a temporary Lucene index
to use as a disk cache. Initial tests are promising, but with one little
problem: I need a place to close the Lucene index reader and then delete the
temporary index. It seemed easy enough to override the "destroy()" method
from EntityProcessorBase. But to my surprise, it seems that both destroy()
and init() get called every time a new primary key is looked up from the
cache (see DocBuilder.buildDocument()). Just to be sure I wasn't crazy, I
added a "destroy()" method to CachedSqlEntityProcessor and found it indeed
gets called every time a new primary key is looked up from the cache. In
fact, the first couple of lines in cacheInit() in EntityProcessorBase seem to
be there to cope with the fact that both destroy() and init() get called over
and over again during the lifecycle of the object.
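
To illustrate the lifecycle problem, here is a minimal standalone sketch. It
does not use the DIH or Lucene classes: RepeatedLifecycleSketch, buildCount,
cacheExists() and close() are hypothetical names, and a temp directory stands
in for the temporary Lucene index. The guard in init() plays the same role as
the check I described in cacheInit(), and close() is the once-per-import hook
that I can't find a place for.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical stand-in for a processor whose init()/destroy() run once per parent row. */
class RepeatedLifecycleSketch {
    private Path tempIndexDir; // stands in for the temporary Lucene index
    int buildCount = 0;        // how many times the expensive build actually ran

    /** Safe to call repeatedly: the cache is built only on the first call. */
    void init() {
        if (tempIndexDir != null) {
            return; // already built on an earlier call
        }
        try {
            tempIndexDir = Files.createTempDirectory("temp-index-sketch");
            Files.write(tempIndexDir.resolve("segment"), new byte[] {1, 2, 3});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buildCount++;
    }

    /** Called after every parent row, so it must not delete the cache. */
    void destroy() {
        // Intentionally leaves tempIndexDir in place.
    }

    /** Assumed once-per-import hook: deletes the temp directory and its contents. */
    void close() {
        if (tempIndexDir == null) {
            return;
        }
        try (Stream<Path> paths = Files.walk(tempIndexDir)) {
            // Reverse order visits files before the directory that contains them.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        tempIndexDir = null;
    }

    boolean cacheExists() {
        return tempIndexDir != null && Files.exists(tempIndexDir);
    }
}
```

If init() and destroy() are called in a loop, as they appear to be once per
parent row, the cache is built once and survives each destroy(). It is only
removed when something calls close(), and that "something" is the piece that
seems to be missing.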

I also noticed that destroy() isn't actually implemented anywhere in the
prepackaged Entity Processors. This makes me wonder if it is a mistake.
Should DocBuilder be changed to call destroy() only once per lifecycle for
each EntityProcessor object? If so, I think I can have a patch in JIRA in
short order.

How do I best accomplish my clean-up tasks? Advice is greatly appreciated.