[jira] Commented: (NUTCH-501) implementing a different caching mechanism for objects


Parth (Jira)
Implement a different caching mechanism for objects cached in configuration


    [ https://issues.apache.org/jira/browse/NUTCH-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505807 ]

Andrzej Bialecki  commented on NUTCH-501:
-----------------------------------------

ObjectCache should support caching objects that fall under the same key
but are configured differently. This situation occurs when running in
"local" mode and using Nutch tools to perform several workflows with
different configs. In such cases a single instance of ObjectCache is
created within a JVM, and with this implementation of ObjectCache,
objects coming from different configuration contexts would be
set/retrieved in the wrong contexts.

This is very similar to the issue in NUTCH-169. If we use ObjectCache
the way you proposed, we would revert to the situation before NUTCH-169.

I propose to modify ObjectCache to store multiple objects under the same
key, additionally indexed by Configuration id - and to modify all
ObjectCache methods to take a Configuration parameter.
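
A minimal sketch of what such a two-level lookup could look like. This is
an illustration, not code from the attached patch: SketchConf is a
hypothetical stand-in for Hadoop's Configuration, and PerConfObjectCache
and getId() are made-up names. It assumes each configuration already
carries some unique id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for Hadoop's Configuration; only its id matters here. */
final class SketchConf {
    private final String id;

    SketchConf(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }
}

/** Sketch: cached objects are addressed by (configuration id, key). */
final class PerConfObjectCache {
    // Outer key: configuration id. Inner key: the cache key.
    private final Map<String, Map<String, Object>> byConf = new ConcurrentHashMap<>();

    /** Stores value under key for this configuration only (null values are not supported). */
    void setObject(SketchConf conf, String key, Object value) {
        byConf.computeIfAbsent(conf.getId(), id -> new ConcurrentHashMap<>())
              .put(key, value);
    }

    /** Returns the object cached for this configuration, or null if there is none. */
    Object getObject(SketchConf conf, String key) {
        Map<String, Object> entries = byConf.get(conf.getId());
        return entries == null ? null : entries.get(key);
    }
}
```

With this layout, two workflows in the same JVM can both cache an object
under the same key, and each gets back only the object that was stored
with its own configuration.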

Currently Configuration instances don't have a unique id (unless you
count the job id available in mapred.job.id - but this becomes available
only after you submit a job), and they don't implement any sensible
hashCode(), so it's difficult to produce a key uniquely tied to a config
instance. The way Nutch uses Configuration, it's always created either
via NutchConfiguration.create() or new NutchJob(getConf()) - we could
generate a unique object.cache.id property there, and use it later on in
ObjectCache to retrieve the right set of key/value pairs. Similarly, if
ObjectCache gets a Configuration instance without a unique key, it could
create one, stick it into the Configuration, and use it from then on.
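
The id handling described above could be sketched roughly as follows.
PropsConf is a hypothetical stand-in for Hadoop's Configuration (a plain
string property map), CacheIds is a made-up helper name, and using a
random UUID as the id value is an assumption; only the object.cache.id
property name comes from the proposal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical stand-in for Hadoop's Configuration: a plain string property map. */
final class PropsConf {
    private final Map<String, String> props = new HashMap<>();

    String get(String name) {
        return props.get(name);
    }

    void set(String name, String value) {
        props.put(name, value);
    }
}

final class CacheIds {
    static final String ID_PROPERTY = "object.cache.id";

    private CacheIds() {
    }

    /** Plays the role of a factory such as NutchConfiguration.create(): stamps an id at creation. */
    static PropsConf create() {
        PropsConf conf = new PropsConf();
        conf.set(ID_PROPERTY, newId());
        return conf;
    }

    /** Returns the conf's id, first assigning one if the conf arrived without it. */
    static synchronized String idOf(PropsConf conf) {
        String id = conf.get(ID_PROPERTY);
        if (id == null) {
            id = newId();
            conf.set(ID_PROPERTY, id); // remembered, so later calls return the same id
        }
        return id;
    }

    // Assumption: a random UUID is unique enough to tell config instances apart.
    private static String newId() {
        return UUID.randomUUID().toString();
    }
}
```

A cache would then call CacheIds.idOf(conf) on every get/set and use the
returned string to select the right set of key/value pairs.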

The problem with this approach is that over time the ObjectCache would
accumulate values from past, no longer valid contexts.

> implementing a different caching mechanism for objects
> ------------------------------------------------------

>
>                 Key: NUTCH-501
>                 URL: https://issues.apache.org/jira/browse/NUTCH-501
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-501_draft.patch
>
>
> As per HADOOP-1343, Configuration.setObject and Configuration.getObject
> (which are used by Nutch to cache arbitrary objects) are deprecated and
> will be removed soon. We have to implement an alternative caching
> mechanism and replace all usages of
> Configuration.{getObject,setObject} with the new mechanism.
