Sharing dih "dictionaries"

bcm
I'm not really sure how to title this, but here's what I'm trying to do.

I have a query that creates a rather large dictionary of codes that are shared across multiple fields of a base entity.  I'm using the CachedSqlEntityProcessor, but I was curious whether there is a way to join this dictionary to the base entity multiple times so I can avoid reloading it for each column join.

Ex:
<entity name="parts" query="select name, code1, code2, code3 from parts">
  <field column="name" name="name" />
  <entity name="shareddictionary1" processor="CachedSqlEntityProcessor"
          query="select code, description from partcodes" where="code=parts.code1">
    <field column="description" name="code1desc" />
  </entity>
  <entity name="shareddictionary2" processor="CachedSqlEntityProcessor"
          query="select code, description from partcodes" where="code=parts.code2">
    <field column="description" name="code2desc" />
  </entity>
  <entity name="shareddictionary3" processor="CachedSqlEntityProcessor"
          query="select code, description from partcodes" where="code=parts.code3">
    <field column="description" name="code3desc" />
  </entity>
</entity>

This is a simplified example, but in this case the dictionary query has to be run 3 times to join 3 different columns.  It would be nice if I could load the data set once as an entity and specify how to join it in code without requiring a separate SQL query for each join.  Any ideas?
Re: Sharing dih "dictionaries"

Mikhail Khludnev
This looks like what https://issues.apache.org/jira/browse/SOLR-2382 or even
https://issues.apache.org/jira/browse/SOLR-2613 covers.
I guess that with SOLR-2382 you can specify your own SortedMapBackedCache
subclass that shares your dictionary across entities.
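
For readers following along, here is a minimal sketch of what the entity configuration might look like with the pluggable cache from SOLR-2382. The attribute names cacheImpl, cacheKey and cacheLookup are assumptions based on how that feature appears in later DIH documentation; they may be named differently in a given patch or release, so check them against your version. com.example.SharedDictionaryCache is a hypothetical class name standing in for the custom SortedMapBackedCache subclass suggested above; it is not part of Solr.

```xml
<!-- Sketch only. cacheImpl, cacheKey and cacheLookup are assumed attribute
     names; com.example.SharedDictionaryCache is a hypothetical custom class. -->
<entity name="parts" query="select name, code1, code2, code3 from parts">
  <field column="name" name="name" />
  <entity name="shareddictionary1"
          query="select code, description from partcodes"
          cacheImpl="com.example.SharedDictionaryCache"
          cacheKey="code" cacheLookup="parts.code1">
    <field column="description" name="code1desc" />
  </entity>
  <entity name="shareddictionary2"
          query="select code, description from partcodes"
          cacheImpl="com.example.SharedDictionaryCache"
          cacheKey="code" cacheLookup="parts.code2">
    <field column="description" name="code2desc" />
  </entity>
  <!-- Repeat for code3, changing cacheLookup and the target field name. -->
</entity>
```

With the stock SortedMapBackedCache, each sub-entity would presumably still build its own copy of the dictionary. Loading the data once and handing the same copy to every sub-entity is the part the custom subclass would have to add.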

Regards

On Tue, Dec 6, 2011 at 12:26 AM, Brent Mills <[hidden email]> wrote:

> I'm not really sure how to title this but here's what I'm trying to do.
>
> I have a query that creates a rather large dictionary of codes that are
> shared across multiple fields of a base entity.  I'm using the
> cachedsqlentityprocessor but I was curious if there was a way to join this
> multiple times to the base entity so I can avoid having to reload it for
> each column join.
>
> Ex:
> <entity name="parts" query="select name, code1, code2, code3 from parts">
>   <field column="name" name="name" />
>   <entity name="shareddictionary1" processor="CachedSqlEntityProcessor"
>           query="select code, description from partcodes" where="code=parts.code1">
>     <field column="description" name="code1desc" />
>   </entity>
>   <entity name="shareddictionary2" processor="CachedSqlEntityProcessor"
>           query="select code, description from partcodes" where="code=parts.code2">
>     <field column="description" name="code2desc" />
>   </entity>
>   <entity name="shareddictionary3" processor="CachedSqlEntityProcessor"
>           query="select code, description from partcodes" where="code=parts.code3">
>     <field column="description" name="code3desc" />
>   </entity>
> </entity>
>
> Kind of a simplified example but in this case the dictionary query has to
> be run 3 times to join 3 different columns.  It would be nice if I could
> load the data set once as an entity and specify how to join it in code
> without requiring a separate sql query.  Any ideas?
>



--
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
<http://www.griddynamics.com>
 <[hidden email]>
RE: Sharing dih "dictionaries"

bcm
You're totally correct.  There's actually a link on the DIH page now which wasn't there when I read it a long time ago.  I'm really looking forward to 4.0; it's got a ton of great new features.  Thanks for the links!!

-----Original Message-----
From: Mikhail Khludnev [mailto:[hidden email]]
Sent: Monday, December 05, 2011 10:45 PM
To: [hidden email]
Subject: Re: Sharing dih "dictionaries"

It looks like https://issues.apache.org/jira/browse/SOLR-2382 or even https://issues.apache.org/jira/browse/SOLR-2613.
I guess by using SOLR-2382 you can specify your own SortedMapBackedCache subclass which is able to share your Dictionary.

Regards

RE: Sharing dih "dictionaries"

Dyer, James
Just FYI: the final piece of SOLR-2382 has not been committed and has instead been spun off to SOLR-2943.  So if you're using trunk and you need the ability to persist a cache on disk and then read it back later as a DIH entity, you'll need both SOLR-2943 and a cache implementation.  We're using the BDB-JE cache from SOLR-2613 in production.  There is also one backed by a Lucene index (SOLR-2948).

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Brent Mills [mailto:[hidden email]]
Sent: Tuesday, December 06, 2011 2:43 PM
To: [hidden email]
Subject: RE: Sharing dih "dictionaries"

You're totally correct.  There's actually a link on the DIH page now which wasn't there when I had read it a long time ago.  I'm really looking forward to 4.0, it's got a ton of great new features.  Thanks for the links!!

Re: Sharing dih "dictionaries"

Mikhail Khludnev
AFAIK the DIH jar is separate from the Solr war. Isn't there a chance to use DIH
from 4.0 in Solr 3.4?

James,
Sorry for hijacking the thread.
But do you have a chance to review
https://issues.apache.org/jira/browse/SOLR-2947? I want to provide a patch
to fix multi-threading in DIH, but formally speaking, this issue together
with https://issues.apache.org/jira/browse/SOLR-2933 blocks me.

Regards

On Wed, Dec 7, 2011 at 1:11 AM, Dyer, James <[hidden email]> wrote:

> Just FYI that the final piece of SOLR-2382 has not been committed, and
> instead has been spun off to SOLR-2943.  So if you're using Trunk and you
> need the ability to persist a cache on disk and then read it back again
> later as a DIH entity, you'll need both SOLR-2943 and also a cache
> implementation.  We're using the BDB-JE cache from SOLR-2613 in production.
>  There is also one backed with a Lucene index (SOLR-2948).
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
