Merging solr indexes with duplicate keys - merging duplicate documents

gagan_goku
Hi folks

We have a use case where I have 2 Solr indexes with the same schema but
different fields populated, for example:

Common schema:
<field name="url" type="string" />      <!-- unique key (untokenized) -->
<field name="product_name" type="text" />
<field name="image" type="text" />
<field name="brand" type="text" />
<field name="description" type="text" />

<field name="out_of_stock" type="boolean" />
<field name="num_likes" type="int" />
<field name="num_add_2_cart" type="int" />
<uniqueKey>url</uniqueKey>

I have one index that stores product information (the first 5 fields).
This index is built every 2 days.
I have a 2nd index that stores social signals (url + out_of_stock +
num_likes + num_add_2_cart). This index is built every 2 hours and is used
for near-real-time boosting of products.
The processes for building these indexes are independent, and for
operational management and for the sake of reuse I would like to build
these indexes separately.

My question is: is there a convenient way of merging these 2 indexes (other
than applying document updates in a loop)? The IndexMergeTool from Lucene
is not capable of applying document updates and would end up keeping either
the first 5 fields or the last 3.

Thanks
Gagan
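
For illustration, the field-level merge being asked for can be sketched in Python over plain dicts. This is a minimal sketch of the desired semantics, not something Solr or Lucene provides: merge_by_key, PRODUCT_FIELDS and SIGNAL_FIELDS are hypothetical names, and both indexes are assumed to have been exported as lists of documents keyed by url.

```python
from typing import Dict, Iterable, List

# Field names taken from the schema in the post; the grouping is an assumption.
PRODUCT_FIELDS = ("url", "product_name", "image", "brand", "description")
SIGNAL_FIELDS = ("url", "out_of_stock", "num_likes", "num_add_2_cart")

Doc = Dict[str, object]


def merge_by_key(products: Iterable[Doc], signals: Iterable[Doc],
                 key: str = "url") -> List[Doc]:
    """Combine documents that share a unique key field by field,
    instead of letting one whole document replace the other."""
    merged: Dict[object, Doc] = {}
    # Start from the slow-moving product documents.
    for doc in products:
        merged[doc[key]] = {f: doc[f] for f in PRODUCT_FIELDS if f in doc}
    # Overlay only the signal fields, so the product fields survive.
    for doc in signals:
        target = merged.setdefault(doc[key], {key: doc[key]})
        for f in SIGNAL_FIELDS:
            if f in doc:
                target[f] = doc[f]
    # Return the merged documents ordered by key.
    return [merged[k] for k in sorted(merged)]
```

Here a signal for a URL with no product document still yields a signal-only document; dropping such signals instead is an equally reasonable policy.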

Re: Merging solr indexes with duplicate keys - merging duplicate documents

Malcolm Upayavira Holmes


On Sun, Mar 31, 2013, at 05:53 AM, Gagandeep singh wrote:

> My question is, is there a convenient way of merging these 2 indexes
> (other than applying document updates in a loop)?

Why do you want to merge them? What sort of queries do you want to do?
What sort of responses do you need?

Upayavira

Re: Merging solr indexes with duplicate keys - merging duplicate documents

gagan_goku
Not sure if my mail was unclear, but I want to merge them so that I can
make use of social signals when searching. A signal like num_likes can be
used as a multiplicative boost to show documents that are hot.
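
A multiplicative boost of this kind can be expressed with the edismax query parser's boost parameter, which multiplies the relevance score by a function query (bf, by contrast, adds to it). The rough sketch below only builds the request parameters and assumes num_likes is a field in the index being searched; the qf weights and the log(sum(num_likes,10)) shape are illustrative choices, and build_boosted_query is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_boosted_query(user_query: str, rows: int = 10) -> str:
    """Return a query string for Solr's edismax parser that multiplies
    each document's relevance score by a function of num_likes."""
    params = {
        "defType": "edismax",
        "q": user_query,
        # Hypothetical choice of searched fields and weights.
        "qf": "product_name^2 brand description",
        # edismax's boost parameter is multiplicative. Solr's log() is
        # base 10, so log(sum(num_likes,10)) is >= 1 for num_likes >= 0.
        "boost": "log(sum(num_likes,10))",
        "rows": rows,
    }
    return urlencode(params)
```

With this shape, documents without likes keep their base score and popular ones are lifted gradually rather than linearly.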

The reason we are building 2 separate indexes is that our base data
changes slowly, whereas social signals change in near real time.
So the question is: is there a way of merging 2 indexes that handles
duplicate documents the way I want?


Thanks
Gagan


On Sun, Mar 31, 2013 at 8:53 PM, Upayavira wrote:

> Why do you want to merge them? What sort of queries do you want to do?
> What sort of responses do you need?

Re: Merging solr indexes with duplicate keys - merging duplicate documents

Ted Dunning
Just scan one index and do read-modify-write on the other index.
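
One way to do the read-modify-write without writing the merge yourself is Solr 4's atomic updates, where the server reads the stored document, changes only the named fields and re-indexes it. This sketch swaps in that mechanism and assumes the product index meets its requirements as we understand them: an update log with a _version_ field, and all fields stored. The signal source is left abstract, and atomic_updates, post_updates and the core URL are hypothetical.

```python
import json
import urllib.request
from typing import Dict, Iterable, List

SIGNAL_FIELDS = ("out_of_stock", "num_likes", "num_add_2_cart")


def atomic_updates(signal_docs: Iterable[Dict[str, object]],
                   key: str = "url") -> List[Dict[str, object]]:
    """Turn each signal document into a Solr atomic update: the key picks
    the existing product document and {"set": value} replaces one field."""
    updates = []
    for doc in signal_docs:
        update: Dict[str, object] = {key: doc[key]}
        for field in SIGNAL_FIELDS:
            if field in doc:
                update[field] = {"set": doc[field]}
        updates.append(update)
    return updates


def post_updates(solr_core_url: str, updates: List[Dict[str, object]]) -> int:
    """POST the updates as JSON to <core>/update with a commit and return the
    HTTP status. solr_core_url is a placeholder such as
    http://localhost:8983/solr/products (hypothetical core name)."""
    req = urllib.request.Request(
        solr_core_url.rstrip("/") + "/update?commit=true",
        data=json.dumps(updates).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For a large signal index the updates would normally be sent in batches rather than one request per document, with a single commit at the end.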

There are probably better ways to do this by storing your fast-moving
social signals in a non-Lucene storage system. Even something as simple as
a sequential buffer in a file might be sufficient for your needs. Unless
you have symmetric needs for querying both sides, it may pay to retain
design flexibility on this point.
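
As a rough illustration of the file-based alternative, the sketch below keeps signals in an append-only JSON-lines log and replays it into the latest value per URL. The record layout and function names are hypothetical. snapshot_lines renders one numeric field as key=value lines, the layout read by Solr's ExternalFileField (typically a file named external_<fieldname> in the core's data directory), assuming that field type is used; such values work in function queries like a boost but are not searchable, and they are numeric only, so out_of_stock would need a 0/1 encoding.

```python
import json
from typing import Dict, Iterable, List, TextIO


def append_signal(log: TextIO, url: str, field: str, value: float) -> None:
    """Append one signal observation as a JSON line; nothing is rewritten."""
    log.write(json.dumps({"url": url, "field": field, "value": value}) + "\n")


def replay(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Rebuild the latest value of every (url, field) pair; later lines win."""
    latest: Dict[str, Dict[str, float]] = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        rec = json.loads(line)
        latest.setdefault(rec["url"], {})[rec["field"]] = rec["value"]
    return latest


def snapshot_lines(latest: Dict[str, Dict[str, float]], field: str) -> List[str]:
    """Render one field as sorted key=value lines, skipping URLs without it."""
    return [f"{url}={vals[field]}" for url, vals in sorted(latest.items())
            if field in vals]
```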


On Sun, Mar 31, 2013 at 7:44 PM, Gagandeep singh wrote:

> So the question is, is there a way of merging 2 indexes which can handle
> duplicate documents the way i want it to?