Solr document server/component

Solr document server/component

Jason Rutherglen
It would be useful to enable a Solr document server (running
Cassandra, HBase, or Voldemort) to be transparently integrated
into a Solr search cluster. Meaning if I do a document update,
Solr underneath sends the document data to the dedicated
document server cluster. When performing a query, the results
would automatically include documents from the document servers.
This would fairly seamlessly separate the stored fields from the
indexed fields. Has there been work along these lines yet?

Re: Solr document server/component

Grant Ingersoll-2


On Aug 10, 2009, at 7:30 PM, Jason Rutherglen wrote:

> It would be useful to enable a Solr document server (running
> Cassandra, HBase, or Voldemort) to be transparently integrated
> into a Solr search cluster. Meaning if I do a document update,
> Solr underneath sends the document data to the dedicated
> document server cluster. When performing a query, the results
> would automatically include documents from the document servers.
> This would fairly seamlessly separate the stored fields from the
> indexed fields. Has there been work along these lines yet?

I don't think any work has been done on it, but you are correct.  I'd  
add in memcached and even, gasp, a DB, as well...  Of course, the more  
important thing is the interface.

Re: Solr document server/component

Noble Paul നോബിള്‍  नोब्ळ्-2
The point is that the search will be done on committed docs and the
stored fields will show uncommitted data. How are we going to deal with
this mismatch?

On Tue, Aug 11, 2009 at 6:50 AM, Grant Ingersoll<[hidden email]> wrote:

> I don't think any work has been done on it, but you are correct.  I'd add in
> memcached and even, gasp, a DB, as well...  Of course, the more important
> thing is the interface.
>



--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Solr document server/component

Jason Rutherglen
The ids would be stored in Solr. When a query is executed, the
ids would be looked up through the DocumentService interface,
which underneath would obtain the document data (aka stored
fields). Solr would then combine the two and present the stored
fields in the results as it does today.

For updating, something similar would occur: the stored
fields are shuffled off to the DocumentService update method.
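
A rough sketch of what such an interface might look like, in Java.
Only the name DocumentService comes from this thread; the method
signatures, the in-memory backend, and the ResultMerger helper are
assumptions made up for illustration, not existing Solr APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: the thread names DocumentService but does not define it.
interface DocumentService {
    // Store (or overwrite) the stored fields for one document id.
    void update(String id, Map<String, String> storedFields);

    // Fetch stored fields for the given ids; ids with no data are omitted.
    Map<String, Map<String, String>> lookup(List<String> ids);
}

// Stand-in backend for illustration; a real one might wrap Voldemort, HBase, or Cassandra.
class InMemoryDocumentService implements DocumentService {
    private final Map<String, Map<String, String>> store = new HashMap<>();

    @Override
    public void update(String id, Map<String, String> storedFields) {
        store.put(id, new HashMap<>(storedFields));
    }

    @Override
    public Map<String, Map<String, String>> lookup(List<String> ids) {
        Map<String, Map<String, String>> found = new HashMap<>();
        for (String id : ids) {
            Map<String, String> fields = store.get(id);
            if (fields != null) {
                found.put(id, new HashMap<>(fields));
            }
        }
        return found;
    }
}

class ResultMerger {
    // Combine the ranked ids returned by the index with the stored fields
    // from the document service, preserving the index's rank order.
    static List<Map<String, String>> merge(List<String> rankedIds, DocumentService docs) {
        Map<String, Map<String, String>> fetched = docs.lookup(rankedIds);
        List<Map<String, String>> results = new ArrayList<>();
        for (String id : rankedIds) {
            Map<String, String> row = new HashMap<>();
            Map<String, String> fields = fetched.get(id);
            if (fields != null) {
                row.putAll(fields);
            }
            row.put("id", id); // the id always comes from the index
            results.add(row);
        }
        return results;
    }
}
```

Batching the lookup per result page, rather than one call per id,
is a design guess that keeps round trips to the document cluster low.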

> how are we going to deal with this mismatch?

This is why we'd want to use optimistic concurrency, meaning
versions of ids encoded in the documents (which would also be
stored in the index ala GData).
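
To make that concrete, here is a minimal sketch assuming the index
stores (id, version) and the document store keeps each version until
it is explicitly removed. The class and method names are hypothetical.
A search then asks for the exact version the committed index points
at, so a newer uncommitted write does not leak into the results, and
a write based on a stale version is rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store illustrating optimistic concurrency; not an existing Solr API.
class VersionedDocumentStore {
    // id -> (version -> stored fields). Older versions are kept so that a
    // committed index entry can still resolve the exact version it refers to.
    private final Map<String, Map<Long, Map<String, String>>> versionsById = new HashMap<>();
    private final Map<String, Long> latestVersion = new HashMap<>();

    // A write succeeds only when the caller has seen the latest version.
    // Returns the newly assigned version; version 0 means "no document yet".
    long update(String id, long expectedVersion, Map<String, String> fields) {
        long current = latestVersion.getOrDefault(id, 0L);
        if (current != expectedVersion) {
            throw new IllegalStateException(
                "version conflict for " + id + ": expected " + expectedVersion
                    + ", found " + current);
        }
        long next = current + 1;
        versionsById.computeIfAbsent(id, k -> new HashMap<>())
            .put(next, new HashMap<>(fields));
        latestVersion.put(id, next);
        return next;
    }

    // Returns the stored fields for exactly (id, version), or null if absent.
    Map<String, String> get(String id, long version) {
        Map<Long, Map<String, String>> versions = versionsById.get(id);
        return versions == null ? null : versions.get(version);
    }
}
```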


Re: Solr document server/component

Noble Paul നോബിള്‍  नोब्ळ्-2
> This is why we'd want to use optimistic concurrency, meaning
> versions of ids encoded in the documents (which would also be
> stored in the index ala GData).

So, when the stored fields are fetched from the "DocumentService" the
version number also is sent?

As Grant mentioned, it would be nice to have the interface ready and
make it possible to have multiple implementations.


Re: Solr document server/component

Jason Rutherglen
> So, when the stored fields are fetched from the
> "DocumentService" the version number also is sent?

Yes.

You're probably wondering how we manage the older versions of an
object (aka id + version) in the DocService. We'd probably
delete the object as it's deleted from the index?

I use the word "object" to denote something different from a
document, which can be short-lived. An object exists over multiple
updates.
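
One way that cleanup could work, as a sketch only: the store keeps
every (id, version) until the index reports that version deleted, and
the object itself goes away once its last version is gone. ObjectStore
and onIndexDelete are made-up names, and how Solr would actually
signal a delete is left open here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical store: an "object" is an id that lives across many versions.
class ObjectStore {
    // id -> (version -> stored data), versions kept in ascending order.
    private final Map<String, TreeMap<Long, String>> objects = new HashMap<>();

    // Record the stored data for one version of an object.
    void put(String id, long version, String data) {
        objects.computeIfAbsent(id, k -> new TreeMap<>()).put(version, data);
    }

    // Assumed hook: called when the index deletes the document for (id, version).
    void onIndexDelete(String id, long version) {
        TreeMap<Long, String> versions = objects.get(id);
        if (versions == null) {
            return;
        }
        versions.remove(version);
        if (versions.isEmpty()) {
            objects.remove(id); // last version gone, so the object is gone too
        }
    }

    // Number of versions currently held for the object.
    int versionCount(String id) {
        TreeMap<Long, String> versions = objects.get(id);
        return versions == null ? 0 : versions.size();
    }

    // True while at least one version of the object remains.
    boolean exists(String id) {
        return objects.containsKey(id);
    }
}
```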

We can start with the interface, and provide a default
implementation using the Apache-licensed Voldemort
(http://project-voldemort.com)?

2009/8/11 Noble Paul നോബിള്‍  नोब्ळ् <[hidden email]>:

>> This is why we'd want to use optimistic concurrency, meaning
>> versions of ids encoded in the documents (which would also be
>> stored in the index ala GData).
>
> So, when the stored fields are fetched from the "DocumentService" the
> version number also is sent?
>
> As Grant mentioned, it would be nice to have the interface ready and
> make it possible to have multiple implementations.