GData


GData

Doug Cutting
How hard would it be to build a GData server using Solr?  An
open-source, Lucene-based GData server would be a good thing to have.
Does this fit in Solr, or should it be a separate project?

http://code.google.com/apis/gdata/overview.html

Another summer of code project?

Doug

Re: GData

Erik Hatcher
It would require some work to add this to Solr, but not a huge  
effort.  One of the most crucial missing pieces that I'm beginning to  
feel a strong need for is being able to update a single field in a  
Lucene index.  I notice the GData protocol supports this:

        <http://code.google.com/apis/gdata/protocol.html#Updating-an-entry>

So to turn it around to ask you a question, what would it take to  
allow a Lucene document to be "updatable" at the field granularity,  
such that no other fields need to be specified again?

The idea of using HTTP 1.1 PUT/DELETE methods has been discussed for  
Solr before, and I think it'd be a great idea to support Atom and  
GData, and perhaps even "legacy" RSS.  Currently Solr's request and  
response handling are pretty intertwined with the rest of the system  
and some decoupling needs to take place to facilitate pluggability in
the external interfaces.  Nothing too awfully difficult, I don't
think, but not something that is currently possible out of the box.

        Erik



On Apr 20, 2006, at 11:59 AM, Doug Cutting wrote:

> How hard would it be to build a GData server using Solr?  An open-
> source, Lucene-based GData server would be a good thing to have.  
> Does this fit in Solr, or should it be a separate project?
>
> http://code.google.com/apis/gdata/overview.html
>
> Another summer of code project?
>
> Doug


Re: GData

Yonik Seeley
On 4/20/06, Erik Hatcher <[hidden email]> wrote:
> So to turn it around to ask you a question, what would it take to
> allow a Lucene document to be "updatable" at the field granularity,
> such that no other fields need to be specified again?

That sounds like quite a job in Lucene... one thing for a stored
field, but quite another for indexed fields.   Even if you could
update things like TermDocs, you don't know what terms are currently
pointing to your document.  I personally don't see an easy (or
remotely practical) way.
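Yonik's point is about the direction of the index: postings map each term to the documents containing it, not the other way round. The toy Python model below (hypothetical class and method names; not Lucene's real data structures) shows why replacing one indexed field forces a scan of the whole term dictionary just to find which terms currently point at the document.

```python
from collections import defaultdict


class ToyInvertedIndex:
    """Hypothetical toy index: (field, term) -> set of doc ids.

    This mirrors the direction of Lucene's postings (term -> docs). There is
    deliberately no doc -> terms map, which is what makes field updates hard.
    """

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, doc):
        for field, text in doc.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def terms_for(self, doc_id, field):
        # The only way to learn which terms point at this doc is to scan
        # every posting list in the index.
        return sorted(term for (f, term), docs in self.postings.items()
                      if f == field and doc_id in docs)

    def replace_field(self, doc_id, field, new_text):
        # Step 1: full scan to find and unlink the old terms.
        for term in self.terms_for(doc_id, field):
            self.postings[(field, term)].discard(doc_id)
        # Step 2: index the new value.
        for term in new_text.lower().split():
            self.postings[(field, term)].add(doc_id)
```

In a real Lucene index it is harder still, because segment files are written once and never modified in place.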

The easiest way I can think of to get that effect is to store all the
fields so you can re-create the Document and change the field being
updated.
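Yonik's workaround, emulating a field-level update when every field is stored, amounts to read, modify, delete, re-add. A minimal sketch, assuming a plain dict stands in for the index and its stored fields (the function name and the operation log are hypothetical):

```python
def update_field(index, log, doc_id, field, value):
    """Emulate a single-field update on an index where all fields are stored.

    `index` maps doc_id -> {field: stored value}; `log` records the index
    operations performed, in order.
    """
    stored = index[doc_id]               # 1. fetch every stored field (KeyError if missing)
    rebuilt = {**stored, field: value}   # 2. copy them, changing only one field
    del index[doc_id]                    # 3. delete the old document...
    log.append(("delete", doc_id))
    index[doc_id] = rebuilt              # 4. ...and add the rebuilt one
    log.append(("add", doc_id))
    return rebuilt
```

Any field that was indexed but not stored has no value to copy in step 2, which is why this trick only works when all fields are stored.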

-Yonik

Re: GData

Chris Hostetter-3

: The easiest way I can think of to get that effect is to store all the
: fields so you can re-create the Document and change the field being
: updated.

My brief reading of the GData URL Doug sent suggests that the overall
theme is content storage -- if that's the goal, mandating that "modify"
operations require all fields be stored wouldn't sacrifice functionality.
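Hoss's suggestion could be enforced as a simple check before accepting "modify" requests. A sketch, assuming a hypothetical schema represented as a dict of field name to options (this is not Solr's schema.xml format):

```python
def unstored_fields(schema):
    """Return the names of fields whose values are not stored."""
    return sorted(name for name, opts in schema.items()
                  if not opts.get("stored", False))


def check_modify_allowed(schema):
    """Reject "modify" operations unless every field is stored."""
    missing = unstored_fields(schema)
    if missing:
        raise ValueError("modify needs all fields stored; not stored: "
                         + ", ".join(missing))
    return True
```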

As far as the output format goes, it seems to me that, just like A9's
OpenSearch, this could probably be done entirely with XSLT.
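Python's standard library has no XSLT processor, so the sketch below swaps in `xml.etree.ElementTree` to show the same idea: turning a list of search hits into an Atom feed. The input shape (dicts with `id` and `title`) and the function names are assumptions, and the output is deliberately minimal; it omits some elements a valid Atom feed requires, such as `updated`.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def results_to_atom(feed_title, docs):
    """Render search hits as a (minimal) Atom feed string."""
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("title")).text = feed_title
    for doc in docs:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = doc["id"]
        ET.SubElement(entry, atom("title")).text = doc["title"]
    return ET.tostring(feed, encoding="unicode")
```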

The input format is the trickier part; that's where we'd definitely need
a more pluggable "parser" for dealing with incoming data... but we're
going to need that anyway if we want to support posting CSV files and
things like that.
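A pluggable input layer can be as simple as a registry keyed by content type. A sketch with hypothetical names (`PARSERS`, `parser`, `parse_request`); only a CSV parser is registered here, and an Atom/GData entry parser would be registered the same way.

```python
import csv
import io

# content type -> function(body: str) -> list of field dicts
PARSERS = {}


def parser(content_type):
    """Decorator that registers a parser for one content type."""
    def register(fn):
        PARSERS[content_type] = fn
        return fn
    return register


@parser("text/csv")
def parse_csv(body):
    # First row is the header; each later row becomes one document.
    return [dict(row) for row in csv.DictReader(io.StringIO(body))]


def parse_request(content_type, body):
    fn = PARSERS.get(content_type)
    if fn is None:
        raise ValueError(f"no parser registered for {content_type}")
    return fn(body)
```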


-Hoss


Re: GData

Erik Hatcher
In reply to this post by Yonik Seeley
This was mainly very wishful thinking on my part :)

Sadly, I'm still far from an expert on the low-level Lucene  
internals, so I'm only waving my hands at a high-level here.

Storing all fields is not a practical solution, at least in many  
situations.  So the GData update is quite a tricky one then.

        Erik


On Apr 20, 2006, at 1:11 PM, Yonik Seeley wrote:

> On 4/20/06, Erik Hatcher <[hidden email]> wrote:
>> So to turn it around to ask you a question, what would it take to
>> allow a Lucene document to be "updatable" at the field granularity,
>> such that no other fields need to be specified again?
>
> That sounds like quite a job in Lucene... one thing for a stored
> field, but quite another for indexed fields.   Even if you could
> update things like TermDocs, you don't know what terms are currently
> pointing to your document.  I personally don't see an easy (or
> remotely practical) way.
>
> The easiest way I can think of to get that effect is to store all the
> fields so you can re-create the Document and change the field being
> updated.
>
> -Yonik


Re: GData

Erik Hatcher
In reply to this post by Chris Hostetter-3

On Apr 20, 2006, at 1:22 PM, Chris Hostetter wrote:

>
> : The easiest way I can think of to get that effect is to store all
> : the fields so you can re-create the Document and change the field
> : being updated.
>
> My brief reading of the GData URL Doug sent suggests that the overall
> theme is content storage -- if that's the goal, mandating that "modify"
> operations require all fields be stored wouldn't sacrifice
> functionality.

Ah, good point.

> As far as the output format goes, it seems to me that, just like A9's
> OpenSearch, this could probably be done entirely with XSLT.

XSLT is one of the most evil creations ever bestowed upon humans.  :O

But at least Solr is already set up for this sort of thing, and
perhaps it could be just the thing to get this working.  I will still
be working toward a more flexible output system that allows results
to be formatted, for example, as Ruby code itself, which would be the
most performant way to communicate results.
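As a rough illustration of Erik's idea, a response writer can emit results as a Ruby literal that a Ruby client evaluates directly instead of parsing XML. A sketch with a hypothetical `to_ruby` function covering only nil, booleans, finite numbers, strings, arrays and hashes; evaluating server output on the client is only reasonable when the server is trusted.

```python
def to_ruby(value):
    """Serialize a Python value as a Ruby literal (hypothetical writer)."""
    if value is None:
        return "nil"
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):  # finite floats only
        return repr(value)
    if isinstance(value, str):
        # Single-quoted Ruby string: escape backslashes and single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(to_ruby(v) for v in value) + "]"
    if isinstance(value, dict):
        pairs = (f"{to_ruby(k)}=>{to_ruby(v)}" for k, v in value.items())
        return "{" + ",".join(pairs) + "}"
    raise TypeError(f"cannot serialize {type(value).__name__}")
```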

        Erik



Re: GData

jason rutherglen-2
In reply to this post by Doug Cutting
Does this mean that the Google system does some sort of realtime replication?

----- Original Message ----
From: Doug Cutting <[hidden email]>
To: [hidden email]
Sent: Thursday, April 20, 2006 8:59:01 AM
Subject: GData

How hard would it be to build a GData server using Solr?  An
open-source, Lucene-based GData server would be a good thing to have.
Does this fit in Solr, or should it be a separate project?

http://code.google.com/apis/gdata/overview.html

Another summer of code project?

Doug




Re: GData

YoavShapira
One might think the Google GData system is agnostic to, and works quite
well with, a distributed filesystem like Google's
(http://labs.google.com/papers/gfs-sosp2003.pdf).

Getting back to Doug's original point about this as a possible SoC
project: it seems a little too big from the technical discussion so
far.  Part of the SoC's goal is to motivate a student into being able
to say they created something, and getting them to enjoy the
open-source development process.  If a piece of work is big enough
that a student working on it part-time during the summer can't finish
it (or finish a beta / 1.0 version), it could be frustrating to the
student, which would go against the spirit of SoC as I understand
it...

Yoav

On 4/20/06, jason rutherglen <[hidden email]> wrote:

> Does this mean that the Google system does some sort of realtime replication?
>
> ----- Original Message ----
> From: Doug Cutting <[hidden email]>
> To: [hidden email]
> Sent: Thursday, April 20, 2006 8:59:01 AM
> Subject: GData
>
> How hard would it be to build a GData server using Solr?  An
> open-source, Lucene-based GData server would be a good thing to have.
> Does this fit in Solr, or should it be a separate project?
>
> http://code.google.com/apis/gdata/overview.html
>
> Another summer of code project?
>
> Doug
>
>
>
>
>


--
Yoav Shapira
Nimalex LLC
1 Mifflin Place, Suite 310
Cambridge, MA, USA
[hidden email] / www.yoavshapira.com

Re: GData

Yonik Seeley
On 4/20/06, Yoav Shapira <[hidden email]> wrote:
> Getting back to Doug's original point about this as a possible SoC
> project: it seems a little too big from the technical discussion so
> far.

There is probably a lot one could do to tailor the scope of many
projects to fit (making many parts of the spec optional, etc).

> Part of the SoC's goal is to motivate a student into being able
> to say they created something, and getting them to enjoy the
> open-source development process.

Agreed... you would want something that works at the end, but a
minimal implementation that just supported CREATE, GET, and maybe the
simplest query, might not take too long.
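To gauge the scope Yonik describes, here is a rough, in-memory sketch of just CREATE, GET and a single-term query. Everything here is hypothetical (class name, entry shape, whitespace tokenization); a real project would put Lucene behind these three methods.

```python
import itertools


class MinimalEntryStore:
    """Hypothetical minimal store: CREATE, GET, and a one-term query."""

    def __init__(self):
        self._entries = {}
        self._ids = itertools.count(1)

    def create(self, title, content):
        entry_id = str(next(self._ids))
        self._entries[entry_id] = {"id": entry_id, "title": title,
                                   "content": content}
        return entry_id

    def get(self, entry_id):
        # None plays the role of an HTTP 404.
        return self._entries.get(entry_id)

    def query(self, term):
        # Simplest possible query: exact match on one lowercased token.
        term = term.lower()
        return [e["id"] for e in self._entries.values()
                if term in (e["title"] + " " + e["content"]).lower().split()]
```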

-Yonik

Re: GData

Yoav Shapira-2
> There is probably a lot one could do to tailor the scope of many
> projects to fit (making many parts of the spec optional, etc).
>
> > Part of the SoC's goal is to motivate a student into being able
> > to say they created something, and getting them to enjoy the
> > open-source development process.
>
> Agreed... you would want something that works at the end, but a
> minimal implementation that just supported CREATE, GET, and maybe the
> simplest query, might not take too long.

Yup, totally agreed, just wanted to make sure we kept that student
perspective in mind.

Yoav



Re: GData

Doug Cutting
In reply to this post by YoavShapira
Yoav Shapira wrote:
> Getting back to Doug's original point about this as a possible SoC
> project: it seems a little too big from the technical discussion so
> far.

It might actually be a simpler project if it were standalone: not built
into Solr, but rather a Lucene contrib project.  One only has to write a
few servlets that translate each request into Lucene events: add,
delete, delete+add, or query.  It wouldn't have lots of Solr's fancy
features (faceted searching, replication, etc.) but could still be a
very useful thing.  Do folks think that would be a tractable SoC project?
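Doug's four Lucene events line up with the HTTP methods GData uses (POST to create, PUT to update, DELETE to remove, GET to query). A sketch of that translation table, with hypothetical names; an update becomes delete+add because Lucene at the time had no in-place document update.

```python
# HTTP method -> Lucene-level operations (assumed mapping, following the
# GData convention: POST creates, PUT updates, DELETE removes, GET queries).
EVENTS = {
    "POST": ("add",),
    "PUT": ("delete", "add"),  # update = delete old version, add new one
    "DELETE": ("delete",),
    "GET": ("query",),
}


def translate(method, entry_id=None):
    """Return the ordered (operation, entry_id) pairs for one request."""
    ops = EVENTS.get(method.upper())
    if ops is None:
        raise ValueError(f"unsupported HTTP method: {method}")
    return [(op, entry_id) for op in ops]
```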

Doug

Re: GData

Yoav Shapira-2
Hola,

> It might actually be a simpler project if it were standalone: not built
> into Solr, but rather a Lucene contrib project.  One only has to write a
> few servlets that translate each request into Lucene events: add,
> delete, delete+add, or query.  It wouldn't have lots of Solr's fancy
> features (faceted searching, replication, etc.) but could still be a
> very useful thing.  Do folks think that would be a tractable SoC project?
>
> Doug

Yeah, and a cool one at that, +1.

Yoav

Re: GData

Doug Cutting
Yoav Shapira wrote:
> Yeah, and a cool one at that, +1.

Would you (or someone else) be willing to co-mentor this one with me?
I'm travelling the month of July, so I'm hesitant to be the sole mentor.
  (I'll be online, but at reduced capacity.)

If I have a co-mentor, then I'd be happy to write up the proposal.

Doug

Re: GData

Yonik Seeley
In reply to this post by Doug Cutting
On 4/20/06, Doug Cutting <[hidden email]> wrote:
> It might actually be a simpler project if it were standalone: not built
> into Solr, but rather a Lucene contrib project. One only has to write a
> few servlets that translate each request into Lucene events: add,
> delete, delete+add, or query.

At first blush, that's the approach I would take with Solr too (a
GData-specific servlet that interfaced to Solr).  So I don't see a big
difference in difficulty level.

It shouldn't be hard to take a straight Lucene-servlet version and
adapt it to Solr later, so it would be a benefit regardless.
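Yonik's point that a plain-Lucene servlet could be adapted to Solr later suggests keeping the GData request handling behind a small backend interface. A sketch under that assumption, with hypothetical names throughout; the in-memory backend stands in for a Lucene one, and a Solr-backed class would implement the same three methods.

```python
from abc import ABC, abstractmethod


class IndexBackend(ABC):
    """The only operations the GData layer needs from an index."""

    @abstractmethod
    def add(self, doc): ...

    @abstractmethod
    def delete(self, doc_id): ...

    @abstractmethod
    def query(self, term): ...


class InMemoryBackend(IndexBackend):
    """Stand-in for a direct Lucene backend."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        self.docs[doc["id"]] = doc

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)

    def query(self, term):
        term = term.lower()
        return sorted(doc_id for doc_id, doc in self.docs.items()
                      if term in doc.get("title", "").lower().split())


class GDataHandler:
    """Protocol logic; unchanged whichever backend is plugged in."""

    def __init__(self, backend):
        self.backend = backend

    def handle(self, method, doc_id=None, doc=None, term=None):
        if method == "POST":
            self.backend.add(doc)
            return doc["id"]
        if method == "PUT":  # update = delete + add
            self.backend.delete(doc_id)
            self.backend.add(doc)
            return doc_id
        if method == "DELETE":
            self.backend.delete(doc_id)
            return doc_id
        if method == "GET":
            return self.backend.query(term)
        raise ValueError(f"unsupported HTTP method: {method}")
```

Swapping Lucene for Solr then means writing one new `IndexBackend` subclass while `GDataHandler` stays untouched.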

-Yonik

Re: GData

Yonik Seeley
In reply to this post by Doug Cutting
On 4/20/06, Doug Cutting <[hidden email]> wrote:
> Would you (or someone else) be willing to co-mentor this one with me?
> I'm travelling the month of July, so I'm hesitant to be the sole mentor.
>   (I'll be online, but at reduced capacity.)
>
> If I have a co-mentor, then I'd be happy to write up the proposal.

OK, I'm up for it...
I don't really know my summer schedule, but I doubt I would be out
more than a week at a time.

-Yonik

Re: GData

Yoav Shapira-2
Good.  I'll be available on a time-permitting basis as always, but I
don't want to commit as a mentor for this, so having you two makes me
feel at ease ;)

Yoav

On 4/20/06, Yonik Seeley <[hidden email]> wrote:

> On 4/20/06, Doug Cutting <[hidden email]> wrote:
> > Would you (or someone else) be willing to co-mentor this one with me?
> > I'm travelling the month of July, so I'm hesitant to be the sole mentor.
> >   (I'll be online, but at reduced capacity.)
> >
> > If I have a co-mentor, then I'd be happy to write up the proposal.
>
> OK, I'm up for it...
> I don't really know my summer schedule, but I doubt I would be out
> more than a week at a time.
>
> -Yonik
>

