[VOTE] merge lucene/solr development (take 3)

Re: [VOTE] merge lucene/solr development (take 3)

Dennis Kubes-2
True.  There are features that aren't useful for every search.  But the
features in Lucene are meant for full text search, not for serving full
text search.  Maybe faceting was a bad example, it was the first that
came to mind and defines what many people use Solr for.

Lucene IMO is a full text search *library*.  When features are added to
it, that is the perspective that should be taken.  Does it work as a
general purpose indexing library?

I am all for adding in *very* useful features, especially when someone
else has done the work, as long as they fall into that boundary.  But
Solr isn't a search library, it is a search server.  Aren't those
separate responsibilities?  Should we take some of the things out of
Solr and put them into Lucene?  Absolutely.  Should we merge to do this?
No, not IMO.

And Power is good in the right hands, but that is another discussion :)

Dennis

Ted Dunning wrote:

> There are scads of features of Lucene that are not useful for all
> applications (payloads, for one example, back compatibility for another).
>
> The point is that the option to use faceting or not would be *very* useful
> for all search applications.  Power is good, especially since somebody else
> has done the work already.
>
> On Mon, Mar 8, 2010 at 10:10 PM, Dennis Kubes <[hidden email]> wrote:
>
>> Faceting for example, great feature, but not useful in every full text
>> search.
>>
>

Re: [VOTE] merge lucene/solr development (take 3)

Mattmann, Chris A (3010)
In reply to this post by Grant Ingersoll-2
Hey Grant,

On 3/9/10 5:49 AM, "Grant Ingersoll" <[hidden email]> wrote:

> For that matter, why do we even need to have this discussion at all?  Most of
> us Solr committers are Lucene committers.  We can simply start committing Solr
> code to Lucene such that in 6 months the whole discussion is moot and the
> three committers on Solr who aren't Lucene committers can earn their Lucene
> merit very quickly by patching the "Solr" portion of Lucene.

Sure, if folks agree on those patches and the community finds them useful,
and the patches follow the dev process of Lucene(-java), then so be it.
However, it seems like this could have been done already, no? Many of the
things you and others have discussed merging have been around for a while
besides spatial. Is it simply developers/resources that is lacking in
Lucene(-java) and time? Or are there other reasons? It sounds to me based on
the desire to sync up tests, to follow the same release schedule/etc., that
there are in fact, other reasons.

> We can move all
> the code to its appropriate place, add a contrib module for the WAR stuff and
> the response writers and voila, Solr is in Lucene, the dev mailing lists have
> merged by the fact that Solr dev would be defunct and all of the proposals in
> this vote are implemented simply by employing our commit privileges in a
> concerted way.  Yet, somehow, me thinks that isn't a good solution either,
> right?  Yet it is perfectly "legal" and is just as valid a solution as the
> "poaching" solution and in a lot of ways seems to be what Chris is proposing.

Whether or not what you're saying is good or what I'm saying is good or not
will be decided by the community within Lucene(-java), as well as the one
within Solr. All I'm for is not circumventing that process, in any
direction. If what you suggest above happens in a concise, traceable,
beneficial way to both projects and communities, then I'm for that.

At the same time, I also favor insulation wherever possible and I personally
like the separation of the 2 projects. I have built 10s of projects that
have simply used Lucene as an API and had no need for Solr, and I've built
10s of projects where Solr made perfect sense. So, I appreciate their
separation. I also have a lot of experience in these types of situations as
I've been involved in 2-3 of them over the past few years at NASA in terms
of maintaining separation and merging projects/etc. There are quite a few
lessons learned that I have been trying to share but that have seemingly not
really been appreciated and that have been in my mind dismissed, rather than
discussed, through this process.

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [hidden email]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



Re: [VOTE] merge lucene/solr development (take 3)

Dennis Kubes-2
In reply to this post by Ted Dunning
It was late when I wrote that, maybe my analogy was not clear.  You are
echoing what I was trying to say, that Hadoop != Nutch, and it
wouldn't have been as useful if it had only ever been viewed that way.
I think part of this discussion is looking at Lucene as needing things
that are beyond it.  That should be other projects.

Here is my logic FWIW:

Solr depends on Lucene.
Many other projects depend critically on Lucene
Not all of those projects depend on Solr
Solr and Lucene have different responsibilities
Therefore Solr != Lucene and should not merge dev.

Should Solr work more closely to move some of its pieces into Lucene if
they are applicable?  Yes.  To me that doesn't mean merge.

Dennis

Ted Dunning wrote:

> This logic escapes me.
>
> Nutch hatched Hadoop.  Hadoop was perceived to be of much broader utility
> than just for nutch so it was made more general and a separate project was
> formed.  Hadoop does not depend on Nutch.
>
> Lucene existed.  Solr was built to make it easier to use Lucene.  The
> developers of Solr built a bunch of stuff that was specific to server-ness
> and a bunch of stuff that would have general utility to many Lucene
> developers.  Solr depends critically on Lucene and can be seen as a Lucene
> wrapper.
>
> How does this analogy fit together?  Is it supposed to be Hadoop is to Nutch
> as Solr is to Lucene?  That seems so clearly wrong it can't be what you were
> saying.
>
> On Mon, Mar 8, 2010 at 10:10 PM, Dennis Kubes <[hidden email]> wrote:
>
>>> 3) For new Lucene features, there would be an effort to integrate it
>>> into Solr.
>> No.  Because by specializing towards Solr, or Nutch, or any of the hundred
>> other applications that use Lucene, it loses its general applicability.
>>  Where would Hadoop be if it never made it past Nutch?
>>
>

Re: [VOTE] merge lucene/solr development (take 3)

Mattmann, Chris A (3010)
Hi Dennis,

> It was late when I wrote that, maybe my analogy was not clear.  You are
> echoing what I was trying to say, that Hadoop != Nutch, and it
> wouldn't have been as useful if it had only ever been viewed that way.
> I think part of this discussion is looking at Lucene as needing things
> that are beyond it.  That should be other projects.
>
> Here is my logic FWIW:
>
> Solr depends on Lucene.
> Many other projects depend critically on Lucene
> Not all of those projects depend on Solr
> Solr and Lucene have different responsibilities
> Therefore Solr != Lucene and should not merge dev.
>
> Should Solr work more closely to move some of its pieces into Lucene if
> they are applicable?  Yes.  To me that doesn't mean merge.

+1, my sentiments exactly.

Cheers,
Chris

>
> Dennis
>
> Ted Dunning wrote:
>> This logic escapes me.
>>
>> Nutch hatched Hadoop.  Hadoop was perceived to be of much broader utility
>> than just for nutch so it was made more general and a separate project was
>> formed.  Hadoop does not depend on Nutch.
>>
>> Lucene existed.  Solr was built to make it easier to use Lucene.  The
>> developers of Solr built a bunch of stuff that was specific to server-ness
>> and a bunch of stuff that would have general utility to many Lucene
>> developers.  Solr depends critically on Lucene and can be seen as a Lucene
>> wrapper.
>>
>> How does this analogy fit together?  Is it supposed to be Hadoop is to Nutch
>> as Solr is to Lucene?  That seems so clearly wrong it can't be what you were
>> saying.
>>
>> On Mon, Mar 8, 2010 at 10:10 PM, Dennis Kubes <[hidden email]> wrote:
>>
>>>> 3) For new Lucene features, there would be an effort to integrate it
>>>> into Solr.
>>> No.  Because by specializing towards Solr, or Nutch, or any of the hundred
>>> other applications that use Lucene, it loses its general applicability.
>>>  Where would Hadoop be if it never made it past Nutch?
>>>
>>
>



Re: [VOTE] merge lucene/solr development (take 3)

Robert Muir
In reply to this post by Mattmann, Chris A (3010)
> besides spatial. Is it simply developers/resources that is lacking in
> Lucene(-java) and time? Or are there other reasons?

While it's true my analysis patches to fix Solr just sit there in JIRA
because Solr developers are busy working on other tasks such as spatial,
on the other hand, Chris Male's spatial patches to Lucene just sit there
in JIRA because Lucene developers are busy working on other tasks such
as analysis.

So in my opinion, it would be nice if both projects tried to make the
best of our limited resources.

--
Robert Muir
[hidden email]

Re: [VOTE] merge lucene/solr development (take 3)

Dennis Kubes-2
In reply to this post by Michael McCandless-2


Michael McCandless wrote:

> On Tue, Mar 9, 2010 at 5:10 AM, Andrzej Bialecki <[hidden email]> wrote:
>
>> Re: Nutch components - those that are reusable in Lucene or Solr
>> contexts eventually find their way to respective projects, witness
>> e.g. CommonGrams.
>
> In fact I think this is a great example -- as far as I can tell,
> CommonGrams was poached from Nutch, into Solr, and then was
> nurtured/improved in both projects separately right?
>
> So.... can/should we freely poach across all our sub projects?

IMO yes.  Absolutely.  That is exactly what OSS is all about.  Find
something useful, improve upon it.

>
> It has obvious downsides (it's essentially a fork that will confuse
> those users that use both Solr & Lucene, in the short term, until
> things "stabilize" into a clean refactoring; it's double the dev; we
> must re-sync with time; etc.).

True.  OSS development is messy at times.  And it can take longer.

>
> But it has a massive upside: it means we don't rely only on "push"
> (Solr devs to push into Lucene or vice/versa).  We can also use "pull"
> (Lucene devs can pull pieces from Nutch/Solr into Lucene).  It becomes
> a 2-way street for "properly" factoring our shared code with time.
>
> If we had that freedom ("poaching is perfectly fine"), then,
> interested devs could freely "refactor" across sub projects.

There is nothing stopping any developer, committer or not from making
changes to any apache project including Nutch, Lucene, and Solr.
Merging doesn't change or improve that.  At best it makes it more
confusing where responsibilities lie.

Dennis

>
> Not having this freedom today, and not having merged dev, is stunting
> both Solr & Lucene's growth.
>
> Mike

Re: [VOTE] merge lucene/solr development (take 3)

Yonik Seeley
In reply to this post by Mattmann, Chris A (3010)
On Tue, Mar 9, 2010 at 9:48 AM, Mattmann, Chris A (388J)
<[hidden email]> wrote:
> I have built 10s of projects that
> have simply used Lucene as an API and had no need for Solr, and I've built
> 10s of projects where Solr made perfect sense. So, I appreciate their
> separation.

As does everyone - which is why there will always be separate
downloads.  As a user, the only side effect you should see is an
improved Lucene and Solr.

Saying that Solr should move some stuff to Lucene for Lucene's
benefit, without regard to whether it's actually beneficial to Solr, is a
non-starter.  The lucene/solr committers have been down that road
before.  The solution that most committers agreed would improve the
development of both projects is to merge development.

-Yonik

Re: [VOTE] merge lucene/solr development (take 3)

Mattmann, Chris A (3010)
Hi Yonik,

>> I have built 10s of projects that
>> have simply used Lucene as an API and had no need for Solr, and I've built
>> 10s of projects where Solr made perfect sense. So, I appreciate their
>> separation.
>
> As does everyone - which is why there will always be separate
> downloads.  As a user, the only side effect you should see is an
> improved Lucene and Solr.

Developers make downloads. Software processes guide developers who are
producing those downloads. Policies guide the direction of a project. They
are intimately intertwined.

>
> Saying that Solr should move some stuff to Lucene for Lucene's
> benefit, without regard to whether it's actually beneficial to Solr, is a
> non-starter.  

I'm not sure it's Solr's decision what the Lucene committers decide to move
to Lucene, neither is it Lucene's decision in the opposite direction. These
are all Apache projects, subprojects of the Lucene TLP. I'm not sure what
the debate is? If Solr wants elements from Lucene that aren't part of Solr
yet b/c Solr is relying on an old version of Lucene:

1. upgrade to Lucene trunk and address the issues it brings in Solr
2. duplicate the Lucene code in Solr, address any issues there, and then
contribute it back

I'd recommend the same to any project, regardless of what TLP it resides in,
and in many cases, where it's at the ASF, or Sourceforge, or wherever.

It seems kind of incestuous and an abuse of power to make the case that
"well because we're all committers on both projects, then this..." I keep
hearing a lot of talk about "hats", which in analogy means that though you
are one person you have different concerns/projects/etc. This is another
example of the need to maintain separate hats.

Cheers,
Chris
 


Re: [VOTE] merge lucene/solr development (take 3)

Robert Muir
> 2. duplicate the Lucene code in Solr, address any issues there, and then
> contribute it back
>

Not that I can stop anything, but -1 to any further analysis code
duplication. There has to be a better way.

It's stupid to duplicate the code. It's also stupid to just move the code.

In my opinion, if Solr has an analysis feature that belongs in lucene,
we want not just the code, but improvements, ideas, comments from solr
developers that have experience with it, we want to pay
attention to open Solr JIRA tickets against that feature, we want
things like the Solr example schema to have good "defaults" for any
potential improvements to it, and we want the wiki to reflect
additional improvements it supports.

--
Robert Muir
[hidden email]

Re: [VOTE] merge lucene/solr development (take 3)

Mattmann, Chris A (3010)
Hi Robert,

>> 2. duplicate the Lucene code in Solr, address any issues there, and then
>> contribute it back
>>
>
> Not that I can stop anything, but -1 to any further analysis code
> duplication. There has to be a better way.

There might be, but as a first start, duplication is a quick way to get
going and experiment. As solutions that evolve over time are matured, the
time can come for integration. Parallel tracks allow projects to move
forward operationally, and enforces insulation, loose coupling and other
properties.

>
> It's stupid to duplicate the code. It's also stupid to just move the code.

I'm not sure anything is "stupid" per-se. There are different approaches
which address different concerns.

>
> In my opinion, if Solr has an analysis feature that belongs in lucene,
> we want not just the code, but improvements, ideas, comments from solr
> developers that have experience with it, we want to pay
> attention to open Solr JIRA tickets against that feature, we want
> things like the Solr example schema to have good "defaults" for any
> potential improvements to it, and we want the wiki to reflect
> additional improvements it supports.

Then I think it's fair to say that those that want this can follow the
normal Apache process to do so:

* subscribe to the mailing lists
* post JIRA issues with patches
* get those patches committed
* become a committer, etc.

Cheers,
Chris



Re: [VOTE] merge lucene/solr development (take 3)

Grant Ingersoll-2
In reply to this post by Mattmann, Chris A (3010)

On Mar 9, 2010, at 9:48 AM, Mattmann, Chris A (388J) wrote:

> Hey Grant,
>
> On 3/9/10 5:49 AM, "Grant Ingersoll" <[hidden email]> wrote:
>
>> For that matter, why do we even need to have this discussion at all?  Most of
>> us Solr committers are Lucene committers.  We can simply start committing Solr
>> code to Lucene such that in 6 months the whole discussion is moot and the
>> three committers on Solr who aren't Lucene committers can earn their Lucene
>> merit very quickly by patching the "Solr" portion of Lucene.
>
> Sure, if folks agree on those patches and the community finds them useful,
> and the patches follow the dev process of Lucene(-java), then so be it.
> However, it seems like this could have been done already, no? Many of the
> things you and others have discussed merging have been around for a while
> besides spatial. Is it simply developers/resources that is lacking in
> Lucene(-java) and time? Or are there other reasons? It sounds to me based on
> the desire to sync up tests, to follow the same release schedule/etc., that
> there are in fact, other reasons.

Um, I'm a committer.  I've earned the right to apply patches that fit with the project and I've earned the merit to make that decision.  So have all the other committers.  Besides, all I would be committing are the things people have already expressed an interest in anyway.

>
>> We can move all
>> the code to its appropriate place, add a contrib module for the WAR stuff and
>> the response writers and voila, Solr is in Lucene, the dev mailing lists have
>> merged by the fact that Solr dev would be defunct and all of the proposals in
>> this vote are implemented simply by employing our commit privileges in a
>> concerted way.  Yet, somehow, me thinks that isn't a good solution either,
>> right?  Yet it is perfectly "legal" and is just as valid a solution as the
>> "poaching" solution and in a lot of ways seems to be what Chris is proposing.
>
> Whether or not what you're saying is good or what I'm saying is good or not
> will be decided by the community within Lucene(-java), as well as the one
> within Solr. All I'm for is not circumventing that process, in any
> direction. If what you suggest above happens in a concise, traceable,
> beneficial way to both projects and communities, then I'm for that.

No one is circumventing any process and the implication is just wrong.  We are having the discussion.  But even so, as a committer, my job is to work within the community to fix/improve the code.  Right now, I see lots of room for improvement in Lucene by integrating some of those things from Solr (and Nutch) while keeping Lucene, Solr and Nutch whole from an end user perspective.  At the same time, I want to see Solr and Nutch whole.  Any other implication is simply wrong.

>
> At the same time, I also favor insulation wherever possible and I personally
> like the separation of the 2 projects. I have built 10s of projects that
> have simply used Lucene as an API and had no need for Solr, and I've built
> 10s of projects where Solr made perfect sense.

And how would those 10 projects be affected at all?  Please read the proposal again.  It's not like there is going to be some uber JAR.  I won't let it happen as I have more than 10 projects that are pure Lucene.  Part of my day job is supporting Lucene.  I've spent the past 5 years of my life donating to Lucene, and so have many others.  The argument is simply invalid and has been refuted so many times now by ALL those who actually do the work that I don't understand why you insist on bringing it up over and over again.


> So, I appreciate their
> separation. I also have a lot of experience in these types of situations as
> I've been involved in 2-3 of them over the past few years at NASA in terms
> of maintaining separation and merging projects/etc. There are quite a few
> lessons learned that I have been trying to share but that have seemingly not
> really been appreciated and that have been in my mind dismissed, rather than
> discussed, through this process.

I'd hardly say they've been dismissed.  This isn't about you, it's about what is best for the project.  You have one opinion that, on the face of the votes, is in the minority.  That doesn't make the majority right and you wrong.  In fact, those in the majority are trying to answer your concerns and come up with a better suggestion.  This is in fact the process and it is how the ASF works.  This is one of the great things about the Lucene community.  We have real discussions about the issues.

-Grant

Re: [VOTE] merge lucene/solr development (take 3)

Grant Ingersoll-2
In reply to this post by Mattmann, Chris A (3010)
I think many of the objections I've seen so far come from the fact that people don't really know what Solr is.  Solr is much more than simply a "server" around Lucene.

Look at the other thread.  Here's a minimal list of things that a very large chunk of people who write a Lucene app for production have to do:

1. Analyzers
2. Functions
3. Schema (although likely abstracted/reworked)
4. Warming/Reopen - this is hard code to get right and I've seen many people do it wrong.  It is also yet another area of duplication where something started in Solr b/c for years the Lucene community had no interest in donating code for it (incRef/decRef)
5. Faceting

If someone came in and contributed all of those things to Lucene, there would be no objection.  Simply the fact that Solr has other things around it doesn't mean people have to use them and no one is proposing some Uber JAR.


On Mar 9, 2010, at 10:13 AM, Mattmann, Chris A (388J) wrote:

> Hi Yonik,
>
>>> I have built 10s of projects that
>>> have simply used Lucene as an API and had no need for Solr, and I've built
>>> 10s of projects where Solr made perfect sense. So, I appreciate their
>>> separation.
>>
>> As does everyone - which is why there will always be separate
>> downloads.  As a user, the only side effect you should see is an
>> improved Lucene and Solr.
>
> Developers make downloads. Software processes guide developers who are
> producing those downloads. Policies guide the direction of a project. They
> are intimately intertwined.
>
>>
>> Saying that Solr should move some stuff to Lucene for Lucene's
>> benefit, without regard to whether it's actually beneficial to Solr, is a
>> non-starter.  
>
> I'm not sure it's Solr's decision what the Lucene committers decide to move
> to Lucene, neither is it Lucene's decision in the opposite direction. These
> are all Apache projects, subprojects of the Lucene TLP. I'm not sure what
> the debate is? If Solr wants elements from Lucene that aren't part of Solr
> yet b/c Solr is relying on an old version of Lucene:
>
> 1. upgrade to Lucene trunk and address the issues it brings in Solr
> 2. duplicate the Lucene code in Solr, address any issues there, and then
> contribute it back
>
> I'd recommend the same to any project, regardless of what TLP it resides in,
> and in many cases, where it's at the ASF, or Sourceforge, or wherever.
>
> It seems kind of incestuous and an abuse of power to make the case that
> "well because we're all committers on both projects, then this..." I keep
> hearing a lot of talk about "hats", which in analogy means that though you
> are one person you have different concerns/projects/etc. This is another
> example of the need to maintain separate hats.
>
> Cheers,
> Chris
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search


Re: [VOTE] merge lucene/solr development (take 3)

Dennis Kubes-2
I agree.  Most of those things can/should be moved into Lucene.  That
doesn't necessitate merging.  Separate responsibilities.

> For that matter, why do we even need to have this discussion at all?
> Most of us Solr committers are Lucene committers.  We can simply start
> committing Solr code to Lucene such that in 6 months the whole
> discussion is moot and the three committers on Solr who aren't Lucene
> committers can earn their Lucene merit very quickly by patching the
> "Solr" portion of Lucene.  We can move all the code to its
> appropriate place, add a contrib module for the WAR stuff and the
> response writers and voila, Solr is in Lucene, the dev mailing lists
> have merged by the fact that Solr dev would be defunct and all of the
> proposals in this vote are implemented simply by employing our commit
> privileges in a concerted way.

Am I reading you right?  Are you proposing a hostile takeover of the
Lucene project?  Even being committers there needs to be discussion with
the community about the best path.  Or are you suggesting we bypass
discussion?  I am now even more concerned that merging is not the right
way to go.

Dennis

Grant Ingersoll wrote:

> I think many of the objections I've seen so far come from the fact that people don't really know what Solr is.  Solr is much more than simply a "server" around Lucene.
>
> Look at the other thread.  Here's a minimal list of things that a very large chunk of people who write a Lucene app for production have to do:
>
> 1. Analyzers
> 2. Functions
> 3. Schema (although likely abstracted/reworked)
> 4. Warming/Reopen - this is hard code to get right and I've seen many people do it wrong.  It is also yet another area of duplication where something started in Solr b/c for years the Lucene community had no interest in donating code for it (incRef/decRef)
> 5. Faceting
>
> If someone came in and contributed all of those things to Lucene, there would be no objection.  Simply the fact that Solr has other things around it doesn't mean people have to use them and no one is proposing some Uber JAR.
>
>
> On Mar 9, 2010, at 10:13 AM, Mattmann, Chris A (388J) wrote:
>
>> Hi Yonik,
>>
>>>> I have built 10s of projects that
>>>> have simply used Lucene as an API and had no need for Solr, and I've built
>>>> 10s of projects where Solr made perfect sense. So, I appreciate their
>>>> separation.
>>> As does everyone - which is why there will always be separate
>>> downloads.  As a user, the only side effect you should see is an
>>> improved Lucene and Solr.
>> Developers make downloads. Software processes guide developers who are
>> producing those downloads. Policies guide the direction of a project. They
>> are intimately intertwined.
>>
>>> Saying that Solr should move some stuff to Lucene for Lucene's
>>> benefit, without regard to whether it's actually beneficial to Solr, is a
>>> non-starter.  
>> I'm not sure it's Solr's decision what the Lucene committers decide to move
>> to Lucene, neither is it Lucene's decision in the opposite direction. These
>> are all Apache projects, subprojects of the Lucene TLP. I'm not sure what
>> the debate is? If Solr wants elements from Lucene that aren't part of Solr
>> yet b/c Solr is relying on an old version of Lucene:
>>
>> 1. upgrade to Lucene trunk and address the issues it brings in Solr
>> 2. duplicate the Lucene code in Solr, address any issues there, and then
>> contribute it back
>>
>> I'd recommend the same to any project, regardless of what TLP it resides in,
>> and in many cases, where it's at the ASF, or Sourceforge, or wherever.
>>
>> It seems kind of incestuous and an abuse of power to make the case that
>> "well because we're all committers on both projects, then this..." I keep
>> hearing a lot of talk about "hats", which in analogy means that though you
>> are one person you have different concerns/projects/etc. This is another
>> example of the need to maintain separate hats.
>>
>> Cheers,
>> Chris
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
>

Re: [VOTE] merge lucene/solr development (take 3)

Mattmann, Chris A (3010)
In reply to this post by Grant Ingersoll-2

> On Mar 9, 2010, at 9:48 AM, Mattmann, Chris A (388J) wrote:
>
>> Hey Grant,
>>
>> On 3/9/10 5:49 AM, "Grant Ingersoll" <[hidden email]> wrote:
>>
>>> For that matter, why do we even need to have this discussion at all?  Most of
>>> us Solr committers are Lucene committers.  We can simply start committing Solr
>>> code to Lucene such that in 6 months the whole discussion is moot and the
>>> three committers on Solr who aren't Lucene committers can earn their Lucene
>>> merit very quickly by patching the "Solr" portion of Lucene.
>>
>> Sure, if folks agree on those patches and the community finds them useful,
>> and the patches follow the dev process of Lucene(-java), then so be it.
>> However, it seems like this could have been done already, no? Many of the
>> things you and others have discussed merging have been around for a while
>> besides spatial. Is it simply that developers/resources and time are lacking
>> in Lucene(-java)? Or are there other reasons? It sounds to me based on
>> the desire to sync up tests, to follow the same release schedule/etc., that
>> there are in fact, other reasons.
>
> Um, I'm a committer.  I've earned the right to apply patches that fit with the
> project and I've earned the merit to make that decision.  So have all the
> other committers.   Besides the fact, all I would be committing are the things
> people have already expressed an interest in anyway.

Then it's not an issue, right? Additionally, you didn't really answer my
question about why it hasn't happened already (resources/time/process?)

>
>>
>>> We can move all the code to its appropriate place, add a contrib module
>>> for the WAR stuff and the response writers and voila, Solr is in Lucene,
>>> the dev mailing lists have merged by the fact that Solr dev would be
>>> defunct and all of the proposals in this vote are implemented simply by
>>> employing our commit privileges in a concerted way.  Yet, somehow, methinks
>>> that isn't a good solution either, right?  Yet it is perfectly "legal" and
>>> is just as valid a solution as the "poaching" solution and in a lot of ways
>>> seems to be what Chris is proposing.
>>
>> Whether what you're saying is good, or what I'm saying is good, will be
>> decided by the community within Lucene(-java), as well as the one
>> within Solr. All I'm for is not circumventing that process, in any
>> direction. If what you suggest above happens in a concise, traceable,
>> beneficial way to both projects and communities, then I'm for that.
>
> No one is circumventing any process and the implication is just wrong.  We are
> having the discussion.  But even so, as a committer, my job is to work within
> the community to fix/improve the code.  Right now, I see lots of room for
> improvement in Lucene by integrating some of those things from Solr (and
> Nutch) while keeping Lucene, Solr and Nutch whole from an end user
> perspective.  At the same time, I want to see Solr and Nutch whole.   Any
> other implication is simply wrong.

Sweeping proposals with TBDs leave room for implications. Smaller, concrete
steps do not.

>
>>
>> At the same time, I also favor insulation wherever possible and I personally
>> like the separation of the 2 projects. I have built 10s of projects that
>> have simply used Lucene as an API and had no need for Solr, and I've built
>> 10s of projects where Solr made perfect sense.
>
> And how would those 10 projects be affected at all?  Please read the
> proposal again.  It's not like there is going to be some uber JAR.

I think the point that you and some others are missing is that JARs are not
the only artifact of a system. Just as you develop in Lucene "officially" as
an ASF committer for Lucene(-java), and just as you and others develop in
Solr "officially" as Solr ASF committers, it doesn't mean others don't also
develop using the same code on their own projects. JARs are not the only
artifact that is being reused.

> I won't
> let it happen as I have more than 10 projects that are pure Lucene.  Part of
> my day job is supporting Lucene.  I've spent the past 5 years of my life
> donating to Lucene, and so have many others.  The argument is simply invalid
> and has been refuted so many times now by ALL those who actually do the work
> that I don't understand why you insist on bringing it up over and over again.

It's extremely unclear to me how you can be so sure about how something will
work when there have been many questions and the proposal itself includes
TBDs, or things that, as Yonik put it, "will be figured out later."

>
>
>> So, I appreciate their
>> separation. I also have a lot of experience in these types of situations as
>> I've been involved in 2-3 of them over the past few years at NASA in terms
>> of maintaining separation and merging projects/etc. There are quite a few
>> lessons learned that I have been trying to share but that have seemingly not
>> really been appreciated and that have been in my mind dismissed, rather than
>> discussed, through this process.
>
> I'd hardly say they've been dismissed.  This isn't about you, it's about what
> is best for the project.  You have one opinion that, on the face of the
> votes, is in the minority.

Well I'm sorry you feel that way. However, like I said it seems to be like
the discussion of the real issues is only happening recently over the past
few days. There has been a lot of effort to push this through before then.

> It doesn't make the majority right and you wrong.

Appreciate it, in fact I'm not looking to assign right and wrong.

> In fact, those in the majority are trying to answer your concerns and come up
> with a better suggestion.

The recent discussions have picked up, and I feel like we're starting to
discuss the issues, so that's a good thing.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [hidden email]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



Re: [VOTE] merge lucene/solr development (take 3)

Robert Muir
In reply to this post by Mattmann, Chris A (3010)
> There might be, but as a first start, duplication is a quick way to get
> going and experiment. As solutions that evolve over time are matured, the
> time can come for integration. Parallel tracks allows projects to move
> forward operationally, and enforces insulation, loose coupling and other
> properties.

Unfortunately, this experiment has already happened and has failed.

Instead it just creates more work, especially when it comes time for
code maintenance. This is one reason why Solr is still using
deprecated analysis APIs and one reason why they cannot use Lucene
trunk.

If there weren't so much duplication, then doing efforts such as this
would be easier, instead of having to convert two SynonymFilters we
would only have to convert one.


--
Robert Muir
[hidden email]

RE: [VOTE] merge lucene/solr development (take 3)

Uwe Schindler
In reply to this post by Yonik Seeley
Here is my vote:

+1 for the latest proposal to merge the development.

I am still against the requirement that all changes in Lucene need all tests to pass in Solr, but that can be discussed later. If a test does not pass, I would like to simply open an issue and let the Solr people fix it (this applies to all bugs in Solr's tests). Also, the releases of the two projects should not be coupled to happen at the same time. Each project should be able to release when it thinks it's time.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: Yonik Seeley [mailto:[hidden email]]
> Sent: Tuesday, March 09, 2010 3:12 AM
> To: [hidden email]
> Subject: [VOTE] merge lucene/solr development (take 3)
>
> Apologies in advance for calling yet another vote, but I just wanted
> to make sure this was official.
> Mike's second VOTE thread could probably technically stand on its own
> (since it included PMC votes), but given that I said in my previous
> VOTE thread that I was just polling Lucene/Solr committers and would
> call a second PMC vote, that may have acted to suppress PMC votes on
> Mike's thread also.
>
> Please vote for the proposal quoted below to merge lucene/solr
> development.
> Here's my +1
>
> -Yonik
>
> Mike's call for a VOTE (amongst lucene/solr committers +11 to -1):
> http://search.lucidimagination.com/search/document/a400ffe62ae21aca/vote_merge_the_development_of_solr_lucene_take_2#22d7cd086d9c5cf0
> > Subject: Merge the development of Solr/Lucene (take 2)
> > A new vote that slightly changes the proposal from the last vote (adding
> > only that Lucene can cut a release even if Solr doesn't):
> >
> >  * Merging the dev lists into a single list.
> >
> >  * Merging committers.
> >
> >  * When any change is committed (to a module that "belongs to" Solr or
> >    to Lucene), all tests must pass.
> >
> >  * Release details will be decided by dev community, but, Lucene may
> >    release without Solr.
> >
> >  * Modularize the sources: pull things out of Lucene's core (break
> >    out query parser, move all core queries & analyzers under their
> >    contrib counterparts), pull things out of Solr's core (analyzers,
> >    queries).
> >
> > These things would not change:
> >
> >  * Besides modularizing (above), the source code would remain factored
> >    into separate dirs/modules the way it is now.
> >
> >  * Issue tracking remains separate (SOLR-XXX and LUCENE-XXX
> >    issues).
> >
> >  * User's lists remain separate.
> >
> >  * Web sites remain separate.
> >
> >  * Release artifacts/jars remain separate.


Re: [VOTE] merge lucene/solr development (take 3)

Yonik Seeley
In reply to this post by Mattmann, Chris A (3010)
On Tue, Mar 9, 2010 at 11:00 AM, Mattmann, Chris A (388J)
<[hidden email]> wrote:
> However, like I said it seems to be like
> the discussion of the real issues is only happening recently over the past
> few days.

This certainly isn't new territory for lucene/solr devs though - the
issue of what belongs in Solr and what belongs in Lucene, and problems
around pulling out schema and faceting and putting it in Lucene have
come up before (also in lengthy threads).

-Yonik

Re: [VOTE] merge lucene/solr development (take 3)

Mattmann, Chris A (3010)
In reply to this post by Robert Muir
Hi Robert,

>> There might be, but as a first start, duplication is a quick way to get
>> going and experiment. As solutions that evolve over time are matured, the
>> time can come for integration. Parallel tracks allows projects to move
>> forward operationally, and enforces insulation, loose coupling and other
>> properties.
>
> Unfortunately, this experiment has already happened and has failed.
>
> Instead it just creates more work, especially when it comes time for
> code maintenance. This is one reason why Solr is still using
> deprecated analysis APIs and one reason why they cannot use Lucene
> trunk.

Can you provide more detail on how it's failed? Did it fail because Solr
wasn't able to upgrade to a newer Lucene that would fix the deprecations? If
so, what were the reasons?

>
> If there weren't so much duplication, then doing efforts such as this
> would be easier, instead of having to convert two SynonymFilters we
> would only have to convert one.

With which "hat" on? Where are you converting them? I'm guessing you mean
one for Solr and one for Lucene(-java), right?

I hear you on the duplication of patches/etc. Sadly, it's a trade-off in
software engineering: maintaining good separation of concerns provides other
benefits (identification of load-bearing walls; insulation of code changes so
that upstream or downstream providers aren't badly affected, etc.)

One potential solution to this was what was originally proposed by Mike: a
shared analyzers module that Lucene(-java), Solr, and Nutch can each choose
to depend on. That might help.

Cheers,
Chris




Re: [VOTE] merge lucene/solr development (take 3)

Grant Ingersoll-2
In reply to this post by Dennis Kubes-2

On Mar 9, 2010, at 10:59 AM, Dennis Kubes wrote:

> I agree.  Most of those things can/should be moved into Lucene.  That doesn't necessitate merging.  Separate responsibilities.
>
> > For that matter, why do we even need to have this discussion at all?
> > Most of us Solr committers are Lucene committers.  We can simply start
> > committing Solr code to Lucene such that in 6 months the whole
> > discussion is moot and the three committers on Solr who aren't Lucene
> > committers can earn their Lucene merit very quickly by patching the
> > "Solr" portion of Lucene.  We can move all the code to its
> > appropriate place, add a contrib module for the WAR stuff and the
> > response writers and voila, Solr is in Lucene, the dev mailing lists
> > have merged by the fact that Solr dev would be defunct and all of the
> > proposals in this vote are implemented simply by employing our commit
> > privileges in a concerted way.
>
> Am I reading you right?  Are you proposing a hostile takeover of the Lucene project?  Even among committers, there needs to be discussion with the community about the best path.  Or are you suggesting we bypass discussion?  I am now even more concerned that merging is not the right way to go.
>

No.  Would you please re-read it and not quote me out of context?  You left the next sentence off, which is of course the vital one.

Re: [VOTE] merge lucene/solr development (take 3)

Mattmann, Chris A (3010)
In reply to this post by Yonik Seeley
Hey Yonik,

>> However, like I said it seems to be like
>> the discussion of the real issues is only happening recently over the past
>> few days.
>
> This certainly isn't new territory for lucene/solr devs though - the
> issue of what belongs in Solr and what belongs in Lucene, and problems
> around pulling out schema and faceting and putting it in Lucene have
> come up before (also in lengthy threads).

I hear ya. I've seen some on solr-{user|dev}@ within the past few months.

Cheers,
Chris


