[VOTE] merge lucene/solr development (take 3)


Re: [VOTE] merge lucene/solr development (take 3)

Mark Miller-3
Just to clarify - reading back, I see Mike put some examples of what
could be pulled into Lucene core - I personally don't see that as a
binding part of the vote - they are what we hope to do and are examples
of things that are not controversial. I think quite obviously, anything
after the merge would be taken as we normally take it - we would make
issues, someone would have to actually do the work, etc. Mike listed
some things that make sense to go into Lucene, and I doubt anyone is
going to argue, but it would be weird to say the result of this Vote
demands we move all queries from Solr into Lucene. It will just allow us
to do so and it would make sense to do so.

Mike took that from an earlier email proposing the merge - it's not
really a "road plan" that we are voting for in terms of what goes where
- we are voting for the merge of development. What goes where should be
determined like we normally do that stuff.

- Mark

On 03/09/2010 12:26 AM, Mark Miller wrote:

> On 03/09/2010 12:14 AM, Michael Busch wrote:
>> On 3/8/10 8:24 PM, Grant Ingersoll wrote:
>>> I don't think any of it's a showstopper,
>>
>> I'm surprised here after reading the Apache voting page. This
>> proposal contains points that involve code restructurings.
>
>
> The veto is reserved for "code modifications" not reorganizations of
> development. And the veto requires a valid technical reason against a
> specific code change.
>
> Also, we have decided on no code restructurings - the hope is to allow
> them (and in the past you have championed some of the ones we hope to
> see), but there are no restructurings that are part of the vote. The
> change says nothing about what will happen regarding the code - the
> community would decide that as we go. If you have to pick one of the 3
> buckets, this is procedural.
>
> http://www.apache.org/foundation/voting.html
>
> --
> - Mark
>
>    


Re: [VOTE] merge lucene/solr development (take 3)

Mark Miller-3
In reply to this post by Mattmann, Chris A (3010)
Hey Chris,

see my response to Michael.

But quickly,

the first star is not a code change. It's procedural.

the second star, and I'm sure you'll have arguments with this :), is not
something we are specifically voting on. The reason we are merging dev
is obviously so that those changes can occur - but this vote is not to
force those changes. Even those against the merge would like to see
those changes. Putting more queries, query parsers, and analyzers into
Lucene is not a controversial change :)

On 03/09/2010 12:33 AM, Mattmann, Chris A (388J) wrote:

> On 3/8/10 9:26 PM, "Mark Miller"<[hidden email]>  wrote:
>
>    
>> Also, we have decided on no code restructurings - the hope is to allow
>> them (and in the past you have championed some of the ones we hope to
>> see), but there are no restructurings that are part of the vote.
>>      
> Ummm, that's not true.
>
> Mike's last proposal listed these points:
>
>   * When any change is committed (to a module that "belongs to" Solr or
>     to Lucene), all tests must pass.
>   * Modularize the sources: pull things out of Lucene's core (break
>     out query parser, move all core queries & analyzers under their
>     contrib counterparts), pull things out of Solr's core (analyzers,
>     queries).
>
> If those don't have to do with code changes, then I'm not sure what they are
> and would appreciate clarification.
>
> Cheers,
> Chris
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [hidden email]
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>    


--
- Mark

http://www.lucidimagination.com




Re: [VOTE] merge lucene/solr development (take 3)

Mark Miller-3
Frankly, if you guys insist, we could drop the modularize point and take
yet another vote. If that's going to be your veto toehold, we don't need
it cluttering things up. Modularizing Lucene doesn't need to be in there
(though it already is somewhat modularized, and people plan to work
further along those lines regardless of this vote). Specific things we
would like to pull from Solr into Lucene don't need to be in there. All
of a sudden I'm agreeing with Hoss about goals rather than actual steps
;) Because those points are not important to this vote at all - they are
more examples of what we will be able to do than mandates. They are the
goodness that will come, not reasons for vetoes. (nor do I agree they
fall under the "code modification veto for a valid technical reason" anyway)


This is about merging dev so we can put code where it belongs and do
things that can make sense - it's not a vote where specific code
refactorings matter at all - we don't develop and organize code with PMC
votes.

On 03/09/2010 12:40 AM, Mark Miller wrote:

> Hey Chris,
>
> see my response to Michael.
>
> But quickly,
>
> the first star is not a code change. It's procedural.
>
> the second star, and I'm sure you'll have arguments with this :), is
> not something we are specifically voting on. The reason we are merging
> dev is obviously so that those changes can occur - but this vote is
> not to force those changes. Even those against the merge would like to
> see those changes. Putting more queries, query parsers, and analyzers
> into Lucene is not a controversial change :)
>
> On 03/09/2010 12:33 AM, Mattmann, Chris A (388J) wrote:
>> On 3/8/10 9:26 PM, "Mark Miller"<[hidden email]>  wrote:
>>
>>> Also, we have decided on no code restructurings - the hope is to allow
>>> them (and in the past you have championed some of the ones we hope to
>>> see), but there are no restructurings that are part of the vote.
>> Ummm, that's not true.
>>
>> Mike's last proposal listed these points:
>>
>>   * When any change is committed (to a module that "belongs to" Solr or
>>     to Lucene), all tests must pass.
>>   * Modularize the sources: pull things out of Lucene's core (break
>>     out query parser, move all core queries & analyzers under their
>>     contrib counterparts), pull things out of Solr's core (analyzers,
>>     queries).
>>
>> If those don't have to do with code changes, then I'm not sure what
>> they are
>> and would appreciate clarification.
>>
>> Cheers,
>> Chris
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: [hidden email]
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>
>


--
- Mark

http://www.lucidimagination.com




Re: [VOTE] merge lucene/solr development (take 3)

Dennis Kubes-2
I believe this is a question of identity.  What is Lucene?

IMO Lucene is a full text search library, that is its purpose.  It
isn't trying to be a search server or a search engine.  It is easy to
include as a library and is used on everything from embedded servers to
www search engines.

Quoting from Yonik's previous posting:

 > Some in Lucene development have expressed a desire to make Lucene more
 > of a complete solution, rather than just a core full-text search
 > library... things like a data schema, faceting, etc.  The Lucene
 > project already has an enterprise search platform with these
 > features... that's Solr.

So is Lucene a full text search library or is it something different?
And isn't that something different already Solr?  Why should they be the
same thing when their goals aren't the same?

 > Trying to pull popular pieces out of Solr
 > makes life harder for Solr developers, brings our projects into
 > conflict, and is often unsuccessful (witness the largely failed
 > migration of FunctionQueries from Solr to Lucene).

I feel for you, really.  I remember trying to develop in Nutch on Hadoop
0.04.  But the logic is not correct.  Just because Solr wants X feature
and Solr uses Lucene != everyone who uses Lucene wants X.  Faceting for
example, great feature, but not useful in every full text search.

 > For Lucene to achieve the ultimate in usability for users, it can't
 > require Java experience... it needs higher level abstractions provided
 > by Solr.

I don't believe this to be true.  If the Lucene community had wanted
very general language agnostic search, it would have happened by now.
Lucene is a Java API.  Solr on the other hand is a server and therefore
should be language agnostic.

 > The other benefit to Lucene would be to bring features to developers
 > much sooner... Solr has had features years before they were developed
 > in Lucene, and currently has more developers working with it.

"We have more developers than you do" isn't a valid reason to merge,
especially in open source software.  Maybe in the corporate world.  IMO
if Solr has more developers and want some architecture changed in Lucene
and it is to the benefit of the entire Lucene community, then those
changes can be proposed and voted upon.

 > Esp with Solr not using Lucene trunk, if a Solr developer wants a
 > feature quickly, they cannot add it to Lucene (even if it might make
 > sense there) since that introduces a big unpredictable lag

Solr has the option of not using Lucene.  If something needs to go into
Lucene, it should be voted on and support all of the different uses for
Lucene.  As a friend told me recently, specialization is for insects.

 > 1) Solr would go back to using Lucene's trunk

Use trunk, don't use trunk.  That is up to the Solr project.  It
shouldn't influence Lucene's behavior.

 > 2) For new Solr features, there would be an effort to abstract it such
 > that non-Solr users could use the functionality (faceting, field
 > collapsing, etc)

Can you say that every feature would be applicable to a full text search
library?  If not, then it is beyond the core responsibilities of Lucene.

 > 3) For new Lucene features, there would be an effort to integrate it
 > into Solr.

No.  Because by specializing towards Solr, or Nutch, or any of the
hundred other applications that use Lucene, it loses its general
applicability.  Where would Hadoop be if it never made it past Nutch?

 > 4) Releases would be synchronized... Lucene and Solr would release at
 > the same time.

So synchronize your releases.  Communicate.

I am open to listening to your responses, but all of this is to say my
vote is still currently -1.

Dennis

Re: [VOTE] merge lucene/solr development (take 3)

Ted Dunning
There are scads of features of Lucene that are not useful for all
applications (payloads, for one example, back compatibility for another).

The point is that the option to use faceting or not would be *very* useful
for all search applications.  Power is good, especially since somebody else
has done the work already.

On Mon, Mar 8, 2010 at 10:10 PM, Dennis Kubes <[hidden email]> wrote:

> Faceting for example, great feature, but not useful in every full text
> search.
>

Re: [VOTE] merge lucene/solr development (take 3)

Ted Dunning
In reply to this post by Dennis Kubes-2
This logic escapes me.

Nutch hatched Hadoop.  Hadoop was perceived to be of much broader utility
than just for nutch so it was made more general and a separate project was
formed.  Hadoop does not depend on Nutch.

Lucene existed.  Solr was built to make it easier to use Lucene.  The
developers of Solr built a bunch of stuff that was specific to server-ness
and a bunch of stuff that would have general utility to many Lucene
developers.  Solr depends critically on Lucene and can be seen as a Lucene
wrapper.

How does this analogy fit together?  Is it supposed to be Hadoop is to Nutch
as Solr is to Lucene?  That seems so clearly wrong it can't be what you were
saying.

On Mon, Mar 8, 2010 at 10:10 PM, Dennis Kubes <[hidden email]> wrote:

> > 3) For new Lucene features, there would be an effort to integrate it
> > into Solr.
>
> No.  Because by specializing towards Solr, or Nutch, or any of the hundred
> other applications that use Lucene, it loses its general applicability.
>  Where would Hadoop be if it never made it past Nutch?
>

Re: [VOTE] merge lucene/solr development (take 3)

Shalin Shekhar Mangar
In reply to this post by Yonik Seeley
On Tue, Mar 9, 2010 at 7:41 AM, Yonik Seeley <[hidden email]> wrote:

>
> Mike's call for a VOTE (amongst lucene/solr committers +11 to -1):
>
> http://search.lucidimagination.com/search/document/a400ffe62ae21aca/vote_merge_the_development_of_solr_lucene_take_2#22d7cd086d9c5cf0
> > Subject: Merge the development of Solr/Lucene (take 2)
> > A new vote, that slightly changes proposal from last vote (adding only
> > that Lucene can cut a release even if Solr doesn't):
> >
> >  * Merging the dev lists into a single list.
> >
> >  * Merging committers.
> >
> >  * When any change is committed (to a module that "belongs to" Solr or
> >    to Lucene), all tests must pass.
> >
> >  * Release details will be decided by dev community, but, Lucene may
> >    release without Solr.
> >
> >  * Modularize the sources: pull things out of Lucene's core (break
> >    out query parser, move all core queries & analyzers under their
> >    contrib counterparts), pull things out of Solr's core (analyzers,
> >    queries).
> >
> > These things would not change:
> >
> >  * Besides modularizing (above), the source code would remain factored
> >    into separate dirs/modules the way it is now.
> >
> >  * Issue tracking remains separate (SOLR-XXX and LUCENE-XXX
> >    issues).
> >
> >  * User's lists remain separate.
> >
> >  * Web sites remain separate.
> >
> >  * Release artifacts/jars remain separate.
>

+1

I think that, in the long term, this move will prove beneficial for both the
projects.

--
Regards,
Shalin Shekhar Mangar.

Re: [VOTE] merge lucene/solr development (take 3)

Michael McCandless-2
In reply to this post by Yonik Seeley
+1

Mike

On Mon, Mar 8, 2010 at 9:11 PM, Yonik Seeley <[hidden email]> wrote:

> Apologies in advance for calling yet another vote, but I just wanted
> to make sure this was official.
> Mike's second VOTE thread could probably technically stand on its own
> (since it included PMC votes), but given that I said in my previous
> VOTE thread that I was just polling Lucene/Solr committers and would
> call a second PMC vote, that may have acted to suppress PMC votes on
> Mike's thread also.
>
> Please vote for the proposal quoted below to merge lucene/solr development.
> Here's my +1
>
> -Yonik
>
> Mike's call for a VOTE (amongst lucene/solr committers +11 to -1):
> http://search.lucidimagination.com/search/document/a400ffe62ae21aca/vote_merge_the_development_of_solr_lucene_take_2#22d7cd086d9c5cf0
>> Subject: Merge the development of Solr/Lucene (take 2)
>> A new vote, that slightly changes proposal from last vote (adding only
>> that Lucene can cut a release even if Solr doesn't):
>>
>>  * Merging the dev lists into a single list.
>>
>>  * Merging committers.
>>
>>  * When any change is committed (to a module that "belongs to" Solr or
>>    to Lucene), all tests must pass.
>>
>>  * Release details will be decided by dev community, but, Lucene may
>>    release without Solr.
>>
>>  * Modularize the sources: pull things out of Lucene's core (break
>>    out query parser, move all core queries & analyzers under their
>>    contrib counterparts), pull things out of Solr's core (analyzers,
>>    queries).
>>
>> These things would not change:
>>
>>  * Besides modularizing (above), the source code would remain factored
>>    into separate dirs/modules the way it is now.
>>
>>  * Issue tracking remains separate (SOLR-XXX and LUCENE-XXX
>>    issues).
>>
>>  * User's lists remain separate.
>>
>>  * Web sites remain separate.
>>
>>  * Release artifacts/jars remain separate.
>

Re: [VOTE] merge lucene/solr development (take 3) - can we take a pause?

Ian Holsman (Lists)
In reply to this post by Ted Dunning
guys.. there is a lot of discussion going on,
and an awful lot of voting.
As an interested onlooker, I am really confused about what exactly
the vote is for.. there are so many interjections and clarifications
that I'm not sure what is going on.

can we stop calling for a vote for say 72 hours or a week, and just
discuss it a bit, get the proposal on what you want to be done clarified
and then call for a vote?

it's not like there is a house on fire, and this really seems like it is
getting pushed.


On 3/9/10 5:46 PM, Ted Dunning wrote:

> There are scads of features of Lucene that are not useful for all
> applications (payloads, for one example, back compatibility for another).
>
> The point is that the option to use faceting or not would be *very* useful
> for all search applications.  Power is good, especially since somebody else
> has done the work already.
>
> On Mon, Mar 8, 2010 at 10:10 PM, Dennis Kubes<[hidden email]>  wrote:
>
>    
>> Faceting for example, great feature, but not useful in every full text
>> search.
>>
>>      
>    


Re: [VOTE] merge lucene/solr development (take 3)

Jukka Zitting
In reply to this post by Yonik Seeley
Hi,

On Tue, Mar 9, 2010 at 3:11 AM, Yonik Seeley <[hidden email]> wrote:
> Please vote for the proposal quoted below to merge lucene/solr development.

+0 with my PMC member hat on, as I think this matter is up to the
Lucene and Solr developers to decide.

That said, I generally think having multiple distinct development
communities under one PMC is a bit troublesome (as we're seeing in
this discussion), so consolidating the community seems like a good
direction given that the technical synergies are there. On the other
hand I share Chris' concern about the massive scope of this vote. All
the proposed changes could IMHO just as well be handled as separate
and more easily reversible steps.

BR,

Jukka Zitting

Re: [VOTE] merge lucene/solr development (take 3)

Andrzej Białecki-2
In reply to this post by Grant Ingersoll-2
On 2010-03-09 05:24, Grant Ingersoll wrote:

> In the end, for me anyway, the current separation hurts Lucene a good
> deal as much as it hurts Solr, if not more.  Likewise, I wish some of
> the Nutch committers would speak up, as I'm sure there are some
> pieces of Nutch that are "core" too, but for a lack of visibility
> down lower in Lucene committer wise, especially as Nutch is looking
> to refactor into more components.  Obviously not the crawling stuff,
> but perhaps some of Nutch's analyzer and low level Lucene stuff would
> make sense to be pushed lower in the stack.

With my Nutch hat on, I'm -0 to this current vote.

If the primary devs really insist on going this way, so be it, but I
think that long-term it brings more challenges than it solves, among
them the danger that Lucene ceases to be a general purpose Java search
library (where being Java-centric is nothing wrong) and caters too much
to Solr's needs at the expense of other projects.

Re: Nutch components - those that are reusable in Lucene or Solr
contexts eventually find their way to respective projects, witness e.g.
CommonGrams. Other stuff makes sense only in Nutch and it would be a
mistake to push it by force to become e.g. a contrib module in Lucene if
it's not applicable to a majority of Lucene community. Refactoring to
increase reuse doesn't mean we have to merge Nutch with Lucene, it's
just a cleaner separation of concerns. Anyway, that's not the topic of
the current vote.

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: [VOTE] merge lucene/solr development (take 3)

Michael McCandless-2
On Tue, Mar 9, 2010 at 5:10 AM, Andrzej Bialecki <[hidden email]> wrote:

> Re: Nutch components - those that are reusable in Lucene or Solr
> contexts eventually find their way to respective projects, witness
> e.g. CommonGrams.

In fact I think this is a great example -- as far as I can tell,
CommonGrams was poached from Nutch, into Solr, and then was
nurtured/improved in both projects separately right?

So.... can/should we freely poach across all our sub projects?

It has obvious downsides (it's essentially a fork that will confuse
those users that use both Solr & Lucene, in the short term, until
things "stabilize" into a clean refactoring; it's double the dev; we
must re-sync with time; etc.).

But it has a massive upside: it means we don't rely only on "push"
(Solr devs to push into Lucene or vice/versa).  We can also use "pull"
(Lucene devs can pull pieces from Nutch/Solr into Lucene).  It becomes
a 2-way street for "properly" factoring our shared code with time.

If we had that freedom ("poaching is perfectly fine"), then,
interested devs could freely "refactor" across sub projects.

Not having this freedom today, and not having merged dev, is stunting
both Solr & Lucene's growth.

Mike

Re: [VOTE] merge lucene/solr development (take 3)

Andrzej Białecki-2
On 2010-03-09 11:40, Michael McCandless wrote:
> On Tue, Mar 9, 2010 at 5:10 AM, Andrzej Bialecki<[hidden email]>  wrote:
>
>> Re: Nutch components - those that are reusable in Lucene or Solr
>> contexts eventually find their way to respective projects, witness
>> e.g. CommonGrams.
>
> In fact I think this is a great example -- as far as I can tell,
> CommonGrams was poached from Nutch, into Solr, and then was
> nurtured/improved in both projects separately right?

Right. In fact, Nutch would like to eventually delegate indexing solely
to Solr, at which point we will reuse the CommonGrams from Solr.

>
> So.... can/should we freely poach across all our sub projects?

In my opinion: with proper attribution, by all means!

>
> It has obvious downsides (it's essentially a fork that will confuse
> those users that use both Solr & Lucene, in the short term, until
> things "stabilize" into a clean refactoring; it's double the dev; we
> must re-sync with time; etc.).
>
> But it has a massive upside: it means we don't rely only on "push"
> (Solr devs to push into Lucene or vice/versa).  We can also use "pull"
> (Lucene devs can pull pieces from Nutch/Solr into Lucene).  It becomes
> a 2-way street for "properly" factoring our shared code with time.
>
> If we had that freedom ("poaching is perfectly fine"), then,
> interested devs could freely "refactor" across sub projects.
>
> Not having this freedom today, and not having merged dev, is stunting
> both Solr & Lucene's growth.

Erhm.. don't we have this freedom already??? Another example is
TimeLimitedCollector - poaching _is_ perfectly fine as far as I'm
concerned. All projects are under the same license, often also share the
same people, so I see no reason not to share freely where it makes sense
from technical POV, though we may sometimes succumb to the NIH syndrome ;)

This push/pull between the projects reminds me of discussions with my
clients when I try to convince them to open-source some generic
functionality: long-term you can only benefit greatly from getting rid
of generic code, if there's a lively community with focus just on that
functionality - you don't have to maintain it and you reap the benefits
of external development, and you can focus on developing of what's
unique to your project.

So, I'm all for the poaching ;) but IMHO this doesn't necessitate the
merge, just refactoring, push/pull and the right mindset.
--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: [VOTE] merge lucene/solr development (take 3)

Michael McCandless-2
On Tue, Mar 9, 2010 at 6:09 AM, Andrzej Bialecki <[hidden email]> wrote:
> On 2010-03-09 11:40, Michael McCandless wrote:
>>
>> On Tue, Mar 9, 2010 at 5:10 AM, Andrzej Bialecki<[hidden email]>  wrote:
>>
>> So.... can/should we freely poach across all our sub projects?
>
> In my opinion: with proper attribution, by all means!

+1

I think shifting to this ("poaching is fine") would be healthy for all
subs.  Pull & push would let code flow freely two-way across the
projects.

>>> Re: Nutch components - those that are reusable in Lucene or Solr
>>> contexts eventually find their way to respective projects, witness
>>> e.g. CommonGrams.
>>
>> In fact I think this is a great example -- as far as I can tell,
>> CommonGrams was poached from Nutch, into Solr, and then was
>> nurtured/improved in both projects separately right?
>
> Right. In fact, Nutch would like to eventually delegate indexing solely to
> Solr, at which point we will reuse the CommonGrams from Solr.

Exactly -- this is how the refactoring would play out with time.
Begins with poaching but eventually winds up with a single source
again (just "moved" from one sub to another).

Ie, if Lucene poached all Solr analyzers, as well as its own core
analyzers, moving all analyzers into contrib/analyzers, then
eventually Solr/Nutch would just use Lucene's contrib/analyzers as the
single source.

Others (whoever has the itch/time) can poach function queries, facets,
etc.

The freedom to poach gives us a powerful push AND pull tool to make
this refactoring gradually over time.

Something interesting can be born in Solr (just because that's where
the itch first arrived) and then freely poached & refactored later by
someone wearing a Lucene dev hat.

> So, I'm all for the poaching ;) but IMHO this doesn't necessitate the merge,
> just refactoring, push/pull and the right mindset.

In fact in my opinion, if we are free to poach across all subs, we
don't need to merge -- poaching solves the primary pain I feel with
our subs now (splintering of code/dev across the subs,
preventing/frustrating people like Robert who put in tons of effort to
improve our analyzers only to see patches languish).

Mike

Re: [VOTE] merge lucene/solr development (take 3)

Grant Ingersoll-2
In reply to this post by Michael McCandless-2

On Mar 9, 2010, at 5:40 AM, Michael McCandless wrote:

> On Tue, Mar 9, 2010 at 5:10 AM, Andrzej Bialecki <[hidden email]> wrote:
>
>> Re: Nutch components - those that are reusable in Lucene or Solr
>> contexts eventually find their way to respective projects, witness
>> e.g. CommonGrams.
>
> In fact I think this is a great example -- as far as I can tell,
> CommonGrams was poached from Nutch, into Solr, and then was
> nurtured/improved in both projects separately right?
>
> So.... can/should we freely poach across all our sub projects?
>
> It has obvious downsides (it's essentially a fork that will confuse
> those users that use both Solr & Lucene, in the short term, until
> things "stabilize" into a clean refactoring; it's double the dev; we
> must re-sync with time; etc.).
>
> But it has a massive upside: it means we don't rely only on "push"
> (Solr devs to push into Lucene or vice/versa).  We can also use "pull"
> (Lucene devs can pull pieces from Nutch/Solr into Lucene).  It becomes
> a 2-way street for "properly" factoring our shared code with time.
>
> If we had that freedom ("poaching is perfectly fine"), then,
> interested devs could freely "refactor" across sub projects.
>

As someone who works on both, I don't think it is fine.  Just look at the function query mess.  Just look at the version mess.  It's very frustrating as a developer and it makes me choose between two projects that I happen to like equally, but for different reasons.  If I worked on Nutch, I would feel the same way.

Also, I do look at Solr/Lucene differently.  There is almost complete overlap in the committer base.  Nutch is not that way, nor is any other project.  I simply don't think Lucene will end up being geared toward Solr because there are so many users of Lucene here they will prevent that from happening.  

-Grant

Re: [VOTE] merge lucene/solr development (take 3)

Andrzej Białecki-2
On 2010-03-09 13:21, Grant Ingersoll wrote:
>
> On Mar 9, 2010, at 5:40 AM, Michael McCandless wrote:

>> If we had that freedom ("poaching is perfectly fine"), then,
>> interested devs could freely "refactor" across sub projects.
>>
>
> As someone who works on both, I don't think it is fine.  Just look at
> the function query mess.  Just look at the version mess.  It's very
> frustrating as a developer and it makes me choose between two
> projects that I happen to like equally, but for different reasons.
> If I worked on Nutch, I would feel the same way.

The mess happened afaik due to a lack of communication and NIH. It's
true that if Solr had been merged with Lucene then only one version
would have won. This may still happen with enough cooperation on both
sides, even without the merge.

Anyway, my vote hovers near 0 either way, as you said it's different for
Nutch.

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: [VOTE] merge lucene/solr development (take 3)

Michael McCandless-2
In reply to this post by Grant Ingersoll-2
On Tue, Mar 9, 2010 at 7:21 AM, Grant Ingersoll <[hidden email]> wrote:

>> If we had that freedom ("poaching is perfectly fine"), then,
>> interested devs could freely "refactor" across sub projects.
>
> As someone who works on both, I don't think it is fine.  Just look at the function query mess.  Just look at the version mess.  It's very frustrating as a developer and it makes me choose between two projects that I happen to like equally, but for different reasons.  If I worked on Nutch, I would feel the same way.

But... Lucene should poach from external (eg non-Apache) projects, if
the license works?

Ie if some great analyzer is out there, and Robert spots it, and the
license works, we should poach it?  (In fact he just did this w/
Andrzej's Polish stemmer ;) ).

So we have something of a double standard...

And, ironically, I think it's the fact that there's so much committer
overlap between Solr and Lucene that is causing this antagonism
towards poaching.

When in fact I think poaching, at a wider scale (across unrelated
projects) is a very useful means for any healthy open source software
to evolve.

Why should Lucene be prevented from having a useful feature just
because Solr happened to create it first?

Mike

Re: [VOTE] merge lucene/solr development (take 3)

Grant Ingersoll-2

On Mar 9, 2010, at 8:21 AM, Michael McCandless wrote:

> On Tue, Mar 9, 2010 at 7:21 AM, Grant Ingersoll <[hidden email]> wrote:
>
>>> If we had that freedom ("poaching is perfectly fine"), then,
>>> interested devs could freely "refactor" across sub projects.
>>
>> As someone who works on both, I don't think it is fine.  Just look at the function query mess.  Just look at the version mess.  It's very frustrating as a developer and it makes me choose between two projects that I happen to like equally, but for different reasons.  If I worked on Nutch, I would feel the same way.
>
> But... Lucene should poach from external (eg non-Apache) projects, if
> the license works?
>
> Ie if some great analyzer is out there, and Robert spots it, and the
> license works, we should poach it?  (In fact he just did this w/
> Andrzej's Polish stemmer ;) ).

I'd prefer "donate" to "poach", but I realize that isn't always the case.


>
> So we have something of a double standard...
>
> And, ironically, I think it's the fact that there's so much committer
> overlap between Solr and Lucene that is causing this antagonism
> towards poaching.
>
> When in fact I think poaching, at a wider scale (across unrelated
> projects) is a very useful means for any healthy open source software
> to evolve.
>
> Why should Lucene be prevented from having a useful feature just
> because Solr happened to create it first?

But why should I be forced to maintain two versions due to some arbitrary code separation?  And why should you force a good chunk of us to do a whole lot of extra work simply because of some arbitrary code separation?  Here, it is the Lucene PMC that releases code and it is just silly that with all of this overlap at the committer level we still have this duplication.   I can't speak for the external projects (I don't believe any of them have even responded here other than Jackrabbit), but if they don't like it, they should get more involved in the community and work to be committers.  

At any rate, this is exactly why merging makes sense.  You would no longer have this issue of "first".  I would no longer have to choose where to add my spatial work based on some arbitrary line that someone drew in the sand that isn't all that pertinent anymore given the desires of most in the community to blur that line.  It would be available to everyone.

For that matter, why do we even need to have this discussion at all?  Most of us Solr committers are Lucene committers.  We can simply start committing Solr code to Lucene such that in 6 months the whole discussion is moot and the three committers on Solr who aren't Lucene committers can earn their Lucene merit very quickly by patching the "Solr" portion of Lucene.  We can move all the code to its appropriate place, add a contrib module for the WAR stuff and the response writers and voila, Solr is in Lucene, the dev mailing lists have merged by the fact that Solr dev would be defunct and all of the proposals in this vote are implemented simply by employing our commit privileges in a concerted way.  Yet, somehow, methinks that isn't a good solution either, right?  Yet it is perfectly "legal" and is just as valid a solution as the "poaching" solution and in a lot of ways seems to be what Chris is proposing.

-Grant

Re: [VOTE] merge lucene/solr development (take 3)

Mattmann, Chris A (3010)
In reply to this post by Michael McCandless-2
Hi Mike,

>> As someone who works on both, I don't think it is fine.  Just look at the
>> function query mess.  Just look at the version mess.  It's very frustrating
>> as a developer and it makes me choose between two projects that I happen to
>> like equally, but for different reasons.  If I worked on Nutch, I would feel
>> the same way.
>
> But... Lucene should poach from external (eg non-Apache) projects, if
> the license works?
>
> Ie if some great analyzer is out there, and Robert spots it, and the
> license works, we should poach it?  (In fact he just did this w/
> Andrzej's Polish stemmer ;) ).

Yep. This is what I was talking about before when I was talking about
"insulation". Code duplication is a fact of software development, and
happens all the time in open source, ROTS, GOTS, OTS,
research/academia/whatever. It doesn't suffice to say it's bad in all cases,
nor is it always good either.

In this case, it maintains the separation between projects that are really
layered on top of one another (Lucene being the lower layer, and Solr being
the higher).

In addition, FWIW, I agree with Andrzej that to the best of my knowledge,
there is nothing wrong with doing so at the ASF, with proper attribution and
so long as the licenses are compatible.

>
> So we have something of a double standard...

Yep.

>
> And, ironically, I think it's the fact that there's so much committer
> overlap between Solr and Lucene that is causing this antagonism
> towards poaching.
>
> When in fact I think poaching, at a wider scale (across unrelated
> projects) is a very useful means for any healthy open source software
> to evolve.

Agreed. It allows sound innovative technology infusion and solutions to
develop over time, and then be integrated back into the operational fray
with reduced risk and cost.

>
> Why should Lucene be prevented from having a useful feature just
> because Solr happened to create it first?

IMO, it shouldn't.

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [hidden email]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



Re: [VOTE] merge lucene/solr development (take 3)

Yonik Seeley
In reply to this post by Grant Ingersoll-2
I think the problem is political - and that leads to both technical
and political problems.
We came up with a largely political solution that should solve both.

We can't have a one way street of pulling everything interesting out
of Solr for Lucene, or poaching, or expanding Lucene's domain while
shrinking Solr's (just limit to "server stuff", etc).  Lucene and Solr
committers are headed down the road toward greater competition - but
with this proposal, we said we'd rather work together instead.

-Yonik