[DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Dawid Weiss-2
Dear Lucene and Solr developers!

A few days ago, I initiated a discussion among PMC members about
potential pros and cons of splitting the project into separate Lucene
and Solr entities by promoting Solr to its own top-level Apache
project (TLP). Let me share with you the motivation for such an action
and some follow-up thoughts I heard from other PMC members so far.

Please read this e-mail carefully. Both the PMC and I look forward to
hearing your opinion. This is a DISCUSS thread and it will be followed
next week by a VOTE thread. This is our shared project and we should
all shape its future responsibly.

The big question is this: “Is this the right time to split Solr and
Lucene into two independent projects?”.

Here are several technical considerations that drove me to ask the
question above (in no order of priorities):

1) Precommit/test times. These are crazy high. If we split into two
projects we can pretty much cut all of Lucene testing out of Solr (and
vice versa), making development a bit more fun again.

2) Build system itself and source release packaging. The current
combined codebase is a *beast* to maintain. Working with gradle on
both projects at once made me realise how little the two have in
common. The code layout, the dependencies, even the workflow of people
working on these projects... The build (both ant and gradle) is full
of Solr and Lucene-specific exceptions and hooks that could be more
elegantly solved if moved to each project independently.

3) Packaging. There is no single source distribution package for
Solr+Lucene. They are already "independent" there. Why should Lucene
and Solr always be released at the same pace? Does it always make
sense?

4) Solr is essentially taking in Lucene and its dependencies as a
whole (as are Elasticsearch and many other projects). In my opinion
this makes Lucene eligible for refactoring and maintenance as a
separate component. The learning curve for people
coming to each project separately is going to be gentler than trying
to dive into the combined codebase.

5) Mailing lists, build servers. Mailing lists for users are already
separated. I think this is yet another indication that Solr is
something more than a component within Lucene. It is perceived as an
independent entity and used as an independent product. I would really
like to have separate mailing lists for these two projects (this
includes build and test results) as it would make life easier: if your
focus is more on Lucene (or Solr), you would only need to track half
of the current traffic.


As I already mentioned, the discussion among PMC members highlighted
some initial concerns and reasons why the project should perhaps
remain glued together. These are outlined below with some of the
counter-arguments presented under each concern to avoid repetition of
the same content from the PMC mailing list (they’re copied from the
private discussion list).

1) Both projects may gradually go their separate ways after the split
and even develop “against” each other, as they did before the
merge.

Whether this is a legitimate concern is hard to tell. If Solr goes TLP
then all existing Lucene committers will automatically become Solr
committers (unless they opt not to) so there will be both procedural
ways to prevent this from happening (vetoes) as well as common-sense
reasons to just cooperate.

2) Some people like parallel version numbering (concurrent Solr and
Lucene releases) as it gives instant clarity which Solr version uses
which version of Lucene.

This can still be done on the Solr side (it is Solr’s decision to adopt
any versioning scheme the project feels comfortable with). I
personally (DW) think this kind of versioning is actually more
confusing than helpful; Solr should have its own cadence of releases
driven by features, not sub-component changes. If the “backwards
compatibility” is a factor then a solution might be to sync on major
version releases only (e.g., this is how Elasticsearch is handling
this).

3) Solr tests are the first “battlefield” test zone for Lucene changes
- if Solr becomes a TLP this part will be gone.

Yes, true. But realistically Solr will have to adopt some kind of
snapshot-based dependency on Lucene anyway (whether as a git submodule
or a maven snapshot dependency). So if there are bugs in Lucene they
will still be detected by Solr tests (and fairly early).

4) Why split now if we merged in the first place?

Some of you may wonder why we should split a project that was initially
*merged* from two independent codebases (around 10 years ago). In
short, there was a lot of code duplication and interaction between
Solr and Lucene back then, with patches flying back and forth.
Integration into a single codebase seemed like a great idea to clean
things up and make things easier. In many ways this is exactly what
did happen: we have cleaned up code dependencies and reusable
components (on Lucene side) consumed by not just Solr but also other
projects (downstream from Lucene).

The situation we find ourselves in now is different from what it was
before: recent and ongoing development for the most part falls within
Solr or Lucene exclusively.


This e-mail is for discussing the idea and presenting arguments and
counter-arguments for or against the split. It will be followed by a
separate VOTE thread e-mail next Monday. If the vote passes then there
are many questions about how this process should be arranged and
orchestrated. There are past examples even within Lucene [1] that we
can learn from, and there are people who know how to do it. The
actual process is of lesser concern at the moment; what we mostly want
to do is to reach out to you, signal the idea and ask for your
opinion. Let us know what you think.

[1] https://lists.apache.org/thread.html/15bf2dc6d6ccd25459f8a43f0122751eedd3834caa31705f790844d7%401270142638%40%3Cuser.nutch.apache.org%3E

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Michael Sokolov-4
I always like to look at data when making a big decision, so I
gathered some statistics about authors and commits to git over the
history of the project. I wanted to see what these statistics could
tell us about the degree of overlap between the two projects and
whether it has changed over time. Using commands like

     git log --pretty=%an --since=2012 -- lucene
     git log --pretty=%an --since=2012 -- solr

I looked at the authors of commits in the lucene and solr top-level
folders of the project. I think this makes a reasonable proxy for
contributors to the two projects. From there I found that since 2012,
there are 60 Lucene-only authors, 71 Solr-only authors, and 101
authors (or 43%) contributing at least one commit to each project.
Since 2018, the percentage of both-project authors is somewhat lower:
36%.
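
For anyone who wants to reproduce this kind of tally, here is a minimal shell sketch. It assumes the two author lists have already been saved to files, one name per line; the `author_overlap` helper and the file names in the note below are hypothetical and are not something used in this thread. The share of both-project authors is reported as an integer percentage, truncated rather than rounded.

```shell
#!/bin/sh
# author_overlap FIRST_FILE SECOND_FILE
# Each file holds one author name per line; duplicates are allowed.
# Prints how many distinct names appear only in the first file, only in
# the second file, and in both, plus the truncated share of "both".
author_overlap() {
  tmp_a=$(mktemp)
  tmp_b=$(mktemp)
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C.
  LC_ALL=C sort -u "$1" > "$tmp_a"
  LC_ALL=C sort -u "$2" > "$tmp_b"
  only_a=$(LC_ALL=C comm -23 "$tmp_a" "$tmp_b" | wc -l | tr -d ' ')
  only_b=$(LC_ALL=C comm -13 "$tmp_a" "$tmp_b" | wc -l | tr -d ' ')
  both=$(LC_ALL=C comm -12 "$tmp_a" "$tmp_b" | wc -l | tr -d ' ')
  rm -f "$tmp_a" "$tmp_b"
  total=$((only_a + only_b + both))
  pct=0
  if [ "$total" -gt 0 ]; then
    pct=$((100 * both / total))
  fi
  echo "first-only=$only_a second-only=$only_b both=$both both-pct=$pct"
}
```

The input files could be produced with something like `git log --pretty=%an --since=2012-01-01 -- lucene > lucene-authors.txt` (and likewise for `solr`), followed by `author_overlap lucene-authors.txt solr-authors.txt`. With the counts quoted above (60, 71 and 101), the share works out to 101/232, or about 43%.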

I also looked at commits spanning both projects. I'm not sure this
captures all the work that touches both projects, but it's a window
into that, at least. I found that since 2012, 1387/19063 (6.8%) of
commits spanned both project folders. Since 2018, 7.4% did.
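
One way to count commits that touch both folders is sketched below, under the assumption that commit records arrive in the shape produced by `git log --format='@%H' --name-only`: a marker line per commit, followed by the paths it changed. The `count_spanning` helper and the `@` marker are illustrative choices, not necessarily the method behind the numbers above.

```shell
#!/bin/sh
# count_spanning: reads commit records on standard input.
# A line starting with "@" opens a new commit; the lines after it are the
# paths changed by that commit. Prints the number of commits that changed
# paths under both lucene/ and solr/, and the number of commits read.
count_spanning() {
  awk '
    # Close out the commit being read: count it, and count it as spanning
    # if it changed paths under both top-level folders.
    function tally() { if (in_commit) { total++; if (l && s) both++ } }
    /^@/        { tally(); in_commit = 1; l = 0; s = 0; next }
    /^lucene\// { l = 1 }
    /^solr\//   { s = 1 }
    END         { tally(); printf "spanning=%d total=%d\n", both, total }
  '
}
```

A possible invocation is `git log --no-merges --since=2012-01-01 --format='@%H' --name-only | count_spanning`. Merge commits list no paths by default, so they are left out here. Note that the total in this sketch also includes commits that touch neither folder, which may differ from the denominator used above.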

I don't think you can really draw very many meaningful conclusions
from this, but a few things jump out: First, it is clear that these
projects are not completely separate today. A substantial number of
people commit to both, over time, although most people do not. Also,
relatively few commits span both projects. Some do though, and it's
certainly worth considering what the workflow for such changes would
be like in the split world. Maybe a majority of these are
build-related; it's hard to tell from this coarse analysis.


On Mon, May 4, 2020 at 5:11 AM Dawid Weiss <[hidden email]> wrote:

> [...]

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Shai Erera
Interesting data, Michael. I am not sure, though, that the shared commits tell us that there are people who contribute to both projects. Eventually, an API change/update in Lucene will require a change in Solr (but not vice versa). Those commits will still occur in both projects; on the Solr side, though, they will only occur when Solr upgrades to the respective Lucene version.

I wonder if we can tell, out of the shared commits, how many started in Lucene and ended in Solr because of the shared build (i.e. an API change required Solr code changes for the build to pass), vs how many started in Solr, and ended in Lucene because a core change was needed to support the Solr feature/update. The first case does not indicate, IMO, a shared contribution (whoever changes a Lucene API will not then go and update Solr and Elasticsearch if the projects were split), while the second case is a stronger indication of a shared contribution.

Maybe if we could "label" committers as mostly Lucene/Solr, we could tell more about the shared commits?

Anyway, data is good, I agree.

Shai

On Mon, May 4, 2020 at 5:49 PM Michael Sokolov <[hidden email]> wrote:
> [...]

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Mike Drob-3
In reply to this post by Michael Sokolov-4
This is an interesting approach, Michael. I took it a bit further by excluding all authors with only a single commit[1], since I think GitHub PRs tend to highlight that kind of contribution more. Since 2012 I found 24 Lucene-only, 31 Solr-only, and 77 (about 58%) contributing to both. Since 2018, excluding authors with a single commit, the number went down to 51% of the authors with commits to both projects. Still, I think that speaks to a very high degree of collaboration.


Dawid, thank you for putting this together. It has obviously been carefully thought over, and there's a lot of content, so I'm not going to try to comment on everything, but will highlight a few things that caught my attention.


> This is a DISCUSS thread and it will be followed next week by a VOTE thread. 
This sounds like a decision has already been made. Additionally, all of the counterarguments presented come with rebuttals attached, so I'm not sure if this is supposed to be a persuasive case or an expositional one.
I think I have an initial reaction that I'm opposed to a split, but I'm not yet concretely sure why.

> Precommit/ test times. These are crazy high.
This seems like an argument for fixing the tests and making them faster; I'm not sure how we get to splitting the projects from here. If you're doing Solr-only changes, it's pretty easy to run "./gradlew -p solr test" and skip the Lucene tests, and similarly for Lucene-only development.

> Mailing lists, build servers
This is probably a good idea and I think this is easy enough to do without splitting the project as well.

> Solr should have its own cadence of releases driven by features, not sub-component changes
Yea, I think this is very likely to happen, where new Lucene versions may not immediately get integrated into the next Solr version, or perhaps not at all, unless somebody is specifically interested in a feature that it offers. I think developers are busy, and incrementing a dependency version is not something that happens unless there is a tangible reason. Which leads directly into the next point...

> Solr tests are the first “battlefield” test zone for Lucene changes
I think https://issues.apache.org/jira/browse/SOLR-14428 is a great example of the kind of collaboration that we can see, and a good hint of what to expect if the projects are split. To summarize, there was a Lucene change which caused some issues in Solr. The fix is likely going to end up being another Lucene change, but just as easily could have been a kind of ugly workaround on the Solr side.

I think the points and counterpoints are essentially correct, but the opening statement appears to undersell the counterarguments as a matter of degree, in my view. I'll continue to think on this, and post more as ideas solidify in my head.

[1]: git shortlog -s -n --since=2018 | grep -v '\s1\s' | cut -c7-
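
The single-commit filter in [1] could also be sketched with awk, which compares the commit count numerically instead of matching on whitespace. `multi_commit_authors` is a hypothetical helper name; it assumes input in the `git shortlog -s` layout, that is, a count, a tab, then the author name.

```shell
#!/bin/sh
# multi_commit_authors: reads `git shortlog -s` style lines on standard
# input ("<count><TAB><author>") and prints only the authors whose
# commit count is greater than one.
multi_commit_authors() {
  awk -F'\t' '$1 + 0 > 1 { print $2 }'
}
```

A per-folder run might look like `git shortlog -s -n --since=2018-01-01 HEAD -- lucene | multi_commit_authors`. Naming `HEAD` explicitly keeps `git shortlog` from reading the log from standard input when it is not attached to a terminal.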

On Mon, May 4, 2020 at 9:49 AM Michael Sokolov <[hidden email]> wrote:
I always like to look at data when making a big decision, so I
gathered some statistics about authors and commits to git over the
history of the project. I wanted to see what these statistics could
tell us about the degree of overlap between the two projects and
whether it has changed over time. Using commands like

     git log --pretty=%an --since=2012 --lucene
     git log --pretty=%an --since=2012 --solr

I looked at the authors of commits in the lucene and solr top-level
folders of the project. I think this makes a reasonable proxy for
contributors to the two projects. From there I found that since 2012,
there are 60 Lucene-only authors, 71 Solr-only authors, and 101
authors (or 43%) contributing at least one commit to each project.
Since 2018, the percentage of both-project authors is somewhat lower:
36%.

I also looked at commits spanning both projects. I'm not sure this
captures all the work that touches both projects, but it's a window
into that, at least. I found that since 2012, 1387/19063 (6.8%) of
commits spanned both project folders. Since 2018, 7.4% did.

I don't think you can really draw very many meaningful conclusions
from this, but a few things jump out: First, it is clear that these
projects are not completely separate today. A substantial number of
people commit to both, over time, although most people do not. Also,
relatively few commits span both projects. Some do though, and it's
certainly worth considering what the workflow for such changes would
be like in the split world. Maybe a majority of these are
build-related; it's hard to tell from this coarse analysis.


On Mon, May 4, 2020 at 5:11 AM Dawid Weiss <[hidden email]> wrote:
>
> Dear Lucene and Solr developers!
>
> A few days ago, I initiated a discussion among PMC members about
> potential pros and cons of splitting the project into separate Lucene
> and Solr entities by promoting Solr to its own top-level Apache
> project (TLP). Let me share with you the motivation for such an action
> and some follow-up thoughts I heard from other PMC members so far.
>
> Please read this e-mail carefully. Both the PMC and I look forward to
> hearing your opinion. This is a DISCUSS thread and it will be followed
> next week by a VOTE thread. This is our shared project and we should
> all shape its future responsibly.
>
> The big question is this: “Is this the right time to split Solr and
> Lucene into two independent projects?”.
>
> Here are several technical considerations that drove me to ask the
> question above (in no order of priorities):
>
> 1) Precommit/ test times. These are crazy high. If we split into two
> projects we can pretty much cut all of Lucene testing out of Solr (and
> likewise), making development a bit more fun again.
>
> 2) Build system itself and source release packaging. The current
> combined codebase is a *beast* to maintain. Working with gradle on
> both projects at once made me realise how little the two have in
> common. The code layout, the dependencies, even the workflow of people
>
> working on these projects... The build (both ant and gradle) is full
> of Solr and Lucene-specific exceptions and hooks that could be more
> elegantly solved if moved to each project independently.
>
> 3) Packaging. There is no single source distribution package for
> Solr+Lucene. They are already "independent" there. Why should Lucene
> and Solr always be released at the same pace? Does it always make
> sense?
>
> 4) Solr is essentially taking in Lucene and its dependencies as a
> whole (so is Elasticsearch and many other projects). In my opinion
> this makes Lucene eligible for refactoring and
>
> maintenance as a separate component. The learning curve for people
> coming to each project separately is going to be gentler than trying
> to dive into the combined codebase.
>
> 5) Mailing lists, build servers. Mailing lists for users are already
> separated. I think this is yet another indication that Solr is
> something more than a component within Lucene. It is perceived as an
> independent entity and used as an independent product. I would really
> like to have separate mailing lists for these two projects (this
> includes build and test results) as it would make life easier: if your
> focus is more on Lucene (or Solr), you would only need to track half
> of the current traffic.
>
>
> As I already mentioned, the discussion among PMC members highlighted
> some initial concerns and reasons why the project should perhaps
> remain glued together. These are outlined below with some of the
> counter-arguments presented under each concern to avoid repetition of
> the same content from the PMC mailing list (they’re copied from the
> private discussion list).
>
> 1) Both projects may gradually split their ways after the separation
> and even develop “against” each other like it used to be before the
> merge.
>
> Whether this is a legitimate concern is hard to tell. If Solr goes TLP
> then all existing Lucene committers will automatically become Solr
> committers (unless they opt not to) so there will be both procedural
> ways to prevent this from happening (vetoes) as well as common-sense
> reasons to just cooperate.
>
> 2) Some people like parallel version numbering (concurrent Solr and
> Lucene releases) as it gives instant clarity which Solr version uses
> which version of Lucene.
>
> This can still be done on the Solr side (it is Solr’s decision to adopt
> any versioning scheme the project feels comfortable with). I
> personally (DW) think this kind of versioning is actually more
> confusing than helpful; Solr should have its own cadence of releases
> driven by features, not sub-component changes. If the “backwards
> compatibility” is a factor then a solution might be to sync on major
> version releases only (e.g., this is how Elasticsearch is handling
> this).
>
> 3) Solr tests are the first “battlefield” test zone for Lucene changes
> - if it becomes TLP this part will be gone.
>
> Yes, true. But realistically Solr will have to adopt some kind of
> snapshot-based dependency on Lucene anyway (whether as a git submodule
> or a maven snapshot dependency). So if there are bugs in Lucene they
> will still be detected by Solr tests (and fairly early).
>
> 4) Why split now if we merged in the first place?
>
> Some of you may wonder why split the project that was initially
> *merged* from two independent codebases (around 10 years ago). In
> short, there was a lot of code duplication and interaction between
> Solr and Lucene back then, with patches flying back and forth.
> Integration into a single codebase seemed like a great idea to clean
> things up and make things easier. In many ways this is exactly what
> did happen: we have cleaned up code dependencies and reusable
> components (on Lucene side) consumed by not just Solr but also other
> projects (downstream from Lucene).
>
> The situation we find ourselves in now is different from what it was
> before: recent and ongoing development for the most part falls within
> Solr or Lucene exclusively.
>
>
> This e-mail is for discussing the idea and presenting arguments/
> counter-arguments for or against the split. It will be followed by a
> separate VOTE thread e-mail next Monday. If the vote passes then there
> are many questions about how this process should be arranged and
> orchestrated. There are past examples even within Lucene [1] that we
> can learn from, and there are people who know how to do it - the
> actual process is of lesser concern at the moment, what we mostly want
> to do is to reach out to you, signal the idea and ask about your
> opinion. Let us know what you think.
>
> [1] https://lists.apache.org/thread.html/15bf2dc6d6ccd25459f8a43f0122751eedd3834caa31705f790844d7%401270142638%40%3Cuser.nutch.apache.org%3E
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Dawid Weiss-2
> This sounds like a decision has already been made.

No. I plan to send a VOTE thread nonetheless. A vote thread is just
that -- a vote. If a majority decides both projects should stay
together, it's still a decision. A discussion without any
resolution is going to dissolve over time into no resolution at all.

> Additionally, all of the counterarguments presented come with rebuttals attached, so I'm not sure if this is supposed to be a persuasive case or an expositional one.

This thread is for discussion, so please expand the counterarguments with
any point of view you like. I did include counterarguments I collected
from the private mailing list.

> I think I have an initial reaction that I'm opposed to a split, but I'm not yet concretely sure why.

Like I said multiple times I see this as a reasonable technical
decision and I don't think the community (communities?) will suffer
much because of this. This is not a hostile code fork or an attempt to
hijack developers. Whoever has interest in Solr and Lucene will still
be a Solr and Lucene developer. I really don't think that much will
change.

My point of view crystallised because of the build system work - I
admit this freely. The ant one is hair-bending. The gradle one is
inconvenient like hell when you have effectively two "top-level"
projects to handle within the same configuration. When I started
looking at other aspects I became convinced this is the right way to
go.

Separately from that I think Solr has become older, larger and is an
industry-standard search component. It is time for it to mature and
just be a top-level Apache project, even from a public-relations point
of view.

> This seems like an argument for fixing the tests and making them faster, I'm not sure how we get to splitting the projects from here. If you're doing Solr only changes, it's pretty easy to run "./gradlew -p solr test" and skip the lucene tests, similar for lucene only development.

Nah, this isn't true. All CI jobs, github, etc. - everything is
checked and verified for both projects, so it all takes about twice
as long as it should.

> > Mailing lists, build servers
> This is probably a good idea and I think this is easy enough to do without splitting the project as well.

They are already separated to a large degree. The only thing in common
is the dev list, and even there threads are really split between discussions
concerning Solr and Lucene functionality.

> > Solr tests are the first “battlefield” test zone for Lucene changes
> I think https://issues.apache.org/jira/browse/SOLR-14428 is a great example of the kind of collaboration that we can see, and a good hint of what to expect if the projects are split. To summarize, there was a Lucene change which caused some issues in Solr. The fix is likely going to end up being another Lucene change, but just as easily could have been a kind of ugly workaround on the Solr side.

Maybe. There are a lot of maybes. I still think a split would make
things easier. For example the ugly workaround could go into an
immediate bugfix release for Solr, followed by a patch to Lucene and a
proper fix later on. Now you can't do an immediate bugfix/ workaround
Solr release without a corresponding Lucene release (which doesn't
make sense to me at all).

Oh, and don't get me wrong - I understand you can have doubts. I am
prepared to defend my position because it's been growing in me for a
few months now; I have been digesting this for a longer time and it
probably makes a difference.

Dawid

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Dawid Weiss-2
Perhaps I didn't clarify this so far: my own interests (personal and
business) are shared equally between Solr and Lucene (we have products
that have plain Lucene underneath and we maintain products and systems
that use Solr).  So I am going to have a foot in both worlds no matter
the outcome. I really do feel confident both Lucene and Solr would
have a breath of fresh air if they were independent (smaller).

Dawid

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Gézapeti Cseh
I think separating the git repository and even the release schedules could be done under the same TLP.
It would solve most of the technical issues reflected in the first mail and there would be more time and data to see if creating Apache Solr again is something the PMC would want to do

gp


On Mon, May 4, 2020 at 8:20 PM Dawid Weiss <[hidden email]> wrote:
Perhaps I didn't clarify this so far: my own interests (personal and
business) are shared equally between Solr and Lucene (we have products
that have plain Lucene underneath and we maintain products and systems
that use Solr).  So I am going to have a foot in both worlds no matter
the outcome. I really do feel confident both Lucene and Solr would
have a breath of fresh air if they were independent (smaller).

Dawid

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Michael McCandless-2
On Mon, May 4, 2020 at 5:28 PM Gézapeti Cseh <[hidden email]> wrote:

> I think separating the git repository and even the release schedules could be done under the same TLP.
> It would solve most of the technical issues reflected in the first mail and there would be more time and data to

Hmm, that is technically true, and in fact that is the way it was until 10 years ago: Solr was a sub-project of Apache Lucene.

But that is not the proposal here.

Lucene and Solr have become such major efforts, in developers' and users' eyes and keyboard effort/time, that they really are very different entities now.  TLP makes sense to me for each project.

> see if creating Apache Solr again is something the PMC would want to do

Hmm, just to clarify, this is not an "again" sort of situation: Solr was not a top-level project before.  It was and still is a sub-project of Apache Lucene.

And the proposal is to now split it out as its own (new) top-level project, Apache Solr.

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Michael McCandless-2
In reply to this post by Dawid Weiss-2
On Mon, May 4, 2020 at 2:13 PM Dawid Weiss <[hidden email]> wrote:

> > This sounds like a decision has already been made.
>
> No. I plan to send a VOTE thread nonetheless. A vote thread is just
> that -- a vote. If a majority decides both projects should stay
> together, it's still a decision. A discussion without any
> resolution is going to dissolve over time into no resolution at all.

No decision has been made.

The point of a DISCUSS thread, prior to a VOTE thread, is for all interested parties to voice their diverse reactions to this proposal, and help the binding voters (Lucene/Solr committers) make up their minds about how to vote on the VOTE thread.  We have a delightfully diverse community here who will all contribute in choosing our path forward.

> Separately from that I think Solr has become older, larger and is an
> industry-standard search component. It is time for it to mature and
> just be a top-level Apache project, even from a public-relations point
> of view.

+1

I feel Solr as its own Apache top-level project is actually long overdue: Solr has clearly been a leading standard open-search distributed search engine for quite some time already, with its own strong user and developer identities and culture.  We long ago achieved the goals (paying down open-source tech debt) that motivated merging the two projects a decade ago.

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Doug Turnbull
In reply to this post by Michael McCandless-2
Personally I feel the burden of proof should not be why they should be split up, but the other way - "what arguments can be made for keeping them together?"

I would be curious if people can make the argument for keeping them together...

-Doug

On Tue, May 5, 2020 at 10:29 AM Michael McCandless <[hidden email]> wrote:
On Mon, May 4, 2020 at 5:28 PM Gézapeti Cseh <[hidden email]> wrote:

I think separating the git repository and even the release schedules could be done under the same TLP.
It would solve most of the technical issues reflected in the first mail and there would be more time and data to

Hmm that is technically true, and in fact that is the way it was before 10 years ago: Solr was a sub-project of Apache Lucene.

But that is not the proposal here.

Lucene and Solr have become such major efforts, in developers and users eyes and keyboard effort/time, that they really are very different entities now.  TLP makes sense to me for each project.
 
see if creating Apache Solr again is something the PMC would want to do

Hmm, just to clarify, this is not an "again" sort of situation: Solr was not a top-level project before.  It was and still is a sub-project of Apache Lucene.

And the proposal is to now split it out as its own (new) top-level project, Apache Solr.



--
Doug Turnbull | CTO | OpenSource Connections, LLC | 240.476.9983 
Author: Relevant Search; Contributor: AI Powered Search
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Jan Høydahl / Cominvent
In reply to this post by Dawid Weiss-2
Thanks for bringing it up Dawid.

I’ve asked myself the same question several times over the last couple of years, and have kind of been waiting for someone to make the proposal :)
In my head, Solr has outgrown being a sub-project of Lucene, like Hadoop, Mahout, Nutch and Tika before it.

The move will promote Solr as a separate TLP with better visibility and more autonomy.
Simply put, Solr will go from https://lucene.apache.org/solr/ to https://solr.apache.org/

Splitting will be lots of work for sure, but I am not worried about the future relationship between the two. Over the last couple of years most of us have already done LUCENE and SOLR changes in separate Jiras and separate patches, first committing changes to LUCENE before the related SOLR change. It will be more or less the same approach after the split, just that there will be a couple of days between the Lucene release and the next Solr release depending on it.

As it is today, developers have had to make the necessary Solr changes at the same time as making changes in Lucene. This is not really fair to the (mainly) Lucene developers. It is not fair to Solr either, as such work might be done in a hasty fashion and/or in a suboptimal way due to lack of familiarity with the Solr code base, as we have unfortunately seen a couple of times in the past (not trying to blame anyone). With Lucene as a dependency, Solr can choose to stay on the same Lucene version for a couple of releases while taking the time to work out the proper way to adapt to changed Lucene APIs or to sort out performance issues.

Question: When Lucene no longer has the Solr test suite to help catch bugs, how long would it take from a Lucene commit until Solr/ES Jenkins instances have had time to produce a build and run tests? Would it be possible to set up a trigger in Solr Jenkins?

Jan

> On 4 May 2020, at 11:10, Dawid Weiss <[hidden email]> wrote:
>
> Dear Lucene and Solr developers!


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Michael McCandless-2
On Tue, May 5, 2020 at 11:41 AM Jan Høydahl <[hidden email]> wrote:

> As it is today, developers have had to make the necessary Solr changes at the same time as making changes in Lucene. This is not really fair to the (mainly) Lucene developers. It is not fair to Solr either, as such work might be done in a hasty fashion and/or in a suboptimal way due to lack of familiarity with the Solr code base, as we have unfortunately seen a couple of times in the past (not trying to blame anyone). With Lucene as a dependency, Solr can choose to stay on the same Lucene version for a couple of releases while taking the time to work out the proper way to adapt to changed Lucene APIs or to sort out performance issues.

+1, that is a great point, Jan.

This will mean that the (any) necessary Solr source code changes that go along with a Lucene change will (sometimes) be done with higher quality, more thought, better expertise, etc., which I agree will be good for ongoing Solr development, help prevent accidental performance regressions, etc.  Net/net that's a big positive for Solr, in addition to having a stronger independent identity (https://solr.apache.org).
 
> Question: When Lucene no longer has the Solr test suite to help catch bugs, how long would it take from a Lucene commit until Solr/ES Jenkins instances have had time to produce a build and run tests? Would it be possible to set up a trigger in Solr Jenkins?

That's a great question!

Maybe Elasticsearch developers could chime in, since this already happened for them many times by now :)  I would think there are technical solutions to let the Solr CI build pull the latest Lucene snapshot build, to keep the latency lowish, but I do not know the details.

Mike

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Tomás Fernández Löbbe
I don’t agree with the argument “Solr outgrew being a subproject of Lucene”. I read “promotion to TLP” as if this was some achievement that needs to be celebrated now. Solr didn’t become a TLP years ago because the decision then was to merge with Lucene development, thinking they would progress better together than separated. It’s technically true that Solr is a subproject of Lucene, but so is Lucene Core, and I don’t see Lucene Core being promoted to TLP. They are both part of the same Apache project, which for historical reasons is called Lucene.

> I would be curious if people can make the argument for keeping them together...

I think the same arguments that were used 10 years ago to merge the projects are as valid now, some of them presented in Dawid’s email. Faster development, better coverage, code in the right places.[1]

IMO, if we need to say “we can’t release X because it breaks Y”, or “we need to release X to be able to release Y”, the projects are not really independent, and “the PMCs will overlap” won’t take us very far.

> The big question is this: “Is this the right time to split Solr and Lucene into two independent projects?”.
This is not the question we should be asking ourselves right now. It assumes the split is happening, and that’s what we are trying to discuss here. The question in my mind is “Is splitting Lucene and Solr into different projects beneficial for them? Is this going to make them both better?”

> As it is today, developers have had to make the necessary Solr changes at the same time as making changes in Lucene. This is not really fair to the (mainly) Lucene developers. It is not fair to Solr either, as such work might be done in a hasty fashion and/or in a suboptimal way due to lack of familiarity with the Solr code base, as we have unfortunately seen a couple of times in the past (not trying to blame anyone).

This, I agree, is a pain point for keeping them together. That said, most (though not all) currently active committers joined the project while this was already a thing; it’s not something that was imposed later on the majority of us.

> With Lucene as a dependency, Solr can choose to stay on same Lucene version for a couple of releases while taking the time to work out the proper way to adapt to changed Lucene APIs or to sort out performance issues.

I agree with this and I believe it’s a point in favor of keeping them together (and in part discussed 10 years ago when projects merged). Keeping them on the same repo forces Solr to use the latest Lucene, helping find issues/bugs soon, hopefully before they are released.


[1] https://mail-archives.apache.org/mod_mbox/lucene-general/201002.mbox/%3c9ac0c6aa1002240832x1a8e3309k6799d75b8d19d0dd@...%3e

On Tue, May 5, 2020 at 8:56 AM Michael McCandless <[hidden email]> wrote:
On Tue, May 5, 2020 at 11:41 AM Jan Høydahl <[hidden email]> wrote:

As it is today, developers have had to make the necessary Solr changes at the same time as making changes in Lucene. This is not really fair to the (mainly) Lucene developers. It is not fair to Solr either, as such work might be done in a hasty fashion and/or in a suboptimal way due to lack of familiarity with the Solr code base, as we have unfortunately seen a couple of times in the past (not trying to blame anyone). With Lucene as a dependency, Solr can choose to stay on the same Lucene version for a couple of releases while taking the time to work out the proper way to adapt to changed Lucene APIs or to sort out performance issues.

+1, that is a great point, Jan.

This will mean that the (any) necessary Solr source code changes that go along with a Lucene change will (sometimes) be done with higher quality, more thought, better expertise, etc., which I agree will be good for ongoing Solr development, help prevent accidental performance regressions, etc.  Net/net that's a big positive for Solr, in addition to having a stronger independent identity (https://solr.apache.org).
 
Question: When Lucene no longer has the Solr test suite to help catch bugs, how long time would it take from a Lucene commit, before Solr/ES Jenkins instances would have had time to produce a build and run tests? Would it be possible to setup a trigger in Solr Jenkins?

That's a great question!

Maybe Elasticsearch developers could chime in, since this already happened for them many times by now :)  I would think there are technical solutions to let the Solr CI build pull the latest Lucene snapshot build, to keep the latency lowish, but I do not know the details.

Mike

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Dawid Weiss-2
In reply to this post by Jan Høydahl / Cominvent
> Question: When Lucene no longer has the Solr test suite to help catch bugs, how long time would it take from a Lucene commit, before Solr/ES Jenkins instances would have had time to produce a build and run tests? Would it be possible to setup a trigger in Solr Jenkins?

It depends how the code is organized after Lucene becomes a
subcomponent. If it's a regular dependency (on a *-SNAPSHOT version)
then the trigger would have to be dual (any commit on Solr or any
commit on Lucene). If the code is organized around a git submodule
with Lucene then bumping a version on a submodule would effectively
trigger a CI build. This "bumping" can be automated on certain
branches (such as master) so effectively it'd be immediately ready for
testing...

It's not really that relevant to this discussion but if you're curious
what this looks like I created an example submodule setup reflecting
current master here, try it:

git clone [hidden email]:dweiss/lucene-solr.git -b split/solr
cd lucene-solr/

You'll see the 'lucene/' folder is empty. It is a submodule. When you issue:

git submodule status

you'll see which git revision that submodule is on:

-e5092db7915ac49d0ade0591e7b52176657c380c lucene

You can get the sub repositories in their respective versions by doing:

git submodule init
git submodule update

When you cd into lucene now you'll see it is a separate repository
(that things can be committed to, branches switched, etc.).

git status
HEAD detached at e5092db791
nothing to commit, working tree clean

Submodules in git have an extra advantage over snapshot dependencies:
they always point at a given revision of a submodule *exactly* so each
and every commit in the parent repository has exact versions of each
submodule recorded in git history. Of course not everything is rosy -
working with submodule-organized repositories does have a darker side
too (new git workflows to be learned, switching incompatible branches
can be tricky, etc.).
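
For anyone who wants to see the pinning and "bumping" behaviour without cloning the real repositories, here is a minimal, self-contained sketch. It is only an illustration under stated assumptions: the two throwaway repositories (lucene-upstream and solr), the demo identity and the commit messages are all made up for the example, and it is not the setup from the split/solr branch above. The protocol.file.allow override is there because recent git versions refuse to clone a submodule from a local path by default.

```shell
#!/bin/sh
# Sketch only: throwaway local repositories stand in for Lucene and Solr.
set -eu

work=$(mktemp -d)
# Fixed identity so commits succeed on machines without a configured git user.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Hypothetical stand-in for the Lucene repository, with one commit.
git init -q "$work/lucene-upstream"
git -C "$work/lucene-upstream" commit -q --allow-empty -m "lucene: first commit"

# Hypothetical stand-in for the Solr repository; lucene/ becomes a submodule.
git init -q "$work/solr"
git -C "$work/solr" -c protocol.file.allow=always \
    submodule --quiet add "$work/lucene-upstream" lucene
git -C "$work/solr" commit -q -m "Add lucene as a submodule"
pinned_before=$(git -C "$work/solr" rev-parse HEAD:lucene)

# A new Lucene commit does not move the revision recorded in the parent.
git -C "$work/lucene-upstream" commit -q --allow-empty -m "lucene: second commit"
upstream_head=$(git -C "$work/lucene-upstream" rev-parse HEAD)

# The "bump": fetch inside the submodule, check out the new revision,
# then record it in the parent repository with an ordinary commit.
git -C "$work/solr/lucene" fetch -q origin
git -C "$work/solr/lucene" checkout -q --detach FETCH_HEAD
git -C "$work/solr" add lucene
git -C "$work/solr" commit -q -m "Bump lucene submodule"
pinned_after=$(git -C "$work/solr" rev-parse HEAD:lucene)

echo "before: $pinned_before"
echo "after:  $pinned_after"
```

The first commit in the parent repository still records the old revision, which is the exact-revision property described above. A CI job could, in principle, run the last four git commands on a branch such as master to automate the bump.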

D.

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Dawid Weiss-2
In reply to this post by Tomás Fernández Löbbe
> I read “promotion to TLP” as if this was some achievement that needs to be celebrated now.

I honestly believe it is an achievement for a project to receive
top-level status. It's a sign of having a community of users,
committers and processes mature enough to empower its further
development.

> It’s technically true that Solr is a subproject of Lucene, but so is Lucene Core, and I don’t see Lucene Core being promoted to TLP

I don't think these are components of the same magnitude, sorry. I can name
at least a few projects that depend on Lucene alone (core + extras)
and I can name companies using Solr as a product but I can't name a
single project that would depend on lucene-core alone (without any
other lucene-* dependency). Maybe there is something like this but
it's definitely an outlier rather than a typical use case.

Dawid

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Tomás Fernández Löbbe


On Tue, May 5, 2020 at 12:37 PM Dawid Weiss <[hidden email]> wrote:
> > I read “promotion to TLP” as if this was some achievement that needs to be celebrated now.
>
> I honestly believe it is an achievement for a project to receive
> top-level status. It's a sign of having a community of users,
> committers and processes mature enough to empower its further
> development.

My point is that this is not something new. Solr is a mature product and has had the community and process in place for a long time.

> > It’s technically true that Solr is a subproject of Lucene, but so is Lucene Core, and I don’t see Lucene Core being promoted to TLP
>
> I don't think these are components of the same magnitude, sorry. I can name
> at least a few projects that depend on Lucene alone (core + extras)
> and I can name companies using Solr as a product but I can't name a
> single project that would depend on lucene-core alone (without any
> other lucene-* dependency). Maybe there is something like this but
> it's definitely an outlier rather than a typical use case.

If you go to lucene.apache.org, you'll see three things: Lucene Core (Lucene with all its modules), Solr and PyLucene. That's what I mean.

> Dawid

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Dawid Weiss-2
> If you go to lucene.apache.org, you'll see three things: Lucene Core (Lucene with all its modules), Solr and PyLucene. That's what I mean.

Hmm... Maybe I'm dim but that's essentially what I want to do. Look:

1. Lucene Core (Lucene with all its modules)
2. Solr
3. PyLucene

The thing is: (1) is already a TLP - that's just Lucene. My call is to
make (2) a TLP. I can't say much about (3) because I don't know
PyLucene as well as I do Solr and Lucene... But it seems to me that
PyLucene fits much better under the "Lucene" umbrella; even the name
suggests that.



Dawid

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Ishan Chattopadhyaya
Except for the logistics of enacting the split, I see no valid reason for keeping the projects together. Git submodule is the magic that we have to ease any potential discomfort. However, the effort needed to split feels absolutely massive, so I'm not sure if it is worth the hassle.

On Wed, 6 May, 2020, 1:31 pm Dawid Weiss, <[hidden email]> wrote:
> If you go to lucene.apache.org, you'll see three things: Lucene Core (Lucene with all it's modules), Solr and PyLucene. That's what I mean.

Hmm... Maybe I'm dim but that's essentially what I want to do. Look:

1. Lucene Core (Lucene with all it's modules)
2. Solr
3. PyLucene

The thing is: (1) is already a TLP - that's just Lucene. My call is to
make (2) a TLP. (3) I can't tell much about because I don't know
PyLucene as well as I do Solr and Lucene... But it seems to me that
PyLucene fits much better under "Lucene" umbrella, even the name
suggests that.



Dawid

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Simon Willnauer-4
I can speak from experience that working with a snapshot is much
cleaner than working with submodules. We have been doing this in
Elasticsearch for a very long time now and our process works just fine.
It has a bunch of advantages over a direct / source dependency like Solr
has right now. I recall that someone else already mentioned some of them,
such as working on a somewhat more stable codebase and doing refactorings
and integration when there are people dedicated to it who have enough
time to do it properly.

Regarding the effort of a split, I think that not doing something
because it's a lot of work will just cause a ton of issues down the
road. Doing the right thing is a lot of work, that's for sure, but we
can start working on this in baby steps and we can all help. We can do
this gradually: start with the website and lists, then the build system,
etc., or start with the build first and do the website last. It's OK to
favor progress over perfection here. We all want this to be done properly
and we are all here to help, at least I am.

simon

On Wed, May 6, 2020 at 10:51 AM Ishan Chattopadhyaya
<[hidden email]> wrote:

>
> Except the logistics of enacting the split, I see no valid reason of keeping the projects together. Git submodule is the magic that we have to ease any potential discomfort. However, the effort needed to split feels absolutely massive, so I'm not sure if it is worth the hassle.
>
> On Wed, 6 May, 2020, 1:31 pm Dawid Weiss, <[hidden email]> wrote:
>>
>> > If you go to lucene.apache.org, you'll see three things: Lucene Core (Lucene with all it's modules), Solr and PyLucene. That's what I mean.
>>
>> Hmm... Maybe I'm dim but that's essentially what I want to do. Look:
>>
>> 1. Lucene Core (Lucene with all it's modules)
>> 2. Solr
>> 3. PyLucene
>>
>> The thing is: (1) is already a TLP - that's just Lucene. My call is to
>> make (2) a TLP. (3) I can't tell much about because I don't know
>> PyLucene as well as I do Solr and Lucene... But it seems to me that
>> PyLucene fits much better under "Lucene" umbrella, even the name
>> suggests that.
>>
>>
>>
>> Dawid
>>

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

Gus Heck
> IMO, if we need to say “we can’t release X because it breaks Y”, or “we need to release X to be able to release Y”, the projects are not really independent, and “the PMCs will overlap” won’t take us very far.

This. I don't think the two really can be separated. Any separation will merely be artificial, and/or an excuse for throwing stuff over the wall. The sooner incompatibilities or difficulties are identified the better. Definitely not in favor of splitting.  

Really, we are effectively "search.apache.org" (or I suppose "java-search.apache.org") and the lucene name as the TLP is just a legacy thing. We can have components (as does hc.apache.org) but Solr can't live without Lucene, so fostering a sense of separation is going to be bad for Solr. 

If someday we reach a point where some other library could swap into Solr to replace Lucene, then maybe.

My opinion, YMMV :)


On Wed, May 6, 2020 at 5:40 AM Simon Willnauer <[hidden email]> wrote:
I can speak from experience that working with a snapshot is much
cleaner than working with submodules. We have done this in
Elasticsearch for a very long time now, and our process works just
fine. It has a bunch of advantages over a direct source dependency
like the one Solr has right now. I recall that someone else already
mentioned some of them, like working on a somewhat more stable
codebase, and doing refactorings and integration when there are people
dedicated to it who have enough time to do it properly.

Regarding the effort of a split, I think that not doing something
because it's a lot of work will just cause a ton of issues down the
road. Doing the right thing is a lot of work, that's for sure, but we
can start working on this in baby steps, and we can all help. We can
do this gradually: start with the website and lists, then the build
system, or start with the build and do the website last. It's OK to
favor progress over perfection here. We all want this to be done
properly and we are all here to help, at least I am.

simon

On Wed, May 6, 2020 at 10:51 AM Ishan Chattopadhyaya
<[hidden email]> wrote:
>
> Except for the logistics of enacting the split, I see no valid reason for keeping the projects together. Git submodules are the magic that we have to ease any potential discomfort. However, the effort needed to split feels absolutely massive, so I'm not sure it is worth the hassle.
>
> On Wed, 6 May, 2020, 1:31 pm Dawid Weiss, <[hidden email]> wrote:
>>
>> > If you go to lucene.apache.org, you'll see three things: Lucene Core (Lucene with all its modules), Solr and PyLucene. That's what I mean.
>>
>> Hmm... Maybe I'm dim but that's essentially what I want to do. Look:
>>
>> 1. Lucene Core (Lucene with all its modules)
>> 2. Solr
>> 3. PyLucene
>>
>> The thing is: (1) is already a TLP - that's just Lucene. My call is to
>> make (2) a TLP. I can't say much about (3) because I don't know
>> PyLucene as well as I do Solr and Lucene... But it seems to me that
>> PyLucene fits much better under the "Lucene" umbrella; even the name
>> suggests that.
>>
>>
>>
>> Dawid
>>


