CHANGES.txt and issue categorization

CHANGES.txt and issue categorization

david.w.smiley@gmail.com
I'd like us to reflect on how we categorize issues in CHANGES.txt.  We have these categories:
(Lucene) 'API Changes', 'New Features', 'Improvements', 'Optimizations', 'Bug Fixes', 'Other'
(Solr) 'New Features', 'Improvements', 'Optimizations', 'Bug Fixes', 'Other Changes'
(I lifted these from dev-tools/scripts/addVersion.py line 215)

In particular, I'm often surprised to see items filed under New Features or Improvements that would be better categorized as something else.  I think the root cause of these problems may be that we don't have JIRA categories that directly align.  Furthermore, our dev practices will typically result in a CHANGES.txt entry being added out of band from the code-review process, and thus there is no peer review of its placement.  Furthermore the message itself is often not code reviewed but should be.  Perhaps we can simply get in the habit of stating in a JIRA comment (or GH code review) what we propose the category & issue summary should be.

Here is my attempt at a definition for _some_ of these categories.  I don't pretend to think we all agree 100% but it's up for discussion:
========
* New Features:  A user-visible new capability.  Usually opt-in.

* Improvements:  A user-visible improvement to an existing capability that expands its ability or improves its behavior.  Not a refactoring, not an optimization.

* Optimizations: Something is now more efficient.  Usually automatic (not opt-in).

* Other:  Anything else: refactorings, tests, build, docs, added log statements, etc.
========
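
As a rough illustration, the definitions above could be kept as a small lookup table that a script or a reviewer consults before an entry is filed. This is only a sketch: the category names and wording come from the list above, while `CATEGORY_DEFINITIONS`, `check_category`, and `render_legend` are hypothetical names and not part of dev-tools.

```python
# Category names and wording follow the definitions proposed above.
# The dict and both helpers are hypothetical, not existing Lucene/Solr tooling.
CATEGORY_DEFINITIONS = {
    "New Features": "A user-visible new capability. Usually opt-in.",
    "Improvements": (
        "A user-visible improvement to an existing capability that "
        "expands its ability or improves its behavior. "
        "Not a refactoring, not an optimization."
    ),
    "Optimizations": "Something is now more efficient. Usually automatic (not opt-in).",
    "Other": "Anything else: refactorings, tests, build, docs, added log statements.",
}


def check_category(category: str) -> str:
    """Return the category unchanged, or raise ValueError if it has no definition."""
    if category not in CATEGORY_DEFINITIONS:
        known = ", ".join(CATEGORY_DEFINITIONS)
        raise ValueError(f"unknown category {category!r}; expected one of: {known}")
    return category


def render_legend() -> str:
    """Format the definitions as a plain-text legend, one bullet per category."""
    return "\n".join(f"* {name}: {text}" for name, text in CATEGORY_DEFINITIONS.items())


print(render_legend())
```

The legend text is plain enough to paste into a JIRA comment or a review when proposing a category.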

I recommend the following changes to Lucene 8.5:

These are "Improvements" that I think are better categorized as "Optimizations":
* LUCENE-9211: Add compression for Binary doc value fields. (Mark Harwood)
* LUCENE-4702: Better compression of terms dictionaries. (Adrien Grand)
* LUCENE-9228: Sort dvUpdates in the term order before applying if they all update a
  single field to the same value. This optimization can reduce the flush time by around
  20% for the docValues update user cases. (Nhat Nguyen, Adrien Grand, Simon Willnauer)
* LUCENE-9245: Reduce AutomatonTermsEnum memory usage. (Bruno Roustant, Robert Muir)
* LUCENE-9237: Faster UniformSplit intersect TermsEnum. (Bruno Roustant)

These "Improvements" I think are better categorized as "Other":
* LUCENE-9109: Backport some changes from master (except StackWalker) to improve
  TestSecurityManager (Uwe Schindler)
* LUCENE-9110: Backport refactored stack analysis in tests to use generalized
  LuceneTestCase methods (Uwe Schindler)
* LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. Queries are
  executed with input objects that extend such interface. (Ignacio Vera)
* LUCENE-9194: Simplify XYShapeXQuery API by adding a new abstract class called XYGeometry. Queries are
  executed with input objects that extend such interface. (Ignacio Vera)

Maybe this "Other" item should be under "Optimizations"? (not sure):
* LUCENE-9068: FuzzyQuery builds its Automaton up-front (Alan Woodward, Mike Drob)

Solr:

"New Features" that maybe should be "Improvements":
* SOLR-13892: New "top-level" docValues join implementation (Jason Gerlowski, Joel Bernstein)
* SOLR-14242: HdfsDirectory now supports indexing geo-points, ranges or shapes. (Adrien Grand)

"Improvements" that maybe should be "Optimizations":
* SOLR-13808: filter in BoolQParser and {"bool":{"filter":..}} in Query DSL are cached by default (Mikhail Khludnev)

"Improvements" that maybe should be "Other":
* SOLR-14114: Add WARN to Solr log that embedded ZK is not supported in production (janhoy)
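
Each re-categorization listed above is a mechanical move of one entry from one section to another. A minimal sketch of that operation follows, assuming a simplified CHANGES.txt layout: a section name on its own line, a dashed underline, entries that start with "* ", and indented continuation lines. The helpers `parse_sections`, `move_entry`, and `render` are hypothetical and do not exist in dev-tools; the real file has more structure (version headers, multiple releases) that this ignores.

```python
# Section names are the Lucene ones quoted earlier in this message.
SECTIONS = ["API Changes", "New Features", "Improvements",
            "Optimizations", "Bug Fixes", "Other"]


def parse_sections(text):
    """Split text into {section name: [entry, ...]}; each entry is a list of lines."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in SECTIONS:
            current = sections.setdefault(stripped, [])
        elif current is None or not stripped or set(stripped) == {"-"}:
            continue  # skip preamble, blank lines, and header underlines
        elif stripped.startswith("* "):
            current.append([stripped])  # first line of a new entry
        elif current:
            current[-1].append("  " + stripped)  # continuation of the last entry
    return sections


def move_entry(sections, issue_id, src, dst):
    """Move the entry for issue_id from section src to the end of section dst."""
    for i, entry in enumerate(sections.get(src, [])):
        if entry[0].startswith(f"* {issue_id}:"):
            sections.setdefault(dst, []).append(sections[src].pop(i))
            return
    raise KeyError(f"{issue_id} not found under {src!r}")


def render(sections):
    """Write the non-empty sections back out in the canonical section order."""
    out = []
    for name in SECTIONS:
        if sections.get(name):
            out += [name, "-" * len(name), ""]
            for entry in sections[name]:
                out += entry + [""]
    return "\n".join(out)


sample = """\
Improvements
------------

* LUCENE-9245: Reduce AutomatonTermsEnum memory usage. (Bruno Roustant, Robert Muir)

* LUCENE-9109: Backport some changes from master (except StackWalker) to improve
  TestSecurityManager (Uwe Schindler)
"""

sections = parse_sections(sample)
move_entry(sections, "LUCENE-9245", "Improvements", "Optimizations")
move_entry(sections, "LUCENE-9109", "Improvements", "Other")
print(render(sections))
```

Keeping each entry as a list of lines means a multi-line entry moves as one unit, and the text of the entry itself is never rewritten.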

Thoughts?

~ David Smiley
Apache Lucene/Solr Search Developer

Re: CHANGES.txt and issue categorization

david.w.smiley@gmail.com
I'll simply move these items around tomorrow this time, unless I hear feedback to the contrary.

~ David Smiley
Apache Lucene/Solr Search Developer


On Mon, Mar 2, 2020 at 1:07 PM David Smiley <[hidden email]> wrote:
> I'd like us to reflect on how we categorize issues in CHANGES.txt.  [...]

Re: CHANGES.txt and issue categorization

Mikhail Khludnev-2
In reply to this post by david.w.smiley@gmail.com
I'm ok with it. Thank you, David. Will you put it somewhere on the wiki?

On Mon, Mar 2, 2020 at 10:07 AM David Smiley <[hidden email]> wrote:
> I'd like us to reflect on how we categorize issues in CHANGES.txt.  [...]


--
Sincerely yours
Mikhail Khludnev

Re: CHANGES.txt and issue categorization

Adrien Grand
In reply to this post by david.w.smiley@gmail.com
+1 to move these entries.

On Wed, Mar 4, 2020 at 4:27 AM David Smiley <[hidden email]> wrote:
> I'll simply move these items around tomorrow this time, unless I hear feedback to the contrary.


--
Adrien

Re: CHANGES.txt and issue categorization

Bruno Roustant
+1 to move these entries. And I agree with the categories definitions.

On Wed, Mar 4, 2020 at 10:24, Adrien Grand <[hidden email]> wrote:
> +1 to move these entries.

Re: CHANGES.txt and issue categorization

Houston Putman
+1 to move the entries.

I would suggest that we document this organization somewhere though, so that future developers can adhere to the guidelines without finding this thread. I don't have a strong opinion on where this would go, maybe a legend at the top of CHANGES.txt or in the developer docs.

I do agree that the JIRA categories should be aligned at some point, as that would likely help a lot.

- Houston

On Thu, Mar 5, 2020 at 5:00 PM Bruno Roustant <[hidden email]> wrote:
> +1 to move these entries. And I agree with the categories definitions.

Re: CHANGES.txt and issue categorization

Jason Gerlowski
> Furthermore the message itself is often not code reviewed but should be.

+1 to that point especially.  I know from a few things I've worked on
that once you get down in the weeds of the implementation, tests, etc.,
coming up with a good high-level "why might a user care" sentence
can be tough.  We should push each other more on getting that
peer-reviewed for the sake of users who have to dig through
CHANGES.txt.

On Thu, Mar 5, 2020 at 5:17 PM Houston Putman <[hidden email]> wrote:

>
> +1 to move the entries.
>
> I would suggest that we document this organization somewhere though, so that future developers can adhere to the guidelines without finding this thread. I don't have a strong opinion on where this would go, maybe a legend at the top of CHANGES.txt or in the developer docs.
>
> I do agree that the JIRA categories should be aligned at some point, as that would likely help a lot.
>
> - Houston
