[jira] Created: (LUCENE-2453) Make Index Output Buffer Size Configurable


[jira] Created: (LUCENE-2453) Make Index Output Buffer Size Configurable

JIRA jira@apache.org
Make Index Output Buffer Size Configurable
------------------------------------------

                 Key: LUCENE-2453
                 URL: https://issues.apache.org/jira/browse/LUCENE-2453
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Store
    Affects Versions: 3.0.1
            Reporter: Karthick Sankarachary


Currently, the buffered index input class allows sub-classes and users thereof to specify a size for the input buffer, which by default is 1024 bytes. In practice, this option is leveraged by the simple file and compound segment index input sub-classes.

By the same token, it would be nice if the buffered index output class could open up its buffer size for users to configure. In particular, this would allow sub-classes thereof to align the output buffer size, which by default is 16384 bytes, to that of the underlying directory's data unit. For example, a network-based directory might want to buffer data in multiples of its maximum transmission unit. To use an existing use-case, the file system-based directory could potentially choose to align its output buffer size to the operating system's file block size.
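
To make the alignment idea concrete, here is a minimal sketch of rounding a requested buffer size up to a multiple of a directory's data unit (an MTU or a file block size). The class BufferAlignment and the method alignUp are hypothetical names used only for illustration; they are not part of Lucene or of the attached patch.

```java
/** Hypothetical helper, not part of Lucene: aligns a buffer size to a data unit. */
final class BufferAlignment {
    private BufferAlignment() {}

    /**
     * Rounds requestedSize up to the nearest multiple of unitSize.
     * Both arguments must be positive.
     */
    static int alignUp(int requestedSize, int unitSize) {
        if (requestedSize <= 0 || unitSize <= 0) {
            throw new IllegalArgumentException(
                "sizes must be positive: " + requestedSize + ", " + unitSize);
        }
        // Use long arithmetic so the intermediate sum cannot overflow an int.
        long units = ((long) requestedSize + unitSize - 1) / unitSize;
        long aligned = units * unitSize;
        if (aligned > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("aligned size does not fit in an int");
        }
        return (int) aligned;
    }

    public static void main(String[] args) {
        // A 16000-byte request aligned to an assumed 4096-byte file block.
        System.out.println(alignUp(16000, 4096));
        // A 3001-byte request aligned to an assumed 1500-byte MTU.
        System.out.println(alignUp(3001, 1500));
    }
}
```

A directory implementation could call such a helper once, when it creates its output, and pass the result to the output's constructor.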

The proposed change to the buffered index output class involves defining a one-arg constructor that takes a user-defined buffer size, and a default constructor that uses the currently defined buffer size.
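
The shape of that change might look roughly like the following. This is a simplified, self-contained sketch and not the attached patch: SketchBufferedOutput and InMemoryOutput are hypothetical stand-ins for the buffered index output class and a concrete sub-class, and the 16384-byte default is an assumption.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical stand-in for a buffered output with a configurable buffer size. */
abstract class SketchBufferedOutput {
    // Assumed default size; the real class defines its own constant.
    static final int DEFAULT_BUFFER_SIZE = 16384;

    // The buffer is allocated once in the constructor, so it can stay final.
    private final byte[] buffer;
    private int position = 0;

    /** Default constructor: keeps the currently defined buffer size. */
    protected SketchBufferedOutput() {
        this(DEFAULT_BUFFER_SIZE);
    }

    /** One-arg constructor: takes a user-defined buffer size. */
    protected SketchBufferedOutput(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException(
                "bufferSize must be greater than 0 (got " + bufferSize + ")");
        }
        this.buffer = new byte[bufferSize];
    }

    int getBufferSize() {
        return buffer.length;
    }

    void writeByte(byte b) {
        if (position == buffer.length) {
            flush(); // buffer is full: hand its contents to the sub-class
        }
        buffer[position++] = b;
    }

    void flush() {
        if (position > 0) {
            flushBuffer(buffer, 0, position);
            position = 0;
        }
    }

    /** Sub-classes write the buffered bytes to their destination. */
    protected abstract void flushBuffer(byte[] b, int offset, int length);
}

/** Hypothetical sub-class that collects flushed bytes in memory. */
final class InMemoryOutput extends SketchBufferedOutput {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    int flushCount = 0;

    InMemoryOutput() {
        super();
    }

    InMemoryOutput(int bufferSize) {
        super(bufferSize);
    }

    @Override
    protected void flushBuffer(byte[] b, int offset, int length) {
        sink.write(b, offset, length);
        flushCount++;
    }

    byte[] toByteArray() {
        flush();
        return sink.toByteArray();
    }
}
```

With this shape, existing sub-classes that call the default constructor keep their behavior, while a sub-class that knows its directory's data unit can pass an aligned size to the one-arg constructor.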

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Updated: (LUCENE-2453) Make Index Output Buffer Size Configurable

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthick Sankarachary updated LUCENE-2453:
------------------------------------------

    Attachment: LUCENE-2453.patch

[jira] Commented: (LUCENE-2453) Make Index Output Buffer Size Configurable

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865688#action_12865688 ]

Shai Erera commented on LUCENE-2453:
------------------------------------

Patch looks good! A few comments:
* buffer can still be final (and should) since it's only initialized in the ctor
* I'd inline checkBufferSize in the ctor
* I think that adding the same level of control to BufferedIndexInput would be useful too?

In general, I think the size of the buffer (1024) is set like that because larger buffer sizes did not improve performance. Could you perhaps run the benchmark indexing algorithms with the buffer size set to larger values and report the results? It would be interesting to see whether there are any improvements before we open up the API like that.

[jira] Commented: (LUCENE-2453) Make Index Output Buffer Size Configurable

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866780#action_12866780 ]

Karthick Sankarachary commented on LUCENE-2453:
-----------------------------------------------

Hi Shai,

To answer your comments:

    *  buffer can still be final (and should) since it's only initialized in the ctor

[K] Agreed. It's not like we want to allow the size of the buffer to be changed once it has been instantiated.

    * I'd inline checkBufferSize in the ctor

[K] Done. Again, we only need to check the buffer size one time in the ctor.

    * I think that adding the same level of control to BufferedIndexInput would be useful too?

[K] Actually, the BufferedIndexInput already allows this level of control, and then some. In fact, I plagiarized the #checkBufferSize method from that class, where it is used twice, once in the ctor, and then again in the #setBufferSize method. In theory, we could allow the size of the BufferedIndexOutput's buffer to be reset as well, but in case the buffer is made smaller, we'll have to take care to flush some of the "older" bytes that no longer fit in the buffer. IMO, that was not worth the risk and hassle.
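
To illustrate the concern about shrinking the buffer, here is a minimal sketch of what a resizable output buffer would have to do. ResizableOutputSketch and its methods are hypothetical and are not part of the patch or of Lucene; for simplicity the sketch flushes all pending bytes before resizing, rather than only the ones that no longer fit.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical output whose buffer can be resized after construction. */
final class ResizableOutputSketch {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private byte[] buffer; // cannot be final once resizing is allowed
    private int position = 0;

    ResizableOutputSketch(int bufferSize) {
        checkBufferSize(bufferSize);
        this.buffer = new byte[bufferSize];
    }

    // Needed in two places here (constructor and setBufferSize), which is
    // why it is a separate method rather than inlined.
    private static void checkBufferSize(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException(
                "bufferSize must be greater than 0 (got " + bufferSize + ")");
        }
    }

    void writeByte(byte b) {
        if (position == buffer.length) {
            flush();
        }
        buffer[position++] = b;
    }

    void flush() {
        sink.write(buffer, 0, position);
        position = 0;
    }

    void setBufferSize(int newSize) {
        checkBufferSize(newSize);
        if (newSize == buffer.length) {
            return;
        }
        // Pending bytes may not fit in a smaller buffer, so write them out
        // before the old buffer is discarded.
        flush();
        buffer = new byte[newSize];
    }

    int pending() {
        return position;
    }

    byte[] written() {
        return sink.toByteArray();
    }
}
```

Keeping the buffer final and fixing its size in the constructor avoids this extra state transition entirely.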

I will update the patch momentarily based on the comments above, and keep you posted on the benchmark results.

Regards,
Karthick


[jira] Updated: (LUCENE-2453) Make Index Output Buffer Size Configurable

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthick Sankarachary updated LUCENE-2453:
------------------------------------------

    Attachment:     (was: LUCENE-2453.patch)


[jira] Updated: (LUCENE-2453) Make Index Output Buffer Size Configurable

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthick Sankarachary updated LUCENE-2453:
------------------------------------------

    Attachment: LUCENE-2453.patch


[jira] Commented: (LUCENE-2453) Make Index Output Buffer Size Configurable

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885081#action_12885081 ]

Andrzej Bialecki commented on LUCENE-2453:
-------------------------------------------

Karthick, I'm interested in moving forward with this and LUCENE-2456. Could you perhaps prepare an updated patch that incorporates the comments above?


[jira] Commented: (LUCENE-2453) Make Index Output Buffer Size Configurable

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885305#action_12885305 ]

Karthick Sankarachary commented on LUCENE-2453:
-----------------------------------------------

Hi Andrzej, The patch has already been updated to incorporate the comments above. Please let me know if you need anything else.


[jira] Commented: (LUCENE-2453) Make Index Output Buffer Size Configurable

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891946#action_12891946 ]

Uwe Schindler commented on LUCENE-2453:
---------------------------------------

> Hi Andrzej, The patch has already been updated to incorporate the comments above. Please let me know if you need anything else.

Karthick, you should not delete old patches from the issue, as that makes the issue hard to follow. Just upload the new patch with the same filename and JIRA will automatically gray the old one out, but it's still visible.
