[jira] Created: (LUCY-81) Object serialization


Tim Allison (Jira)
Object serialization
--------------------

                 Key: LUCY-81
                 URL: https://issues.apache.org/jira/browse/LUCY-81
             Project: Lucy
          Issue Type: New Feature
          Components: Core
            Reporter: Marvin Humphrey
            Assignee: Marvin Humphrey


Objects are serialized to OutStreams and deserialized from InStreams.  Hooks
are provided for the Perl core serialization module Storable, so that e.g.
Storable::freeze($query) works as expected; hopefully it will prove practical
to hook into canonical serialization routines for other hosts as well.

The primary utility for serialization is communication between machines within
search clusters, so all classes that may need to be sent across the network
will eventually get serialization routines.  However, only Lucy installations
with exactly the same version can be guaranteed to serialize and deserialize
each other's data; rolling updates are not supported.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (LUCY-81) Object serialization

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marvin Humphrey updated LUCY-81:
--------------------------------

    Attachment: obj_serialize.diff

Intuitively, Obj_Serialize and Obj_Deserialize seem as though they should be
abstract methods.  However, it is convenient if direct subclasses of Obj
from the host language are able to call super methods to perform simple
serialization, so a basic implementation is provided.



[jira] Updated: (LUCY-81) Object serialization

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marvin Humphrey updated LUCY-81:
--------------------------------

    Attachment: charbuf_serialization.diff

CharBuf's serialization routine is straightforward: a C32 indicating byte
size, followed by UTF-8 bytes, with a sanity check to ensure valid UTF-8.  The
sanity check throws an error rather than replacing invalid sequences: the
material was presumably valid in memory at least once before, so if corruption
has occurred, something is seriously wrong.
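
As a rough illustration of that layout (not Lucy's actual C code), here is a
minimal Python sketch.  It assumes C32 means a variable-length encoding of a
32-bit unsigned integer; the exact byte layout below (7 bits per byte, low
group first) is an assumption for illustration and may differ from Lucy's.
The names write_c32, read_c32, serialize_text and deserialize_text are
hypothetical.

```python
import io


def write_c32(out, value):
    """Write an unsigned 32-bit integer in a variable-length form (assumed layout)."""
    if not 0 <= value < 2**32:
        raise ValueError("value out of range for a 32-bit compressed integer")
    while value >= 0x80:
        out.write(bytes([(value & 0x7F) | 0x80]))  # low 7 bits, continuation bit set
        value >>= 7
    out.write(bytes([value]))  # final byte, continuation bit clear


def read_c32(inp):
    """Read an integer written by write_c32."""
    result = 0
    shift = 0
    while True:
        raw = inp.read(1)
        if not raw:
            raise EOFError("stream ended inside a compressed integer")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7


def serialize_text(text, out):
    """Write the byte size, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    write_c32(out, len(data))
    out.write(data)


def deserialize_text(inp):
    """Read the byte size, then the bytes; fail loudly on invalid UTF-8."""
    size = read_c32(inp)
    data = inp.read(size)
    if len(data) != size:
        raise EOFError("stream ended inside string data")
    # errors="strict" raises UnicodeDecodeError instead of substituting U+FFFD.
    return data.decode("utf-8", errors="strict")
```

Decoding strictly mirrors the reasoning above: silently replacing bad
sequences would hide corruption that should never have happened.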


[jira] Commented: (LUCY-81) Object serialization

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785495#action_12785495 ]

Marvin Humphrey commented on LUCY-81:
-------------------------------------

Committed obj_serialize.diff as r886907.


[jira] Commented: (LUCY-81) Object serialization

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785499#action_12785499 ]

Marvin Humphrey commented on LUCY-81:
-------------------------------------

Committed charbuf_serialization.diff as r886910.


[jira] Updated: (LUCY-81) Object serialization

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marvin Humphrey updated LUCY-81:
--------------------------------

    Attachment: bytebuf_serialization.diff

ByteBuf's serialization routines are very similar to CharBuf's, minus the
UTF-8 sanity check and with the buffer allocation size quantized to multiples
of 8.
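
A small Python sketch of the two differences, under stated assumptions: the
quantizing is taken to mean rounding the allocation up to the next multiple of
8, and a fixed 4-byte length prefix stands in for the C32 to keep the example
short.  The names quantized_capacity, serialize_bytes and deserialize_bytes
are hypothetical, not Lucy functions.

```python
import io

QUANTUM = 8  # allocation granularity taken from the description above


def quantized_capacity(size):
    """Round size up to the next multiple of QUANTUM (assumed interpretation)."""
    return (size + QUANTUM - 1) // QUANTUM * QUANTUM


def serialize_bytes(data, out):
    """Write a length prefix followed by the raw bytes."""
    out.write(len(data).to_bytes(4, "big"))  # simplification: fixed-width length
    out.write(data)


def deserialize_bytes(inp):
    """Return (storage, size): a buffer with quantized capacity and the used length."""
    header = inp.read(4)
    if len(header) != 4:
        raise EOFError("stream ended inside the length prefix")
    size = int.from_bytes(header, "big")
    storage = bytearray(quantized_capacity(size))
    got = inp.readinto(memoryview(storage)[:size])
    if got != size:
        raise EOFError("stream ended inside byte data")
    # No UTF-8 validation: arbitrary bytes are legal content here.
    return storage, size
```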


[jira] Updated: (LUCY-81) Object serialization

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marvin Humphrey updated LUCY-81:
--------------------------------

    Attachment: freezer.diff

The freezer.diff patch adds Lucy::Util::Freezer, which provides two utility
routines for serializing arbitrary objects: FREEZE(Obj, OutStream) and
THAW(InStream).  These work by first prepending the class name to the
serialized data, then using that class name on thaw to find the right
deserialization function.
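
The class-name dispatch can be sketched in Python as follows.  This is an
illustration of the idea only, not the code in freezer.diff: the registry, the
length-prefixed string format, and the Text and Int32 classes are all
hypothetical stand-ins.

```python
import io
import struct

REGISTRY = {}  # class name -> class; hypothetical lookup table


def register(cls):
    """Class decorator that makes a class findable by name on thaw."""
    REGISTRY[cls.__name__] = cls
    return cls


def write_string(out, text):
    data = text.encode("utf-8")
    out.write(struct.pack(">I", len(data)))
    out.write(data)


def read_string(inp):
    (size,) = struct.unpack(">I", inp.read(4))
    return inp.read(size).decode("utf-8")


def freeze(obj, out):
    """Prepend the class name, then let the object serialize itself."""
    write_string(out, type(obj).__name__)
    obj.serialize(out)


def thaw(inp):
    """Read the class name and hand the stream to that class's deserializer."""
    cls = REGISTRY[read_string(inp)]  # KeyError if the class is unknown
    return cls.deserialize(inp)


@register
class Text:
    def __init__(self, value):
        self.value = value

    def serialize(self, out):
        write_string(out, self.value)

    @classmethod
    def deserialize(cls, inp):
        return cls(read_string(inp))


@register
class Int32:
    def __init__(self, value):
        self.value = value

    def serialize(self, out):
        out.write(struct.pack(">i", self.value))

    @classmethod
    def deserialize(cls, inp):
        return cls(struct.unpack(">i", inp.read(4))[0])
```

The caller of thaw() does not need to know what kind of object is next in the
stream; the price is the extra bytes spent on the class name for every frozen
object.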


[jira] Updated: (LUCY-81) Object serialization

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marvin Humphrey updated LUCY-81:
--------------------------------

    Attachment: num_serialization.diff

The num_serialization.diff file adds serialization functionality to Integer32,
Integer64, Float32 and Float64.  As the routines use InStream and OutStream
methods such as OutStream_Write_Float32(), they should be just as portable
across architectures as those stream methods are.
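
The portability argument rests on the stream methods writing numbers in one
fixed byte order regardless of the host CPU.  A minimal Python sketch of that
idea, assuming big-endian order and IEEE 754 floats (the message above does
not state the byte order, so that choice is an assumption); the function names
are hypothetical.

```python
import io
import struct

# ">" forces big-endian with standard sizes, independent of the host machine.
FORMATS = {
    "int32": ">i",
    "int64": ">q",
    "float32": ">f",
    "float64": ">d",
}


def write_number(out, kind, value):
    """Write value using the fixed-order format registered for kind."""
    out.write(struct.pack(FORMATS[kind], value))


def read_number(inp, kind):
    """Read one number of the given kind from the stream."""
    fmt = FORMATS[kind]
    raw = inp.read(struct.calcsize(fmt))
    return struct.unpack(fmt, raw)[0]
```

Because the byte order is pinned in the format string, a value written on a
little-endian machine reads back unchanged on a big-endian one.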


[jira] Commented: (LUCY-81) Object serialization

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785695#action_12785695 ]

Marvin Humphrey commented on LUCY-81:
-------------------------------------

Committed bytebuf_serialization.diff as r887022, freezer.diff as r887024, and
num_serialization.diff as r887025.


[jira] Updated: (LUCY-81) Object serialization

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marvin Humphrey updated LUCY-81:
--------------------------------

    Attachment: hash_serialization.diff

Most Hash objects have string keys, so the serialization routines in
hash_serialization.diff special-case CharBuf keys as an optimization.  FREEZE
is used for all values and all non-CharBuf keys, which is more costly because
of the need to encode the class name.
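
To make the cost difference concrete, here is a Python sketch of one possible
layout: string-keyed entries first with bare keys, then all other entries with
frozen keys.  The on-disk order, the length-prefix format, and every name below
are assumptions for illustration, not a description of hash_serialization.diff.

```python
import io
import struct


def write_u32(out, n):
    out.write(struct.pack(">I", n))


def read_u32(inp):
    return struct.unpack(">I", inp.read(4))[0]


def write_str(out, text):
    data = text.encode("utf-8")
    write_u32(out, len(data))
    out.write(data)


def read_str(inp):
    return inp.read(read_u32(inp)).decode("utf-8")


def write_int(out, value):
    out.write(struct.pack(">q", value))


def read_int(inp):
    return struct.unpack(">q", inp.read(8))[0]


# Hypothetical codec table keyed by class name, standing in for FREEZE/THAW.
CODECS = {"str": (write_str, read_str), "int": (write_int, read_int)}


def freeze(obj, out):
    name = type(obj).__name__
    write_str(out, name)  # the class name is the extra cost
    CODECS[name][0](out, obj)


def thaw(inp):
    return CODECS[read_str(inp)][1](inp)


def serialize_hash(table, out):
    text_items = [(k, v) for k, v in table.items() if isinstance(k, str)]
    other_items = [(k, v) for k, v in table.items() if not isinstance(k, str)]
    write_u32(out, len(text_items))
    for key, value in text_items:
        write_str(out, key)  # special case: no class name for string keys
        freeze(value, out)
    write_u32(out, len(other_items))
    for key, value in other_items:
        freeze(key, out)  # general case: key carries its class name
        freeze(value, out)


def deserialize_hash(inp):
    table = {}
    for _ in range(read_u32(inp)):
        key = read_str(inp)
        table[key] = thaw(inp)
    for _ in range(read_u32(inp)):
        key = thaw(inp)
        table[key] = thaw(inp)
    return table
```

In this sketch the key "title" costs 9 bytes written bare and 16 bytes when
frozen, which is the kind of saving the special case is after.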


[jira] Updated: (LUCY-81) Object serialization

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marvin Humphrey updated LUCY-81:
--------------------------------

    Attachment: varray_serialization.diff

VArray serialization depends on FREEZE for each element.


[jira] Updated: (LUCY-81) Object serialization

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marvin Humphrey updated LUCY-81:
--------------------------------

    Attachment: hash_serialization.diff

I committed a new version of hash_serialization.diff as r887252, and
varray_serialization.diff as r887255.

The new hash_serialization.diff adds a Perl binding test file,
"trunk/perl/t/binding/017-hash.t".


[jira] Resolved: (LUCY-81) Object serialization

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCY-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marvin Humphrey resolved LUCY-81.
---------------------------------

    Resolution: Fixed

That covers serialization for all current candidate classes.

As a footnote, this serialization implementation doesn't seek to solve the
problems associated with references or reference cycles.  If you have a Hash
or a VArray where the same object serves as a value for multiple slots, it
will be exploded to multiple objects on deserialization.  Similarly, reference
cycles are not detected and will trigger an infinite loop.
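
Both limitations follow directly from freezing each element independently, as
the array and hash routines do.  The Python sketch below reproduces them with
hypothetical Text and Array classes and a hypothetical freeze/thaw pair; it is
an illustration of the behavior, not Lucy code.

```python
import io
import struct


def write_u32(out, n):
    out.write(struct.pack(">I", n))


def read_u32(inp):
    return struct.unpack(">I", inp.read(4))[0]


def write_str(out, text):
    data = text.encode("utf-8")
    write_u32(out, len(data))
    out.write(data)


def read_str(inp):
    return inp.read(read_u32(inp)).decode("utf-8")


class Text:
    def __init__(self, value):
        self.value = value

    def serialize(self, out):
        write_str(out, self.value)

    @classmethod
    def deserialize(cls, inp):
        return cls(read_str(inp))


class Array:
    def __init__(self, elems):
        self.elems = elems

    def serialize(self, out):
        write_u32(out, len(self.elems))
        for elem in self.elems:
            freeze(elem, out)  # each slot is frozen on its own, no identity tracking

    @classmethod
    def deserialize(cls, inp):
        return cls([thaw(inp) for _ in range(read_u32(inp))])


CLASSES = {"Text": Text, "Array": Array}


def freeze(obj, out):
    write_str(out, type(obj).__name__)
    obj.serialize(out)


def thaw(inp):
    return CLASSES[read_str(inp)].deserialize(inp)


# One object occupying two slots comes back as two separate objects.
shared = Text("x")
original = Array([shared, shared])
buf = io.BytesIO()
freeze(original, buf)
buf.seek(0)
copy = thaw(buf)
```

After the round trip, copy.elems[0] and copy.elems[1] hold equal values but
are no longer the same object.  An array that contains itself would make
freeze() recurse without end; in this Python sketch that surfaces as a
RecursionError rather than a hang.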
