[lucy-dev] Clownfish interfaces

Nick Wellnhofer
Lucifers,

We have discussed Clownfish interfaces already, but here's a quick recap. In
order to support callbacks from Clownfish into the host language, we currently
rely on class-based inheritance. In the host language, a Clownfish class is
subclassed, adding method implementations written in the host language. The
Clownfish compiler creates wrappers that call into the host language for each
such method. This works for Perl, and should work for other dynamic languages
like Python or Ruby as well.

This approach obviously doesn't work for host languages that don't support
class-based inheritance, like Go or Rust. One possible solution is to add
support for OOP interfaces, which these languages typically use to provide
dynamic dispatch. Interfaces are also useful on the Clownfish side alone.

I researched how several languages implement interfaces to come up with a
performant solution suitable for the Clownfish object system. Basic operations
on interfaces include:

- Static casts from objects to interface types. (The cast is known
   to succeed at compile time.)
- Static casts from interface objects to the root object type.
- Dynamic casts from objects to interface types. (The object may
   not implement the target interface and the cast may fail.)
- Invocation of interface methods.

For some of my earlier thoughts, see CLOWNFISH-12:

     https://issues.apache.org/jira/browse/CLOWNFISH-12

Embedding multiple vtables (C++)
--------------------------------

C++ doesn't know interfaces as a distinct language concept, but they can be
easily emulated with multiple inheritance. An interface is simply a class
without member variables. How multiple inheritance is implemented in C++ depends
on the compiler, but the typical approach is described in Stroustrup's paper
"Multiple Inheritance for C++" (1999):

     http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.4735

The basic idea is to embed multiple vtables into each object and to implement
up- and downcasting by adjusting the pointer to an object by fixed offsets.
(The complications resulting from the diamond problem don't apply to
interfaces since they don't have member variables.)

We could use the same idea for Clownfish. Static casts only require applying
an offset to the object pointer. Interface methods are invoked just like
normal methods. Dynamic casts are more costly. They're rather slow in some C++
implementations where the whole inheritance tree is searched. Another problem
is how to initialize the extra vtable pointers. Currently, we have a single
method Class_Make_Obj to initialize objects. With multiple vtable pointers, we
would need separate code paths for object initialization, or initialize
interface vtables lazily.
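
To make the pointer adjustment concrete, here is a minimal C sketch of the embedded-vtable layout. All names (Circle, PrintableVT, Circle_to_Printable and so on) are hypothetical and not part of Clownfish or of any C++ ABI; the sketch only assumes that each object embeds one extra vtable pointer per interface and that casts add or subtract a fixed offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface vtable with a single method. */
typedef struct PrintableVT {
    int (*id)(void *iface_ptr);   /* receives the adjusted pointer */
} PrintableVT;

/* Hypothetical class that embeds a second vtable pointer. */
typedef struct Circle {
    const void        *class_vt;      /* primary vtable (unused here) */
    int                radius;
    const PrintableVT *printable_vt;  /* embedded interface vtable pointer */
} Circle;

/* Static cast to the interface: add an offset known at compile time. */
static void *Circle_to_Printable(Circle *self) {
    return (char *)self + offsetof(Circle, printable_vt);
}

/* Cast back to the class: subtract the same offset. */
static Circle *Printable_to_Circle(void *iface_ptr) {
    return (Circle *)((char *)iface_ptr - offsetof(Circle, printable_vt));
}

static int Circle_id(void *iface_ptr) {
    return Printable_to_Circle(iface_ptr)->radius;
}

static const PrintableVT CIRCLE_PRINTABLE_VT = { Circle_id };

/* Interface method call: load the vtable pointer at the adjusted address. */
static int Printable_id(void *iface_ptr) {
    const PrintableVT *vt = *(const PrintableVT *const *)iface_ptr;
    return vt->id(iface_ptr);
}

/* Every initialization path has to set the extra vtable pointer too. */
static void Circle_init(Circle *self, int radius) {
    self->class_vt     = NULL;
    self->radius       = radius;
    self->printable_vt = &CIRCLE_PRINTABLE_VT;
}
```

The extra assignment in Circle_init is the initialization cost mentioned above: a single generic allocator would not know which interface vtable pointers to fill in.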

Fat pointers (Go, Rust)
-----------------------

The approach chosen by Go and Rust is to implement interface objects as "fat
pointers", that is a pair consisting of the pointer to the actual object and a
pointer to the interface vtable (itable) for dynamic dispatch. When casting an
object to interface type dynamically, Go requires a somewhat expensive hash
table lookup to find the itable. Rust doesn't support dynamic casts at all.
Static casts and method invocations are simple and fast.

The major downside is that fat pointers require double the storage space. This
is especially inconvenient for containers like arrays of interface objects.
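
A fat pointer can be sketched in C as a two-word struct. The names below (CounterRef, CounterITable, UpCounter) are made up for illustration and do not mirror the actual Go or Rust runtime layouts; the sketch assumes an interface with a single method.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical itable for an interface with one method. */
typedef struct CounterITable {
    int (*next)(void *obj);
} CounterITable;

/* The fat pointer: object pointer plus itable pointer. */
typedef struct CounterRef {
    void                *obj;
    const CounterITable *itable;
} CounterRef;

/* A concrete type implementing the interface. */
typedef struct UpCounter {
    int value;
} UpCounter;

static int UpCounter_next(void *obj) {
    return ++((UpCounter *)obj)->value;
}

static const CounterITable UPCOUNTER_AS_COUNTER = { UpCounter_next };

/* Static cast: pair the object with an itable chosen at compile time. */
static CounterRef UpCounter_as_Counter(UpCounter *self) {
    CounterRef ref = { self, &UPCOUNTER_AS_COUNTER };
    return ref;
}

/* Method invocation: one indirect call through the itable. */
static int Counter_next(CounterRef ref) {
    return ref.itable->next(ref.obj);
}
```

An array of CounterRef values stores two pointers per element, which is the storage overhead described above.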

Direct lookup (Java)
--------------------

Like with C++, the implementation of interfaces is up to the JVM, but I found
the paper "Efficient Implementation of Java Interfaces: Invokeinterface
Considered Harmless" (2001) to be enlightening:

     http://dl.acm.org/citation.cfm?id=504291
     http://www.research.ibm.com/people/d/dgrove/papers/oopsla01.pdf

Under this approach, there's no special representation for interface objects.
They're simply object pointers and casts are a no-op. When an interface method
is invoked, the method is looked up dynamically by name in a hash table. The
key to making this operation fast is to employ a hash table of function
pointers, pointing to small generated code stubs that invoke the actual
method, or, in the case of hash collisions, execute a short if-else-sequence.

I have no idea whether this is still the approach of choice in modern JVMs.
I'd be curious if anyone has additional pointers.

My proposal
-----------

I really like the simplicity of the direct lookup approach from the IBM paper
that makes no distinction between normal objects and interface objects and
avoids the overhead of fat pointers. But stubs that directly invoke interface
methods are impossible to implement in C without relying on
implementation-defined behavior or even assembler. It's also impossible to add
a default method to an interface, and invoke it on an object of a class from
an existing binary without recompilation. One of the important reasons for
default methods is to allow this kind of interface evolution.

But hash tables of function pointers with hard-coded collision resolution can
still be used to look up itables efficiently. Here's how it works:

- In every Class struct there's a small hash table with maybe 4-8
   slots. When invoking an interface method, an index into this table
   is computed from the full name of the interface like
   "Lucy::Analysis::Analyzer" and the table size. The important
   observation is that this index is known at compile time. This means
   that the hash function and table size must always be the same for a
   certain Clownfish version, of course.

- The function pointer in the hash table is called. It will point to
   generated code that looks like

     lucy_Analyzer_ITABLE*
     MYPARCEL_MyAnalyzer_ITABLE_GETTER_3(cfish_Obj *obj,
                                         cfish_Interface *iface) {
         if (iface == LUCY_ANALYZER) {
             return MYPARCEL_MyAnalyzer_Analyzer_ITABLE;
         }
         else if (...) {
             // Possibly other interfaces implemented by MyAnalyzer
             // that happen to hash to the same slot. But typically,
             // there will be only a single interface.
         }
         else {
             // Other ways to handle errors are possible.
             CFISH_THROW(CFISH_ERR,
                         "Class %o doesn't implement interface %o",
                         CFISH_Obj_Get_Class_Name(obj),
                         CFISH_Interface_Get_Name(iface));
         }
     }

- MYPARCEL_MyAnalyzer_Analyzer_ITABLE is populated dynamically during
   Clownfish bootstrap, handling default methods and allowing
   interface evolution.

- The interface method is looked up in the returned itable, using
   a global offset variable, similar to normal Clownfish method
   dispatch.

Invoking interface methods this way should result in about twice the overhead
of normal method invocation. It's also possible to combine this with the fat
pointer approach, avoiding the itable lookup for subsequent interface method
calls.
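
The steps above can be put together in a small, self-contained C sketch. Everything here is a simplified assumption: the type and function names are hypothetical rather than the real Clownfish API, the hash function is an arbitrary djb2-style choice, the slot index is computed during bootstrap instead of being emitted as a constant by the Clownfish compiler, and a missing interface yields NULL instead of throwing.

```c
#include <assert.h>
#include <stddef.h>

#define ITABLE_SLOTS 4   /* assumed table size */

typedef struct Interface { const char *name; } Interface;
typedef struct Class Class;
typedef struct Obj { const Class *klass; } Obj;

typedef const void *(*ITableGetter)(Obj *obj, const Interface *iface);

struct Class {
    const char  *name;
    ITableGetter itable_getters[ITABLE_SLOTS];  /* the small hash table */
};

/* Assumed hash: djb2-style string hash reduced to a slot index. */
static unsigned iface_slot(const char *name) {
    unsigned h = 5381;
    while (*name) h = h * 33u + (unsigned char)*name++;
    return h % ITABLE_SLOTS;
}

/* Hypothetical interface with one method. */
typedef int (*Analyzer_Transform_t)(Obj *self, int x);
typedef struct AnalyzerITable { Analyzer_Transform_t transform; } AnalyzerITable;

static const Interface ANALYZER = { "Lucy::Analysis::Analyzer" };
static unsigned ANALYZER_SLOT;            /* a constant in generated code */
static size_t   Analyzer_Transform_OFFSET; /* global method offset */

static AnalyzerITable MyAnalyzer_Analyzer_ITABLE; /* filled at bootstrap */
static Class MYANALYZER = { "MyAnalyzer", { NULL } };

static int MyAnalyzer_transform(Obj *self, int x) { (void)self; return x * 2; }

/* Getter for empty slots. */
static const void *no_itable(Obj *obj, const Interface *iface) {
    (void)obj; (void)iface;
    return NULL;
}

/* Getter for the slot Analyzer hashes to; collisions would add branches. */
static const void *MyAnalyzer_itable_getter(Obj *obj, const Interface *iface) {
    (void)obj;
    if (iface == &ANALYZER) return &MyAnalyzer_Analyzer_ITABLE;
    return NULL;
}

static void bootstrap(void) {
    for (int i = 0; i < ITABLE_SLOTS; i++) {
        MYANALYZER.itable_getters[i] = no_itable;
    }
    ANALYZER_SLOT = iface_slot(ANALYZER.name);
    MYANALYZER.itable_getters[ANALYZER_SLOT] = MyAnalyzer_itable_getter;
    MyAnalyzer_Analyzer_ITABLE.transform = MyAnalyzer_transform;
    Analyzer_Transform_OFFSET = offsetof(AnalyzerITable, transform);
}

/* Interface method call: slot lookup, getter call, offset-based dispatch. */
static int Analyzer_Transform(Obj *obj, int x) {
    const void *itable
        = obj->klass->itable_getters[ANALYZER_SLOT](obj, &ANALYZER);
    assert(itable != NULL);
    Analyzer_Transform_t method = *(const Analyzer_Transform_t *)
        ((const char *)itable + Analyzer_Transform_OFFSET);
    return method(obj, x);
}
```

Compared with a normal method call, the extra work is one table load and one call to the getter, which is consistent with the rough "twice the overhead" estimate.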

Host language callbacks
-----------------------

I can see two approaches. The first one autogenerates a "host" class for every
interface. These classes contain a handle pointing to the host language object
(SV* for Perl, registry index for Go) and have autogenerated method
implementations that always call into the host language. Converting a host
language object to Clownfish requires an initial interface to be specified.
It's not possible to dynamically cast the resulting Clownfish object to other
interface types the host language object may implement.

In the second approach, there's only a single Clownfish class for host
objects. A host object can also be directly converted to Obj without
specifying an interface type. When casting the host object to an interface
type dynamically, the itable lookup is modified to check whether the host
object actually implements the interface, then returns the appropriate itable.
The problem with this approach is that it doesn't map to languages like Rust
(or C++ without RTTI) that don't support dynamic casts.

In both approaches, converting a host object to Clownfish always creates a new
Clownfish object. This is more expensive than the current Perl implementation
which caches the Clownfish object, but allows standard Perl objects containing
a blessed hash. Converting back to the host language simply extracts the handle.
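
As a rough illustration of the first approach, the following C sketch shows an autogenerated "host" class for one interface. The host runtime is simulated by a plain function pointer, and all names (HostAnalyzer, HostCallFn, FakeHostObj) are hypothetical; real bindings would go through the Perl or Go APIs instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the host runtime: call a named method on a host object. */
typedef int (*HostCallFn)(void *handle, const char *method, int arg);

/* Autogenerated "host" class for a hypothetical Analyzer interface. */
typedef struct HostAnalyzer {
    void      *handle;     /* e.g. an SV* for Perl */
    HostCallFn host_call;
} HostAnalyzer;

/* Host -> Clownfish: always allocates a new wrapper object. */
static HostAnalyzer *HostAnalyzer_new(void *handle, HostCallFn host_call) {
    HostAnalyzer *self = malloc(sizeof *self);
    if (self == NULL) abort();
    self->handle    = handle;
    self->host_call = host_call;
    return self;
}

/* Autogenerated method implementation: always calls into the host. */
static int HostAnalyzer_Transform(HostAnalyzer *self, int x) {
    return self->host_call(self->handle, "transform", x);
}

/* Clownfish -> host: simply extract the handle. */
static void *HostAnalyzer_to_host(HostAnalyzer *self) {
    return self->handle;
}

/* A fake host object and dispatcher, for demonstration only. */
typedef struct FakeHostObj { int factor; } FakeHostObj;

static int fake_host_call(void *handle, const char *method, int arg) {
    FakeHostObj *obj = handle;
    if (strcmp(method, "transform") == 0) return arg * obj->factor;
    return -1;  /* unknown method */
}
```

Because the wrapper only stores a handle, the host object itself can be an ordinary object of the host language with no Clownfish-specific layout.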

To make default methods work, there's generated code on the host language side
of an interface that calls the Clownfish implementation directly. This means
that calling a default method from Clownfish results in a
Clownfish-Host-Clownfish roundtrip but this should be acceptable. This
overhead could be avoided for host languages that allow introspection by
creating a custom itable for every host-class/interface combination, similar
to the way we currently register Perl classes in the Clownfish registry. But I
don't think it's worth the effort.

Concluding remarks
------------------

If we decide to switch to interfaces for host language callbacks, I intend to
completely remove the ability to subclass Clownfish classes from languages
like Perl. This means removing some really well-thought-out code, but if we
want to expand our scope to languages without class-based inheritance, we
shouldn't support features that only work for some host languages. It also
means that the Perl callback mechanism will become slower, though it probably
won't be noticeable given how slow Perl method calls are. The fact that users
can write normal Perl classes without the need for inside-out member variables
makes up for that.

I think I can implement the approach described above in a few months for a 0.7
release, including Perl and Go support. Unless there are major objections to
my plan, I'll just start on a separate branch, so that if there are unforeseen
obstacles, we don't have to revert commits on the master branch.

We will have to redesign all Lucy classes that are meant to be overridable
from the host language. For that, I'll need feedback from the rest of the
community, especially Marvin.

I don't plan to support subinterfaces (interfaces implementing other
interfaces) in the first iteration, but this should be straightforward to add.
I also may omit features from the initial implementation that aren't necessary
for Lucy callbacks.

Nick
Re: [lucy-dev] Clownfish interfaces

Peter Karman
Nick,

I don't mean to Warnock you re: this great email. You're obviously thinking
through these issues for a variety of emerging languages, and that's vital
to Lucy and Clownfish. I've no technical remarks, just encouragement as you
dig in.

--
Peter Karman . [hidden email] . http://peknet.com/
Re: [lucy-dev] Clownfish interfaces

Marvin Humphrey
On Fri, Dec 16, 2016 at 10:02 AM, Nick Wellnhofer <[hidden email]> wrote:

> Embedding multiple vtables (C++)
> --------------------------------
>
> C++ doesn't know interfaces as a distinct language concept, but they can be
> easily emulated with multiple inheritance. An interface is simply a class
> without member variables. How multiple inheritance is implemented in C++
> depends on the compiler, but the typical approach is described in
> Stroustrup's paper "Multiple Inheritance for C++" (1999):
>
>     http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.4735

The assertion of simplicity from the abstract for this paper has always been
hard for me to square with how crazy moving around pointers turns out to be in
practice. :\

  ... It is also widely believed that multiple inheritance complicates a
  programming language significantly, is hard to implement, and is expensive
  to run. I will demonstrate that none of these last three conjectures are
  true.

It makes all kinds of sense to me that languages like Java and C# saw where
C++ had gone and decided "single superclass inheritance only, please".

> The basic idea is to embed multiple vtables into each object and to
> implement up- and downcasting by adjusting the pointer to an object by fixed
> offsets. (The complications resulting from the diamond problem don't apply
> to interfaces since they don't have member variables.)

The C++ approach is still a problem for Clownfish because of refcounting.

    +--------------+
    | vtable1_ptr  |
    +--------------+
    | refcount     |
    +--------------+
    | ... vars ... |
    +--------------+
    | vtable2_ptr  |
    +--------------+

If a cast to interface type moves the pointer from the address of vtable1_ptr
to the address of vtable2_ptr, it's no longer valid to perform refcount
manipulation on the adjusted pointer.
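
The problem, and one conceivable workaround, can be sketched in C as follows. The layout mirrors the diagram above; the offset_to_top field and all names are assumptions made for this sketch and are not something proposed in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Object header: refcounting assumes a pointer to this header. */
typedef struct ObjHead {
    const void *vtable1;
    size_t      refcount;
} ObjHead;

/* Hypothetical interface vtable that records where the object starts. */
typedef struct IfaceVT {
    size_t offset_to_top;
} IfaceVT;

typedef struct Widget {
    ObjHead        head;     /* vtable1_ptr + refcount */
    int            vars;
    const IfaceVT *vtable2;  /* interface vtable pointer */
} Widget;

static const IfaceVT WIDGET_IFACE_VT = { offsetof(Widget, vtable2) };

/* Only valid for pointers to the start of the object. */
static size_t incref(void *obj) {
    return ++((ObjHead *)obj)->refcount;
}

/* Cast to interface type: the pointer now addresses vtable2. */
static void *Widget_to_iface(Widget *self) {
    return (char *)self + offsetof(Widget, vtable2);
}

/* The adjusted pointer must be moved back before touching the refcount. */
static size_t iface_incref(void *iface_ptr) {
    const IfaceVT *vt = *(const IfaceVT *const *)iface_ptr;
    return incref((char *)iface_ptr - vt->offset_to_top);
}
```

Calling incref directly on the adjusted pointer would interpret whatever follows vtable2 as a refcount, so every refcount operation on an interface pointer would need the extra indirection shown in iface_incref.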

> We could use the same idea for Clownfish. Static casts only require applying
> an offset to the object pointer. Interface methods are invoked just like
> normal methods. Dynamic casts are more costly. They're rather slow in some
> C++ implementations where the whole inheritance tree is searched. Another
> problem is how to initialize the extra vtable pointers. Currently, we have a
> single method Class_Make_Obj to initialize objects. With multiple vtable
> pointers, we would need separate code paths for object initialization, or
> initialize interface vtables lazily.
>
> Fat pointers (Go, Rust)
> -----------------------
>
> The approach chosen by Go and Rust is to implement interface objects as "fat
> pointers", that is a pair consisting of the pointer to the actual object and
> a pointer to the interface vtable (itable) for dynamic dispatch. When
> casting an object to interface type dynamically, Go requires a somewhat
> expensive hash table lookup to find the itable. Rust doesn't support dynamic
> casts at all. Static casts and method invocations are simple and fast.
>
> The major downside is that fat pointers require double the storage space.
> This is especially inconvenient for containers like arrays of interface
> objects.

Ruminating on my biases... I have found "fat pointers" hard to get used to
because, honestly, I'm accustomed to casting of objects and containers without
reallocation, which imposes the requirement that objects be pointers and that
all pointers be the same size.  Being able to cast `Obj**` to `Query**`
without cost or with only CPU cost for run-time type checking is something
that seems very natural.

By now, I've done enough Go programming that the construct isn't as foreign.
Still, the constraint that native Clownfish objects are struct pointers seems
reasonable.

> Direct lookup (Java)
> --------------------
>
> Like with C++, the implementation of interfaces is up to the JVM, but I
> found the paper "Efficient Implementation of Java Interfaces:
> Invokeinterface Considered Harmless" (2001) to be enlightening:
>
>     http://dl.acm.org/citation.cfm?id=504291
>     http://www.research.ibm.com/people/d/dgrove/papers/oopsla01.pdf
>
> Under this approach, there's no special representation for interface
> objects. They're simply object pointers and casts are a no-op. When an
> interface method is invoked, the method is looked up dynamically by name in
> a hash table. The key to making this operation fast is to employ a hash table
> of function pointers, pointing to small generated code stubs that invoke the
> actual method, or, in the case of hash collisions, execute a short
> if-else-sequence.
>
> I have no idea whether this is still the approach of choice in modern JVMs.
> I'd be curious if anyone has additional pointers.

This overview should be helpful:

    https://en.wikipedia.org/wiki/Java_performance

> My proposal
> -----------
>
> I really like the simplicity of the direct lookup approach from the IBM
> paper that makes no distinction between normal objects and interface objects
> and avoids the overhead of fat pointers. But stubs that directly invoke
> interface methods are impossible to implement in C without relying on
> implementation-defined behavior or even assembler. It's also impossible to
> add a default method to an interface, and invoke it on an object of a class
> from an existing binary without recompilation. One of the important reasons
> for default methods is to allow this kind of interface evolution.
>
> But hash tables of function pointers with hard-coded collision resolution can
> still be used to look up itables efficiently. Here's how it works:
>
> - In every Class struct there's a small hash table with maybe 4-8
>   slots. When invoking an interface method, an index into this table
>   is computed from the full name of the interface like
>   "Lucy::Analysis::Analyzer" and the table size. The important
>   observation is that this index is known at compile time. This means
>   that the hash function and table size must always be the same for a
>   certain Clownfish version, of course.
>
> - The function pointer in the hash table is called. It will point to
>   generated code that looks like
>
>     lucy_Analyzer_ITABLE*
>     MYPARCEL_MyAnalyzer_ITABLE_GETTER_3(cfish_Obj *obj,
>                                         cfish_Interface *iface) {
>         if (iface == LUCY_ANALYZER) {
>             return MYPARCEL_MyAnalyzer_Analyzer_ITABLE;
>         }
>         else if (...) {
>             // Possibly other interfaces implemented by MyAnalyzer
>             // that happen to hash to the same slot. But typically,
>             // there will be only a single interface.
>         }
>         else {
>             // Other ways to handle errors are possible.
>             CFISH_THROW(CFISH_ERR,
>                         "Class %o doesn't implement interface %o",
>                         CFISH_Obj_Get_Class_Name(obj),
>                         CFISH_Interface_Get_Name(iface));
>         }
>     }

I don't think this changes the calculus much, but I anticipate that classes
will often satisfy multiple interfaces.  In addition to satisfying one primary
interface, classes might also satisfy interfaces for serialization,
comparison, etc.
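
A quick way to see how multiple interfaces interact with a small table is to count how many of a class's interfaces land in the same slot. The sketch below assumes a 4-slot table and an arbitrary djb2-style hash; neither is specified in the proposal, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ITABLE_SLOTS 4   /* assumed table size */

/* Assumed hash: djb2-style string hash reduced to a slot index. */
static unsigned iface_slot(const char *name) {
    unsigned h = 5381;
    while (*name) h = h * 33u + (unsigned char)*name++;
    return h % ITABLE_SLOTS;
}

/* Length of the longest if-else chain a getter would need for the
 * given set of interface names. */
static size_t longest_chain(const char *const *names, size_t n) {
    size_t counts[ITABLE_SLOTS] = { 0 };
    size_t longest = 0;
    for (size_t i = 0; i < n; i++) {
        size_t c = ++counts[iface_slot(names[i])];
        if (c > longest) longest = c;
    }
    return longest;
}
```

With more interfaces than slots, at least one getter necessarily contains an if-else chain, but the chain length stays small as long as the number of interfaces per class is close to the table size.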

> - MYPARCEL_MyAnalyzer_Analyzer_ITABLE is populated dynamically during
>   Clownfish bootstrap, handling default methods and allowing
>   interface evolution.
>
> - The interface method is looked up in the returned itable, using
>   a global offset variable, similar to normal Clownfish method
>   dispatch.
>
> Invoking interface methods this way should result in about twice the
> overhead of normal method invocation. It's also possible to combine this
> with the fat pointer approach, avoiding the itable lookup for subsequent
> interface method calls.

The general approach to itable lookup seems workable to me.  +1

> Host language callbacks
> -----------------------
>
> I can see two approaches. The first one autogenerates a "host" class for
> every interface. These classes contain a handle pointing to the host
> language object (SV* for Perl, registry index for Go) and have autogenerated
> method implementations that always call into the host language. Converting a
> host language object to Clownfish requires an initial interface to be
> specified. It's not possible to dynamically cast the resulting Clownfish
> object to other interface types the host language object may implement.
>
> In the second approach, there's only a single Clownfish class for host
> objects. A host object can also be directly converted to Obj without
> specifying an interface type. When casting the host object to an interface
> type dynamically, the itable lookup is modified to check whether the host
> object actually implements the interface, then returns the appropriate
> itable. The problem with this approach is that it doesn't map to languages
> like Rust (or C++ without RTTI) that don't support dynamic casts.
>
> In both approaches, converting a host object to Clownfish always creates a
> new Clownfish object. This is more expensive than the current Perl
> implementation which caches the Clownfish object, but allows standard Perl
> objects containing a blessed hash. Converting back to the host language
> simply extracts the handle.
>
> To make default methods work, there's generated code on the host language
> side of an interface that calls the Clownfish implementation directly. This
> means that calling a default method from Clownfish results in a
> Clownfish-Host-Clownfish roundtrip but this should be acceptable. This
> overhead could be avoided for host languages that allow introspection by
> creating a custom itable for every host-class/interface combination, similar
> to the way we currently register Perl classes in the Clownfish registry. But
> I don't think it's worth the effort.

The first approach seems closer to what Clownfish does now.  I think that's
what you're leaning towards, right?  +1

> Concluding remarks
> ------------------
>
> If we decide to switch to interfaces for host language callbacks, I intend
> to completely remove the ability to subclass Clownfish classes from
> languages like Perl. This means removing some really well-thought-out code,
> but if we want to expand our scope to languages without class-based
> inheritance, we shouldn't support features that only work for some host
> languages.

+1 to move forward.

Regarding backwards compatibility for Lucy, there is a certain amount of
functionality which currently requires subclassing and overriding of methods.
Off the top of my head, QueryParser, IndexManager, Schema, FieldType and
Similarity all have such methods.

A general technique to solve this problem is to use composition: instead of
expecting people to subclass IndexManager and override Recycle(), create a
SegmentRecycler interface and allow customization through supplying a custom
SegmentRecycler to IndexManager.

Ultimately, as Clownfish evolves, IndexManager has to stop calling Recycle()
on itself, which will break compat for host overriding via subclassing
IndexManager.  One option to avoid disruption is to introduce SegmentRecycler
in one Lucy release and deprecate the callback functionality of Recycle(),
then remove the callback functionality of IndexManager in a later release.
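
A minimal sketch of the composition idea in C, for illustration only. The struct layouts, the `ctx` pointer, `ThresholdRecycler_new`, and `IxManager_count_recyclable` are assumptions made up for this example and are not the actual Lucy or Clownfish API. The manager holds a recycler and delegates to it instead of calling an overridable method on itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface: one function pointer plus a context pointer. */
typedef struct SegmentRecycler {
    /* Returns the number of segments that should be merged away. */
    size_t (*recycle)(void *ctx, const size_t *seg_sizes, size_t num_segs);
    void *ctx;
} SegmentRecycler;

/* Hypothetical stand-in for IndexManager: it owns a recycler rather than
 * calling an overridable Recycle() method on itself. */
typedef struct IndexManager {
    SegmentRecycler recycler;
} IndexManager;

/* Example policy: count every segment smaller than the threshold that
 * ctx points to. */
static size_t threshold_recycle(void *ctx, const size_t *seg_sizes,
                                size_t num_segs) {
    size_t threshold = *(const size_t *)ctx;
    size_t count = 0;
    for (size_t i = 0; i < num_segs; i++) {
        if (seg_sizes[i] < threshold) count++;
    }
    return count;
}

/* Builds a recycler value around a caller-owned threshold. */
SegmentRecycler ThresholdRecycler_new(size_t *threshold) {
    SegmentRecycler r = { threshold_recycle, threshold };
    return r;
}

/* The manager delegates the decision to whatever recycler it was given. */
size_t IxManager_count_recyclable(const IndexManager *m,
                                  const size_t *seg_sizes, size_t num_segs) {
    assert(m->recycler.recycle != NULL);
    return m->recycler.recycle(m->recycler.ctx, seg_sizes, num_segs);
}
```

A host language binding would supply its own `recycle` function and context instead of subclassing the manager.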

> It also means that the Perl callback mechanism will become
> slower, though it probably won't be noticeable given how slow Perl method
> calls are. The fact that users can write normal Perl classes without the
> need for inside-out member variables makes up for that.

+1 to make this tradeoff.

> I think I can implement the approach described above in a few months for a
> 0.7 release, including Perl and Go support. Unless there are major
> objections to my plan, I'll just start on a separate branch. So if there are
> unforeseen obstacles, we don't have to revert commits on the master branch.
>
> We will have to redesign all Lucy classes that are meant to be overridable
> from the host language. For that, I'll need feedback from the rest of the
> community, especially Marvin.

I anticipate that the result will be occasionally inelegant Lucy code but
that's fine.

> I don't plan to support subinterfaces (interfaces implementing other
> interfaces) in the first iteration, but this should be straightforward to
> add. I also may omit features from the initial implementation that aren't
> necessary for Lucy callbacks.

Thank you for the well-crafted and thought-provoking proposal, Nick!

Marvin Humphrey

Re: [lucy-dev] Clownfish interfaces

Marvin Humphrey
In reply to this post by Peter Karman
On Thu, Dec 22, 2016 at 12:17 PM, Peter Karman <[hidden email]> wrote:
> Nick,
>
> Do not mean to warnock you re: this great email. You're obviously thinking
> through these issues for a variety of emerging languages, and that's vital
> to Lucy and Clownfish. I've no technical remarks, just encouragement as you
> dig in.

    http://bikeshed.com/

    Parkinson shows how you can go in to the board of directors and get
    approval for building a multi-million or even billion dollar atomic power
    plant, but if you want to build a bike shed you will be tangled up in
    endless discussions.

Nick's proposal is further towards the "atomic power plant" end of the scale,
and offering feedback worthy of it is challenging. :)

Marvin Humphrey

Re: [lucy-dev] Clownfish interfaces

Nick Wellnhofer
In reply to this post by Marvin Humphrey
On 28/12/2016 03:39, Marvin Humphrey wrote:
> Ruminating on my biases... I have found "fat pointers" hard to get used to
> because, honestly, I'm accustomed to casting of objects and containers without
> reallocation,

Even if fat pointers are two-element structs, they can be passed by value.
There's no need for dynamic allocation.
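
A rough sketch of what that could look like in C. All names here (`Stringer`, `StringerITable`, `Greeting`, and the functions) are hypothetical and only serve the illustration; the actual Clownfish layout may differ. The fat pointer is a small struct that is built on the stack and passed by value, so the cast itself allocates nothing:

```c
#include <stddef.h>

/* Hypothetical root object type. */
typedef struct Obj { int refcount; } Obj;

/* Hypothetical method table for a one-method interface. */
typedef struct StringerITable {
    const char *(*to_string)(Obj *self);
} StringerITable;

/* Fat pointer: object pointer plus itable pointer, passed by value. */
typedef struct Stringer {
    Obj *obj;
    const StringerITable *itable;
} Stringer;

/* Hypothetical class implementing the interface. Obj is the first member,
 * so a Greeting* can be converted to an Obj* and back. */
typedef struct Greeting {
    Obj base;
    const char *text;
} Greeting;

static const char *Greeting_to_string(Obj *self) {
    return ((Greeting *)self)->text;
}

static const StringerITable GREETING_STRINGER = { Greeting_to_string };

/* Static cast to the interface type: fills two words, no malloc. */
Stringer Greeting_as_stringer(Greeting *g) {
    Stringer s = { &g->base, &GREETING_STRINGER };
    return s;
}

/* Interface method invocation through the fat pointer. */
const char *Stringer_to_string(Stringer s) {
    return s.itable->to_string(s.obj);
}
```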

> which imposes the requirement that objects be pointers and that
> all pointers be the same size.  Being able to cast `Obj**` to `Query**`
> without cost or with only CPU cost for run-time type checking is something
> that seems very natural.
>
> By now, I've done enough Go programming that the construct isn't as foreign.
> Still, the constraint that native Clownfish objects are struct pointers seems
> reasonable.

I still like the idea of fat pointers as an additional optimization. But it
would be confusing to require an asterisk for normal objects (Obj*) but not
for interface objects.

> This overview should be helpful:
>
>     https://en.wikipedia.org/wiki/Java_performance

This page seems to indicate that HotSpot uses a linear search over an array:

     https://wiki.openjdk.java.net/display/HotSpot/InterfaceCalls

But this probably isn't a performance problem for JVMs because of inline caching.
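
To illustrate the general idea (this is not HotSpot's actual implementation, and every name below is made up for the example), a linear itable search with a one-entry cache per call site could be sketched in C like this. Repeated calls with the same class skip the scan entirely:

```c
#include <stddef.h>

/* Hypothetical interface descriptor, compared by address. */
typedef struct Interface { const char *name; } Interface;

/* One (interface, itable) pair in a class's interface array. */
typedef struct ITableEntry {
    const Interface *iface;
    const void *itable;
} ITableEntry;

typedef struct Class {
    const ITableEntry *entries;
    size_t num_entries;
} Class;

/* Slow path: linear scan over the class's interfaces.
 * Returns NULL if the class does not implement iface. */
const void *Class_find_itable(const Class *klass, const Interface *iface) {
    for (size_t i = 0; i < klass->num_entries; i++) {
        if (klass->entries[i].iface == iface) {
            return klass->entries[i].itable;
        }
    }
    return NULL;
}

/* Monomorphic cache owned by a single call site. A call site always
 * asks for the same interface, so caching by class is sufficient. */
typedef struct CallSiteCache {
    const Class *klass;
    const void *itable;
} CallSiteCache;

const void *CallSite_lookup(CallSiteCache *cache, const Class *klass,
                            const Interface *iface) {
    if (cache->klass == klass) {
        return cache->itable;              /* fast path: cache hit */
    }
    const void *itable = Class_find_itable(klass, iface);
    if (itable != NULL) {                  /* remember successful lookups */
        cache->klass  = klass;
        cache->itable = itable;
    }
    return itable;
}
```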

> The first approach seems closer to what Clownfish does now.  I think that's
> what you're leaning towards, right?  +1

I'd prefer the second approach but to accommodate languages like Rust, we'll
probably have to go with the first option.

> Regarding backwards compatibility for Lucy, there is a certain amount of
> functionality which currently requires subclassing and overriding of methods.
> Off the top of my head, QueryParser, IndexManager, Schema, FieldType and
> Similarity all have such methods.

We should start with a list of classes that we officially allow to be
subclassed now and that need to be reworked. In addition to the ones you
mentioned, there are Analyzer, Highlighter, and Query/Compiler/Matcher.

> A general technique to solve this problem is to use composition: instead of
> expecting people to subclass IndexManager and override Recycle(), create a
> SegmentRecycler interface and allow customization through supplying a custom
> SegmentRecycler to IndexManager.

Another problem is that Deserialize and Load currently work on a blank object
created with Class_Make_Obj. This would break with host-language interface
objects but could be solved elegantly with class methods, i.e., methods that
dynamically dispatch on a Class object. Instead of

     Query *query = (Query*)Class_Make_Obj(query_class);
     query = Query_Deserialize(query, instream);

we would write

     Query *query = Query_Deserialize(query_class, instream);

Many dynamic languages like Smalltalk, Objective-C, Perl, Python, or Ruby
support class methods, and it shouldn't be hard to support them in the
Clownfish core. For languages like Go or Rust, we could create a separate
interface hierarchy for class methods that are dispatched on singleton class
objects. See this commit which is also useful for conversion from Clownfish to Go:

     https://github.com/nwellnhof/lucy-clownfish/commit/30ed13800d10a3ff551a9f23ec288a04d5516911
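
As a rough illustration of a class method in C, the sketch below stores the deserializer as a function pointer in the class object and dispatches on that. The struct layouts, the `InStream` stand-in, the `TERMQUERY` singleton, and the exact signature (including the `InStream` argument) are assumptions for this example and are not taken from the Clownfish sources or the commit above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an input stream of already-decoded ints. */
typedef struct InStream {
    const int *data;
    size_t pos;
} InStream;

typedef struct Class Class;

/* Hypothetical, heavily simplified query object. */
typedef struct Query {
    const Class *klass;
    int boost;
} Query;

struct Class {
    const char *name;
    /* Class method: dispatched on the class, not on an instance, so no
     * blank object has to be created beforehand. */
    Query *(*deserialize)(const Class *klass, InStream *in);
};

/* One concrete implementation of the class method. */
static Query *TermQuery_deserialize(const Class *klass, InStream *in) {
    Query *q = malloc(sizeof(Query));
    if (q == NULL) return NULL;
    q->klass = klass;
    q->boost = in->data[in->pos++];
    return q;
}

/* Singleton class object for the hypothetical TermQuery class. */
const Class TERMQUERY = { "TermQuery", TermQuery_deserialize };

/* Dynamic dispatch on the class object. */
Query *Query_Deserialize(const Class *klass, InStream *in) {
    assert(klass->deserialize != NULL);
    return klass->deserialize(klass, in);
}
```

A host-language class could provide its own `deserialize` entry and return an object it constructed itself, which is what the blank-object approach does not allow.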

Nick