Release date and language bindings

fgl
Hello,

I'm interested in trying out Lucy when it becomes available.  I've had a
look at the dev list and there doesn't appear to be a firm release date
(which is understandable, considering only one or two devs).

Just to get an idea, are we talking about a year or two, a few months,
weeks...?

In terms of the index format:  is it going to be Lucene compatible or
completely new (with similarities)?

Also, are there plans to implement language bindings so that (eg) Perl can
be used to index with and PHP used to search with (as with other engines
like www.xapian.org)?

Cheers

Re: Release date and language bindings

Marvin Humphrey
On Wed, Nov 25, 2009 at 11:44:53AM +0200, fgl wrote:

> I'm interested in trying out Lucy when it becomes available.  

:)

> I've had a look at the dev list and there doesn't appear to be a firm
> release date (which is understandable, considering only one or two devs).
>
> Just to get an idea, are we talking about a year or two, a few months,
> weeks...?

Months.  It depends somewhat on how many new features get fast-tracked ahead
of finishing the port, but getting a larger community involved with the
software we're using is important to the people sponsoring my work on Lucy, so
I don't anticipate that slipping too much.

> In terms of the index format:  is it going to be Lucene compatible or
> completely new (with similarities)?

Completely new, with similarities.  The Lucene index format has many quirks
and elaborate optimizations and is impractical to implement unless you're
writing a nearly line-by-line port a la Lucene.NET or CLucene.  If anything,
there will be modules for Java Lucene to read Lucy indexes first.  We're
trying to emphasize simplicity in the file format design to aid such
interchange, for instance by encoding all metadata as JSON.
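To illustrate why JSON metadata helps interchange, here is a minimal sketch of writing and reading index metadata with nothing but a standard JSON library. The file name `segmeta.json` and every field name are invented for this example; they are assumptions, not Lucy's actual file format.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(path, metadata):
    """Serialize index metadata as human-readable JSON."""
    text = json.dumps(metadata, indent=2, sort_keys=True)
    path.write_text(text, encoding="utf-8")


def read_metadata(path):
    """Parse metadata back; any language with a JSON parser could do the same."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical metadata; the keys are illustrative only.
original = {"format": 1, "doc_count": 3, "fields": {"title": "text"}}

with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "segmeta.json"  # hypothetical file name
    write_metadata(meta_path, original)
    restored = read_metadata(meta_path)

print(restored == original)
```

Because the metadata is plain JSON, a reader in another language needs only a JSON parser to recover it, not a port of a custom binary decoder.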

> Also, are there plans to implement language bindings so that (eg) Perl can
> be used to index with and PHP used to search with (as with other engines
> like www.xapian.org)?

Yes, that will be possible.  

Caveat: you'll need to check the documentation for your Analyzer to ensure
that it works independently of the host language.  For instance, some
tokenizers will use the host's regex engine, and there will be differences
between regex engines which will make indexes incompatible.  (Java Lucene has
similar issues when transitioning between JRE versions.)
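The regex caveat can be sketched in a few lines. The example below uses Python's `re` module in two modes to stand in for two hosts whose regex engines disagree about what `\w` matches; the `tokenize` helper is hypothetical and is not part of any Lucy API.

```python
import re


def tokenize(text, ascii_only=False):
    """Split text into word tokens with the pattern \\w+.

    ascii_only=True mimics an engine whose \\w covers only [A-Za-z0-9_].
    """
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\w+", text, flags=flags)


text = "na\u00efve caf\u00e9 42nd"  # "naïve café 42nd"

unicode_tokens = tokenize(text)                 # Unicode-aware \w
ascii_tokens = tokenize(text, ascii_only=True)  # ASCII-only \w

print(unicode_tokens)
print(ascii_tokens)
```

If one host indexed with the Unicode-aware behaviour and another searched with the ASCII-only behaviour, a query for "café" would be tokenized as "caf" and would not match the indexed term.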

Marvin Humphrey

Re: Release date and language bindings

fgl
On Wed, Nov 25, 2009 at 8:36 PM, Marvin Humphrey <[hidden email]>
>
> > Just to get an idea, are we talking about a year or two, a few months,
> > weeks...?
>
> Months.  It depends somewhat on how many new features get fast-tracked


<3 or >=3?

:)

Re: Release date and language bindings

Marvin Humphrey
On Wed, Nov 25, 2009 at 09:23:16PM +0200, fgl wrote:

> On Wed, Nov 25, 2009 at 8:36 PM, Marvin Humphrey <[hidden email]>
> >
> > > Just to get an idea, are we talking about a year or two, a few months,
> > > weeks...?
> >
> > Months.  It depends somewhat on how many new features get fast-tracked
>
>
> <3 or >=3?
>
> :)

I've learned from experience only to supply fixed estimates to paying clients
who have the power to set my agenda.  ;)

Let's go with >= 3.  :)

Marvin Humphrey

Re: Release date and language bindings

Peter Karman
In reply to this post by fgl
fgl wrote on 11/25/09 3:44 AM:

> Also, are there plans to implement language bindings so that (eg) Perl can
> be used to index with and PHP used to search with (as with other engines
> like www.xapian.org)?

Just to clarify and expand on what Marvin said, it will be *possible* to have
language bindings for any dynamic language that supports extensions written in
C.  But Lucy is not like Xapian, which provides a complete core library (C++ in
Xapian's case) for which dynamic language bindings exist (Xapian mostly
generates them with SWIG).

Instead (and Marvin should correct me if I state this wrong) Lucy will be a set
of core C code that a dynamic ("host" in Lucy parlance) language builds on top
of and for which the host language implements certain required API features that
the core code does not implement. The idea behind Lucy is to share common core C
code between host language IR library implementations, not to implement a
complete IR library in and of itself. There will, for example, need to be a C
host implementation that builds on top of the core Lucy code.

The central assumption of Lucy is that there will be implementations in various
languages like PHP. But as far as I can see, it is beyond the project scope of
Lucy to actually *implement* any of those host language bindings beyond Perl and
Ruby -- but again, Marvin should correct me here.

--
Peter Karman  .  http://peknet.com/  .  [hidden email]

Re: Release date and language bindings

Marvin Humphrey
On Wed, Nov 25, 2009 at 10:55:29PM -0600, Peter Karman wrote:
> Instead (and Marvin should correct me if I state this wrong) Lucy will be a set
> of core C code that a dynamic ("host" in Lucy parlance) language builds on top
> of and for which the host language implements certain required API features that
> the core code does not implement.

Yes.  However, the amount of code that is required for each language binding
has shrunk a lot from when Lucy was first proposed.  

We originally thought to have most public-facing classes implemented in the
host language, while maintaining a C core for the hairy inner classes.  That
scheme was going to require a lot of code duplication, but it would make it
easy for users to snoop the source code and would give them the power to
subclass.

Breakthroughs in the Lucy object model design permit a change in plans: since
users can now subclass Lucy classes written entirely in C from their host
language, there's little reason to write all that per-host search code.
Effectively, only the binding code itself remains as a requirement.

> There will, for example, need to be a C host implementation that builds on
> top of the core Lucy code.

Yes.

> The central assumption of Lucy is that there will be implementations in various
> languages like PHP. But as far as I can see, it is beyond the project scope of
> Lucy to actually *implement* any of those host language bindings beyond Perl and
> Ruby -- but again, Marvin should correct me here.

It's a lot more realistic now.  There's a bit of a question about who's
actually going to write those bindings, though.  It's still not a small
undertaking.  To get an idea of the scope, everything in the following dirs
from the KinoSearch repository would have to be ported:

  boilerplater/lib/Boilerplater/Binding/Perl
  perl/lib
  perl/t/core
  perl/t/binding
  perl/xs
  perl/buildlib

Some of that's pretty mechanical, e.g. most files under perl/lib and perl/t.
The boilerplater stuff, though, requires an author with both a basic grasp of
OO Perl (because that's how Boilerplater's implemented) and intimate
familiarity with the host's C interface.  

I might be good for Ruby, Python, and GNU C bindings, but that's the extent of
my ambitions for now.  Maybe I'd be up for writing the tough stuff under
boilerplater for other languages, because I find that a really interesting
problem domain -- but keeping all the bindings up to date for N languages is
more than I can handle.

I'd kind of like all bindings to live at Apache, though.  For the Perl
bindings, I went to a lot of trouble to supply APIs using labeled params rather
than positional args, because I consider that interface so superior, and I've
thought hard about how to get the documentation looking good.  I want Lucy to
meet high standards for API design and documentation quality, and it would bum
me out if something slapdash and non-portable hacked up with SWIG around the
GNU C bindings became the canonical Lucy interface from some languages.

Marvin Humphrey
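
The difference between labeled params and positional args mentioned in the last message can be sketched as follows. Python's keyword-only arguments stand in for Perl's labeled parameters here; the function and parameter names are hypothetical and do not come from the KinoSearch or Lucy APIs.

```python
def search_positional(query, offset, num_wanted, sort_by):
    """Positional style: callers must remember the argument order."""
    return {"query": query, "offset": offset,
            "num_wanted": num_wanted, "sort_by": sort_by}


def search_labeled(*, query, offset=0, num_wanted=10, sort_by=None):
    """Labeled style: every argument is named at the call site."""
    return {"query": query, "offset": offset,
            "num_wanted": num_wanted, "sort_by": sort_by}


# The positional call is terse but opaque: which number is the offset?
positional_request = search_positional("lucy", 0, 5, None)

# The labeled call documents itself and lets defaults fill the rest.
labeled_request = search_labeled(query="lucy", num_wanted=5)

# Labeled-only functions reject positional use outright.
try:
    search_labeled("lucy")
    rejected = False
except TypeError:
    rejected = True

print(positional_request == labeled_request, rejected)
```

Both calls build the same request, but only the labeled form stays readable as parameters are added, which is one reason a hand-written binding may be preferred over a mechanically generated one.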