[lucy-dev] Rolling out your own analyzer.

[lucy-dev] Rolling out your own analyzer.

Bruno Albuquerque
This is more out of curiosity than anything else. If I have the
Lucy/Clownfish C headers and the respective libraries, is it possible to roll
my own Analyzer and use it with Lucy? Is there any documentation about
doing something like that? It would be best if it did not involve having to
run cfc, but I am OK with that if it cannot be avoided.

FYI, I did a simple test. I copied the EasyAnalyzer code (the .c and the
generated .h), renamed it to something else (say, MyAnalyzer) and did a find
and replace in both the .c and .h files to change everything. When I tried
to compile it, first it could not find lucy_MyAnalyzer (this was a typedef
in lucy_parcel.h, so I manually added it to the .c file), then I had to fix
the includes (remove the ToolBox.h header, for example) and, after that,
there were still symbols that could not be found (for example, the ones with
_OFFSET in their names).

Thanks in advance.

Re: [lucy-dev] Rolling out your own analyzer.

Marvin Humphrey
On Mon, Mar 30, 2015 at 1:28 PM, Bruno Albuquerque <[hidden email]> wrote:
> This is more out of curiosity than anything else. If I have the
> Lucy/Clownfish C headers and the respective libraries, is it possible to roll
> my own Analyzer and use it with Lucy? Is there any documentation about
> doing something like that? It would be best if it did not involve having to
> run cfc, but I am OK with that if it cannot be avoided.

Subtyping from C is not officially supported because we have not worked out
a stable API for it.  Subclassing Analyzer is not officially supported but for
a different reason: Lucy's array-based model for token processing appears to
be inferior to a stream-based model and the API was redacted in order to give
us the option of changing it.

Nevertheless, I can provide you with undocumented hacks which achieve your
ends.  The interface is not yet elegant, but conversations like this will lead
to improving it.

    typedef struct MyAnalyzer MyAnalyzer;

    // Transform() is the central Analyzer method.  Check out the
    // documentation in Analyzer.cfh and various implementations; that
    // should give you enough to cargo cult your own version.
    static Inversion*
    S_MyAnalyzer_Transform_IMP(MyAnalyzer *self, Inversion *inversion) {
        return (Inversion*)INCREF(inversion);
    }

    // Create a subclass at runtime.
    static Class*
    S_class_var(void) {
        StackString *class_name = SSTR_WRAP_UTF8("MyAnalyzer", 10);
        Class *klass = Class_fetch_class((String*)class_name);
        if (!klass) {
            klass = Class_singleton((String*)class_name, ANALYZER);
            Class_Override(klass,
                           (cfish_method_t)S_MyAnalyzer_Transform_IMP,
                           LUCY_Analyzer_Transform_OFFSET);
        }
        return klass;
    }

    // Constructor.
    MyAnalyzer*
    MyAnalyzer_new(void) {
        Class *klass = S_class_var();
        return (MyAnalyzer*)Class_Make_Obj(klass);
    }
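
For context, a usage sketch: assuming the stock Schema/FullTextType API from
Lucy's C tutorial (plus the usual *_USE_SHORT_NAMES defines and the matching
Lucy/Clownfish headers), the runtime subclass plugs into a schema like any
other Analyzer.

    // Hypothetical helper: spec a full-text field that uses MyAnalyzer.
    static Schema*
    S_create_schema(void) {
        Schema       *schema   = Schema_new();
        MyAnalyzer   *analyzer = MyAnalyzer_new();
        FullTextType *type     = FullTextType_new((Analyzer*)analyzer);
        String       *field    = Str_newf("content");
        Schema_Spec_Field(schema, field, (FieldType*)type);
        DECREF(field);
        DECREF(type);
        DECREF(analyzer);
        return schema;
    }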

Marvin Humphrey

Re: [lucy-dev] Rolling out your own analyzer.

Nick Wellnhofer
In reply to this post by Bruno Albuquerque
On 30/03/2015 22:28, Bruno Albuquerque wrote:
> This is more out of curiosity than anything else. If I have the
> Lucy/Clownfish C headers and the respective libraries, is it possible to roll
> my own Analyzer and use it with Lucy? Is there any documentation about
> doing something like that? It would be best if it did not involve having to
> run cfc, but I am OK with that if it cannot be avoided.

If you're interested in how to extend Lucy properly, have a look at my sample
extension LucyX-Analysis-WhitespaceTokenizer:

     https://github.com/nwellnhof/LucyX-Analysis-WhitespaceTokenizer

I'm not sure whether it builds with the 0.4 release but it should give you the
general idea.

Nick

Re: [lucy-dev] Rolling out your own analyzer.

Bruno Albuquerque
In reply to this post by Marvin Humphrey
Thanks Marvin. This should be enough for me to experiment with it.

On a related question, Lucy relies on Snowball for language support
(normalization, stemming, stopwords), but Snowball supports only a limited
set of languages. What would be the best way to add support for new
languages? Creating a new module (in the same way that Snowball seems to be
a module)?

Re: [lucy-dev] Rolling out your own analyzer.

Bruno Albuquerque
In reply to this post by Nick Wellnhofer
Thanks Nick. This is definitely very useful. I will check if it works with
0.4.2, but as per Marvin's first message, it looks like the API required
for it is either hidden or not present at all. Unless I misunderstood
something.

Re: [lucy-dev] Rolling out your own analyzer.

Marvin Humphrey
In reply to this post by Bruno Albuquerque
On Tue, Mar 31, 2015 at 5:32 AM, Bruno Albuquerque <[hidden email]> wrote:

> On a related question, Lucy relies on Snowball for language support
> (normalization, stemming, stopwords), but Snowball supports only a limited
> set of languages. What would be the best way to add support for new
> languages?

There's no canonical form of "language support" in Lucy.  There are only
Analyzers which happen to be tuned for content in a specific language.

What Analyzers do is tokenize and normalize content.  You start with a Unicode
text string.  Let's say it's the following:

    Eats, Shoots and Leaves.

If you perform no analysis, the only search which will match that field is
the exact term query `Eats, Shoots and Leaves.` -- because there's only one
entry in the term dictionary and that's it.

    # Tokens produced by analysis chain and stored in index:
    ['Eats, Shoots and Leaves.']

If you use an Analyzer which only splits on whitespace, you will become able
to search for individual terms, but your searches will be case-sensitive and
punctuation will get in the way.  For example, a search for `Leaves` will fail
but a search for `Leaves.` will succeed.

    ['Eats,', 'Shoots', 'and', 'Leaves.']

If you use an Analyzer which splits on whitespace and is intelligent about
removing punctuation, that problem is solved.

    ['Eats', 'Shoots', 'and', 'Leaves']

If you add case folding to the analysis chain, then searches for both `leaves`
and `Leaves` will succeed.

    ['eats', 'shoots', 'and', 'leaves']

(Note that no matter which Analyzer you use, the same transform must be
applied at search time in order to match.)

If you add an English Snowball stemmer, then searches for both `leaves` and
`leave` will match (though not `leaf`, which stems to `leaf` using Snowball
EN).

    ['eat', 'shoot', 'and', 'leave']

So... to implement "language support" for another language, you need to create
an Analyzer which implements a Transform() method which applies
tokenization and normalization appropriate for that language.
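
For illustration, a Transform() that only adds ASCII case folding to whatever
tokens it receives could look roughly like the sketch below (it reuses the
MyAnalyzer typedef from the earlier example; the Inversion_Next,
Inversion_Reset, Token_Get_Text, and Token_Get_Len accessors are assumed from
the Lucy analysis headers, so check them against your version):

    // Sketch only: fold ASCII A-Z to a-z in place (same length, so the
    // token's text buffer can be edited directly).  Real language support
    // would also handle Unicode case folding, accents, stemming, etc.
    static Inversion*
    S_Lowercase_Transform_IMP(MyAnalyzer *self, Inversion *inversion) {
        (void)self;  // no instance state needed for this sketch
        Token *token;
        while (NULL != (token = Inversion_Next(inversion))) {
            char   *text = Token_Get_Text(token);
            size_t  len  = Token_Get_Len(token);
            for (size_t i = 0; i < len; i++) {
                if (text[i] >= 'A' && text[i] <= 'Z') {
                    text[i] = (char)(text[i] + ('a' - 'A'));
                }
            }
        }
        Inversion_Reset(inversion);
        return (Inversion*)INCREF(inversion);
    }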

Does that make sense?

Marvin Humphrey

Re: [lucy-dev] Rolling out your own analyzer.

Bruno Albuquerque
Sure, and it matches my original understanding of what should be done. I
started having doubts when I saw the modules directory with Snowball in it.

Lucene appears to support a lot more languages out of the box. Are there
plans for Lucy to support other languages? (Even if Lucy has no formal
notion of language support, this is an important part of IR.)

Thanks for all your help so far.


Re: [lucy-dev] Rolling out your own analyzer.

Marvin Humphrey
On Tue, Mar 31, 2015 at 10:28 AM, Bruno Albuquerque <[hidden email]> wrote:
> Sure, and it matches my original understanding of what should be done. I
> started having doubts when I saw the modules directory with Snowball in it.
>
> Lucene appears to support a lot more languages out of the box. Are there
> plans for Lucy to support other languages? (Even if Lucy has no formal
> notion of language support, this is an important part of IR.)

Lucene is a gigantic project with an army of developers working on it.  It's
not realistic to shoot for feature parity, and if we tried, we would be so
busy porting cruft that we couldn't achieve the two priorities that matter most
to present members of the Lucy development community: adding support for more
programming languages to our symbiotic object system "Clownfish", and
refactoring Lucy's matching framework to facilitate certain assembler-level
optimization techniques.

If new members of the Lucy community were to contribute support for additional
languages, though, we'd presumably find ways to accommodate them.

Marvin Humphrey

Re: [lucy-dev] Rolling out your own analyzer.

Bruno Albuquerque
Resurrecting a very old topic.

It turns out I managed to use this approach to create a "new" analyzer and
it has been working flawlessly. Recently I saw the need to add attributes to
my analyzer (more specifically, I need to pass a language to it).

My first attempt was to use the same method to subclass EasyAnalyzer
instead of Analyzer (basically replacing ANALYZER with EASYANALYZER and
adapting everything else as needed), to take advantage of the fact that
EasyAnalyzer already has a language field, but this did not work (i.e. my
overridden Transform method was never called and the EasyAnalyzer one was
being called instead).

So, my question is: using the trick of overriding specific methods from
Analyzer at runtime, is there a way I can also create my own ivars and
attach them to my overridden analyzer?

While we are at it, this is all Lucy 0.4.2 (yeah, I know, old) and I am
considering moving to the latest version, mostly because of the now-included
Go bindings. I haven't looked much at the Go bindings yet, but will it be
possible to create a new analyzer in a non-hackish way, entirely in Go,
using them?

Thanks in advance.


Re: [lucy-dev] Rolling out your own analyzer.

Peter Karman
Bruno Albuquerque wrote on 3/5/17 10:26 AM:

> Recently I saw the need to add attributes to my analyzer (more
> specifically, I need to pass a language to it).
>
> So, my question is: using the trick of overriding specific methods from
> Analyzer at runtime, is there a way I can also create my own ivars and
> attach them to my overridden analyzer?

Assuming you are doing this in Perl, there was documentation added recently on
how to do this:

https://github.com/apache/lucy/commit/040bb290f12f8df015b1d9ae99758f1600a52f18#diff-3bbd01088ea5a113c55f839852dfba1fR62

You need to use the inside-out object pattern.

I'm not sure about the corresponding way to do this in C or other host languages.

--
Peter Karman  .  https://peknet.com/  .  https://keybase.io/peterkarman

Re: [lucy-dev] Rolling out your own analyzer.

Nick Wellnhofer
In reply to this post by Bruno Albuquerque
On 05/03/2017 17:26, Bruno Albuquerque wrote:
> So, my question is: using the trick of overriding specific methods from
> Analyzer at runtime, is there a way I can also create my own ivars and
> attach them to my overridden analyzer?

Yes, this is possible. See the example code below (based on Lucy 0.6). Note
that you should implement Equals, Dump, and Load methods if you have an
Analyzer with ivars. See the Lucy Analyzers for guidance:

     https://github.com/apache/lucy/tree/master/core/Lucy/Analysis
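
As a rough illustration of the Equals() part, using the MyAnalyzer ivars from
the example program below (the Obj_Is_A call and the CFISH_Obj_Equals_OFFSET
symbol are assumptions based on the generated Clownfish headers, so verify
them against your build):

    // Sketch: two MyAnalyzers are equal when their ivars match.
    // Needs <string.h> for strcmp(); bool comes in with the Clownfish
    // headers.
    static bool
    S_MyAnalyzer_Equals_IMP(MyAnalyzer *self, Obj *other) {
        if ((MyAnalyzer*)other == self)   { return true; }
        if (!Obj_Is_A(other, MYANALYZER)) { return false; }
        MyAnalyzerIVARS *const ivars = MyAnalyzer_IVARS(self);
        MyAnalyzerIVARS *const ovars = MyAnalyzer_IVARS((MyAnalyzer*)other);
        return ivars->i == ovars->i
               && strcmp(ivars->p, ovars->p) == 0;
    }

    // Registered the same way as the Transform override:
    //   Class_Override(MYANALYZER, (cfish_method_t)S_MyAnalyzer_Equals_IMP,
    //                  CFISH_Obj_Equals_OFFSET);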

Another option is to write your own Clownfish parcel. Then the Clownfish
compiler 'cfc' will create all the boilerplate code for you.

     https://lucy.apache.org/docs/c/Clownfish/Docs/WritingClasses.html

> While we are at it, this is all Lucy 0.4.2 (yeah, I know, old) and I am
> considering moving to the latest version, mostly because of the now-included
> Go bindings. I haven't looked much at the Go bindings yet, but will it be
> possible to create a new analyzer in a non-hackish way, entirely in Go,
> using them?

Extending Lucy classes from Go isn't supported yet.

Nick


#define CFISH_USE_SHORT_NAMES
#define LUCY_USE_SHORT_NAMES

#include <stdio.h>

#include "Clownfish/Class.h"
#include "Lucy/Analysis/Analyzer.h"
#include "Lucy/Analysis/Inversion.h"

typedef struct MyAnalyzer MyAnalyzer;

typedef struct {
     int i;
     const char *p;
} MyAnalyzerIVARS;

Class *MYANALYZER;
uint32_t MyAnalyzer_IVARS_OFFSET;

static CFISH_INLINE MyAnalyzerIVARS*
MyAnalyzer_IVARS(MyAnalyzer *self) {
    char *ptr = (char*)self + MyAnalyzer_IVARS_OFFSET;
    return (MyAnalyzerIVARS*)ptr;
}

Analyzer*
MyAnalyzer_new() {
     MyAnalyzer *self = (MyAnalyzer*)Class_Make_Obj(MYANALYZER);
     MyAnalyzerIVARS *const ivars = MyAnalyzer_IVARS(self);
     ivars->i = 42;
     ivars->p = "abc";
     return (Analyzer*)self;
}

Inversion*
MyAnalyzer_Transform_IMP(MyAnalyzer *self, Inversion *inversion) {
     MyAnalyzerIVARS *const ivars = MyAnalyzer_IVARS(self);
     printf("i=%d\n", ivars->i);
     return (Inversion*)INCREF(inversion);
}

int
main() {
     lucy_bootstrap_parcel();

     cfish_ClassSpec class_spec = {
         &MYANALYZER,
         &ANALYZER,
         "MyNamespace::MyAnalyzer",
         sizeof(MyAnalyzerIVARS),
         &MyAnalyzer_IVARS_OFFSET,
         0,
         0,
         0,
         0
     };
     cfish_ParcelSpec parcel_spec = {
         &class_spec,
         NULL,
         NULL,
         NULL,
         1   // Number of classes.
     };

     Class_bootstrap(&parcel_spec);
     Class_Override(MYANALYZER, (cfish_method_t)MyAnalyzer_Transform_IMP,
                    LUCY_Analyzer_Transform_OFFSET);

     Analyzer *analyzer = MyAnalyzer_new();
     Analyzer_Transform(analyzer, NULL);
     DECREF(analyzer);

     return 0;
}

Re: [lucy-dev] Rolling out your own analyzer.

Bruno Albuquerque
Hi Nick.

Thanks again.

I will try the example you gave, but it does appear to have a lot more
boilerplate than the original example and I just want to make sure that is
really needed (the less code I have to add for this, the better, as I will
do a complete rewrite to use the latest Lucy anyway). Here is the original
sample code that worked for me:

    typedef struct MyAnalyzer MyAnalyzer;

    // Transform() is the central Analyzer method.  Check out the
    // documentation in Analyzer.cfh and various implementations; that
    // should give you enough to cargo cult your own version.
    static Inversion*
    S_MyAnalyzer_Transform_IMP(MyAnalyzer *self, Inversion *inversion) {
        return (Inversion*)INCREF(inversion);
    }

    // Create a subclass at runtime.
    static Class*
    S_class_var(void) {
        StackString *class_name = SSTR_WRAP_UTF8("MyAnalyzer", 10);
        Class *klass = Class_fetch_class((String*)class_name);
        if (!klass) {
            klass = Class_singleton((String*)class_name, ANALYZER);
            Class_Override(klass,
                           (cfish_method_t)S_MyAnalyzer_Transform_IMP,
                           LUCY_Analyzer_Transform_OFFSET);
        }
        return klass;
    }

    // Constructor.
    MyAnalyzer*
    MyAnalyzer_new(void) {
        Class *klass = S_class_var();
        return (MyAnalyzer*)Class_Make_Obj(klass);
    }

My main worry with your new example is that there is a considerable amount
of code in main(), and in my case this will all go inside a library. Because
for now I am basically cargo-culting my way around this, I am not sure what
would be the best place to do that initialization. Would doing it in _new
work anyway?

Thanks again.


Re: [lucy-dev] Rolling out your own analyzer.

Nick Wellnhofer
On 06/03/2017 15:57, Bruno Albuquerque wrote:
> My main worry with your new example is that there is a considerable amount
> of code being written in main and this will all go inside a library.
> Because for now I am basically cargo-culting my way around this, I am not
> sure what would be the best point to do that initialization. Would doing it
> in _new work anyway?

Yes, this would work just the same. You'll only need a global variable for the
ivars offset.

Nick


Re: [lucy-dev] Rolling out your own analyzer.

Bruno Albuquerque
Ok, thanks. It appears cfish_ParcelSpec does not exist in 0.4.2. Is there
an alternative?


Re: [lucy-dev] Rolling out your own analyzer.

Nick Wellnhofer
On 06/03/2017 22:27, Bruno Albuquerque wrote:
> Ok, thanks. It appears cfish_ParcelSpec does not exist in 0.4.2. Is there
> an alternative?

In 0.4, Class_bootstrap is declared as

    void
    Class_bootstrap(const cfish_ClassSpec *specs, size_t num_specs);

and cfish_ClassSpec is

     typedef struct cfish_ClassSpec {
         cfish_Class **klass;
         cfish_Class **parent;
         const char   *name;
         size_t        ivars_size;
         size_t       *ivars_offset_ptr;
         uint32_t      num_novel_meths;
         uint32_t      num_overridden_meths;
         uint32_t      num_inherited_meths;
         const cfish_NovelMethSpec      *novel_meth_specs;
         const cfish_OverriddenMethSpec *overridden_meth_specs;
         const cfish_InheritedMethSpec  *inherited_meth_specs;
     } cfish_ClassSpec;

So something like the following should work:

     size_t MyAnalyzer_IVARS_OFFSET;

     // Create a subclass with ivars at runtime.
     static Class*
     S_class_var(void) {
         StackString *class_name = SSTR_WRAP_UTF8("MyAnalyzer", 10);
         Class *klass = Class_fetch_class((String*)class_name);
         if (!klass) {
             cfish_ClassSpec class_spec = {
                 &klass,
                 &ANALYZER,
                 "MyAnalyzer",
                 sizeof(MyAnalyzerIVARS),
                 &MyAnalyzer_IVARS_OFFSET,
                 0, 0, 0,
                 NULL, NULL, NULL
             };
             Class_bootstrap(&class_spec, 1);
             Class_Override(klass,
                            (cfish_method_t)S_MyAnalyzer_Transform_IMP,
                            LUCY_Analyzer_Transform_OFFSET);
         }
         return klass;
     }
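
A matching constructor would then bootstrap the class lazily and fill the
ivars in right after Class_Make_Obj(). This is only a sketch: the
MyAnalyzerIVARS layout and its language field are illustrative, not
something fixed by Lucy.

     // Illustrative ivars; declare this before S_class_var(), which takes
     // sizeof(MyAnalyzerIVARS).
     typedef struct {
         const char *language;
     } MyAnalyzerIVARS;

     static MyAnalyzerIVARS*
     S_ivars(MyAnalyzer *self) {
         char *ptr = (char*)self + MyAnalyzer_IVARS_OFFSET;
         return (MyAnalyzerIVARS*)ptr;
     }

     // Constructor: create the class on first use, then set up the ivars.
     MyAnalyzer*
     MyAnalyzer_new(const char *language) {
         Class *klass = S_class_var();
         MyAnalyzer *self = (MyAnalyzer*)Class_Make_Obj(klass);
         S_ivars(self)->language = language;
         return self;
     }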

Nick