[lucy-user] Getting matched query terms for HitDocs

Philip Southam
So let's say I have a query like:
        (dog OR cat OR bird) animal

And the text I'm indexing looks like this:
        A dog is an animal
        Cats and Dogs are animals
        A tree is not an animal

Of course (with stemming) the following two entries should be matched:
        A dog is an animal
        Cats and Dogs are animals

How do I find out which query terms/phrases were found, so I know that doc_id 1
matched (dog, animal) and doc_id 2 matched (cat, dog, animal)?
I'm looking for something similar in functionality to what Whoosh[1] and
Xapian[2] offer in this regard.

I tried looking at the highlighter source, thinking it has to implement
similar logic, but my knowledge of C is next to nil and I didn't see
anything like that in the Perl bindings that I could use.


[1]:
http://pythonhosted.org/Whoosh/searching.html#which-terms-from-my-query-matched

[2]:
http://xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html#dda4181ccd15beb52c39f5e24adbb25b

Regards,
--
Philip Southam
Chief Architect / Яeverse Эngineer
http://zefr.com

Re: [lucy-user] Getting matched query terms for HitDocs

Peter Karman
On 8/28/13 11:32 PM, Philip Southam wrote:

>
> I tried looking at the highlighter source, thinking it has to implement
> similar logic, but my knowledge of C is next to nil and I didn't see
> anything like that in the Perl bindings that I could use.
>
>

It is not native to the library, but I have a similar implementation
here as an example:

https://metacpan.org/module/SWISH::Prog::Lucy::Results#find_relevant_fields-1-0

see http://markmail.org/message/xoqwxofwphlowqxf

--
Peter Karman  .  http://peknet.com/  .  [hidden email]

Re: [lucy-user] Getting matched query terms for HitDocs

Philip Southam
On 08/29/2013 05:15 AM, Peter Karman wrote:

> On 8/28/13 11:32 PM, Philip Southam wrote:
>
>>
>> I tried looking at the highlighter source, thinking it has to implement
>> similar logic, but my knowledge of C is next to nil and I didn't see
>> anything like that in the Perl bindings that I could use.
>>
>>
>
> It is not native to the library, but I have a similar implementation
> here as an example:
>
> https://metacpan.org/module/SWISH::Prog::Lucy::Results#find_relevant_fields-1-0
>
>
> see http://markmail.org/message/xoqwxofwphlowqxf
>

Thanks Peter. I've gotten as far as getting a value out of
LUCY_Compiler_Highlight_Spans. Now the part I'm struggling with is
unmarshalling the data out of the returned spans cfish_VArray* object.
In lines 141-148 of this gist[1] I've done my best to port the Perl
example referenced in the markmail link. It compiles and runs, but
instead of printing (line 148) what I would expect to be a matching
term, I get something like:

        Lucy::Search::Span@0x00000000026fec90

indicating I'm not interacting with the spans object returned on line
143 properly. Anyone have any suggestions?


[1]: https://gist.github.com/philipsoutham/6386741

--
Philip Southam
Chief Architect / Яeverse Эngineer
http://zefr.com

Re: [lucy-user] Getting matched query terms for HitDocs

Marvin Humphrey
On Thu, Aug 29, 2013 at 11:16 PM, Philip Southam <[hidden email]> wrote:

> Thanks Peter. I've gotten as far as getting a value out of
> LUCY_Compiler_Highlight_Spans. Now the part I'm struggling with is
> unmarshalling the data out of the returned spans cfish_VArray* object.
> In lines 141-148 of this gist[1] I've done my best to port the Perl
> example referenced in the markmail link. It compiles and runs, but
> instead of printing (line 148) what I would expect to be a matching
> term, I get something like:
>
>         Lucy::Search::Span@0x00000000026fec90
>
> indicating I'm not interacting with the spans object returned on line
> 143 properly. Anyone have any suggestions?
>
>
> [1]: https://gist.github.com/philipsoutham/6386741

That's the default To_String() method, which Lucy::Search::Span inherits
from Clownfish::Object.  (For the implementation, see `Obj_To_String_IMP` in
clownfish/runtime/core/Clownfish/Obj.c.)  Since Span doesn't have its own
stringification routine, you'll need to extract the data you want using its
accessors, whose fully qualified names are
`LUCY_Span_Get_Offset`, `LUCY_Span_Get_Length` and `LUCY_Span_Get_Weight`.

Those methods are declared in core/Lucy/Search/Span.cfh (here's one)...

    /** Accessor for <code>offset</code> attribute.
     */
    public int32_t
    Get_Offset(Span *self);

... and implemented in core/Lucy/Search/Span.c:

    int32_t
    Span_Get_Offset_IMP(Span *self) {
        return Span_IVARS(self)->offset;
    }
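
Once you have an offset and a length, the matched term is just the slice of
the field text that they cover. What follows is a minimal sketch of that
slicing step only, not Lucy code: `SpanInfo` and `span_text` are hypothetical
names standing in for the values you would read via `LUCY_Span_Get_Offset` and
`LUCY_Span_Get_Length`. It also assumes the offsets can be treated as byte
offsets, which only holds for ASCII text; for other text you would need to
convert units first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two values read from a Lucy::Search::Span
 * (its offset and its length). Not part of the Lucy API. */
typedef struct {
    int32_t offset;  /* where the match starts in the field text */
    int32_t length;  /* how far the match extends */
} SpanInfo;

/* Copy the text covered by `span` out of `field_text`.
 * Assumption: offset and length are byte counts (true for ASCII text).
 * Returns a malloc'd, NUL-terminated string that the caller must free,
 * or NULL if the span falls outside the text or allocation fails. */
char *
span_text(const char *field_text, SpanInfo span) {
    assert(field_text != NULL);
    size_t text_len = strlen(field_text);

    /* Reject negative values and spans that run past the end of the text. */
    if (span.offset < 0 || span.length < 0) { return NULL; }
    if ((size_t)span.offset > text_len) { return NULL; }
    if ((size_t)span.length > text_len - (size_t)span.offset) { return NULL; }

    char *out = malloc((size_t)span.length + 1);
    if (out == NULL) { return NULL; }
    memcpy(out, field_text + span.offset, (size_t)span.length);
    out[span.length] = '\0';
    return out;
}
```

Calling `span_text` once per element of the spans array, and collecting the
distinct results, would give the list of matched terms for a document.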

HTH,

Marvin Humphrey

Re: [lucy-user] Getting matched query terms for HitDocs

Philip Southam
On 08/30/2013 12:55 PM, Marvin Humphrey wrote:

> On Thu, Aug 29, 2013 at 11:16 PM, Philip Southam <[hidden email]> wrote:
>> Thanks Peter. I've gotten as far as getting a value out of
>> LUCY_Compiler_Highlight_Spans. Now the part I'm struggling with is
>> unmarshalling the data out of the returned spans cfish_VArray* object.
>> In lines 141-148 of this gist[1] I've done my best to port the Perl
>> example referenced in the markmail link. It compiles and runs, but
>> instead of printing (line 148) what I would expect to be a matching
>> term, I get something like:
>>
>>         Lucy::Search::Span@0x00000000026fec90
>>
>> indicating I'm not interacting with the spans object returned on line
>> 143 properly. Anyone have any suggestions?
>>
>>
>> [1]: https://gist.github.com/philipsoutham/6386741
>
> That's the default To_String() method, which Lucy::Search::Span inherits
> from Clownfish::Object.  (For the implementation, see `Obj_To_String_IMP` in
> clownfish/runtime/core/Clownfish/Obj.c.)  Since Span doesn't have its own
> stringification routine, you'll need to extract the data you want using its
> accessors, whose fully qualified names are
> `LUCY_Span_Get_Offset`, `LUCY_Span_Get_Length` and `LUCY_Span_Get_Weight`.
>
> Those methods are declared in core/Lucy/Search/Span.cfh (here's one)...
>
>     /** Accessor for <code>offset</code> attribute.
>      */
>     public int32_t
>     Get_Offset(Span *self);
>
> ... and implemented in core/Lucy/Search/Span.c:
>
>     int32_t
>     Span_Get_Offset_IMP(Span *self) {
>         return Span_IVARS(self)->offset;
>     }
>
> HTH,
>
> Marvin Humphrey
>

Thanks. With these clues I was able to get what I wanted (albeit with a
little more ceremony than I would like). If you're curious, you can see
it in action here[1].


[1]:
https://github.com/philipsoutham/golucy/blob/8e12c7343bcb161b82ad85978e565d17bc67a32c/v0.0.1/searcher.go#L132

--
Philip Southam
Chief Architect / Яeverse Эngineer
http://zefr.com