regarding comparing texts using Lucene

regarding comparing texts using Lucene

Veda G M
Hello,

Is it possible to compare large chunks of text and get the similarity
score/percentage using Lucene?

For example, say we have 2-3 paragraphs of text and need to find out whether
any document matches it semantically, and to get the similarity between the
returned hit and the search text as a percentage.

Could you please let me know if this is possible with Lucene?

Thanks.

Regards,
Veda

Re: regarding comparing texts using Lucene

Adrien Grand
Hi Veda,

Lucene doesn't provide such functionality out of the box, but you could use
MoreLikeThis (
https://lucene.apache.org/core/7_4_0/queries/org/apache/lucene/queries/mlt/MoreLikeThis.html)
to search for similar documents and then compute a finer-grained similarity
score on the client side, for the returned hits only. This avoids having to
compute a similarity score against every document in your collection.
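
As a rough illustration of the client-side step, here is a minimal sketch in
plain Java that scores the search text against one candidate hit. It assumes
the candidate texts have already been retrieved (for instance through a
MoreLikeThis query) and uses cosine similarity over raw term counts, which is
only one possible choice and measures word overlap rather than true semantic
similarity. The class name TextSimilarity, its methods, and the sample strings
are hypothetical and not part of Lucene.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TextSimilarity {

    // Count lowercase tokens; splitting on non-letter/non-digit runs is a
    // simplification compared to a real Lucene Analyzer.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> frequencies) {
        double sum = 0.0;
        for (int count : frequencies.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity of the two texts' term-frequency vectors, in [0, 1].
    // Returns 0 when either text contains no tokens.
    static double cosine(String a, String b) {
        Map<String, Integer> fa = termFrequencies(a);
        Map<String, Integer> fb = termFrequencies(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : fa.entrySet()) {
            Integer other = fb.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(fa);
        double normB = norm(fb);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: the search text and one hit returned by the index.
        String query = "Lucene is a full-text search library written in Java.";
        String candidate = "Apache Lucene is a search library for full-text indexing.";
        System.out.printf(Locale.ROOT, "similarity: %.1f%%%n", 100.0 * cosine(query, candidate));
    }
}
```

Multiplying the result by 100 gives a percentage. For a score closer to
semantic similarity, the term counts could be swapped for TF-IDF weights or
embedding vectors without changing the overall shape of this step.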

On Wed, 19 Sep 2018 at 15:28, Veda G M <[hidden email]> wrote:

> Hello,
>
> Is it possible to compare large chunks of text and get the similarity
> score/percentage using Lucene?
>
> For example, say we have 2-3 paragraphs of text and need to find out whether
> any document matches it semantically, and to get the similarity between the
> returned hit and the search text as a percentage.
>
> Could you please let me know if this is possible with Lucene?
>
> Thanks.
>
> Regards,
> Veda
>