[lucy-user] C Library: Regex query

serkanmulayim@gmail.com
Hi guys,

I would like to ask whether it is possible to run regex queries in the C library without adding new fields or tokenizing differently. I need to return documents based on file name suffix, so that a query such as (*.pdf) returns all documents that contain a PDF file.

I understand the complexity a suffix query creates for the searcher. But in my use case few documents have files associated with them, so the attachment field will exist for only a small number of documents.

If this is not possible, I will index the documents with their file types in a new field (or reverse the attachment names).

Thanks,
Serkan
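
For illustration, the "reverse the attachment names" idea could look like the sketch below. Storing "report.pdf" as "fdp.troper" turns the suffix match (*.pdf) into a prefix match on "fdp.". Both helper names are hypothetical, nothing here calls the Lucy API, and the byte-wise reversal assumes ASCII file names (multi-byte UTF-8 would need to be reversed per code point).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the bytes of `src` into `dst` in reverse order.
 * Assumes single-byte (ASCII) names.
 * Returns 0 on success, -1 if `dst` is too small. */
int reverse_name(const char *src, char *dst, size_t dst_size) {
    assert(src != NULL && dst != NULL);
    size_t len = strlen(src);
    if (len + 1 > dst_size) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[len - 1 - i];
    }
    dst[len] = '\0';
    return 0;
}

/* Hypothetical helper: returns 1 if `reversed_name` starts with
 * `reversed_suffix`, i.e. the original name ended with the suffix. */
int has_reversed_prefix(const char *reversed_name, const char *reversed_suffix) {
    return strncmp(reversed_name, reversed_suffix, strlen(reversed_suffix)) == 0;
}
```

With the reversed value indexed, the lookup for PDF files becomes a plain prefix comparison against "fdp.", which is much cheaper than scanning every term for a trailing match.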

Re: [lucy-user] C Library: Regex query

Peter Karman
Serkan Mulayim wrote on 6/8/17 7:12 PM:
> Hi guys,
>
> I would like to ask whether it is possible to run regex queries in the C library without adding new fields or tokenizing differently. I need to return documents based on file name suffix, so that a query such as (*.pdf) returns all documents that contain a PDF file.
>
> I understand the complexity a suffix query creates for the searcher. But in my use case few documents have files associated with them, so the attachment field will exist for only a small number of documents.
>
> If this is not possible, I will index the documents with their file types in a new field (or reverse the attachment names).
>

afaik there is no C implementation of the Regex query. I wrote the Perl version.

https://metacpan.org/release/LucyX-Search-WildcardQuery

You will be *much* happier with storing the file extension as a separate field
and searching on that. Far far more efficient at search time than munging a regex.
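
As a rough sketch of the separate-field approach, the extension can be pulled out of each attachment name at index time and lowercased, so that "report.PDF" and "report.pdf" land on the same term. The helper name `file_extension` is hypothetical and this is plain C only; storing the result in a Lucy field and querying it is assumed to happen elsewhere and is not shown.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the lowercased extension of `name` (without the
 * dot) into `out`, truncating if `out` is too small.
 * Returns the number of bytes written, or 0 if there is no extension. */
size_t file_extension(const char *name, char *out, size_t out_size) {
    assert(name != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    /* Look only at the last path component, so "dir.v2/notes" has no extension. */
    const char *base = strrchr(name, '/');
    base = (base != NULL) ? base + 1 : name;

    const char *dot = strrchr(base, '.');
    /* No dot, a leading dot (".bashrc"), or a trailing dot: no extension. */
    if (dot == NULL || dot == base || dot[1] == '\0') {
        return 0;
    }

    size_t len = 0;
    for (const char *p = dot + 1; *p != '\0' && len + 1 < out_size; p++) {
        out[len++] = (char)tolower((unsigned char)*p);
    }
    out[len] = '\0';
    return len;
}
```

The value this produces ("pdf", "gz", ...) would go into its own exact-match field, so a search for PDF attachments is a single term lookup on that field instead of a pattern match over attachment names.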


--
Peter Karman  .  https://karpet.github.io  .  https://keybase.io/peterkarman