Help with design



I'm trying to come up with the best design for a problem.

I want to search texts for expressions that shouldn't be found in them.

My list of bad expressions is quite stable, but the texts that I want to
scan change often.

Design I

Index my texts, then loop over my expressions list to see whether any of
them can be found in any of my texts.

I know how to build this design, but I dread rebuilding my index often with
my new/changed texts.
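
To make Design I concrete, here is a rough plain-Java sketch of the idea, not Lucene code. It assumes a tiny in-memory positional index and treats each expression as an exact sequence of consecutive words. The class name TextIndex, the tokenizer and the sample texts are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy index (not Lucene): term -> (text id -> word positions). */
final class TextIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Assumed tokenizer: lowercase, split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Indexes one text; a changed text would have to be indexed again. */
    void add(int textId, String text) {
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new HashMap<>())
                    .computeIfAbsent(textId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Returns the ids of texts containing the expression as consecutive words. */
    Set<Integer> search(String expression) {
        List<String> words = tokenize(expression);
        Set<Integer> result = new TreeSet<>();
        if (words.isEmpty()) return result;
        Map<Integer, List<Integer>> first =
                postings.getOrDefault(words.get(0), Collections.emptyMap());
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            for (int start : entry.getValue()) {
                if (followsAt(entry.getKey(), start, words)) {
                    result.add(entry.getKey());
                    break;
                }
            }
        }
        return result;
    }

    /** Checks that words 1..n-1 sit right after the first word at position start. */
    private boolean followsAt(int textId, int start, List<String> words) {
        for (int i = 1; i < words.size(); i++) {
            List<Integer> positions =
                    postings.getOrDefault(words.get(i), Collections.emptyMap()).get(textId);
            if (positions == null || !positions.contains(start + i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        TextIndex index = new TextIndex();
        index.add(1, "Il mange avec de faux amis.");
        index.add(2, "Elle lit un livre.");
        // Loop over the (illustrative) bad expressions list.
        for (String expression : Arrays.asList("faux amis", "un film")) {
            System.out.println(expression + " -> " + index.search(expression));
        }
    }
}
```

The search loop is simple, but every new or changed text has to go through add() again, which is the rebuilding cost I am worried about.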

Design II
Index my expressions (usually 2-3 words long). Split the text to be checked
into sentences, then feed those sentences to Lucene to see which ones match
my expressions. But is it even possible to have a match such as:

sentence to check : "Il mange avec de faux amis." --> indexed info : "faux

In other words, how do I answer the question: does this sentence contain
any of those expressions? Is it possible? How would I go about it?
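
For what it's worth, the matching in Design II can be sketched in plain Java without Lucene: keep the expressions in a hash set and look up every run of consecutive words of the sentence in it. This is only a sketch under assumptions. The class name ExpressionMatcher and the tokenizer are invented here, and the expression "faux amis" is a guess used for illustration, not necessarily what is indexed in the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: finds stored expressions inside a sentence via word n-grams. */
final class ExpressionMatcher {
    private final Set<String> expressions = new HashSet<>();
    private int maxWords = 0; // length in words of the longest stored expression

    ExpressionMatcher(List<String> badExpressions) {
        for (String expression : badExpressions) {
            List<String> tokens = tokenize(expression);
            if (tokens.isEmpty()) continue;
            expressions.add(String.join(" ", tokens));
            maxWords = Math.max(maxWords, tokens.size());
        }
    }

    /** Assumed tokenizer: lowercase, split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Returns every stored expression that occurs as consecutive words in the sentence. */
    List<String> findIn(String sentence) {
        List<String> tokens = tokenize(sentence);
        List<String> hits = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            int limit = Math.min(tokens.size(), start + maxWords);
            for (int end = start + 1; end <= limit; end++) {
                String gram = String.join(" ", tokens.subList(start, end));
                if (expressions.contains(gram)) hits.add(gram);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative expressions only.
        ExpressionMatcher matcher =
                new ExpressionMatcher(Arrays.asList("faux amis", "coup de foudre"));
        System.out.println(matcher.findIn("Il mange avec de faux amis."));
    }
}
```

With expressions of 2-3 words, each sentence only produces a handful of lookups per word, and nothing has to be rebuilt when the texts change. It only catches exact word sequences, though, so inflected or reordered forms would need stemming or a real analyzer.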

Design III