Lucene same search result for words with and without spaces

egorlex
Hi,

I need help with Lucene.

How can I get the same search results for words with and without spaces?

For example, the requests "similar issues" and "similarissues" must both
return all "Similar Issues" results.

Thanks.



--
Sent from: http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Lucene same search result for words with and without spaces

Ahmet Arslan
Hi Egorlex,

ShingleFilter could be used to achieve your goal.

Ahmet







Re: Lucene same search result for words with and without spaces

egorlex
Thanks for the reply!

Sorry, could you help a little more? According to this example:

"given the phrase “Shingles is a viral disease”, a shingle filter might
produce:

Shingles is
is a
a viral
viral disease
"

I do not quite understand how this ShingleFilter can turn "similarissues"
into "similar issues"

Thanks!



RE: Lucene same search result for words with and without spaces

Markus Jelsma-2
In reply to this post by egorlex
Hi Egorlex,

Set the tokenSeparator to "" and ShingleFilter will concatenate the words of each shingle without whitespace. Keep in mind that this greatly increases the size of the index, so concatenating all pairs of words might not be a good idea.
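
To make the effect of an empty token separator concrete, here is a minimal plain-Java sketch. It does not use Lucene at all: the class and method names (ShingleSketch, shingles) are made up for illustration, and the code only mimics the kind of tokens a shingle filter would be expected to emit when it outputs unigrams plus word pairs joined by "". Treat it as a way to reason about the indexed terms, not as Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java simulation of word shingling. This is NOT Lucene's ShingleFilter;
// it only illustrates which terms end up in the index with an empty separator.
class ShingleSketch {

    // Returns each word (unigram) followed by the shingles of up to maxSize
    // adjacent words that start at that word, joined by `separator`.
    static List<String> shingles(String text, int maxSize, String separator) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder sb = new StringBuilder();
            for (int size = 1; size <= maxSize && start + size <= words.length; size++) {
                if (size > 1) {
                    sb.append(separator);
                }
                sb.append(words[start + size - 1]);
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [similar, similarissues, issues]
        System.out.println(shingles("Similar Issues", 2, ""));
    }
}
```

With these terms indexed, a query for "similarissues" matches the concatenated shingle, while "similar issues" still matches the two unigrams. The extra shingle per word pair is also where the index growth mentioned above comes from.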

If you want to find "similarissues" with "similar issues" (and vice versa), you might want to check out DictionaryCompoundWordTokenFilter and/or HyphenationCompoundWordTokenFilter. Although English hardly uses compound words, the token filters still do their job quite nicely.
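
The idea behind dictionary-based decompounding can be sketched in plain Java as follows. This is a simplified illustration, not Lucene's DictionaryCompoundWordTokenFilter: the names DecompoundSketch and decompound are hypothetical, the dictionary is a hand-written two-word set, and the real filter has additional size limits and matching options that are ignored here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified illustration of dictionary-based decompounding.
// NOT the Lucene filter; names and behaviour are assumptions for this sketch.
class DecompoundSketch {

    // Keeps the original token and appends every proper substring of at least
    // minSubwordSize characters that is present in the dictionary.
    static List<String> decompound(String token, Set<String> dictionary, int minSubwordSize) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        out.add(lower); // the original token is always kept
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            for (int end = start + minSubwordSize; end <= lower.length(); end++) {
                String candidate = lower.substring(start, end);
                if (!candidate.equals(lower) && dictionary.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("similar", "issues"); // toy dictionary
        // Prints [similarissues, similar, issues]
        System.out.println(decompound("similarissues", dictionary, 3));
    }
}
```

The quality of the result depends entirely on the dictionary: a word that is missing from it is never split out, and very short dictionary entries tend to produce spurious subwords.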

Regards,
Markus

 
 
Re: Lucene same search result for words with and without spaces

András Péteri
An n-gram tokenizer/filter might also work for you:
http://lucene.apache.org/core/7_3_1/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html
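
As a rough illustration of why character n-grams can help here, the following plain-Java sketch computes n-grams by hand and shows how many trigrams the two spellings share. It does not call Lucene; NGramSketch, ngrams and sharedGrams are hypothetical names, and the gram sizes are arbitrary choices for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hand-rolled character n-grams. NOT Lucene's NGramTokenizer; an illustration only.
class NGramSketch {

    // Returns all substrings with length between minGram and maxGram,
    // ordered by start position and then by length.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                out.add(text.substring(start, start + len));
            }
        }
        return out;
    }

    // Returns the distinct n-grams of size n that occur in both strings.
    static Set<String> sharedGrams(String a, String b, int n) {
        Set<String> shared = new LinkedHashSet<>(ngrams(a, n, n));
        shared.retainAll(new LinkedHashSet<>(ngrams(b, n, n)));
        return shared;
    }

    public static void main(String[] args) {
        // Prints [is, iss, ss, ssu, su, sue, ue, ues, es]
        System.out.println(ngrams("issues", 2, 3));
        // Prints [sim, imi, mil, ila, lar, iss, ssu, sue, ues]
        System.out.println(sharedGrams("similarissues", "similar issues", 3));
    }
}
```

Most trigrams are shared, so both spellings would match largely the same indexed grams. The trade-off is many more terms per field and looser matching, since unrelated words that share grams can match too.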

Regards,
András

Re: Lucene same search result for words with and without spaces

Ahmet Arslan
In reply to this post by egorlex
Hi Egorlex,

ShingleFilter won't turn "similarissues" into "similar issues", but it can do the reverse.
It works like a sliding window over the token stream. Think about what the indexed tokens would be if you set the token separator to "".

Ahmet







