BlendedInfixSuggester, a couple of questions

Andrea Gazzarini-5
Hi,
I'm using Solr 7.1.0 (though I guess everything I'm about to describe also applies to previous versions) and I have to implement a simple product name suggester.

I started with the BlendedInfixLookup, which could fit my needs, but even after looking at the code I still have some doubts about how it works. I have a couple of questions:

1) org.apache.lucene.search.suggest.Lookup
The formula in the BlendedInfixSuggester documentation says "final weight = 1 - (0.10*position)", which suggests a float or double datatype to me. Instead, the "value" instance member of the Lookup.LookupResult class, which should hold the computed weight, is a long.
I realised this because, in a scenario where the weight field in my schema always returns 1, the final computed weight is always 0 or 1, so precision is lost whenever the actual result of the formula above lies strictly between 0 and 1.
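
To illustrate point 1, here is a minimal Java sketch of the truncation. It assumes the blended score is computed as the weight times the linear position coefficient and then cast to long; the class and method names are hypothetical and are not part of the Lucene API.

```java
// Sketch only: reproduces the arithmetic described above, not the Lucene source.
final class WeightTruncationSketch {

    // Hypothetical helper: linear coefficient from the documented formula.
    static double linearCoefficient(int position) {
        return 1 - 0.10 * position;
    }

    // Hypothetical helper: the blended score is stored as a long,
    // so the fractional part of weight * coefficient is discarded.
    static long blendedScore(long weight, int position) {
        return (long) (weight * linearCoefficient(position));
    }

    public static void main(String[] args) {
        for (int position = 0; position <= 3; position++) {
            System.out.println("position=" + position
                    + " weight=1 -> " + blendedScore(1, position)
                    + ", weight=1000 -> " + blendedScore(1000, position));
        }
    }
}
```

With a weight of 1, every position after 0 collapses to a score of 0, while a weight of 1000 still yields 900 at position 1, so scaling up the weight field looks like a possible workaround.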

2) Position role within the BlendedInfixSuggester
If I write more than one term in the query, let's say 

"Mini Bar Fridge" 

I would expect to see something like this in the results (note that allTermsRequired=true and the schema weight field always returns 1000):

- Mini Bar Fridge something
- Mini Bar Fridge something else
- Mini Bar something Fridge        
- Mini Bar something else Fridge
- Mini something Bar Fridge
...

Instead I see this: 

- Mini Bar something Fridge
- Mini Bar something else Fridge
- Mini Bar Fridge something
- Mini Bar Fridge something else
- Mini something Bar Fridge
...

After having a look at the suggester code (BlendedInfixSuggester.createCoefficient), I see that the component takes into account only one position: the lowest position among the three matching terms within the term vector ("mini" in the example above). As a result, all the suggestions above get the same weight:

score = weight * (1 - 0.10 * position) = 1000 * (1 - 0.10 * 0) = 1000
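
The tie can be reproduced with a small Java sketch. It is a simplification under stated assumptions: whitespace tokenization, exact lowercase term matching, and hypothetical class and method names. It mirrors only the "lowest matching position" logic described above, not the actual Lucene implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch only: illustrates scoring by the lowest matching position.
final class PositionCoefficientSketch {

    // Lowest token position in the suggestion that matches any query term, or -1 if none.
    static int lowestMatchingPosition(String suggestion, List<String> queryTerms) {
        String[] tokens = suggestion.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int position = 0; position < tokens.length; position++) {
            if (queryTerms.contains(tokens[position])) {
                return position;
            }
        }
        return -1;
    }

    // score = weight * (1 - 0.10 * position), truncated to long.
    // Assumption: a suggestion with no matching term scores 0.
    static long score(long weight, String suggestion, List<String> queryTerms) {
        int position = lowestMatchingPosition(suggestion, queryTerms);
        if (position < 0) {
            return 0;
        }
        return (long) (weight * (1 - 0.10 * position));
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("mini", "bar", "fridge");
        List<String> suggestions = Arrays.asList(
                "Mini Bar something Fridge",
                "Mini Bar something else Fridge",
                "Mini Bar Fridge something",
                "Mini Bar Fridge something else",
                "Mini something Bar Fridge");
        for (String suggestion : suggestions) {
            // Every suggestion starts with "mini", so the lowest position is always 0.
            System.out.println(score(1000, suggestion, terms) + "  " + suggestion);
        }
    }
}
```

Because "mini" sits at position 0 in every suggestion, the positions of "bar" and "fridge" never influence the score, which would explain why the ordering among these results looks arbitrary.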

Is that the expected behaviour? 

Many thanks in advance
Andrea

Re: BlendedInfixSuggester, a couple of questions

Andrea Gazzarini-5
Hi guys,
any suggestions about this?

Best, 
Andrea

In reply to Andrea Gazzarini's message of 27 Nov 2017, 5:54 pm.

Re: BlendedInfixSuggester, a couple of questions

Alessandro Benedetti-4
UP
I am facing the same behaviour and I agree with Andrea's observations. Any view on this from the dev community?

Regards

In reply to Andrea Gazzarini's message of Wed, Nov 29, 2017, 4:36 PM.

--
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright    
In the forests of the night,    
What immortal hand or eye    
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: BlendedInfixSuggester, a couple of questions

david.w.smiley@gmail.com
Feel free to file an issue with a proposal; probably to Lucene in this case.

In reply to Alessandro Benedetti's message of Tue, May 22, 2018, 7:42 AM.

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker

Re: BlendedInfixSuggester, a couple of questions

Alessandro Benedetti
Thanks David. I'm copying in Andrea, as he will probably want to follow up, since he originally found this Lucene behaviour.

Cheers

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director

In reply to David Smiley's message of Tue, May 22, 2018, 2:53 PM.

---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: BlendedInfixSuggester, a couple of questions

Alessandro Benedetti
Hi all,

A patch with the fix and related tests is available for review.

Regards

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director

In reply to Alessandro Benedetti's message of Tue, May 22, 2018, 3:05 PM.

Re: BlendedInfixSuggester, a couple of questions

Alessandro Benedetti
Hi all,
A patch with the improvement and related tests is available for review.

Regards

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director

In reply to Alessandro Benedetti's message of Fri, Jun 1, 2018, 12:57 PM.