Tweaking boosts for more search results variety

Tweaking boosts for more search results variety

gadde
Our index aggregates content from various sites on the web. We want to give
users a good experience by showing multiple sites in the search results. In
our setup, most of the top results come from the same site.

Here is some information about our queries and schema:
                site - String field. We have about 1000 sites in the index.
                sitetype - String field. We have 3 site types.
omitNorms="true" is set for both fields.

Doc count varies widely by site and sitetype, by a factor of 10 to 1000.
Total index size is about 5 million docs.
Solr version: 4.0

In our queries we apply a fixed, preferential boost to certain sites.
sitetype has a different fixed boost for each of its 3 possible values. We
turned off Inverse Document Frequency (IDF) so that these boosts work
properly. Other text fields are boosted based on search keywords only.
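
For readers following along, here is a minimal sketch of what fixed per-sitetype boosts could look like as request parameters. It assumes the edismax query parser and its bq (boost query) parameter; the thread does not say which parser or boost mechanism is actually in use. The boost values and the text field names (title, body) are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical boost per sitetype value; not taken from the poster's config.
SITETYPE_BOOSTS = {"A": 5, "B": 2, "C": 1}


def build_boost_params(keywords, sitetype_boosts=SITETYPE_BOOSTS):
    """Return Solr request parameters with one bq clause per sitetype."""
    bq = " ".join(f"sitetype:{name}^{boost}"
                  for name, boost in sitetype_boosts.items())
    return {
        "defType": "edismax",  # assumption: edismax parser
        "q": keywords,
        "qf": "title^2 body",  # hypothetical text fields
        "bq": bq,              # additive boost clauses
    }


params = build_boost_params("solr")
query_string = urlencode(params)  # append to .../select? when querying
```

Note that bq clauses are scored like any other clause, so how much each one contributes depends on the similarity in use, which is presumably why IDF was turned off in this setup.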

With this setup we often see a run of hits from a single site, followed by a
run from the next site, and so on.
Is there any way to show results from a variety of sites and still keep
the preferential boosts in place?

Re: Tweaking boosts for more search results variety

Jack Krupansky-2
The grouping (field collapsing) feature partly addresses this: group by a
"site" field, and when several of the top pages come from the same site they
are grouped (collapsed), so that you see more sites in fewer results.

See:
http://wiki.apache.org/solr/FieldCollapsing
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
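
As a rough illustration of the suggestion above, the sketch below builds the request parameters for grouping by site. group, group.field and group.limit are the standard Result Grouping parameters; the per_site and rows values are arbitrary choices for the example, not recommendations.

```python
from urllib.parse import urlencode


def build_grouping_params(keywords, per_site=2, rows=10):
    """Return Solr request parameters that collapse results by site."""
    return {
        "q": keywords,
        "group": "true",
        "group.field": "site",   # field named in the original post
        "group.limit": per_site, # docs shown per site (Solr default is 1)
        "rows": rows,            # with grouping, rows counts groups
    }


grouping_params = build_grouping_params("solr")
grouping_query = urlencode(grouping_params)
```

With these settings a page holds up to 10 sites with at most 2 documents each; the remaining documents of a site are only reachable through a follow-up query restricted to that site.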

-- Jack Krupansky

-----Original Message-----
From: Sai Gadde
Sent: Thursday, September 05, 2013 2:27 AM
To: [hidden email]
Subject: Tweaking boosts for more search results variety


Re: Tweaking boosts for more search results variety

gadde
Thank you, Jack, for the suggestion.

We can try grouping by site. But since there are only about 1000 sites
against an index of 5 million docs, we expect that most of the hits would be
hidden, and for certain specific keywords only a handful of actual results
could be displayed if results are grouped by site.

We already group on a signature field to identify duplicate content in
these 5 million+ docs, but there the duplicates amount to only about 3-5%
at most.

Is there any workaround for these limitations with grouping?

Thanks
Shyam



On Thu, Sep 5, 2013 at 9:16 PM, Jack Krupansky <[hidden email]>wrote:


Re: Tweaking boosts for more search results variety

kamaci
What do you mean by "these limitations"? Do you want to group on multiple
fields at the same time?


2013/9/6 Sai Gadde <[hidden email]>


Re: Tweaking boosts for more search results variety

gadde
Sorry for the delayed response.

The limitations arise because we have 5 million indexed documents from only
about 1000 sites. If results are grouped by site, we will not be able to
show more than a couple of pages for a lot of search keywords.


Example: a search for "Solr" has 1000 matches, but from only 20 sites.
Of these 20 sites:
10 sites are of sitetype A - boost 5
7 sites are of sitetype B - boost 2
3 sites are of sitetype C - boost 1

Limitation 1: If these are grouped by site, only 20 results would be
displayed, across 2 pages (10 per page).

We still want to display all the results. For a better user experience,
"ideally" page 1 would show 10 results from 10 distinct sites of sitetype A
(which already has the higher boost) or, in a real-world scenario, from 7-8
distinct sites. In our case we see something like 7 matches on a page from a
single site.

Limitation 2: Inverse Document Frequency (IDF) would have helped here, but
with IDF enabled our preferential boost for sitetypes is ignored and some
results from sitetype C come out on top due to the IDF boost.

What we want is a way to control the variety of sites displayed in search
results with the preferential boosts still in place.
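
One way to picture this requirement is a client-side re-sort of the already boosted result list, sketched below. This is not a Solr feature and not something proposed in the thread; the function name, the page_size and max_per_site parameters, and the dict-based documents with a "site" key are all assumptions made for illustration. Documents arrive sorted by score; each page takes at most max_per_site documents per site, and the overflow is pushed to later pages rather than hidden.

```python
from collections import Counter


def diversify(results, page_size=10, max_per_site=2):
    """Re-order score-sorted results so no site dominates a page.

    Every document is kept; documents over the per-site cap are
    deferred to later pages instead of being dropped.
    """
    remaining = list(results)
    ordered = []
    while remaining:
        page, deferred, per_site = [], [], Counter()
        for doc in remaining:
            site = doc["site"]
            if len(page) < page_size and per_site[site] < max_per_site:
                page.append(doc)
                per_site[site] += 1
            else:
                deferred.append(doc)
        # Too few distinct sites to fill the page: relax the cap and
        # top up with the best deferred documents.
        missing = page_size - len(page)
        page.extend(deferred[:missing])
        deferred = deferred[missing:]
        ordered.extend(page)
        remaining = deferred
    return ordered
```

Because the input order already reflects the site and sitetype boosts, the boosts still decide which documents compete for the top slots; the cap only limits how many of those slots one site can take. The cost is that the client has to fetch more than one page of results up front to have something to re-sort.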

Thanks in advance




On Sun, Sep 8, 2013 at 6:36 AM, Furkan KAMACI <[hidden email]>wrote:


Re: Tweaking boosts for more search results variety

Marc Sturlese
This is totally deprecated, but it may be helpful if you want to re-sort some documents:
https://issues.apache.org/jira/browse/SOLR-1311

Re: Tweaking boosts for more search results variety

gadde
Perfect. This is exactly what we need!

I wish there were a plugin option for this, or a feature like it in a
mainstream Solr release.

Still, this is a great resource for us. Thanks, Marc, for pointing us to this
very useful information.

Thanks all for the help.




On Tue, Sep 10, 2013 at 5:30 PM, Marc Sturlese <[hidden email]>wrote:

> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tweaking-boosts-for-more-search-results-variety-tp4088302p4089044.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>