[jira] Created: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

[jira] Created: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

Tim Allison (Jira)
Dedupe Sharded Search Results by Shard Order or Score
-----------------------------------------------------

                 Key: SOLR-1537
                 URL: https://issues.apache.org/jira/browse/SOLR-1537
             Project: Solr
          Issue Type: Improvement
          Components: search
    Affects Versions: 1.4, 1.5
         Environment: All
            Reporter: Dennis Kubes
             Fix For: 1.4, 1.5


Allows sharded search to dedupe results by ID, based on either the order of the shards in the shards param or on score.  This makes the returned result deterministic.  If by shard order, shards that appear earlier in the shards param take precedence over shards that appear later.  If by score, higher scores beat lower scores.  Only one result per ID is kept, because Solr currently permits only a single result per ID to be returned.
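
The two precedence rules described above can be sketched roughly as follows.  This is only an illustration and is not taken from the attached patch: the Hit class, the Policy enum and the dedupe method are hypothetical names, and falling back to shard order when two scores are equal is an assumption made here so the outcome stays deterministic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardDedupeSketch {
    /** Hypothetical stand-in for one hit returned by one shard. */
    static final class Hit {
        final String id;       // the uniqueKey value
        final int shardIndex;  // position of the shard in the shards param
        final float score;

        Hit(String id, int shardIndex, float score) {
            this.id = id;
            this.shardIndex = shardIndex;
            this.score = score;
        }
    }

    enum Policy { SHARD_ORDER, SCORE }

    /** True if candidate should replace current as the winner for its ID. */
    static boolean beats(Hit candidate, Hit current, Policy policy) {
        if (policy == Policy.SCORE && candidate.score != current.score) {
            return candidate.score > current.score;
        }
        // Shard-order rule; also the assumed tie-break for equal scores.
        return candidate.shardIndex < current.shardIndex;
    }

    /** Keeps exactly one hit per ID, whatever order the hits arrived in. */
    static Map<String, Hit> dedupe(List<Hit> hits, Policy policy) {
        Map<String, Hit> winners = new HashMap<>();
        for (Hit hit : hits) {
            Hit current = winners.get(hit.id);
            if (current == null || beats(hit, current, policy)) {
                winners.put(hit.id, hit);
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        // doc1 arrives first from the second shard (index 1), then from the first.
        List<Hit> hits = List.of(
            new Hit("doc1", 1, 0.9f),
            new Hit("doc1", 0, 0.4f),
            new Hit("doc2", 1, 0.2f));
        System.out.println(dedupe(hits, Policy.SHARD_ORDER).get("doc1").shardIndex);
        System.out.println(dedupe(hits, Policy.SCORE).get("doc1").shardIndex);
    }
}
```

Because the winner is chosen by comparing hits rather than by arrival order, the same inputs give the same winner no matter which shard responds first.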

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated SOLR-1537:
-------------------------------

    Attachment: solr-dedupe-20091031.patch

Basic patch.  No unit tests.  Provides dedupe functionality for shards based on either shard order in the shards param or score.

[jira] Updated: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated SOLR-1537:
-------------------------------

    Attachment: solr-dedupe-20091031-2.patch

Updated patch.  Had to replace the TreeSet used for on-the-fly document queuing with a two-pass approach using a HashSet and a Java 5 PriorityQueue.  This allows documents that compare as equal (i.e. documents with the same score) to be kept.
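
A rough sketch of why the change matters, under stated assumptions; it is not the code from the patch.  The Doc class, the BY_SCORE_DESC comparator and both methods are hypothetical.  A TreeSet treats elements whose comparator returns 0 as the same element, so a score-only ordering silently drops distinct documents with equal scores.  A PriorityQueue keeps them, which is why the uniqueness check moves into a separate HashSet pass.  The first pass here keeps the first occurrence of each ID, which assumes the input is already in precedence order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class TwoPassQueueSketch {
    /** Hypothetical stand-in for a document returned by a shard. */
    static final class Doc {
        final String id;
        final float score;

        Doc(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    // Highest score first. IDs are ignored, so distinct docs can compare as equal.
    static final Comparator<Doc> BY_SCORE_DESC =
        (a, b) -> Float.compare(b.score, a.score);

    /** Single-pass TreeSet: docs that compare as equal collapse into one entry. */
    static int treeSetSize(List<Doc> docs) {
        TreeSet<Doc> set = new TreeSet<>(BY_SCORE_DESC);
        set.addAll(docs);
        return set.size();
    }

    /** Pass 1 dedupes by ID with a HashSet; pass 2 orders by score with a PriorityQueue. */
    static List<String> twoPass(List<Doc> docs) {
        Set<String> seen = new HashSet<>();
        List<Doc> unique = new ArrayList<>();
        for (Doc doc : docs) {
            if (seen.add(doc.id)) {   // add() returns false for an ID already seen
                unique.add(doc);
            }
        }
        PriorityQueue<Doc> queue = new PriorityQueue<>(BY_SCORE_DESC);
        queue.addAll(unique);
        List<String> ordered = new ArrayList<>();
        while (!queue.isEmpty()) {
            ordered.add(queue.poll().id);
        }
        return ordered;
    }
}
```

Note that PriorityQueue does not guarantee any particular order among documents with equal scores, so a real implementation would need an explicit tie-break if a fully stable ordering is required.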

[jira] Updated: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1537:
----------------------------------------

    Fix Version/s:     (was: 1.4)

[jira] Commented: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774053#action_12774053 ]

Otis Gospodnetic commented on SOLR-1537:
----------------------------------------

The "ID" here being the uniqueKey?  I.e. the use case is the removal of dupes when the same document is indexed in multiple shards and more than one shard returns that document in the result set?


[jira] Commented: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774122#action_12774122 ]

Dennis Kubes commented on SOLR-1537:
------------------------------------

That is correct.  A dupe occurs when more than one shard returns a value for the same uniqueKey.  Dupes are removed by uniqueKey deterministically, either by order of shards or by highest score.  Previously there was no way to determine which dupe would show up, because it depended on whichever shard returned first from the query broadcast to multiple shards.  In other words, the fastest-responding shard would supply the first document for a uniqueKey and the rest with that uniqueKey would be ignored.  Which shard is fastest, though, could change between query requests.

[jira] Updated: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated SOLR-1537:
-------------------------------

    Attachment: solr-dedupe-20091106-3.patch

Fixes a small issue with the numFound count being doubled.

[jira] Updated: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated SOLR-1537:
-------------------------------

    Attachment: SOLR-1537-20091126-4.patch

Final patch.  This incorporates an updated version of SOLR-1143, allowing the return of partial search results.  This patch fixes bugs in the number of results returned, sorting order, and errors on edge conditions, among others.  This patch also supersedes SOLR-1143, bringing all unit tests up to date and adding enhanced functionality to allow returning partial results when server names are misspelled or there are errors other than simple connection errors.  Headers have been added to show the number of failing shards and the names of those shards.  Unit tests have been added to demonstrate dedupe of search results by shard order.  This patch passes all current unit tests.
