is there a way to remove deleted documents from index without optimize

is there a way to remove deleted documents from index without optimize

CrazyDiamond
My index is updated frequently, and I need to remove unused documents from the index after an update/reindex.
Optimization is very expensive, so what should I do?

Re: is there a way to remove deleted documents from index without optimize

Doug Turnbull
Avoid optimize like the plague.

Instead focus on tuning the segment merging process. As you commit index
files, segments are created. But they're periodically merged. Merging
removes remnants of the tombstoned docs.  You can optimize this, tune it,
etc. If you're dealing with a lot of updates, this is something you
definitely want to tune.  See this document, scroll down to the merge
parameters.
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
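
One related option, short of a full optimize: Solr's update handler accepts a commit with expungeDeletes=true, which asks Lucene to merge away segments carrying a significant share of deleted documents. It can still be costly on a large index, so treat it as an occasional tool rather than a routine. A minimal sketch that only builds the request URL; the host and the collection name "mycollection" are assumptions for illustration:

```python
from urllib.parse import urlencode


def expunge_deletes_url(base_url, collection):
    """Build the URL for a commit that also expunges deleted documents.

    base_url and collection are placeholders (assumptions); substitute
    the address and collection/core name of your own Solr install.
    """
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


if __name__ == "__main__":
    # Hypothetical local Solr instance and collection name.
    print(expunge_deletes_url("http://localhost:8983/solr", "mycollection"))
```

Sending a GET or POST to that URL (for example with curl) triggers the commit; nothing is sent by the sketch itself.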

There are other options for dealing with high update rates. You could shard
SolrCloud further and minimize replication. You could put updates in Kafka
and work through them as you can, catching up whenever you have a slow
period. You can tune your hard and soft commits to create segments of an
appropriate size, etc.
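
As a rough illustration of the commit tuning mentioned above, hard and soft commit intervals can be changed through Solr's Config API by POSTing a "set-property" command to the collection's /config endpoint. The sketch below only builds the JSON body; the interval values are made-up examples, not recommendations, and should be chosen from your own indexing rate:

```python
import json


def commit_settings_payload(hard_commit_ms, soft_commit_ms):
    """Return a Config API 'set-property' body for commit intervals.

    The millisecond values passed in are illustrative assumptions.
    """
    return json.dumps(
        {
            "set-property": {
                # Hard commit: flushes segments to disk.
                "updateHandler.autoCommit.maxTime": hard_commit_ms,
                # Do not open a new searcher on every hard commit.
                "updateHandler.autoCommit.openSearcher": False,
                # Soft commit: controls how soon updates become visible.
                "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
            }
        },
        indent=2,
    )


if __name__ == "__main__":
    print(commit_settings_payload(hard_commit_ms=60000, soft_commit_ms=5000))
```

The same settings can be written directly in the updateHandler section of solrconfig.xml if you prefer to keep them in the config file.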

-Doug



On Tue, Sep 22, 2015 at 9:01 PM, CrazyDiamond <[hidden email]> wrote:

> My index is updated frequently, and I need to remove unused documents from
> the index after an update/reindex.
> Optimization is very expensive, so what should I do?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/is-there-a-way-to-remove-deleted-documents-from-index-without-optimize-tp4230691.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



--
Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC
<http://opensourceconnections.com> | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>

Re: is there a way to remove deleted documents from index without optimize

Walter Underwood
In reply to this post by CrazyDiamond
Don’t do anything. Solr will automatically clean up the deleted documents for you.
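
If you want to check that the automatic cleanup is keeping up, one simple measure is the gap between maxDoc (which still counts deleted documents) and numDocs (live documents only); Solr reports both in the admin UI's core overview. A minimal sketch, with made-up sample numbers:

```python
def deleted_fraction(num_docs, max_doc):
    """Fraction of the index still occupied by deleted documents.

    num_docs: live documents; max_doc: live plus not-yet-merged deletes.
    """
    if max_doc <= 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


if __name__ == "__main__":
    # Sample figures are invented for illustration only.
    print(f"{deleted_fraction(num_docs=750_000, max_doc=1_000_000):.1%}")
```

A fraction that stays roughly stable over time suggests normal merging is reclaiming deletes about as fast as updates create them.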

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)



Re: is there a way to remove deleted documents from index without optimize

Harry Yoo-2
I should have read this. My project started on Apache Solr 4.x, moved to 5.x, and recently migrated to 6.6.1. Do you think Solr will take care of indexes written by old versions as well? I want to make sure my indexes are updated to the 6.x Lucene version so that they will be supported when I move to Solr 7.x.

Is there any best practice for managing Solr indexes?

Harry

> On Sep 22, 2015, at 8:21 PM, Walter Underwood <[hidden email]> wrote:
>
> Don’t do anything. Solr will automatically clean up the deleted documents for you.

Re: is there a way to remove deleted documents from index without optimize

Erick Erickson
You can use the IndexUpgrader tool that ships with each version of Solr
(well, actually Lucene) to, well, upgrade your index. So you can use
the IndexUpgrader that ships with 5.x to upgrade from 4.x, and the
one that ships with 6.x to upgrade from 5.x, etc.

That said, none of that is necessary _if_:
- you have the Lucene version in solrconfig.xml be the one that corresponds to your current Solr, i.e. a solrconfig for 6.x should have a luceneMatchVersion of 6.something.
- you update your index enough to rewrite all segments before moving to the _next_ version. When Lucene merges a segment, it writes the new segment according to the luceneMatchVersion in solrconfig.xml. So as long as you are on a version long enough for all segments to be merged into new segments, you don't have to worry.
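
For reference, the upgrade tool is the Lucene class org.apache.lucene.index.IndexUpgrader, run from the command line against an index directory while Solr is stopped. A hedged sketch that assembles such a command; the jar directory, version string, and index path are hypothetical, and the classpath uses the Unix ":" separator:

```python
def index_upgrader_command(lucene_lib_dir, lucene_version, index_dir):
    """Assemble a java command line for Lucene's IndexUpgrader.

    All three arguments are placeholders (assumptions). The
    backward-codecs jar is included so older segment formats can be read.
    """
    classpath = ":".join(
        [
            f"{lucene_lib_dir}/lucene-core-{lucene_version}.jar",
            f"{lucene_lib_dir}/lucene-backward-codecs-{lucene_version}.jar",
        ]
    )
    return [
        "java",
        "-cp",
        classpath,
        "org.apache.lucene.index.IndexUpgrader",
        "-verbose",
        index_dir,
    ]


if __name__ == "__main__":
    # Hypothetical paths; print the command rather than running it.
    cmd = index_upgrader_command(
        "/opt/lucene/lib", "6.6.1", "/var/solr/data/core1/data/index"
    )
    print(" ".join(cmd))
```

Back up the index before running anything like this, since the tool rewrites it in place.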

Best,
Erick

On Thu, Oct 12, 2017 at 8:29 PM, Harry Yoo <[hidden email]> wrote:

> I should have read this. My project started on Apache Solr 4.x, moved to 5.x, and recently migrated to 6.6.1. Do you think Solr will take care of indexes written by old versions as well? I want to make sure my indexes are updated to the 6.x Lucene version so that they will be supported when I move to Solr 7.x.
>
> Is there any best practice for managing Solr indexes?
>
> Harry

Re: is there a way to remove deleted documents from index without optimize

Harry Yoo-2
Thanks for the clarification.

I use

<config>
        <luceneMatchVersion>${lucene.version}</luceneMatchVersion>

in solrconfig.xml and pass -Dlucene.version when I launch Solr, to keep the versions in sync.



> On Oct 12, 2017, at 11:01 PM, Erick Erickson <[hidden email]> wrote:
>
> You can use the IndexUpgrader tool that ships with each version of Solr
> (well, actually Lucene) to, well, upgrade your index. So you can use
> the IndexUpgrader that ships with 5.x to upgrade from 4.x, and the
> one that ships with 6.x to upgrade from 5.x, etc.
>
> That said, none of that is necessary _if_:
> - you have the Lucene version in solrconfig.xml be the one that corresponds to your current Solr, i.e. a solrconfig for 6.x should have a luceneMatchVersion of 6.something.
> - you update your index enough to rewrite all segments before moving to the _next_ version. When Lucene merges a segment, it writes the new segment according to the luceneMatchVersion in solrconfig.xml. So as long as you are on a version long enough for all segments to be merged into new segments, you don't have to worry.
>
> Best,
> Erick

Re: is there a way to remove deleted documents from index without optimize

Shawn Heisey-2
In reply to this post by Erick Erickson
On 10/12/2017 10:01 PM, Erick Erickson wrote:
> You can use the IndexUpgrader tool that ships with each version of Solr
> (well, actually Lucene) to, well, upgrade your index. So you can use
> the IndexUpgrader that ships with 5.x to upgrade from 4.x, and the
> one that ships with 6.x to upgrade from 5.x, etc.
>
> That said, none of that is necessary _if_:
> - you have the Lucene version in solrconfig.xml be the one that corresponds to your current Solr, i.e. a solrconfig for 6.x should have a luceneMatchVersion of 6.something.
> - you update your index enough to rewrite all segments before moving to the _next_ version. When Lucene merges a segment, it writes the new segment according to the luceneMatchVersion in solrconfig.xml. So as long as you are on a version long enough for all segments to be merged into new segments, you don't have to worry.

As far as I am aware, luceneMatchVersion in Solr will not change the
segment format, but only how some Lucene components (primarily analysis)
function.  Have I got incorrect information?

Something else that might be worth mentioning: the IndexUpgrader is a
fairly simple piece of code. It runs forceMerge (optimize) on the
index, creating a single new segment from the entire existing index.
That ties into this thread's initial subject and LUCENE-7976. I wonder
if perhaps the upgrade merge policy should be changed so that it just
rewrites all existing segments instead of fully merging them.

Thanks,
Shawn