8.5.1 LogReplayer extremely slow

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

8.5.1 LogReplayer extremely slow

Markus Jelsma-2
Hello,

Our main Solr text search collection broke down last night (search was still working fine), every indexing action timed out with the Solr master spending most of its time in Java regex. One shard has only one replica left for queries and it stays like that. I have copied both shard's leader to local to see what is going on.

One shard is fine but the other has a replica with has about 600MB of data to replay and it is extremely slow. Using the VisualVM sampler i find that the replayer is also spending almost all time in dealing with Java regex (stack trace below). Is this to be expected? And what is it actually doing? Where do the TokenFilters come from?

I had a old but clean collection on the same cluster and started indexing to it to see what is going on but it too timed out due to Java regex. This is weird, because locally i have no problem indexing a million records in a 8.5.1 collection, and the broken down cluster has been running fine for over a month.

A note, this index uses PreAnalyzedField, so i would expect no analysis or whatsoever, certainly no regex.

Thanks,
Markus

"replayUpdatesExecutor-3-thread-1-processing-n:127.0.1.1:8983_solr x:sitesearch_shard2_replica_t2 c:sitesearch s:shard2 r:core_node4" #222 prio=5 os_prio=0 cpu=239207,44ms elapsed=239,50s tid=0x00007ffde0057000 nid=0x24f5 runnable  [0x00007ffeedd0f000]
   java.lang.Thread.State: RUNNABLE
        at java.util.regex.Pattern$GroupTail.match(java.base@11.0.7/Pattern.java:4863)
        at java.util.regex.Pattern$CharPropertyGreedy.match(java.base@11.0.7/Pattern.java:4306)
        at java.util.regex.Pattern$GroupHead.match(java.base@11.0.7/Pattern.java:4804)
        at java.util.regex.Pattern$CharPropertyGreedy.match(java.base@11.0.7/Pattern.java:4306)
        at java.util.regex.Pattern$Start.match(java.base@11.0.7/Pattern.java:3619)
        at java.util.regex.Matcher.search(java.base@11.0.7/Matcher.java:1729)
        at java.util.regex.Matcher.find(java.base@11.0.7/Matcher.java:746)
        at org.apache.lucene.analysis.pattern.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:71)
        at org.apache.lucene.analysis.miscellaneous.TrimFilter.incrementToken(TrimFilter.java:42)
        at org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:49)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:812)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:442)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:406)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
        at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:979)
        at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:345)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:292)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:239)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:76)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:259)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:489)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339)
        at org.apache.solr.update.processor.DistributedUpdateProcessor$$Lambda$631/0x0000000840670c40.apply(Unknown Source)
        at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
        - locked <0x00000000a7df5620> (a org.apache.solr.update.VersionBucket)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225)
        at org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245)
        at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
        at org.apache.solr.update.UpdateLog$LogReplayer.lambda$execute$1(UpdateLog.java:2025)
        at org.apache.solr.update.UpdateLog$LogReplayer$$Lambda$629/0x0000000840672c40.run(Unknown Source)
Reply | Threaded
Open this post in threaded view
|

RE: 8.5.1 LogReplayer extremely slow

Markus Jelsma-2
I found the bastard, it was a freaky document that skrewed Solr over, indexing kept failing, passing documents between replica's times out, documents get reindexed and so the document (and others) end up in the transaction log (many times) and are eligible for reindexing. Reindexing and replaying of the transaction log both fail on that specific document. Recovery was also not possible due to time outs.

Although the original document [1] is a mess, Solr should have no difficulties ingesting it [2]. Any ideas what is going on? Ticket, if so, about what exactly? For the record, this is PreAnalyzed.

Many thanks,
Markus

[1] https://pastebin.com/1NqBdYCM
[2] https://www.openindex.io/export/do_not_index.xml

-----Original message-----

> From:Markus Jelsma <[hidden email]>
> Sent: Monday 11th May 2020 18:43
> To: solr-user <[hidden email]>
> Subject: 8.5.1 LogReplayer extremely slow
>
> Hello,
>
> Our main Solr text search collection broke down last night (search was still working fine), every indexing action timed out with the Solr master spending most of its time in Java regex. One shard has only one replica left for queries and it stays like that. I have copied both shard's leader to local to see what is going on.
>
> One shard is fine but the other has a replica with has about 600MB of data to replay and it is extremely slow. Using the VisualVM sampler i find that the replayer is also spending almost all time in dealing with Java regex (stack trace below). Is this to be expected? And what is it actually doing? Where do the TokenFilters come from?
>
> I had a old but clean collection on the same cluster and started indexing to it to see what is going on but it too timed out due to Java regex. This is weird, because locally i have no problem indexing a million records in a 8.5.1 collection, and the broken down cluster has been running fine for over a month.
>
> A note, this index uses PreAnalyzedField, so i would expect no analysis or whatsoever, certainly no regex.
>
> Thanks,
> Markus
>
> "replayUpdatesExecutor-3-thread-1-processing-n:127.0.1.1:8983_solr x:sitesearch_shard2_replica_t2 c:sitesearch s:shard2 r:core_node4" #222 prio=5 os_prio=0 cpu=239207,44ms elapsed=239,50s tid=0x00007ffde0057000 nid=0x24f5 runnable  [0x00007ffeedd0f000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.regex.Pattern$GroupTail.match(java.base@11.0.7/Pattern.java:4863)
>         at java.util.regex.Pattern$CharPropertyGreedy.match(java.base@11.0.7/Pattern.java:4306)
>         at java.util.regex.Pattern$GroupHead.match(java.base@11.0.7/Pattern.java:4804)
>         at java.util.regex.Pattern$CharPropertyGreedy.match(java.base@11.0.7/Pattern.java:4306)
>         at java.util.regex.Pattern$Start.match(java.base@11.0.7/Pattern.java:3619)
>         at java.util.regex.Matcher.search(java.base@11.0.7/Matcher.java:1729)
>         at java.util.regex.Matcher.find(java.base@11.0.7/Matcher.java:746)
>         at org.apache.lucene.analysis.pattern.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:71)
>         at org.apache.lucene.analysis.miscellaneous.TrimFilter.incrementToken(TrimFilter.java:42)
>         at org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:49)
>         at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:812)
>         at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:442)
>         at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:406)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
>         at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:979)
>         at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:345)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:292)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:239)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:76)
>         at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:259)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:489)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor$$Lambda$631/0x0000000840670c40.apply(Unknown Source)
>         at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
>         - locked <0x00000000a7df5620> (a org.apache.solr.update.VersionBucket)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225)
>         at org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245)
>         at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
>         at org.apache.solr.update.UpdateLog$LogReplayer.lambda$execute$1(UpdateLog.java:2025)
>         at org.apache.solr.update.UpdateLog$LogReplayer$$Lambda$629/0x0000000840672c40.run(Unknown Source)
>
Reply | Threaded
Open this post in threaded view
|

Re: 8.5.1 LogReplayer extremely slow

ART GALLERY
check out the videos on this website TROO.TUBE don't be such a
sheep/zombie/loser/NPC. Much love!
https://troo.tube/videos/watch/aaa64864-52ee-4201-922f-41300032f219

On Tue, May 12, 2020 at 6:23 AM Markus Jelsma
<[hidden email]> wrote:

>
> I found the bastard, it was a freaky document that skrewed Solr over, indexing kept failing, passing documents between replica's times out, documents get reindexed and so the document (and others) end up in the transaction log (many times) and are eligible for reindexing. Reindexing and replaying of the transaction log both fail on that specific document. Recovery was also not possible due to time outs.
>
> Although the original document [1] is a mess, Solr should have no difficulties ingesting it [2]. Any ideas what is going on? Ticket, if so, about what exactly? For the record, this is PreAnalyzed.
>
> Many thanks,
> Markus
>
> [1] https://pastebin.com/1NqBdYCM
> [2] https://www.openindex.io/export/do_not_index.xml
>
> -----Original message-----
> > From:Markus Jelsma <[hidden email]>
> > Sent: Monday 11th May 2020 18:43
> > To: solr-user <[hidden email]>
> > Subject: 8.5.1 LogReplayer extremely slow
> >
> > Hello,
> >
> > Our main Solr text search collection broke down last night (search was still working fine), every indexing action timed out with the Solr master spending most of its time in Java regex. One shard has only one replica left for queries and it stays like that. I have copied both shard's leader to local to see what is going on.
> >
> > One shard is fine but the other has a replica with has about 600MB of data to replay and it is extremely slow. Using the VisualVM sampler i find that the replayer is also spending almost all time in dealing with Java regex (stack trace below). Is this to be expected? And what is it actually doing? Where do the TokenFilters come from?
> >
> > I had a old but clean collection on the same cluster and started indexing to it to see what is going on but it too timed out due to Java regex. This is weird, because locally i have no problem indexing a million records in a 8.5.1 collection, and the broken down cluster has been running fine for over a month.
> >
> > A note, this index uses PreAnalyzedField, so i would expect no analysis or whatsoever, certainly no regex.
> >
> > Thanks,
> > Markus
> >
> > "replayUpdatesExecutor-3-thread-1-processing-n:127.0.1.1:8983_solr x:sitesearch_shard2_replica_t2 c:sitesearch s:shard2 r:core_node4" #222 prio=5 os_prio=0 cpu=239207,44ms elapsed=239,50s tid=0x00007ffde0057000 nid=0x24f5 runnable  [0x00007ffeedd0f000]
> >    java.lang.Thread.State: RUNNABLE
> >         at java.util.regex.Pattern$GroupTail.match(java.base@11.0.7/Pattern.java:4863)
> >         at java.util.regex.Pattern$CharPropertyGreedy.match(java.base@11.0.7/Pattern.java:4306)
> >         at java.util.regex.Pattern$GroupHead.match(java.base@11.0.7/Pattern.java:4804)
> >         at java.util.regex.Pattern$CharPropertyGreedy.match(java.base@11.0.7/Pattern.java:4306)
> >         at java.util.regex.Pattern$Start.match(java.base@11.0.7/Pattern.java:3619)
> >         at java.util.regex.Matcher.search(java.base@11.0.7/Matcher.java:1729)
> >         at java.util.regex.Matcher.find(java.base@11.0.7/Matcher.java:746)
> >         at org.apache.lucene.analysis.pattern.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:71)
> >         at org.apache.lucene.analysis.miscellaneous.TrimFilter.incrementToken(TrimFilter.java:42)
> >         at org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:49)
> >         at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:812)
> >         at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:442)
> >         at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:406)
> >         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
> >         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495)
> >         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
> >         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
> >         at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:979)
> >         at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:345)
> >         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:292)
> >         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:239)
> >         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:76)
> >         at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> >         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:259)
> >         at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:489)
> >         at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339)
> >         at org.apache.solr.update.processor.DistributedUpdateProcessor$$Lambda$631/0x0000000840670c40.apply(Unknown Source)
> >         at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
> >         - locked <0x00000000a7df5620> (a org.apache.solr.update.VersionBucket)
> >         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339)
> >         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225)
> >         at org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245)
> >         at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
> >         at org.apache.solr.update.UpdateLog$LogReplayer.lambda$execute$1(UpdateLog.java:2025)
> >         at org.apache.solr.update.UpdateLog$LogReplayer$$Lambda$629/0x0000000840672c40.run(Unknown Source)
> >