Can anybody give some suggestions about this Elasticsearch shard failure problem? Thanks


喜之郎
Elasticsearch version (bin/elasticsearch --version): 5.1.1

Plugins installed: [] no

JVM version (java -version): 1.8.0_77

OS version (uname -a if on a Unix-like system): CentOS Linux release 7.2.1511 (Core)

Description of the problem including expected versus actual behavior:
When using the update API under high concurrency, both the primary shard and the replica shards failed.
This has happened many times on 2 machines, so I think this is a bug.

Provide logs (if relevant):

[2018-04-27T12:02:22,797][WARN ][o.e.c.a.s.ShardStateAction] [172.20.3.2] [analytics_profile_12014][7] received shard failed for shard id [[analytics_profile_12014][7]], allocation id [xIEoF3JaTLWQz6X2KxMWRA], primary term [0], message [shard failure, reason [refresh failed]], failure [EOFException[read past EOF: MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/7/index/_fqb1.cfe")]]
java.io.EOFException: read past EOF: MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/7/index/_fqb1.cfe")
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/7/index/_fqb1.cfe")))
org.elasticsearch.action.FailedNodeException: Failed node [BbfFMNRpRvW5p8LDs3rquQ]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:219) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:984) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.TcpTransport.lambda$handleException$17(TcpTransport.java:1314) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1312) [elasticsearch-5.1.1.jar:5.1.1]
Caused by: org.elasticsearch.transport.RemoteTransportException: [172.20.3.2_1][172.20.3.2:9301][internal:cluster/nodes/indices/shard/store[n]]
Caused by: org.elasticsearch.ElasticsearchException: Failed to list store metadata for shard [[analytics_action_12014_201804][15]]
Caused by: org.apache.lucene.index.CorruptIndexException: failed engine (reason: [corrupt file (source: [index])]) (resource=preexisting_corruption)
Caused by: java.io.IOException: failed engine (reason: [corrupt file (source: [index])])
Caused by: org.apache.lucene.index.CorruptIndexException: compound sub-files must have a valid codec header and footer: file is too small (0 bytes) (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/esdata1/nodes/0/indices/cBECbko7SMKP3oXsTGi_kg/15/index/_2kqi.fdx")))
[2018-04-27T12:02:22,800][WARN ][o.e.c.a.s.ShardStateAction] [172.20.3.2] [analytics_profile_12014][18] received shard failed for shard id [[analytics_profile_12014][18]], allocation id [7TieFxLRRZ-28uOsPFr1yQ], primary term [0], message [shard failure, reason [refresh failed]], failure [EOFException[read past EOF: MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/18/index/_fhe2.cfe")]]
java.io.EOFException: read past EOF: MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/18/index/_fhe2.cfe")
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/18/index/_fhe2.cfe")))
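
To see at a glance which index files the entries above point at, the paths can be pulled out of the log text. This is only a minimal sketch: the function name `corrupt_files` is hypothetical, and the regular expression assumes the `MMapIndexInput(path="...")` form exactly as it is printed in these log lines.

```python
import re
from collections import Counter

# Matches the path="..." argument as printed for MMapIndexInput in the log above.
PATH_RE = re.compile(r'MMapIndexInput\(path="([^"]+)"\)')


def corrupt_files(log_text):
    """Return a Counter mapping each reported file path to how often it is mentioned."""
    return Counter(PATH_RE.findall(log_text))


# Small demo on two lines copied from the log above.
sample = (
    'java.io.EOFException: read past EOF: MMapIndexInput(path="/mnt/esdata2/nodes/0/'
    'indices/fzJHAFdQQQO5zPL70D2b6g/7/index/_fqb1.cfe")\n'
    'file is too small (0 bytes) (resource=BufferedChecksumIndexInput(MMapIndexInput('
    'path="/mnt/esdata1/nodes/0/indices/cBECbko7SMKP3oXsTGi_kg/15/index/_2kqi.fdx")))\n'
)
for path, mentions in sorted(corrupt_files(sample).items()):
    print(mentions, path)
```

Running it over the full node log would list every file that was reported as truncated or corrupt, which makes it easier to compare sizes and modification times on disk.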

Re: Can anybody give some suggestions about this Elasticsearch shard failure problem? Thanks

喜之郎
The Lucene version is 6.3.0.
The filesystem is xfs.
This always happens at 00:02, 06:02, 12:02, and 18:02,
which is very strange.
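
One way to confirm the schedule is to bucket the "received shard failed" log entries by hour and minute. The following is a rough sketch, not a finished tool: `failure_minutes` is a hypothetical helper, and it assumes every relevant line starts with a timestamp in the `[2018-04-27T12:02:22,797]` form seen in the log.

```python
import re
from collections import Counter
from datetime import datetime

# Leading timestamp of an Elasticsearch log line, e.g. [2018-04-27T12:02:22,797]
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}),\d+\]")


def failure_minutes(lines):
    """Count 'received shard failed' entries per HH:MM of the day."""
    counts = Counter()
    for line in lines:
        if "received shard failed" not in line:
            continue
        match = TS_RE.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
            counts[ts.strftime("%H:%M")] += 1
    return counts


# Demo on shortened copies of the two WARN lines from the log.
demo = [
    "[2018-04-27T12:02:22,797][WARN ][o.e.c.a.s.ShardStateAction] [172.20.3.2] "
    "[analytics_profile_12014][7] received shard failed for shard id",
    "[2018-04-27T12:02:22,800][WARN ][o.e.c.a.s.ShardStateAction] [172.20.3.2] "
    "[analytics_profile_12014][18] received shard failed for shard id",
]
for minute, n in sorted(failure_minutes(demo).items()):
    print(minute, n)
```

If the counts over several days cluster on the same four minutes, the trigger is much more likely to be something scheduled on the host than the indexing load itself.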




------------------ Original message ------------------
From: "251922566"<[hidden email]>;
Sent: Tuesday, June 5, 2018, 6:50 PM
To: "java-user"<[hidden email]>;

Subject: Can anybody give some suggestions about this Elasticsearch shard failure problem? Thanks





Re: Can anybody give some suggestions about this Elasticsearch shard failure problem? Thanks

Adrien Grand
The fact that it happens when writing compound files is very suspicious
since there should be little time between when the original files are
written and when they are merged into a compound file. Is it a remote
filesystem? Do you have cron jobs that run every 6 hours?
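
The remote-filesystem question can be checked from the mount table. Below is a minimal sketch that looks up the filesystem type backing a data path in text formatted like Linux `/proc/mounts`. The helper name `fs_type_for`, the `REMOTE_FS_TYPES` set (which is not exhaustive), and the sample mount table are all assumptions made for illustration; they are not taken from the reporter's machines.

```python
# Filesystem types that usually indicate network storage (assumed, not exhaustive).
REMOTE_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "glusterfs", "ceph"}


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    mounts_text uses the /proc/mounts layout: device, mount point, type, options...
    Mount points with escaped characters (such as \\040 for a space) are not decoded.
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


# Hypothetical mount table, only to show the lookup.
sample_mounts = (
    "rootfs / rootfs rw 0 0\n"
    "/dev/sdb1 /mnt/esdata1 xfs rw,noatime 0 0\n"
    "filer:/export /mnt/esdata2 nfs4 rw 0 0\n"
)
mount_point, fs_type = fs_type_for("/mnt/esdata2/nodes/0", sample_mounts)
print(mount_point, fs_type, "remote" if fs_type in REMOTE_FS_TYPES else "local")
```

On the affected hosts the same function could be fed the real table, for example `fs_type_for("/mnt/esdata2", open("/proc/mounts").read())`. The cron question is separate and would still need a look at the system and user crontabs for anything scheduled every 6 hours.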

On Tue, Jun 5, 2018 at 16:43, 喜之郎 <[hidden email]> wrote:

> The Lucene version is 6.3.0.
> The filesystem is xfs.
> This always happens at 00:02, 06:02, 12:02, and 18:02,
> which is very strange.