The kengine container keeps restarting

Cluster configuration

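Run the following command several times. If the Up duration shown in the STATUS column for the kengine container does not match that of the other containers and keeps changing, the container is being restarted frequently. STATUS may also occasionally show Restarting.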

$ docker ps
CONTAINER ID   IMAGE                                                  COMMAND                  CREATED          STATUS          PORTS                                                 NAMES
2ffa9132b693   kompira.azurecr.io/kompira-enterprise:latest           "docker-entrypoint.s…"   20 seconds ago    Up 1 second                                                           ke2_kengine.1.omizpboyhn5tpx82xuco6m4xz
148289b7d3b8   registry.hub.docker.com/library/rabbitmq:3.13-alpine   "docker-entrypoint.s…"   17 minutes ago   Up 17 minutes   4369/tcp, 5671-5672/tcp, 15691-15692/tcp, 25672/tcp   ke2_rabbitmq.3.h89fr1zvis1mdzc1kfiqc98qc
ec0dd234fec1   registry.hub.docker.com/library/nginx:1.27-alpine      "/docker-entrypoint.…"   17 minutes ago   Up 17 minutes   80/tcp                                                ke2_nginx.3.7yk7iq7llghsa5npk47swh3dt
6803dc2b6181   kompira.azurecr.io/kompira-enterprise:latest           "docker-entrypoint.s…"   17 minutes ago   Up 17 minutes                                                         ke2_kompira.3.wlywfduedgt0joxbm9mpk8gz9
cbf49609d34f   kompira.azurecr.io/kompira-enterprise:latest           "docker-entrypoint.s…"   17 minutes ago   Up 17 minutes                                                         ke2_jobmngrd.3.w9im9yziam3blrtctm182j19q
99abcdd16419   registry.hub.docker.com/library/redis:7.2-alpine       "docker-entrypoint.s…"   17 minutes ago   Up 17 minutes   6379/tcp                                              ke2_redis.1.p65rvw4apkbvl8fga1ph04mwx
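The repeated check above can be scripted. The following is a minimal sketch, not part of the product: the function names `kengine_status` and `watch_kengine`, the sample count, and the interval are all hypothetical, and it assumes the kengine container runs on the node where the command is executed.

```shell
# Filter "NAME|STATUS" lines (read from stdin) down to kengine containers.
kengine_status() {
    awk -F '|' '$1 ~ /kengine/ { print $1 ": " $2 }'
}

# Sample the kengine status several times. An Up duration that keeps
# resetting, or a Restarting status, suggests the container is restarting.
watch_kengine() {
    count="${1:-5}"      # number of samples (hypothetical default)
    interval="${2:-10}"  # seconds between samples (hypothetical default)
    i=0
    while [ "$i" -lt "$count" ]; do
        docker ps --format '{{.Names}}|{{.Status}}' | kengine_status
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Calling `watch_kengine 6 10` would print the kengine status six times at ten-second intervals, which makes a resetting Up duration easy to spot.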

Checking /system/info shows the kengine status as unknown, and the result differs from engine to engine. Checking /.status, on the other hand, shows normal.

However, running a job flow results in an error.

In this situation, the kengine log may look like the following.

[2024-10-31 08:50:46,101:ke-ke2-rhel89-swarm-1:kompirad:MainThread] INFO: [Engine-5960] started.
[2024-10-31 08:50:49,114:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5947' from exchange ''
[2024-10-31 08:50:49,127:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5948' from exchange ''
[2024-10-31 08:50:59,105:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5948' from exchange ''
[2024-10-31 08:50:59,117:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5947' from exchange ''
[2024-10-31 08:51:09,106:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5948' from exchange ''
[2024-10-31 08:51:09,119:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5947' from exchange ''
[2024-10-31 08:51:19,106:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5948' from exchange ''
[2024-10-31 08:51:19,119:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5947' from exchange ''
[2024-10-31 08:51:29,106:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5948' from exchange ''
[2024-10-31 08:51:29,119:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5947' from exchange ''
[2024-10-31 08:51:39,105:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5948' from exchange ''
[2024-10-31 08:51:39,117:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5947' from exchange ''
[2024-10-31 08:51:49,105:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5948' from exchange ''
[2024-10-31 08:51:49,117:ke-ke2-rhel89-swarm-1:kompirad:Thread-2 (_loop)] WARNING: [EngineClient] failed to consuming: Message not delivered: NO_ROUTE (312) to queue 'engine_queue_5947' from exchange ''
[2024-10-31 08:51:59,106:ke-ke2-rhel89-swarm-1:kompirad:QueueManager] ERROR: [AMQPConnector] AMQP connection error: connection closed
[2024-10-31 08:51:59,126:ke-ke2-rhel89-swarm-1:Executor-0:MainThread] INFO: [Executor-0] received stop message
[2024-10-31 08:51:59,126:ke-ke2-rhel89-swarm-1:Executor-0:MainThread] INFO: [Executor-0] finally
[2024-10-31 08:51:59,330:ke-ke2-rhel89-swarm-1:Executor-0:ResultHandlerThread] INFO: result handler: finished
[2024-10-31 08:51:59,336:ke-ke2-rhel89-swarm-1:Executor-0:CommandManager] ERROR: [AMQPConnector] AMQP connection error: connection closed
[2024-10-31 08:51:59,497:ke-ke2-rhel89-swarm-1:Executor-1:MainThread] INFO: [Executor-1] received stop message
[2024-10-31 08:51:59,497:ke-ke2-rhel89-swarm-1:Executor-1:MainThread] INFO: [Executor-1] finally
[2024-10-31 08:51:59,702:ke-ke2-rhel89-swarm-1:Executor-1:ResultHandlerThread] INFO: result handler: finished
[2024-10-31 08:51:59,706:ke-ke2-rhel89-swarm-1:Executor-1:CommandManager] ERROR: [AMQPConnector] AMQP connection error: connection closed
[2024-10-31 08:51:59,879:ke-ke2-rhel89-swarm-1:Executor-2:MainThread] INFO: [Executor-2] received stop message
[2024-10-31 08:51:59,879:ke-ke2-rhel89-swarm-1:Executor-2:MainThread] INFO: [Executor-2] finally
[2024-10-31 08:52:00,083:ke-ke2-rhel89-swarm-1:Executor-2:ResultHandlerThread] INFO: result handler: finished
[2024-10-31 08:52:00,282:ke-ke2-rhel89-swarm-1:Executor-3:MainThread] INFO: [Executor-3] received stop message
[2024-10-31 08:52:00,282:ke-ke2-rhel89-swarm-1:Executor-3:MainThread] INFO: [Executor-3] finally
[2024-10-31 08:52:00,487:ke-ke2-rhel89-swarm-1:Executor-3:ResultHandlerThread] INFO: result handler: finished
[2024-10-31 08:52:00,501:ke-ke2-rhel89-swarm-1:Executor-3:CommandManager] ERROR: [AMQPConnector] AMQP connection error: connection closed
[2024-10-31 08:52:00,662:ke-ke2-rhel89-swarm-1:kompirad:MainThread] INFO: [Engine-5960] finished.
[2024-10-31 08:52:00,662:ke-ke2-rhel89-swarm-1:kompirad:MainThread] INFO: kompirad: going to terminate engine_server
[2024-10-31 08:52:00,662:ke-ke2-rhel89-swarm-1:kompirad:MainThread] INFO: kompirad: bye
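A long log like the one above is easier to assess when summarized. The sketch below counts the NO_ROUTE warnings per queue and the AMQP connection errors in log text read from standard input. The function name `summarize_kengine_log` is hypothetical, and the patterns are taken only from the sample log shown here, so other log formats may need different patterns.

```shell
# Summarize kengine log lines from stdin: NO_ROUTE warnings per queue
# and the total number of AMQP connection errors.
summarize_kengine_log() {
    awk -v sq="'" '
        /NO_ROUTE/ {
            # The queue name is the single-quoted token after "to queue".
            if (match($0, "to queue " sq "[^" sq "]+" sq)) {
                q = substr($0, RSTART + 10, RLENGTH - 11)
                no_route[q]++
            }
        }
        /AMQP connection error/ { amqp++ }
        END {
            for (q in no_route) print "NO_ROUTE " q " " no_route[q]
            print "AMQP_CONNECTION_ERROR " (amqp + 0)
        }
    ' | sort
}
```

Assuming the kengine log is written to the container's standard output, it could be fed in with `docker logs <kengine-container> 2>&1 | summarize_kengine_log`, where `<kengine-container>` is a placeholder for the actual container name or ID.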

This happens when the RabbitMQ cluster configuration has changed for some reason, leaving the entire RabbitMQ cluster unstable.
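One way to confirm the RabbitMQ side of the symptom is to check whether the queues named in the NO_ROUTE warnings actually exist. The sketch below compares given queue names against the output of `rabbitmqctl list_queues name` supplied on standard input. The function name `missing_queues` is hypothetical, and the check assumes the queues live in the default virtual host, which may not match your deployment.

```shell
# Report which of the queue names given as arguments do not appear
# in `rabbitmqctl list_queues name` output read from stdin.
missing_queues() {
    listed=$(cat)
    for q in "$@"; do
        # Match the whole line exactly so the header row is ignored.
        printf '%s\n' "$listed" | grep -qxF -- "$q" || echo "missing: $q"
    done
}
```

For example, `docker exec <rabbitmq-container> rabbitmqctl -q list_queues name | missing_queues engine_queue_5947 engine_queue_5948` would list any of the two queues that RabbitMQ does not know about, and `docker exec <rabbitmq-container> rabbitmqctl cluster_status` shows the current cluster membership. `<rabbitmq-container>` is a placeholder for the actual container name or ID.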

Follow the RabbitMQ-related steps below.