Investigating RabbitMQ logs in a cluster configuration

INS-CR1: The RabbitMQ Mnesia database is corrupted

This situation can arise after a prolonged network partition or split-brain. When you check the RabbitMQ logs, you may see entries like those under "Sample log entries" below.

Key error messages

  • Mnesia function failed 99 times. Possibly an infinite retry loop; trying one last time
  • reason: reached_max_restart_intensity

Sample log entries

Node: ke2-rhel89-swarm-3

2024-09-05 17:57:59.080590+09:00 [info] <0.34211.0> accepting AMQP connection <0.34211.0> (10.0.2.125:56332 -> 10.0.2.54:5672)
2024-09-05 17:57:59.080977+09:00 [warning] <0.34211.0> Mnesia->Khepri fallback handling: Mnesia function failed 99 times. Possibly an infinite retry loop; trying one last time
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>   crasher:
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     initial call: rabbit_reader:init/3
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     pid: <0.34211.0>
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     registered_name: []
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     exception exit: {aborted,
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>                         {no_exists,[rabbit_runtime_parameters,cluster_name]}}
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>       in function  mnesia:abort/1 (mnesia.erl, line 362)
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>       in call from rabbit_db_rtparams:get_in_mnesia/1 (rabbit_db_rtparams.erl, line 148)
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>       in call from rabbit_runtime_parameters:lookup0/2 (rabbit_runtime_parameters.erl, line 362)
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>       in call from rabbit_runtime_parameters:value0/1 (rabbit_runtime_parameters.erl, line 356)
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>       in call from rabbit_nodes:cluster_name/0 (rabbit_nodes.erl, line 104)
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>       in call from rabbit_reader:server_properties/1 (rabbit_reader.erl, line 240)
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>       in call from rabbit_reader:start_connection/3 (rabbit_reader.erl, line 1131)
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>       in call from rabbit_reader:handle_input/3 (rabbit_reader.erl, line 1081)
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     ancestors: [<0.34209.0>,<0.710.0>,<0.709.0>,<0.708.0>,<0.706.0>,
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>                   <0.705.0>,rabbit_sup,<0.254.0>]
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     message_queue_len: 1
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     messages: [{'EXIT',#Port<0.1428>,normal}]
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     links: [<0.34209.0>]
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     dictionary: [{process_name,
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>                       {rabbit_reader,
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>                           <<"10.0.2.125:56332 -> 10.0.2.54:5672">>}}]
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     trap_exit: true
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     status: running
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     heap_size: 2586
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     stack_size: 28
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>     reductions: 10565
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0>   neighbours:
2024-09-05 17:57:59.081138+09:00 [error] <0.34211.0> 
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>     supervisor: {<0.34209.0>,rabbit_connection_sup}
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>     errorContext: child_terminated
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>     reason: {aborted,{no_exists,[rabbit_runtime_parameters,cluster_name]}}
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>     offender: [{pid,<0.34211.0>},
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>                {id,reader},
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>                {mfargs,{rabbit_reader,start_link,
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>                                       [<0.34210.0>,
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>                                        {acceptor,{0,0,0,0,0,0,0,0},5672}]}},
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>                {restart_type,transient},
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>                {significant,true},
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>                {shutdown,300000},
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0>                {child_type,worker}]
2024-09-05 17:57:59.081622+09:00 [error] <0.34209.0> 
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>     supervisor: {<0.34209.0>,rabbit_connection_sup}
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>     errorContext: shutdown
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>     reason: reached_max_restart_intensity
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>     offender: [{pid,<0.34211.0>},
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>                {id,reader},
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>                {mfargs,{rabbit_reader,start_link,
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>                                       [<0.34210.0>,
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>                                        {acceptor,{0,0,0,0,0,0,0,0},5672}]}},
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>                {restart_type,transient},
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>                {significant,true},
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>                {shutdown,300000},
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0>                {child_type,worker}]
2024-09-05 17:57:59.081788+09:00 [error] <0.34209.0> 
2024-09-05 17:58:36.651258+09:00 [info] <0.34270.0> accepting AMQP connection <0.34270.0> (10.0.2.132:49752 -> 10.0.2.54:5672)
2024-09-05 17:58:36.651753+09:00 [warning] <0.34270.0> Mnesia->Khepri fallback handling: Mnesia function failed 99 times. Possibly an infinite retry loop; trying one last time
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>   crasher:
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     initial call: rabbit_reader:init/3
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     pid: <0.34270.0>
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     registered_name: []
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     exception exit: {aborted,
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>                         {no_exists,[rabbit_runtime_parameters,cluster_name]}}
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>       in function  mnesia:abort/1 (mnesia.erl, line 362)
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>       in call from rabbit_db_rtparams:get_in_mnesia/1 (rabbit_db_rtparams.erl, line 148)
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>       in call from rabbit_runtime_parameters:lookup0/2 (rabbit_runtime_parameters.erl, line 362)
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>       in call from rabbit_runtime_parameters:value0/1 (rabbit_runtime_parameters.erl, line 356)
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>       in call from rabbit_nodes:cluster_name/0 (rabbit_nodes.erl, line 104)
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>       in call from rabbit_reader:server_properties/1 (rabbit_reader.erl, line 240)
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>       in call from rabbit_reader:start_connection/3 (rabbit_reader.erl, line 1131)
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>       in call from rabbit_reader:handle_input/3 (rabbit_reader.erl, line 1081)
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     ancestors: [<0.34268.0>,<0.710.0>,<0.709.0>,<0.708.0>,<0.706.0>,
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>                   <0.705.0>,rabbit_sup,<0.254.0>]
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     message_queue_len: 1
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     messages: [{'EXIT',#Port<0.1430>,normal}]
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     links: [<0.34268.0>]
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     dictionary: [{process_name,
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>                       {rabbit_reader,
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>                           <<"10.0.2.132:49752 -> 10.0.2.54:5672">>}}]
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     trap_exit: true
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     status: running
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     heap_size: 2586
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     stack_size: 28
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>     reductions: 10561
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0>   neighbours:
2024-09-05 17:58:36.652032+09:00 [error] <0.34270.0> 
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>     supervisor: {<0.34268.0>,rabbit_connection_sup}
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>     errorContext: child_terminated
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>     reason: {aborted,{no_exists,[rabbit_runtime_parameters,cluster_name]}}
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>     offender: [{pid,<0.34270.0>},
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>                {id,reader},
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>                {mfargs,{rabbit_reader,start_link,
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>                                       [<0.34269.0>,
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>                                        {acceptor,{0,0,0,0,0,0,0,0},5672}]}},
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>                {restart_type,transient},
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>                {significant,true},
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>                {shutdown,300000},
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0>                {child_type,worker}]
2024-09-05 17:58:36.652688+09:00 [error] <0.34268.0> 
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>     supervisor: {<0.34268.0>,rabbit_connection_sup}
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>     errorContext: shutdown
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>     reason: reached_max_restart_intensity
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>     offender: [{pid,<0.34270.0>},
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>                {id,reader},
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>                {mfargs,{rabbit_reader,start_link,
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>                                       [<0.34269.0>,
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>                                        {acceptor,{0,0,0,0,0,0,0,0},5672}]}},
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>                {restart_type,transient},
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>                {significant,true},
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>                {shutdown,300000},
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0>                {child_type,worker}]
2024-09-05 17:58:36.652820+09:00 [error] <0.34268.0> 
2024-09-05 17:58:37.987335+09:00 [info] <0.34280.0> accepting AMQP connection <0.34280.0> (10.0.2.131:36582 -> 10.0.2.54:5672)
2024-09-05 17:58:37.987755+09:00 [warning] <0.34280.0> Mnesia->Khepri fallback handling: Mnesia function failed 99 times. Possibly an infinite retry loop; trying one last time

Solution

  1. Check the RabbitMQ cluster status

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl cluster_status
    
    Basics
    
    Cluster name: rabbit@mq-ke2-rhel89-swarm-1
    Total CPU cores available cluster-wide: 12
    
    Disk Nodes
    
    rabbit@mq-ke2-rhel89-swarm-1
    rabbit@mq-ke2-rhel89-swarm-2
    rabbit@mq-ke2-rhel89-swarm-3
    
    Running Nodes
    
    rabbit@mq-ke2-rhel89-swarm-1
    rabbit@mq-ke2-rhel89-swarm-2
    rabbit@mq-ke2-rhel89-swarm-3
    
    Versions
    
    rabbit@mq-ke2-rhel89-swarm-1: RabbitMQ 3.13.7 on Erlang 26.2.5.5
    rabbit@mq-ke2-rhel89-swarm-2: RabbitMQ 3.13.7 on Erlang 26.2.5.5
    rabbit@mq-ke2-rhel89-swarm-3: RabbitMQ 3.13.7 on Erlang 26.2.5.5
    
    CPU Cores
    
    Node: rabbit@mq-ke2-rhel89-swarm-1, available CPU cores: 4
    Node: rabbit@mq-ke2-rhel89-swarm-2, available CPU cores: 4
    Node: rabbit@mq-ke2-rhel89-swarm-3, available CPU cores: 4
    
    Maintenance status
    
    Node: rabbit@mq-ke2-rhel89-swarm-1, status: not under maintenance
    Node: rabbit@mq-ke2-rhel89-swarm-2, status: not under maintenance
    Node: rabbit@mq-ke2-rhel89-swarm-3, status: not under maintenance
    
    Alarms
    
    (none)
    
    Network Partitions
    
    (none)
    
    Listeners
    
    Node: rabbit@mq-ke2-rhel89-swarm-1, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
    Node: rabbit@mq-ke2-rhel89-swarm-1, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
    Node: rabbit@mq-ke2-rhel89-swarm-1, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
    Node: rabbit@mq-ke2-rhel89-swarm-1, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
    Node: rabbit@mq-ke2-rhel89-swarm-2, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
    Node: rabbit@mq-ke2-rhel89-swarm-2, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
    Node: rabbit@mq-ke2-rhel89-swarm-2, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
    Node: rabbit@mq-ke2-rhel89-swarm-2, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
    Node: rabbit@mq-ke2-rhel89-swarm-3, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
    Node: rabbit@mq-ke2-rhel89-swarm-3, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
    Node: rabbit@mq-ke2-rhel89-swarm-3, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
    Node: rabbit@mq-ke2-rhel89-swarm-3, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
    
    Feature flags
    
    Flag: classic_mirrored_queue_version, state: enabled
    Flag: classic_queue_type_delivery_support, state: enabled
    Flag: direct_exchange_routing_v2, state: enabled
    Flag: drop_unroutable_metric, state: enabled
    Flag: empty_basic_get_metric, state: enabled
    Flag: feature_flags_v2, state: enabled
    Flag: implicit_default_bindings, state: enabled
    Flag: khepri_db, state: disabled
    Flag: listener_records_in_ets, state: enabled
    Flag: maintenance_mode_status, state: enabled
    Flag: message_containers, state: enabled
    Flag: message_containers_deaths_v2, state: enabled
    Flag: quorum_queue, state: enabled
    Flag: quorum_queue_non_voters, state: enabled
    Flag: restart_streams, state: enabled
    Flag: stream_filtering, state: enabled
    Flag: stream_queue, state: enabled
    Flag: stream_sac_coordinator_unblock_group, state: enabled
    Flag: stream_single_active_consumer, state: enabled
    Flag: stream_update_config_command, state: enabled
    Flag: tracking_records_in_ets, state: enabled
    Flag: user_limits, state: enabled
    Flag: virtual_host_metadata, state: enabled
    

    In the output above, confirm that a Cluster name is present and that all three nodes are listed under Running Nodes.

    If they are not, check inter-node communication.

    Verify network connectivity between the nodes, as in the sketch below.
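
    A minimal connectivity sketch, assuming the node hostnames from the examples above and RabbitMQ's default inter-node ports (4369 for epmd, 25672 for clustering):

    # Run from each host against each of the other nodes
    $ ping -c 3 ke2-rhel89-swarm-2
    $ nc -zv ke2-rhel89-swarm-2 4369
    $ nc -zv ke2-rhel89-swarm-2 25672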

  2. Reset RabbitMQ.

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl stop_app
    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl reset
    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl start_app
    
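    If the reset node does not rejoin the cluster automatically, it can be clustered back in manually. A sketch of the standard sequence; substitute the name of a surviving node for rabbit@mq-ke2-rhel89-swarm-1:

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl stop_app
    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl reset
    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl join_cluster rabbit@mq-ke2-rhel89-swarm-1
    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl start_app
    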
  3. If the node does not return to normal, restart the rabbitmq container

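    # Obtain the container ID first: docker ps -q -f name=rabbitmq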
    $ docker restart <rabbitmq container ID>
    
  4. If the problem cannot be resolved, you may also need to delete the RabbitMQ volume on each node.

    Delete the RabbitMQ volume (a sketch follows).
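
    A minimal sketch, assuming the volume name contains "rabbitmq" (confirm the actual name with docker volume ls first; in a Swarm deployment the service may need to be scaled down before the container will stay stopped):

    $ docker stop $(docker ps -q -f name=rabbit)
    $ docker volume ls -f name=rabbitmq
    $ docker volume rm <rabbitmq volume name>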

INS-CR2: The RabbitMQ log records the error "Partial partition detected"

This occurs when a partial network partition is detected within the RabbitMQ cluster. It means that a network fault, or a loss of communication between some of the nodes, has caused a split-brain condition and the cluster has lost consistency. When you check the RabbitMQ logs, you may see entries like those under "Sample log entries" below.

Sample log entries

Node: ke2-rhel89-swarm-1

2024-09-10 11:59:17.509560+09:00 [error] <0.514.0> Partial partition detected:
2024-09-10 11:59:17.509560+09:00 [error] <0.514.0>  * We saw DOWN from rabbit@mq-ke2-rhel89-swarm-2
2024-09-10 11:59:17.509560+09:00 [error] <0.514.0>  * We can still see rabbit@mq-ke2-rhel89-swarm-3 which can see rabbit@mq-ke2-rhel89-swarm-2
2024-09-10 11:59:17.509560+09:00 [error] <0.514.0>  * pause_minority mode enabled
2024-09-10 11:59:17.509560+09:00 [error] <0.514.0> We will therefore pause until the *entire* cluster recovers
2024-09-10 11:59:17.509619+09:00 [warning] <0.514.0> Cluster minority/secondary status detected - awaiting recovery

Solution

  1. Check inter-node communication:

    Verify network connectivity between the nodes. A quick per-node probe is sketched below.
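
    A minimal sketch using rabbitmq-diagnostics; run it on each of the three hosts:

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmq-diagnostics -q ping
    $ docker exec $(docker ps -q -f name=rabbit) rabbitmq-diagnostics -q check_port_connectivity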

  2. RabbitMQ may have stopped because of the network partition.

    Check the RabbitMQ status as follows.

    If it is running, output like the following is displayed:

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmq-diagnostics -q check_running
    
    RabbitMQ on node rabbit@mq-<hostname> is fully booted and running
    # ex: RabbitMQ on node rabbit@mq-ke2-rhel89-swarm-1 is fully booted and running
    

    If it is stopped, the following is displayed:

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmq-diagnostics -q check_running
    
    Error:
    RabbitMQ on node rabbit@mq-ke2-rhel89-swarm-1 is not running or has not fully booted yet (check with is_booting)
    
  3. If the network partition has been resolved but RabbitMQ on a minority node is still paused, start RabbitMQ on the node that was paused.

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl start_app
    

    If it still does not return to normal after starting, restart the rabbitmq container.

    # rabbitmq container ID: run docker ps -q -f name=rabbitmq to obtain the container ID
    $ docker restart <rabbitmq container ID>
    

    If the problem persists, restart Docker:

    $ systemctl restart docker.service
    
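    After the restart, confirm that Docker and the rabbitmq container came back up (a sketch):

    $ systemctl is-active docker.service
    $ docker ps -f name=rabbit
    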

INS-CR3: The RabbitMQ log records the error "Waiting for Mnesia tables"

If the network is interrupted for more than 30 seconds, or a remote RabbitMQ node cannot be reached for more than 30 seconds, the RabbitMQ log on the affected node may show entries like those under "Sample log entries" below.

Key error messages

  • Waiting for Mnesia tables for 30000 ms, 6 retries left
  • Error while waiting for Mnesia tables

Sample log entries

2024-09-25 11:56:47.952788+09:00 [warning] <0.254.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,
2024-09-25 11:56:47.952788+09:00 [warning] <0.254.0>                                         ['rabbit@mq-ke2-rhel89-swarm-3',
2024-09-25 11:56:47.952788+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-2',
2024-09-25 11:56:47.952788+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-1'],
2024-09-25 11:56:47.952788+09:00 [warning] <0.254.0>                                         [rabbit_user,rabbit_user_permission,
2024-09-25 11:56:47.952788+09:00 [warning] <0.254.0>                                          rabbit_runtime_parameters,
2024-09-25 11:56:47.952788+09:00 [warning] <0.254.0>                                          rabbit_topic_permission,rabbit_vhost,
2024-09-25 11:56:47.952788+09:00 [warning] <0.254.0>                                          rabbit_durable_queue,
2024-09-25 11:56:47.952788+09:00 [warning] <0.254.0>                                          rabbit_durable_exchange,
2024-09-25 11:56:47.952788+09:00 [warning] <0.254.0>                                          rabbit_durable_route]}
2024-09-25 11:56:47.952925+09:00 [info] <0.254.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2024-09-25 11:57:17.953785+09:00 [warning] <0.254.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,
2024-09-25 11:57:17.953785+09:00 [warning] <0.254.0>                                         ['rabbit@mq-ke2-rhel89-swarm-3',
2024-09-25 11:57:17.953785+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-2',
2024-09-25 11:57:17.953785+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-1'],
2024-09-25 11:57:17.953785+09:00 [warning] <0.254.0>                                         [rabbit_user,rabbit_user_permission,
2024-09-25 11:57:17.953785+09:00 [warning] <0.254.0>                                          rabbit_runtime_parameters,
2024-09-25 11:57:17.953785+09:00 [warning] <0.254.0>                                          rabbit_topic_permission,rabbit_vhost,
2024-09-25 11:57:17.953785+09:00 [warning] <0.254.0>                                          rabbit_durable_queue,
2024-09-25 11:57:17.953785+09:00 [warning] <0.254.0>                                          rabbit_durable_exchange,
2024-09-25 11:57:17.953785+09:00 [warning] <0.254.0>                                          rabbit_durable_route]}
2024-09-25 11:57:17.953907+09:00 [info] <0.254.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2024-09-25 11:57:47.954844+09:00 [warning] <0.254.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,
2024-09-25 11:57:47.954844+09:00 [warning] <0.254.0>                                         ['rabbit@mq-ke2-rhel89-swarm-3',
2024-09-25 11:57:47.954844+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-2',
2024-09-25 11:57:47.954844+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-1'],
2024-09-25 11:57:47.954844+09:00 [warning] <0.254.0>                                         [rabbit_user,rabbit_user_permission,
2024-09-25 11:57:47.954844+09:00 [warning] <0.254.0>                                          rabbit_runtime_parameters,
2024-09-25 11:57:47.954844+09:00 [warning] <0.254.0>                                          rabbit_topic_permission,rabbit_vhost,
2024-09-25 11:57:47.954844+09:00 [warning] <0.254.0>                                          rabbit_durable_queue,
2024-09-25 11:57:47.954844+09:00 [warning] <0.254.0>                                          rabbit_durable_exchange,
2024-09-25 11:57:47.954844+09:00 [warning] <0.254.0>                                          rabbit_durable_route]}
2024-09-25 11:57:47.955042+09:00 [info] <0.254.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2024-09-25 11:58:17.955866+09:00 [warning] <0.254.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,
2024-09-25 11:58:17.955866+09:00 [warning] <0.254.0>                                         ['rabbit@mq-ke2-rhel89-swarm-3',
2024-09-25 11:58:17.955866+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-2',
2024-09-25 11:58:17.955866+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-1'],
2024-09-25 11:58:17.955866+09:00 [warning] <0.254.0>                                         [rabbit_user,rabbit_user_permission,
2024-09-25 11:58:17.955866+09:00 [warning] <0.254.0>                                          rabbit_runtime_parameters,
2024-09-25 11:58:17.955866+09:00 [warning] <0.254.0>                                          rabbit_topic_permission,rabbit_vhost,
2024-09-25 11:58:17.955866+09:00 [warning] <0.254.0>                                          rabbit_durable_queue,
2024-09-25 11:58:17.955866+09:00 [warning] <0.254.0>                                          rabbit_durable_exchange,
2024-09-25 11:58:17.955866+09:00 [warning] <0.254.0>                                          rabbit_durable_route]}
2024-09-25 11:58:17.955997+09:00 [info] <0.254.0> Waiting for Mnesia tables for 30000 ms, 5 retries left
2024-09-25 11:58:47.956902+09:00 [warning] <0.254.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,
2024-09-25 11:58:47.956902+09:00 [warning] <0.254.0>                                         ['rabbit@mq-ke2-rhel89-swarm-3',
2024-09-25 11:58:47.956902+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-2',
2024-09-25 11:58:47.956902+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-1'],
2024-09-25 11:58:47.956902+09:00 [warning] <0.254.0>                                         [rabbit_user,rabbit_user_permission,
2024-09-25 11:58:47.956902+09:00 [warning] <0.254.0>                                          rabbit_runtime_parameters,
2024-09-25 11:58:47.956902+09:00 [warning] <0.254.0>                                          rabbit_topic_permission,rabbit_vhost,
2024-09-25 11:58:47.956902+09:00 [warning] <0.254.0>                                          rabbit_durable_queue,
2024-09-25 11:58:47.956902+09:00 [warning] <0.254.0>                                          rabbit_durable_exchange,
2024-09-25 11:58:47.956902+09:00 [warning] <0.254.0>                                          rabbit_durable_route]}
2024-09-25 11:58:47.957045+09:00 [info] <0.254.0> Waiting for Mnesia tables for 30000 ms, 4 retries left
2024-09-25 11:59:17.957855+09:00 [warning] <0.254.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,
2024-09-25 11:59:17.957855+09:00 [warning] <0.254.0>                                         ['rabbit@mq-ke2-rhel89-swarm-3',
2024-09-25 11:59:17.957855+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-2',
2024-09-25 11:59:17.957855+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-1'],
2024-09-25 11:59:17.957855+09:00 [warning] <0.254.0>                                         [rabbit_user,rabbit_user_permission,
2024-09-25 11:59:17.957855+09:00 [warning] <0.254.0>                                          rabbit_runtime_parameters,
2024-09-25 11:59:17.957855+09:00 [warning] <0.254.0>                                          rabbit_topic_permission,rabbit_vhost,
2024-09-25 11:59:17.957855+09:00 [warning] <0.254.0>                                          rabbit_durable_queue,
2024-09-25 11:59:17.957855+09:00 [warning] <0.254.0>                                          rabbit_durable_exchange,
2024-09-25 11:59:17.957855+09:00 [warning] <0.254.0>                                          rabbit_durable_route]}
2024-09-25 11:59:17.958001+09:00 [info] <0.254.0> Waiting for Mnesia tables for 30000 ms, 3 retries left
2024-09-25 11:59:47.958928+09:00 [warning] <0.254.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,
2024-09-25 11:59:47.958928+09:00 [warning] <0.254.0>                                         ['rabbit@mq-ke2-rhel89-swarm-3',
2024-09-25 11:59:47.958928+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-2',
2024-09-25 11:59:47.958928+09:00 [warning] <0.254.0>                                          'rabbit@mq-ke2-rhel89-swarm-1'],
2024-09-25 11:59:47.958928+09:00 [warning] <0.254.0>                                         [rabbit_user,rabbit_user_permission,
2024-09-25 11:59:47.958928+09:00 [warning] <0.254.0>                                          rabbit_runtime_parameters,
2024-09-25 11:59:47.958928+09:00 [warning] <0.254.0>                                          rabbit_topic_permission,rabbit_vhost,
2024-09-25 11:59:47.958928+09:00 [warning] <0.254.0>                                          rabbit_durable_queue,
2024-09-25 11:59:47.958928+09:00 [warning] <0.254.0>                                          rabbit_durable_exchange,
2024-09-25 11:59:47.958928+09:00 [warning] <0.254.0>                                          rabbit_durable_route]}
2024-09-25 11:59:47.959111+09:00 [info] <0.254.0> Waiting for Mnesia tables for 30000 ms, 2 retries left

Solution

Even if inter-node communication has a temporary problem, RabbitMQ retries internally and may recover on its own.

RabbitMQ makes about 9 retry attempts (9 × 30 seconds ≈ 4.5 minutes) and then recovers; in practice, automatic recovery is common. If it still does not recover, proceed with the steps below.
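
While waiting, the retry countdown can be followed in the container log (a sketch, assuming the container name contains "rabbit"):

    $ docker logs -f $(docker ps -q -f name=rabbit) 2>&1 | grep "Waiting for Mnesia tables"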

  1. Check inter-node communication:

    Verify network connectivity between the nodes, as sketched below.
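
    Since the cluster runs under Docker Swarm, the Swarm node and overlay-network state are also worth a look (a sketch; run on a manager node):

    $ docker node ls
    $ docker network ls -f driver=overlay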

  2. RabbitMQ may have stopped because of the network problem.

    Check the RabbitMQ status as follows.

    If it is running, output like the following is displayed:

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmq-diagnostics -q check_running
    
    RabbitMQ on node rabbit@mq-<hostname> is fully booted and running
    # ex: RabbitMQ on node rabbit@mq-ke2-rhel89-swarm-1 is fully booted and running
    

    If it is stopped, the following is displayed:

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmq-diagnostics -q check_running
    
    Error:
    RabbitMQ on node rabbit@mq-ke2-rhel89-swarm-1 is not running or has not fully booted yet (check with is_booting)
    
  3. If the network problem has been resolved but RabbitMQ on a minority node is still stopped, start RabbitMQ on the node that stopped.

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl start_app
    

    If it still does not return to normal after starting, restart the rabbitmq container.

    # rabbitmq container ID: run docker ps -q -f name=rabbitmq to obtain the container ID
    $ docker restart <rabbitmq container ID>
    

    If RabbitMQ still looks abnormal on screens such as ./status or /system/info, restart RabbitMQ on all nodes:

    $ docker service update --force ke2_rabbitmq
    
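    The rolling restart can be monitored until all replicas are back up (a sketch):

    $ docker service ps ke2_rabbitmq
    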

INS-CR4: The RabbitMQ log records an "erl_crash" error

When you check the RabbitMQ log, errors like the following may be recorded.

Sample log entries

Node: ke2-rhel89-swarm-1

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
Enabling plugins on node rabbit@mq-ke2-rhel89-swarm-1:
rabbitmq_auth_mechanism_ssl
The following plugins have been configured:
  rabbitmq_auth_mechanism_ssl
  rabbitmq_federation
  rabbitmq_management_agent
  rabbitmq_prometheus
  rabbitmq_web_dispatch
Applying plugin configuration to rabbit@mq-ke2-rhel89-swarm-1...
The following plugins have been enabled:
  rabbitmq_auth_mechanism_ssl

set 5 plugins.
Offline change; changes will take effect at broker restart.
=INFO REPORT==== 28-Oct-2024::07:31:11.890366 ===
    alarm_handler: {set,{system_memory_high_watermark,[]}}
2024-10-28 07:31:12.531435+09:00 [warning] <0.156.0> Overriding Erlang cookie using the value set in the environment
2024-10-28 07:31:13.848732+09:00 [notice] <0.44.0> Application syslog exited with reason: stopped
2024-10-28 07:31:13.848786+09:00 [notice] <0.254.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
2024-10-28 07:31:13.849152+09:00 [notice] <0.254.0> Logging: configured log handlers are now ACTIVE
2024-10-28 07:31:13.854554+09:00 [info] <0.254.0> ra: starting system quorum_queues
2024-10-28 07:31:13.854594+09:00 [info] <0.254.0> starting Ra system: quorum_queues in directory: /var/lib/rabbitmq/mnesia/rabbit@mq-ke2-rhel89-swarm-1/quorum/rabbit@mq-ke2-rhel89-swarm-1
2024-10-28 07:31:13.891668+09:00 [info] <0.268.0> ra system 'quorum_queues' running pre init for 7 registered servers
2024-10-28 07:31:13.911893+09:00 [info] <0.269.0> ra: meta data store initialised for system quorum_queues. 7 record(s) recovered
2024-10-28 07:31:13.921881+09:00 [notice] <0.274.0> WAL: ra_log_wal init, open tbls: ra_log_open_mem_tables, closed tbls: ra_log_closed_mem_tables
2024-10-28 07:31:13.931462+09:00 [info] <0.254.0> ra: starting system coordination
2024-10-28 07:31:13.931499+09:00 [info] <0.254.0> starting Ra system: coordination in directory: /var/lib/rabbitmq/mnesia/rabbit@mq-ke2-rhel89-swarm-1/coordination/rabbit@mq-ke2-rhel89-swarm-1
2024-10-28 07:31:13.932199+09:00 [info] <0.282.0> ra system 'coordination' running pre init for 1 registered servers
dets: file "/var/lib/rabbitmq/mnesia/rabbit@mq-ke2-rhel89-swarm-1/coordination/rabbit@mq-ke2-rhel89-swarm-1/meta.dets" not properly closed, repairing ...
2024-10-28 07:31:13.941100+09:00 [info] <0.283.0> ra: meta data store initialised for system coordination. 1 record(s) recovered
2024-10-28 07:31:13.941262+09:00 [notice] <0.288.0> WAL: ra_coordination_log_wal init, open tbls: ra_coordination_log_open_mem_tables, closed tbls: ra_coordination_log_closed_mem_tables
2024-10-28 07:31:13.943398+09:00 [info] <0.254.0> ra: starting system coordination
2024-10-28 07:31:13.943423+09:00 [info] <0.254.0> starting Ra system: coordination in directory: /var/lib/rabbitmq/mnesia/rabbit@mq-ke2-rhel89-swarm-1/coordination/rabbit@mq-ke2-rhel89-swarm-1
2024-10-28 07:31:14.068431+09:00 [info] <0.254.0> Waiting for Khepri leader for 30000 ms, 9 retries left
2024-10-28 07:31:14.141114+09:00 [info] <0.254.0> Khepri leader elected
2024-10-28 07:31:14.141167+09:00 [info] <0.254.0> Waiting for Khepri projections for 30000 ms, 9 retries left
2024-10-28 07:31:14.422786+09:00 [notice] <0.293.0> RabbitMQ metadata store: candidate -> leader in term: 6307 machine version: 1
2024-10-28 07:31:14.540727+09:00 [notice] <0.254.0> Feature flags: checking nodes `rabbit@mq-ke2-rhel89-swarm-1` and `rabbit@mq-ke2-rhel89-swarm-2` compatibility...
2024-10-28 07:31:14.548508+09:00 [notice] <0.254.0> Feature flags: nodes `rabbit@mq-ke2-rhel89-swarm-1` and `rabbit@mq-ke2-rhel89-swarm-2` are compatible
2024-10-28 07:31:14.710432+09:00 [notice] <0.44.0> Application mnesia exited with reason: stopped
2024-10-28 07:31:14.711298+09:00 [notice] <0.254.0> Feature flags: checking nodes `rabbit@mq-ke2-rhel89-swarm-1` and `rabbit@mq-ke2-rhel89-swarm-3` compatibility...
2024-10-28 07:31:14.714022+09:00 [notice] <0.254.0> Feature flags: nodes `rabbit@mq-ke2-rhel89-swarm-1` and `rabbit@mq-ke2-rhel89-swarm-3` are compatible
2024-10-28 07:31:14.716732+09:00 [notice] <0.44.0> Application mnesia exited with reason: stopped

BOOT FAILED
===========
Error during startup: {error,
                          {inconsistent_cluster,
2024-10-28 07:31:14.716907+09:00 [error] <0.254.0> 
                              "Mnesia: node 'rabbit@mq-ke2-rhel89-swarm-1' thinks it's clustered with node 'rabbit@mq-ke2-rhel89-swarm-3', but 'rabbit@mq-ke2-rhel89-swarm-3' disagrees"}}

2024-10-28 07:31:14.716907+09:00 [error] <0.254.0> BOOT FAILED
2024-10-28 07:31:14.716907+09:00 [error] <0.254.0> ===========
2024-10-28 07:31:14.716907+09:00 [error] <0.254.0> Error during startup: {error,
2024-10-28 07:31:14.716907+09:00 [error] <0.254.0>                           {inconsistent_cluster,
2024-10-28 07:31:14.716907+09:00 [error] <0.254.0>                               "Mnesia: node 'rabbit@mq-ke2-rhel89-swarm-1' thinks it's clustered with node 'rabbit@mq-ke2-rhel89-swarm-3', but 'rabbit@mq-ke2-rhel89-swarm-3' disagrees"}}
2024-10-28 07:31:14.716907+09:00 [error] <0.254.0> 
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>   crasher:
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     initial call: application_master:init/4
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     pid: <0.253.0>
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     registered_name: []
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     exception exit: {{inconsistent_cluster,
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>                          "Mnesia: node 'rabbit@mq-ke2-rhel89-swarm-1' thinks it's clustered with node 'rabbit@mq-ke2-rhel89-swarm-3', but 'rabbit@mq-ke2-rhel89-swarm-3' disagrees"},
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>                      {rabbit,start,[normal,[]]}}
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>       in function  application_master:init/4 (application_master.erl, line 142)
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     ancestors: [<0.252.0>]
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     message_queue_len: 1
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     messages: [{'EXIT',<0.254.0>,normal}]
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     links: [<0.252.0>,<0.44.0>]
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     dictionary: []
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     trap_exit: true
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     status: running
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     heap_size: 987
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     stack_size: 28
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>     reductions: 191
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0>   neighbours:
2024-10-28 07:31:15.717893+09:00 [error] <0.253.0> 
2024-10-28 07:31:15.724125+09:00 [notice] <0.44.0> Application rabbit exited with reason: {{inconsistent_cluster,"Mnesia: node 'rabbit@mq-ke2-rhel89-swarm-1' thinks it's clustered with node 'rabbit@mq-ke2-rhel89-swarm-3', but 'rabbit@mq-ke2-rhel89-swarm-3' disagrees"},{rabbit,start,[normal,[]]}}
Runtime terminating during boot (terminating)

Solution

  1. Check inter-node communication

    Verify network connectivity between the nodes.

  2. If the node does not return to normal, restart the rabbitmq container

    $ docker restart <rabbitmq container ID>
    

    If the problem persists, restart Docker:

    $ systemctl restart docker.service
    
  3. If the problem cannot be resolved, you may also need to delete the RabbitMQ volume on each node.

    Delete the RabbitMQ volume (see the sketch under INS-CR1, step 4).

INS-CR5: The RabbitMQ log records the warning "leader saw pre_vote_rpc for unknown peer"

This situation can occur when a node's RabbitMQ cluster configuration is not in the same state as the other nodes'. When you check the RabbitMQ log, entries like the following may appear.

Sample log entries

2024-10-31 09:17:36.242578+09:00 [info] <0.13048.0> closing AMQP connection <0.13048.0> (10.0.5.55:37762 -> 10.0.5.26:5672, vhost: '/', user: 'guest')
2024-10-31 09:17:36.588684+09:00 [warning] <0.12964.0> closing AMQP connection <0.12964.0> (10.0.5.55:37732 -> 10.0.5.26:5672, vhost: '/', user: 'guest'):
2024-10-31 09:17:36.588684+09:00 [warning] <0.12964.0> client unexpectedly closed TCP connection
2024-10-31 09:17:39.566677+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:17:44.715647+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:17:46.508294+09:00 [info] <0.13368.0> accepting AMQP connection <0.13368.0> (10.0.5.56:41862 -> 10.0.5.26:5672)
2024-10-31 09:17:46.550266+09:00 [info] <0.13368.0> connection <0.13368.0> (10.0.5.56:41862 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:17:47.222382+09:00 [info] <0.13389.0> accepting AMQP connection <0.13389.0> (10.0.5.56:41866 -> 10.0.5.26:5672)
2024-10-31 09:17:47.264270+09:00 [info] <0.13389.0> connection <0.13389.0> (10.0.5.56:41866 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:17:47.864849+09:00 [info] <0.13407.0> accepting AMQP connection <0.13407.0> (10.0.5.56:33694 -> 10.0.5.26:5672)
2024-10-31 09:17:47.907315+09:00 [info] <0.13407.0> connection <0.13407.0> (10.0.5.56:33694 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:17:48.506052+09:00 [info] <0.13425.0> accepting AMQP connection <0.13425.0> (10.0.5.56:33704 -> 10.0.5.26:5672)
2024-10-31 09:17:48.548373+09:00 [info] <0.13425.0> connection <0.13425.0> (10.0.5.56:33704 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:17:49.176771+09:00 [info] <0.13443.0> accepting AMQP connection <0.13443.0> (10.0.5.56:33718 -> 10.0.5.26:5672)
2024-10-31 09:17:49.219214+09:00 [info] <0.13443.0> connection <0.13443.0> (10.0.5.56:33718 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:17:49.258780+09:00 [info] <0.13461.0> accepting AMQP connection <0.13461.0> (10.0.5.56:33732 -> 10.0.5.26:5672)
2024-10-31 09:17:49.301266+09:00 [info] <0.13461.0> connection <0.13461.0> (10.0.5.56:33732 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:17:50.482669+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:17:55.794600+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:01.492628+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:07.175701+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:12.572585+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:17.895508+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:22.965648+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:28.798947+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:34.309611+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:40.272684+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:46.007727+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:51.897620+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:18:52.423122+09:00 [info] <0.13461.0> closing AMQP connection <0.13461.0> (10.0.5.56:33732 -> 10.0.5.26:5672, vhost: '/', user: 'guest')
2024-10-31 09:18:52.661815+09:00 [info] <0.13389.0> closing AMQP connection <0.13389.0> (10.0.5.56:41866 -> 10.0.5.26:5672, vhost: '/', user: 'guest')
2024-10-31 09:18:53.078702+09:00 [info] <0.13407.0> closing AMQP connection <0.13407.0> (10.0.5.56:33694 -> 10.0.5.26:5672, vhost: '/', user: 'guest')
2024-10-31 09:18:53.480129+09:00 [info] <0.13425.0> closing AMQP connection <0.13425.0> (10.0.5.56:33704 -> 10.0.5.26:5672, vhost: '/', user: 'guest')
2024-10-31 09:18:53.881663+09:00 [info] <0.13443.0> closing AMQP connection <0.13443.0> (10.0.5.56:33718 -> 10.0.5.26:5672, vhost: '/', user: 'guest')
2024-10-31 09:18:54.202519+09:00 [warning] <0.13368.0> closing AMQP connection <0.13368.0> (10.0.5.56:41862 -> 10.0.5.26:5672, vhost: '/', user: 'guest'):
2024-10-31 09:18:54.202519+09:00 [warning] <0.13368.0> client unexpectedly closed TCP connection
2024-10-31 09:18:57.255652+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:19:03.158772+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}
2024-10-31 09:19:04.172441+09:00 [info] <0.13730.0> accepting AMQP connection <0.13730.0> (10.0.5.57:43388 -> 10.0.5.26:5672)
2024-10-31 09:19:04.216282+09:00 [info] <0.13730.0> connection <0.13730.0> (10.0.5.57:43388 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:19:04.878854+09:00 [info] <0.13749.0> accepting AMQP connection <0.13749.0> (10.0.5.57:43394 -> 10.0.5.26:5672)
2024-10-31 09:19:04.921416+09:00 [info] <0.13749.0> connection <0.13749.0> (10.0.5.57:43394 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:19:05.512922+09:00 [info] <0.13767.0> accepting AMQP connection <0.13767.0> (10.0.5.57:43396 -> 10.0.5.26:5672)
2024-10-31 09:19:05.556265+09:00 [info] <0.13767.0> connection <0.13767.0> (10.0.5.57:43396 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:19:06.151074+09:00 [info] <0.13786.0> accepting AMQP connection <0.13786.0> (10.0.5.57:43398 -> 10.0.5.26:5672)
2024-10-31 09:19:06.193316+09:00 [info] <0.13786.0> connection <0.13786.0> (10.0.5.57:43398 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:19:06.754779+09:00 [info] <0.13816.0> accepting AMQP connection <0.13816.0> (10.0.5.57:43414 -> 10.0.5.26:5672)
2024-10-31 09:19:06.797260+09:00 [info] <0.13816.0> connection <0.13816.0> (10.0.5.57:43414 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:19:06.836767+09:00 [info] <0.13834.0> accepting AMQP connection <0.13834.0> (10.0.5.57:43430 -> 10.0.5.26:5672)
2024-10-31 09:19:06.879273+09:00 [info] <0.13834.0> connection <0.13834.0> (10.0.5.57:43430 -> 10.0.5.26:5672): user 'guest' authenticated and granted access to vhost '/'
2024-10-31 09:19:08.679523+09:00 [warning] <0.1454.0> queue 'default/rpc_queue' in vhost '/': leader saw pre_vote_rpc for unknown peer {'%2F_default/rpc_queue','rabbit@mq-ke2-rhel89-swarm-2'}

Solution

  1. Check the RabbitMQ cluster status

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmqctl cluster_status
    
    Basics
    
    Cluster name: rabbit@mq-ke2-rhel89-swarm-1
    Total CPU cores available cluster-wide: 12
    
    Disk Nodes
    
    rabbit@mq-ke2-rhel89-swarm-1
    rabbit@mq-ke2-rhel89-swarm-2
    rabbit@mq-ke2-rhel89-swarm-3
    
    Running Nodes
    
    rabbit@mq-ke2-rhel89-swarm-1
    rabbit@mq-ke2-rhel89-swarm-2
    rabbit@mq-ke2-rhel89-swarm-3
    
    Versions
    
    rabbit@mq-ke2-rhel89-swarm-1: RabbitMQ 3.13.7 on Erlang 26.2.5.5
    rabbit@mq-ke2-rhel89-swarm-2: RabbitMQ 3.13.7 on Erlang 26.2.5.5
    rabbit@mq-ke2-rhel89-swarm-3: RabbitMQ 3.13.7 on Erlang 26.2.5.5
    
    CPU Cores
    
    Node: rabbit@mq-ke2-rhel89-swarm-1, available CPU cores: 4
    Node: rabbit@mq-ke2-rhel89-swarm-2, available CPU cores: 4
    Node: rabbit@mq-ke2-rhel89-swarm-3, available CPU cores: 4
    
    Maintenance status
    
    Node: rabbit@mq-ke2-rhel89-swarm-1, status: not under maintenance
    Node: rabbit@mq-ke2-rhel89-swarm-2, status: not under maintenance
    Node: rabbit@mq-ke2-rhel89-swarm-3, status: not under maintenance
    
    Alarms
    
    (none)
    
    Network Partitions
    
    (none)
    
    Listeners
    
    Node: rabbit@mq-ke2-rhel89-swarm-1, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
    Node: rabbit@mq-ke2-rhel89-swarm-1, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
    Node: rabbit@mq-ke2-rhel89-swarm-1, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
    Node: rabbit@mq-ke2-rhel89-swarm-1, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
    Node: rabbit@mq-ke2-rhel89-swarm-2, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
    Node: rabbit@mq-ke2-rhel89-swarm-2, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
    Node: rabbit@mq-ke2-rhel89-swarm-2, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
    Node: rabbit@mq-ke2-rhel89-swarm-2, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
    Node: rabbit@mq-ke2-rhel89-swarm-3, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
    Node: rabbit@mq-ke2-rhel89-swarm-3, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
    Node: rabbit@mq-ke2-rhel89-swarm-3, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
    Node: rabbit@mq-ke2-rhel89-swarm-3, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
    
    Feature flags
    
    Flag: classic_mirrored_queue_version, state: enabled
    Flag: classic_queue_type_delivery_support, state: enabled
    Flag: direct_exchange_routing_v2, state: enabled
    Flag: drop_unroutable_metric, state: enabled
    Flag: empty_basic_get_metric, state: enabled
    Flag: feature_flags_v2, state: enabled
    Flag: implicit_default_bindings, state: enabled
    Flag: khepri_db, state: disabled
    Flag: listener_records_in_ets, state: enabled
    Flag: maintenance_mode_status, state: enabled
    Flag: message_containers, state: enabled
    Flag: message_containers_deaths_v2, state: enabled
    Flag: quorum_queue, state: enabled
    Flag: quorum_queue_non_voters, state: enabled
    Flag: restart_streams, state: enabled
    Flag: stream_filtering, state: enabled
    Flag: stream_queue, state: enabled
    Flag: stream_sac_coordinator_unblock_group, state: enabled
    Flag: stream_single_active_consumer, state: enabled
    Flag: stream_update_config_command, state: enabled
    Flag: tracking_records_in_ets, state: enabled
    Flag: user_limits, state: enabled
    Flag: virtual_host_metadata, state: enabled
    

    In the output above, confirm that a Cluster name is present and that all three nodes are listed under Running Nodes.

    If they are not, check inter-node communication.

    Verify network connectivity between the nodes. A quorum-queue inspection sketch follows.
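
    For this particular warning, it can also help to inspect the Raft membership of the affected quorum queue (a sketch; the queue and vhost names are taken from the log sample above, and the exact rabbitmq-queues syntax may vary by version):

    $ docker exec $(docker ps -q -f name=rabbit) rabbitmq-queues quorum_status "default/rpc_queue" --vhost "/"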

  2. Restart RabbitMQ on all nodes

    $ docker service update --force ke2_rabbitmq
    
  3. If the problem cannot be resolved, you may also need to delete the RabbitMQ volume on each node.

    Delete the RabbitMQ volume (see the sketch under INS-CR1, step 4).