Handling Ceph PGs stuck in the down+peering state




Reference: http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/

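Steps to reproduce:

  1. Use fio to write data to a volume in a storage pool with min_size=1.
  2. Stop two of the replica OSDs so that the pool runs on a single replica, and keep writing data.
  3. Stop the third replica OSD, then start the first two replica OSDs again.

At this point, ceph -s shows PGs in the down+peering state.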
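
To see exactly which PGs are affected, the per-PG summary lines of ceph health detail can be filtered. The following is a minimal sketch, assuming the Hammer-era line format shown in the sample output in this document ("pg <id> is <state>, acting [<osds>]"); the function name down_peering_pgs is made up for illustration, and other Ceph releases may print these lines differently.

```python
import re
from typing import Dict, List

# Matches summary lines such as "pg 11.3c is down+peering, acting [24,40]".
# The longer "is stuck inactive for ..." lines do not match this pattern.
PG_LINE = re.compile(r"^pg (\S+) is (\S+), acting \[([\d,]*)\]$")


def down_peering_pgs(health_detail: str) -> Dict[str, List[int]]:
    """Map each down+peering PG id to its acting OSD set."""
    result = {}
    for line in health_detail.splitlines():
        match = PG_LINE.match(line.strip())
        if match and match.group(2) == "down+peering":
            osds = [int(x) for x in match.group(3).split(",") if x]
            result[match.group(1)] = osds
    return result
```

In practice the input text would come from running ceph health detail, for example through subprocess.check_output(["ceph", "health", "detail"], universal_newlines=True).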

In this situation, the first thing to try is starting the third replica OSD. If it starts normally, the down+peering PGs return to the normal active state.

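If the third replica OSD cannot be started (because of a disk failure, a node failure, and so on), a rescue procedure is needed. Some object data may be lost in this case (objects are rolled back to an older version). The steps to recover the PG state in this scenario are:

  1. Run ceph pg xxx query and find the OSD ID listed under peering_blocked_by.
  2. Run ceph osd lost $OSDID --yes-i-really-mean-it to mark that OSD as lost.
  3. Restart one of the surviving OSDs of pg xxx to force peering to run again.

If the PG's data was never updated on the failed OSD, that is, no new data was written to the PG while it was running on a single replica, this procedure does not lose any object data. Afterwards, simply remove the failed OSD.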

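If ceph -s reports unfound objects after the lost command has been run, run ceph pg xxx mark_unfound_lost revert to revert the unfound objects in the PG to their previous versions (data written while running on a single replica is lost). The PG then returns to the normal active state. Finally, remove the failed OSD.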
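
The order of the rescue procedure can be sketched as a small helper that only builds the command list and executes nothing, so the sequence can be reviewed before it is run by hand. This is an illustration under stated assumptions: recovery_commands and its parameters are hypothetical names, the OSD restart is left as a placeholder because the command depends on the init system, and the three removal commands are the usual manual OSD removal sequence from the Ceph documentation, which this note itself does not spell out.

```python
from typing import List


def recovery_commands(pgid: str, blocking_osds: List[int], has_unfound: bool) -> List[str]:
    """Build the ordered command list for the rescue procedure (dry run only)."""
    cmds = ["ceph pg {} query".format(pgid)]
    for osd in blocking_osds:
        # Declare the OSD permanently gone so peering can proceed without it.
        cmds.append("ceph osd lost {} --yes-i-really-mean-it".format(osd))
    # Placeholder line: the restart command depends on the init system in use.
    cmds.append("# restart one surviving OSD of pg {} to force re-peering".format(pgid))
    if has_unfound:
        # Reverts unfound objects to their previous version; newer writes are lost.
        cmds.append("ceph pg {} mark_unfound_lost revert".format(pgid))
    for osd in blocking_osds:
        # Assumed removal sequence for the failed OSD (not taken from this note).
        cmds.append("ceph osd crush remove osd.{}".format(osd))
        cmds.append("ceph auth del osd.{}".format(osd))
        cmds.append("ceph osd rm {}".format(osd))
    return cmds


print("\n".join(recovery_commands("11.3c", [0], has_unfound=True)))
```

Keeping the helper as a dry run is deliberate: ceph osd lost and mark_unfound_lost revert can discard data, so each step is better confirmed by an operator than fired automatically.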

Common commands:

$ ceph health detail   ## shows detailed cluster health information, such as unavailable PGs and down OSDs
pg 11.3c is stuck inactive for 5581.136026, current state down+peering, last acting [24,40]
pg 11.17 is stuck inactive for 5582.046433, current state down+peering, last acting [24,40]
pg 11.3a is stuck inactive for 5543.997257, current state down+peering, last acting [40,24]
pg 11.2e is stuck inactive for 5582.079197, current state down+peering, last acting [24,40]
pg 11.1 is stuck unclean for 5585.626778, current state down+peering, last acting [40,24]
pg 11.3c is stuck unclean for 5585.230310, current state down+peering, last acting [24,40]
pg 11.3 is stuck unclean for 5585.384133, current state down+peering, last acting [40,24]
pg 11.17 is stuck unclean for 5586.867256, current state down+peering, last acting [24,40]
pg 11.38 is stuck unclean for 5586.403706, current state down+peering, last acting [40,24]
pg 11.3a is stuck unclean for 11991.965760, current state down+peering, last acting [40,24]
pg 11.2e is stuck unclean for 5585.567934, current state down+peering, last acting [24,40]
pg 11.6 is stuck unclean for 5585.452910, current state down+peering, last acting [40,24]
pg 11.3c is down+peering, acting [24,40]
pg 11.38 is down+peering, acting [40,24]
pg 11.3a is down+peering, acting [40,24]
pg 11.2e is down+peering, acting [24,40]
pg 11.17 is down+peering, acting [24,40]
pg 11.6 is down+peering, acting [40,24]
pg 11.1 is down+peering, acting [40,24]
pg 11.3 is down+peering, acting [40,24]
$ ceph pg 11.3c query  ## shows detailed information about the PG
......
  "recovery_state": [
   ......
            "probing_osds": [
                "24",
                "40"
            ],
            "blocked": "peering is blocked due to down osds",
            "down_osds_we_would_probe": [
                0
            ],
            "peering_blocked_by": [
                {
                    "osd": 0,
                    "current_lost_at": 20891,
                    "comment": "starting or marking this osd lost may let us proceed"  ### hint message
                }
            ]
        },
......
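
The OSD IDs under peering_blocked_by can also be pulled out of the query output programmatically instead of being read by eye. The sketch below assumes the JSON layout shown in the excerpt above (a recovery_state list whose entries may carry a peering_blocked_by list); the function name peering_blocked_by is chosen for illustration, and the full, unabridged JSON from ceph pg <pgid> query is expected as input.

```python
import json
from typing import List


def peering_blocked_by(pg_query_json: str) -> List[int]:
    """Return the ids of the OSDs that block peering for one PG."""
    doc = json.loads(pg_query_json)
    osds = []
    for state in doc.get("recovery_state", []):
        # Only the peering entry carries this key; other entries are skipped.
        for blocker in state.get("peering_blocked_by", []):
            osds.append(int(blocker["osd"]))
    return osds

# Possible use against a live cluster (requires the ceph CLI):
#   import subprocess
#   out = subprocess.check_output(["ceph", "pg", "11.3c", "query"])
#   print(peering_blocked_by(out.decode()))
```

For the excerpt above this would yield OSD 0, which is the ID to pass to ceph osd lost if that OSD cannot be brought back.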