Reference: http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
Reproduction steps:
- Run fio writes against a volume in a pool with min_size=1.
- Stop two of the replica OSDs, leaving the PG running on a single replica, and keep writing data.
- Stop the third replica OSD, then start the first two again.
At this point `ceph -s` shows PGs in the down+peering state.
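The steps above can be sketched as a dry-run script. This is only a sketch under assumptions: OSD ids 24/40/0, a systemd-managed cluster, and the rbd device path are all placeholders for illustration; nothing is executed unless dry_run is disabled.

```python
import subprocess

def run(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)

def reproduce(dry_run=True):
    # 1. fio writes to a volume in a min_size=1 pool (in practice this is
    #    left running in the background while the OSDs are stopped).
    run(["fio", "--name=pgtest", "--rw=write", "--bs=4k", "--size=1G",
         "--filename=/dev/rbd0"], dry_run)
    # 2. Stop two replica OSDs; writes continue on the single surviving replica.
    run(["systemctl", "stop", "ceph-osd@24"], dry_run)
    run(["systemctl", "stop", "ceph-osd@40"], dry_run)
    # 3. Stop the third OSD, then restart the first two -> down+peering PGs.
    run(["systemctl", "stop", "ceph-osd@0"], dry_run)
    run(["systemctl", "start", "ceph-osd@24"], dry_run)
    run(["systemctl", "start", "ceph-osd@40"], dry_run)

if __name__ == "__main__":
    reproduce()
```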
In this situation, first try to start the third OSD. If it comes up normally, the down+peering PGs return to the normal active state on their own.
If the third OSD cannot be started (disk failure, node failure, etc.), a salvage plan is needed. In this case some object data may be lost (objects roll back to an older version). The recovery steps for this scenario are as follows:
Using `ceph pg xxx query`, find the OSD id listed under `peering_blocked_by`, run `ceph osd lost $OSDID --yes-i-really-mean-it` to mark that OSD lost, then restart one of the surviving OSDs in pg xxx to force a new round of peering. If the PG received no new writes while running single-replica, i.e. the failed OSD is not missing any updates, this operation causes no object data loss. Afterwards, simply remove the failed OSD from the cluster.
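Reading the blocking OSD id out of the query output can be scripted instead of done by eye. A minimal sketch, assuming the JSON layout shown in the `ceph pg 11.3c query` sample in this note (the sample fragment below is abbreviated for illustration):

```python
import json

def blocked_osds(query_json):
    """Return OSD ids listed under peering_blocked_by in a pg query result."""
    osds = []
    for state in query_json.get("recovery_state", []):
        for entry in state.get("peering_blocked_by", []):
            osds.append(entry["osd"])
    return osds

# Abbreviated fragment shaped like the `ceph pg 11.3c query` output:
sample = json.loads("""
{"recovery_state": [
  {"blocked": "peering is blocked due to down osds",
   "down_osds_we_would_probe": [0],
   "peering_blocked_by": [
     {"osd": 0, "current_lost_at": 20891,
      "comment": "starting or marking this osd lost may let us proceed"}]}
]}
""")
print(blocked_osds(sample))  # → [0]
```

In practice the input would come from `json.loads(subprocess.check_output(["ceph", "pg", "11.3c", "query"]))` rather than an inline sample.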
If `ceph -s` shows unfound objects after the lost command, run `ceph pg xxx mark_unfound_lost revert` to revert the lost objects in the PG to their previous versions (data written during the single-replica period is lost). The PG then returns to the normal active state. Finally, remove the failed OSD from the cluster.
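The full recovery sequence can be collected into one dry-run helper. This is a sketch, not a definitive procedure: the pg id and OSD ids are placeholders, the OSD-removal commands at the end are the classic hammer-era trio, and as the warnings above say, `ceph osd lost` and `mark_unfound_lost revert` can discard data written while single-replica, so nothing here executes anything.

```python
def recovery_commands(pgid, lost_osd, restart_osd, have_unfound=False):
    """Build (but do not run) the recovery command sequence described above."""
    cmds = [
        # Mark the unstartable OSD lost so peering can proceed.
        ["ceph", "osd", "lost", str(lost_osd), "--yes-i-really-mean-it"],
        # Restart one surviving OSD of the PG to force a new peering round.
        ["systemctl", "restart", "ceph-osd@%d" % restart_osd],
    ]
    if have_unfound:
        # Roll unfound objects back to their previous version; writes made
        # during the single-replica window are lost.
        cmds.append(["ceph", "pg", pgid, "mark_unfound_lost", "revert"])
    # Finally remove the failed OSD from the cluster.
    cmds += [
        ["ceph", "osd", "crush", "remove", "osd.%d" % lost_osd],
        ["ceph", "auth", "del", "osd.%d" % lost_osd],
        ["ceph", "osd", "rm", str(lost_osd)],
    ]
    return cmds

for c in recovery_commands("11.3c", lost_osd=0, restart_osd=24, have_unfound=True):
    print(" ".join(c))
```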
Common commands:
$ ceph health detail    ## detailed cluster health: unavailable PGs, down OSDs, etc.
pg 11.3c is stuck inactive for 5581.136026, current state down+peering, last acting [24,40]
pg 11.17 is stuck inactive for 5582.046433, current state down+peering, last acting [24,40]
pg 11.3a is stuck inactive for 5543.997257, current state down+peering, last acting [40,24]
pg 11.2e is stuck inactive for 5582.079197, current state down+peering, last acting [24,40]
pg 11.1 is stuck unclean for 5585.626778, current state down+peering, last acting [40,24]
pg 11.3c is stuck unclean for 5585.230310, current state down+peering, last acting [24,40]
pg 11.3 is stuck unclean for 5585.384133, current state down+peering, last acting [40,24]
pg 11.17 is stuck unclean for 5586.867256, current state down+peering, last acting [24,40]
pg 11.38 is stuck unclean for 5586.403706, current state down+peering, last acting [40,24]
pg 11.3a is stuck unclean for 11991.965760, current state down+peering, last acting [40,24]
pg 11.2e is stuck unclean for 5585.567934, current state down+peering, last acting [24,40]
pg 11.6 is stuck unclean for 5585.452910, current state down+peering, last acting [40,24]
pg 11.3c is down+peering, acting [24,40]
pg 11.38 is down+peering, acting [40,24]
pg 11.3a is down+peering, acting [40,24]
pg 11.2e is down+peering, acting [24,40]
pg 11.17 is down+peering, acting [24,40]
pg 11.6 is down+peering, acting [40,24]
pg 11.1 is down+peering, acting [40,24]
pg 11.3 is down+peering, acting [40,24]

$ ceph pg 11.3c query    ## detailed PG information
......
"recovery_state": [
    ......
    "probing_osds": [
        "24",
        "40"
    ],
    "blocked": "peering is blocked due to down osds",
    "down_osds_we_would_probe": [
        0
    ],
    "peering_blocked_by": [
        {
            "osd": 0,
            "current_lost_at": 20891,
            "comment": "starting or marking this osd lost may let us proceed"    ### the hint
        }
    ]
},
......