rbd-nbd map fails once the number of mapped volumes reaches 222




Console error: fork: retry: Resource temporarily unavailable

// rbd client log:
2019-01-30 08:50:14.949238 7f772824fec0 -1 /home/nbs/jenkins/ceph-build/release/ceph-deb-stretch-x86_64-basic/sha1/dc18f441ea142687ac152894b689d59170a47301/WORKDIR/ceph-12.2.5+netease+stretch+1.1-19-gdc18f44/src/common/Thread.cc: In function 'void Thread::create(const char*, size_t)' thread 7f772824fec0 time 2019-01-30 08:50:14.948047
/home/nbs/jenkins/ceph-build/release/ceph-deb-stretch-x86_64-basic/sha1/dc18f441ea142687ac152894b689d59170a47301/WORKDIR/ceph-12.2.5+netease+stretch+1.1-19-gdc18f44/src/common/Thread.cc: 152: FAILED assert(ret == 0)

ceph version 12.2.5+netease+stretch+1.1-19-gdc18f44 (dc18f441ea142687ac152894b689d59170a47301) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f771ed1fab2]
2: (()+0x50ae55) [0x7f771ef8fe55]
3: (init_async_signal_handler()+0xe8) [0x557258dd9af8]
4: (()+0x15061) [0x557258dc6061]
5: (main()+0x9) [0x557258dc1f59]
6: (__libc_start_main()+0xf1) [0x7f771bf702e1]
7: (_start()+0x2a) [0x557258dc205a]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

My first suspicion was that a ulimit had been hit, but the limits checked out fine.
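
For reference, the limits usually inspected at this step can be printed as follows. This is a minimal sketch, not the exact commands used in the original investigation; the helper name max_processes is made up for illustration, and it assumes a Linux host with /proc mounted.

```shell
# max_processes [FILE]: print the "Max processes" row (soft and hard
# RLIMIT_NPROC) from a limits file. Defaults to /proc/self/limits, which
# reflects the limits inherited from the calling shell.
# (Hypothetical helper, not part of any tool.)
max_processes() {
    grep '^Max processes' "${1:-/proc/self/limits}"
}

# On a Linux host: the per-user limit first, then the system-wide ceilings
# on thread count and PID values.
if [ -r /proc/self/limits ]; then
    max_processes
    cat /proc/sys/kernel/threads-max /proc/sys/kernel/pid_max
fi
```

If all of these values are far above the number of running tasks, the limit being hit is probably somewhere else, as it turned out to be here.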

Then I found this error in syslog:

kernel: [ 6039.287966] cgroup: fork rejected by pids controller in /system.slice/ssh.service

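That revealed the cause: I had logged in to the node over ssh and run the rbd-nbd map commands from that session, so everything they started was accounted to the ssh.service cgroup, and the pids controller caps how many tasks that cgroup may fork. The cap counts threads as well as processes, which is why a couple of hundred multithreaded rbd-nbd processes were enough to reach it.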
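
To confirm this kind of diagnosis, the cgroup's current task count can be compared with its limit. The sketch below assumes the cgroup v1 layout seen in this post (/sys/fs/cgroup/pids/...); the helper name pids_usage is made up for illustration.

```shell
# pids_usage DIR: print "current/max" for the pids cgroup directory DIR.
# (Hypothetical helper; pids.current and pids.max are the files exposed
# by the kernel's pids controller.)
pids_usage() {
    dir=$1
    current=$(cat "$dir/pids.current") || return 1
    max=$(cat "$dir/pids.max") || return 1
    echo "$current/$max"
}

# Path taken from the syslog line in this post; skipped on hosts
# that do not have it.
if [ -d /sys/fs/cgroup/pids/system.slice/ssh.service ]; then
    # Shows which cgroups the current shell belongs to.
    cat /proc/self/cgroup
    pids_usage /sys/fs/cgroup/pids/system.slice/ssh.service \
        || echo "cannot read the pids files (try as root)"
fi
```

A current value close to the maximum while rbd-nbd map is failing would point to this limit rather than to ulimit.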

Raising the cgroup limit fixes the problem:

$ sudo cat /sys/fs/cgroup/pids/system.slice/ssh.service/pids.max
4915 # the default of 4915 is too low

$ sudo tee /sys/fs/cgroup/pids/system.slice/ssh.service/pids.max
32768 # type the new value and press Enter
32768 # echoed back by tee; press ctrl+d to end input

Alternatively, run this directly as root:
echo 32768 > /sys/fs/cgroup/pids/system.slice/ssh.service/pids.max

After the change, more processes can be created.
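
A value written straight into the cgroup filesystem does not survive a reboot. On a systemd host, one possible way to keep it is a drop-in that sets TasksMax for the unit. The sketch below is an assumption beyond the original fix: the helper name write_tasksmax_dropin and the file name tasksmax.conf are made up for illustration, and the unit name ssh.service is taken from the syslog line above.

```shell
# write_tasksmax_dropin DIR LIMIT: write a systemd drop-in file in DIR
# that sets TasksMax=LIMIT for the unit the directory belongs to.
# (Hypothetical helper and file name.)
write_tasksmax_dropin() {
    dir=$1
    limit=$2
    mkdir -p "$dir" || return 1
    printf '[Service]\nTasksMax=%s\n' "$limit" > "$dir/tasksmax.conf"
}

# Usage as root, assuming the unit is ssh.service:
#   write_tasksmax_dropin /etc/systemd/system/ssh.service.d 32768
#   systemctl daemon-reload
#   systemctl restart ssh.service   # a restart or reboot may be needed
```

Afterwards, reading pids.max again should show whether the new limit took effect.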