Ceph CPU&MEMORY profiling – Blog of Aspirer

Environment

OS: Debian 9 with kernel 4.9.65
Ceph: Luminous 12.2.5

CPU profiling

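There are two tools. perf is the one most commonly used on Linux: it is general-purpose, very powerful, and Debian ships a package for it. The other is oprofile, which has no Debian package, has to be built from source, and in my test did not work inside a virtual machine.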

use perf

References:
– http://docs.ceph.com/docs/master/dev/perf/
– https://www.ibm.com/developerworks/cn/linux/l-cn-perf1/index.html (recommended; it explains the common commands clearly)

Installation is simple: apt-get install linux-perf-4.9, where 4.9 is the major.minor version of the running kernel.
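The package name can be derived from the running kernel instead of being typed by hand. The sketch below assumes the Debian 9 naming scheme described above (linux-perf-<major>.<minor>); other Debian releases may package perf differently. The function name perf_pkg_for_kernel is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a Debian 9 style perf package name from a
# kernel release string such as "4.9.65" or "4.9.0-6-amd64".
# Assumes the linux-perf-<major>.<minor> naming scheme used in this post.
perf_pkg_for_kernel() {
    local release="${1:-$(uname -r)}"
    local major rest minor
    major="${release%%.*}"        # text before the first dot
    rest="${release#*.}"          # text after the first dot
    minor="${rest%%[!0-9]*}"      # leading digits of the remainder
    printf 'linux-perf-%s.%s\n' "$major" "$minor"
}
```

Usage, for example: apt-get install "$(perf_pkg_for_kernel)"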

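The main commands are:
– perf top / perf top -p 1234 / perf top -e cpu-clock:u (samples the CPU clock in user space only): live view of CPU sample counts for the system or a process
– perf stat / perf stat -p 1234: basic counters for a process, useful for high-level analysis, e.g. whether it is I/O-bound or CPU-bound, or for a first look at which area the problem is in
– perf record -p 1234 -F 99 --call-graph dwarf -- sleep 60: samples the process at 99 Hz for 60 seconds and saves the call graphs
– perf report --call-graph caller/callee: shows the recorded call graphs; caller and callee select opposite orderings (caller on top or callee on top)
– perf list: lists all events perf supports. The default sampling event is cpu-cycles, a hardware event; cpu-clock, a software event, can be used instead
– perf help xxx: shows the help page for a command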
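To avoid retyping the perf record line from the list above for every daemon, it can be assembled by a small helper. This is only a sketch: perf_record_cmd is a hypothetical name, and the defaults (99 Hz, 60 seconds) are simply the values used above. It prints the command rather than running it so it can be checked first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the perf record command for a given PID.
# Defaults mirror the example in the list above (60 seconds at 99 Hz).
perf_record_cmd() {
    local pid="$1" seconds="${2:-60}" freq="${3:-99}"
    # Reject anything that is not a plain non-negative integer.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: perf_record_cmd PID [SECONDS] [HZ]" >&2
            return 1
            ;;
    esac
    printf 'perf record -p %s -F %s --call-graph dwarf -- sleep %s\n' \
        "$pid" "$freq" "$seconds"
}
```

For example, perf_record_cmd "$(pidof -s ceph-osd)" 30 prints a 30-second recording command for one ceph-osd process.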

Flame graphs can be generated with the FlameGraph scripts (collect data with perf record first):
1. git clone https://github.com/brendangregg/FlameGraph
2. perf script | FlameGraph/stackcollapse-perf.pl > perf-fg
3. ./FlameGraph/flamegraph.pl perf-fg > perf.svg
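The steps above can be wrapped in one function. This is a sketch under a few assumptions: make_flamegraph and the FLAMEGRAPH_DIR variable are names invented here, the FlameGraph checkout is expected in ./FlameGraph unless that variable says otherwise, and perf must be on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: turn a perf.data file into an SVG flame graph.
# FLAMEGRAPH_DIR (an assumed variable) points at the FlameGraph checkout.
make_flamegraph() {
    local data="${1:-perf.data}" svg="${2:-perf.svg}"
    local fg_dir="${FLAMEGRAPH_DIR:-./FlameGraph}"
    local folded rc

    [ -r "$data" ] || { echo "cannot read $data" >&2; return 1; }
    [ -f "$fg_dir/flamegraph.pl" ] || { echo "FlameGraph not found in $fg_dir" >&2; return 1; }

    # Fold the stacks into a temporary file, then render the SVG from it.
    folded="$(mktemp)" || return 1
    perf script -i "$data" | "$fg_dir/stackcollapse-perf.pl" > "$folded" \
        && "$fg_dir/flamegraph.pl" "$folded" > "$svg"
    rc=$?
    rm -f "$folded"
    return "$rc"
}
```

For example: make_flamegraph perf.data osd0.svg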

[Figure: perf flame graph]

In the figure above, the wider a bar, the larger the share of all samples in which that function was on the stack, i.e. the more CPU time it consumed.
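The same width information can be read as numbers from the folded stack file (perf-fg in the steps above), where each line is a semicolon-separated stack followed by a sample count. The sketch below is not part of FlameGraph; frame_sample_share is a name made up here. It prints, for every function, the percentage of samples whose stack contains it, counting a function once per stack even when it recurses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read folded stacks ("a;b;c <count>") on stdin and
# print "<percent>% <function>", highest share first.
frame_sample_share() {
    awk '
    {
        count = $NF                       # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)        # strip the trailing count
        n = split(stack, frames, ";")
        split("", seen)                   # clear the per-stack set
        for (i = 1; i <= n; i++) {
            if (!(frames[i] in seen)) {   # count each function once per stack
                seen[frames[i]] = 1
                incl[frames[i]] += count
            }
        }
        total += count
    }
    END {
        if (total == 0) exit
        for (f in incl)
            printf "%.1f%% %s\n", 100 * incl[f] / total, f
    }' | sort -rn
}
```

For example: frame_sample_share < perf-fg | head -20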

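An advantage of this tool is that it needs no special compile options. In my tests, building with or without -fno-omit-frame-pointer in CFLAGS appeared to make no difference to the result, probably because --call-graph dwarf unwinds stacks from DWARF unwind information rather than from frame pointers.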

use oprofile

The official documentation is out of date; newer oprofile releases no longer have the opcontrol command:
– http://docs.ceph.com/docs/master/rados/troubleshooting/cpu-profiling/
– http://docs.ceph.com/docs/master/dev/cpu-profiler/

So I built a newer version myself, as follows:
1. wget https://sourceforge.net/projects/oprofile/files/oprofile/oprofile-1.3.0/oprofile-1.3.0.tar.gz
2. tar xzf oprofile-1.3.0.tar.gz
3. cd oprofile-1.3.0
4. apt install libpopt-dev libiberty-dev
5. useradd oprofile
6. ./configure
7. make && make install

After that the operf command is available, but running it in my virtual machine failed with:

root@ceph-l oprofile-1.3.0 $ operf -h
Your kernel's Performance Events Subsystem does not support your processor type.

Since oprofile would have to be compiled on physical machines as well, I did not investigate further.

MEMORY profiling with google-perftools

References:
– http://docs.ceph.com/docs/master/rados/troubleshooting/memory-profiling/
– http://goog-perftools.sourceforge.net/doc/heap_profiler.html (official documentation)

Our Luminous build uses tcmalloc, so google-perftools can be used directly. It also installs with a single command: apt-get install google-perftools.

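Common commands:
– ceph tell osd.0 heap start_profiler: starts collecting heap usage data
– ceph tell osd.0 heap dump: dumps the current heap usage (requires start_profiler first); the file is written to the log directory by default
– google-pprof --text /usr/bin/ceph-osd /var/log/ceph/ceph-osd.0.profile.0001.heap: shows the contents of a heap dump
– google-pprof --text --base osd.1.profile.0002.heap /usr/bin/ceph-osd osd.1.profile.0003.heap: compares two heap dumps; the usage recorded in the base dump is subtracted, which makes memory growth easy to see
– ceph tell osd.0 heap stats: basic statistics; works without start_profiler
– ceph tell osd.2 heap release: releases tcmalloc's cached memory back to the OS; also works without start_profiler
– ceph tell osd.0 heap stop_profiler: stops the profiler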
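The commands above are typically used in sequence to look for memory growth: start the profiler, take a baseline dump, wait, take a second dump, stop the profiler. A minimal sketch of that sequence follows. heap_growth_capture and the DRY_RUN variable are names invented for this example, and the 60-second default interval is arbitrary; with DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: take two heap dumps from one OSD, a fixed interval
# apart. Set DRY_RUN=1 to print the commands instead of running them.
heap_growth_capture() {
    local osd="$1" interval="${2:-60}"
    local run=""
    [ -n "$osd" ] || { echo "usage: heap_growth_capture OSD_ID [SECONDS]" >&2; return 1; }
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    $run ceph tell "osd.$osd" heap start_profiler || return 1
    $run ceph tell "osd.$osd" heap dump             # baseline dump
    $run sleep "$interval"
    $run ceph tell "osd.$osd" heap dump             # second dump, to compare with the baseline
    $run ceph tell "osd.$osd" heap stop_profiler
}
```

The two resulting .heap files can then be compared with the google-pprof --base command listed above.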

root@blkin ceph $ google-pprof --text /usr/bin/ceph-osd osd.1.profile.0002.heap                                       
Using local file /usr/bin/ceph-osd.
Using local file osd.1.profile.0002.heap.
Total: 6.9 MB
     3.6  52.8%  52.8%      3.6  52.8% ceph::logging::Log::create_entry
     1.2  17.1%  70.0%      1.2  17.1% ceph::buffer::raw_posix_aligned::raw_posix_aligned
     0.9  12.7%  82.7%      0.9  12.7% mempool::pool_allocator::allocate
     0.7   9.9%  92.6%      0.7   9.9% std::__cxx11::basic_string::_M_mutate
     0.2   3.6%  96.1%      0.2   3.6% __gnu_cxx::new_allocator::allocate
     0.1   1.9%  98.1%      0.1   1.9% ceph::buffer::raw_combined::create
     0.1   0.8%  98.8%      0.1   0.8% std::__cxx11::basic_string::_M_construct
     0.0   0.3%  99.1%      0.0   0.3% AsyncConnection::AsyncConnection
     0.0   0.3%  99.4%      0.0   0.3% decode_message
     0.0   0.2%  99.6%      0.0   0.5% OpTracker::create_request
     0.0   0.2%  99.8%      0.0   0.5% AsyncMessenger::add_accept
     0.0   0.1%  99.9%      0.0   0.2% BlueStore::_deferred_queue
     0.0   0.1%  99.9%      0.0   0.2% BlueStore::_get_deferred_op
     0.0   0.0% 100.0%      0.0   0.0% std::__cxx11::basic_string::reserve
     0.0   0.0% 100.0%      0.0   0.0% OSD::ms_verify_authorizer
     0.0   0.0% 100.0%      0.0   0.0% ceph::Formatter::create@277566a
     0.0   0.0% 100.0%      0.0   0.0% get_auth_session_handler
     0.0   0.0% 100.0%      0.0   0.1% AuthNoneClientHandler::build_authorizer
     0.0   0.0% 100.0%      0.0   0.0% OSD::handle_command

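The first column is the memory used directly by the function (MB). The second column is that amount as a percentage of total memory in use (the first column as a share of the total). The third column is the running sum of the second column, i.e. the share used by the top N functions. The fourth column is the memory used by the function plus all of the functions it calls (MB). The fifth column is the fourth column as a percentage of the total. The last column is the function name.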

Official explanation:
– The first column contains the direct memory use in MB.
– The fourth column contains memory use by the procedure and all of its callees.
– The second and fifth columns are just percentage representations of the numbers in the first and fourth columns.
– The third column is a cumulative sum of the second column (i.e., the kth entry in the third column is the sum of the first k entries in the second column.)
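Given the column layout described above, the text report is easy to filter. The sketch below is not part of google-perftools; pprof_top is a hypothetical name. It reads google-pprof --text output on stdin and prints only the functions whose direct share (second column) is at least a given percentage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter `google-pprof --text` output on stdin, keeping
# rows whose direct-use percentage (column 2) is >= the given threshold.
pprof_top() {
    local min_pct="${1:-1.0}"
    awk -v min="$min_pct" '
    # Data rows start with a number and carry a percentage in column 2;
    # header lines such as "Total: 6.9 MB" do not match.
    $1 ~ /^[0-9.]+$/ && $2 ~ /%$/ {
        pct = $2
        sub(/%/, "", pct)
        if (pct + 0 >= min + 0) {
            name = $6                     # function name may contain spaces
            for (i = 7; i <= NF; i++)
                name = name " " $i
            printf "%s MB (%s) %s\n", $1, $2, name
        }
    }'
}
```

For example: google-pprof --text /usr/bin/ceph-osd osd.1.profile.0002.heap | pprof_top 10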