Debugging the Kubernetes Source Code: A First Experience




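The source is the release-1.10 branch of k8s. Why not master? The golang installed in my virtual machine is 1.9.4, which does not meet the requirements of the latest code, and I did not bother to upgrade it.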

The build environment is CentOS 7.2 x86_64.

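Overall, debugging Go is quite close to debugging C: the tools and the commands look very similar, and both languages can be debugged with gdb. This time, though, I did not use gdb; I used delve, which is often recommended online as a better fit for Go.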

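To debug k8s, you first need to build debug versions of the binaries. A normal build is just make all; a debug build has to turn off compiler optimizations and enable the debugging options. Here is the relevant part of the k8s Makefile: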

     67 # Build code.
     68 #
     69 # Args:
     70 #   WHAT: Directory names to build.  If any of these directories has a 'main'
     71 #     package, the build will produce executable files under $(OUT_DIR)/go/bin.
     72 #     If not specified, "everything" will be built.
     73 #   GOFLAGS: Extra flags to pass to 'go' when building.
     74 #   GOLDFLAGS: Extra linking flags passed to 'go' when building.
     75 #   GOGCFLAGS: Additional go compile flags passed to 'go' when building.
     76 #
     77 # Example:
     78 #   make
     79 #   make all
     80 #   make all WHAT=cmd/kubelet GOFLAGS=-v
     81 #   make all GOGCFLAGS="-N -l"
     82 #     Note: Use the -N -l options to disable compiler optimizations an inlining.
     83 #           Using these build options allows you to subsequently use source
     84 #           debugging tools like delve.
     85 endef
     86 .PHONY: all

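Line 81, make all GOGCFLAGS="-N -l", and the note on the lines below it mention the two compiler flags -N and -l, which disable compiler optimizations and inlining respectively. The point is that when you single-step in the debugger, each step maps to an actual line of source. The full list of supported compiler flags can be printed with the following command: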

root@linux kubernetes [release-1.10] $ go tool compile --help
usage: compile [options] file.go...
  -%    debug non-static initializers
  -+    compiling runtime
  -B    disable bounds checking
  -C    disable printing of columns in error messages
  -D path
        set relative path for local imports
  -E    debug symbol export
  -I directory
        add directory to import search path
  -K    debug missing line numbers
  -N    disable optimizations
  -S    print assembly listing
  -V    print compiler version
  -W    debug parse tree after type checking
  -asmhdr file
        write assembly header to file
  -bench file
        append benchmark times to file
  -blockprofile file
        write block profile to file
  -buildid id
        record id as the build id in the export metadata
  -c int
        concurrency during compilation, 1 means no concurrency (default 1)
  -complete
        compiling complete package (no C or assembly)
  -cpuprofile file
        write cpu profile to file
  -d list
        print debug information about items in list; try -d help
  -dolinkobj
        generate linker-specific objects; if false, some invalid code may compile (default true)
  -dwarf
        generate DWARF symbols (default true)
  -dynlink
        support references to Go symbols defined in other shared libraries
  -e    no limit on number of errors reported
  -f    debug stack frames
  -goversion string
        required version of the runtime
  -h    halt on error
  -i    debug line number stack
  -importcfg file
        read import configuration from file
  -importmap definition
        add definition of the form source=actual to import map
  -installsuffix suffix
        set pkg directory suffix
  -j    debug runtime-initialized variables
  -l    disable inlining
  -linkobj file
        write linker-specific object to file
  -live
        debug liveness analysis
  -m    print optimization decisions
  -memprofile file
        write memory profile to file
  -memprofilerate rate
        set runtime.MemProfileRate to rate
  -msan
        build code compatible with C/C++ memory sanitizer
  -mutexprofile file
        write mutex profile to file
  -nolocalimports
        reject local (relative) imports
  -o file
        write output to file
  -p path
        set expected package import path
  -pack
        write package file instead of object file
  -r    debug generated wrappers
  -race
        enable race detector
  -s    warn about composite literals that can be simplified
  -shared
        generate code that can be linked into a shared library
  -std
        compiling standard library
  -traceprofile file
        write an execution trace to file
  -trimpath prefix
        remove prefix from recorded source file paths
  -u    reject unsafe code
  -v    increase debug verbosity
  -w    debug type checking
  -wb
        enable write barrier (default true)
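
To get a feel for what -N and -l change before building all of k8s, a tiny stand-alone program can help. The following is only a sketch: the file, the add function and the binary name demo are made up for illustration and have nothing to do with the k8s tree. Built with go build -gcflags="-N -l", a function this small should stay a real call instead of being inlined, so stepping into it with dlv lands on its own source lines.

```go
// Build without optimizations and inlining, then debug:
//   go build -gcflags="-N -l" -o demo .
//   dlv exec ./demo
package main

import "fmt"

// add is small enough that the compiler would normally inline it.
// With -l it remains a separate function that "step" can enter.
func add(a, b int) int {
	sum := a + b // with -N, sum is typically still visible as a local
	return sum
}

func main() {
	total := 0
	for i := 1; i <= 3; i++ {
		total = add(total, i)
	}
	fmt.Println("total:", total)
}
```

Setting a breakpoint with b main.add and then running locals is a quick way to compare a default build against a -N -l build.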

 

Before building you need the source: git clone https://github.com/kubernetes/kubernetes.git , then switch to the branch you want to build. Taking release-1.10 as an example: git checkout -b release-1.10 origin/release-1.10 . Then prepare the build environment according to the official documentation: https://github.com/kubernetes/community/blob/master/contributors/devel/development.md#building-kubernetes-on-a-local-osshell-environment

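I chose a local Linux build rather than a Docker build, and I will not go into installing etcd and go. Next comes the build itself. The command was already shown above: run make all GOGCFLAGS="-N -l" in the kubernetes directory (the source root) and wait for it to finish. The first build is fairly slow. When it is done, the binaries are placed in _output/bin/ under the kubernetes directory, and the corresponding source is under _output/local/go/src/ . If you look inside, the kubernetes directory under _output/local/go/src/k8s.io/ is actually a symlink that points back to the kubernetes source root.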
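
The symlink matters later, because it is why the debugger reports paths under _output/local/go/src/. A few lines of Go can confirm where it points. This is a minimal sketch; resolveIfSymlink is a hypothetical helper, and the default path is an assumption taken from the layout described above, relative to the source root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveIfSymlink reports whether path is a symbolic link and returns
// the fully resolved path it refers to.
func resolveIfSymlink(path string) (isLink bool, target string, err error) {
	info, err := os.Lstat(path) // Lstat inspects the link, not its target
	if err != nil {
		return false, "", err
	}
	if info.Mode()&os.ModeSymlink == 0 {
		return false, path, nil
	}
	target, err = filepath.EvalSymlinks(path)
	return true, target, err
}

func main() {
	// Assumed location; pass another path as the first argument if needed.
	path := "_output/local/go/src/k8s.io/kubernetes"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	isLink, target, err := resolveIfSymlink(path)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s symlink=%v -> %s\n", path, isLink, target)
}
```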

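Now for the debugging itself. First install delve, which is simple: I used go get github.com/derekparker/delve/cmd/dlv to download and build it in one step. The resulting binary is placed in ~/go/bin/ (that is, $GOPATH/bin). Next, either copy the dlv binary into any directory that is already on $PATH (for example /usr/local/bin), create a symlink to it there, or add its directory to $PATH, for example by running this in the directory that contains dlv: export PATH=$PATH:`pwd` (note that this is not persistent; it is lost when you close the shell and has to be run again). After that, dlv can be run from anywhere.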
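
Whether the PATH change took effect can also be checked from Go, using the same lookup the os/exec package performs when starting a command. A small sketch; findTool is a hypothetical wrapper name, and checking for dlv by default is simply what this post needs.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// findTool returns the full path of an executable found through $PATH,
// or an error if no such executable can be located.
func findTool(name string) (string, error) {
	return exec.LookPath(name)
}

func main() {
	name := "dlv"
	if len(os.Args) > 1 {
		name = os.Args[1]
	}
	path, err := findTool(name)
	if err != nil {
		fmt.Printf("%s is not on PATH: %v\n", name, err)
		return
	}
	fmt.Printf("%s found at %s\n", name, path)
}
```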

First, a look at dlv usage. The subcommands that seem most commonly used so far are attach, exec and debug:

root@linux ~ $ dlv -h
Delve is a source level debugger for Go programs.

Delve enables you to interact with your program by controlling the execution of the process,
evaluating variables, and providing information of thread / goroutine state, CPU register state and more.

The goal of this tool is to provide a simple yet powerful interface for debugging Go programs.

Pass flags to the program you are debugging using `--`, for example:

`dlv exec ./hello -- server --config conf/config.toml`

Usage:
  dlv [command]

Available Commands:
  attach      Attach to running process and begin debugging.
  connect     Connect to a headless debug server.
  core        Examine a core dump.
  debug       Compile and begin debugging main package in current directory, or the package specified.
  exec        Execute a precompiled binary, and begin a debug session.
  help        Help about any command
  run         Deprecated command. Use 'debug' instead.
  test        Compile test binary and begin debugging program.
  trace       Compile and begin tracing program.
  version     Prints version.

A quick debugging session with kube-apiserver:

root@linux bin [release-1.10] $ dlv exec ./kube-apiserver 
Type 'help' for list of commands.
(dlv) h
The following commands are available:
    args ------------------------ Print function arguments.
    break (alias: b) ------------ Sets a breakpoint.
    breakpoints (alias: bp) ----- Print out info for active breakpoints.
    clear ----------------------- Deletes breakpoint.
    clearall -------------------- Deletes multiple breakpoints.
    condition (alias: cond) ----- Set breakpoint condition.
    config ---------------------- Changes configuration parameters.
    continue (alias: c) --------- Run until breakpoint or program termination.
    disassemble (alias: disass) - Disassembler.
    down ------------------------ Move the current frame down.
    exit (alias: quit | q) ------ Exit the debugger.
    frame ----------------------- Set the current frame, or execute command on a different frame.
    funcs ----------------------- Print list of functions.
    goroutine ------------------- Shows or changes current goroutine
    goroutines ------------------ List program goroutines.
    help (alias: h) ------------- Prints the help message.
    list (alias: ls | l) -------- Show source code.
    locals ---------------------- Print local variables.
    next (alias: n) ------------- Step over to next source line.
    on -------------------------- Executes a command when a breakpoint is hit.
    print (alias: p) ------------ Evaluate an expression.
    regs ------------------------ Print contents of CPU registers.
    restart (alias: r) ---------- Restart process.
    set ------------------------- Changes the value of a variable.
    source ---------------------- Executes a file containing a list of delve commands
    sources --------------------- Print list of source files.
    stack (alias: bt) ----------- Print stack trace.
    step (alias: s) ------------- Single step through program.
    step-instruction (alias: si)  Single step a single cpu instruction.
    stepout --------------------- Step out of the current function.
    thread (alias: tr) ---------- Switch to the specified thread.
    threads --------------------- Print out info for every traced thread.
    trace (alias: t) ------------ Set tracepoint.
    types ----------------------- Print list of types
    up -------------------------- Move the current frame up.
    vars ------------------------ Print package variables.
    whatis ---------------------- Prints type of an expression.
Type help followed by a command for full documentation.
(dlv) c
I0626 19:05:42.817137 32499 server.go:135] Version: v1.10.6-beta.0.5+d2b954eaf83afe
I0626 19:05:42.819718 32499 server.go:724] external host was not specified, using 10.0.90.22
W0626 19:05:42.819741 32499 authentication.go:377] AnonymousAuth is not allowed with the AllowAll authorizer. Resetting AnonymousAuth to false. You should use a different authorizer
--etcd-servers must be specified
Process 32499 has exited with status 1
(dlv)

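I did not pass any arguments to kube-apiserver here, so it exited abnormally after c (continue). The goal of this session is to find out why it exited. First, set a breakpoint at the entry point, main.main (this step is optional):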

Process 32499 has exited with status 0
(dlv) r
Process restarted with PID 1223
(dlv) b main.main
Breakpoint 1 set at 0x40b1178 for main.main() /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kube-apiserver/apiserver.go:37
(dlv)

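Next, use the error message "--etcd-servers must be specified" to locate the code that reports it, and set another breakpoint there. Note that the source file path is relative: it is relative to /root/k8s/kubernetes/_output/local/go/src/, where kubernetes is the root of the git clone: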

func (s *EtcdOptions) Validate() []error {
	if s == nil {
		return nil
	}

	allErrors := []error{}
	if len(s.StorageConfig.ServerList) == 0 {
		allErrors = append(allErrors, fmt.Errorf("--etcd-servers must be specified"))
	}
    ...
}
(dlv) b k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/options/etcd.go:86
Breakpoint 2 set at 0x1d359d0 for k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/options.(*EtcdOptions).Validate() /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/options/etcd.go:86
(dlv) c
> main.main() /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kube-apiserver/apiserver.go:37 (hits goroutine(1):1 total:1) (PC: 0x40b1178)
    32:         "k8s.io/kubernetes/cmd/kube-apiserver/app"
    33:         _ "k8s.io/kubernetes/pkg/client/metrics/prometheus" // for client metric registration
    34:         _ "k8s.io/kubernetes/pkg/version/prometheus"        // for version metric registration
    35: )
    36:
=>  37: func main() {
    38:         rand.Seed(time.Now().UTC().UnixNano())
    39:
    40:         command := app.NewAPIServerCommand()
    41:
    42:         // TODO: once we switch everything over to Cobra commands, we can go back to calling
(dlv) c
I0626 19:14:02.763315    1223 server.go:135] Version: v1.10.6-beta.0.5+d2b954eaf83afe
I0626 19:14:02.764537    1223 server.go:724] external host was not specified, using 10.0.90.22
W0626 19:14:02.764556    1223 authentication.go:377] AnonymousAuth is not allowed with the AllowAll authorizer.  Resetting AnonymousAuth to false. You should use a different authorizer
> k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/options.(*EtcdOptions).Validate() /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/options/etcd.go:86 (hits goroutine(1):1 total:1) (PC: 0x1d359d0)
    81:         if s == nil {
    82:                 return nil
    83:         }
    84:
    85:         allErrors := []error{}
=>  86:         if len(s.StorageConfig.ServerList) == 0 {
    87:                 allErrors = append(allErrors, fmt.Errorf("--etcd-servers must be specified"))
    88:         }
    89:
    90:         if !storageTypes.Has(s.StorageConfig.Type) {
    91:                 allErrors = append(allErrors, fmt.Errorf("--storage-backend invalid, must be 'etcd3' or 'etcd2'. If not specified, it will default to 'etcd3'"))
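
The Validate method above follows a common Go pattern: collect every problem into a []error instead of returning at the first one. To practise the same breakpoint workflow without a full k8s build, a stripped-down stand-in is enough. The sketch below is not the k8s code; etcdOptions, serverList and validate are hypothetical names that only mirror the shape of the excerpt.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// etcdOptions is a hypothetical, reduced stand-in for the real options
// type; only the field needed for this one check is kept.
type etcdOptions struct {
	serverList []string
}

// validate gathers all problems into a slice rather than stopping at
// the first, so the caller can report everything at once.
func (o *etcdOptions) validate() []error {
	if o == nil {
		return nil
	}
	var errs []error
	if len(o.serverList) == 0 {
		errs = append(errs, errors.New("--etcd-servers must be specified"))
	}
	return errs
}

func main() {
	opts := &etcdOptions{} // no servers, like starting without arguments
	errs := opts.validate()
	for _, err := range errs {
		fmt.Fprintln(os.Stderr, err)
	}
	if len(errs) == 0 {
		fmt.Println("options are valid")
	}
}
```

Built with -gcflags="-N -l" and started under dlv exec, a breakpoint on the len(o.serverList) check followed by p o.serverList should show the empty slice that triggers the error.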

The bt command prints the whole call stack:

(dlv) bt
 0  0x0000000001d359d0 in k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/options.(*EtcdOptions).Validate
    at /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/options/etcd.go:86
 1  0x0000000003c4e171 in k8s.io/kubernetes/cmd/kube-apiserver/app/options.(*ServerRunOptions).Validate
    at /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kube-apiserver/app/options/validation.go:55
 2  0x00000000040a37d1 in k8s.io/kubernetes/cmd/kube-apiserver/app.CreateKubeAPIServerConfig
    at /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kube-apiserver/app/server.go:295
 3  0x00000000040a1da3 in k8s.io/kubernetes/cmd/kube-apiserver/app.CreateServerChain
    at /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kube-apiserver/app/server.go:152
 4  0x00000000040a1a21 in k8s.io/kubernetes/cmd/kube-apiserver/app.Run
    at /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kube-apiserver/app/server.go:137
 5  0x00000000040ac927 in k8s.io/kubernetes/cmd/kube-apiserver/app.NewAPIServerCommand.func1
    at /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kube-apiserver/app/server.go:121
 6  0x0000000003c5c0b8 in k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    at /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:757
 7  0x0000000003c5cb46 in k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    at /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:843
 8  0x0000000003c5c39f in k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    at /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:791
 9  0x00000000040b12b5 in main.main
    at /root/k8s/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kube-apiserver/apiserver.go:51
10  0x0000000000431964 in runtime.main
    at /usr/lib/golang/src/runtime/proc.go:195
(dlv)

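This was only a quick session, mainly as preparation for digging deeper into the k8s code later. With a debugger in hand, any code path can be worked out: set a breakpoint on the code or function you care about, wait for execution to reach it, and run bt to see the whole call stack. It is quick and takes little effort. Once you have the call stack, you read the source step by step along it, which keeps the analysis from going wrong. This matters most in large projects, where many functions share a name and callbacks are everywhere, so it is easy to get lost; the call stack is effectively a compass for reading the code.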
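
When attaching a debugger is not convenient, a program can also print its own call stack, which gives roughly the same information as bt at one fixed point. The sketch below uses runtime.Callers and runtime.CallersFrames from the standard library; callStack, validate and run are hypothetical names chosen to echo the frames in the trace above.

```go
package main

import (
	"fmt"
	"runtime"
)

// callStack returns the function names on the current goroutine's stack,
// innermost first, beginning with the function that called callStack.
func callStack() []string {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(2, pcs) // skip runtime.Callers and callStack
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

// validate and run only exist to give the stack some depth.
func validate() []string { return callStack() }

func run() []string { return validate() }

func main() {
	for i, name := range run() {
		fmt.Printf("%2d  %s\n", i, name)
	}
}
```

The output lists validate first, then run, then main and the runtime frames beneath it, in the same innermost-first order that bt uses.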

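I also tried dlv attach and dlv debug. Both are simple as well; only the use case differs. attach hooks into an already running process, while debug compiles and debugs directly from Go source. Once the command is running, the session is no different from exec, so I will not paste the transcripts here.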