A Brief Look at Open vSwitch OpenFlow Flow Rules and Operations




Environment

Operating system: CentOS 7.2, 64-bit

Software: tcpdump (tcpdump-4.9.0-5), openvswitch (openvswitch-2.7.2-3), ping (iputils-20160308-10), ip (iproute-3.10.0-54), ifconfig (net-tools-2.0-0.22.20131004git)

Network: an OVS bridge ovs-wp, two network namespaces ns0 and ns1, and two OVS ports wp0 and wp1. wp0 lives in ns0 with IP address 10.0.0.11, wp1 lives in ns1 with IP address 10.0.0.12, and the ovs-wp bridge itself has IP address 10.0.0.1.

 

Goal of the experiment

Understand flow entries such as

cookie=0x0, duration=314.876s, table=0, n_packets=0, n_bytes=0, idle_age=314, priority=1,in_port=100 actions=mod_nw_src:10.0.0.101,NORMAL

cookie=0x98cd36ade228396c, duration=16162.223s, table=71, n_packets=532, n_bytes=22344, idle_age=3, priority=95,arp,reg5=0xf,in_port=15,dl_src=fa:16:3e:0b:4c:49,arp_spa=10.0.70.52 actions=NORMAL

cookie=0x98cd36ade228396c, duration=16162.915s, table=71, n_packets=0, n_bytes=0, idle_age=16162, priority=70,udp,reg5=0x3546,in_port=13638,tp_src=67,tp_dst=68 actions=drop

and the meaning and typical use of each of their fields.

Steps

Create the OVS bridge: ovs-vsctl add-br ovs-wp

Create the OVS ports and add them to the bridge:

There are two ways. The first takes two steps: run ovs-vsctl add-port ovs-wp wp0 to add the wp0 port, then run ovs-vsctl set Interface wp0 type=internal to change the port type to internal. The second does both in a single command: ovs-vsctl add-port ovs-wp wp1 -- set Interface wp1 type=internal

Question 1: why change the port type to internal?

Show the bridge configuration: ovs-vsctl show

Bridge ovs-wp
        Port "wp1"
            Interface "wp1"
                type: internal
        Port ovs-wp
            Interface ovs-wp
                type: internal
        Port "wp0"
            Interface "wp0"
                type: internal

Inspect the bridge with ovs-ofctl: ovs-ofctl show ovs-wp

OFPT_FEATURES_REPLY (xid=0x2): dpid:000036359cfd5f41
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(wp1): addr:00:00:00:00:e0:ec
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 2(wp2): addr:ea:eb:a1:76:cc:3c
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 100(wp0): addr:00:00:00:00:a0:26
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(ovs-wp): addr:36:35:9c:fd:5f:41
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

The number 100 in front of wp0 is its OpenFlow port number.
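Flow rules later in this post match on in_port=100, so it helps to be able to look the number up by port name. The following is a minimal sketch that extracts it from the ovs-ofctl show output; ofport_of is a hypothetical helper name, and the sample text is copied from the listing above rather than read from a live switch.

```shell
# Port lines copied from the `ovs-ofctl show ovs-wp` output above.
show_output=' 1(wp1): addr:00:00:00:00:e0:ec
 2(wp2): addr:ea:eb:a1:76:cc:3c
 100(wp0): addr:00:00:00:00:a0:26
 LOCAL(ovs-wp): addr:36:35:9c:fd:5f:41'

# ofport_of NAME: print the OpenFlow port number shown in front of NAME.
# (hypothetical helper, not part of Open vSwitch)
ofport_of() {
    printf '%s\n' "$show_output" |
        sed -n "s/^ *\([0-9A-Z]*\)($1): addr:.*/\1/p"
}

ofport_of wp0      # prints 100
ofport_of ovs-wp   # prints LOCAL
```

On a live system the same sed filter can be fed from ovs-ofctl show ovs-wp directly, and ovs-vsctl get Interface wp0 ofport should report the same number. A specific number can be requested with ovs-vsctl set Interface wp0 ofport_request=100.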

Create the network namespaces: ip netns add ns0, ip netns add ns1

Move the ports into the namespaces: ip link set wp0 netns ns0, ip link set wp1 netns ns1

Enter each namespace and set the IP address: ip netns exec ns0 bash, then ip addr add 10.0.0.11/24 dev wp0; ip netns exec ns1 bash, then ip addr add 10.0.0.12/24 dev wp1

Question 2: what is a netns, and what is it used for?

Set the IP address of the OVS bridge: ip addr add 10.0.0.1/24 dev ovs-wp

root@host-10-0-80-25 ~ $ ifconfig ovs-wp
ovs-wp: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 1500
        inet 10.0.0.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::3435:9cff:fefd:5f41  prefixlen 64  scopeid 0x20<link>
        ether 36:35:9c:fd:5f:41  txqueuelen 0  (Ethernet)
        RX packets 129  bytes 7266 (7.0 KiB)
        RX errors 0  dropped 27  overruns 0  frame 0
        TX packets 47  bytes 3518 (3.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
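The setup steps so far can be collected into one script. This is a sketch under stated assumptions: run and setup_lab are hypothetical helper names, the script only prints the commands unless DRY_RUN=0 is set, and the ip link set ... up lines are an assumption (the steps above do not show them, although the interfaces appear as UP in the captured output).

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root on a host with Open vSwitch to apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_lab() {
    run ovs-vsctl add-br ovs-wp
    run ip addr add 10.0.0.1/24 dev ovs-wp
    run ip link set ovs-wp up                  # assumption: not shown in the steps above
    i=0
    for ns in ns0 ns1; do
        port="wp$i"
        # Create the internal port and attach it to the bridge in one transaction.
        run ovs-vsctl add-port ovs-wp "$port" -- set Interface "$port" type=internal
        run ip netns add "$ns"
        run ip link set "$port" netns "$ns"
        # wp0 gets 10.0.0.11, wp1 gets 10.0.0.12.
        run ip netns exec "$ns" ip addr add "10.0.0.1$((i + 1))/24" dev "$port"
        run ip netns exec "$ns" ip link set "$port" up   # assumption, see above
        i=$((i + 1))
    done
}

setup_lab
```

Printing first makes it easy to review the exact commands before touching the host network configuration.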

Ping wp1 from the root netns (wp0 behaves the same way) and capture on wp1:

root@host-10-0-80-25 ~ $ tcpdump -i wp1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wp1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:40:56.955935 IP 10.0.0.1 > 10.0.0.12: ICMP echo request, id 8760, seq 1, length 64
14:40:56.955971 IP 10.0.0.12 > 10.0.0.1: ICMP echo reply, id 8760, seq 1, length 64
14:40:57.955861 IP 10.0.0.1 > 10.0.0.12: ICMP echo request, id 8760, seq 2, length 64
14:40:57.955889 IP 10.0.0.12 > 10.0.0.1: ICMP echo reply, id 8760, seq 2, length 64

Dump the flow table of the bridge: ovs-ofctl dump-flows ovs-wp

root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=258387.448s, table=0, n_packets=1359, n_bytes=115974, idle_age=1780, hard_age=65534, priority=0 actions=NORMAL

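Test connectivity from wp0 to wp1 and in the reverse direction. First enter ns0 with ip netns exec ns0 bash, then run ip a (short for ip addr) to see the wp0 virtual NIC and its IP configuration: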

root@host-10-0-80-25 ~ $ ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN 
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
20: wp0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether a6:9e:9d:40:55:fb brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.11/24 scope global wp0
       valid_lft forever preferred_lft forever
    inet6 fe80::a49e:9dff:fe40:55fb/64 scope link 
       valid_lft forever preferred_lft forever
root@host-10-0-80-25 ~ $ ping 10.0.0.12 
PING 10.0.0.12 (10.0.0.12) 56(84) bytes of data.
64 bytes from 10.0.0.12: icmp_seq=1 ttl=64 time=0.490 ms
64 bytes from 10.0.0.12: icmp_seq=2 ttl=64 time=0.038 ms

While the ping is running, use tcpdump in ns1 to watch the packets on wp1:

root@host-10-0-80-25 ~ $ tcpdump -i wp1    
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wp1, link-type EN10MB (Ethernet), capture size 262144 bytes

17:30:50.180109 IP 10.0.0.11 > 10.0.0.12: ICMP echo request, id 28272, seq 1, length 64
17:30:50.180154 IP 10.0.0.12 > 10.0.0.11: ICMP echo reply, id 28272, seq 1, length 64
17:30:51.180847 IP 10.0.0.11 > 10.0.0.12: ICMP echo request, id 28272, seq 2, length 64
17:30:51.180877 IP 10.0.0.12 > 10.0.0.11: ICMP echo reply, id 28272, seq 2, length 64

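By default, pinging the namespace's own address (that of wp0 or wp1) from inside the netns fails, because the lo device is not up. After bringing it up with ifconfig lo up the ping succeeds. Running ip a again shows: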

root@host-10-0-80-25 ~ $ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
20: wp0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether a6:9e:9d:40:55:fb brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.11/24 scope global wp0
       valid_lft forever preferred_lft forever
    inet6 fe80::a49e:9dff:fe40:55fb/64 scope link 
       valid_lft forever preferred_lft forever

While pinging wp0's own address 10.0.0.11 from inside ns0, tcpdump shows the packets on the lo device and none on the wp0 virtual NIC. Pinging 127.0.0.1 behaves the same way:

root@host-10-0-80-25 ~ $ tcpdump -i wp0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wp0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
root@host-10-0-80-25 ~ $ tcpdump -i lo
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
17:47:14.881881 IP 10.0.0.11 > 10.0.0.11: ICMP echo request, id 29373, seq 6, length 64
17:47:14.881899 IP 10.0.0.11 > 10.0.0.11: ICMP echo reply, id 29373, seq 6, length 64
^C
2 packets captured
4 packets received by filter
0 packets dropped by kernel

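Question 3: why must the lo device be up to ping the host's own IP (as opposed to 127.0.0.1 on lo)?

While pinging wp1's address 10.0.0.12 (in ns1) from ns0, dump the bridge's flow table twice. The packet and byte counters increase while the ping runs and stay constant after it stops: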

root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=260195.275s, table=0, n_packets=1849, n_bytes=161978, idle_age=959, hard_age=65534, priority=0 actions=NORMAL
root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=260199.123s, table=0, n_packets=1853, n_bytes=162370, idle_age=0, hard_age=65534, priority=0 actions=NORMAL

n_packets went from 1849 to 1853 and n_bytes from 161978 to 162370, so 4 more packets were counted, and idle_age was reset to 0.
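The counter comparison can be done mechanically. A minimal sketch, assuming the two dumped lines are available as strings; flow_field is a hypothetical helper name, and the sample lines are the two dumps shown above.

```shell
# The two dump-flows lines from the output above.
before=' cookie=0x0, duration=260195.275s, table=0, n_packets=1849, n_bytes=161978, idle_age=959, hard_age=65534, priority=0 actions=NORMAL'
after=' cookie=0x0, duration=260199.123s, table=0, n_packets=1853, n_bytes=162370, idle_age=0, hard_age=65534, priority=0 actions=NORMAL'

# flow_field LINE NAME: print the value of NAME=value in a dumped flow line.
# (hypothetical helper; a space is prepended so the first field also matches)
flow_field() {
    printf ' %s\n' "$1" | sed -n "s/.*[ ,]$2=\([^, ]*\).*/\1/p"
}

packet_delta=$(( $(flow_field "$after" n_packets) - $(flow_field "$before" n_packets) ))
byte_delta=$(( $(flow_field "$after" n_bytes) - $(flow_field "$before" n_bytes) ))

echo "packets: +$packet_delta, bytes: +$byte_delta, bytes per packet: $(( byte_delta / packet_delta ))"
```

The 98 bytes per packet are consistent with the 84-byte IP packets that ping reports plus a 14-byte Ethernet header, and 4 packets would correspond to two echo requests and two replies.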

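Question 4: what do n_packets, n_bytes, idle_age and hard_age mean?

Working with flow rules

Add a rule that drops ICMP packets sent by wp0, so that wp0 can no longer ping wp1:

Method 1: match on wp0's MAC address:

ovs-ofctl add-flow ovs-wp "dl_src=a6:9e:9d:40:55:fb, dl_type=0x0800, nw_proto=1, actions=drop"

List the flow table after adding the rule: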

root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=72.153s, table=0, n_packets=0, n_bytes=0, idle_age=72, icmp,dl_src=a6:9e:9d:40:55:fb actions=drop
 cookie=0x0, duration=320712.513s, table=0, n_packets=1873, n_bytes=164106, idle_age=60505, hard_age=65534, priority=0 actions=NORMAL

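The new rule is listed. No priority was given, so it received ovs-ofctl's default priority of 32768 (which dump-flows does not print) and takes precedence over the priority=0 NORMAL rule. Enter ns0 and run ping 10.0.0.12; it no longer gets through: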

root@host-10-0-80-25 ~ $ ip netns exec ns0 bash
root@host-10-0-80-25 ~ $ ip a
1: lo: <LOOPBACK,PROMISC,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
20: wp0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether a6:9e:9d:40:55:fb brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.11/24 scope global wp0
       valid_lft forever preferred_lft forever
    inet6 fe80::a49e:9dff:fe40:55fb/64 scope link 
       valid_lft forever preferred_lft forever
root@host-10-0-80-25 ~ $ ping 10.0.0.12
PING 10.0.0.12 (10.0.0.12) 56(84) bytes of data.
^C
--- 10.0.0.12 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

Question 5: what do dl_src, dl_type, nw_proto and actions mean?

Method 2: match on the OpenFlow port number (after deleting the rule from method 1):

root@host-10-0-80-25 ~ $ ovs-ofctl --strict del-flows ovs-wp "dl_src=a6:9e:9d:40:55:fb, dl_type=0x0800, nw_proto=1"
root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=320972.028s, table=0, n_packets=1875, n_bytes=164190, idle_age=163, hard_age=65534, priority=0 actions=NORMAL
root@host-10-0-80-25 ~ $ ovs-ofctl add-flow ovs-wp "in_port=100, dl_type=0x0800, nw_proto=1, actions=drop"             
root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp      
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=3.230s, table=0, n_packets=0, n_bytes=0, idle_age=3, icmp,in_port=100 actions=drop
 cookie=0x0, duration=321049.505s, table=0, n_packets=1875, n_bytes=164190, idle_age=240, hard_age=65534, priority=0 actions=NORMAL

Add a rule that rewrites the source IP address of packets arriving from wp0 to 1.2.3.4, and use it to see what the priority field does:

# run in the root netns
root@host-10-0-80-25 ~ $ ovs-ofctl add-flow ovs-wp "table=0, in_port=100, priority=1, actions=mod_nw_src:1.2.3.4,normal"     
root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1.126s, table=0, n_packets=0, n_bytes=0, idle_age=1, priority=1,in_port=100 actions=mod_nw_src:1.2.3.4,NORMAL
 cookie=0x0, duration=321475.439s, table=0, n_packets=1891, n_bytes=165422, idle_age=381, hard_age=65534, priority=0 actions=NORMAL
# run in ns0
root@host-10-0-80-25 ~ $ ping 10.0.0.12
PING 10.0.0.12 (10.0.0.12) 56(84) bytes of data.
^C
--- 10.0.0.12 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms
# run in ns1
root@host-10-0-80-25 ~ $ tcpdump -i wp1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wp1, link-type EN10MB (Ethernet), capture size 262144 bytes
10:52:23.955783 IP 1.2.3.4 > 10.0.0.12: ICMP echo request, id 27149, seq 1, length 64
10:52:24.954834 IP 1.2.3.4 > 10.0.0.12: ICMP echo request, id 27149, seq 2, length 64
10:52:25.954850 IP 1.2.3.4 > 10.0.0.12: ICMP echo request, id 27149, seq 3, length 64
10:52:26.954831 IP 1.2.3.4 > 10.0.0.12: ICMP echo request, id 27149, seq 4, length 64
10:52:27.954880 IP 1.2.3.4 > 10.0.0.12: ICMP echo request, id 27149, seq 5, length 64
10:52:28.963968 ARP, Request who-has 10.0.0.12 tell 10.0.0.11, length 28
10:52:28.963996 ARP, Reply 10.0.0.12 is-at 4a:eb:8d:f7:b8:aa (oui Unknown), length 28
^C
7 packets captured
7 packets received by filter
0 packets dropped by kernel

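The echo requests reach wp1 with source address 1.2.3.4, but no reply comes back to ns0, presumably because ns1 has no route to 1.2.3.4, so the ping reports 100% loss. Now add a rule with a larger priority number that rewrites the source IP to 2.3.4.5: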

root@host-10-0-80-25 ~ $ ovs-ofctl add-flow ovs-wp "table=0, in_port=100, priority=10, actions=mod_nw_src:2.3.4.5,normal"
root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=13384.251s, table=0, n_packets=6, n_bytes=532, idle_age=13349, priority=1,in_port=100 actions=mod_nw_src:1.2.3.4,NORMAL
 cookie=0x0, duration=2.782s, table=0, n_packets=0, n_bytes=0, idle_age=2, priority=10,in_port=100 actions=mod_nw_src:2.3.4.5,NORMAL
 cookie=0x0, duration=334858.564s, table=0, n_packets=1892, n_bytes=165464, idle_age=13349, hard_age=65534, priority=0 actions=NORMAL

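A larger priority number means a higher priority. The earlier priority=1 rule is still in the table, but packets from port 100 now match the newly added priority=10 rule first, so the source IP is rewritten to 2.3.4.5: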

root@host-10-0-80-25 ~ $ tcpdump -i wp1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wp1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:35:32.454862 IP 2.3.4.5 > 10.0.0.12: ICMP echo request, id 8372, seq 4, length 64
14:35:33.454858 IP 2.3.4.5 > 10.0.0.12: ICMP echo request, id 8372, seq 5, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
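The selection by priority can be modelled in a few lines. This is a simplified sketch, not how ovs-vswitchd classifies packets: it assumes every listed rule matches the packet (true here for a packet arriving on port 100, since the priority=0 rule matches everything) and simply picks the largest priority. winning_actions is a hypothetical helper name; the sample lines are the dump shown above.

```shell
# The three rules from the dump-flows output above.
flows=' cookie=0x0, duration=13384.251s, table=0, n_packets=6, n_bytes=532, idle_age=13349, priority=1,in_port=100 actions=mod_nw_src:1.2.3.4,NORMAL
 cookie=0x0, duration=2.782s, table=0, n_packets=0, n_bytes=0, idle_age=2, priority=10,in_port=100 actions=mod_nw_src:2.3.4.5,NORMAL
 cookie=0x0, duration=334858.564s, table=0, n_packets=1892, n_bytes=165464, idle_age=13349, hard_age=65534, priority=0 actions=NORMAL'

# winning_actions: print the priority and action list of the highest-priority rule.
winning_actions() {
    printf '%s\n' "$flows" | awk '
        {
            prio = 32768                        # ovs-ofctl default when priority is omitted
            if (match($0, /priority=[0-9]+/))
                prio = substr($0, RSTART + 9, RLENGTH - 9) + 0
            sub(/.* actions=/, "")              # keep only the action list
            if (NR == 1 || prio > best) { best = prio; actions = $0 }
        }
        END { print best, actions }'
}

winning_actions   # prints: 10 mod_nw_src:2.3.4.5,NORMAL
```

The 32768 fallback also explains the earlier drop rule: it was added without a priority, so it outranked the priority=0 NORMAL rule.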

Ping a public IP such as 114.114.114.114 from ns0 or ns1:

root@host-10-0-80-25 ~ $ ping 114.114.114.114
connect: Network is unreachable

ip r shows the routing table; there is no default route:

root@host-10-0-80-25 ~ $ ip r
10.0.0.0/24 dev wp1  proto kernel  scope link  src 10.0.0.12

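Adding a default route does not help. The likely reason is that the ovs-wp bridge has no uplink NIC attached and no related routing is configured, so it behaves like an isolated LAN with no path to the public network.

I then added eth0 to ovs-wp, set the MAC address of ovs-wp to that of eth0, changed eth0's MAC to something else, and ran dhclient -v ovs-wp, which obtained the IP address previously held by eth0 and restored the default route. After that, pinging external addresses works from the root netns but still fails from ns1: tcpdump shows the packets reaching ovs-wp and never coming back, and nothing appears on eth0. After adding a flow rule that forwards everything arriving on the wp1 port to the port of eth0, packets do show up on eth0, but they still do not leave the host (when pinging another host on the same subnet, tcpdump on that host sees nothing). I have not found the cause and have nobody nearby to ask, so this problem stays open for now.

There are still a few flow-entry fields whose meaning I was unsure about:

Question 6: what do cookie, table and duration mean?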

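Answers to the questions

Question 1: why change the port type to internal?

Only an internal port can be given an IP address. For an internal port OVS itself creates a virtual network device on the host, which can then be configured like any other NIC; without type=internal, add-port expects an existing network device with that name.

For more details see: http://www.isjian.com/openstack/openstack-base-use-openvswitch/#port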

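Question 2: what is a netns, and what is it used for?

A network namespace is one kind of Linux namespace. Its main purpose is network isolation: each namespace has its own network devices and its own routing table.

More on network namespaces: https://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/

Introduction to Linux namespaces: https://coolshell.cn/articles/17010.html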

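Question 3: why must the lo device be up to ping the host's own IP (as opposed to 127.0.0.1 on lo)?

All packets that stay within the local host, including those addressed to the host's own non-loopback addresses, are handled through the lo device.

Reference: http://blog.csdn.net/xie0812/article/details/32075613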

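Question 4: what do n_packets, n_bytes, idle_age and hard_age mean?

n_packets, n_bytes: the number of packets and bytes that have matched this rule.

idle_age: how long it has been since a packet last matched this rule, in seconds.

hard_age: how long it has been since this rule was created or last modified, in seconds.

Reference: http://www.openvswitch.org//support/dist-docs/ovs-ofctl.8.txt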

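Question 5: what do dl_src, dl_type, nw_proto and actions mean?

dl_src: datalink source, i.e. the source MAC address; dl_dst is correspondingly the destination MAC address.

dl_type: datalink type, i.e. the Ethernet type of the frame, for example 0x0800 for IPv4.

nw_proto: network protocol, i.e. the IP protocol number when dl_type is 0x0800, for example 1 for ICMP.

actions: the actions the rule performs on matching packets. There are many kinds; see the link below.

For details see: https://www.ibm.com/developerworks/cn/cloud/library/1401_zhaoyi_openswitch/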
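The rule added with dl_type=0x0800, nw_proto=1 was listed by dump-flows as icmp, because ovs-ofctl has shorthand keywords for common dl_type/nw_proto combinations. The sketch below spells out a few of them as a lookup; expand_proto is a hypothetical helper name, and the table covers only the keywords that appear in this post plus tcp.

```shell
# expand_proto KEYWORD: print the dl_type/nw_proto match that an
# ovs-ofctl shorthand keyword stands for (hypothetical helper).
expand_proto() {
    case "$1" in
        ip)   echo "dl_type=0x0800" ;;
        arp)  echo "dl_type=0x0806" ;;
        icmp) echo "dl_type=0x0800,nw_proto=1" ;;
        tcp)  echo "dl_type=0x0800,nw_proto=6" ;;
        udp)  echo "dl_type=0x0800,nw_proto=17" ;;
        *)    echo "unknown keyword: $1" >&2; return 1 ;;
    esac
}

expand_proto icmp   # prints dl_type=0x0800,nw_proto=1
expand_proto arp    # prints dl_type=0x0806
```

So the drop rule could equally have been written as "icmp, dl_src=a6:9e:9d:40:55:fb, actions=drop".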

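Question 6: what do cookie, table and duration mean?

cookie=0x98cd36ade228396c: a 64-bit integer. Giving rules the same cookie value marks them as belonging to the same batch or category.

table: the flow table number. Tables can be used to layer rules: a packet is matched against table 0 first, and the actions can then send it on to another table for further matching (table 81 in the example below):

 cookie=0x98cd36ade228396c, duration=16162.671s, table=0, n_packets=0, n_bytes=0, idle_age=16162, priority=90,dl_dst=fa:16:3e:2b:f5:40 actions=load:0x3547->NXM_NX_REG5[],load:0x2->NXM_NX_REG6[],resubmit(,81)

duration=secs: how long ago the rule was created.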
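To see how cookie and table partition a flow table, the dumped entries can be grouped by those two fields. A minimal sketch, assuming the fields appear in the order shown in this post (cookie first, table third); count_by_cookie_table is a hypothetical helper name, and the sample lines are the four example flows quoted in this post.

```shell
# Example flow entries quoted earlier in this post.
flows='cookie=0x0, duration=314.876s, table=0, n_packets=0, n_bytes=0, idle_age=314, priority=1,in_port=100 actions=mod_nw_src:10.0.0.101,NORMAL
cookie=0x98cd36ade228396c, duration=16162.223s, table=71, n_packets=532, n_bytes=22344, idle_age=3, priority=95,arp,reg5=0xf,in_port=15,dl_src=fa:16:3e:0b:4c:49,arp_spa=10.0.70.52 actions=NORMAL
cookie=0x98cd36ade228396c, duration=16162.915s, table=71, n_packets=0, n_bytes=0, idle_age=16162, priority=70,udp,reg5=0x3546,in_port=13638,tp_src=67,tp_dst=68 actions=drop
cookie=0x98cd36ade228396c, duration=16162.671s, table=0, n_packets=0, n_bytes=0, idle_age=16162, priority=90,dl_dst=fa:16:3e:2b:f5:40 actions=load:0x3547->NXM_NX_REG5[],load:0x2->NXM_NX_REG6[],resubmit(,81)'

# count_by_cookie_table: print "COUNT cookie=... table=..." per group.
count_by_cookie_table() {
    printf '%s\n' "$flows" |
        awk -F', ' '{ count[$1 " " $3]++ }
                    END { for (k in count) print count[k], k }' |
        sort
}

count_by_cookie_table
```

On a live bridge, ovs-ofctl can also filter by cookie itself, for example ovs-ofctl dump-flows ovs-wp "cookie=0x98cd36ade228396c/-1", which is convenient for listing or deleting one batch of rules.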

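One last remark: restarting the ovs-vswitchd process causes all flow rules to be lost, which was a hard problem that took the OpenStack Neutron project a long time to solve.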
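One conceivable way to carry rules across a restart is to dump them beforehand and add them back afterwards. The sketch below only shows the text-processing half: it strips the header and the statistics fields from dump-flows output so that each line is a plain rule description again. strip_stats is a hypothetical helper name, this is not how Neutron handles the problem, and whether the result can be fed to ovs-ofctl add-flows unchanged should be verified on the OVS version in use.

```shell
# strip_stats: read `ovs-ofctl dump-flows` output on stdin and print each
# rule without the reply header and without per-flow statistics.
# (hypothetical helper)
strip_stats() {
    sed -e '/^NXST_FLOW reply/d' \
        -e 's/ *duration=[^,]*,//' \
        -e 's/ *n_packets=[^,]*,//' \
        -e 's/ *n_bytes=[^,]*,//' \
        -e 's/ *idle_age=[^,]*,//' \
        -e 's/ *hard_age=[^,]*,//' \
        -e 's/^ *//'
}

# Sample input copied from one of the dumps earlier in this post.
printf '%s\n' \
    'NXST_FLOW reply (xid=0x4):' \
    ' cookie=0x0, duration=1.126s, table=0, n_packets=0, n_bytes=0, idle_age=1, priority=1,in_port=100 actions=mod_nw_src:1.2.3.4,NORMAL' \
    ' cookie=0x0, duration=321475.439s, table=0, n_packets=1891, n_bytes=165422, idle_age=381, hard_age=65534, priority=0 actions=NORMAL' |
    strip_stats
```

On a live system the intended use would be along the lines of ovs-ofctl dump-flows ovs-wp | strip_stats > flows.txt before the restart and ovs-ofctl add-flows ovs-wp flows.txt afterwards.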

References: