Environment
Operating system: CentOS 7.2 64-bit
Software: tcpdump (tcpdump-4.9.0-5), openvswitch (openvswitch-2.7.2-3), ping (iputils-20160308-10), ip (iproute-3.10.0-54), ifconfig (net-tools-2.0-0.22.20131004git)
Network: OVS bridge ovs-wp, network namespaces ns0 and ns1, OVS ports wp0 and wp1. wp0 is in ns0 with IP address 10.0.0.11, wp1 is in ns1 with IP address 10.0.0.12, and the ovs-wp bridge has IP address 10.0.0.1.
Goal
Understand flow entries like the following,
cookie=0x0, duration=314.876s, table=0, n_packets=0, n_bytes=0, idle_age=314, priority=1,in_port=100 actions=mod_nw_src:10.0.0.101,NORMAL
cookie=0x98cd36ade228396c, duration=16162.223s, table=71, n_packets=532, n_bytes=22344, idle_age=3, priority=95,arp,reg5=0xf,in_port=15,dl_src=fa:16:3e:0b:4c:49,arp_spa=10.0.70.52 actions=NORMAL
cookie=0x98cd36ade228396c, duration=16162.915s, table=71, n_packets=0, n_bytes=0, idle_age=16162, priority=70,udp,reg5=0x3546,in_port=13638,tp_src=67,tp_dst=68 actions=drop
in particular what each field means and when it is used.
Steps
Create the OVS bridge: ovs-vsctl add-br ovs-wp
Create the OVS ports and add them to the bridge:
There are two ways. The two-step way: run ovs-vsctl add-port ovs-wp wp0 to add port wp0, then run ovs-vsctl set Interface wp0 type=internal to change the port type to internal. The one-step way: ovs-vsctl add-port ovs-wp wp1 -- set Interface wp1 type=internal
Question 1: Why change the port type to internal?
Show the OVS bridge information: ovs-vsctl show

    Bridge ovs-wp
        Port "wp1"
            Interface "wp1"
                type: internal
        Port ovs-wp
            Interface ovs-wp
                type: internal
        Port "wp0"
            Interface "wp0"
                type: internal
Inspect the bridge with ovs-ofctl: ovs-ofctl show ovs-wp

    OFPT_FEATURES_REPLY (xid=0x2): dpid:000036359cfd5f41
    n_tables:254, n_buffers:0
    capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
    actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
     1(wp1): addr:00:00:00:00:e0:ec
         config: PORT_DOWN
         state: LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     2(wp2): addr:ea:eb:a1:76:cc:3c
         config: PORT_DOWN
         state: LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     100(wp0): addr:00:00:00:00:a0:26
         config: PORT_DOWN
         state: LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     LOCAL(ovs-wp): addr:36:35:9c:fd:5f:41
         config: 0
         state: 0
         speed: 0 Mbps now, 0 Mbps max
    OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

The number 100 in front of wp0 is its OpenFlow port number.
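To look up a port number without reading the whole listing, the output can be filtered. The helper below is only a sketch: ofport_of is a name made up for this example, and it assumes the "N(name): addr:..." line layout seen in the ovs-ofctl show output.

```shell
# ofport_of NAME: read `ovs-ofctl show` output on stdin and print the
# OpenFlow port number of NAME (hypothetical helper, not part of OVS).
ofport_of() {
  sed -n "s/^ *\([0-9A-Za-z]*\)($1): addr:.*/\1/p"
}

# Sample lines copied from the ovs-ofctl show output in this article.
sample=' 1(wp1): addr:00:00:00:00:e0:ec
 100(wp0): addr:00:00:00:00:a0:26
 LOCAL(ovs-wp): addr:36:35:9c:fd:5f:41'

printf '%s\n' "$sample" | ofport_of wp0      # prints 100
printf '%s\n' "$sample" | ofport_of ovs-wp   # prints LOCAL
```

On a live system the same helper would be fed with ovs-ofctl show ovs-wp | ofport_of wp0. ovs-vsctl get Interface wp0 ofport should report the same number.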
Create the network namespaces: ip netns add ns0, ip netns add ns1
Move the ports into the namespaces: ip link set wp0 netns ns0, ip link set wp1 netns ns1
Enter each netns and set the IP address: ip netns exec ns0 bash, then ip addr add 10.0.0.11/24 dev wp0; ip netns exec ns1 bash, then ip addr add 10.0.0.12/24 dev wp1
Question 2: What is a netns, and what can it be used for?
Set the IP address of the OVS bridge: ip addr add 10.0.0.1/24 dev ovs-wp
    root@host-10-0-80-25 ~ $ ifconfig ovs-wp
    ovs-wp: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
            inet 10.0.0.1 netmask 255.255.255.0 broadcast 0.0.0.0
            inet6 fe80::3435:9cff:fefd:5f41 prefixlen 64 scopeid 0x20<link>
            ether 36:35:9c:fd:5f:41 txqueuelen 0 (Ethernet)
            RX packets 129 bytes 7266 (7.0 KiB)
            RX errors 0 dropped 27 overruns 0 frame 0
            TX packets 47 bytes 3518 (3.4 KiB)
            TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
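The setup steps so far can be collected into one script. This is a sketch under assumptions, not a tested installer: run, setup_lab and DRY_RUN are names invented here, and the two "ip link set ... up" lines are an assumption, since the article does not show how the devices were brought up. With DRY_RUN set, the commands are only printed, so the plan can be reviewed without root or OVS.

```shell
# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# setup_lab: bridge ovs-wp, internal ports wp0/wp1, namespaces ns0/ns1.
setup_lab() {
  run ovs-vsctl add-br ovs-wp
  for i in 0 1; do
    run ovs-vsctl add-port ovs-wp "wp$i" -- set Interface "wp$i" type=internal
    run ip netns add "ns$i"
    run ip link set "wp$i" netns "ns$i"
    # wp0 gets 10.0.0.11, wp1 gets 10.0.0.12
    run ip netns exec "ns$i" ip addr add "10.0.0.1$((i + 1))/24" dev "wp$i"
    run ip netns exec "ns$i" ip link set "wp$i" up   # assumption, see above
  done
  run ip addr add 10.0.0.1/24 dev ovs-wp
  run ip link set ovs-wp up                          # assumption, see above
}

# Print the plan without touching the system.
DRY_RUN=1
setup_lab
```

Unsetting DRY_RUN and running setup_lab as root would execute the same commands for real.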
Ping wp1 from the root netns (wp0 is similar) and watch the traffic with tcpdump on wp1:

    root@host-10-0-80-25 ~ $ tcpdump -i wp1
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on wp1, link-type EN10MB (Ethernet), capture size 262144 bytes
    14:40:56.955935 IP 10.0.0.1 > 10.0.0.12: ICMP echo request, id 8760, seq 1, length 64
    14:40:56.955971 IP 10.0.0.12 > 10.0.0.1: ICMP echo reply, id 8760, seq 1, length 64
    14:40:57.955861 IP 10.0.0.1 > 10.0.0.12: ICMP echo request, id 8760, seq 2, length 64
    14:40:57.955889 IP 10.0.0.12 > 10.0.0.1: ICMP echo reply, id 8760, seq 2, length 64
Show the flow table of the bridge: ovs-ofctl dump-flows ovs-wp

    root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=258387.448s, table=0, n_packets=1359, n_bytes=115974, idle_age=1780, hard_age=65534, priority=0 actions=NORMAL
Test connectivity from wp0 to wp1 and in the reverse direction. First enter ns0 with ip netns exec ns0 bash, then run ip a (short for ip addr) to see the wp0 virtual NIC and its IP information:

    root@host-10-0-80-25 ~ $ ip a
    1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    20: wp0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
        link/ether a6:9e:9d:40:55:fb brd ff:ff:ff:ff:ff:ff
        inet 10.0.0.11/24 scope global wp0
           valid_lft forever preferred_lft forever
        inet6 fe80::a49e:9dff:fe40:55fb/64 scope link
           valid_lft forever preferred_lft forever

Then ping wp1:

    root@host-10-0-80-25 ~ $ ping 10.0.0.12
    PING 10.0.0.12 (10.0.0.12) 56(84) bytes of data.
    64 bytes from 10.0.0.12: icmp_seq=1 ttl=64 time=0.490 ms
    64 bytes from 10.0.0.12: icmp_seq=2 ttl=64 time=0.038 ms
While the ping is running, use tcpdump in ns1 to watch the packets on wp1:

    root@host-10-0-80-25 ~ $ tcpdump -i wp1
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on wp1, link-type EN10MB (Ethernet), capture size 262144 bytes
    17:30:50.180109 IP 10.0.0.11 > 10.0.0.12: ICMP echo request, id 28272, seq 1, length 64
    17:30:50.180154 IP 10.0.0.12 > 10.0.0.11: ICMP echo reply, id 28272, seq 1, length 64
    17:30:51.180847 IP 10.0.0.11 > 10.0.0.12: ICMP echo request, id 28272, seq 2, length 64
    17:30:51.180877 IP 10.0.0.12 > 10.0.0.11: ICMP echo reply, id 28272, seq 2, length 64
By default, pinging wp0's or wp1's own IP from inside its netns fails, because the lo device is not up. After bringing it up with ifconfig lo up, the ping succeeds. Run ip a again to check the network information:

    root@host-10-0-80-25 ~ $ ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    20: wp0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
        link/ether a6:9e:9d:40:55:fb brd ff:ff:ff:ff:ff:ff
        inet 10.0.0.11/24 scope global wp0
           valid_lft forever preferred_lft forever
        inet6 fe80::a49e:9dff:fe40:55fb/64 scope link
           valid_lft forever preferred_lft forever
While pinging wp0's own address 10.0.0.11 inside ns0, tcpdump shows the packets on the lo device and none on the wp0 virtual NIC. Pinging 127.0.0.1 behaves the same way:

    root@host-10-0-80-25 ~ $ tcpdump -i wp0
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on wp0, link-type EN10MB (Ethernet), capture size 262144 bytes
    ^C
    0 packets captured
    0 packets received by filter
    0 packets dropped by kernel
    root@host-10-0-80-25 ~ $ tcpdump -i lo
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
    17:47:14.881881 IP 10.0.0.11 > 10.0.0.11: ICMP echo request, id 29373, seq 6, length 64
    17:47:14.881899 IP 10.0.0.11 > 10.0.0.11: ICMP echo reply, id 29373, seq 6, length 64
    ^C
    2 packets captured
    4 packets received by filter
    0 packets dropped by kernel

Question 3: Why does pinging your own IP (not 127.0.0.1 on the lo device) require the lo device to be up?
While pinging wp1's address 10.0.0.12 (in ns1) from ns0, dump the bridge flow table twice. The packet and byte counters increase during the ping and stop changing once it ends:

    root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=260195.275s, table=0, n_packets=1849, n_bytes=161978, idle_age=959, hard_age=65534, priority=0 actions=NORMAL
    root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=260199.123s, table=0, n_packets=1853, n_bytes=162370, idle_age=0, hard_age=65534, priority=0 actions=NORMAL

n_packets went from 1849 to 1853 and n_bytes from 161978 to 162370: 4 more packets, and idle_age was reset to 0.
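The counter comparison can also be done mechanically. A small sketch, assuming the comma- and space-separated "name=value" layout of the dump-flows lines in this article; flow_field is a hypothetical helper name, not an OVS command.

```shell
# flow_field NAME: read one dump-flows line on stdin, print the value of NAME.
flow_field() {
  tr ', ' '\n\n' | sed -n "s/^$1=//p"
}

# The two dump-flows lines from the ping test.
before='cookie=0x0, duration=260195.275s, table=0, n_packets=1849, n_bytes=161978, idle_age=959, hard_age=65534, priority=0 actions=NORMAL'
after='cookie=0x0, duration=260199.123s, table=0, n_packets=1853, n_bytes=162370, idle_age=0, hard_age=65534, priority=0 actions=NORMAL'

p0=$(printf '%s\n' "$before" | flow_field n_packets)
p1=$(printf '%s\n' "$after" | flow_field n_packets)
b0=$(printf '%s\n' "$before" | flow_field n_bytes)
b1=$(printf '%s\n' "$after" | flow_field n_bytes)

echo "packets: $((p1 - p0))"   # packets: 4
echo "bytes: $((b1 - b0))"     # bytes: 392
```

392 bytes over 4 packets is 98 bytes each, which matches the 84-byte IP packet reported by ping plus a 14-byte Ethernet header.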
Question 4: What do n_packets, n_bytes, idle_age and hard_age mean?
Working with flow rules
Add a rule that drops ICMP packets coming from wp0, so that wp0 can no longer ping wp1:
Method 1: match on wp0's MAC address:

    ovs-ofctl add-flow ovs-wp "dl_src=a6:9e:9d:40:55:fb, dl_type=0x0800, nw_proto=1, actions=drop"

Check the flow table after adding the rule:

    root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=72.153s, table=0, n_packets=0, n_bytes=0, idle_age=72, icmp,dl_src=a6:9e:9d:40:55:fb actions=drop
     cookie=0x0, duration=320712.513s, table=0, n_packets=1873, n_bytes=164106, idle_age=60505, hard_age=65534, priority=0 actions=NORMAL
The new rule is listed. Enter ns0 and run ping 10.0.0.12; it no longer gets through:

    root@host-10-0-80-25 ~ $ ip netns exec ns0 bash
    root@host-10-0-80-25 ~ $ ip a
    1: lo: <LOOPBACK,PROMISC,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    20: wp0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
        link/ether a6:9e:9d:40:55:fb brd ff:ff:ff:ff:ff:ff
        inet 10.0.0.11/24 scope global wp0
           valid_lft forever preferred_lft forever
        inet6 fe80::a49e:9dff:fe40:55fb/64 scope link
           valid_lft forever preferred_lft forever
    root@host-10-0-80-25 ~ $ ping 10.0.0.12
    PING 10.0.0.12 (10.0.0.12) 56(84) bytes of data.
    ^C
    --- 10.0.0.12 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms

Question 5: What do dl_src, dl_type, nw_proto and actions mean?
Method 2: match on the OpenFlow port number (delete the Method 1 rule first):

    root@host-10-0-80-25 ~ $ ovs-ofctl --strict del-flows ovs-wp "dl_src=a6:9e:9d:40:55:fb, dl_type=0x0800, nw_proto=1"
    root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=320972.028s, table=0, n_packets=1875, n_bytes=164190, idle_age=163, hard_age=65534, priority=0 actions=NORMAL
    root@host-10-0-80-25 ~ $ ovs-ofctl add-flow ovs-wp "in_port=100, dl_type=0x0800, nw_proto=1, actions=drop"
    root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=3.230s, table=0, n_packets=0, n_bytes=0, idle_age=3, icmp,in_port=100 actions=drop
     cookie=0x0, duration=321049.505s, table=0, n_packets=1875, n_bytes=164190, idle_age=240, hard_age=65534, priority=0 actions=NORMAL
Add a rule that rewrites the source IP of packets from wp0 to wp1 to 1.2.3.4, and use it to see what the priority field does:

    # run in the root netns
    root@host-10-0-80-25 ~ $ ovs-ofctl add-flow ovs-wp "table=0, in_port=100, priority=1, actions=mod_nw_src:1.2.3.4,normal"
    root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=1.126s, table=0, n_packets=0, n_bytes=0, idle_age=1, priority=1,in_port=100 actions=mod_nw_src:1.2.3.4,NORMAL
     cookie=0x0, duration=321475.439s, table=0, n_packets=1891, n_bytes=165422, idle_age=381, hard_age=65534, priority=0 actions=NORMAL

    # run in ns0
    root@host-10-0-80-25 ~ $ ping 10.0.0.12
    PING 10.0.0.12 (10.0.0.12) 56(84) bytes of data.
    ^C
    --- 10.0.0.12 ping statistics ---
    5 packets transmitted, 0 received, 100% packet loss, time 3999ms

    # run in ns1
    root@host-10-0-80-25 ~ $ tcpdump -i wp1
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on wp1, link-type EN10MB (Ethernet), capture size 262144 bytes
    10:52:23.955783 IP 1.2.3.4 > 10.0.0.12: ICMP echo request, id 27149, seq 1, length 64
    10:52:24.954834 IP 1.2.3.4 > 10.0.0.12: ICMP echo request, id 27149, seq 2, length 64
    10:52:25.954850 IP 1.2.3.4 > 10.0.0.12: ICMP echo request, id 27149, seq 3, length 64
    10:52:26.954831 IP 1.2.3.4 > 10.0.0.12: ICMP echo request, id 27149, seq 4, length 64
    10:52:27.954880 IP 1.2.3.4 > 10.0.0.12: ICMP echo request, id 27149, seq 5, length 64
    10:52:28.963968 ARP, Request who-has 10.0.0.12 tell 10.0.0.11, length 28
    10:52:28.963996 ARP, Reply 10.0.0.12 is-at 4a:eb:8d:f7:b8:aa (oui Unknown), length 28
    ^C
    7 packets captured
    7 packets received by filter
    0 packets dropped by kernel
The requests arrive at wp1 with source address 1.2.3.4, but no reply comes back, presumably because ns1 only has a route for 10.0.0.0/24 and cannot reach 1.2.3.4. Now add a second rule with a numerically larger priority that rewrites the source IP to 2.3.4.5:

    root@host-10-0-80-25 ~ $ ovs-ofctl add-flow ovs-wp "table=0, in_port=100, priority=10, actions=mod_nw_src:2.3.4.5,normal"
    root@host-10-0-80-25 ~ $ ovs-ofctl dump-flows ovs-wp
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=13384.251s, table=0, n_packets=6, n_bytes=532, idle_age=13349, priority=1,in_port=100 actions=mod_nw_src:1.2.3.4,NORMAL
     cookie=0x0, duration=2.782s, table=0, n_packets=0, n_bytes=0, idle_age=2, priority=10,in_port=100 actions=mod_nw_src:2.3.4.5,NORMAL
     cookie=0x0, duration=334858.564s, table=0, n_packets=1892, n_bytes=165464, idle_age=13349, hard_age=65534, priority=0 actions=NORMAL
A larger priority number means a higher priority. The rule added later with priority=10 takes precedence over the earlier priority=1 rule, so the source IP is now rewritten to 2.3.4.5:

    root@host-10-0-80-25 ~ $ tcpdump -i wp1
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on wp1, link-type EN10MB (Ethernet), capture size 262144 bytes
    14:35:32.454862 IP 2.3.4.5 > 10.0.0.12: ICMP echo request, id 8372, seq 4, length 64
    14:35:33.454858 IP 2.3.4.5 > 10.0.0.12: ICMP echo request, id 8372, seq 5, length 64
    ^C
    2 packets captured
    2 packets received by filter
    0 packets dropped by kernel
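The selection rule can be mimicked on the dump output itself. The sketch below assumes that every line fed to it already matches the packet in question (here: IP traffic arriving on in_port=100) and simply keeps the line with the largest priority. highest_priority_flow is a made-up name, and 32768 is the default that the ovs-ofctl manual gives for rules added without a priority.

```shell
# highest_priority_flow: read candidate dump-flows lines on stdin and print
# the one with the largest priority (first one wins on a tie).
highest_priority_flow() {
  awk '{
    p = 32768                                  # default when priority is omitted
    if (match($0, /priority=[0-9]+/)) { p = substr($0, RSTART + 9, RLENGTH - 9) + 0 }
    if (best == "" || p > bestp) { best = $0; bestp = p }
  } END { print best }'
}

# The three rules from the dump taken after adding the priority=10 rule.
flows=' cookie=0x0, duration=13384.251s, table=0, n_packets=6, n_bytes=532, idle_age=13349, priority=1,in_port=100 actions=mod_nw_src:1.2.3.4,NORMAL
 cookie=0x0, duration=2.782s, table=0, n_packets=0, n_bytes=0, idle_age=2, priority=10,in_port=100 actions=mod_nw_src:2.3.4.5,NORMAL
 cookie=0x0, duration=334858.564s, table=0, n_packets=1892, n_bytes=165464, idle_age=13349, hard_age=65534, priority=0 actions=NORMAL'

printf '%s\n' "$flows" | highest_priority_flow   # prints the priority=10 line
```

This only models the ordering by priority. The real switch also evaluates the match fields of each rule, which the sketch leaves out.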
Ping a public IP such as 114.114.114.114 from ns0 or ns1:

    root@host-10-0-80-25 ~ $ ping 114.114.114.114
    connect: Network is unreachable

ip r shows the routing table; there is no default route:

    root@host-10-0-80-25 ~ $ ip r
    10.0.0.0/24 dev wp1 proto kernel scope link src 10.0.0.12
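Adding a default route does not help. The likely cause is that the ovs-wp bridge has no uplink NIC attached and no related routing configured, so it behaves like an isolated LAN with no path to the public network.
I then added eth0 to ovs-wp, gave ovs-wp the MAC address of eth0, changed eth0's MAC to something else, and ran dhclient -v ovs-wp to obtain the IP address that eth0 had before. The default route came back as well. Pinging an external address from the root netns then works, but it still fails from ns1: tcpdump shows the packets reaching ovs-wp and nothing coming back, and no packets appear on eth0.
After adding a flow rule that forwards everything arriving on the wp1 port to the port eth0 is attached to, packets do show up on eth0, but they still do not leave the host (when pinging another host on the same subnet, tcpdump on that host sees nothing arrive). At this point I could not find the cause and had nobody nearby to ask, so this problem stays open for now.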
There are still a few flow rule fields whose meaning I was not sure about:
Question 6: What do cookie, table and duration mean?
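Thoughts on the questions
Question 1: Why change the port type to internal?
An internal port makes OVS create a virtual network device of the same name on the host, and that device can be given an IP address. A port of the default type has to refer to a network device that already exists.
For more, see: http://www.isjian.com/openstack/openstack-base-use-openvswitch/#port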
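Question 2: What is a netns, and what can it be used for?
A network namespace is one kind of Linux namespace. Its main purpose is network isolation: each namespace has its own network devices and its own routing table.
More on network namespaces: https://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/
Introduction to Linux namespaces: https://coolshell.cn/articles/17010.html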
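Question 3: Why does pinging your own IP (not 127.0.0.1 on the lo device) require the lo device to be up?
Every packet that stays inside the local host is handled by the lo device, which is why tcpdump saw the echo request and reply on lo and nothing on wp0.
Reference: http://blog.csdn.net/xie0812/article/details/32075613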
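Question 4: What do n_packets, n_bytes, idle_age and hard_age mean?
n_packets, n_bytes: the number of packets and bytes that have matched this rule.
idle_age: how long no packet has passed through this rule, in seconds.
hard_age: time since this rule was created or last modified, in seconds. ovs-ofctl only prints it when it differs from the integer part of duration.
Reference: http://www.openvswitch.org//support/dist-docs/ovs-ofctl.8.txt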
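Question 5: What do dl_src, dl_type, nw_proto and actions mean?
dl_src: datalink source, i.e. the source MAC address; the corresponding dl_dst is the destination MAC address.
dl_type: datalink type, i.e. the Ethernet protocol type of the frame; 0x0800 means IPv4.
nw_proto: network protocol, i.e. the IP protocol number; 1 means ICMP.
actions: what the rule does with a matching packet. There are many actions; see the link below.
For details, see: https://www.ibm.com/developerworks/cn/cloud/library/1401_zhaoyi_openswitch/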
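The dumps above print icmp where the rule was added as dl_type=0x0800, nw_proto=1; ovs-ofctl shows such combinations as protocol keywords. The lookup below is a sketch of that correspondence for a few common keywords, following the ovs-ofctl manual. expand_proto is a hypothetical name used only here.

```shell
# expand_proto KEYWORD: print the dl_type/nw_proto pair a keyword stands for.
expand_proto() {
  case "$1" in
    ip)   echo "dl_type=0x0800" ;;
    arp)  echo "dl_type=0x0806" ;;
    icmp) echo "dl_type=0x0800,nw_proto=1" ;;
    tcp)  echo "dl_type=0x0800,nw_proto=6" ;;
    udp)  echo "dl_type=0x0800,nw_proto=17" ;;
    *)    echo "unknown keyword: $1" >&2; return 1 ;;
  esac
}

expand_proto icmp   # dl_type=0x0800,nw_proto=1
expand_proto udp    # dl_type=0x0800,nw_proto=17
```

Read this way, the udp,...,tp_src=67,tp_dst=68 rule from the goal section matches IPv4 UDP packets from port 67 to port 68.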
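Question 6: What do cookie, table and duration mean?
cookie=0x98cd36ade228396c: a 64-bit integer. Rules that share a cookie value can be treated as one batch or one class of rules.
table: the flow table number. Tables can be used to layer rules: a packet is matched in table 0 first, and the actions can then send it on to be matched in another table (table 81 in this example):

    cookie=0x98cd36ade228396c, duration=16162.671s, table=0, n_packets=0, n_bytes=0, idle_age=16162, priority=90,dl_dst=fa:16:3e:2b:f5:40 actions=load:0x3547->NXM_NX_REG5[],load:0x2->NXM_NX_REG6[],resubmit(,81)

duration=secs: how long the rule has existed, in seconds.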
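To illustrate how a cookie groups rules, the sketch below filters saved dump-flows text by cookie value. flows_with_cookie is a name invented for this example; the sample lines are the three flow entries from the goal section.

```shell
# flows_with_cookie VALUE: keep only the dump-flows lines carrying that cookie.
flows_with_cookie() {
  grep -F "cookie=$1,"
}

dump=' cookie=0x0, duration=314.876s, table=0, n_packets=0, n_bytes=0, idle_age=314, priority=1,in_port=100 actions=mod_nw_src:10.0.0.101,NORMAL
 cookie=0x98cd36ade228396c, duration=16162.223s, table=71, n_packets=532, n_bytes=22344, idle_age=3, priority=95,arp,reg5=0xf,in_port=15,dl_src=fa:16:3e:0b:4c:49,arp_spa=10.0.70.52 actions=NORMAL
 cookie=0x98cd36ade228396c, duration=16162.915s, table=71, n_packets=0, n_bytes=0, idle_age=16162, priority=70,udp,reg5=0x3546,in_port=13638,tp_src=67,tp_dst=68 actions=drop'

printf '%s\n' "$dump" | flows_with_cookie 0x98cd36ade228396c | wc -l   # 2 rules
```

ovs-ofctl can do this filtering itself: dump-flows and del-flows accept a cookie=value/mask match, for example ovs-ofctl dump-flows ovs-wp "cookie=0x98cd36ade228396c/-1".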
One last note: restarting the ovs-vswitchd process causes all flow rules to be lost, a hard problem that took the OpenStack neutron project a long time to solve.
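One conceivable workaround for a small lab like this one is to save the rules before a restart and load them again afterwards. The sketch below covers only the text-processing half: it strips the statistics fields from dump-flows output so that only the rule definitions remain. flows_for_replay is a hypothetical helper, and replaying its output with ovs-ofctl add-flows is an assumption that has not been verified on this setup.

```shell
# flows_for_replay: read `ovs-ofctl dump-flows` output on stdin and print the
# rules without the reply header and without per-rule statistics.
flows_for_replay() {
  sed -e '/NXST_FLOW reply/d' \
      -e 's/ duration=[^,]*,//' \
      -e 's/ n_packets=[^,]*,//' \
      -e 's/ n_bytes=[^,]*,//' \
      -e 's/ idle_age=[^,]*,//' \
      -e 's/ hard_age=[^,]*,//' \
      -e 's/^ *//'
}

# Sample taken from the priority test earlier in this article.
dump='NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1.126s, table=0, n_packets=0, n_bytes=0, idle_age=1, priority=1,in_port=100 actions=mod_nw_src:1.2.3.4,NORMAL
 cookie=0x0, duration=321475.439s, table=0, n_packets=1891, n_bytes=165422, idle_age=381, hard_age=65534, priority=0 actions=NORMAL'

printf '%s\n' "$dump" | flows_for_replay
# cookie=0x0, table=0, priority=1,in_port=100 actions=mod_nw_src:1.2.3.4,NORMAL
# cookie=0x0, table=0, priority=0 actions=NORMAL

# Possible use on a live system (untested assumption):
#   ovs-ofctl dump-flows ovs-wp | flows_for_replay > flows.txt
#   ovs-ofctl add-flows ovs-wp flows.txt
```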
References: