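The company plans to run compatibility tests of its business software on XC servers. To support this, we wanted to create some virtual machines with the KVM virtualization that ships with the operating system. During configuration we found that the virtual machines could not communicate with the host and could not reach the external network. The following is a brief record of the fault analysis.
Server:
2 × Phytium S2500, 128 cores, 1 TB RAM
Operating system:
# Version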
Kylin Linux Advanced Server release V10 (Sword)
# Kernel
Linux localhost.localdomain 4.19.90-24.4.v2101.ky10.aarch64 #1 SMP Mon May 24 14:45:37 CST 2021 aarch64 aarch64 aarch64 GNU/Linux
Network service:
NetworkManager.service
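NIC plan:
ens2f0: business network 1, used for external access
ens2f2: business network 2, the NIC that backs the bridge
Kylin V10 manages networking with NetworkManager, so the bridge was configured with nmcli as follows: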
# nmcli connection add type bridge con-name br0 ifname br0
# nmcli connection modify ens2f2 master br0
# nmcli connection modify br0 ipv4.addresses '10.110.136.42/22'
# nmcli connection modify br0 ipv4.gateway '10.110.139.254'
# nmcli connection modify br0 ipv4.dns '114.114.114.114'
# nmcli connection modify br0 ipv4.method manual
# nmcli connection show
# nmcli connection up br0
# brctl show
# ip link show master br0
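Note: the network on ens2f2 is interrupted while these commands are applied.
When the virtual machine used a bridged NIC, the following problem was observed:
When the virtual machine used a NAT NIC:
We compared the VM configuration with that of a working x86 server and adjusted it to match. The problem persisted; this cause is not ruled out for now.
We rebuilt the bridge several times, both through NetworkManager and through configuration files. The problem persisted.
Key references: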
https://blog.csdn.net/qq_28903377/article/details/121035000
https://guo-sj.github.io/kvm/2022/04/28/qemu-kvm-installation.html
The host routing table is as follows:
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.110.139.254  0.0.0.0         UG    100    0        0 ens2f0
0.0.0.0         10.110.139.254  0.0.0.0         UG    425    0        0 br0
10.110.136.0    0.0.0.0         255.255.252.0   U     100    0        0 ens2f0
10.110.136.0    0.0.0.0         255.255.252.0   U     425    0        0 br0
192.168.101.0   0.0.0.0         255.255.255.0   U     102    0        0 ens2f3
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0
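The table above contains the same destinations twice, once on ens2f0 and once on br0. As a minimal sketch for spotting this kind of overlap, the hypothetical helper below (`find_overlapping_routes` is not part of any tool, just a name chosen here) reads `route -n` output on stdin and prints every destination/netmask pair that is reachable through more than one interface.

```shell
# Hypothetical helper: report destinations that appear on more than one
# interface in `route -n` output supplied on stdin.
find_overlapping_routes() {
    awk '
        # Data rows start with a dotted-quad destination, which skips
        # the two header lines printed by route.
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            key = $1 "/" $3                 # destination plus netmask
            if (count[key]++ == 0) {
                order[++n] = key            # remember first-seen order
                ifaces[key] = $8
            } else {
                ifaces[key] = ifaces[key] "," $8
            }
        }
        END {
            for (i = 1; i <= n; i++)
                if (count[order[i]] > 1)
                    print order[i], ifaces[order[i]]
        }
    '
}

# Usage on the host:
#   route -n | find_overlapping_routes
```

For the table in this note, the helper would list the default route and 10.110.136.0/255.255.252.0, each with `ens2f0,br0`.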
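Inspecting the host ARP table shows that the entries for the VMs are "incomplete" and that they are bound to the business NIC ens2f0 rather than to the planned bridge br0, for example: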
# arp -an
? (10.110.136.47) at <incomplete> on ens2f0
? (10.110.136.48) at <incomplete> on ens2f0
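To make this check repeatable, a small filter can pull the unresolved entries and the interface they are attached to out of `arp -an`. This is only a sketch; `list_incomplete_arp` is a hypothetical name, and it assumes the line layout shown above.

```shell
# Hypothetical helper: print "IP interface" for every unresolved
# neighbour in `arp -an` output supplied on stdin.
list_incomplete_arp() {
    awk '$4 == "<incomplete>" {
        ip = $2
        gsub(/[()]/, "", ip)    # strip the parentheses around the address
        print ip, $NF           # the interface name is the last field
    }'
}

# Usage on the host:
#   arp -an | list_incomplete_arp
```

Any VM address that shows up here against ens2f0 instead of br0 indicates that the host is trying to resolve it on the wrong interface.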
Test connectivity from each interface to the gateway:
# ping -I ens2f0 10.110.139.254 -c 3
# ping -I br0 10.110.139.254 -c 3
Result: the business NIC ens2f0 can ping the gateway; the bridge br0 cannot.
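The same per-interface test can be wrapped in a loop when more interfaces need to be compared. The function below is a sketch with a hypothetical name (`probe_gateway`); it relies only on the `-I`, `-c` and `-W` options of the iputils `ping` and reports one line per interface.

```shell
# Hypothetical helper: ping a gateway through each listed interface and
# report whether it answered.
# Usage: probe_gateway <gateway> <interface>...
probe_gateway() {
    gw=$1
    shift
    for ifc in "$@"; do
        # 3 probes, 2 second reply timeout, output discarded
        if ping -I "$ifc" -c 3 -W 2 "$gw" >/dev/null 2>&1; then
            echo "$ifc: reachable"
        else
            echo "$ifc: unreachable"
        fi
    done
}

# Usage on the host:
#   probe_gateway 10.110.139.254 ens2f0 br0
```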
For hosts with multiple NICs, the OS documentation gives the following description:
It is not recommended to have two interfaces in the same subnet, as both interfaces cannot use the default gateway.
In the above scenario, the default gateway can only be assigned to one interface at a time, preference is usually assigned to the lower numbered interface.
This is due to the default behavior of ARP in Linux. When a request is made externally by someone pinging the host on the secondary IP, the gateway will ARP and subsequently ask what interface holds the IP it is requesting.
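The best solution is therefore to change the plan: use a single NIC for business traffic and build the bridge on that same NIC. After testing and verification, the fault was resolved.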
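As a rough illustration of the revised plan, the sketch below prints the nmcli commands for putting the bridge on the single business NIC instead of running them, because applying them drops the network on that NIC. The function name `plan_single_nic_bridge` is hypothetical, the address, gateway and DNS values are simply the ones used earlier in this note, and it assumes the NIC's connection profile is named after the interface, as in the commands above.

```shell
# Hypothetical helper: print (do not execute) the nmcli commands that
# move the IP configuration onto a bridge built on one NIC.
# Usage: plan_single_nic_bridge <nic> <bridge> <addr/prefix> <gateway> <dns>
plan_single_nic_bridge() {
    nic=$1 br=$2 addr=$3 gw=$4 dns=$5
    cat <<EOF
nmcli connection add type bridge con-name $br ifname $br
nmcli connection modify $br ipv4.addresses '$addr' ipv4.gateway '$gw' ipv4.dns '$dns' ipv4.method manual
nmcli connection modify $nic master $br
nmcli connection up $br
nmcli connection up $nic
EOF
}

# Review the plan first, then run it from a console session:
#   plan_single_nic_bridge ens2f0 br0 10.110.136.42/22 10.110.139.254 114.114.114.114
```

Printing the commands first makes it possible to review them and run them from a local console rather than over the network that is about to be reconfigured.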
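A brief summary (the commands below set reverse-path filtering to loose mode, for all interfaces and for br0):
# echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter
# echo 2 > /proc/sys/net/ipv4/conf/br0/rp_filter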
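For reference, `rp_filter` accepts 0 (no source validation), 1 (strict) and 2 (loose), and the kernel applies the larger of the `all` value and the per-interface value. The helpers below are a small sketch of that rule with hypothetical names; they only interpret values and change nothing on the system.

```shell
# Hypothetical helper: the value the kernel applies to an interface is
# the larger of conf/all/rp_filter and conf/<interface>/rp_filter.
# Usage: effective_rp_filter <all_value> <interface_value>
effective_rp_filter() {
    all=$1 ifc=$2
    if [ "$all" -gt "$ifc" ]; then
        echo "$all"
    else
        echo "$ifc"
    fi
}

# Hypothetical helper: translate an rp_filter value into its mode name.
rp_filter_mode() {
    case $1 in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# Usage on the host:
#   all=$(cat /proc/sys/net/ipv4/conf/all/rp_filter)
#   br=$(cat /proc/sys/net/ipv4/conf/br0/rp_filter)
#   rp_filter_mode "$(effective_rp_filter "$all" "$br")"
```

Values written with `echo` into `/proc/sys` do not survive a reboot; a file under `/etc/sysctl.d/` containing `net.ipv4.conf.all.rp_filter = 2` would be the usual way to persist the setting, if it is wanted at all.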