- A Tungsten Fabric Primer
- Components in Tungsten Fabric
- Orchestrator integration
- More on Installation
- Monitoring integration
- Day 2 operation
- Troubleshooting Tips
- Appendix
Let me briefly describe what I have learnt in this two-year journey around Tungsten Fabric. https://tungsten.io/
There are a lot of good SDN/neutron/CNI implementations, so why try yet another one is a fair question. AFAIK, Tungsten Fabric has two key differentiators which make it unique.
Although there are a lot of technologies that make linux software a good candidate for a production router/switch, ASICs are still a vital part of this industry. To interoperate with them, an SDN platform needs some routing protocol, such as bgp or ovsdb.
To make things more complex, many service providers and cloud providers use VRFs to terminate and separate each customer's network connection, which makes stitching routers and SDNs together a complex task.
- Mostly, vlans can be used between them, but the termination points at the SDN platform can become a bottleneck
- Moreover, each SDN termination point (similar to network nodes in openstack) needs separate configuration per customer, which makes configuration more complex
Tungsten Fabric resolves this issue with the help of a mature MP-BGP implementation, which allows each VRF on the routers to send packets directly to the vRouters that serve each customer's application. Horizontal scaling of compute nodes with separate networks per customer is therefore driven by the control plane rather than the data plane, which makes it much more intuitive.
Since packets are sent directly from the routers to the vRouters, there is no need for network nodes, which makes Tungsten Fabric much more scalable in terms of data plane.
Moreover, from a control plane perspective, it has a notable feature named route target filtering (https://tools.ietf.org/html/rfc4684).
- This feature is standard in MP-BGP, and other routers support it as well
- It means that if no vRouter has a VRF with a given route-target, the control plane drops prefixes with that route-target when they are received
Since in a cloud service each customer uses only a limited part of the cloud provider's DCs, and different customers use different route-targets, vRouters and controllers don't need to know all the prefixes. Route target filtering makes that behavior possible and dramatically reduces the number of prefixes each vRouter (and each controller, if RRs are used between them) needs to take care of, which makes the control plane much more scalable.
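To illustrate the idea with a toy sketch (plain Python, not Tungsten Fabric code; the prefixes and route-target values are made up), a node keeps only the prefixes whose route-target somebody local actually imports:
# Toy illustration of route target filtering:
# keep only the VPN prefixes whose route-target is imported by
# at least one locally connected vRouter / VRF.
received_vpn_prefixes = [
    {"prefix": "10.0.1.0/24", "route_target": "target:64512:8000001"},
    {"prefix": "10.0.2.0/24", "route_target": "target:64512:8000002"},
    {"prefix": "10.0.3.0/24", "route_target": "target:64512:8000003"},
]

# route-targets advertised by locally connected vRouters
# (they correspond to the VRFs those vRouters actually have)
imported_route_targets = {"target:64512:8000001", "target:64512:8000003"}

kept = [p for p in received_vpn_prefixes
        if p["route_target"] in imported_route_targets]
print(kept)  # the prefix with target:64512:8000002 is dropped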
Combining these with other features like security-policy and network-policy/logical-router (similar to VPC peering or a transit-gateway in AWS), I think it is a good candidate for VPC infrastructure (similar to AWS/Azure/GCP VPC/vnet) in both the private cloud and the managed cloud world, and that makes it an interesting platform which is worth a try.
To try Tungsten Fabric for the first time, I recommend using ansible-deployer (https://github.com/Juniper/contrail-ansible-deployer), even if you're already familiar with other CNI implementations, since Tungsten Fabric uses several tools which are not in vanilla linux. So I would recommend first trying a setup which is known to work well, to see what's new, and after that integrating other systems.
Unfortunately, many Tungsten Fabric repos are similar to rawhide and in some cases have broken dependencies.
So I picked one combination which I think almost always works and is stable enough to try most features.
To try this, you need two servers, one for the k8s master and the other for the k8s node. The k8s master needs at least 2 vCPUs, 8GB mem and 8GB disk. The k8s node needs 1 vCPU, 4GB mem and 8GB disk.
- I personally always use ami-3185744e (CentOS7.5, login-id: centos) in ap-northeast-1 region, with t2.large size
- Note: from R1910 and later, ami-06a46da680048c8ae or 3.10.0-1062.el7.x86_64 can be used instead
- Since, in my impression, openstack and vCenter integration with Tungsten Fabric is much more complex than the kubernetes one, I recommend trying this setup first, even if you don't need container support
- For installation, internet connection is required
## all the commands are typed at k8s master node
sudo yum -y remove PyYAML python-requests
sudo yum -y install git python3
sudo pip3 install -U pip
sudo pip3 install PyYAML requests ansible\<2.9
ssh-keygen
cd .ssh/
cat id_rsa.pub >> authorized_keys
ssh-copy-id root@(k8s node's ip) ## or manually register id_rsa.pub to authorized_keys
cd
git clone -b R2011 http://github.com/tungstenfabric/tf-ansible-deployer
cd tf-ansible-deployer
vi config/instances.yaml
(replace contents with this)
provider_config:
  bms:
    ssh_user: root
    ssh_public_key: /root/.ssh/id_rsa.pub
    ssh_private_key: /root/.ssh/id_rsa
    domainsuffix: local
    ntpserver: 0.centos.pool.ntp.org
instances:
  bms1:
    provider: bms
    roles:
      config_database:
      config:
      control:
      analytics:
      webui:
      k8s_master:
      kubemanager:
    ip: 172.31.14.47 ## k8s master's ip
  bms2:
    provider: bms
    roles:
      vrouter:
      k8s_node:
    ip: 172.31.41.236 ## k8s node's ip
contrail_configuration:
  CONTRAIL_CONTAINER_TAG: R2011-latest
  JVM_EXTRA_OPTS: "-Xms128m -Xmx2g"
global_configuration:
  CONTAINER_REGISTRY: tungstenfabric
  K8S_VERSION: 1.19.11
ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/configure_instances.yml
- it takes about 10 minutes
ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_k8s.yml
- it takes about 5 minutes
ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_contrail.yml
- it takes about 20 minutes
One point to take care of is that using a supported kernel version is a fairly strict requirement, since Tungsten Fabric uses its own kernel module (vrouter.ko) for its data plane. I tried CentOS 7.5-7.9, Ubuntu Xenial and Bionic and they work well, but if this is your first try, I recommend that specific AMI id, since debugging what's not working is not an easy task.
If all the playbooks finished well, you can first type
contrail-status
which checks whether everything is ok.
[root@ip-172-31-14-47 contrail-ansible-deployer]# contrail-status
Pod Service Original Name State Status
redis contrail-external-redis running Up 5 minutes
analytics alarm-gen contrail-analytics-alarm-gen running Up 2 minutes
analytics api contrail-analytics-api running Up 2 minutes
analytics collector contrail-analytics-collector running Up 2 minutes
analytics nodemgr contrail-nodemgr running Up 2 minutes
analytics query-engine contrail-analytics-query-engine running Up 2 minutes
analytics snmp-collector contrail-analytics-snmp-collector running Up 2 minutes
analytics topology contrail-analytics-topology running Up 2 minutes
config api contrail-controller-config-api running Up 4 minutes
config device-manager contrail-controller-config-devicemgr running Up 3 minutes
config nodemgr contrail-nodemgr running Up 4 minutes
config schema contrail-controller-config-schema running Up 4 minutes
config svc-monitor contrail-controller-config-svcmonitor running Up 4 minutes
config-database cassandra contrail-external-cassandra running Up 4 minutes
config-database nodemgr contrail-nodemgr running Up 4 minutes
config-database rabbitmq contrail-external-rabbitmq running Up 4 minutes
config-database zookeeper contrail-external-zookeeper running Up 4 minutes
control control contrail-controller-control-control running Up 3 minutes
control dns contrail-controller-control-dns running Up 3 minutes
control named contrail-controller-control-named running Up 3 minutes
control nodemgr contrail-nodemgr running Up 3 minutes
database cassandra contrail-external-cassandra running Up 2 minutes
database kafka contrail-external-kafka running Up 2 minutes
database nodemgr contrail-nodemgr running Up 2 minutes
database zookeeper contrail-external-zookeeper running Up 2 minutes
kubernetes kube-manager contrail-kubernetes-kube-manager running Up About a minute
webui job contrail-controller-webui-job running Up 3 minutes
webui web contrail-controller-webui-web running Up 3 minutes
WARNING: container with original name 'contrail-external-redis' have Pod or Service empty. Pod: '' / Service: 'redis'. Please pass NODE_TYPE with pod name to container's env
== Contrail control ==
control: active
nodemgr: active
named: active
dns: active
== Contrail config-database ==
nodemgr: initializing (Disk for DB is too low. )
zookeeper: active
rabbitmq: active
cassandra: active
== Contrail kubernetes ==
kube-manager: active
== Contrail database ==
kafka: active
nodemgr: initializing (Disk for DB is too low. )
zookeeper: active
cassandra: active
== Contrail analytics ==
snmp-collector: active
query-engine: active
api: active
alarm-gen: active
nodemgr: active
collector: active
topology: active
== Contrail webui ==
web: active
job: active
== Contrail config ==
svc-monitor: active
nodemgr: active
device-manager: active
api: active
schema: active
[root@ip-172-31-14-47 contrail-ansible-deployer]#
[root@ip-172-31-41-236 ~]# contrail-status
Pod Service Original Name State Status
vrouter agent contrail-vrouter-agent running Up 52 seconds
vrouter nodemgr contrail-nodemgr running Up 52 seconds
vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active
[root@ip-172-31-41-236 ~]#
It should show most components in the 'active' state, except for
nodemgr: initializing (Disk for DB is too low.)
which you can safely ignore in a demo setup.
Note: it basically indicates that / usage is over 50%, which is an important issue for cassandra.
If everything is ok, you can try these commands to see the status of the Tungsten Fabric routing tables.
pip install lxml prettytable
git clone https://github.com/vcheny/contrail-introspect-cli.git
## or curl -O https://raw.githubusercontent.com/vcheny/contrail-introspect-cli/master/ist.py
./contrail-introspect-cli/ist.py ctr status
./contrail-introspect-cli/ist.py ctr nei ## similar to 'show bgp summary'
./contrail-introspect-cli/ist.py ctr route summary ## similar to 'show route summary'
./contrail-introspect-cli/ist.py ctr route tables ## show routing-tables
./contrail-introspect-cli/ist.py ctr route show ## similar to 'show route'
[root@ip-172-31-14-47 contrail-ansible-deployer]# ./contrail-introspect-cli/ist.py ctr status
module_id: contrail-control
state: Functional
description
+-----------+-----------+---------------------+--------+----------------------------------+
| type | name | server_addrs | status | description |
+-----------+-----------+---------------------+--------+----------------------------------+
| Collector | n/a | 172.31.14.47:8086 | Up | Established |
| Database | Cassandra | 172.31.14.47:9041 | Up | Established Cassandra connection |
| Database | RabbitMQ | 172.31.14.47:5673 | Up | RabbitMQ connection established |
+-----------+-----------+---------------------+--------+----------------------------------+
[root@ip-172-31-14-47 contrail-ansible-deployer]# ./contrail-introspect-cli/ist.py ctr nei
+--------------------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+--------------------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-41-236.ap- | 172.31.41.236 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
| northeast-1.compute.internal | | | | | | | | |
+--------------------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-14-47 contrail-ansible-deployer]# ./contrail-introspect-cli/ist.py ctr route summary
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 3 | 3 | 1 | 2 | 0 |
| fabric.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-pod-network | 3 | 3 | 1 | 2 | 0 |
| :k8s-default-pod-network.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-service- | 3 | 3 | 1 | 2 | 0 |
| network:k8s-default-service-network.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-14-47 contrail-ansible-deployer]# ./contrail-introspect-cli/ist.py ctr route tables
name: default-domain:default-project:__link_local__:__link_local__.inet.0
name: default-domain:default-project:default-virtual-network:default-virtual-network.inet.0
name: inet.0
name: default-domain:default-project:ip-fabric:ip-fabric.inet.0
name: default-domain:k8s-default:k8s-default-pod-network:k8s-default-pod-network.inet.0
name: default-domain:k8s-default:k8s-default-service-network:k8s-default-service-network.inet.0
[root@ip-172-31-14-47 contrail-ansible-deployer]# ./contrail-introspect-cli/ist.py ctr route show
bgp.ermvpn.0: 6 destinations, 6 routes (0 primary, 6 secondary, 0 infeasible)
1-172.31.41.236:1-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:02:26.545449, last_modified: 2019-Apr-13 01:41:18.023211
[Local|None] age: 0:02:26.548569, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
1-172.31.41.236:2-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:01:09.096721, last_modified: 2019-Apr-13 01:42:35.471939
[Local|None] age: 0:01:09.100272, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
1-172.31.41.236:3-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:00:41.812247, last_modified: 2019-Apr-13 01:43:02.756413
[Local|None] age: 0:00:41.816037, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
2-172.31.41.236:1-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:02:26.544851, last_modified: 2019-Apr-13 01:41:18.023809
[Local|None] age: 0:02:26.548875, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
2-172.31.41.236:2-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:01:09.096567, last_modified: 2019-Apr-13 01:42:35.472093
[Local|None] age: 0:01:09.100828, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
2-172.31.41.236:3-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:00:41.812032, last_modified: 2019-Apr-13 01:43:02.756628
[Local|None] age: 0:00:41.816542, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
bgp.evpn.0: 3 destinations, 3 routes (0 primary, 3 secondary, 0 infeasible)
2-172.31.41.236:1-0-0e:92:cc:bd:aa:08,0.0.0.0, age: 0:02:26.545224, last_modified: 2019-Apr-13 01:41:18.023436
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.550028, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'mpls-o-gre', 'udp'], label: 20, AS path: None
2-172.31.41.236:1-0-0e:92:cc:bd:aa:08,172.31.41.236, age: 0:02:26.545271, last_modified: 2019-Apr-13 01:41:18.023389
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.550313, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'mpls-o-gre', 'udp'], label: 20, AS path: None
3-172.31.41.236:1-2-172.31.41.236, age: 0:02:26.545365, last_modified: 2019-Apr-13 01:41:18.023295
[Local|None] age: 0:02:26.550656, localpref: 100, nh: 172.31.41.236, encap: ['vxlan'], label: 2, AS path: None
bgp.l3vpn.0: 3 destinations, 3 routes (0 primary, 3 secondary, 0 infeasible)
172.31.41.236:1:172.31.41.236/32, age: 0:02:26.545019, last_modified: 2019-Apr-13 01:41:18.023641
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.550608, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp', 'native'], label: 16, AS path: None
172.31.41.236:2:10.47.255.252/32, age: 0:00:41.733374, last_modified: 2019-Apr-13 01:43:02.835286
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:00:41.739187, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 25, AS path: None
172.31.41.236:3:10.96.0.10/32, age: 0:00:41.732905, last_modified: 2019-Apr-13 01:43:02.835755
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:00:41.738945, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 25, AS path: None
bgp.rtarget.0: 7 destinations, 7 routes (7 primary, 0 secondary, 0 infeasible)
64512:target:64512:8000001, age: 0:02:26.592101, last_modified: 2019-Apr-13 01:41:17.976559
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.598445, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
64512:target:64512:8000002, age: 0:02:26.592073, last_modified: 2019-Apr-13 01:41:17.976587
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.598626, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
64512:target:64512:8000003, age: 0:02:26.592051, last_modified: 2019-Apr-13 01:41:17.976609
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.598800, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
64512:target:172.31.14.47:0, age: 0:05:09.194543, last_modified: 2019-Apr-13 01:38:35.374117
[Local|None] age: 0:05:09.201488, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
64512:target:172.31.14.47:1, age: 0:02:26.592028, last_modified: 2019-Apr-13 01:41:17.976632
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.599168, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
64512:target:172.31.14.47:4, age: 0:01:09.099898, last_modified: 2019-Apr-13 01:42:35.468762
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:01:09.107253, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
64512:target:172.31.14.47:5, age: 0:00:41.824049, last_modified: 2019-Apr-13 01:43:02.744611
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:00:41.831612, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
default-domain:default-project:ip-fabric:ip-fabric.ermvpn.0: 3 destinations, 3 routes (3 primary, 0 secondary, 0 infeasible)
0-172.31.41.236:1-0.0.0.0,255.255.255.255,0.0.0.0, age: 0:02:26.544896, last_modified: 2019-Apr-13 01:41:18.023764
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.552710, localpref: 100, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 0, AS path: None
1-0:0-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:02:26.545544, last_modified: 2019-Apr-13 01:41:18.023116
[Local|None] age: 0:02:26.553571, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
2-0:0-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:02:26.544992, last_modified: 2019-Apr-13 01:41:18.023668
[Local|None] age: 0:02:26.553215, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
default-domain:default-project:ip-fabric:ip-fabric.evpn.0: 4 destinations, 4 routes (4 primary, 0 secondary, 0 infeasible)
2-0:0-0-0e:92:cc:bd:aa:08,0.0.0.0, age: 0:02:26.545298, last_modified: 2019-Apr-13 01:41:18.023362
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.553810, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'mpls-o-gre', 'udp'], label: 20, AS path: None
2-0:0-0-0e:92:cc:bd:aa:08,172.31.41.236, age: 0:02:26.545318, last_modified: 2019-Apr-13 01:41:18.023342
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.554076, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'mpls-o-gre', 'udp'], label: 20, AS path: None
2-172.31.41.236:1-2-ff:ff:ff:ff:ff:ff,0.0.0.0, age: 0:02:26.545486, last_modified: 2019-Apr-13 01:41:18.023174
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.554476, localpref: 100, nh: 172.31.41.236, encap: ['vxlan'], label: 2, AS path: None
3-172.31.41.236:1-2-172.31.41.236, age: 0:02:26.545411, last_modified: 2019-Apr-13 01:41:18.023249
[Local|None] age: 0:02:26.554614, localpref: 100, nh: 172.31.41.236, encap: ['vxlan'], label: 2, AS path: None
default-domain:default-project:ip-fabric:ip-fabric.inet.0: 3 destinations, 3 routes (1 primary, 2 secondary, 0 infeasible)
10.47.255.252/32, age: 0:00:41.733312, last_modified: 2019-Apr-13 01:43:02.835348
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:00:41.742801, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 25, AS path: None
10.96.0.10/32, age: 0:00:41.732847, last_modified: 2019-Apr-13 01:43:02.835813
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:00:41.742561, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 25, AS path: None
172.31.41.236/32, age: 0:02:26.545051, last_modified: 2019-Apr-13 01:41:18.023609
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.554985, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp', 'native'], label: 16, AS path: None
default-domain:k8s-default:k8s-default-pod-network:k8s-default-pod-network.ermvpn.0: 3 destinations, 3 routes (3 primary, 0 secondary, 0 infeasible)
0-172.31.41.236:2-0.0.0.0,255.255.255.255,0.0.0.0, age: 0:01:09.096823, last_modified: 2019-Apr-13 01:42:35.471837
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:01:09.107020, localpref: 100, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 0, AS path: None
1-0:0-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:01:09.096765, last_modified: 2019-Apr-13 01:42:35.471895
[Local|None] age: 0:01:09.107383, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
2-0:0-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:01:09.096621, last_modified: 2019-Apr-13 01:42:35.472039
[Local|None] age: 0:01:09.107473, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
default-domain:k8s-default:k8s-default-pod-network:k8s-default-pod-network.inet.0: 3 destinations, 3 routes (1 primary, 2 secondary, 0 infeasible)
10.47.255.252/32, age: 0:00:41.733411, last_modified: 2019-Apr-13 01:43:02.835249
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:00:41.744526, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 25, AS path: None
10.96.0.10/32, age: 0:00:41.732872, last_modified: 2019-Apr-13 01:43:02.835788
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:00:41.744256, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 25, AS path: None
172.31.41.236/32, age: 0:02:26.544986, last_modified: 2019-Apr-13 01:41:18.023674
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.556602, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp', 'native'], label: 16, AS path: None
default-domain:k8s-default:k8s-default-service-network:k8s-default-service-network.ermvpn.0: 3 destinations, 3 routes (3 primary, 0 secondary, 0 infeasible)
0-172.31.41.236:3-0.0.0.0,255.255.255.255,0.0.0.0, age: 0:00:41.812457, last_modified: 2019-Apr-13 01:43:02.756203
[XMPP|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:00:41.824352, localpref: 100, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 0, AS path: None
1-0:0-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:00:41.812393, last_modified: 2019-Apr-13 01:43:02.756267
[Local|None] age: 0:00:41.824504, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
2-0:0-172.31.14.47,255.255.255.255,0.0.0.0, age: 0:00:41.812099, last_modified: 2019-Apr-13 01:43:02.756561
[Local|None] age: 0:00:41.824428, localpref: 100, nh: 172.31.14.47, encap: [], label: 0, AS path: None
default-domain:k8s-default:k8s-default-service-network:k8s-default-service-network.inet.0: 3 destinations, 3 routes (1 primary, 2 secondary, 0 infeasible)
10.47.255.252/32, age: 0:00:41.733337, last_modified: 2019-Apr-13 01:43:02.835323
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:00:41.745932, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 25, AS path: None
10.96.0.10/32, age: 0:00:41.732935, last_modified: 2019-Apr-13 01:43:02.835725
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:00:41.745758, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 25, AS path: None
172.31.41.236/32, age: 0:02:26.544959, last_modified: 2019-Apr-13 01:41:18.023701
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:02:26.558031, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp', 'native'], label: 16, AS path: None
[root@ip-172-31-14-47 contrail-ansible-deployer]#
If the output looks similar, everything is working well, and you can create containers from k8s yaml.
## if bash completion is needed, type these two commands
yum -y install bash-completion
echo "source <(kubectl completion bash)" >> ~/.bashrc
vi first-containers.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cirros-deployment
  labels:
    app: cirros-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cirros-deployment
  template:
    metadata:
      labels:
        app: cirros-deployment
    spec:
      containers:
      - name: cirros
        image: cirros
        ports:
        - containerPort: 22
or simply type
kubectl create deployment --image=cirros --replicas=2 cirros-deployment
kubectl create -f first-containers.yaml
kubectl get pod -o wide ## check pod name and ip
kubectl exec -it cirros-deployment-xxxx sh
ping (another pod's ip)
[root@ip-172-31-14-47 ~]# kubectl create -f first-containers.yaml
deployment "cirros-deployment" created
[root@ip-172-31-14-47 ~]#
[root@ip-172-31-14-47 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
cirros-deployment-54b65ccf48-cr9dd 1/1 Running 0 34s 10.47.255.250 ip-172-31-41-236.ap-northeast-1.compute.internal
cirros-deployment-54b65ccf48-z9dds 1/1 Running 0 34s 10.47.255.251 ip-172-31-41-236.ap-northeast-1.compute.internal
[root@ip-172-31-14-47 ~]#
[root@ip-172-31-14-47 ~]# kubectl exec -it cirros-deployment-54b65ccf48-cr9dd sh
/ #
/ #
/ # ping 10.47.255.251
PING 10.47.255.251 (10.47.255.251): 56 data bytes
64 bytes from 10.47.255.251: seq=0 ttl=63 time=0.572 ms
64 bytes from 10.47.255.251: seq=1 ttl=63 time=0.086 ms
^C
--- 10.47.255.251 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.086/0.329/0.572 ms
/ #
Cool! Those are the first packets transmitted through Tungsten Fabric vRouters.
If it doesn't work well, don't worry. Tungsten Fabric has an active slack which can help you. Post your logs there and ask for help resolving the issue. https://tungstenfabric.slack.com
Typing 'ist.py ctr route show' again, you will see that k8s-pod-network is now filled with the ips of the two pods, and the next-hop for each pod is the k8s node's ip.
./contrail-introspect-cli/ist.py ctr route show (pod ip) ## similar to 'show route (some ip)'
[root@ip-172-31-14-47 contrail-ansible-deployer]# ./contrail-introspect-cli/ist.py ctr route show 10.47.255.250
default-domain:default-project:ip-fabric:ip-fabric.inet.0: 5 destinations, 5 routes (1 primary, 4 secondary, 0 infeasible)
10.47.255.250/32, age: 0:03:10.553628, last_modified: 2019-Apr-13 01:46:13.217388
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:03:10.556716, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 37, AS path: None
default-domain:k8s-default:k8s-default-pod-network:k8s-default-pod-network.inet.0: 5 destinations, 5 routes (3 primary, 2 secondary, 0 infeasible)
10.47.255.250/32, age: 0:03:10.553734, last_modified: 2019-Apr-13 01:46:13.217282
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:03:10.557251, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 37, AS path: None
default-domain:k8s-default:k8s-default-service-network:k8s-default-service-network.inet.0: 5 destinations, 5 routes (1 primary, 4 secondary, 0 infeasible)
10.47.255.250/32, age: 0:03:10.553654, last_modified: 2019-Apr-13 01:46:13.217362
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:03:10.557453, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 37, AS path: None
[root@ip-172-31-14-47 contrail-ansible-deployer]#
Note that the ip-fabric VN and k8s-default-service-network also have that prefix, since k8s-pod-network's routes are leaked to those networks. To see the route in a specific routing table, you can use the -t option.
[root@ip-172-31-14-47 contrail-ansible-deployer]# ./contrail-introspect-cli/ist.py ctr route show -t default-domain:k8s-default:k8s-default-pod-network:k8s-default-pod-network.inet.0 10.47.255.251
default-domain:k8s-default:k8s-default-pod-network:k8s-default-pod-network.inet.0: 5 destinations, 5 routes (3 primary, 2 secondary, 0 infeasible)
10.47.255.251/32, age: 0:05:44.533377, last_modified: 2019-Apr-13 01:46:09.193202
[XMPP (interface)|ip-172-31-41-236.ap-northeast-1.compute.internal] age: 0:05:44.536291, localpref: 200, nh: 172.31.41.236, encap: ['gre', 'udp'], label: 32, AS path: None
[root@ip-172-31-14-47 contrail-ansible-deployer]#
I think there is some misunderstanding that Tungsten Fabric always needs capable routers to have external access.
That is actually not true: since v4.1 it supports a feature called gatewayless, which allows containers to communicate directly with the outside world (it is also useful for use cases similar to calico's).
To enable this feature, you can log in to the Tungsten Fabric webui (https://(k8s master's ip):8143, admin:contrail123), go to Configure > Networks > k8s-default-pod-network, and toggle Advanced Options > IP Fabric Forwarding.
- You also need to set a network policy between that VN and default-domain:default-project:ip-fabric, since without it the RPF check will drop the packets
If you ping the k8s master's ip from a container, you will notice that the k8s master receives packets from the container, and after adding a static route back to the container subnet on the k8s master, the ping works well.
- Please note that you need to change the k8s node's interface setting (EC2 > Network Interfaces > Change Source/Dest Check > Disabled) if you're using AWS.
So it allows a setup similar to network-node-based external access, which is based on static routes on the routers.
You can optionally use IPv4 bgp in combination with gatewayless, which is also recommended, since it dynamically updates the next-hop for each container and sends packets directly to the correct vRouter, which eliminates the bottleneck.
Note: this virtual-network can also be used as a source of floating-ips.
- Set 'Advanced Options' > 'External' on this virtual-network (a floating ip pool will then be created with the name 'default')
- Assign floating ips from kubernetes or openstack
- for kubernetes, it will be the source of external-ips, and needs to be specified to kube-manager with the KUBERNETES_PUBLIC_FIP_POOL parameter, for example: KUBERNETES_PUBLIC_FIP_POOL={'domain': 'default-domain', 'project': 'k8s-default', 'network': 'public-network1', 'name': 'default' }
- for openstack, horizon or the cli can be used to assign a floating-ip to VMs
- You can also assign a floating-ip directly to a specific port from the Tungsten Fabric webui (Configure > Ports > edit > floating-ip)
Since this might be your first exposure to Tungsten Fabric, where to go after this reading is an important question. There are a lot of things to work on, such as HA, monitoring, integration with other orchestrators or routers/switches, etc.
There are a lot of resources on the web, but to pick some, I would first recommend the Contrail documentation and education materials, even if you will only use the open source version.
- https://www.juniper.net/documentation/product/en_US/contrail-networking
- https://www.juniper.net/uk/en/training/certification/certification-tracks/cloud-track?tab=jncia-cloud
Tungsten Fabric is a powerful platform with a bunch of features, such as security-policy, analytics, l3dsr loadbalancers, service-chain and bgpaas, to name some, and many of them are non-trivial features that solve real-world problems. Those links contain a lot of content and links to other resources.
There are also several communication channels, such as mailing lists and slack. Please try them if you need some help :) https://tungsten.io/community/
There are a lot of different components in Tungsten Fabric. Let me briefly describe what each part does.
In summary, there are 7 roles and (up to) 30 micro-services in Tungsten Fabric.
- roles: vRouter, control, config, config-database, analytics (from 5.1, this can be further broken down into analytics, analytics-snmp, analytics-alarm), analytics-database, webui
Although there are a lot of components, in a simple use case only 4 roles are required
- vRouter, control, config, config-database, although in most cases webui will also be a requirement.
You can also omit analytics if you're only interested in the control-plane / data-plane part of Tungsten Fabric, although in that case some features (v1 service-chain, haproxy loadbalancer (and k8s ingress), SNAT, etc.) won't work.
control and vRouter are the control plane and data plane of Tungsten Fabric, so arguably they are the most important parts of the system.
Since both control and vRouter use MPLS-VPN internally, I would recommend at least skimming through these materials before delving into their details.
- https://www.juniper.net/uk/en/training/certification/certification-tracks/sp-routing-switching-track?tab=jncis-sp
- https://www.juniper.net/uk/en/training/certification/certification-tracks/sp-routing-switching-track?tab=jncip-sp
Since most of the advanced features in control and vRouter are inherited from MPLS, those materials will help you understand what they are trying to do.
Since control and vrouter-agent use VPNv4 bgp internally, a vRouter and its internal VRFs install the prefixes they need based on extended communities (a.k.a. route-targets). So when containers or vms are created on a vRouter, it signals the VPNv4 routes to control, control reflects those routes to the other vRouters, and the data plane automatically knows where to send the packets.
One interesting behavior is that a vRouter's virtual-network can have multiple default gateways with the same ip and the same mac! (similar to virtual-gateway-address, in junos terms) Since no VRRP is required to serve the default gw for each virtual-network, this eliminates a bottleneck and keeps everything fully distributed.
vRouter also does flow-based handling for some features like stateful firewall, NAT, flow-based ECMP, etc. That is an important difference, since this behavior introduces some tuning points, such as connections per second and the maximum number of flows. (In a packet-based system, PPS (packets per second) and throughput (and latency in some cases) are the key metrics.) If your system is sensitive to these parameters, you may need to review them as well.
Note: this behavior can optionally be disabled with the 'packet-mode' parameter in the 'ports' configuration.
Config also has several components. Config-api serves the api endpoint for Tungsten Fabric configuration, which is used by many components, like control, analytics, etc.
- vRouter doesn't use it directly, since only the data it needs is propagated from control, through xmpp
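As a hedged example, the config database can be read with plain REST calls against config-api (port 8082 and the /virtual-networks listing path are the usual defaults, not something shown above, and the JSON layout may vary; add authentication if your setup requires it):
import requests

# config-api usually listens on port 8082 of the config nodes
CONFIG_API = "http://172.31.14.47:8082"

# list the virtual-networks known to the config database
resp = requests.get(CONFIG_API + "/virtual-networks")
resp.raise_for_status()
for vn in resp.json().get("virtual-networks", []):
    print(vn.get("uuid"), ":".join(vn.get("fq_name", [])))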
Two processes, schema-transformer and svc-monitor, do important things, so let me also describe them.
Schema-transformer converts abstract config objects, such as logical-router, network-policy and service-chain, into L3VPN terms. So it is one of the core components of Tungsten Fabric, and it does most of the things which can't be explained by plain MPLS-VPN.
A logical-router, for example, internally creates a new route-target id which carries all the prefixes of the connected virtual-networks. So if a virtual-network is attached to a logical-router, it receives all the routes the logical-router has. That behavior uses MPLS-VPN internally, but the route-target configuration is controlled by schema-transformer.
So changes are propagated to the data plane in this manner:
edit config -> (rabbitmq) -> schema-transformer, which creates new route-target -> (internally edit config) -> (rabbitmq) -> control -> (xmpp) -> vrouter-agent -> (netlink) -> vrouter.ko
Schema-transformer also does everything related to service-chains. I won't delve into all the details of service-chaining, since it is not used in simple DC use cases (even AWS VPC doesn't offer a similar service currently), although internally it does interesting handling of the prefixes received around VRFs, and I personally think it is worth a read.
Note: you can find all the details in this book.
Svc-monitor serves the features which need to use external processes internally, such as the haproxy load balancer, v1 service-chain instances based on the nova API, iptables MASQUERADE for SNAT, and so on.
Internally, vrouter-agent has logic to start haproxy or set up iptables MASQUERADE, and svc-monitor triggers that logic when the related service is defined.
Svc-monitor chooses some vRouters on which to create these services, instantiates the network functions there and steers traffic to those elements. To choose one, it uses analytics-api's output (analytics/uves/vrouter) and picks one that is 'Functional'.
That behavior is one reason why analytics is currently required for a Tungsten Fabric installation, although that might change in a future release.
Tungsten Fabric uses several databases. Most of the data is saved in cassandra, and when it changes, rabbitmq is used to propagate those changes to the other components, such as control, schema-transformer, svc-monitor, and so on.
Zookeeper is used only for operations that need a lock for consistency. For example, creating a port requires assigning an ip address, whose consistency is guaranteed by zookeeper, so ip address assignment always happens one by one.
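The pattern is basically a distributed lock; a minimal sketch with the kazoo library (just an illustration of the pattern, not the actual Tungsten Fabric code; the zookeeper ip and the lock path are made up) looks like this:
from kazoo.client import KazooClient

# zookeeper runs on the config-database nodes (port 2181 by default)
zk = KazooClient(hosts="172.31.14.47:2181")
zk.start()

# only one writer at a time can enter this section, so two concurrent
# port creations can never pick the same free ip address
lock = zk.Lock("/example/ip-allocation-lock", "api-worker-1")
with lock:
    print("allocate the next free ip address here")

zk.stop()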
I think most of the important components are covered by now, so let me cover the remaining parts. First, let me describe what nodemgr is.
Nodemgr is basically meant to be the source of each node's state, so it checks things such as / usage, docker ps and cpu usage, and sends the NodeStatus UVE to analytics.
This value is the source for contrail-status and for other logic like analytics-alarm and svc-monitor (which checks whether it is 'Functional' when choosing a vRouter), so keeping these values Functional is fairly important to keep Tungsten Fabric operational.
Its behavior differs a bit depending on the assigned role, so it is installed on each node with slightly different behavior.
Additionally, it also does the first provisioning of each node, which means notifying config-api that this ip has role xxx assigned. So even if the analytics feature is not required, this module needs to be there, at least the first time a node comes up.
Device-manager is used to configure physical-routers, based on the objects in config-database.
Internally, it uses the same logic as schema-transformer and svc-monitor: it subscribes to rabbitmq to see config changes, and when something changes, the amqp client triggers some logic.
- For schema-transformer, that means updating some more config; for svc-monitor, triggering some logic on vRouters; and for device-manager, updating the physical-router's configuration
This behavior is controlled by the reaction_map, which defines how a change on one config object propagates changes to other config objects.
For example, when a bgp-router is updated,
'bgp_router': {
'self': ['bgp_router', 'physical_router'],
'bgp_router': ['physical_router'],
'physical_router': [],
},
based on the 'self' definition, the event propagates to the bgp-router and physical-router objects that have refs to the original bgp-router object.
- For bgp-router, that means the bgp-router objects peered with the original bgp-router
After that, the updated bgp-router in turn propagates the event to the physical-router that the bgp-router object resides on.
'bgp_router': {
(snip)
'bgp_router': ['physical_router'],
(snip)
},
Since physical-router doesn't propagate anything further when the event comes from bgp-router, the event stops there, and the configuration of the physical-routers holding the original bgp-router and the peered bgp-routers gets updated.
'physical_router': {
(snip)
'bgp_router': [],
(snip)
},
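To make that walk explicit, here is a toy sketch (heavily simplified, not the actual device-manager code, which walks object instances with refs rather than just types) of how an update spreads through such a reaction_map until an empty list stops it:
reaction_map = {
    'bgp_router': {
        'self': ['bgp_router', 'physical_router'],
        'bgp_router': ['physical_router'],
        'physical_router': [],
    },
    'physical_router': {
        'self': [],
        'bgp_router': [],
    },
}

def propagate(obj_type, from_type, seen):
    # look up which object types the event should spread to next
    for next_type in reaction_map[obj_type][from_type]:
        edge = (obj_type, next_type)
        if edge in seen:
            continue
        seen.add(edge)
        print('%s -> %s' % (obj_type, next_type))
        propagate(next_type, obj_type, seen)

# an update arrives on a bgp-router object
propagate('bgp_router', 'self', set())
# prints:
#   bgp_router -> bgp_router       (the peered bgp-routers)
#   bgp_router -> physical_router  (and the physical-routers, where it stops)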
When a physical-router receives the update event, it calls the plugin's push_conf function, which basically creates the router config based on the objects in config-database.
- currently, only MX / QFX have an open source plugin: https://github.com/Juniper/contrail-controller/tree/master/src/config/device-manager/device_manager/plugins/juniper
To enable this feature, this knob needs to be configured in /etc/contrail/common_config.env: DEVICE_MANAGER__DEFAULTS__push_mode=0, and the configuration procedure is described here: https://www.juniper.net/documentation/en_US/contrail5.0/topics/concept/using-device-manager-netconf-contrail.html
Unfortunately, this open source plugin didn't work well after R1911-latest .. :(
To make it work, please apply this patch (https://github.com/tnaganawa/tungstenfabric-docs/blob/master/R2011-latest-netconf-device-manager.patch) to 3 files under /usr/lib/python2.7/site-packages/device_manager, and it will work fine again.
- briefly tested with tungstenfabric/R2011-latest and vMX / vQFX
Note: after this commit (https://review.opencontrail.org/c/tungstenfabric/tf-controller/+/62296), the first attempt at setting the device password on a physical router fails with an authentication failure, since the password is encrypted in a way that is incompatible with that plugin. To make it work, please update the password with the same value again (the second attempt should work).
Tungsten Fabric analytics has a lot of features, but most of them are currently optional, so let me skip most of the components. If you are interested, please check these links for snmp, lldp, alarms, etc.
- https://tungsten.io/sandesh-a-sdn-analytics-interface/
- https://tungsten.io/operational-state-in-the-opencontrail-system-uve-user-visible-entities-through-analytics-api/
- https://tungsten.io/contrail-alerts/
- https://tungsten.io/overlay-to-physical-network-correlation/
Analytics itself has an interesting architecture, which covers both logs/flows and stats.
- AFAIK, those are frequently covered by different sets of systems, such as EFK for logs/flows and prometheus for stats
If you need something handy for all of them, Tungsten Fabric analytics will be a good fit.
Most of the important metrics analytics serves are tagged as UVEs (User Visible Entities) and have a URL that serves the data in JSON format.
- http://(analytics-ip):8081/analytics/uves has all the values available
If you need to integrate Tungsten Fabric with other monitoring systems, that is a good starting point.
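For example, a small script like the following (the ip is this document's k8s master; the exact JSON layout can differ between releases, so treat it as a sketch) can pull the UVE list and feed it into whatever monitoring system you use:
import requests

ANALYTICS_API = "http://172.31.14.47:8081"

# top-level list of UVE types (vrouters, control-nodes, virtual-networks, ...)
for entry in requests.get(ANALYTICS_API + "/analytics/uves").json():
    print(entry.get("name"), entry.get("href"))
    # each href can be followed the same way to get the per-object JSON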
Analytics also uses several databases like redis, cassandra, kafka (internally, it also uses zookeeper for HA deployment of optional components).
If only analytics is used, redis is the only required database, and even in this setup most of the webui features are available.
- Most of the visualizations use UVEs, so they are available even if cassandra is not installed
Cassandra is needed if you want the 'Query' feature of the webui, which retrieves logs/flows or stats from the cassandra db.
Kafka is used to propagate UVEs to analytics-alarm, so if you want to use the alarm feature, kafka is also required.
Finally, we reach webui. It is basically a simple web UI to see the status of the components and to configure Tungsten Fabric parameters.
One somewhat interesting behavior is that it uses AJAX to update graphs which need long-running queries against analytics-api (such as Monitor > Dashboard), and that async job is handled by the webui-job process.
Tungsten Fabric has been integrated with several orchestrators.
Internally, Tungsten Fabric's orchestrator integration components basically do the same things for each orchestrator.
- assign a port when vm or container is up
- plug that to vm or container
Let me describe what is done for each orchestrator.
When used with openstack, neutron-plugin (https://github.com/Juniper/contrail-neutron-plugin) will be the main interface between openstack and Tungsten Fabric Controller.
Neutron-plugin is loaded directly into the neutron-api process (some modules need to be specified in neutron.conf), and that logic handles neutron requests/responses, such as network-list, port-create, and so on.
One characteristic of this module is that it doesn't use the neutron db, which is created in MySQL in a typical openstack setup.
Since it directly uses the Tungsten Fabric db, some features, such as bridge assignment to a vm, are a bit more difficult to achieve.
- Since nova still uses the same vif assignment logic, it is not impossible to emulate the neutron response and assign a specific vif-type which can be used in neutron, although not all combinations are tested, AFAIK.
- SR-IOV is the exception to this, since its emulation is supported and well tested
- https://github.com/Juniper/contrail-controller/wiki/SRIOV
When a port is assigned vif-type: vrouter, which is done automatically by the 'create port' API through that neutron-plugin, nova uses the nova-vif-driver for vRouter (https://github.com/Juniper/contrail-nova-vif-driver), which does some tasks beyond just creating a tap device when called, such as creating a vif on the vRouter through the vrouter-port-control script.
- In most cases, you don't need to delve into the details of this behavior, although in some situations, like a live migration that stopped somewhere, you might need to be careful about the status of the vif
Note: one recent addition is that Tungsten Fabric also has an ml2-based plugin.
So if users already use ml2 with MySQL, they can first add vRouter as one of the ml2 network-types, use it for specific virtual-networks, and migrate from other ml2 plugins to vRouter by detaching and re-attaching interfaces (and optionally replace the neutron core plugin once the migration is finished).
Some installation details are also documented.
When used with kubernetes, the behavior is similar to the openstack case, although the CNI plugin plays the role of nova-vif-driver, and kube-manager plays the role of neutron-api.
- https://github.com/Juniper/contrail-controller/tree/master/src/container/cni
- https://github.com/Juniper/contrail-controller/tree/master/src/container/kube-manager
So when a container is created, kube-manager creates a port in the Tungsten Fabric controller, and the CNI assigns that port to the container.
vCenter / Tungsten Fabric integration takes a somewhat different approach from the kvm case, since modules can't be installed directly on ESXi.
First, to make the overlay available between ESXi hosts, one vRouterVM needs to be created on each ESXi (internally it is a plain CentOS vm).
When a vm is created on that ESXi and attached to the dv-portgroup which vcenter-plugin (https://github.com/Juniper/contrail-vcenter-plugin) created when a virtual-network was created in the 'vCenter' tenant, vcenter-manager (https://github.com/Juniper/contrail-vcenter-manager), which is installed on each vRouterVM with the ESXi's ip / user / pass, will do two things.
- Set a vlan-id on the specific dv-portgroup port where that vm is attached
- Create a vif on the vRouterVM, tying a vlan subinterface with the same vlan-id as that dv-portgroup port to the VRF for that virtual-network
So when a vm sends traffic, it gets tagged when it enters the dvswitch, reaches the vRouterVM, is untagged there and goes into the specific VRF that the vm belongs to.
- Since traffic from each vm is tagged with a different vlan-id, micro-segmentation is also achieved
After traffic goes into the vRouterVM, the behavior is the same as in the kvm case.
Please note that this behavior is triggered only when a vm is attached to dv-portgroups created by the Tungsten Fabric controller, so vm interfaces can still be assigned to some vSS or vDS for underlay access.
- It is even possible to install vCenter and the Tungsten Fabric controller on the same ESXi as the vRouters (a single-ESXi install), if they are attached to something like 'VM Network' rather than the dv-portgroups created by the Tungsten Fabric controller.
Since the vRouter's behavior is the same as in the other cases, sharing virtual-networks between vCenter and openstack, or leaking routes between them, is also readily available. So with Tungsten Fabric, it is much easier to use both types of VMs simultaneously, with shared networks and network services such as fw, lb, and so on.
In the Up and Running section, I described a 1 controller and 1 vRouter setup, so the HA case is not covered yet (and no overlay traffic case, indeed!). Let me describe a more realistic case, with 3 controllers and 2 computes (and possibly multiple NICs) for each orchestrator.
- In this chapter, I'll use the opencontrailnightly:latest repo, since several features are not available in the 5.0.1 release, but please note that this repo can be a bit unstable in some cases.
When a setup for serious traffic is planned, HA will always be a requirement.
Tungsten Fabric has a decent HA implementation, which is already documented elsewhere.
One thing I'd like to add is that cassandra's keyspaces have different replication factors for configdb and analyticsdb.
- configdb: https://github.com/Juniper/contrail-controller/blob/master/src/config/common/vnc_cassandra.py#L609
- analytics: https://github.com/Juniper/contrail-analytics/blob/master/contrail-collector/db_handler.cc#L524
Since configdb's data is replicated to all cassandra nodes, it is fairly unlikely to lose data, even if some node's disk crashes and needs to be wiped. On the other hand, since analyticsdb's replication factor is always two, if two nodes lose their data simultaneously, that data could be lost.
When installing Tungsten Fabric, there are many situations that require a multi-NIC installation, such as a separate NIC for the management plane and for the control / data plane.
- Bonding is not included in this discussion, since bond0 can be specified directly by the VROUTER_GATEWAY parameter
Let me clarify the somewhat curious behavior of vRouter in this setup.
For controller / analytics nodes, it is not much different from a typical linux installation, since linux works well with multiple NICs and its own routing table, including the use of static routes.
On the other hand, on vRouter nodes you need to be a bit careful, since vRouter doesn't use the linux routing table when it sends packets; rather, it always sends packets to one and only one gateway ip.
- It can be set with the gateway parameter in contrail-vrouter-agent.conf, or the VROUTER_GATEWAY environment variable of the vrouter-agent container
So when setting up a multi-NIC installation, you need to be a bit careful about whether you need to specify VROUTER_GATEWAY.
If it is not specified, the vrouter-agent container picks the NIC that holds the node's default route, but that won't be the correct NIC if internet access (0.0.0.0/0) is covered by the management NIC rather than the data plane NIC.
In that situation, you need to explicitly specify the VROUTER_GATEWAY parameter.
Because of this behavior, you also need to be careful when you want to send packets from vms or containers out of NICs other than the one vRouter uses, since vRouter doesn't check the linux routing table and always uses the same NIC as the rest of its traffic.
- AFAIK, packets from link-local services or gatewayless networks show similar behavior
In such situations, you might need to use simple-gateway or SR-IOV.
For general sizing of a Tungsten Fabric cluster, you can use this table.
If the cluster is large, you need a good amount of resources to serve a stable control plane.
Please note that from R5.1 the analytics database (and some analytics components) became optional, so I would recommend using the R5.1 release if you only want the control plane features of Tungsten Fabric.
How large a cluster can be is also an important subject, although I don't have a handy answer, since it depends on a lot of factors.
- I once tried nearly 5,000 nodes in one k8s cluster (https://kubernetes.io/docs/setup/cluster-large/). It worked well with one controller node with 64 vCPUs and 58GB mem, although at that time I hadn't created many ports, policies, logical-routers, etc.
- This wiki also has some real world experience about gigantic cluster: https://wiki.tungsten.io/display/TUN/KubeCon+NA+in+Seattle+2018
Since you can instantly get a lot of resources from the cloud, perhaps the best option is to emulate a cluster with the size and traffic you need, and see whether it works well and where the bottleneck is.
Tungsten Fabric has several good features for going gigantic, such as multi-cluster setups based on MP-BGP between clusters, and a BUM drop feature based on L3-only virtual-networks, which can be key to scalable and stable virtual-networks.
To illustrate control's scale-out behavior, I created a cluster with 980 vRouters and 15 controls in AWS.
- All the control nodes have 4vCPUs and 16GB mem
The following instances.yaml was used to provision the controller nodes,
and non-nested.yaml for kubeadm (https://github.com/Juniper/contrail-container-builder/blob/master/kubernetes/manifests/contrail-non-nested-kubernetes.yaml) was used to provision the vRouters.
(venv) [root@ip-172-31-21-119 ~]# cat contrail-ansible-deployer/config/instances.yaml
provider_config:
  bms:
    ssh_user: centos
    ssh_public_key: /root/.ssh/id_rsa.pub
    ssh_private_key: /tmp/aaa.pem
    domainsuffix: local
    ntpserver: 0.centos.pool.ntp.org
instances:
  bms1:
    provider: bms
    roles:
      config_database:
      config:
      control:
      analytics:
      webui:
    ip: 172.31.21.119
  bms2:
    provider: bms
    roles:
      control:
      analytics:
    ip: 172.31.21.78
  bms3:
    provider: bms
  ...
  bms13:
    provider: bms
    roles:
      control:
      analytics:
    ip: 172.31.14.189
  bms14:
    provider: bms
    roles:
      control:
      analytics:
    ip: 172.31.2.159
  bms15:
    provider: bms
    roles:
      control:
      analytics:
    ip: 172.31.7.239
contrail_configuration:
  CONTRAIL_CONTAINER_TAG: r5.1
  KUBERNETES_CLUSTER_PROJECT: {}
  JVM_EXTRA_OPTS: "-Xms128m -Xmx1g"
global_configuration:
  CONTAINER_REGISTRY: tungstenfabric
(venv) [root@ip-172-31-21-119 ~]#
[root@ip-172-31-4-80 ~]# kubectl get node | head
NAME STATUS ROLES AGE VERSION
ip-172-31-0-112.ap-northeast-1.compute.internal Ready <none> 9m24s v1.15.0
ip-172-31-0-116.ap-northeast-1.compute.internal Ready <none> 9m37s v1.15.0
ip-172-31-0-133.ap-northeast-1.compute.internal Ready <none> 9m37s v1.15.0
ip-172-31-0-137.ap-northeast-1.compute.internal Ready <none> 9m24s v1.15.0
ip-172-31-0-141.ap-northeast-1.compute.internal Ready <none> 9m24s v1.15.0
ip-172-31-0-142.ap-northeast-1.compute.internal Ready <none> 9m24s v1.15.0
ip-172-31-0-151.ap-northeast-1.compute.internal Ready <none> 9m37s v1.15.0
ip-172-31-0-163.ap-northeast-1.compute.internal Ready <none> 9m37s v1.15.0
ip-172-31-0-168.ap-northeast-1.compute.internal Ready <none> 9m16s v1.15.0
[root@ip-172-31-4-80 ~]#
[root@ip-172-31-4-80 ~]# kubectl get node | grep -w Ready | wc -l
980
[root@ip-172-31-4-80 ~]#
(venv) [root@ip-172-31-21-119 ~]# contrail-api-cli --host 172.31.21.119 ls virtual-router | wc -l
980
(venv) [root@ip-172-31-21-119 ~]#
When the number of control nodes is 15, the number of XMPP connections on this control node is up to 113, so CPU usage is not so high (up to 5.4%).
[root@ip-172-31-21-119 ~]# ./contrail-introspect-cli/ist.py ctr nei | grep -w XMPP | wc -l
113
[root@ip-172-31-21-119 ~]#
top - 05:52:14 up 42 min, 1 user, load average: 1.73, 5.50, 3.57
Tasks: 154 total, 1 running, 153 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.4 us, 2.9 sy, 0.0 ni, 94.6 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 15233672 total, 8965420 free, 2264516 used, 4003736 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 12407304 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32368 root 20 0 839848 55240 11008 S 7.6 0.4 0:21.40 contrail-collec
28773 root 20 0 1311252 132552 14540 S 5.3 0.9 1:40.72 contrail-contro
17129 polkitd 20 0 56076 22496 1624 S 3.7 0.1 0:11.42 redis-server
32438 root 20 0 248496 40336 5328 S 2.0 0.3 0:15.80 python
18346 polkitd 20 0 2991576 534452 22992 S 1.7 3.5 4:56.90 java
15344 root 20 0 972324 97248 35360 S 1.3 0.6 2:25.84 dockerd
15351 root 20 0 1477100 32988 12532 S 0.7 0.2 0:08.72 docker-containe
18365 centos 20 0 5353996 131388 9288 S 0.7 0.9 0:09.49 java
19994 polkitd 20 0 3892836 127772 3644 S 0.7 0.8 1:34.55 beam.smp
17112 root 20 0 7640 3288 2456 S 0.3 0.0 0:00.24 docker-containe
24723 root 20 0 716512 68920 6288 S 0.3 0.5 0:01.75 node
However, when 12 control nodes are stopped, the number of XMPP connections on one control node goes as high as 708, so CPU usage becomes pretty high (21.6%).
So if you need to provision a fairly large number of nodes, the number of control nodes might need to be planned carefully.
[root@ip-172-31-21-119 ~]# ./contrail-introspect-cli/ist.py ctr nei | grep -w BGP
| ip-172-31-13-119.local | 172.31.13.119 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 06:10:47.527354 |
| ip-172-31-13-87.local | 172.31.13.87 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 06:10:08.610734 |
| ip-172-31-14-189.local | 172.31.14.189 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 06:16:34.953311 |
| ip-172-31-14-243.local | 172.31.14.243 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 06:06:12.379006 |
| ip-172-31-17-212.local | 172.31.17.212 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 06:03:15.650529 |
| ip-172-31-2-159.local | 172.31.2.159 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-21-78.local | 172.31.21.78 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 05:58:15.068791 |
| ip-172-31-22-95.local | 172.31.22.95 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 05:59:43.238465 |
| ip-172-31-23-207.local | 172.31.23.207 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 06:02:24.922901 |
| ip-172-31-25-214.local | 172.31.25.214 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 06:04:52.624323 |
| ip-172-31-30-137.local | 172.31.30.137 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 06:05:33.020029 |
| ip-172-31-4-76.local | 172.31.4.76 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 06:12:04.853319 |
| ip-172-31-7-239.local | 172.31.7.239 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-9-245.local | 172.31.9.245 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-29 06:07:01.750834 |
[root@ip-172-31-21-119 ~]# ./contrail-introspect-cli/ist.py ctr nei | grep -w XMPP | wc -l
708
[root@ip-172-31-21-119 ~]#
top - 06:19:56 up 1:10, 1 user, load average: 2.04, 2.47, 2.27
Tasks: 156 total, 2 running, 154 sleeping, 0 stopped, 0 zombie
%Cpu(s): 11.5 us, 9.7 sy, 0.0 ni, 78.4 id, 0.0 wa, 0.0 hi, 0.3 si, 0.2 st
KiB Mem : 15233672 total, 7878520 free, 3006892 used, 4348260 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 11648264 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32368 root 20 0 890920 145632 11008 S 15.6 1.0 3:25.34 contrail-collec
28773 root 20 0 1357728 594448 14592 S 13.0 3.9 9:00.69 contrail-contro
18686 root 20 0 249228 41000 5328 R 10.3 0.3 1:00.89 python
15344 root 20 0 972324 97248 35360 S 9.0 0.6 3:26.60 dockerd
17129 polkitd 20 0 107624 73908 1644 S 8.3 0.5 1:50.81 redis-server
21458 root 20 0 248352 40084 5328 S 2.7 0.3 0:41.11 python
18302 root 20 0 9048 3476 2852 S 2.0 0.0 0:05.32 docker-containe
28757 root 20 0 248476 40196 5328 S 1.7 0.3 0:37.21 python
32438 root 20 0 248496 40348 5328 S 1.7 0.3 0:34.26 python
15351 root 20 0 1477100 33204 12532 S 1.3 0.2 0:16.82 docker-containe
18346 polkitd 20 0 2991576 563864 25552 S 1.0 3.7 5:45.65 java
19994 polkitd 20 0 3880472 129392 3644 S 0.7 0.8 1:51.54 beam.smp
28744 root 20 0 1373980 136520 12180 S 0.7 0.9 3:13.94 contrail-dns
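As a rough sanity check on these numbers (my own back-of-envelope estimate, not an official sizing rule): each vRouter keeps 2 XMPP sessions, so the average number of sessions per control node is roughly 2 x (number of vRouters) / (number of control nodes).
echo $((980*2/15))   ## -> 130, in the same ballpark as the 113 observed with 15 control nodes
echo $((980*2/3))    ## -> 653, in the same ballpark as the 708 observed with 3 control nodes
The observed values deviate a bit because vRouters are not distributed perfectly evenly across control nodes.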
At the time of writing, ansible-deployer does not yet support k8s master HA.
Since kubeadm already supports k8s master HA, I'll describe how to combine a kubeadm-based k8s install with a YAML-based Tungsten Fabric install.
- https://kubernetes.io/docs/setup/independent/high-availability/
- https://github.com/Juniper/contrail-ansible-deployer/wiki/Provision-Contrail-Kubernetes-Cluster-in-Non-nested-Mode
Like other CNIs, Tungsten Fabric can also be installed directly with the 'kubectl apply' command. To achieve this, however, you need to configure some parameters manually, such as the IP addresses of the controller nodes.
For this example setup, I used 5 EC2 instances (the same AMI, ami-3185744e). Each instance is assigned 2 vCPUs, 8 GB memory, and 20 GB disk. The VPC has the CIDR 172.31.0.0/16.
(on all nodes)
# cat <<CONTENTS > install-k8s-packages.sh
bash -c 'cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF'
setenforce 0
yum install -y kubelet kubeadm kubectl docker
systemctl enable docker && systemctl start docker
systemctl enable kubelet && systemctl start kubelet
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
swapoff -a
CONTENTS
# bash install-k8s-packages.sh
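Note that the bridge-nf-call-iptables setting and swapoff -a above do not persist across reboots. If you want them to survive a reboot, one possible way (a sketch, not part of the original procedure; the file name 99-k8s.conf is arbitrary) is:
cat <<EOF > /etc/sysctl.d/99-k8s.conf
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
sed -i '/ swap / s/^/#/' /etc/fstab   ## comment out swap entries, if any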
(on the first k8s master node)
yum -y install haproxy
# vi /etc/haproxy/haproxy.cfg
(add these lines at the end of this file)
listen kube
mode tcp
bind 0.0.0.0:1443
server master1 172.31.13.9:6443
server master2 172.31.8.73:6443
server master3 172.31.32.58:6443
# systemctl start haproxy
# systemctl enable haproxy
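The haproxy config above does plain TCP load balancing with no health checks, so requests can still be sent to a master that is down. If you want haproxy to skip unreachable API servers, a variant with health checks might look like this (a sketch; it uses the standard haproxy 'check' keyword with default intervals):
listen kube
  mode tcp
  bind 0.0.0.0:1443
  option tcp-check
  server master1 172.31.13.9:6443 check
  server master2 172.31.8.73:6443 check
  server master3 172.31.32.58:6443 check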
# vi kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
certSANs:
- "ip-172-31-13-9"
controlPlaneEndpoint: "ip-172-31-13-9:1443"
# kubeadm init --config=kubeadm-config.yaml
(save those lines for later use)
kubeadm join ip-172-31-13-9:1443 --token mlq9gw.gt5m13cbro6c8xsu \
--discovery-token-ca-cert-hash sha256:677ea74fa03311a38ecb497d2f0803a5ea1eea85765aa2daa4503f24dd747f9a \
--experimental-control-plane
kubeadm join ip-172-31-13-9:1443 --token mlq9gw.gt5m13cbro6c8xsu \
--discovery-token-ca-cert-hash sha256:677ea74fa03311a38ecb497d2f0803a5ea1eea85765aa2daa4503f24dd747f9a
# mkdir -p $HOME/.kube
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config
# cd /etc/kubernetes
# tar czvf /tmp/k8s-master-ca.tar.gz pki/ca.crt pki/ca.key pki/sa.key pki/sa.pub pki/front-proxy-ca.crt pki/front-proxy-ca.key pki/etcd/ca.crt pki/etcd/ca.key admin.conf
(scp that tar file to the 2nd and 3rd k8s master nodes)
(on the 2nd and 3rd k8s master nodes)
# mkdir -p /etc/kubernetes/pki/etcd
# cd /etc/kubernetes
# tar xvf /tmp/k8s-master-ca.tar.gz
# kubeadm join ip-172-31-13-9:1443 --token mlq9gw.gt5m13cbro6c8xsu \
--discovery-token-ca-cert-hash sha256:677ea74fa03311a38ecb497d2f0803a5ea1eea85765aa2daa4503f24dd747f9a \
--experimental-control-plane
(on k8s nodes)
- type the kubeadm join command which was previously saved (the one without --experimental-control-plane)
# kubeadm join ip-172-31-13-9:1443 --token mlq9gw.gt5m13cbro6c8xsu \
--discovery-token-ca-cert-hash sha256:677ea74fa03311a38ecb497d2f0803a5ea1eea85765aa2daa4503f24dd747f9a
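Before labelling the nodes, it might be worth checking, from the first master, that all three masters joined and that the haproxy endpoint answers (optional; the healthz check assumes anonymous access to /healthz is allowed, which is the kubeadm default):
# kubectl get node | grep -w master
# curl -k https://ip-172-31-13-9:1443/healthz   ## should print 'ok'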
(on the first k8s master node)
# vi set-label.sh
masternodes=$(kubectl get node | grep -w master | awk '{print $1}')
agentnodes=$(kubectl get node | grep -v -w -e master -e NAME | awk '{print $1}')
for i in config configdb analytics webui control
do
for masternode in ${masternodes}
do
kubectl label node ${masternode} node-role.opencontrail.org/${i}=
done
done
for i in ${agentnodes}
do
kubectl label node ${i} node-role.opencontrail.org/agent=
done
# bash set-label.sh
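To confirm that the labels were applied as intended, something like this can be used:
# kubectl get node --show-labels | grep node-role.opencontrail.org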
# yum -y install git
# git clone https://github.com/Juniper/contrail-container-builder.git
# cd /root/contrail-container-builder/kubernetes/manifests
# cat <<EOF > ../../common.env
CONTRAIL_CONTAINER_TAG=latest
CONTRAIL_REGISTRY=opencontrailnightly
EOF
# ./resolve-manifest.sh contrail-standalone-kubernetes.yaml > cni-tungsten-fabric.yaml
# vi cni-tungsten-fabric.yaml
(manually modify these lines; see the example below)
- lines which include ANALYTICS_API_VIP, CONFIG_API_VIP, and VROUTER_GATEWAY need to be deleted
- several lines which include ANALYTICS_NODES, ANALYTICSDB_NODES, CONFIG_NODES, CONFIGDB_NODES, CONTROL_NODES, CONTROLLER_NODES, RABBITMQ_NODES, and ZOOKEEPER_NODES need to be set properly, e.g. CONFIG_NODES: ip1,ip2,ip3
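As an illustration with the three master IPs used in this setup (adjust the list to your own controller IPs; depending on the manifest, the values may need quoting):
ANALYTICS_NODES: 172.31.13.9,172.31.8.73,172.31.32.58
CONFIG_NODES: 172.31.13.9,172.31.8.73,172.31.32.58
CONTROL_NODES: 172.31.13.9,172.31.8.73,172.31.32.58
(and similarly for the other *_NODES parameters listed above)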
# kubectl apply -f cni-tungsten-fabric.yaml
I'll attach the original and modified yaml files for further reference.
- https://github.com/tnaganawa/tungstenfabric-docs/blob/master/cni-tungsten-fabric.yaml.orig
- https://github.com/tnaganawa/tungstenfabric-docs/blob/master/cni-tungsten-fabric.yaml
Then you finally have a kubernetes HA environment with Tungsten Fabric CNI, which is (mostly) up.
Note: coredns is not active in this output; I'll fix this later in this section.
[root@ip-172-31-13-9 ~]# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-172-31-13-9.ap-northeast-1.compute.internal NotReady master 34m v1.14.1 172.31.13.9 <none> CentOS Linux 7 (Core) 3.10.0-862.2.3.el7.x86_64 docker://1.13.1
ip-172-31-17-120.ap-northeast-1.compute.internal Ready <none> 30m v1.14.1 172.31.17.120 <none> CentOS Linux 7 (Core) 3.10.0-862.2.3.el7.x86_64 docker://1.13.1
ip-172-31-32-58.ap-northeast-1.compute.internal NotReady master 32m v1.14.1 172.31.32.58 <none> CentOS Linux 7 (Core) 3.10.0-862.2.3.el7.x86_64 docker://1.13.1
ip-172-31-5-235.ap-northeast-1.compute.internal Ready <none> 30m v1.14.1 172.31.5.235 <none> CentOS Linux 7 (Core) 3.10.0-862.2.3.el7.x86_64 docker://1.13.1
ip-172-31-8-73.ap-northeast-1.compute.internal NotReady master 31m v1.14.1 172.31.8.73 <none> CentOS Linux 7 (Core) 3.10.0-862.2.3.el7.x86_64 docker://1.13.1
[root@ip-172-31-13-9 ~]# kubectl get pod --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system config-zookeeper-d897f 1/1 Running 0 7m14s 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system config-zookeeper-fvnbq 1/1 Running 0 7m14s 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system config-zookeeper-t5vjc 1/1 Running 0 7m14s 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-agent-cqpxc 2/2 Running 0 7m12s 172.31.17.120 ip-172-31-17-120.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-agent-pv7c8 2/2 Running 0 7m12s 172.17.0.1 ip-172-31-5-235.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-analytics-cfcx8 3/3 Running 0 7m14s 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-analytics-h5jbr 3/3 Running 0 7m14s 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-analytics-wvc5n 3/3 Running 0 7m14s 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-config-database-nodemgr-7f5h5 1/1 Running 0 7m14s 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-config-database-nodemgr-bkmpz 1/1 Running 0 7m14s 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-config-database-nodemgr-z6qx9 1/1 Running 0 7m14s 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-configdb-5vd8t 1/1 Running 0 7m14s 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-configdb-kw6v7 1/1 Running 0 7m14s 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-configdb-vjv2b 1/1 Running 0 7m14s 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-controller-config-dk78j 5/5 Running 0 7m13s 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-controller-config-jrh27 5/5 Running 0 7m14s 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-controller-config-snxnn 5/5 Running 0 7m13s 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-controller-control-446v8 4/4 Running 0 7m14s 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-controller-control-fzpwz 4/4 Running 0 7m14s 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-controller-control-tk52v 4/4 Running 1 7m14s 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-controller-webui-94s26 2/2 Running 0 7m13s 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-controller-webui-bdzbj 2/2 Running 0 7m13s 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-controller-webui-qk4ww 2/2 Running 0 7m13s 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-kube-manager-g6vsg 1/1 Running 0 7m12s 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-kube-manager-ppjkf 1/1 Running 0 7m12s 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system contrail-kube-manager-rjpmw 1/1 Running 0 7m12s 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system coredns-fb8b8dccf-wmdw2 0/1 Running 2 34m 10.47.255.252 ip-172-31-17-120.ap-northeast-1.compute.internal <none> <none>
kube-system coredns-fb8b8dccf-wsrtl 0/1 Running 2 34m 10.47.255.251 ip-172-31-17-120.ap-northeast-1.compute.internal <none> <none>
kube-system etcd-ip-172-31-13-9.ap-northeast-1.compute.internal 1/1 Running 0 33m 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system etcd-ip-172-31-32-58.ap-northeast-1.compute.internal 1/1 Running 0 32m 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system etcd-ip-172-31-8-73.ap-northeast-1.compute.internal 1/1 Running 0 30m 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system kube-apiserver-ip-172-31-13-9.ap-northeast-1.compute.internal 1/1 Running 0 33m 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system kube-apiserver-ip-172-31-32-58.ap-northeast-1.compute.internal 1/1 Running 1 32m 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system kube-apiserver-ip-172-31-8-73.ap-northeast-1.compute.internal 1/1 Running 1 30m 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system kube-controller-manager-ip-172-31-13-9.ap-northeast-1.compute.internal 1/1 Running 1 33m 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system kube-controller-manager-ip-172-31-32-58.ap-northeast-1.compute.internal 1/1 Running 0 31m 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system kube-controller-manager-ip-172-31-8-73.ap-northeast-1.compute.internal 1/1 Running 0 31m 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system kube-proxy-6ls9w 1/1 Running 0 32m 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system kube-proxy-82jl8 1/1 Running 0 30m 172.31.5.235 ip-172-31-5-235.ap-northeast-1.compute.internal <none> <none>
kube-system kube-proxy-bjdj9 1/1 Running 0 31m 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system kube-proxy-nd7hq 1/1 Running 0 31m 172.31.17.120 ip-172-31-17-120.ap-northeast-1.compute.internal <none> <none>
kube-system kube-proxy-rb7nk 1/1 Running 0 34m 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system kube-scheduler-ip-172-31-13-9.ap-northeast-1.compute.internal 1/1 Running 1 33m 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system kube-scheduler-ip-172-31-32-58.ap-northeast-1.compute.internal 1/1 Running 0 31m 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system kube-scheduler-ip-172-31-8-73.ap-northeast-1.compute.internal 1/1 Running 0 31m 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system rabbitmq-9lp4n 1/1 Running 0 7m12s 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system rabbitmq-lxkgz 1/1 Running 0 7m12s 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system rabbitmq-wfk2f 1/1 Running 0 7m12s 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
kube-system redis-h2x2b 1/1 Running 0 7m13s 172.31.13.9 ip-172-31-13-9.ap-northeast-1.compute.internal <none> <none>
kube-system redis-pkmng 1/1 Running 0 7m13s 172.31.8.73 ip-172-31-8-73.ap-northeast-1.compute.internal <none> <none>
kube-system redis-r68ks 1/1 Running 0 7m13s 172.31.32.58 ip-172-31-32-58.ap-northeast-1.compute.internal <none> <none>
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]# contrail-status
Pod Service Original Name State Id Status
redis contrail-external-redis running 8f38c94fc370 Up About a minute
analytics api contrail-analytics-api running 2edde00b4525 Up About a minute
analytics collector contrail-analytics-collector running c1d0c24775a6 Up About a minute
analytics nodemgr contrail-nodemgr running 4a4c455cc0df Up About a minute
config api contrail-controller-config-api running b855ad79ace4 Up About a minute
config device-manager contrail-controller-config-devicemgr running 50d590e6f6cf Up About a minute
config nodemgr contrail-nodemgr running 6f0f64f958d9 Up About a minute
config schema contrail-controller-config-schema running 2057b21f50b7 Up About a minute
config svc-monitor contrail-controller-config-svcmonitor running ba48df5cb7f9 Up About a minute
config-database cassandra contrail-external-cassandra running 1d38278d304e Up About a minute
config-database nodemgr contrail-nodemgr running 8e4f9315cc38 Up About a minute
config-database rabbitmq contrail-external-rabbitmq running 4a424e2f456c Up About a minute
config-database zookeeper contrail-external-zookeeper running 4b46c83f1376 Up About a minute
control control contrail-controller-control-control running 17e4b9b9e3b8 Up About a minute
control dns contrail-controller-control-dns running 39fc34e19e13 Up About a minute
control named contrail-controller-control-named running aef0bf56a0e2 Up About a minute
control nodemgr contrail-nodemgr running 21f091df35d5 Up About a minute
kubernetes kube-manager contrail-kubernetes-kube-manager running db661ef685b0 Up About a minute
webui job contrail-controller-webui-job running 0bf35b774aac Up About a minute
webui web contrail-controller-webui-web running 9213ce050547 Up About a minute
== Contrail control ==
control: active
nodemgr: active
named: active
dns: active
== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active
== Contrail kubernetes ==
kube-manager: backup
== Contrail analytics ==
nodemgr: active
api: active
collector: active
== Contrail webui ==
web: active
job: active
== Contrail config ==
svc-monitor: backup
nodemgr: active
device-manager: active
api: active
schema: backup
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-8-73 ~]# contrail-status
Pod Service Original Name State Id Status
redis contrail-external-redis running 39af38401d31 Up 2 minutes
analytics api contrail-analytics-api running 29fa05f18927 Up 2 minutes
analytics collector contrail-analytics-collector running 994bffbe4c1f Up About a minute
analytics nodemgr contrail-nodemgr running 1eb143c7b864 Up About a minute
config api contrail-controller-config-api running 92ee8983bc81 Up About a minute
config device-manager contrail-controller-config-devicemgr running 7f9ab5d2a9ca Up About a minute
config nodemgr contrail-nodemgr running c6a88b487031 Up About a minute
config schema contrail-controller-config-schema running 1fe2e2767dca Up About a minute
config svc-monitor contrail-controller-config-svcmonitor running ec1d66894036 Up About a minute
config-database cassandra contrail-external-cassandra running 80f394c8d1a8 Up 2 minutes
config-database nodemgr contrail-nodemgr running af9b70285564 Up About a minute
config-database rabbitmq contrail-external-rabbitmq running edae18a7cf9f Up 2 minutes
config-database zookeeper contrail-external-zookeeper running f00c2e5d94ac Up 2 minutes
control control contrail-controller-control-control running 6e3e22625a50 Up About a minute
control dns contrail-controller-control-dns running b1b6b9649761 Up About a minute
control named contrail-controller-control-named running f8aa237fca10 Up About a minute
control nodemgr contrail-nodemgr running bb0868390322 Up About a minute
kubernetes kube-manager contrail-kubernetes-kube-manager running 02e99f8b9490 Up About a minute
webui job contrail-controller-webui-job running f5ffdfc1076f Up About a minute
webui web contrail-controller-webui-web running 09c3f77223d3 Up About a minute
== Contrail control ==
control: active
nodemgr: active
named: active
dns: active
== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active
== Contrail kubernetes ==
kube-manager: backup
== Contrail analytics ==
nodemgr: active
api: active
collector: active
== Contrail webui ==
web: active
job: active
== Contrail config ==
svc-monitor: backup
nodemgr: active
device-manager: backup
api: active
schema: backup
[root@ip-172-31-8-73 ~]#
[root@ip-172-31-32-58 ~]# contrail-status
Pod Service Original Name State Id Status
redis contrail-external-redis running 44363e63f104 Up 2 minutes
analytics api contrail-analytics-api running aa8c5dc17c57 Up 2 minutes
analytics collector contrail-analytics-collector running 6856b8e33f34 Up 2 minutes
analytics nodemgr contrail-nodemgr running c1ec67695618 Up About a minute
config api contrail-controller-config-api running ff95a8e3e4a9 Up 2 minutes
config device-manager contrail-controller-config-devicemgr running abc0ad6b32c0 Up 2 minutes
config nodemgr contrail-nodemgr running c883e525205a Up About a minute
config schema contrail-controller-config-schema running 0b18780b02da Up About a minute
config svc-monitor contrail-controller-config-svcmonitor running 42e74aad3d3d Up About a minute
config-database cassandra contrail-external-cassandra running 3994d9f51055 Up 2 minutes
config-database nodemgr contrail-nodemgr running 781c5c93e662 Up 2 minutes
config-database rabbitmq contrail-external-rabbitmq running 849427f37237 Up 2 minutes
config-database zookeeper contrail-external-zookeeper running fbb778620915 Up 2 minutes
control control contrail-controller-control-control running 85b2e8366a13 Up 2 minutes
control dns contrail-controller-control-dns running b1f05dc6b8ee Up 2 minutes
control named contrail-controller-control-named running ca68ff0e118b Up About a minute
control nodemgr contrail-nodemgr running cf8aaff71343 Up About a minute
kubernetes kube-manager contrail-kubernetes-kube-manager running 62022a542509 Up 2 minutes
webui job contrail-controller-webui-job running 28413e9f378b Up 2 minutes
webui web contrail-controller-webui-web running 4a6edac6d596 Up 2 minutes
== Contrail control ==
control: active
nodemgr: active
named: active
dns: active
== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active
== Contrail kubernetes ==
kube-manager: active
== Contrail analytics ==
nodemgr: active
api: active
collector: active
== Contrail webui ==
web: active
job: active
== Contrail config ==
svc-monitor: active
nodemgr: active
device-manager: backup
api: active
schema: active
[root@ip-172-31-32-58 ~]#
[root@ip-172-31-5-235 ~]# contrail-status
Pod Service Original Name State Id Status
vrouter agent contrail-vrouter-agent running 48377d29f584 Up 2 minutes
vrouter nodemgr contrail-nodemgr running 77d7a409d410 Up 2 minutes
vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active
[root@ip-172-31-5-235 ~]#
[root@ip-172-31-17-120 ~]# contrail-status
Pod Service Original Name State Id Status
vrouter agent contrail-vrouter-agent running f97837959a0b Up 3 minutes
vrouter nodemgr contrail-nodemgr running 4e48673efbcc Up 3 minutes
vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py ctr nei
+--------------------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------------------------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+--------------------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------------------------+
| ip-172-31-32-58.ap- | 172.31.32.58 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| northeast-1.compute.internal | | | | | | | | |
| ip-172-31-8-73.ap- | 172.31.8.73 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| northeast-1.compute.internal | | | | | | | | |
| ip-172-31-17-120.ap- | 172.31.17.120 | 0 | XMPP | internal | Established | in sync | 5 | 2019-Apr-28 07:35:40.743648 |
| northeast-1.compute.internal | | | | | | | | |
| ip-172-31-5-235.ap- | 172.31.5.235 | 0 | XMPP | internal | Established | in sync | 6 | 2019-Apr-28 07:35:40.251476 |
| northeast-1.compute.internal | | | | | | | | |
+--------------------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------------------------+
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py ctr route summary
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 4 | 8 | 2 | 6 | 0 |
| fabric.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-pod-network | 4 | 8 | 2 | 6 | 0 |
| :k8s-default-pod-network.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-service- | 4 | 8 | 0 | 8 | 0 |
| network:k8s-default-service-network.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
After the cirros deployment is created, just like in the Up and Running section, ping between pods on the two vRouter nodes works.
- The output is the same, but it now uses MPLS encapsulation between the two vRouters!
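If you want to see the encapsulation on the wire, one way (a sketch; it assumes eth0 is the underlay interface of the compute node, and that MPLS over GRE/UDP is in use, as the encap column of the route outputs below shows) is to run tcpdump on one of the vRouter nodes while the ping is running:
# tcpdump -nn -i eth0 'ip proto 47 or udp port 6635'   ## proto 47 = GRE, udp port 6635 = MPLS over UDP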
[root@ip-172-31-13-9 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cirros-deployment-86885fbf85-pkzqz 1/1 Running 0 16s 10.47.255.249 ip-172-31-17-120.ap-northeast-1.compute.internal <none> <none>
cirros-deployment-86885fbf85-w4w6h 1/1 Running 0 16s 10.47.255.250 ip-172-31-5-235.ap-northeast-1.compute.internal <none> <none>
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]# kubectl exec -it cirros-deployment-86885fbf85-pkzqz sh
/ # ping 10.47.255.250
PING 10.47.255.250 (10.47.255.250): 56 data bytes
64 bytes from 10.47.255.250: seq=0 ttl=63 time=3.376 ms
64 bytes from 10.47.255.250: seq=1 ttl=63 time=2.587 ms
64 bytes from 10.47.255.250: seq=2 ttl=63 time=2.549 ms
^C
--- 10.47.255.250 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 2.549/2.837/3.376 ms
/ #
/ #
/ # ip -o a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
1: lo inet6 ::1/128 scope host \ valid_lft forever preferred_lft forever
23: eth0@if24: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue \ link/ether 02:64:0d:41:b0:69 brd ff:ff:ff:ff:ff:ff
23: eth0 inet 10.47.255.249/12 scope global eth0\ valid_lft forever preferred_lft forever
23: eth0 inet6 fe80::489a:28ff:fedf:2e7b/64 scope link \ valid_lft forever preferred_lft forever
/ #
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py ctr route summary
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 6 | 12 | 2 | 10 | 0 |
| fabric.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-pod-network | 6 | 12 | 4 | 8 | 0 |
| :k8s-default-pod-network.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-service- | 6 | 12 | 0 | 12 | 0 |
| network:k8s-default-service-network.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py ctr route show -t default-domain:k8s-default:k8s-default-pod-network:k8s-default-pod-network.inet.0 10.47.255.251
default-domain:k8s-default:k8s-default-pod-network:k8s-default-pod-network.inet.0: 6 destinations, 12 routes (4 primary, 8 secondary, 0 infeasible)
10.47.255.251/32, age: 0:08:37.590508, last_modified: 2019-Apr-28 07:37:16.031523
[XMPP (interface)|ip-172-31-17-120.ap-northeast-1.compute.internal] age: 0:08:37.596128, localpref: 200, nh: 172.31.17.120, encap: ['gre', 'udp'], label: 25, AS path: None
[BGP|172.31.32.58] age: 0:08:37.594533, localpref: 200, nh: 172.31.17.120, encap: ['gre', 'udp'], label: 25, AS path: None
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py ctr route show -t default-domain:k8s-default:k8s-default-pod-network:k8s-default-pod-network.inet.0 10.47.255.250
default-domain:k8s-default:k8s-default-pod-network:k8s-default-pod-network.inet.0: 6 destinations, 12 routes (4 primary, 8 secondary, 0 infeasible)
10.47.255.250/32, age: 0:01:50.135045, last_modified: 2019-Apr-28 07:44:06.371447
[XMPP (interface)|ip-172-31-5-235.ap-northeast-1.compute.internal] age: 0:01:50.141480, localpref: 200, nh: 172.31.5.235, encap: ['gre', 'udp'], label: 25, AS path: None
[BGP|172.31.32.58] age: 0:01:50.098328, localpref: 200, nh: 172.31.5.235, encap: ['gre', 'udp'], label: 25, AS path: None
[root@ip-172-31-13-9 ~]#
Note: to make coredns active, I needed to make two changes.
- Update: when the tungstenfabric registry and the R2011-latest container tag are used, this procedure is not needed and coredns becomes active automatically (kubernetes v1.19.6 was used)
[root@ip-172-31-8-73 ~]# kubectl edit configmap -n kube-system coredns
- forward . /etc/resolv.conf
+ forward . 10.47.255.253
# kubectl edit deployment -n kube-system coredns
-> delete livenessProbe, readinessProbe
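After those two edits, the coredns pods are recreated; the rollout can be watched with something like:
# kubectl -n kube-system rollout status deployment coredns
# kubectl -n kube-system get pod -l k8s-app=kube-dns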
Then finally, coredns is also active and the cluster is fully up!
[root@ip-172-31-13-9 ~]# kubectl get pod --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default cirros-deployment-86885fbf85-pkzqz 1/1 Running 0 47m
default cirros-deployment-86885fbf85-w4w6h 1/1 Running 0 47m
kube-system config-zookeeper-l8m9l 1/1 Running 0 24m
kube-system config-zookeeper-lvtmq 1/1 Running 0 24m
kube-system config-zookeeper-mzlgm 1/1 Running 0 24m
kube-system contrail-agent-jc4x2 2/2 Running 0 24m
kube-system contrail-agent-psk2v 2/2 Running 0 24m
kube-system contrail-analytics-hsm7w 3/3 Running 0 24m
kube-system contrail-analytics-vgwcb 3/3 Running 0 24m
kube-system contrail-analytics-xbpwf 3/3 Running 0 24m
kube-system contrail-config-database-nodemgr-7xvnb 1/1 Running 0 24m
kube-system contrail-config-database-nodemgr-9bznv 1/1 Running 0 24m
kube-system contrail-config-database-nodemgr-lqtkq 1/1 Running 0 24m
kube-system contrail-configdb-4svwg 1/1 Running 0 24m
kube-system contrail-configdb-gdvmc 1/1 Running 0 24m
kube-system contrail-configdb-sll25 1/1 Running 0 24m
kube-system contrail-controller-config-gmkpr 5/5 Running 0 24m
kube-system contrail-controller-config-q6rvx 5/5 Running 0 24m
kube-system contrail-controller-config-zbpjm 5/5 Running 0 24m
kube-system contrail-controller-control-4m9fd 4/4 Running 0 24m
kube-system contrail-controller-control-9klxh 4/4 Running 0 24m
kube-system contrail-controller-control-wk6jp 4/4 Running 0 24m
kube-system contrail-controller-webui-268bc 2/2 Running 0 24m
kube-system contrail-controller-webui-57dbf 2/2 Running 0 24m
kube-system contrail-controller-webui-z6c68 2/2 Running 0 24m
kube-system contrail-kube-manager-6nh9d 1/1 Running 0 24m
kube-system contrail-kube-manager-stqf5 1/1 Running 0 24m
kube-system contrail-kube-manager-wqgl4 1/1 Running 0 24m
kube-system coredns-7f865bd4f9-g8j8f 1/1 Running 0 13s
kube-system coredns-7f865bd4f9-zftsc 1/1 Running 0 13s
kube-system etcd-ip-172-31-13-9.ap-northeast-1.compute.internal 1/1 Running 0 82m
kube-system etcd-ip-172-31-32-58.ap-northeast-1.compute.internal 1/1 Running 0 81m
kube-system etcd-ip-172-31-8-73.ap-northeast-1.compute.internal 1/1 Running 0 79m
kube-system kube-apiserver-ip-172-31-13-9.ap-northeast-1.compute.internal 1/1 Running 0 82m
kube-system kube-apiserver-ip-172-31-32-58.ap-northeast-1.compute.internal 1/1 Running 1 81m
kube-system kube-apiserver-ip-172-31-8-73.ap-northeast-1.compute.internal 1/1 Running 1 80m
kube-system kube-controller-manager-ip-172-31-13-9.ap-northeast-1.compute.internal 1/1 Running 1 83m
kube-system kube-controller-manager-ip-172-31-32-58.ap-northeast-1.compute.internal 1/1 Running 0 80m
kube-system kube-controller-manager-ip-172-31-8-73.ap-northeast-1.compute.internal 1/1 Running 0 80m
kube-system kube-proxy-6ls9w 1/1 Running 0 81m
kube-system kube-proxy-82jl8 1/1 Running 0 80m
kube-system kube-proxy-bjdj9 1/1 Running 0 81m
kube-system kube-proxy-nd7hq 1/1 Running 0 80m
kube-system kube-proxy-rb7nk 1/1 Running 0 83m
kube-system kube-scheduler-ip-172-31-13-9.ap-northeast-1.compute.internal 1/1 Running 1 83m
kube-system kube-scheduler-ip-172-31-32-58.ap-northeast-1.compute.internal 1/1 Running 0 80m
kube-system kube-scheduler-ip-172-31-8-73.ap-northeast-1.compute.internal 1/1 Running 0 80m
kube-system rabbitmq-b6rpx 1/1 Running 0 24m
kube-system rabbitmq-gn67t 1/1 Running 0 24m
kube-system rabbitmq-r8dvb 1/1 Running 0 24m
kube-system redis-5qvbv 1/1 Running 0 24m
kube-system redis-8mck5 1/1 Running 0 24m
kube-system redis-9d9dv 1/1 Running 0 24m
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]# kubectl get deployment -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
coredns 2/2 2 2 98m
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py ctr route summary
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 3 | 8 | 0 | 8 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 5 | 12 | 2 | 10 | 0 |
| fabric.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-pod-network | 5 | 14 | 4 | 10 | 0 |
| :k8s-default-pod-network.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-service- | 5 | 12 | 2 | 10 | 0 |
| network:k8s-default-service-network.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
Since MP-BGP supports stitching between two clusters, those clusters can easily be extended into a multi-cluster environment.
- Prefixes from each cluster will be leaked to the other cluster
I'll describe the details of this setup in the Appendix section.
Openstack HA installation is directly covered by ansible-deployer.
For this example setup, I used 5 EC2 instances (the same AMI, ami-3185744e). Each instance is assigned 2 vCPUs, 8 GB memory, and 20 GB disk. The VPC has the CIDR 172.31.0.0/16.
yum -y install epel-release
yum -y install git ansible-2.4.2.0
ssh-keygen
cd .ssh/
cat id_rsa.pub >> authorized_keys
cd
git clone http://github.com/Juniper/contrail-ansible-deployer
cd contrail-ansible-deployer
vi config/instances.yaml
(replace contents with this)
provider_config:
bms:
ssh_user: root
ssh_public_key: /root/.ssh/id_rsa.pub
ssh_private_key: /root/.ssh/id_rsa
domainsuffix: local
ntpserver: 0.centos.pool.ntp.org
instances:
bms1:
provider: bms
ip: 172.31.6.90 # controller1's ip
roles:
config_database:
config:
control:
analytics:
webui:
openstack:
bms2:
provider: bms
ip: 172.31.25.90 # controller2's ip
roles:
config_database:
config:
control:
analytics:
webui:
openstack:
bms3:
provider: bms
ip: 172.31.31.242 # controller3's ip
roles:
config_database:
config:
control:
analytics:
webui:
openstack:
bms11:
provider: bms
ip: 172.31.42.209 # compute1's ip
roles:
vrouter:
openstack_compute:
bms12:
provider: bms
ip: 172.31.15.199 # compute2's ip
roles:
vrouter:
openstack_compute:
contrail_configuration:
RABBITMQ_NODE_PORT: 5673
AUTH_MODE: keystone
KEYSTONE_AUTH_URL_VERSION: /v3
JVM_EXTRA_OPTS: "-Xms128m -Xmx1g"
kolla_config:
kolla_globals:
kolla_internal_vip_address: 172.31.0.11 ## kolla-ansible will deploy haproxy to serve HA vip
kolla_passwords:
keystone_admin_password: contrail123 # admin user's password
global_configuration:
## if the previously described AMI is used, it includes cloud-init packages whose rpm dependencies are not compatible with ansible-deployer in R5.1 and later. To work around this, I used these commands.
yum -y remove PyYAML python-requests
easy_install pip
pip install PyYAML requests
pip install ansible
ansible-playbook -e orchestrator=openstack -i inventory/ playbooks/configure_instances.yml
- it takes about 10 minutes
ansible-playbook -e orchestrator=openstack -i inventory/ playbooks/install_openstack.yml
- it takes about 40 minutes
ansible-playbook -e orchestrator=openstack -i inventory/ playbooks/install_contrail.yml
- it takes about 20 minutes
[root@ip-172-31-6-90 ~]# contrail-status
Pod Service Original Name State Id Status
redis contrail-external-redis running 23ef79b48ae8 Up 41 minutes
analytics api contrail-analytics-api running 3139f5fd9256 Up 36 minutes
analytics collector contrail-analytics-collector running 89c9e02fb551 Up 36 minutes
analytics nodemgr contrail-nodemgr running 5eecb461f95c Up 36 minutes
config api contrail-controller-config-api running fb0dc55f76c7 Up 39 minutes
config device-manager contrail-controller-config-devicemgr running 8dbff58776a2 Up 39 minutes
config nodemgr contrail-nodemgr running b64af838545d Up 39 minutes
config schema contrail-controller-config-schema running 83e0acf17e39 Up 39 minutes
config svc-monitor contrail-controller-config-svcmonitor running 623e17e8e74e Up 39 minutes
config-database cassandra contrail-external-cassandra running db30d874dce3 Up 40 minutes
config-database nodemgr contrail-nodemgr running 590463f627f6 Up 38 minutes
config-database rabbitmq contrail-external-rabbitmq running 712ee26dda64 Up 40 minutes
config-database zookeeper contrail-external-zookeeper running 46dbdec00e46 Up 40 minutes
control control contrail-controller-control-control running 3e0e653d1588 Up 37 minutes
control dns contrail-controller-control-dns running 2cebc37c18cf Up 37 minutes
control named contrail-controller-control-named running 112bd2d8ed5f Up 37 minutes
control nodemgr contrail-nodemgr running f2e0fdc4bfb2 Up 37 minutes
device-manager dnsmasq contrail-external-dnsmasq running f84b45234d70 Up 39 minutes
webui job contrail-controller-webui-job running 3dece86513a1 Up 38 minutes
webui web contrail-controller-webui-web running 408c772b1628 Up 38 minutes
== Contrail control ==
control: active
nodemgr: active
named: active
dns: active
== Contrail config-database ==
nodemgr: initializing (Disk for DB is too low. )
zookeeper: active
rabbitmq: active
cassandra: active
== Contrail analytics ==
nodemgr: active
api: active
collector: active
== Contrail webui ==
web: active
job: active
== Contrail device-manager ==
== Contrail config ==
svc-monitor: backup
nodemgr: active
device-manager: backup
api: active
schema: backup
[root@ip-172-31-25-90 ~]# contrail-status
Pod Service Original Name State Id Status
redis contrail-external-redis running 1ed7e967085e Up 41 minutes
analytics api contrail-analytics-api running 7392ea345e83 Up 36 minutes
analytics collector contrail-analytics-collector running 82332a53a566 Up 36 minutes
analytics nodemgr contrail-nodemgr running 89141bb180cd Up 36 minutes
config api contrail-controller-config-api running b2af8bc8a6d7 Up 38 minutes
config device-manager contrail-controller-config-devicemgr running d8ed77431dfa Up 39 minutes
config nodemgr contrail-nodemgr running 8c7f3d5f05e4 Up 39 minutes
config schema contrail-controller-config-schema running 4a6099aaea2a Up 39 minutes
config svc-monitor contrail-controller-config-svcmonitor running 3a3e6d37b30e Up 39 minutes
config-database cassandra contrail-external-cassandra running 0b05e121c017 Up 40 minutes
config-database nodemgr contrail-nodemgr running fb4857fe16c1 Up 39 minutes
config-database rabbitmq contrail-external-rabbitmq running a8137277a40f Up 40 minutes
config-database zookeeper contrail-external-zookeeper running 9571f4d9fde2 Up 40 minutes
control control contrail-controller-control-control running 5460dc02cc03 Up 37 minutes
control dns contrail-controller-control-dns running 17b27877ef6e Up 37 minutes
control named contrail-controller-control-named running cdbe1bae4c40 Up 37 minutes
control nodemgr contrail-nodemgr running cb36c2b4625a Up 37 minutes
device-manager dnsmasq contrail-external-dnsmasq running dd9002e6f58d Up 39 minutes
webui job contrail-controller-webui-job running 60dc895d439e Up 38 minutes
webui web contrail-controller-webui-web running 3ddfb5e2e851 Up 38 minutes
== Contrail control ==
control: active
nodemgr: active
named: active
dns: active
== Contrail config-database ==
nodemgr: initializing (Disk for DB is too low. )
zookeeper: active
rabbitmq: active
cassandra: active
== Contrail analytics ==
nodemgr: active
api: active
collector: active
== Contrail webui ==
web: active
job: active
== Contrail device-manager ==
== Contrail config ==
svc-monitor: backup
nodemgr: active
device-manager: active
api: active
schema: backup
[root@ip-172-31-31-242 ~]# contrail-status
Pod Service Original Name State Id Status
redis contrail-external-redis running 172e35daca5a Up 42 minutes
analytics api contrail-analytics-api running 2edf90837a43 Up 36 minutes
analytics collector contrail-analytics-collector running 812d4c190841 Up 36 minutes
analytics nodemgr contrail-nodemgr running d0eafce0d49d Up 36 minutes
config api contrail-controller-config-api running 7819c7792960 Up 39 minutes
config device-manager contrail-controller-config-devicemgr running c22addf8f1f1 Up 38 minutes
config nodemgr contrail-nodemgr running bd742928f26e Up 39 minutes
config schema contrail-controller-config-schema running 8ad72d0a2c12 Up 39 minutes
config svc-monitor contrail-controller-config-svcmonitor running 86283bfc21dc Up 39 minutes
config-database cassandra contrail-external-cassandra running 315d17494665 Up 41 minutes
config-database nodemgr contrail-nodemgr running a78521b2b940 Up 39 minutes
config-database rabbitmq contrail-external-rabbitmq running dfefb054808b Up 41 minutes
config-database zookeeper contrail-external-zookeeper running a16d1a2d259b Up 41 minutes
control control contrail-controller-control-control running bc9ecb41131c Up 37 minutes
control dns contrail-controller-control-dns running beff8cf11fdd Up 37 minutes
control named contrail-controller-control-named running 2322d5598a24 Up 37 minutes
control nodemgr contrail-nodemgr running 32b611d85d19 Up 37 minutes
device-manager dnsmasq contrail-external-dnsmasq running a0b3dd0ad254 Up 39 minutes
webui job contrail-controller-webui-job running 257721b46207 Up 38 minutes
webui web contrail-controller-webui-web running c2e7b95e7321 Up 38 minutes
== Contrail control ==
control: active
nodemgr: active
named: active
dns: active
== Contrail config-database ==
nodemgr: initializing (Disk for DB is too low. )
zookeeper: active
rabbitmq: active
cassandra: active
== Contrail analytics ==
nodemgr: active
api: active
collector: active
== Contrail webui ==
web: active
job: active
== Contrail device-manager ==
== Contrail config ==
svc-monitor: active
nodemgr: active
device-manager: backup
api: active
schema: active
[root@ip-172-31-42-209 ~]# contrail-status
Pod Service Original Name State Id Status
vrouter agent contrail-vrouter-agent running a17883037f12 Up 36 minutes
vrouter nodemgr contrail-nodemgr running 6dc2258ac4f6 Up 36 minutes
vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active
[root@ip-172-31-15-199 ~]# contrail-status
Pod Service Original Name State Id Status
vrouter agent contrail-vrouter-agent running a1e7767b3302 Up 36 minutes
vrouter nodemgr contrail-nodemgr running 40d5613fec21 Up 36 minutes
vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active
Then, you can create instances with the openstack command.
docker cp /etc/kolla/kolla-toolbox/admin-openrc.sh kolla_toolbox:/var/tmp
docker exec -it kolla_toolbox bash
source /var/tmp/admin-openrc.sh
cd /var/tmp
curl -O -L http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
openstack image create cirros --disk-format qcow2 --public --container-format bare --file cirros-0.4.0-x86_64-disk.img
openstack flavor create --ram 512 --disk 1 --vcpus 1 m1.tiny
openstack network create testvn
openstack subnet create --subnet-range 192.168.100.0/24 --network testvn subnet1
NET_ID=`openstack network list | grep testvn | awk -F '|' '{print $2}' | tr -d ' '`
openstack server create --flavor m1.tiny --image cirros --nic net-id=${NET_ID} vm1
openstack server create --flavor m1.tiny --image cirros --nic net-id=${NET_ID} vm2
exit
(on compute nodes)
ip route ## check the metadata IPs of the two instances
ssh cirros@169.254.0.x
ping 192.168.100.4
(kolla-toolbox)[ansible@ip-172-31-6-90 /]$ openstack server list
+--------------------------------------+------+--------+----------------------+--------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------+--------+----------------------+--------+---------+
| 9d66f0ed-d7d5-4a53-983d-dfba0385bd22 | vm2 | ACTIVE | testvn=192.168.100.4 | cirros | m1.tiny |
| 6595b4c1-1e6f-4f02-8f66-83b6355065b2 | vm1 | ACTIVE | testvn=192.168.100.3 | cirros | m1.tiny |
+--------------------------------------+------+--------+----------------------+--------+---------+
(kolla-toolbox)[ansible@ip-172-31-6-90 /]$
[root@ip-172-31-42-209 ~]# ip route
default via 172.31.32.1 dev vhost0
169.254.0.1 dev vhost0 proto 109 scope link
169.254.0.3 dev vhost0 proto 109 scope link
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.31.32.0/20 dev vhost0 proto kernel scope link src 172.31.42.209
[root@ip-172-31-42-209 ~]# ssh cirros@169.254.0.3
cirros@169.254.0.3's password:
$ ip -o a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
1: lo inet6 ::1/128 scope host \ valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000\ link/ether 02:79:59:ea:d4:17 brd ff:ff:ff:ff:ff:ff
2: eth0 inet 192.168.100.3/24 brd 192.168.100.255 scope global eth0\ valid_lft forever preferred_lft forever
2: eth0 inet6 fe80::79:59ff:feea:d417/64 scope link \ valid_lft forever preferred_lft forever
$
$ ping 192.168.100.4
PING 192.168.100.4 (192.168.100.4): 56 data bytes
64 bytes from 192.168.100.4: seq=0 ttl=64 time=13.876 ms
64 bytes from 192.168.100.4: seq=1 ttl=64 time=2.417 ms
64 bytes from 192.168.100.4: seq=2 ttl=64 time=2.375 ms
^C
--- 192.168.100.4 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 2.375/6.222/13.876 ms
$
$
[root@ip-172-31-15-199 ~]# ip route
default via 172.31.0.1 dev vhost0
169.254.0.1 dev vhost0 proto 109 scope link
169.254.0.3 dev vhost0 proto 109 scope link
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.31.0.0/20 dev vhost0 proto kernel scope link src 172.31.15.199
[root@ip-172-31-15-199 ~]# ssh cirros@169.254.0.3
cirros@169.254.0.3's password:
$ ip -o a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
1: lo inet6 ::1/128 scope host \ valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000\ link/ether 02:08:e6:0d:1e:3b brd ff:ff:ff:ff:ff:ff
2: eth0 inet 192.168.100.4/24 brd 192.168.100.255 scope global eth0\ valid_lft forever preferred_lft forever
2: eth0 inet6 fe80::8:e6ff:fe0d:1e3b/64 scope link \ valid_lft forever preferred_lft forever
$
Note: you might need to add this setting if the compute nodes don't support KVM.
vi /etc/kolla/nova-compute/nova.conf
(add them in [libvirt] section)
virt_type=qemu
cpu_mode=none
docker restart nova_compute
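To double-check that the override is in place, you can simply grep the file edited above:
grep -A 2 '\[libvirt\]' /etc/kolla/nova-compute/nova.conf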
Note: if AWS is used, you also need to use Networking > Manage IP Addresses from the EC2 instance's right-click menu, to allow access to the haproxy VIP from other nodes.
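The same assignment can also be done from the AWS CLI if you prefer scripting it (a sketch; eni-xxxxxxxx is a placeholder for the network interface of the node that should own the VIP, and 172.31.0.11 is the kolla_internal_vip_address used above):
aws ec2 assign-private-ip-addresses --network-interface-id eni-xxxxxxxx --private-ip-addresses 172.31.0.11 --allow-reassignment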
Finally, full HA between the controllers and an overlay between the 2 computes are configured!
There are some points that are not covered in this document, such as the behavior when some controllers are down, or when live migration is performed between computes. When I last tried live migration, about 1 second of packet loss was seen, but please check in your own setup, since there are a lot of points to take care of. (The prefix is updated when live migration is finished.)
Looking through each controller's neighbor state and routing table, you can see a curious difference between them.
[root@ip-172-31-6-90 ~]# ./contrail-introspect-cli/ist.py ctr nei
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-25-90.local | 172.31.25.90 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-31-242.local | 172.31.31.242 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-42-209.local | 172.31.42.209 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-6-90 ~]# ./contrail-introspect-cli/ist.py --host 172.31.25.90 ctr nei
Introspect Host: 172.31.25.90
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-31-242.local | 172.31.31.242 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-6-90.local | 172.31.6.90 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-15-199.local | 172.31.15.199 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-6-90 ~]#
[root@ip-172-31-6-90 ~]# ./contrail-introspect-cli/ist.py --host 172.31.31.242 ctr nei
Introspect Host: 172.31.31.242
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-25-90.local | 172.31.25.90 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-6-90.local | 172.31.6.90 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-15-199.local | 172.31.15.199 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
| ip-172-31-42-209.local | 172.31.42.209 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-6-90 ~]#
[root@ip-172-31-6-90 ~]#
[root@ip-172-31-6-90 ~]# ./contrail-introspect-cli/ist.py ctr route summary
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:admin:testvn:testvn.inet.0 | 2 | 4 | 1 | 3 | 0 |
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 1 | 1 | 1 | 0 | 0 |
| fabric.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-6-90 ~]# ./contrail-introspect-cli/ist.py --host 172.31.25.90 ctr route summary
Introspect Host: 172.31.25.90
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:admin:testvn:testvn.inet.0 | 2 | 4 | 1 | 3 | 0 |
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 1 | 1 | 1 | 0 | 0 |
| fabric.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-6-90 ~]# ./contrail-introspect-cli/ist.py --host 172.31.31.242 ctr route summary
Introspect Host: 172.31.31.242
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:admin:testvn:testvn.inet.0 | 2 | 4 | 2 | 2 | 0 |
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 2 | 2 | 2 | 0 | 0 |
| fabric.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-6-90 ~]#
[root@ip-172-31-6-90 ~]# ./contrail-introspect-cli/ist.py --host 172.31.31.242 ctr route show 192.168.100.3
Introspect Host: 172.31.31.242
default-domain:admin:testvn:testvn.inet.0: 2 destinations, 4 routes (2 primary, 2 secondary, 0 infeasible)
192.168.100.3/32, age: 0:01:18.234010, last_modified: 2019-Apr-27 14:03:19.075046
[XMPP (interface)|ip-172-31-42-209.local] age: 0:01:18.239011, localpref: 200, nh: 172.31.42.209, encap: ['gre', 'udp'], label: 25, AS path: None
[BGP|172.31.6.90] age: 0:01:18.230559, localpref: 200, nh: 172.31.42.209, encap: ['gre', 'udp'], label: 25, AS path: None
[root@ip-172-31-6-90 ~]# ./contrail-introspect-cli/ist.py --host 172.31.31.242 ctr route show 192.168.100.4
Introspect Host: 172.31.31.242
default-domain:admin:testvn:testvn.inet.0: 2 destinations, 4 routes (2 primary, 2 secondary, 0 infeasible)
192.168.100.4/32, age: 0:00:52.035230, last_modified: 2019-Apr-27 14:03:47.460835
[XMPP (interface)|ip-172-31-15-199.local] age: 0:00:52.039485, localpref: 200, nh: 172.31.15.199, encap: ['gre', 'udp'], label: 25, AS path: None
[BGP|172.31.25.90] age: 0:00:51.996464, localpref: 200, nh: 172.31.15.199, encap: ['gre', 'udp'], label: 25, AS path: None
[root@ip-172-31-6-90 ~]#
Since vRouters always have 2 XMPP connections, when there are 3 controllers, the XMPP connection states are not the same between controllers, and the routing tables could be a bit different between them. Considering route target filtering, it is even possible that they have completely different routing tables, if some controllers haven't received a specific route-target over XMPP.
That comes from the scale-out nature of Tungsten Fabric.
For more detailed configuration of ansible-deployer (including a multi-NIC sample), you can check these documents.
- https://github.com/Juniper/contrail-ansible-deployer/wiki/Contrail-with-Openstack-Kolla
- https://github.com/Juniper/contrail-ansible-deployer/wiki/Configuration-Sample-for-Multi-Node-Openstack-HA-and-Contrail-(single-interface)
- https://github.com/Juniper/contrail-ansible-deployer/wiki/Configuration-Sample-for-Multi-Node-Openstack-HA-and-Contrail-(multi-interface)
Tungsten Fabric can be well integrated with vCenter, as already described in the orchestrator integration section.
To try this feature, you can follow these instructions.
- https://github.com/Juniper/contrail-ansible-deployer/blob/master/README_vcenter.md
- Let me note that vRouterVM's OVF file is not publicly available currently
Since the HA behavior of Tungsten Fabric is completely the same as in the kubernetes and openstack installations, I won't describe the details here.
For orchestrator side HA, vCenter HA is available.
- I haven't yet tried this combination, but since vCenter HA uses the same IP for the vCenter service, I think there is a high possibility that vcenter-plugin can work with vCenter HA.
Multi-vCenter or cross-vCenter (when linked mode is used) is a somewhat more interesting subject. I will discuss it further in the Appendix.
Container registry docker.io/opencontrailnightly has various tags.
Let me describe some thoughts about which tag to choose for a new installation.
There are three tags I often use: latest, 5.1.0-latest, and 5.0-latest. They are at the head of each Tungsten Fabric branch (master / R5.1 / R5.0), and various bug fixes are included in each branch. So you can choose one of those tags for your use case. If you need new features in R5.1, such as the optional analytics components, you could choose the 5.1.0-latest tag.
Since latest is truly a development branch, I don't recommend it for general use, since in some cases the build is broken while new features are being added.
Other release branches are much more stable, since in most cases they are bugfix only, although for some period after a new branch is created, release branches also seem to receive new features.
To specify a tag, you can use the parameter below, and when git clone is typed for ansible-deployer or contrail-container-builder, the matching branch also needs to be specified.
(ansible-deployer)
git clone -b R5.1 http://github.com/Juniper/contrail-ansible-deployer
contrail_configuration:
  CONTRAIL_CONTAINER_TAG: 5.1.0-latest
(kubeadm)
git clone -b R5.1 https://github.com/Juniper/contrail-container-builder.git
(common.env in the contrail-container-builder repo)
CONTRAIL_CONTAINER_TAG=5.1.0-latest
One point to be careful about: since containers used with openstack (such as nova-init, neutron-init, heat-init, ...) have a version dependency on the openstack release, the tags might need to be changed to 5.1.0-latest-queens, 5.1.0-latest-rocky, etc.
They have specific versions of some openstack modules installed, so if the tags don't match, the openstack containers won't work well.
Although Tungsten Fabric has decent monitoring / alarm features, it could be a requirement to integrate it with full-fledged monitoring systems.
Let me describe how to integrate it with prometheus and EFK, as an example.
To monitor and visualize what's going on in Tungsten Fabric systems, prometheus will be one possible choice.
- Several tools, such as zabbix, support scraping prometheus format, so this also could be useful as a common format among monitoring tools: https://www.zabbix.com/documentation/4.2/manual/config/items/itemtypes/prometheus
To be scraped by prometheus, Tungsten Fabric's metrics need to be exported in the prometheus exporter format, and there are two ways to achieve this.
- Directly export metrics from the introspect HTTP server (this feature is not available today)
- Export values from analytics, or from analytics UVEs (this can be done today)
As a first step, I tried a short script (WIP) to export values to prometheus.
- Currently, a limited number of vRouter metrics, such as the number of packets, bytes, flows, and dropped packets, are exported
- https://github.com/tnaganawa/tf-analytics-exporter
Those values also can be used to send alerts from prometheus, rather than from analytics-alarms.
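For reference, an exporter of this kind can be fairly short. The following is a minimal sketch, not the actual tf-analytics-exporter code: it assumes the analytics API is reachable on port 8081 and that the requests and prometheus_client Python packages are installed, and the UVE key names used below ('flow_rate', 'active_flows') are assumptions that need to be checked against the actual UVE json in your environment.
#!/usr/bin/python
# minimal sketch of an analytics-based prometheus exporter (not the actual tf-analytics-exporter)
import time
import requests
from prometheus_client import start_http_server, Gauge

ANALYTICS = 'http://127.0.0.1:8081'   # analytics-api ip:port (assumption)
flow_gauge = Gauge('tf_vrouter_active_flows', 'active flows per vRouter', ['vrouter'])

def collect():
    # list the vRouter UVEs, then fetch each one and pick a counter out of it
    vrouters = requests.get(ANALYTICS + '/analytics/uves/vrouters').json()
    for entry in vrouters:
        name = entry['name']
        uve = requests.get(ANALYTICS + '/analytics/uves/vrouter/' + name + '?flat').json()
        # 'VrouterStatsAgent' is the UVE struct used elsewhere in this document;
        # 'flow_rate' / 'active_flows' are example keys, check the raw UVE json for the exact ones
        stats = uve.get('VrouterStatsAgent', {})
        flows = stats.get('flow_rate', {}).get('active_flows', 0)
        flow_gauge.labels(vrouter=name).set(flows)

if __name__ == '__main__':
    start_http_server(8000)   # prometheus scrapes this port
    while True:
        collect()
        time.sleep(30)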
Since Tungsten Fabric has several system logs (files or docker stdout), they can easily be collected by fluentd.
This can be useful for administration purposes, if the number of nodes is fairly large.
One more interesting subject is that vRouter also supports flow export, in a sense similar to ipfix or a stateful firewall's allow / deny log.
- https://github.com/Juniper/contrail-controller/wiki/Flow-Sampling
- https://github.com/Juniper/contrail-specs/blob/master/security_logging_object.md
To enable this, you can set flow-export-rate > 0 (say 100), at Configure > Global Config > edit > Flow Export Rate in the Tungsten Fabric webui.
By default, it will be sent to analytics, to be queried from the Tungsten Fabric webui or commands like contrail-flows and contrail-sessions, but it can also be exported directly to a local file, to be sent to other log collectors, such as EFK, for later use.
To enable local flow logging, these parameters can be used.
(ansible-deployer)
contrail_configuration:
  SLO_DESTINATION: file
  SAMPLE_DESTINATION: file
(kubeadm)
env:
  SLO_DESTINATION: file
  SAMPLE_DESTINATION: file
If these parameters are set, the log_file of vrouter-agent (such as /var/log/contrail/contrail-vrouter-agent.log) will have log output like this.
INFO - [SYS_INFO]: SessionData: [ vmi = default-domain:k8s-kube-system:coredns-7f865bd4f9-6gq52__8d7eb81a-6b16-11e9-b466-0e4d26f73e5e vn = default-domain:k8s-default:k8s-default-pod-network application = application=k8s ] security_policy_rule = 00000000-0000-0000-0000-000000000004 remote_vn = default-domain:default-project:ip-fabric is_client_session = 1 is_si = 0 vrouter_ip = 172.31.6.218 local_ip = 10.47.255.250 service_port = 443 protocol = 6 sampled_forward_bytes = 132 sampled_forward_pkts = 2 sampled_reverse_bytes = 1140 sampled_reverse_pkts = 2 ip = 10.96.0.1 port = 35420 forward_flow_info= [ sampled_bytes = 132 sampled_pkts = 2 flow_uuid = 130bbb8d-9be7-46bc-8b3a-938a5c2c36bb tcp_flags = 120 setup_time = 1556609440638565 action = pass sg_rule_uuid = 00000000-0000-0000-0000-000000000004 nw_ace_uuid = 00000000-0000-0000-0000-000000000004 underlay_source_port = 55670 ] reverse_flow_info= [ sampled_bytes = 1140 sampled_pkts = 2 flow_uuid = e09e2579-7764-472b-8a35-b300c5ec34e7 tcp_flags = 120 setup_time = 1556609440638565 action = pass sg_rule_uuid = 00000000-0000-0000-0000-000000000004 nw_ace_uuid = 00000000-0000-0000-0000-000000000004 underlay_source_port = 49942 ] vm = coredns-7f865bd4f9-6gq52__8d536915-6b16-11e9-9f48-0e4d26f73e5e other_vrouter_ip = 172.31.6.218 underlay_proto = 0 ]
With these parameters converted to json by fluentd, and queried by kibana through elasticsearch, it is much easier to see what kind of packets went between vRouters, and through which physical switches in turn.
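For reference, the 'key = value' pairs in that SessionData line can be parsed quite mechanically. The snippet below is a rough sketch of the kind of parsing a collector would do (it is not the fluentd configuration I actually used, and it simply drops the nested [ ... ] blocks for brevity):
#!/usr/bin/python
# rough sketch: turn the flat 'key = value' pairs of a SessionData log line into json
import json
import re

def parse_session_line(line):
    # drop the innermost [ ... ] blocks for simplicity, then pick up the remaining key = value pairs
    flat = re.sub(r'\[[^\[\]]*\]', '', line)
    return dict(re.findall(r'(\w+) = (\S+)', flat))

if __name__ == '__main__':
    with open('/var/log/contrail/contrail-vrouter-agent.log') as f:
        for line in f:
            if 'SessionData' in line:
                print(json.dumps(parse_session_line(line)))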
I tried sending vRouter's flow logs to elasticsearch in a manner similar to the ipfix module, and the flow collector UI can show vRouter flow logs.
- Detail is described in this url: https://github.com/tnaganawa/tungstenfabric-efk-flow-log
In my understanding, stats, logs, and topologies are three different components for software monitoring.
Prometheus and EFK already covered stats and logs management, but since for topology there is no universal tool AFAIK, let me describe the topology visualization feature of the Tungsten Fabric webui.
There are many kinds of topology in IT systems, but for Tungsten Fabric, these two, overlay topology and underlay topology, are the most important.
- overlay:
- which VNF is connected to which VN through service-chain
- which physical device has which VN extended
- underlay:
- which VM is on which vRouter
- vRouter to leaf switch connection
- leaf switch to spine switch connection
For overlay visualization, Monitor > Networking > Networks has a detailed view of which VNF is connected to which VN through a service-chain.
- AFAIK, there is no way to see which physical device has a VN extended
For underlay visualization, Tungsten Fabric has a feature to collect lldp info through SNMP, and depict a view of leafs, spines, vRouters and VMs.
- http://www.opencontrail.org/wp-content/uploads/2014/11/overlay_to_physical_blogpost_image2.png
- http://www.opencontrail.org/wp-content/uploads/2014/11/overlay_to_physical_blogpost_image3.png
- vRouter to leaf switch connections also seem to be visualized based on the arp table in the leaf switch
To enable this feature in an ansible-deployer installation, this role needs to be added.
roles:
  analytics_snmp:
After adding this, two containers, snmp-collector and topology, are added to the analytics nodes, and http://(analytics-ip):8081/analytics/uves/prouter/* will be filled with PRouterLinkEntry parameters, describing the interface_name detected by lldp (a query sketch follows the list below).
- http://www.opencontrail.org/wp-content/uploads/2014/11/overlay_to_physical_blogpost_image1.png
- To enable this feature, Configure > Physical Device > (edit) > 'SNMP Enabled' needs to be checked
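As a quick check that the lldp info has actually been collected, the prouter UVE mentioned above can be queried directly. A minimal sketch (the analytics ip is an assumption, and the 'link_table' key under PRouterLinkEntry should be verified against the raw json of your release):
#!/usr/bin/python
# minimal sketch: list prouter UVEs and print the lldp-based link entries, if any
import requests

ANALYTICS = 'http://127.0.0.1:8081'   # analytics-api ip:port (assumption)

prouters = requests.get(ANALYTICS + '/analytics/uves/prouters').json()
for entry in prouters:
    name = entry['name']
    uve = requests.get(ANALYTICS + '/analytics/uves/prouter/' + name + '?flat').json()
    # 'PRouterLinkEntry' is the parameter named above; 'link_table' is assumed to hold the links
    for link in uve.get('PRouterLinkEntry', {}).get('link_table', []):
        print(name, link)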
After installation of Tungsten Fabric is completed, users need to see operational state such as route tables and vif status, and to configure various objects in the Tungsten Fabric DB, such as virtual-network, logical-router, bgp-router, ...
Although Tungsten Fabric has integration with openstack neutron and kubernetes YAML to configure some parameters, there are many situations where those DBs need to be edited directly through the Tungsten Fabric API or the Tungsten Fabric webui.
Let me describe various options to achieve this.
Since ist.py has already been used many times in this document, there is not much more to say about it.
It can dump information similar to a router's operational commands, based on the introspect API of various Tungsten Fabric components, including route tables, bgp status, component status, ...
One thing to add is that on vRouter, there are several other commands which show similar information, such as vif, flow, vxlan, nh, rt, ...
Since ist.py picks the info from vrouter-agent, and those tools pick the info from netlink, the info is (mostly) always in sync.
- Still, realtime info such as vif --list --rate and flow -s is a good addition when vRouter throughput is the key
When a configuration update of Tungsten Fabric from the CLI is needed, this tool is perhaps one of the best approaches.
It can dump and traverse the contents of the Tungsten Fabric DB in an intuitive way, just like a Unix shell, and do ls, cat, edit, and check refs and back_refs if needed.
Let me describe some commands I think are useful.
Please type these commands to install this tool on Centos7.
yum -y install gcc python-devel
pip install contrail-api-cli
If some dependency error is shown, virtualenv might help.
yum -y install gcc python-devel
pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install contrail-api-cli
After the installation, try these commands to test Tungsten Fabric access. (a kubernetes installation is used in this example)
contrail-api-cli --host xx.xx.xx.xx ls ## xx.xx.xx.xx indicates config-api's ip
If you installed this tool, I recommend first typing this command.
contrail-api-cli --host xx.xx.xx.xx ls -l \*
Then it will just dump all the uuids inside the Tungsten Fabric DB with their names!
Combining this with the cat command, a command which dumps all the configuration inside that DB can be written in a few lines, which is highly useful to investigate what's configured.
for i in $(contrail-api-cli --host xx.xx.xx.xx ls \*)
do
echo $i
contrail-api-cli --host xx.xx.xx.xx cat $i
done
This command is similar to Unix cat. It dumps the json inside the Tungsten Fabric DB. To see what's configured in each element, these commands can be used.
contrail-api-cli --host xx.xx.xx.xx ls -l virtual-network
contrail-api-cli --host xx.xx.xx.xx cat virtual-network/xxxx-xxxx-xxxx-xxxx
This command has two options, and I think both are useful.
It basically dumps the refs and back_refs which one element has.
So, for example, if you want to see all the ports inside a virtual-network, this is the command you need.
(forward_refs)
contrail-api-cli --host xx.xx.xx.xx tree virtual-network/xxxx-xxxx-xxxx-xxxx
(back_refs)
contrail-api-cli -r --host xx.xx.xx.xx tree virtual-network/xxxx-xxxx-xxxx-xxxx
One additional option is -P, which dumps the parent of an element. This option can also be useful in some situations.
contrail-api-cli -P --host xx.xx.xx.xx tree virtual-network/xxxx-xxxx-xxxx-xxxx
The basic idea of this command is to first GET a json file with a specific uuid, save it in a temporary file, edit this file, and PUT it back with the same uuid to update the contents.
- Similar behavior to visudo, for example
Additionally, this command can be a bit more powerful, since it supports the EDITOR environment variable.
By default, EDITOR is 'vim', but since it can be any command or script, such as a python file, it can be a good base for Tungsten Fabric automation based on its REST API.
- Since, unfortunately, no major automation tool, such as ansible, manageiq, or terraform, currently supports the Tungsten Fabric API directly, this might be the only way to configure Tungsten Fabric specific options, such as route-target for virtual-networks or packet-mode for ports.
- If neutron-plugin is installed, you can also use tools like ansible, manageiq, and terraform through the neutron API
Basic usage of this command is like this, to update an element specified by a uuid.
contrail-api-cli --host xx.xx.xx.xx edit virtual-network/xxxx-xxxx-xxxx-xxxx
EDITOR=/bin/vi contrail-api-cli --host xx.xx.xx.xx edit virtual-network/xxxx-xxxx-xxxx-xxxx
If automation is an intended use case, commands similar to this can be used.
EDITOR=(path-of-a-script) contrail-api-cli --host xx.xx.xx.xx edit virtual-network/xxxx-xxxx-xxxx-xxxx
(venv) [root@ip-172-31-11-240 ~]# EDITOR=/tmp/configure-vn.py contrail-api-cli --host 172.31.11.240 edit virtual-network/035a1e3d-966b-45fd-941c-b845fd48d0c5
-> json in Tungsten Fabric DB is updated
(venv) [root@ip-172-31-11-240 ~]# cat /tmp/configure-vn.py
#!/usr/bin/python
import sys
import json
filename=sys.argv[1]
with open (filename) as f:
  js=json.load(f)
##print (js)
js["flood_unknown_unicast"]=True ### edit json data here
with open (filename, 'w') as f:
  json.dump(js, f)
(venv) [root@ip-172-31-11-240 ~]#
So with this command, you can edit Tungsten Fabric json programmatically, without deep knowledge of the Tungsten Fabric API. Since many objects are created by, for example, the neutron API, one approach is to create them that way first, and then update them with Tungsten Fabric specific parameters (route-target is one example) using this tool.
Internally, contrail-api-cli uses the REST API served by config-api, which does various CRUD operations based on the HTTP method and URL specified.
- When keystone authentication is enabled, each command needs -H 'x-auth-token: token-id', in which token-id is a keystone token; it can be created by a command such as 'openstack token issue'
This command returns a result similar to 'contrail-api-cli ls'.
curl config-api-ip:8082
Once an object type such as virtual-network is identified, this command dumps all the uuids of that type.
- detail=True optionally dumps all the json definitions
curl config-api-ip:8082/virtual-networks
curl config-api-ip:8082/virtual-networks\?detail=True
When the uuid is identified, the json for that uuid can be retrieved with this command. (similar to contrail-api-cli cat)
curl config-api-ip:8082/virtual-network/xxxx-xxxx-xxxx-xxxx
To modify a json definition, a PUT can be used. (contrail-api-cli edit also uses this)
curl -X PUT -H "Content-Type: application/json" -d@jsonfile config-api-ip:8082/virtual-network/xxxx-xxxx-xxxx-xxxx
POST and DELETE can be used to create and delete objects.
curl -X POST -H "Content-Type: application/json" -d@jsonfile config-api-ip:8082/virtual-networks
curl -X DELETE config-api-ip:8082/virtual-network/xxxx-xxxx-xxxx-xxxx
So if the object name is known, it is also possible to get and modify the config database through the REST API.
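The same workflow can of course be scripted. Below is a minimal sketch with the Python requests library, assuming keystone authentication is disabled (with keystone, add the 'x-auth-token' header mentioned above); flood_unknown_unicast is just the same example property as in the EDITOR script earlier.
#!/usr/bin/python
# minimal sketch of the GET / PUT workflow against config-api (port 8082)
import requests

API = 'http://127.0.0.1:8082'   # config-api ip:port (assumption)

# list virtual-networks (same as: curl config-api-ip:8082/virtual-networks)
vns = requests.get(API + '/virtual-networks').json()['virtual-networks']
for vn in vns:
    print(vn['uuid'], ':'.join(vn['fq_name']))

# fetch one object, modify it, and PUT it back with the same uuid
uuid = vns[0]['uuid']
obj = requests.get(API + '/virtual-network/' + uuid).json()
obj['virtual-network']['flood_unknown_unicast'] = True
requests.put(API + '/virtual-network/' + uuid, json=obj,
             headers={'Content-Type': 'application/json'})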
Unfortunately, not many docs are available for each internal object, so to see which object has what behavior, checking these source files is perhaps the easiest way ..
- https://github.com/Juniper/contrail-api-client/blob/master/schema/vnc_cfg.xsd
- https://github.com/Juniper/contrail-controller/tree/master/src/config/api-server/vnc_cfg_api_server/resources
For illustration purposes, some frequently touched objects are described below.
(same meaning as in the webui)
virtual-network
logical-router
network-policy
service-instance
physical-router
service-appliance: definition for load-balancer (historically, some PNF implementation also uses this object)
load-balancer
global-system-config
global-vrouter-config
(slightly different meaning)
virtual-router: same as Configure > Infrastructure > Nodes > vRouter nodes
bgp-router: same as Configure > Infrastructure > BGP Router. It also seems to be used as control's definition for ifmap objects, so if a control is removed from these objects, it shows a 'No configuration for self' error and stops working.
instance-ip: IP for each port (it is a separate object internally)
virtual-machine-interface: same as Configure > Networking > Ports
project: same as Configure > Infrastructure > Projects. When openstack is used, the plugin will copy all the projects in keystone to this. When kubernetes is used, kube-manager will create this object based on kubernetes namespaces.
domain: similar to project. When openstack is used, it will be copied from keystone. When kubernetes is used, default-domain will be used.
(not seen in webui)
virtual-machine: internal object for a virtual machine. When the vRouter API is called by CNI or nova-vif-driver, this object will get a link to the virtual-router, and the ifmap objects for this virtual-machine will be downloaded from the control process
access-control-list: vRouter's ACL entries, which are calculated by schema-transformer from security-group definitions
I recently started working on a Tungsten Fabric ansible module.
Although it is not exhaustive (yet), it can create / update / delete several tenant objects, such as virtual-network, logical-router, network-policy, service-instance, and can update global-system-config and global-vrouter-config.
- Since it directly uses the REST API internally, it can also configure some features which cannot be configured from the webui, such as a vxlan-routing logical-router, and potentially application-policy-set per virtual-machine or nh-reachability.
I'm adding features and test targets now, so hopefully it can soon cover most of the frequently used knobs.
Usage example:
Create virtual-network:
# ansible -m tungstenfabric.networking.virtual_network localhost -a 'name=vn1 controller_ip=x.x.x.x state=present'
Update virtual-network:
# ansible -m tungstenfabric.networking.virtual_network localhost -a 'name=vn1 controller_ip=x.x.x.x state=present subnet=10.0.1.0 subnet_prefix=24 rpf=disable'
Delete virtual-network:
# ansible -m tungstenfabric.networking.virtual_network localhost -a 'name=vn1 controller_ip=x.x.x.x state=absent'
Although there are several good CLI tools available today, historically, most operations have been done through the Tungsten Fabric webui.
It is served at https://(controller-ip):8143, and admin / contrail123 will be the default user / pass.
- user / pass can be changed by webui config parameter: https://github.com/Juniper/contrail-container-builder/blob/master/containers/controller/webui/base/entrypoint.sh#L248
At the top left, there are four icons, which indicate 'Monitor', 'Configure', 'Inspect', and 'Query'.
Each module covers the following.
- Monitor: This module primarily shows the status of each component, based on information from introspects, analytics UVEs, and in some cases the configuration DB. (There might be some features which won't work well if analyticsdb is not installed)
- Configure: Most of the configuration tasks will be done in this module.
- Inspect: This module has three tabs: list-of-uuid, introspect, and config editor. Introspect shows the same information as ist.py. List-of-uuid and config editor show information similar to contrail-api-cli ls and contrail-api-cli cat / edit.
- Query: This module queries analyticsdb's contents. It shows the same information as commands such as contrail-logs, contrail-flows, contrail-sessions, ... (https://github.com/Juniper/contrail-controller/wiki/Contrail-utility-scripts-for-getting-logs,-stats-and-flows-from-analytics) If analyticsdb is not installed, this module will be greyed out.
Although this webui is highly useful to grasp the current state of Tungsten Fabric, its response could be a bit slow if the number of nodes is large (such as over 2,000). In that case, a CLI-based approach will be a bit more relevant.
Although config-api supports a complete REST API to cover Tungsten Fabric configuration, for some high level operations, such as creating a logical-router or a service-instance, a single REST call is not sufficient, and several objects need to be created or updated at the same time.
- when creating a logical-router, a virtual-machine-interface also needs to be created and attached to the connected virtual-network, and service-instance creation also includes port-tuple creation, which needs to be attached to that service-instance
To cover those scenarios, the webui implements its own HTTP API, which handles some high level operations in its nodejs logic.
For CRUD operations, these URLs can be used. (the json payload can be extracted with the Developer Tools of each web browser)
- For openstack authentication, keystone token (-H 'x-auth-token: xxxx-xxxx-xxxx') can be used.
- GET
list objects:
curl -k -X POST -H 'x-auth-token: xxxxx' -H "content-type: application/json" -d '{"data":[{"type":"virtual-networks"}]}' https://(webui-ip):8143/api/tenants/config/get-config-list
get the detail of one specific object:
curl -k -X POST -H 'x-auth-token: xxxxx' -H 'content-type: application/json' -d '{"data": [{"type": "virtual-network", "uuid": ["xxxx-xxxx-xxxx-xxxx"]}]}' https://(webui-ip):8143/api/tenants/config/get-config-objects
- CREATE
$ curl -k -X POST -H 'x-auth-token: xxxxx' -H 'content-type: application/json' -d @vn1.json https://(webui-ip):8143/api/tenants/config/create-config-object
$ cat vn1.json
{"virtual-network": {"network_ipam_refs": [{"to": ["default-domain", "default-project", "default-network-ipam"], "attr": {"ipam_subnets": [{"subnet": {"ip_prefix": "10.0.1.0", "ip_prefix_len": 24}}]}}], "fq_name": ["default-domain", "admin", "vn1"], "parent_type": "project"}}
- UPDATE
$ curl -k -X POST -H 'x-auth-token: xxxxx' -H 'content-type: application/json' -d @vn1-updated.json https://(webui-ip):8143/api/tenants/config/update-config-object
$ cat vn1-updated.json
{"virtual-network": {"network_ipam_refs": [{"to": ["default-domain", "default-project", "default-network-ipam"], "attr": {"ipam_subnets": [{"subnet": {"ip_prefix": "10.0.1.0", "ip_prefix_len": 24}}]}}], "fq_name": ["default-domain", "admin", "vn1"], "parent_type": "project", "uuid": "xxxx-xxxx-xxxx-xxxx"}}
- DELETE
curl -k -X POST -H 'x-auth-token: xxxxx' -H 'content-type: application/json' -d '[{"type": "virtual-network", "deleteIDs": ["xxxx-xxxx-xxxx-xxxx"]}]' https://(webui-ip):8143/api/tenants/config/delete
Note:
In this API, a uuid is sometimes needed instead of an fq_name. To convert fq_name to uuid (or vice versa), these URLs can be used.
fqname to uuid:
curl -X POST -H 'content-type: application/json' -d '{"type": "virtual_network", "fq_name": ["default-domain", "admin", "vn1"]}' http://config-api-ip:8082/fqname-to-id
uuid to fqname:
curl -X POST -H 'content-type: application/json' -d '{"uuid": "xxxx-xxxx-xxxx-xxxx"}' http://config-api-ip:8082/id-to-fqname
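The same conversion can be done from Python, as a small sketch using the two endpoints above (the config-api ip is an assumption):
#!/usr/bin/python
# sketch: fq_name <-> uuid conversion using the config-api endpoints shown above
import requests

API = 'http://127.0.0.1:8082'   # config-api ip:port (assumption)

def fqname_to_uuid(obj_type, fq_name):
    r = requests.post(API + '/fqname-to-id', json={'type': obj_type, 'fq_name': fq_name})
    return r.json()['uuid']

def uuid_to_fqname(uuid):
    return requests.post(API + '/id-to-fqname', json={'uuid': uuid}).json()

uuid = fqname_to_uuid('virtual_network', ['default-domain', 'admin', 'vn1'])
print(uuid, uuid_to_fqname(uuid))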
Backup / restore is a vital feature for important data, like SDN configuration.
Tungsten Fabric supports backup and restore through db_json_exim.py script. The procedure is described there.
Note: This repo also might be useful (restore is tested)
From R5.0 onward, Tungsten Fabric components are distributed as docker containers. Since those containers have various environment variables to change their behavior, it is sometimes necessary to update containers' environment variables after the installation. Let me describe how to change them.
Container parameters are used mostly in /entrypoint.sh to create the conf files, which change the behavior of each microservice. To see the container environments and related parameters, the most straightforward way is to look at this repo.
This repo contains the Dockerfiles and entrypoint.sh of the various containers, so looking through it, you can check how to modify the parameter you need.
As an example, if you want to change the gateway parameter of vrouter-agent, you can check this file, where VROUTER_GATEWAY is directly used to fill in that parameter.
[VIRTUAL-HOST-INTERFACE]
name=vhost0
ip=$vrouter_cidr
physical_interface=$phys_int
gateway=$VROUTER_GATEWAY ### this is the container environment variable which needs to be changed
compute_node_address=$vrouter_ip
So if you know the parameter of the microservice you need, you can check the corresponding container environment variable.
Note that in some cases, there is no container environment variable which directly modifies the microservice parameter.
In that case, you can use the add_ini_params_from_env function, which is at the last part of each entrypoint.sh.
add_ini_params_from_env VROUTER_AGENT /etc/contrail/contrail-vrouter-agent.conf
In that case, if you give this environment variable,
VROUTER_AGENT__FLOWS__thread_count=8
it will be translated to [FLOWS], thread_count=8; so with this method, you can directly modify a microservice's conf file, even if no handy parameter is supplied for it.
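To make the naming convention clearer, the translation is roughly like the following (a simplified Python sketch of the PREFIX__SECTION__key convention, not the actual shell function in entrypoint.sh):
#!/usr/bin/python
# simplified sketch of the PREFIX__SECTION__key=value naming convention
import os

def ini_params_from_env(prefix):
    sections = {}
    for name, value in os.environ.items():
        if not name.startswith(prefix + '__'):
            continue
        # e.g. VROUTER_AGENT__FLOWS__thread_count=8 -> section FLOWS, key thread_count
        _, section, key = name.split('__', 2)
        sections.setdefault(section, {})[key] = value
    return sections

os.environ['VROUTER_AGENT__FLOWS__thread_count'] = '8'
print(ini_params_from_env('VROUTER_AGENT'))
# -> {'FLOWS': {'thread_count': '8'}}, written as [FLOWS] / thread_count=8 in the conf file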
If ansible-deployer is used, it uses docker-compose to create the docker containers, and environment variables are defined in /etc/contrail/common_xxx.env. (xxx is the rolename)
So if you want to update, for example, vrouter parameters, you can edit /etc/contrail/common_vrouter.env, and type these commands.
docker-compose -f /etc/contrail/vrouter/docker-compose.yaml down
docker-compose -f /etc/contrail/vrouter/docker-compose.yaml up -d
Then the vrouter containers are recreated and the new parameters are applied.
If kubeadm and kubernetes yaml are used to install the Tungsten Fabric containers, each container uses a configmap named 'env' as the source of environment variables. So you can type this command to edit the environment variables, and then delete some Tungsten Fabric pods to recreate the containers. (Since the containers are defined in a DaemonSet, they will be recreated automatically)
kubectl edit configmap -n kube-system env
Since cassandra doesn't support transactions, and both zookeeper and cassandra hold some data when config-api is used, db consistency unavoidably has to be maintained at the application level.
Let me summarize some failure scenarios and the management script to fix them.
There are some situations which need data maintenance.
- When two objects which have refs and back_refs are created, but config-api crashed before the back_refs were added.
- Zookeeper maintains a list of uuids and an index allocator for some objects, such as IP allocation in each subnet. So when config-api crashed after updating zookeeper, but before updating cassandra, it will cause a discrepancy between the two dbs.
To fix this situation, config-api has a data maintenance script, db_manage.py.
There are mainly three commands, which are expected to be run periodically.
docker exec -it config_api_1 /bin/bash
cd /usr/lib/python2.7/site-packages/vnc_cfg_api_server
python db_manage.py check
python db_manage.py --execute clean
python db_manage.py --execute heal
Internally, cassandra also has some maintenance jobs to keep consistency (they will be executed automatically by nodemgr). To check cassandra's status and data consistency manually, these commands can be used. (the port number is 7200 for analytics-database, 7201 for config-database)
# docker exec -it config_database_cassandra_1 bash
# nodetool -p port-number info
# nodetool -p port-number status
# nodetool -p port-number repair --full
Other useful commands are described there.
Although it might add some more complexity to this discussion, some objects which require a lock (such as IP address allocation from each subnet) use IndexAllocator, which is internally a bitarray (in-memory array) in each config-api.
When an IndexAllocator is created, it does ls /somepath in zookeeper to fill that memory, and it increments the bitarray and creates a znode when it receives another similar request.
One possible situation is that, for some reason, zookeeper needs to be modified manually.
- such as to free an IP address which was not deleted successfully
- https://bugs.launchpad.net/opencontrail/+bug/1365824
In this case, since config-api has this in-memory db, it might be required to restart all config-api processes, to sync them with the manually modified znodes.
This behavior might be changed in a future release.
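If you do need to look into zookeeper before touching it, a small script with the kazoo library can list the znodes under a path. This is only a sketch: the '/api-server' path is a placeholder, and the actual path depends on the object (subnet, id allocator, ...) you are investigating, so check it carefully before deleting anything.
#!/usr/bin/python
# minimal sketch: walk znodes under a given path to see what config-api has allocated
from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')   # zookeeper ip:port (assumption)
zk.start()

def walk(path, depth=0):
    print('  ' * depth + path)
    for child in zk.get_children(path):
        walk(path.rstrip('/') + '/' + child, depth + 1)

walk('/api-server')   # placeholder path; adjust to your environment
zk.stop()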
When using vRouter, there could be some situations where routing won't work as expected.
I have gathered the most common issues and steps to investigate vRouter's routing behavior.
To investigate this, there are two ways: one is to look at the related config details, and the other is to look at the operational state of control and vRouter.
Since contrail-api-cli is most useful for the former, and ist.py for the latter (especially for remote debugging), let me describe some info in this format.
Note: if these tools are not available, you can use curl for the same purpose.
For example, when
source /etc/kolla/kolla_toolbox/admin-openrc.sh
contrail-api-cli --host x.x.x.x ls -l virtual-network
contrail-api-cli --host x.x.x.x cat virtual-network/xxxx-xxxx-xxxx-xxxx
are needed, these commands collect the same info.
source /etc/kolla/kolla_toolbox/admin-openrc.sh
openstack token issue
curl -H 'x-auth-token: tokenid' x.x.x.x:8082/virtual-networks
curl -H 'x-auth-token: tokenid' x.x.x.x:8082/virtual-network/xxxx-xxxx-xxxx-xxxx
In a similar manner, ist.py can also be replaced by various curl commands. Let me describe the curl commands for the most common cases. (the cli is more memorable though)
ist.py ctr route show
curl control-ip:8083/Snh_ShowRouteReq
ist.py ctr route nei
curl control-ip:8083/Snh_BgpNeighborReq
ist.py vr intf
curl vrouter-ip:8085/Snh_ItfReq
ist.py vr vrf
curl vrouter-ip:8085/Snh_VrfListReq
ist.py vr route -v vrf-id
curl vrouter-ip:8085/Snh_Inet4UcRouteReq?vrf_index=vrf-id
Additionally, ifmap information (it is comparable to device configuration for vRouter, such as interface, vrf, virtual-machine, ...) is also useful to see what is configured.
It can be seen with these commands.
ist.py ctr ifmap table
curl control-ip:8083/Snh_IFMapNodeTableListShowReq
ist.py ctr ifmap table virtual-network
curl control-ip:8083/Snh_IFMapTableShowReq?table_name=virtual-network
ist.py ctr ifmap client
curl control-ip:8083/Snh_IFMapPerClientLinksShowReq
ist.py ctr ifmap node
curl control-ip:8083/Snh_IFMapLinkTableShowReq
ist.py ctr ifmap link
curl control-ip:8083/Snh_IFMapNodeShowReq
ist.py vr ifmap
curl vrouter-ip:8085/Snh_ShowIFMapAgentReq
Note: When ist.py is used, each target has two common options, uve and trace. These can also be used for detailed status examination.
ist.py vr uve
curl vrouter-ip:8085/Snh_SandeshUVETypesReq
ist.py vr uve VrouterStatsAgent
curl vrouter-ip:8085/Snh_SandeshUVECacheReq?x=VrouterStatsAgent
ist.py ctr trace
curl control-ip:8083/Snh_SandeshTraceBufferListRequest
ist.py ctr trace BgpTraceBuf
curl control-ip:8083/Snh_SandeshTraceRequest?x=BgpTraceBuf
ist.py vr trace
curl vrouter-ip:8085/Snh_SandeshTraceBufferListRequest
ist.py vr trace Flow
curl vrouter-ip:8085/Snh_SandeshTraceRequest?x=Flow
UVE (User Visible Entity) is a set of metrics used by each component of Tungsten Fabric, mostly seen from the analytics/uves API. It can also be seen directly from the introspect of each component.
Trace is a trace log of each component, which is stored in the memory of each process. With this option, that trace memory can be dumped.
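Since introspect replies are Sandesh XML rather than json, a short helper like the one below can make them easier to consume programmatically. This is a sketch: the URL is the vrouter-agent introspect example used above, and which elements are worth printing depends on the request you issue.
#!/usr/bin/python
# sketch: fetch an introspect URL and flatten the Sandesh XML reply into tag / text pairs
import requests
import xml.etree.ElementTree as ET

URL = 'http://127.0.0.1:8085/Snh_SandeshUVECacheReq?x=VrouterStatsAgent'   # local vRouter assumed

root = ET.fromstring(requests.get(URL).content)
for elem in root.iter():
    if elem.text and elem.text.strip():
        print(elem.tag, '=', elem.text.strip())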
Logs from various components, mostly docker logs of each container or logs under /var/log/contrail, are also useful. To raise the log level and see more detail, the LOG_LEVEL=SYS_DEBUG parameter can be used.
To investigate this, you first need to determine whether it is a control plane issue or a data plane issue. For control plane issues, these commands will be most useful.
# ist.py ctr route show
# ist.py vr intf
# ist.py vr vrf
# ist.py vr route -v (vrf id)
If routing seems ok, you can first check with tcpdump whether the packet arrives at the destination vRouter.
# tcpdump -i any -nn udp port 6635 or udp port 4789 or proto gre or icmp # for physical NIC
# tcpdump -i any -nn icmp # for tap device
When the packet has reached the destination vRouter, check
# flow -l
# flow -l --show-evicted ### this can be used to see flows for dropped packets
# ist.py vr trace Flow ### to check the Flow log
to see if it is dropped by a flow action.
- For example, action: D(Policy) or D(SG) indicates it is dropped by network-policy or security-group. To investigate the flow action further, these commands will help.
# ist.py vr intf -f text
# ist.py vr acl
Note: To see the reason for a packet drop, the dropstats command could have some more info.
# watch -n 1 'dropstats | grep -v -w 0'
# watch -n 1 'vif --get 0 --get-drop-stats'
# watch -n 1 'vif --get n --get-drop-stats' (n is vif id)
# ping -i 0.2 overlay-ip # this can be used to see whether a specific dropstats counter is incrementing because of these packets
check
# flow -l
to see the flow action for this packet. If the action is D(SG), it is dropped by security-group, so the security-group needs to be changed to permit external access (the default openstack ingress rule allows VM-to-VM access only)
Since those are set up by svc-monitor, you can first check
# tail -f /var/log/contrail/contrail-svc-monitor.log
to see whether any error is logged.
- One example is that 'No vRouter is available' is logged, so it can't start those services. That is caused by the NodeStatus from vRouter to analytics-api being 'Non-Functional' for some reason, so it needs to be investigated from the vRouter side.
If svc-monitor is working well, you need to investigate the behavior of the load-balancer object.
When a service is used, it adds an ecmp route to reach the application, so these commands can be used to investigate the control plane (same procedure as for VM-to-VM routing)
# ist.py ctr route show
# ist.py vr route -v (vrf-id)
When ingress or SNAT is used, it starts a haproxy process inside a linux namespace in the vRouter container. To investigate the detail, you can try these commands to see those namespaces.
# docker exec -it vrouter-agent bash
# ip netns
# ip netns exec vrouter-xxx ip -o a
# ip netns exec vrouter-xxx ip route
# ip netns exec vrouter-xxx iptables -L -n -v -t nat
# tail -f /var/log/messages # haproxy log is logged
Since the ingress service and SNAT also use vRouter routing, these commands are also helpful to see whether the prefixes for those services are exported to vRouter's routing table.
# ist.py ctr route show
# ist.py vr vrf
# ist.py vr route -v (vrf-id)
Since a service-chain changes the vRouter routing table, these commands can be used first, to see whether the routing-instances are successfully created and the ServiceChain route is correctly imported
# ist.py ctr route summary
# ist.py ctr route show
# ist.py ctr route show -p ServiceChain
# ist.py ctr sc
If the control plane works well, you need to investigate data plane behavior in the same manner as with VM-to-VM traffic (security-group can also block service-chain traffic, so please also check that when investigating a service-chain for external traffic)
# tcpdump -i any -nn udp port 6635 or udp port 4789 or proto gre or icmp
# ist.py vr intf
# ist.py vr vrf
# ist.py vr route -v (vrf-id)
# flow -l
That feature is implemented by ACLs on vRouter, so to investigate it, this command is useful.
# ist.py vr intf -f text
If icmp works well but tcp / udp doesn't, please also check that the port lists are specified.
In a kubernetes deployment, cni sometimes returns this error and won't assign IPs to pods. (This is seen in various places, such as kubectl describe pod)
networkPlugin cni failed to set up pod "coredns-5644d7b6d9-p8fkk_kube-system" network: Failed in Poll VM-CFG. Error : Failed in PollVM. Error : Failed HTTP Get operation. Return code 404
This message is a generic error and can be caused by several reasons ..
Internally, when a pod is created, cni tries to receive its IP from vrouter-agent, which in turn receives it from the control process via XMPP.
- that is based on the virtual-machine-interface info, which is created by kube-manager from kube-apiserver info.
So to fix this issue, several things need to be checked.
- contrail-status on controller node
- config-api, control needs to be in ‘active’ state
- contrail-status on contrail-kube-manager node is in 'active' state
- this process will retrieve the info from kube-apiserver and create pod / load balancer etc on config-api
- contrail-status on vrouter node
- vrouter-agent needs to be in ‘active’ state
- if the standalone kubernetes yaml is used, it has a known limitation about a race condition between vrouter registration and vrouter-agent restart. Restarting control might resolve this issue.
# docker restart control_control_1
- if everything is fine,
- /var/log/contrail/contrail-kube-manager.log
- /var/log/contrail/api-zk.log
- /var/log/contrail/contrail-vrouter-agent.log
- /var/log/contrail/cni/opencontrail.log <- cni log
needs to be investigated further ..
- the root cause might be an xmpp issue, an underlay issue, an /etc/hosts issue, and so on
When the disk size is, say, 50GB, it might become full after a week or so from installation. When this occurs, analytics data needs to be removed and the analytics database needs to be restarted.
[check analytics db size]
du -smx /var/lib/docker/volumes/analytics_database_analytics_cassandra/_data/ContrailAnalyticsCql
[if it is large, remove by this]
rm -rf /var/lib/docker/volumes/analytics_database_analytics_cassandra/_data/ContrailAnalyticsCql
docker-compose -f /etc/contrail/analytics_database/docker-compose.yaml down
docker-compose -f /etc/contrail/analytics_database/docker-compose.yaml up -d
To avoid this issue in the future, this knob can be used.
echo 'ANALYTICS_STATISTICS_TTL=2' >> /etc/contrail/common_analytics.env
docker-compose -f /etc/contrail/analytics/docker-compose.yaml down
docker-compose -f /etc/contrail/analytics/docker-compose.yaml up -d
If this error is seen in contrail-status, it indicates that analytics cassandra is not working well.
== Contrail database ==
nodemgr: initializing (Cassandra state detected DOWN.)
If JVM_EXTRA_OPTS: "-Xms128m -Xmx1g" is set, the most likely cause is java's OutOfMemory error, so it can be updated to something like
JVM_EXTRA_OPTS: "-Xms128m -Xmx2g"
in /etc/contrail/common.env, and analytics database can be restarted.
docker-compose -f /etc/contrail/analytics_database/docker-compose.yaml down
docker-compose -f /etc/contrail/analytics_database/docker-compose.yaml up -d
If this error is seen in the cassandra log (/var/log/cassandra/system.log in the cassandra containers), some cassandra KEYSPACE is not working well (most likely the config_db_uuid keyspace, which contains all the uuids of config objects), so in turn, config-api also might not work well.
From cassandra's perspective, it has a configuration parameter for the maximum number of tombstones in a query (tombstone_failure_threshold, by default 100000), and if that threshold is exceeded, it will abort the query.
- a tombstone is a flag to indicate that data is deleted from that keyspace. It remains there after data deletion during gc_grace_seconds (by default, 10 days). So practically, if 100000 objects are deleted within 10 days, config-database stops responding ..
If it is seen,
# docker exec -it config_database_cassandra_1 bash
# cqlsh config-db-ip 9041
cql> ALTER TABLE config_db_uuid.obj_uuid_table WITH gc_grace_seconds = 0;
cql> ALTER TABLE config_db_uuid.obj_uuid_table WITH gc_grace_seconds = 864000;
will forcefully delete all tombstones.
- One drawback is that if this is typed at the same time as a data deletion, that delete operation will not be propagated to the other cassandra nodes successfully ..
In some cases, a 'No configuration for self' message is seen in the contrail-status of the vRouter or control process.
It indicates that those processes can't find a node definition for themselves in the config-database, so they cannot download their ifmap info.
To check node definition, this command can be used.
# contrail-api-cli ls -l virtual-router ## for vRouter
# contrail-api-cli ls -l bgp-router ## for control
The most likely cause is that the name of the object is slightly different from the hostname, such as hostname: node1, but object: node1.local.
For vRouter, to set this name manually, the VROUTER_HOSTNAME parameter can be used, so setting this value to the one in the config-database will resolve this issue.
Cluster-wide update is an important subject, to keep the SLA of a production cluster while the newest features are still available in that cluster.
Since Tungsten Fabric uses a protocol similar to MPLS-VPN, even if the module versions of control and vRouter are different, basic interoperability is available, as far as I have tried.
So the general idea is to first update the controllers one by one, and then update the vRouters one by one, with vMotion or maintenance mode if needed.
Let me describe this procedure first.
Additionally, the Tungsten Fabric controller supports a curious feature named ISSU, although I think this name is a bit confusing, since the Tungsten Fabric controller is much more similar to a route reflector than to a routing-engine.
So the basic idea is to first replicate all the configs to the newly created controllers (or route reflectors), and after that, update vRouter settings (and vRouter modules if servers can be rebooted) to use those new controllers. With this procedure, the rollback operation of a vRouter module update also will be much easier.
Let me describe this procedure later in this chapter.
Since ansible-deployer behaves idempotently, an update is not much different from an install. These commands will update all the modules.
cd contrail-ansible-deployer
git pull
vi config/instances.yaml
(update CONTRAIL_CONTAINER_TAG)
ansible-playbook -e orchestrator=xxx -i inventory/ playbooks/install_contrail.yml
One caveat is that, since this command restarts all the nodes more or less simultaneously, it is not easy to restart controllers and vRouters one by one. Additionally, removing other nodes from instances.yaml won't work, since one node's update requires some parameters of other nodes.
- For example, a vRouter update needs the controls' IPs, which are deduced from the control role nodes in instances.yaml
To overcome this, from R2005, ziu.yml was added for this purpose, at least for the control plane, to update it one by one.
cd contrail-ansible-deployer
git pull
vi config/instances.yaml
(update CONTRAIL_CONTAINER_TAG)
ansible-playbook -e orchestrator=xxx -i inventory/ playbooks/ziu.yml
ansible-playbook -e orchestrator=xxx -i inventory/ playbooks/install_contrail.yml
As far as I have tried, it does a serial update and restart of the control processes, so no packet drop is seen when the control plane update is performed.
- During install_contrail.yml, control process restart is skipped, since they are already updated
- some packet drop is still seen when the vrouter-agent restart is performed, so if possible, workload migration is recommended
ISSU can be used even if the container formats are largely different, as in the 4.x to 5.x case, since it creates a new cluster of controllers and copies the data into it.
First, I'll describe the simplest case, 1 old controller and 1 new controller, to see the overall procedure. All the commands are typed at the new controller.
old-controller:
ip: 172.31.2.209
hostname: ip-172-31-2-209
new-controller:
ip: 172.31.1.154
hostname: ip-172-31-1-154
(both controllers are installed with this instances.yaml)
provider_config:
  bms:
    ssh_user: root
    ssh_public_key: /root/.ssh/id_rsa.pub
    ssh_private_key: /root/.ssh/id_rsa
    domainsuffix: local
    ntpserver: 0.centos.pool.ntp.org
instances:
  bms1:
    provider: bms
    roles:
      config_database:
      config:
      control:
      analytics:
      analytics_database:
      webui:
    ip: x.x.x.x ## controller's ip
contrail_configuration:
  CONTRAIL_CONTAINER_TAG: r5.1
  KUBERNETES_CLUSTER_PROJECT: {}
  JVM_EXTRA_OPTS: "-Xms128m -Xmx1g"
global_configuration:
  CONTAINER_REGISTRY: tungstenfabric
[commands]
1. stop batch jobs
docker stop config_devicemgr_1
docker stop config_schema_1
docker stop config_svcmonitor_1
2. register new control in cassandra and set up bgp between them
docker exec -it config_api_1 bash
python /opt/contrail/utils/provision_control.py --host_name ip-172-31-1-154.local --host_ip 172.31.1.154 --api_server_ip 172.31.2.209 --api_server_port 8082 --oper add --router_asn 64512 --ibgp_auto_mesh
3. sync the data between controllers
vi contrail-issu.conf
(write down this)
[DEFAULTS]
old_rabbit_address_list = 172.31.2.209
old_rabbit_port = 5673
new_rabbit_address_list = 172.31.1.154
new_rabbit_port = 5673
old_cassandra_address_list = 172.31.2.209:9161
old_zookeeper_address_list = 172.31.2.209:2181
new_cassandra_address_list = 172.31.1.154:9161
new_zookeeper_address_list = 172.31.1.154:2181
new_api_info={"172.31.1.154": [("root"), ("password")]} ## ssh public-key can be used
image_id=`docker images | awk '/config-api/{print $3}' | head -1`
docker run --rm -it --network host -v $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh $image_id -c "/usr/bin/contrail-issu-pre-sync -c /etc/contrail/contrail-issu.conf"
4. start the process to do real-time data sync
docker run --rm --detach -it --network host -v $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh --name issu-run-sync $image_id -c "/usr/bin/contrail-issu-run-sync -c /etc/contrail/contrail-issu.conf"
(check the log if needed)
docker exec -t issu-run-sync tail -f /var/log/contrail/issu_contrail_run_sync.log
5. (update vrouters)
6. stop the job and sync all the data when finished
docker rm -f issu-run-sync
image_id=`docker images | awk '/config-api/{print $3}' | head -1`
docker run --rm -it --network host -v $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh --name issu-run-sync $image_id -c "/usr/bin/contrail-issu-post-sync -c /etc/contrail/contrail-issu.conf"
docker run --rm -it --network host -v $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh --name issu-run-sync $image_id -c "/usr/bin/contrail-issu-zk-sync -c /etc/contrail/contrail-issu.conf"
7. remove old nodes from cassandra and add new nodes
vi issu.conf
(write down this)
[DEFAULTS]
db_host_info={"172.31.1.154": "ip-172-31-1-154.local"}
config_host_info={"172.31.1.154": "ip-172-31-1-154.local"}
analytics_host_info={"172.31.1.154": "ip-172-31-1-154.local"}
control_host_info={"172.31.1.154": "ip-172-31-1-154.local"}
api_server_ip=172.31.1.154
docker cp issu.conf config_api_1:issu.conf
docker exec -it config_api_1 python /opt/contrail/utils/provision_issu.py -c issu.conf
8. start batch jobs
docker start config_devicemgr_1
docker start config_schema_1
docker start config_svcmonitor_1
These will be possible checkpoints.
- After step 3, you can try contrail-api-cli ls -l \* to see that all the data has been copied successfully, and ist.py ctr nei to see that ibgp is up between the controllers.
- After step 4, the old db can be modified, to see that the changes are successfully propagated to the new db.
After this, I will cover a more realistic case with orchestrators and two vRouters.
To illustrate the case combined with orchestrators, I tried a two-vRouter and kubernetes setup with ansible-deployer.
Even when combined with orchestrators, the overall procedure won't be much different.
One thing which needs care is when kube-manager should be switched to the new one.
Since kube-manager dynamically subscribes to events from kube-apiserver and updates the Tungsten Fabric config-database, in a sense it is similar to batch jobs such as schema-transformer, svc-monitor and device-manager. So I stopped and started the old or new kube-manager (and actually webui as well) at the same time as those batch jobs, but this might need to be changed based on each setup.
So the overall procedure in this case will be the following.
1. setup one controller (with one kube-manager and kubernetes-master) and two vRouters
2. setup one new controller (with one kube-manager, but kubernetes-master is the same one with the old controller)
3. stop batch jobs and kube-manager, webui for new controller
4. start issu procedure and continue that until starting run-sync
-> iBGP will be established between controllers
5. update vRouters one-by-one based on new controller's ansible-deployer
-> When one vRouter is moved to the new controller, the new controller also will get the route-target of k8s-default-pod-network,
and ping between containers keeps working well (ist.py ctr route summary and the ping result will be attached later)
6. When all the vRouters are moved to the new controller, stop batch jobs, kube-manager and webui on the old controller
After that, continue the issu procedure, and start batch jobs, kube-manager and webui on the new controller
-> From the beginning to the end of this phase, you can't change config-database manually, so some maintenance time might be needed
(it could last 5-15 min; ping is ok, but new container creation won't work well, until the new kube-manager is started)
7. finally, stop control, config and config-database on the old node
When updating the vRouters, I used provider: bms-maint for the controller, k8s_master, and the vRouters which had already been moved to the new controller, to avoid disturbance caused by container restarts. I'll attach the original instances.yaml and the instances.yaml used to update the vRouters, for further detail.
I'll also attach the results of ist.py ctr nei and ist.py ctr route summary at each phase, to illustrate the detail of what's going on.
- Let me note that I didn't actually update the module version in this example, since this setup is primarily to highlight the ISSU procedure (since ansible-deployer re-creates the vrouter-agent containers even when the module version is the same, the amount of packet loss won't be much different even if an actual module update is done)
old-controller: 172.31.19.25
new-controller: 172.31.13.9
two-vRouters: 172.31.25.102, 172.31.33.175
Before starting ISSU:
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-25-102.local | 172.31.25.102 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
| ip-172-31-33-175.local | 172.31.33.175 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
[root@ip-172-31-13-9 ~]#
-> iBGP is not configured yet
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr route summary
Introspect Host: 172.31.19.25
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 7 | 7 | 2 | 5 | 0 |
| fabric.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-pod-network | 7 | 7 | 4 | 3 | 0 |
| :k8s-default-pod-network.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-service- | 7 | 7 | 1 | 6 | 0 |
| network:k8s-default-service-network.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr route summary
Introspect Host: 172.31.13.9
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 0 | 0 | 0 | 0 | 0 |
| fabric.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-pod-network | 0 | 0 | 0 | 0 | 0 |
| :k8s-default-pod-network.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-service- | 0 | 0 | 0 | 0 | 0 |
| network:k8s-default-service-network.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
-> No routes are imported in the new controller yet
[root@ip-172-31-19-25 contrail-ansible-deployer]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
cirros-deployment-75c98888b9-6qmcm 1/1 Running 0 4m58s 10.47.255.249 ip-172-31-25-102.ap-northeast-1.compute.internal <none>
cirros-deployment-75c98888b9-lxq4k 1/1 Running 0 4m58s 10.47.255.250 ip-172-31-33-175.ap-northeast-1.compute.internal <none>
[root@ip-172-31-19-25 contrail-ansible-deployer]#
/ # ip -o a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
13: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue \ link/ether 02:6b:dc:98:ac:95 brd ff:ff:ff:ff:ff:ff
13: eth0 inet 10.47.255.249/12 scope global eth0\ valid_lft forever preferred_lft forever
/ # ping 10.47.255.250
PING 10.47.255.250 (10.47.255.250): 56 data bytes
64 bytes from 10.47.255.250: seq=0 ttl=63 time=2.155 ms
64 bytes from 10.47.255.250: seq=1 ttl=63 time=0.904 ms
^C
--- 10.47.255.250 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.904/1.529/2.155 ms
/ #
-> The two vRouters have one container on each node, and ping between the two containers works well
After provision_control:
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+------------------------+---------------+----------+----------+-----------+-------------+-----------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+-----------------+------------+-----------+
| ip-172-31-13-9.local | 172.31.13.9 | 64512 | BGP | internal | Idle | not advertising | 0 | n/a |
| ip-172-31-25-102.local | 172.31.25.102 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
| ip-172-31-33-175.local | 172.31.33.175 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
+------------------------+---------------+----------+----------+-----------+-------------+-----------------+------------+-----------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
[root@ip-172-31-13-9 ~]#
-> iBGP is configured on the old controller, but the new controller doesn't have that configuration yet (it will be replicated to the new controller when the sync is performed)
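For reference, the pre-sync / run-sync steps themselves are done with the ISSU helper scripts shipped with the config packages; a minimal sketch (the script names are written from memory, they are run on the new cluster side, and the issu.conf describing the old and new cassandra / zookeeper / rabbitmq endpoints has to be prepared beforehand, so please check the exact invocation against your release):
# contrail-issu-pre-sync
-> one-shot copy of the config DB from the old cluster to the new one
# contrail-issu-run-sync
-> keeps both config DBs in sync until the cutover is finished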
After run-sync:
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-13-9.local | 172.31.13.9 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-25-102.local | 172.31.25.102 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
| ip-172-31-33-175.local | 172.31.33.175 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
+-----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+-----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-19-25.local | 172.31.19.25 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
+-----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]#
-> iBGP is established, but ctr route summary hasn't changed yet, since the new controller doesn't have k8s-default-pod-network's route-target and route-target filtering prevents those prefixes from being imported
After moving one node to the new controller:
/ # ping 10.47.255.250
PING 10.47.255.250 (10.47.255.250): 56 data bytes
64 bytes from 10.47.255.250: seq=0 ttl=63 time=1.684 ms
64 bytes from 10.47.255.250: seq=1 ttl=63 time=0.835 ms
64 bytes from 10.47.255.250: seq=2 ttl=63 time=0.836 ms
(snip)
64 bytes from 10.47.255.250: seq=37 ttl=63 time=0.878 ms
64 bytes from 10.47.255.250: seq=38 ttl=63 time=0.823 ms
64 bytes from 10.47.255.250: seq=39 ttl=63 time=0.820 ms
64 bytes from 10.47.255.250: seq=40 ttl=63 time=1.364 ms
64 bytes from 10.47.255.250: seq=44 ttl=63 time=2.209 ms
64 bytes from 10.47.255.250: seq=45 ttl=63 time=0.869 ms
64 bytes from 10.47.255.250: seq=46 ttl=63 time=0.857 ms
64 bytes from 10.47.255.250: seq=47 ttl=63 time=0.855 ms
64 bytes from 10.47.255.250: seq=48 ttl=63 time=0.845 ms
64 bytes from 10.47.255.250: seq=49 ttl=63 time=0.842 ms
64 bytes from 10.47.255.250: seq=50 ttl=63 time=0.885 ms
64 bytes from 10.47.255.250: seq=51 ttl=63 time=0.891 ms
64 bytes from 10.47.255.250: seq=52 ttl=63 time=0.909 ms
64 bytes from 10.47.255.250: seq=53 ttl=63 time=0.867 ms
64 bytes from 10.47.255.250: seq=54 ttl=63 time=0.884 ms
64 bytes from 10.47.255.250: seq=55 ttl=63 time=0.865 ms
64 bytes from 10.47.255.250: seq=56 ttl=63 time=0.840 ms
64 bytes from 10.47.255.250: seq=57 ttl=63 time=0.877 ms
^C
--- 10.47.255.250 ping statistics ---
58 packets transmitted, 55 packets received, 5% packet loss
round-trip min/avg/max = 0.810/0.930/2.209 ms
/ #
-> When vrouter-agent is restarted, 3 packets are lost (seq 41-43). After moving one vRouter to the new controller, ping keeps working well.
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-13-9.local | 172.31.13.9 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-33-175.local | 172.31.33.175 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-19-25.local | 172.31.19.25 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-25-102.local | 172.31.25.102 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]#
-> Both controllers have one XMPP connection each, and the iBGP session is Established
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr route summary
Introspect Host: 172.31.19.25
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 7 | 7 | 1 | 6 | 0 |
| fabric.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-pod-network | 7 | 7 | 1 | 6 | 0 |
| :k8s-default-pod-network.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-service- | 7 | 7 | 0 | 7 | 0 |
| network:k8s-default-service-network.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr route summary
Introspect Host: 172.31.13.9
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 7 | 7 | 1 | 6 | 0 |
| fabric.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-pod-network | 7 | 7 | 3 | 4 | 0 |
| :k8s-default-pod-network.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-service- | 7 | 7 | 1 | 6 | 0 |
| network:k8s-default-service-network.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
-> Since both controllers now have at least one container from k8s-default-pod-network, they exchange the prefixes over iBGP, and end up with the same prefixes
After moving the second vrouter to the new controller:
/ # ping 10.47.255.250
PING 10.47.255.250 (10.47.255.250): 56 data bytes
64 bytes from 10.47.255.250: seq=0 ttl=63 time=1.750 ms
64 bytes from 10.47.255.250: seq=1 ttl=63 time=0.815 ms
64 bytes from 10.47.255.250: seq=2 ttl=63 time=0.851 ms
64 bytes from 10.47.255.250: seq=3 ttl=63 time=0.809 ms
(snip)
64 bytes from 10.47.255.250: seq=34 ttl=63 time=0.853 ms
64 bytes from 10.47.255.250: seq=35 ttl=63 time=0.848 ms
64 bytes from 10.47.255.250: seq=36 ttl=63 time=0.833 ms
64 bytes from 10.47.255.250: seq=37 ttl=63 time=0.832 ms
64 bytes from 10.47.255.250: seq=38 ttl=63 time=0.910 ms
64 bytes from 10.47.255.250: seq=42 ttl=63 time=2.071 ms
64 bytes from 10.47.255.250: seq=43 ttl=63 time=0.826 ms
64 bytes from 10.47.255.250: seq=44 ttl=63 time=0.853 ms
64 bytes from 10.47.255.250: seq=45 ttl=63 time=0.851 ms
64 bytes from 10.47.255.250: seq=46 ttl=63 time=0.853 ms
64 bytes from 10.47.255.250: seq=47 ttl=63 time=0.851 ms
64 bytes from 10.47.255.250: seq=48 ttl=63 time=0.855 ms
64 bytes from 10.47.255.250: seq=49 ttl=63 time=0.869 ms
64 bytes from 10.47.255.250: seq=50 ttl=63 time=0.833 ms
64 bytes from 10.47.255.250: seq=51 ttl=63 time=0.859 ms
64 bytes from 10.47.255.250: seq=52 ttl=63 time=0.866 ms
64 bytes from 10.47.255.250: seq=53 ttl=63 time=0.840 ms
64 bytes from 10.47.255.250: seq=54 ttl=63 time=0.841 ms
64 bytes from 10.47.255.250: seq=55 ttl=63 time=0.854 ms
^C
--- 10.47.255.250 ping statistics ---
56 packets transmitted, 53 packets received, 5% packet loss
round-trip min/avg/max = 0.799/0.888/2.071 ms
/ #
-> 3 packets are lost (seq 39-41)
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-13-9.local | 172.31.13.9 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
+----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-19-25.local | 172.31.19.25 | 64512 | BGP | internal | Established | in sync | 0 | n/a |
| ip-172-31-25-102.local | 172.31.25.102 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
| ip-172-31-33-175.local | 172.31.33.175 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]#
-> The new controller now has both XMPP connections.
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr route summary
Introspect Host: 172.31.19.25
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 0 | 0 | 0 | 0 | 0 |
| fabric.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-pod-network | 0 | 0 | 0 | 0 | 0 |
| :k8s-default-pod-network.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-service- | 0 | 0 | 0 | 0 | 0 |
| network:k8s-default-service-network.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr route summary
Introspect Host: 172.31.13.9
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default- | 0 | 0 | 0 | 0 | 0 |
| project:__link_local__:__link_local__.inet.0 | | | | | |
| default-domain:default-project:dci- | 0 | 0 | 0 | 0 | 0 |
| network:__default__.inet.0 | | | | | |
| default-domain:default-project:dci-network:dci- | 0 | 0 | 0 | 0 | 0 |
| network.inet.0 | | | | | |
| default-domain:default-project:default-virtual- | 0 | 0 | 0 | 0 | 0 |
| network:default-virtual-network.inet.0 | | | | | |
| inet.0 | 0 | 0 | 0 | 0 | 0 |
| default-domain:default-project:ip-fabric:ip- | 7 | 7 | 2 | 5 | 0 |
| fabric.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-pod-network | 7 | 7 | 4 | 3 | 0 |
| :k8s-default-pod-network.inet.0 | | | | | |
| default-domain:k8s-default:k8s-default-service- | 7 | 7 | 1 | 6 | 0 |
| network:k8s-default-service-network.inet.0 | | | | | |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
-> The old controller no longer has those prefixes
After the ISSU procedure finished and the new kube-manager started:
[root@ip-172-31-19-25 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
cirros-deployment-75c98888b9-6qmcm 1/1 Running 0 34m 10.47.255.249 ip-172-31-25-102.ap-northeast-1.compute.internal <none>
cirros-deployment-75c98888b9-lxq4k 1/1 Running 0 34m 10.47.255.250 ip-172-31-33-175.ap-northeast-1.compute.internal <none>
cirros-deployment2-648b98685f-b8pxw 1/1 Running 0 15s 10.47.255.247 ip-172-31-25-102.ap-northeast-1.compute.internal <none>
cirros-deployment2-648b98685f-nv7z9 1/1 Running 0 15s 10.47.255.248 ip-172-31-33-175.ap-northeast-1.compute.internal <none>
[root@ip-172-31-19-25 ~]#
-> Containers can still be created with new IPs (10.47.255.247 and 10.47.255.248 are allocated by the new controller)
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+----------------------+--------------+----------+----------+-----------+--------+-----------------+------------+-----------------------------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+----------------------+--------------+----------+----------+-----------+--------+-----------------+------------+-----------------------------+
| ip-172-31-13-9.local | 172.31.13.9 | 64512 | BGP | internal | Active | not advertising | 1 | 2019-Jun-23 05:37:02.614003 |
+----------------------+--------------+----------+----------+-----------+--------+-----------------+------------+-----------------------------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer | peer_address | peer_asn | encoding | peer_type | state | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-25-102.local | 172.31.25.102 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
| ip-172-31-33-175.local | 172.31.33.175 | 0 | XMPP | internal | Established | in sync | 0 | n/a |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]#
-> The new controller no longer has an iBGP session to the old controller. The old controller still has an iBGP entry, although that process will be stopped soon :)
After stopping the old control and config:
[root@ip-172-31-19-25 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
cirros-deployment-75c98888b9-6qmcm 1/1 Running 0 48m 10.47.255.249 ip-172-31-25-102.ap-northeast-1.compute.internal <none>
cirros-deployment-75c98888b9-lxq4k 1/1 Running 0 48m 10.47.255.250 ip-172-31-33-175.ap-northeast-1.compute.internal <none>
cirros-deployment2-648b98685f-b8pxw 1/1 Running 0 13m 10.47.255.247 ip-172-31-25-102.ap-northeast-1.compute.internal <none>
cirros-deployment2-648b98685f-nv7z9 1/1 Running 0 13m 10.47.255.248 ip-172-31-33-175.ap-northeast-1.compute.internal <none>
cirros-deployment3-68fb484676-ct9q9 1/1 Running 0 18s 10.47.255.245 ip-172-31-25-102.ap-northeast-1.compute.internal <none>
cirros-deployment3-68fb484676-mxbzq 1/1 Running 0 18s 10.47.255.246 ip-172-31-33-175.ap-northeast-1.compute.internal <none>
[root@ip-172-31-19-25 ~]#
-> New containers can still be created
[root@ip-172-31-25-102 ~]# contrail-status
Pod Service Original Name State Id Status
vrouter agent contrail-vrouter-agent running 9a46a1a721a7 Up 33 minutes
vrouter nodemgr contrail-nodemgr running 11fb0a7bc86d Up 33 minutes
vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active
[root@ip-172-31-25-102 ~]#
-> vRouter is working well with the new control
/ # ping 10.47.255.250
PING 10.47.255.250 (10.47.255.250): 56 data bytes
64 bytes from 10.47.255.250: seq=0 ttl=63 time=1.781 ms
64 bytes from 10.47.255.250: seq=1 ttl=63 time=0.857 ms
^C
--- 10.47.255.250 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.857/1.319/1.781 ms
/ #
-> ping between vRouters is ok
Since there are several methods to update a cluster (in-place or ISSU, with or without ifdown vhost0), deciding which method is needed is an important subject.
Before discussing the details, let me describe the behavior of vrouter-agent up / down and ifup vhost0 / ifdown vhost0.
When vrouter-agent is restarted, one might assume that both the vrouter-agent container and vhost0 are re-created.
Actually, this is not the case: vhost0 is tightly coupled with vrouter.ko, and it needs to be deleted at the same time vrouter.ko is unloaded from the kernel. So from an operational point of view, ifdown vhost0 is needed if not only vrouter-agent but also vrouter.ko needs to be updated. (ifdown vhost0 also does rmmod vrouter internally)
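Since this ordering is easy to get wrong, here is a minimal sketch of an update which includes vrouter.ko (the docker-compose path follows contrail-ansible-deployer conventions and is an assumption, and workloads should be moved off the node beforehand):
# docker-compose -f /etc/contrail/vrouter/docker-compose.yaml down
# ifdown vhost0
-> deletes vhost0 and does rmmod vrouter internally
(update the container tag, and the kernel if needed, here)
# docker-compose -f /etc/contrail/vrouter/docker-compose.yaml up -d
-> the vrouter kernel-init container loads the new vrouter.ko, and vhost0 / vrouter-agent are re-created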
So to discuss backward compatibility, there are three topics to be investigated.
- controller to vrouter-agent compatibility
- if no backward compatibility is available, ISSU is needed
- vrouter-agent to vrouter.ko compatibility
- if no backward compatibility is available, ifdown vhost0 is needed, which causes at least 5-10 seconds of traffic loss, so in practice traffic needs to be moved to other nodes, e.g. with live migration
- since vrouter-agent uses netlink to sync data with vrouter.ko, a schema change could lead to unexpected behavior of vrouter-agent (such as a segmentation fault in the Ksync logic)
- vrouter.ko to kernel compatibility
- if no backward compatibility is available, the kernel also needs to be updated, so traffic needs to be moved to other nodes
- when vrouter.ko uses a different in-kernel API, it can't be loaded by the kernel, and vhost0 and vrouter-agent can't be created
For the second and third topics, since a kernel update is unavoidable for various reasons, one possible plan is to first choose a new kernel version, then choose a vrouter-agent / vrouter.ko which supports that kernel, and check whether that vrouter-agent can work with the version of control which is currently used.
- If it works well, in-place update can be used; if it doesn't work for some reason, or a rollback path is required, then ISSU is used
For the first topic, since ifmap maintains a white_list for each version when importing the config-api definition,
- void IFMapGraphWalker::AddNodesToWhitelist(): https://github.com/Juniper/contrail-controller/blob/master/src/ifmap/ifmap_graph_walker.cc#L349
as far as I have tried, it seems to have decent backward compatibility. (Since routing info updates are similar to BGP, they should also mostly work well)
To verify this, I tried this setup with modules in different versions, and it still seems to work well.
I-1. config 2002-latest, control 2002-latest, vrouter 5.0-latest, openstack queens
I-2. config 2002-latest, control 5.0-latest, vrouter 5.0-latest, openstack queens
II-1. config 2002-latest, control 2002-latest, vrouter r5.1, kubernetes 1.12
Note: Unfortunately, this combination doesn't work well (cni can't get port info from vrouter-agent); I suppose this is caused by the cni version change (0.2.0 -> 0.3.1) between 5.0.x and 5.1.
II-2. config 2002-latest, control 2002-latest, vrouter 5.0-latest, kubernetes 1.12
So even if the kernel and vRouter versions don't need to be changed soon, it might be a good habit to update config / control slightly more frequently, for possible bug fixes.
Before delving into this important subject, I'll first describe the encapsulation and control-plane protocol I prefer, in two cases: DataCenter and NFVI.
- DataCenter: EVPN / VXLAN
- If you need MPLS over MPLS between DCs, you need router configuration to stitch them
- NFVI: L3VPN / MPLS over UDP
Let me describe the reasons for those choices.
To choose an encapsulation, two sides need to be taken care of: the NICs and the routers / switches.
On the NIC side, vxlan is much more prevalent, and it is not so easy to find hardware which can offload MPLS encap / decap, even though linux itself has supported MPLS encap / decap since 4.1 (a quick way to check what a given NIC can offload is sketched after the list below).
- https://kernelnewbies.org/Linux_4.1#Multiprotocol_Label_Switching
- If no hw offload is used, the kernel vRouter will have a performance limit of up to 1.0 Mpps because of the linux network stack, AFAIK
- That said, let me note that vRouter currently does not support the linux API to offload vxlan encap / decap, although a configuration knob is already available: https://github.com/Juniper/contrail-specs/blob/master/smart-nic-generic-offload.md
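A quick way to check what a given NIC can offload is ethtool; a minimal sketch (the interface name is illustrative):
# ethtool -k eth0 | grep -i -e tnl -e segmentation
-> if tx-udp_tnl-segmentation is 'on', the NIC can segment vxlan-encapsulated traffic in hardware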
On the Router / Switch side, it is also true that it is a bit more costly to find hardware which can handle MPLS packets, since most DataCenter switches currently use specific Broadcom chips, which can use vxlan but cannot use MPLS.
So in the DataCenter, using vxlan encapsulation is the feasible choice.
With VXLAN, EVPN is the control plane that works well.
The Tungsten Fabric controller currently supports EVPN Type 2 and Type 5, and Types 1, 3, and 4 are also used internally.
- https://github.com/Juniper/contrail-specs/blob/master/EVPN-type-5-support-in-Contrail.md
- https://github.com/Juniper/contrail-controller/blob/master/src/bgp/evpn/evpn_route.h#L47
- Type 6 implementation also seems to be on the way: https://github.com/Juniper/contrail-specs/blob/master/5.1/evpn_multicast_smet.md
So it is basically OK for vRouter to join an EVPN/VXLAN network, although it is not always easy to reach full interoperability.
One thing to be careful about is that vRouter is capable of vxlan routing, while some switches don't have this feature.
In this setup, you might need to be a bit careful about how inter-vxlan traffic is sent between physical switches and vRouters.
- This document describes that behavior well: https://www.juniper.net/documentation/en_US/release-independent/solutions/information-products/pathway-pages/solutions/l3gw-vmto-evpn-vxlan-mpls.pdf
One corner case is when MPLS-over-MPLS needs to be used between DCs, because of advanced MPLS features like traffic engineering and link protection.
In this case, routers have to stitch EVPN/VXLAN and EVPN/MPLS, which is achieved with router configuration.
If it is used as NFVI, since Tungsten Fabric currently doesn't support service-chaining with EVPN type-5, L3VPN / MPLS over UDP will be the only possible choice.
- Note: from R1912, control / vRouter implemented service-chain based on EVPN T5 (and VXLAN), so L3VPN / MPLS over IP won't be a strict requirement: https://github.com/Juniper/contrail-specs/blob/master/R1912/bms-service-chaining.md
- https://github.com/Juniper/contrail-specs/blob/master/EVPN-type-5-support-in-Contrail.md#control-node
- MPLS over GRE is also ok, although it has less entropy for things like LAG load-balancing
Since DPDK is the preferred option in this case, the linux stack's throughput limitation won't be an issue. (a deployer snippet to enable the DPDK datapath is sketched below)
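As a reference, with contrail-ansible-deployer the DPDK datapath is chosen per compute node in instances.yaml; a minimal sketch (parameter names as I remember them from the deployer, and the values are illustrative, so please check them against your release):
  compute1:
    provider: bms
    ip: 172.31.x.x
    roles:
      vrouter:
        AGENT_MODE: dpdk
        CPU_CORE_MASK: "0xf"
        HUGE_PAGES: 32000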
To illustrate the evpn / vxlan integration, let me describe an L2VNI and L3VNI setup with CumulusVX (it uses FRRouting and vanilla linux's vrf / virtual-switch)
- another sample about L3VPN / MPLS over (GRE|UDP) can be found here (TODO: config sample for EVPN / MPLS over (GRE/UDP))
- https://marcelwiget.blog/2015/07/30/run-juniper-vmx-as-contrail-gateway-for-ipv6-overlay/
[1. sample setup]
Tungsten Fabric controller: 192.168.122.141/24
Tungsten Fabric vRouter: 192.168.122.142/24
vn1 (vxlan id: 7), 10.0.1.0/24, route-target: 64512:7 is set
10.0.1.3 is a cirros container inside vn1
vn1 is connected to lr1 (logical-router, vxlan id: 8, route-target 64512:8 is set)
Tungsten Fabric's project setting 'vxlan routing: enabled' is also set (this setting might be changed in the future)
https://review.opencontrail.org/c/Juniper/contrail-controller/+/51833
CumulusVX: 192.168.122.151/24
swp1: centos152 (10.0.1.152/24) is connected
-> same l2 subnet as the container inside the vRouter
swp2: centos153 (192.168.130.153/24) is connected
-> L3VRF will route the traffic from this to the container
[2. bgp setting]
net add bgp autonomous-system 64513
net add bgp router-id 192.168.122.151
net add bgp neighbor 192.168.122.141 remote-as 64512
net add bgp neighbor 192.168.122.141 capability extended-nexthop
net add bgp l2vpn evpn neighbor 192.168.122.141 activate
net add bgp l2vpn evpn advertise-all-vni
net add bgp l2vpn evpn vni 7 rd 192.168.122.151:7
net add bgp l2vpn evpn vni 7 route-target import 64512:7
net add bgp l2vpn evpn vni 7 route-target export 64512:7
cumulus@cumulus:~$ net show bgp summary
show bgp ipv4 unicast summary
=============================
BGP router identifier 192.168.122.151, local AS number 64513 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 19 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.168.122.141 4 64512 55 43 0 0 0 00:01:15 NoNeg
Total number of neighbors 1
show bgp ipv6 unicast summary
=============================
% No BGP neighbors found
show bgp l2vpn evpn summary
===========================
BGP router identifier 192.168.122.151, local AS number 64513 vrf-id 0
BGP table version 0
RIB entries 3, using 456 bytes of memory
Peers 1, using 19 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.168.122.141 4 64512 55 43 0 0 0 00:01:15 6
Total number of neighbors 1
cumulus@cumulus:~$
[3. l2vni setting]
net add bridge bridge ports vni7
net add bridge bridge vids 7
net add interface swp1 bridge pvid 7
net add vxlan vni7 vxlan id 7
net add vxlan vni7 bridge learning off
net add vxlan vni7 bridge access 7
net add vxlan vni7 bridge arp-nd-suppress on
net add vxlan vni7 vxlan local-tunnelip 192.168.122.151
net add vlan 7 ip forward off
net add vlan 7 ipv6 forward off
cumulus@cumulus:~$ net show bgp l2vpn evpn route
BGP table version is 18, local router ID is 192.168.122.151
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 192.168.122.142:1
*> [2]:[0]:[0]:[48]:[52:54:00:d9:db:32]
192.168.122.142 100 0 64512 ?
*> [2]:[0]:[0]:[48]:[52:54:00:d9:db:32]:[32]:[192.168.122.142]
192.168.122.142 100 0 64512 ?
*> [3]:[0]:[32]:[192.168.122.142]
192.168.122.142 200 0 64512 ?
Route Distinguisher: 192.168.122.142:3
*> [2]:[0]:[0]:[48]:[02:98:81:86:80:8a]
192.168.122.142 100 0 64512 ?
*> [2]:[0]:[0]:[48]:[02:98:81:86:80:8a]:[32]:[10.0.1.3]
192.168.122.142 100 0 64512 ?
*> [3]:[0]:[32]:[192.168.122.142]
192.168.122.142 200 0 64512 ?
Route Distinguisher: 192.168.122.142:4
*> [5]:[0]:[0]:[32]:[10.0.1.3]
192.168.122.142 100 0 64512 ?
(snip)
Route Distinguisher: 192.168.122.151:7
*> [3]:[0]:[32]:[192.168.122.151]
192.168.122.151 32768 i
Route Distinguisher: 192.168.122.151:8
*> [5]:[0]:[0]:[24]:[192.168.131.0]
192.168.122.151 0 32768 ?
Displayed 12 prefixes (12 paths)
cumulus@cumulus:~$
[root@centos152 ~]# ping 10.0.1.3
PING 10.0.1.3 (10.0.1.3) 56(84) bytes of data.
64 bytes from 10.0.1.3: icmp_seq=1 ttl=64 time=1.37 ms
64 bytes from 10.0.1.3: icmp_seq=2 ttl=64 time=0.836 ms
64 bytes from 10.0.1.3: icmp_seq=3 ttl=64 time=0.778 ms
64 bytes from 10.0.1.3: icmp_seq=4 ttl=64 time=0.753 ms
64 bytes from 10.0.1.3: icmp_seq=5 ttl=64 time=0.801 ms
--- 10.0.1.3 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 0.753/0.908/1.374/0.235 ms
[root@centos152 ~]#
cumulus@cumulus:~$ net show evpn arp-cache vni all
VNI 7 #ARP (IPv4 and IPv6, local and remote) 3
IP Type State MAC Remote VTEP
10.0.1.152 local active 52:54:00:20:e5:9a
fe80::28a0:caff:fe62:d16c local active 2a:a0:ca:62:d1:6c
10.0.1.3 remote active 02:98:81:86:80:8a 192.168.122.142
cumulus@cumulus:~$
-> mac address of 10.0.1.3 container is learnt from Tungsten Fabric controller
[4. l3vni setting]
net add vrf vrf8 vni 8
net add bgp router-id 192.168.122.151
net add bgp vrf vrf8 autonomous-system 64513
net add bgp vrf vrf8 ipv4 unicast redistribute connected
net add bgp vrf vrf8 l2vpn evpn advertise ipv4 unicast
net add bgp vrf vrf8 l2vpn evpn rd 192.168.122.151:8
net add bgp vrf vrf8 l2vpn evpn route-target import 64512:8
net add bgp vrf vrf8 l2vpn evpn route-target export 64512:8
net add vxlan vni8 vxlan id 8
net add interface swp2 bridge pvid 8
net add vlan 8 ip address 192.168.131.254/24
net add vlan 8 vlan-id 8
net add vlan 8 vrf vrf8
net add vxlan vni8 vxlan local-tunnelip 192.168.122.151
net add vxlan vni8 bridge access 8
cumulus@cumulus:~$ net show bgp l2vpn evpn route type prefix
BGP table version is 4, local router ID is 192.168.122.151
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 192.168.122.142:4
*> [5]:[0]:[0]:[32]:[10.0.1.3]
192.168.122.142 100 0 64512 ?
Route Distinguisher: 192.168.122.151:8
*> [5]:[0]:[0]:[24]:[192.168.131.0]
192.168.122.151 0 32768 ?
Displayed 2 prefixes (2 paths) (of requested type)
cumulus@cumulus:~$
cumulus@cumulus:~$ net show route vrf vrf8
show ip route vrf vrf8
=======================
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR,
> - selected route, * - FIB route
VRF vrf8:
K * 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 00:31:09
B>* 10.0.1.3/32 [20/100] via 192.168.122.142, vlan8 onlink, 00:31:09
C>* 192.168.131.0/24 is directly connected, vlan8, 00:29:05
[root@centos153 ~]# ping 10.0.1.3
PING 10.0.1.3 (10.0.1.3) 56(84) bytes of data.
64 bytes from 10.0.1.3: icmp_seq=1 ttl=62 time=1.27 ms
64 bytes from 10.0.1.3: icmp_seq=2 ttl=62 time=0.892 ms
64 bytes from 10.0.1.3: icmp_seq=3 ttl=62 time=0.912 ms
64 bytes from 10.0.1.3: icmp_seq=4 ttl=62 time=0.851 ms
--- 10.0.1.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.851/0.981/1.272/0.173 ms
[root@centos153 ~]#
[root@centos153 ~]#
[root@centos153 ~]# ip -o a
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
1: lo inet6 ::1/128 scope host \ valid_lft forever preferred_lft forever
2: eth0 inet 192.168.131.153/24 brd 192.168.131.255 scope global noprefixroute eth0\ valid_lft forever preferred_lft forever
2: eth0 inet6 fe80::24a9:6145:e488:5f15/64 scope link noprefixroute \ valid_lft forever preferred_lft forever
[root@centos153 ~]#
[root@centos153 ~]# ip route
default via 192.168.131.254 dev eth0 proto static metric 100
192.168.131.0/24 dev eth0 proto kernel scope link src 192.168.131.153 metric 100
[root@centos153 ~]#
Before R1908, vxlan-routing (which is needed to enable EVPN T5) was a project-level setting, so once this knob was enabled, all logical-routers became type vxlan-routing and couldn't be used as snat-routing logical-routers.
After R1908, this setting can be set per logical-router.
Having said that, currently there is no way to create a vxlan-routing logical-router from the webui (it can be created via the API).
One way to try this feature is to modify the config-api module to use vxlan-routing instead of snat-routing:
# docker exec -it config_api_1 bash
# sed -i 's/snat-routing/vxlan-routing/' /usr/lib/python2.7/site-packages/vnc_cfg_api_server/resources/logical_router.py
# exit
# docker restart config_api_1
After this, when a logical-router is attached to a virtual-network, an EVPN T5 route is sent to the other bgp peers.
- Orchestrator needs to be openstack
(one VM is created in virtual-network vn1)
(kolla-toolbox)[ansible@ip-172-31-13-153 /]$ openstack server list
+--------------------------------------+------+--------+--------------+--------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------+--------+--------------+--------+---------+
| e3a43979-a8ae-4f05-b065-0b0841cee47b | vm1 | ACTIVE | vn1=10.0.1.3 | cirros | m1.tiny |
+--------------------------------------+------+--------+--------------+--------+---------+
(kolla-toolbox)[ansible@ip-172-31-13-153 /]$
(when logical-router is not connected to vn1, no type 5 route is seen)
[root@ip-172-31-13-153 ~]# ./contrail-introspect-cli/ist.py ctr route show --family evpn | grep ^5
[root@ip-172-31-13-153 ~]#
(when logical-router is connected to vn1, type 5 route for this VM is sent to other bgp peer)
[root@ip-172-31-13-153 ~]# ./contrail-introspect-cli/ist.py ctr route show --family evpn | grep ^5
5-0:0-0-10.0.1.3/32, age: 0:00:07.126096, last_modified: 2020-Jan-12 13:50:27.307760
5-172.31.13.153:3-0-10.0.1.3/32, age: 0:00:07.077088, last_modified: 2020-Jan-12 13:50:27.356768
[root@ip-172-31-13-153 ~]#
In addition, after R1912, EVPN T5 can also be used for service-chain routes. (it can be used with vxlan)
To configure this, the following procedure needs to be followed.
- tested with opencontrailnightly:1912-latest; a one-node install is used (openstack controller, tungsten fabric controller, and vRouter on the same node)
- Create two virtual-networks (vn1, vn2) and logical-routers (lr1, lr2)
- Connect lr1 to vn1, lr2 to vn2
- Check that virtual-network LR::lr1, LR::lr2 are automatically created
(kolla-toolbox)[ansible@ip-172-31-13-153 /]$ openstack network list
+--------------------------------------+-------------------------+--------------------------------------+
| ID | Name | Subnets |
+--------------------------------------+-------------------------+--------------------------------------+
| 667344f9-36f1-4d56-8d9e-e5b8c856658b | LR::lr1 | ab81f262-52d3-496f-825e-758ca5e6d60f |
| 0acf42ab-f917-4a32-a95a-5f2a555e955d | ip-fabric | |
| 5ac821b2-b823-4ea7-8be2-e1ee71547df8 | LR::lr2 | 45b16ec8-0497-4610-843d-13d6913f4c41 |
| 0a0e30c2-d2fa-46dd-bd6f-233897f156f4 | vn1 | c739aa67-bad3-4a69-b110-797018579b22 |
| 822b12ae-8b9c-4c32-be91-1611c245e761 | vn2 | c67c9f25-8169-44dd-b1cd-8d9ab788a0da |
| 16715adc-93cb-4297-847a-50fcbcdef98b | __link_local__ | |
| 95b08fcc-b027-407a-8b35-8470989b7d5a | dci-network | |
| 728957ed-9db3-4502-b45a-2ce3ce0ed575 | default-virtual-network | |
+--------------------------------------+-------------------------+--------------------------------------+
(kolla-toolbox)[ansible@ip-172-31-13-153 /]$
- add subnets to LR::lr1 and LR::lr2 (TF webui can be used for this)
- create VNF with vNICs in LR::lr1 and LR::lr2
(kolla-toolbox)[ansible@ip-172-31-13-153 /]$ openstack server list
+--------------------------------------+------------+--------+--------------------------------------+--------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------+--------+--------------------------------------+--------+---------+
| 4477700f-8183-4f81-b7bf-7fb16e74aba8 | vm2 | ACTIVE | vn2=10.0.2.4 | cirros | m1.tiny |
| b631b50c-5ccf-4e48-86a8-bf390c174180 | lr1-to-lr2 | ACTIVE | LR::lr1=10.0.11.3; LR::lr2=10.0.12.3 | cirros | m1.tiny |
| e3a43979-a8ae-4f05-b065-0b0841cee47b | vm1 | ACTIVE | vn1=10.0.1.3 | cirros | m1.tiny |
+--------------------------------------+------------+--------+--------------------------------------+--------+---------+
(kolla-toolbox)[ansible@ip-172-31-13-153 /]$
- create service-instance, network-policy with LR::lr1 and LR::lr2, and attach network-policy to LR::lr1 and LR::lr2
When everything works fine, an EVPN T5 route with protocol ServiceChain will be added
[root@ip-172-31-13-153 ~]# ./contrail-introspect-cli/ist.py ctr route show --family evpn | grep -e ^5 -e evpn -A 1
default-domain:admin:__contrail_lr_internal_vn_62651c76-7851-4459-8d54-41b2b1289e21__:__contrail_lr_internal_vn_62651c76-7851-4459-8d54-41b2b1289e21__.evpn.0: 2 destinations, 2 routes (1 primary, 1 secondary, 0 infeasible)
5-0:0-0-10.0.1.3/32, age: 0:00:40.299110, last_modified: 2020-Jan-12 14:00:39.070835
[ServiceChain (service-interface)|None] age: 0:00:40.302293, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 8, AS path: None
--
5-0:0-0-10.0.2.4/32, age: 0:04:22.046440, last_modified: 2020-Jan-12 13:56:57.323505
[XMPP|ip-172-31-13-153.local] age: 0:04:22.049981, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 8, AS path: None
--
default-domain:admin:__contrail_lr_internal_vn_62651c76-7851-4459-8d54-41b2b1289e21__:service-20c08253-7212-40e2-8211-1548652de4b9-default-domain_admin_lr1-to-lr2.evpn.0: 2 destinations, 2 routes (1 primary, 1 secondary, 0 infeasible)
5-0:0-0-10.0.1.3/32, age: 0:00:40.299524, last_modified: 2020-Jan-12 14:00:39.070421
[ServiceChain (service-interface)|None] age: 0:00:40.303335, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 8, AS path: None
--
5-0:0-0-10.0.2.4/32, age: 0:00:40.316583, last_modified: 2020-Jan-12 14:00:39.053362
[XMPP|ip-172-31-13-153.local] age: 0:00:40.320727, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 8, AS path: None
--
default-domain:admin:__contrail_lr_internal_vn_7693de7f-9b96-41de-84af-c6db113132e2__:__contrail_lr_internal_vn_7693de7f-9b96-41de-84af-c6db113132e2__.evpn.0: 2 destinations, 2 routes (1 primary, 1 secondary, 0 infeasible)
5-0:0-0-10.0.1.3/32, age: 0:10:52.062185, last_modified: 2020-Jan-12 13:50:27.307760
[XMPP|ip-172-31-13-153.local] age: 0:10:52.066796, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 6, AS path: None
--
5-0:0-0-10.0.2.4/32, age: 0:00:40.299766, last_modified: 2020-Jan-12 14:00:39.070179
[ServiceChain (service-interface)|None] age: 0:00:40.304752, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 6, AS path: None
--
default-domain:admin:__contrail_lr_internal_vn_7693de7f-9b96-41de-84af-c6db113132e2__:service-20c08253-7212-40e2-8211-1548652de4b9-default-domain_admin_lr1-to-lr2.evpn.0: 2 destinations, 2 routes (1 primary, 1 secondary, 0 infeasible)
5-0:0-0-10.0.1.3/32, age: 0:00:40.465418, last_modified: 2020-Jan-12 14:00:38.904527
[XMPP|ip-172-31-13-153.local] age: 0:00:40.470671, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 6, AS path: None
--
5-0:0-0-10.0.2.4/32, age: 0:00:40.299958, last_modified: 2020-Jan-12 14:00:39.069987
[ServiceChain (service-interface)|None] age: 0:00:40.305449, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 6, AS path: None
--
default-domain:admin:vn1:vn1.evpn.0: 4 destinations, 4 routes (4 primary, 0 secondary, 0 infeasible)
--
default-domain:admin:vn2:vn2.evpn.0: 4 destinations, 4 routes (4 primary, 0 secondary, 0 infeasible)
--
bgp.evpn.0: 13 destinations, 13 routes (0 primary, 13 secondary, 0 infeasible)
--
5-172.31.13.153:3-0-10.0.1.3/32, age: 0:10:52.013177, last_modified: 2020-Jan-12 13:50:27.356768
[XMPP|ip-172-31-13-153.local] age: 0:10:52.023700, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 6, AS path: None
--
5-172.31.13.153:5-0-10.0.2.4/32, age: 0:04:22.046385, last_modified: 2020-Jan-12 13:56:57.323560
[XMPP|ip-172-31-13-153.local] age: 0:04:22.057108, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 8, AS path: None
--
5-172.31.13.153:6-0-10.0.2.4/32, age: 0:00:40.299816, last_modified: 2020-Jan-12 14:00:39.070129
[ServiceChain (service-interface)|None] age: 0:00:40.310798, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 6, AS path: None
--
5-172.31.13.153:7-0-10.0.1.3/32, age: 0:00:40.299164, last_modified: 2020-Jan-12 14:00:39.070781
[ServiceChain (service-interface)|None] age: 0:00:40.310369, localpref: 200, nh: 172.31.13.153, encap: ['vxlan'], label: 8, AS path: None
--
default-domain:default-project:ip-fabric:ip-fabric.evpn.0: 4 destinations, 4 routes (4 primary, 0 secondary, 0 infeasible)
[root@ip-172-31-13-153 ~]#
vRouter's vrf will also get a next-hop to the VNF
[root@ip-172-31-13-153 ~]# ./contrail-introspect-cli/ist.py vr vrf
+--------------------------------------+---------+---------+---------+-----------+----------+--------------------------------------+
| name | ucindex | mcindex | brindex | evpnindex | vxlan_id | vn |
+--------------------------------------+---------+---------+---------+-----------+----------+--------------------------------------+
| default-domain:admin:__contrail_lr_i | 5 | 5 | 5 | 5 | 8 | default-domain:admin:__contrail_lr_i |
| nternal_vn_62651c76-7851-4459-8d54-4 | | | | | | nternal_vn_62651c76-7851-4459-8d54-4 |
| 1b2b1289e21__:__contrail_lr_internal | | | | | | 1b2b1289e21__ |
| _vn_62651c76-7851-4459-8d54-41b2b128 | | | | | | |
| 9e21__ | | | | | | |
| default-domain:admin:__contrail_lr_i | 7 | 7 | 7 | 7 | 0 | N/A |
| nternal_vn_62651c76-7851-4459-8d54-4 | | | | | | |
| 1b2b1289e21__:service-86899929-7419 | | | | | | |
| -427a-9b3f-f8e4a3d990eb-default- | | | | | | |
| domain_admin_lr1-to-lr2 | | | | | | |
| default-domain:admin | 3 | 3 | 3 | 3 | 6 | default-domain:admin |
| :__contrail_lr_internal_vn_7693de7f- | | | | | | :__contrail_lr_internal_vn_7693de7f- |
| 9b96-41de-84af-c6db113132e2__ | | | | | | 9b96-41de-84af-c6db113132e2__ |
| :__contrail_lr_internal_vn_7693de7f- | | | | | | |
| 9b96-41de-84af-c6db113132e2__ | | | | | | |
| default-domain:admin | 6 | 6 | 6 | 6 | 0 | N/A |
| :__contrail_lr_internal_vn_7693de7f- | | | | | | |
| 9b96-41de-84af- | | | | | | |
| c6db113132e2__:service-86899929-7419 | | | | | | |
| -427a-9b3f-f8e4a3d990eb-default- | | | | | | |
| domain_admin_lr1-to-lr2 | | | | | | |
| default-domain:admin:vn1:vn1 | 2 | 2 | 2 | 2 | 5 | default-domain:admin:vn1 |
| default-domain:admin:vn2:vn2 | 4 | 4 | 4 | 4 | 7 | default-domain:admin:vn2 |
| default-domain:default-project:ip- | 0 | 0 | 0 | 0 | 0 | N/A |
| fabric:__default__ | | | | | | |
| default-domain:default-project:ip- | 1 | 1 | 1 | 1 | 2 | default-domain:default-project:ip- |
| fabric:ip-fabric | | | | | | fabric |
+--------------------------------------+---------+---------+---------+-----------+----------+--------------------------------------+
[root@ip-172-31-13-153 ~]#
[root@ip-172-31-13-153 ~]# ./contrail-introspect-cli/ist.py vr route -v 3
0.255.255.252/32
[172.31.13.153] pref:200
to 2:34:66:61:a2:96 via tap346661a2-96, assigned_label:39, nh_index:46 , nh_type:interface, nh_policy:enabled, active_label:39, vxlan_id:0
[LocalVmPort] pref:200
to 2:34:66:61:a2:96 via tap346661a2-96, assigned_label:39, nh_index:46 , nh_type:interface, nh_policy:enabled, active_label:39, vxlan_id:0
10.0.1.3/32
[EVPN-ROUTING] pref:200
to 2:98:88:3c:38:50 via tap98883c38-50, assigned_label:-1, nh_index:34 , nh_type:interface, nh_policy:enabled, active_label:6, vxlan_id:6
10.0.2.4/32
[172.31.13.153] pref:200
to 2:34:66:61:a2:96 via tap346661a2-96, assigned_label:39, nh_index:46 , nh_type:interface, nh_policy:enabled, active_label:39, vxlan_id:0
10.0.11.0/24
[Local] pref:100
nh_index:1 , nh_type:discard, nh_policy:disabled, active_label:-1, vxlan_id:0
10.0.11.1/32
[Local] pref:100
to 0:0:0:0:0:1 via pkt0, assigned_label:-1, nh_index:13 , nh_type:interface, nh_policy:enabled, active_label:-1, vxlan_id:0
10.0.11.2/32
[Local] pref:100
to 0:0:0:0:0:1 via pkt0, assigned_label:-1, nh_index:13 , nh_type:interface, nh_policy:enabled, active_label:-1, vxlan_id:0
10.0.11.3/32
[172.31.13.153] pref:200
to 2:34:66:61:a2:96 via tap346661a2-96, assigned_label:39, nh_index:46 , nh_type:interface, nh_policy:enabled, active_label:39, vxlan_id:0
[LocalVmPort] pref:200
to 2:34:66:61:a2:96 via tap346661a2-96, assigned_label:39, nh_index:46 , nh_type:interface, nh_policy:enabled, active_label:39, vxlan_id:0
169.254.169.254/32
[LinkLocal] pref:100
via vhost0, nh_index:11 , nh_type:receive, nh_policy:enabled, active_label:0, vxlan_id:0
[root@ip-172-31-13-153 ~]#
[root@ip-172-31-13-153 ~]# ./contrail-introspect-cli/ist.py vr route -v 5
0.255.255.251/32
[172.31.13.153] pref:200
to 2:15:37:f5:fa:fb via tap1537f5fa-fb, assigned_label:44, nh_index:51 , nh_type:interface, nh_policy:enabled, active_label:44, vxlan_id:0
[LocalVmPort] pref:200
to 2:15:37:f5:fa:fb via tap1537f5fa-fb, assigned_label:44, nh_index:51 , nh_type:interface, nh_policy:enabled, active_label:44, vxlan_id:0
10.0.1.3/32
[172.31.13.153] pref:200
to 2:15:37:f5:fa:fb via tap1537f5fa-fb, assigned_label:44, nh_index:51 , nh_type:interface, nh_policy:enabled, active_label:44, vxlan_id:0
10.0.2.4/32
[EVPN-ROUTING] pref:200
to 2:19:e0:a2:b:f3 via tap19e0a20b-f3, assigned_label:-1, nh_index:63 , nh_type:interface, nh_policy:enabled, active_label:8, vxlan_id:8
10.0.12.0/24
[Local] pref:100
nh_index:1 , nh_type:discard, nh_policy:disabled, active_label:-1, vxlan_id:0
10.0.12.1/32
[Local] pref:100
to 0:0:0:0:0:1 via pkt0, assigned_label:-1, nh_index:13 , nh_type:interface, nh_policy:enabled, active_label:-1, vxlan_id:0
10.0.12.2/32
[Local] pref:100
to 0:0:0:0:0:1 via pkt0, assigned_label:-1, nh_index:13 , nh_type:interface, nh_policy:enabled, active_label:-1, vxlan_id:0
10.0.12.3/32
[172.31.13.153] pref:100
to 2:15:37:f5:fa:fb via tap1537f5fa-fb, assigned_label:44, nh_index:51 , nh_type:interface, nh_policy:enabled, active_label:44, vxlan_id:0
[LocalVmPort] pref:100
to 2:15:37:f5:fa:fb via tap1537f5fa-fb, assigned_label:44, nh_index:51 , nh_type:interface, nh_policy:enabled, active_label:44, vxlan_id:0
169.254.169.254/32
[LinkLocal] pref:100
via vhost0, nh_index:11 , nh_type:receive, nh_policy:enabled, active_label:0, vxlan_id:0
[root@ip-172-31-13-153 ~]#
[root@ip-172-31-13-153 ~]# ./contrail-introspect-cli/ist.py vr route -v 6
0.255.255.252/32
[172.31.13.153] pref:200
to 2:34:66:61:a2:96 via tap346661a2-96, assigned_label:39, nh_index:46 , nh_type:interface, nh_policy:enabled, active_label:39, vxlan_id:0
10.0.2.4/32
[172.31.13.153] pref:200
to 2:34:66:61:a2:96 via tap346661a2-96, assigned_label:39, nh_index:46 , nh_type:interface, nh_policy:enabled, active_label:39, vxlan_id:0
10.0.11.3/32
[172.31.13.153] pref:200
to 2:34:66:61:a2:96 via tap346661a2-96, assigned_label:39, nh_index:46 , nh_type:interface, nh_policy:enabled, active_label:39, vxlan_id:0
[root@ip-172-31-13-153 ~]#
[root@ip-172-31-13-153 ~]#
[root@ip-172-31-13-153 ~]# ./contrail-introspect-cli/ist.py vr route -v 7
0.255.255.251/32
[172.31.13.153] pref:200
to 2:15:37:f5:fa:fb via tap1537f5fa-fb, assigned_label:44, nh_index:51 , nh_type:interface, nh_policy:enabled, active_label:44, vxlan_id:0
10.0.1.3/32
[172.31.13.153] pref:200
to 2:15:37:f5:fa:fb via tap1537f5fa-fb, assigned_label:44, nh_index:51 , nh_type:interface, nh_policy:enabled, active_label:44, vxlan_id:0
10.0.12.3/32
[172.31.13.153] pref:100
to 2:15:37:f5:fa:fb via tap1537f5fa-fb, assigned_label:44, nh_index:51 , nh_type:interface, nh_policy:enabled, active_label:44, vxlan_id:0
[root@ip-172-31-13-153 ~]#
[root@ip-172-31-13-153 ~]# ./contrail-introspect-cli/ist.py ctr route show --family l3vpn
bgp.l3vpn.0: 9 destinations, 9 routes (0 primary, 9 secondary, 0 infeasible)
172.31.13.153:1:172.31.13.153/32, age: 0:40:32.414715, last_modified: 2020-Jan-12 13:38:26.922346
[XMPP (interface)|ip-172-31-13-153.local] age: 0:40:32.418428, localpref: 100, nh: 172.31.13.153, encap: ['gre', 'udp', 'native'], label: 17, AS path: None
172.31.13.153:2:10.0.1.3/32, age: 0:29:55.551280, last_modified: 2020-Jan-12 13:49:03.785781
[XMPP (interface)|ip-172-31-13-153.local] age: 0:29:55.555402, localpref: 200, nh: 172.31.13.153, encap: ['gre', 'udp'], label: 25, AS path: None
172.31.13.153:3:0.255.255.252/32, age: 0:19:58.759556, last_modified: 2020-Jan-12 13:59:00.577505
[XMPP (service-interface)|ip-172-31-13-153.local] age: 0:19:58.763917, localpref: 200, nh: 172.31.13.153, encap: ['gre', 'udp'], label: 39, AS path: None
172.31.13.153:3:10.0.11.3/32, age: 0:23:22.131030, last_modified: 2020-Jan-12 13:55:37.206031
[XMPP (interface)|ip-172-31-13-153.local] age: 0:23:22.135685, localpref: 200, nh: 172.31.13.153, encap: ['gre', 'udp'], label: 39, AS path: None
172.31.13.153:4:10.0.2.4/32, age: 0:22:02.013695, last_modified: 2020-Jan-12 13:56:57.323366
[XMPP (interface)|ip-172-31-13-153.local] age: 0:22:02.018717, localpref: 200, nh: 172.31.13.153, encap: ['gre', 'udp'], label: 49, AS path: None
172.31.13.153:5:0.255.255.251/32, age: 0:19:58.547299, last_modified: 2020-Jan-12 13:59:00.789762
[XMPP (service-interface)|ip-172-31-13-153.local] age: 0:19:58.552631, localpref: 200, nh: 172.31.13.153, encap: ['gre', 'udp'], label: 44, AS path: None
172.31.13.153:5:10.0.12.3/32, age: 0:23:35.850393, last_modified: 2020-Jan-12 13:55:23.486668
[XMPP (interface)|ip-172-31-13-153.local] age: 0:23:35.856031, localpref: 100, nh: 172.31.13.153, encap: ['gre', 'udp'], label: 44, AS path: None
172.31.13.153:6:10.0.2.4/32, age: 0:08:56.528333, last_modified: 2020-Jan-12 14:10:02.808728
[ServiceChain (service-interface)|None] age: 0:08:56.534255, localpref: 200, nh: 172.31.13.153, encap: ['gre', 'udp'], label: 39, AS path: None
172.31.13.153:7:10.0.1.3/32, age: 0:08:56.527653, last_modified: 2020-Jan-12 14:10:02.809408
[ServiceChain (service-interface)|None] age: 0:08:56.533918, localpref: 200, nh: 172.31.13.153, encap: ['gre', 'udp'], label: 44, AS path: None
[root@ip-172-31-13-153 ~]#
In EVPN T2, there are two flavors, vlan-based and vlan-aware, and they are mutually incompatible.
The Tungsten Fabric controller, by default, uses the vlan-aware flavor, so its evpn t2 routes can't be imported by several datacenter switches which only support the vlan-based flavor.
Having said that, this patch (and a container based on R1912) sets the ethernet tag id to zero, and it is reported that some switches begin importing the T2 routes if this is applied (a sketch of how to try that container follows the links below)
- https://github.com/tnaganawa/tungstenfabric-docs/blob/master/TungstenFabricKnowledgeBase.md#vlan-base-interop
- https://hub.docker.com/r/tnaganawa/contrail-controller-control-control
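A minimal sketch of trying that container (the docker-compose path follows contrail-ansible-deployer conventions and is an assumption; back up the original file first):
# docker pull tnaganawa/contrail-controller-control-control
# vi /etc/contrail/control/docker-compose.yaml
-> point the control service's image at the pulled one
# docker-compose -f /etc/contrail/control/docker-compose.yaml up -d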
Although it has a lot of usecases, NFVI will be one of Tungsten Fabric's most prominent usecases, because of a lot of unique features which make a software-based NFVI implementation possible.
The most well-known feature along this line is service-chain, a feature to steer traffic without changing the VNFs' IPs, which makes realtime insertion and removal of VNFs possible.
Since vRouter can have VRFs inside, it can have a VRF at every interface of a VNF, and can make the traffic handled by fabricated next-hops, for example to send it to the next VNF.
Tungsten Fabric's service-chain is implemented that way, so you will see that several VRFs are created once a service-chain is created, and next-hops are inserted to send traffic to the next VNF of the chain.
- VRFs (routing-instances in control's term) are named as domain-name:project-name:virtual-network-name:routing-instance-name. In most cases, virtual-network-name and routing-instance-name are the same, but the service-chain is one exception to this rule
To set up a sample service-chain, the procedure in this movie can be followed
After that, you can see that the left virtual-network has the right virtual-network's prefixes with an updated next-hop, which points to the left interface of the VNF, and vice versa for the right virtual-network.
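To check this from the introspect, the same commands used earlier in this document can be reused; a minimal sketch (the ucindex value is illustrative and has to be taken from the vr vrf output):
# ./contrail-introspect-cli/ist.py vr vrf
-> pick the ucindex of the left virtual-network's VRF
# ./contrail-introspect-cli/ist.py vr route -v <ucindex>
-> the right virtual-network's prefixes should show up here, with the tap interface of the VNF's left interface as the next-hop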
Note: When service-chain v2 is used, only the 'left' and 'right' interfaces are used for the service-chain calculation, and 'management' and 'other' interfaces are omitted, AFAIK
There are a lot of VNFs with different sets of traffic types, so an NFVI SDN also needs to support several types of traffic.
For this purpose, Tungsten Fabric's service-chain supports three traffic types, namely l2, l3, and nat.
An l2 service-chain (also known as a transparent service-chain) can be used with a transparent VNF, which behaves like a bridge, sending packets based on arp responses.
Although vRouter always uses the same mac address (00:01:00:5e:00:00),
this case is an exception to that rule: the vRouter at the left of the VNF sends traffic with dest mac 2:0:0:0:0:2, and the vRouter at the right of the VNF sends traffic with dest mac 1:0:0:0:0:1. So a bridge-type VNF will send the traffic out of the opposite side of its interfaces.
Let me note that even if an l2 vnf is used, the left virtual-network and the right virtual-network need to have different subnets. This might be a bit counter-intuitive, but since vRouter can do l3 routing, vRouter - L2VNF - vRouter is possible, just like router - L2VNF - router is acceptable.
An l3 service-chain (also known as an in-network service-chain), on the other hand, will send traffic to the VNF without changing the mac address, since in this case the VNF will route packets based on their destination ip (similar behavior to a router). Except for the mac address, the behavior is mostly the same as the l2 case.
A nat service-chain is similar to an l3 service-chain, since it expects the VNF to route packets based on the destination ip. One big difference is that it replicates the right virtual-network's prefixes to the left virtual-network, but it won't replicate the left virtual-network's prefixes to the right virtual-network!
- so left / right interfaces need to be chosen carefully, since it's asymmetric in this case
Typical usecase of this flavor of service-chain is VNF's left interface has private ip, and the right interface has global ip, in a case such as SNAT for internet access is performed. Since private ip can't be exported to the internet, in this case, left virtual-network's prefix can't be replicated to right virtual-network.
The service-chain feature also supports ECMP for scale-out deployments.
- Configuration is mostly the same, but several port-tuples need to be assigned to one service-instance.
After that, you will notice that traffic is load-balanced based on the 5-tuple of the packets.
Multiple VNFs can also be chained, if several service-instances are assigned to one network-policy.
When an l3 service-chain is used, although it might be counter-intuitive, the two VNFs need to be assigned to the same virtual-network.
- Since all the packets from the VNFs are handled in the separate service-chain VRFs, they can have the same subnets.
Simultaneous use of l2 and l3 is also supported, although in that case the l2 VNF needs to be assigned to different virtual-networks, to which the one network-policy is attached.
- setup example is described in this blog post: https://tungsten.io/building-and-testing-layer2-service-images-for-opencontrail/
BGPaaS is also a somewhat unique feature of Tungsten Fabric, which is used to insert the VRF's routes into VNFs.
- in a sense, it is a bit similar to AWS's VPN gateway, since it automatically receives the routes from the VPC's route table
From an operational perspective, VNFs running on vRouter establish an IPv4 BGP peering with vRouter's gateway IP and service IP.
One notable use case is to set up IPsec VNFs that have a connection to a public cloud VPN gateway. In this case, the VPC's route table is copied to the VNFs and then replicated to vRouter's VRF through BGPaaS, so all prefixes are distributed correctly when subnets are added or modified in the public cloud's VPC.
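As an illustration, the VNF-side configuration could be as simple as the following FRR-style sketch (everything here is an assumption: the virtual-network subnet is taken to be 10.1.1.0/24, so the vRouter side of the peering is reachable at the gateway address .1 and the service address .2, and the AS number is just an example; the actual peering parameters are defined by the BGPaaS object configured in Tungsten Fabric):
router bgp 64512
 neighbor 10.1.1.1 remote-as 64512
 neighbor 10.1.1.2 remote-as 64512
 address-family ipv4 unicast
  redistribute connected
 exit-address-family
With this, the VNF advertises its locally connected routes into the VRF and learns the VRF's prefixes in return.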
The next feature is also used in NFVI, so let me mention it here as well.
VNFs send tagged packets for various reasons. In this case, vRouter can place the traffic into different VRFs based on the VLAN tag.
- similar to sub-interfaces under 'set routing-instances routing-instance-name interface xxx' in Junos terms
The operation is described here: https://www.youtube.com/watch?v=ANhBQe_DS2E
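Internally, a sub-interface is just another virtual-machine-interface that carries a VLAN tag and refers to its parent interface. A minimal sketch with the vnc_api python library might look like the following (the controller IP, the port and network names and the VLAN ID are all assumptions, and an instance-ip still has to be allocated for the new sub-interface afterwards):
from vnc_api.vnc_api import VncApi, VirtualMachineInterface, VirtualMachineInterfacePropertiesType
api = VncApi(api_server_host='192.168.122.5', username='admin', password='password', tenant_name='admin')
project = api.project_read(fq_name=['default-domain', 'admin'])
parent_vmi = api.virtual_machine_interface_read(fq_name=['default-domain', 'admin', 'vnf-left-port'])  # existing VNF port
vn = api.virtual_network_read(fq_name=['default-domain', 'admin', 'vn-vlan101'])  # virtual-network (VRF) for the vlan 101 traffic
sub_vmi = VirtualMachineInterface('vnf-left-port-vlan101', parent_obj=project)
sub_vmi.set_virtual_machine_interface_properties(VirtualMachineInterfacePropertiesType(sub_interface_vlan_tag=101))
sub_vmi.set_virtual_machine_interface_mac_addresses(parent_vmi.get_virtual_machine_interface_mac_addresses())  # sub-interface reuses the parent's MAC
sub_vmi.add_virtual_network(vn)
sub_vmi.add_virtual_machine_interface(parent_vmi)  # reference to the parent interface
api.virtual_machine_interface_create(sub_vmi)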
Since Tungsten Fabric uses MPLS-VPN internally, its virtual-networks can be extended to other Tungsten Fabric clusters.
- this might be a bit surprising, since the neutron ML2 plugin and most other CNI implementations don't support this setup, AFAIK
That said, since the clusters have different DBs, shared resources need to be marked manually in each of them.
I'll describe the usage of several BGP parameters for this purpose.
Since Tungsten Fabric uses L3VPN for inter-VRF routing, it can route packets between clusters if the route-targets are set consistently on the VRFs.
- Since network-policy / logical-router cannot be used across several clusters, route-targets need to be configured directly on each virtual-network, as sketched below.
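A minimal sketch with the vnc_api python library could look like the following (the controller IPs, credentials, virtual-network names and the route-target value are all assumptions; the same can also be done from the webui, in the virtual-network's route-target setting):
from vnc_api.vnc_api import VncApi, RouteTargetList
# set the same route-target on a virtual-network in each cluster,
# so that the two VRFs import each other's prefixes
for api_server in ['10.0.0.11', '10.1.0.11']:   # one config-api node per cluster (hypothetical IPs)
    api = VncApi(api_server_host=api_server, username='admin', password='password', tenant_name='admin')
    vn = api.virtual_network_read(fq_name=['default-domain', 'admin', 'vn1'])
    vn.set_route_target_list(RouteTargetList(['target:64512:10001']))
    api.virtual_network_update(vn)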
Note: if l3-only forwarding mode is specified, L3VPN is used even for intra-VRF forwarding, so bridging won't be used in that setup.
For bridging between Tungsten Fabric clusters, ERM-VPN is used (rather than EVPN) to build the BUM tree.
Since this BGP family builds the BUM tree across several control processes, the behavior doesn't change even when those control processes belong to different clusters.
One note: if VXLAN is used as the l2 dataplane (ENCAP_PRIORITY can be used to specify this), setting the same route-target on two virtual-networks is not sufficient; the two virtual-networks also need to have the same VNI.
- for MPLS over IP, setting the same route-target is enough to create the BUM tree correctly
Tungsten Fabric also has an extended community to convey the security-group ID.
Since this ID can also be configured manually, you can set the same ID on each cluster's security-group, in order to allow traffic from the other cluster's prefixes.
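For example, a sketch with the vnc_api python library (the controller IPs, the security-group name and the ID value are assumptions) that pins the same ID on a security-group in each cluster:
from vnc_api.vnc_api import VncApi
for api_server in ['10.0.0.11', '10.1.0.11']:   # one config-api node per cluster (hypothetical IPs)
    api = VncApi(api_server_host=api_server, username='admin', password='password', tenant_name='admin')
    sg = api.security_group_read(fq_name=['default-domain', 'admin', 'allow-from-other-cluster'])
    sg.set_configured_security_group_id(8000010)  # same ID on both clusters, so the advertised extended community matches
    api.security_group_update(sg)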
Note: as far as I tried, tag IDs can't be configured manually from the Tungsten Fabric webui in the R5.1 branch, so fw-policy can't be used between clusters. This behavior might change in the future.
DNS is an important subject when dealing with several clusters.
Since Tungsten Fabric has a vDNS implementation similar to openstack's default setup, you can resolve VM names within a cluster, and also make those names available externally.
- https://github.com/Juniper/contrail-controller/wiki/DNS-and-IPAM
- Controller nodes run a contrail-named process to respond to external DNS queries
- To enable this, in the Tungsten Fabric webui, Configure > DNS > DNS Server > (create) > External Access needs to be checked
So at least when openstack (or vCenter) is used as the orchestrator, and different clusters have different domain names, each cluster can directly resolve the names of the other clusters.
- The upstream DNS forwarder needs to be able to resolve all of the names
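For example (the controller IP, domain and VM names below are hypothetical), an external resolver or another cluster can query contrail-named on a controller node directly:
dig @192.168.122.5 vm1.cluster1.example.com
-> contrail-named on that controller should answer with the VM's fixed IP, as long as External Access is enabled on the vDNS server that owns the domain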
When kubernetes is used, Tungsten Fabric uses coredns as the source of name resolution, rather than its own vDNS. The coredns IPs and domain names can be changed in the kubeadm settings.
cluster0:
kubeadm init --pod-network-cidr=10.32.0.0/24 --service-cidr=10.96.0.0/24
cluster1:
kubeadm init --pod-network-cidr=10.32.1.0/24 --service-cidr=10.96.1.0/24 --service-dns-domain=cluster1.local
cluster1:
# cat /etc/sysconfig/kubelet
-KUBELET_EXTRA_ARGS=
+KUBELET_EXTRA_ARGS="--cluster-dns=10.96.1.10"
# systemctl restart kubelet
Note: when this is configured, the Tungsten Fabric settings also need to be changed accordingly (set in the configmap env)
cluster0:
KUBERNETES_POD_SUBNETS: 10.32.0.0/24
KUBERNETES_IP_FABRIC_SUBNETS: 10.64.0.0/24
KUBERNETES_SERVICE_SUBNETS: 10.96.0.0/24
cluster1:
KUBERNETES_POD_SUBNETS: 10.32.1.0/24
KUBERNETES_IP_FABRIC_SUBNETS: 10.64.1.0/24
KUBERNETES_SERVICE_SUBNETS: 10.96.1.0/24
After configuring coredns, each cluster can resolve the other cluster's names (the coredns IPs need to be leaked into each other's VRF, since those IPs need to be reachable)
kubectl edit -n kube-system configmap coredns
cluster0:
### add these lines to resolve cluster1 names
cluster1.local:53 {
errors
cache 30
forward . 10.96.1.10
}
cluster1:
### add these lines to resolve cluster0 names
cluster.local:53 {
errors
cache 30
forward . 10.96.0.10
}
So even if you have several separate Tungsten Fabric clusters, it is not too difficult to stitch virtual-networks between them.
Needing a larger number of nodes than a single orchestrator supports could be one reason to do so, even though orchestrators like kubernetes, openstack, and vCenter support a fairly large number of hypervisors.
If traffic spans multiple DCs, you need to be a bit careful when planning a Tungsten Fabric installation.
There are two options: 1. single cluster, 2. multi clusters.
The single-cluster option is simpler and easier to manage, although RTT between DCs could be an issue, since traffic such as XMPP, rabbitmq, and cassandra goes through the controllers (currently, locality support is not available for them).
The multi-cluster approach adds a bit more operational complexity: since the clusters have different DBs, you need to manually set some parameters, such as route-targets and security-group IDs.
Additionally, vMotion between them will also be much more difficult.
- Even if cross-vCenter vMotion is used, the new vCenter and the new Tungsten Fabric cluster will create a new port, so the VM may end up with a different fixed IP from the original one.
- Nova currently doesn't support cross-openstack live migration, so if openstack is used, live migration between the clusters is not possible.
Since vCenter requires an RTT of at most 150 ms between DCs (I couldn't find a similar value for KVM), 'single cluster below 150 ms RTT, multiple clusters above that' might be one rule of thumb, although each specific case has to be planned carefully.
When a single-cluster installation is planned and the number of DCs is two, one additional thing needs to be taken care of.
Since zookeeper / cassandra in Tungsten Fabric currently use the QUORUM consistency level, when the primary site is down, the second site can't keep working (both read and write access will be unavailable).
- https://github.com/Juniper/contrail-controller/blob/master/src/config/common/vnc_cassandra.py#L659 (used by config-api, schema-transformer, svc-monitor, device-manager)
- https://github.com/Juniper/contrail-common/blob/master/config-client-mgr/config_cassandra_client.cc#L458 (used by control, dns)
One possible option to work around this is to change the consistency level to ONE / TWO / THREE or LOCAL_ONE / LOCAL_QUORUM, although that requires rebuilding from source.
Since zookeeper has no such knob, the only way I'm aware of is to update the weights after the primary site goes down (see the sketch after this list).
- https://stackoverflow.com/questions/32189618/hierarchical-quorums-in-zookeeper
- Most of the components continue working even if zookeeper is temporarily unavailable, although the components that use it for HA stop working (schema-transformer, svc-monitor, kube-manager, vcenter-plugin, ...)
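As a sketch (the server names and IDs are assumptions), the hierarchical quorum / weight settings in zoo.cfg could look like the following; after the primary site (servers 1 and 2) is lost, the weights are updated on the surviving node so that it alone holds a majority of the total weight, and zookeeper is restarted with the new configuration:
server.1=zk1-dc1:2888:3888
server.2=zk2-dc1:2888:3888
server.3=zk3-dc2:2888:3888
group.1=1:2:3
## normal operation: weight.1=1, weight.2=1, weight.3=1
## after dc1 is lost, change to:
weight.1=0
weight.2=0
weight.3=1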
When the number of DCs is more than two, this won't be an issue.
One possible reason to have a second DC is to cover a disaster-recovery scenario.
Although Tungsten Fabric cannot cover this directly, since it needs a quorum in its backend databases, interestingly the ISSU script can be used for this purpose, since this script copies all the data while changing the controller's hostname and IP address.
- since a module update is not required in this case, container_tag can be the same as the original cluster's
So setting up a second Tungsten Fabric cluster and running issu-pre-sync, for example once per day, could be a possible choice, if the RTO is manageable for that environment.
Sharing a control plane between several orchestrators brings a lot of benefits, including consistent routing/bridging, DNS, security, and so on.
Let me describe the usage and configuration for each scenario.
kubernetes + openstack combination is already covered and works well.
One additional comment: Tungsten Fabric supports both nested and non-nested installation, so you can choose either option.
Adding multiple kubernetes clusters to one Tungsten Fabric cluster could be another installation option.
Since kube-manager supports a cluster_name parameter, which changes the tenant name to be created (the default is 'k8s'), this should in principle be possible, although when I tried it last time it didn't work well, since some objects were deleted by the other kube-manager as stale objects.
This behavior might change in a future release.
Note:
From R2002 onward, this patch fixed the issue and the custom patch below is not needed anymore.
Note: by applying the following patch, it seems possible to add multiple kube-masters to one Tungsten Fabric cluster.
diff --git a/src/container/kube-manager/kube_manager/kube_manager.py b/src/container/kube-manager/kube_manager/kube_manager.py
index 0f6f7a0..adb20a6 100644
--- a/src/container/kube-manager/kube_manager/kube_manager.py
+++ b/src/container/kube-manager/kube_manager/kube_manager.py
@@ -219,10 +219,10 @@ def main(args_str=None, kube_api_skip=False, event_queue=None,
if args.cluster_id:
client_pfx = args.cluster_id + '-'
- zk_path_pfx = args.cluster_id + '/'
+ zk_path_pfx = args.cluster_id + '/' + args.cluster_name
else:
client_pfx = ''
- zk_path_pfx = ''
+ zk_path_pfx = '' + args.cluster_name
# randomize collector list
args.random_collectors = args.collectors
diff --git a/src/container/kube-manager/kube_manager/vnc/vnc_namespace.py b/src/container/kube-manager/kube_manager/vnc/vnc_namespace.py
index 00cce81..f968cae 100644
--- a/src/container/kube-manager/kube_manager/vnc/vnc_namespace.py
+++ b/src/container/kube-manager/kube_manager/vnc/vnc_namespace.py
@@ -594,7 +594,8 @@ class VncNamespace(VncCommon):
self._queue.put(event)
def namespace_timer(self):
- self._sync_namespace_project()
+ # self._sync_namespace_project() ## temporary disabled
+ pass
def _get_namespace_firewall_ingress_rule_name(self, ns_name):
return "-".join([vnc_kube_config.cluster_name(),
Since both kube-masters create pod-networks to the same Tungsten Fabric controller, route-leak between them would be possible :)
- Since cluster_name will be one of the tags in Tungsten Fabric's fw-policy, it would also be possible to use the same tags across multiple kubernetes clusters
172.31.9.29 Tungsten Fabric controller
172.31.22.24 kube-master1 (KUBERNETES_CLUSTER_NAME=k8s1 is set)
172.31.12.82 kube-node1 (it belongs to kube-master1)
172.31.41.5 kube-master2 (KUBERNETES_CLUSTER_NAME=k8s2 is set)
172.31.4.1 kube-node2 (it belongs to kube-master2)
[root@ip-172-31-22-24 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
ip-172-31-12-82.ap-northeast-1.compute.internal Ready <none> 57m v1.12.3
ip-172-31-22-24.ap-northeast-1.compute.internal NotReady master 58m v1.12.3
[root@ip-172-31-22-24 ~]#
[root@ip-172-31-41-5 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
ip-172-31-4-1.ap-northeast-1.compute.internal Ready <none> 40m v1.12.3
ip-172-31-41-5.ap-northeast-1.compute.internal NotReady master 40m v1.12.3
[root@ip-172-31-41-5 ~]#
[root@ip-172-31-22-24 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
cirros-deployment-75c98888b9-7pf82 1/1 Running 0 28m 10.47.255.249 ip-172-31-12-82.ap-northeast-1.compute.internal <none>
cirros-deployment-75c98888b9-sgrc6 1/1 Running 0 28m 10.47.255.250 ip-172-31-12-82.ap-northeast-1.compute.internal <none>
cirros-vn1 1/1 Running 0 7m56s 10.0.1.3 ip-172-31-12-82.ap-northeast-1.compute.internal <none>
[root@ip-172-31-22-24 ~]#
[root@ip-172-31-41-5 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
cirros-deployment-75c98888b9-5lqzc 1/1 Running 0 27m 10.47.255.250 ip-172-31-4-1.ap-northeast-1.compute.internal <none>
cirros-deployment-75c98888b9-dg8bf 1/1 Running 0 27m 10.47.255.249 ip-172-31-4-1.ap-northeast-1.compute.internal <none>
cirros-vn2 1/1 Running 0 5m36s 10.0.2.3 ip-172-31-4-1.ap-northeast-1.compute.internal <none>
[root@ip-172-31-41-5 ~]#
/ # ping 10.0.2.3
PING 10.0.2.3 (10.0.2.3): 56 data bytes
64 bytes from 10.0.2.3: seq=83 ttl=63 time=1.333 ms
64 bytes from 10.0.2.3: seq=84 ttl=63 time=0.327 ms
64 bytes from 10.0.2.3: seq=85 ttl=63 time=0.319 ms
64 bytes from 10.0.2.3: seq=86 ttl=63 time=0.325 ms
^C
--- 10.0.2.3 ping statistics ---
87 packets transmitted, 4 packets received, 95% packet loss
round-trip min/avg/max = 0.319/0.576/1.333 ms
/ #
/ # ip -o a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
18: eth0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue \ link/ether 02:b9:11:c9:4c:b1 brd ff:ff:ff:ff:ff:ff
18: eth0 inet 10.0.1.3/24 scope global eth0\ valid_lft forever preferred_lft forever
/ #
-> ping between pods, which belong to different kubernetes clusters, worked well
[root@ip-172-31-9-29 ~]# ./contrail-introspect-cli/ist.py ctr route show -t default-domain:k8s1-default:vn1:vn1.inet.0
default-domain:k8s1-default:vn1:vn1.inet.0: 2 destinations, 2 routes (1 primary, 1 secondary, 0 infeasible)
10.0.1.3/32, age: 0:06:50.001343, last_modified: 2019-Jul-28 18:23:08.243656
[XMPP (interface)|ip-172-31-12-82.local] age: 0:06:50.005553, localpref: 200, nh: 172.31.12.82, encap: ['gre', 'udp'], label: 50, AS path: None
10.0.2.3/32, age: 0:02:25.188713, last_modified: 2019-Jul-28 18:27:33.056286
[XMPP (interface)|ip-172-31-4-1.local] age: 0:02:25.193517, localpref: 200, nh: 172.31.4.1, encap: ['gre', 'udp'], label: 50, AS path: None
[root@ip-172-31-9-29 ~]#
[root@ip-172-31-9-29 ~]# ./contrail-introspect-cli/ist.py ctr route show -t default-domain:k8s2-default:vn2:vn2.inet.0
default-domain:k8s2-default:vn2:vn2.inet.0: 2 destinations, 2 routes (1 primary, 1 secondary, 0 infeasible)
10.0.1.3/32, age: 0:02:36.482764, last_modified: 2019-Jul-28 18:27:33.055702
[XMPP (interface)|ip-172-31-12-82.local] age: 0:02:36.489419, localpref: 200, nh: 172.31.12.82, encap: ['gre', 'udp'], label: 50, AS path: None
10.0.2.3/32, age: 0:04:37.126317, last_modified: 2019-Jul-28 18:25:32.412149
[XMPP (interface)|ip-172-31-4-1.local] age: 0:04:37.133912, localpref: 200, nh: 172.31.4.1, encap: ['gre', 'udp'], label: 50, AS path: None
[root@ip-172-31-9-29 ~]#
-> each virtual-network in each kube-master has a route to other kube-master's pod, based on network-policy below
(venv) [root@ip-172-31-9-29 ~]# contrail-api-cli --host 172.31.9.29 ls -l virtual-network
virtual-network/f9d06d27-8fc1-413d-a6d6-c51c42191ac0 default-domain:k8s2-default:vn2
virtual-network/384fb3ef-247b-42e6-a628-7111fe343f90 default-domain:k8s2-default:k8s2-default-service-network
virtual-network/c3098210-983b-46bc-b750-d06acfc66414 default-domain:k8s1-default:k8s1-default-pod-network
virtual-network/1ff6fdbd-ac2e-4601-b08c-5f7255466312 default-domain:default-project:ip-fabric
virtual-network/d8d95738-0a00-457f-b21e-60304859d1f9 default-domain:k8s2-default:k8s2-default-pod-network
virtual-network/0c075b76-4219-4f79-a4f5-1b4e6729f16e default-domain:k8s1-default:k8s1-default-service-network
virtual-network/985b3b5f-84b7-4810-a54d-abd09a37f525 default-domain:k8s1-default:vn1
virtual-network/23782ea7-4000-491f-b20d-01c6ab9e2ba8 default-domain:default-project:default-virtual-network
virtual-network/90cce352-ef9b-4358-81b3-ef87a9cb63e8 default-domain:default-project:__link_local__
virtual-network/0292810c-c511-4147-89c0-9fdd571ccce8 default-domain:default-project:dci-network
(venv) [root@ip-172-31-9-29 ~]#
(venv) [root@ip-172-31-9-29 ~]# contrail-api-cli --host 172.31.9.29 ls -l network-policy
network-policy/134d38b2-79e2-4a3e-a2f7-a3d61ceaf5e2 default-domain:k8s1-default:vn1-to-vn2 <-- route-leak between the two kubernetes clusters
network-policy/8e5c5c4a-14eb-4fc4-9b46-81a5b923bbe0 default-domain:k8s1-default:k8s1-default-ip-fabric-np
network-policy/544d5076-3dff-45a1-bb47-8aec5e1e5a37 default-domain:k8s1-default:k8s1-default-pod-service-np
network-policy/33884d88-6492-4e0f-934c-080a794ce132 default-domain:k8s2-default:k8s2-default-ip-fabric-np
network-policy/232beb43-2008-4df3-969f-a4eee653ff46 default-domain:k8s2-default:k8s2-default-pod-service-np
network-policy/a6ee02bd-ad0d-4393-be60-66da8032237a default-domain:k8s2-default:k8s2-default-service-np
network-policy/a9cedd67-127a-40fd-9f44-78890dc3cfe4 default-domain:k8s1-default:k8s1-default-service-np
(venv) [root@ip-172-31-9-29 ~]#
I haven't yet tried adding two openstack clusters to one Tungsten Fabric controller, but it might be possible if they don't use the same tenant names.
The kubernetes and vCenter combination can also be used simultaneously. The use case is similar to kubernetes + openstack.
The openstack and vCenter combination is a bit curious, since the openstack dashboard might be used as the management UI for the vCenter network.
As far as I tried, vcenter-plugin checks all the virtual-networks under all available tenants, rather than only the virtual-networks under the 'vCenter' tenant, so if a virtual-network or other neutron component is created, it also becomes available to the vRouterVM on ESXi. With this setup, vCenter users can implement network functions by themselves, just as if they were using EC2 / VPC.
- They can also use vCenter's permission feature to implement pseudo multi-tenancy of VMIs and NFs.
- https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.security.doc/GUID-4B47F690-72E7-4861-A299-9195B9C52E71.html
Multi-vCenter is an important subject, since vCenter has well-defined configuration maximums, and a multi-vCenter installation is a common way to work around them.
The simplest setup in this case is to configure a separate Tungsten Fabric cluster per vCenter, but then vMotion between two clusters will be difficult, since Tungsten Fabric creates a new port after vMotion finishes and might assign a different fixed IP.
So I think assigning several vCenters to one Tungsten Fabric cluster would have a legitimate use case.
As far as I tried, in the current implementation, since vcenter-plugin uses only the 'vCenter' tenant for some objects, it is not possible to run two vcenter-plugins simultaneously without some code modification.
If the tenant could be modified per vcenter-plugin and vcenter-manager, it might be possible to assign each vCenter a separate tenant and use them simultaneously, just like using kubernetes and openstack simultaneously.
If this were available, it would also be possible to use service-insertion and the physical switch extension in a multi-vCenter environment.
- Even SRM integration might work that way, since the placeholder VM will get a new port, which can be edited to assign the correct fixed IP.
I don't know if this configuration will ever be used, since kubernetes / openstack / vCenter have some feature overlap, although it would work well if set up.
vRouter has a feature to use DPDK to interact with the physical NIC.
It is frequently used for NFV-type deployments, since it is still not easy to achieve forwarding performance comparable to a typical VNF (which itself might use DPDK or a similar technology) with the pure linux kernel networking stack.
To enable this feature with ansible-deployer, the following parameters need to be set.
bms1:
  roles:
    vrouter:
      AGENT_MODE: dpdk
      CPU_CORE_MASK: "0xe"              ## coremask for the forwarding cores (note: don't include the first core of a NUMA node, to reach optimal performance)
      SERVICE_CORE_MASK: "0x1"          ## this is for non-forwarding threads, so isolcpus for this core is not needed
      DPDK_CTRL_THREAD_MASK: "0x1"      ## same as SERVICE_CORE_MASK
      DPDK_UIO_DRIVER: uio_pci_generic  ## uio driver name
      HUGE_PAGES: 16000                 ## number of 2MB hugepages, it can be smaller
Note: SERVICE_CORE_MASK and DPDK_CTRL_THREAD_MASK can be used from R2003 and later. https://github.com/Juniper/contrail-specs/blob/master/full_cpu_partitioning_support.md
When AGENT_MODE: dpdk is set, ansible-deployer installs additional containers such as vrouter-dpdk, a process that runs the PMD against the physical NIC, so in that case forwarding from vRouter to the physical NIC is based on DPDK.
Note:
- Since vRouter is linked against a limited number of PMDs, using some specific NICs may require a vRouter rebuild
- For some NICs such as the XL710, uio_pci_generic can't be used. In that case, vfio-pci needs to be used instead
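For reference, the driver binding can be checked or changed manually with the dpdk-devbind.py tool shipped with DPDK (the PCI address below is only an example; note that vfio-pci generally also requires the IOMMU to be enabled, e.g. intel_iommu=on iommu=pt in the kernel boot parameters):
dpdk-devbind.py --status
dpdk-devbind.py --bind=vfio-pci 0000:00:04.0   ## bind the NIC used by vrouter-dpdk to vfio-pci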
Since in that case vRouter's forwarding plane is not in kernel space, a tap device can't be used to get packets from the VMs. For this purpose, QEMU has a 'vhostuser' feature to send packets to a DPDK process in user space. When vRouter is configured with AGENT_MODE: dpdk, the nova-vif-driver automatically creates a vhostuser vif, rather than the tap vif which is used for the kernel vRouter.
- From the VM's point of view, the interface still looks like virtio, so the usual virtio driver can be used to communicate with the DPDK vRouter.
One caveat is that when QEMU is connected to a vhostuser interface, qemu also needs hugepages for that VM. When openstack is used, the following knob assigns hugepages to each VM.
openstack flavor set flavorname --property hw:mem_page_size=large
- hw:mem_page_size=2MB or hw:mem_page_size=1GB can also be used
To reach optimal performance, there are a lot of tuning parameters, both in the kernel and in the dpdk process itself. On the kernel side, these two articles were the most helpful for me (a boot-parameter sketch follows the list below).
- https://www.redhat.com/en/blog/tuning-zero-packet-loss-red-hat-openstack-platform-part-1
- https://www.redhat.com/en/blog/going-full-deterministic-using-real-time-openstack
- cat /proc/sched_debug can also be used to check whether core isolation is working well
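As a rough sketch of what those articles recommend (the exact values depend on the CPU layout; the core numbers here assume CPU_CORE_MASK: "0xe" from the example above, and the hugepage count matches HUGE_PAGES: 16000), the kernel boot parameters typically end up looking something like this in /etc/default/grub:
GRUB_CMDLINE_LINUX="... isolcpus=1-3 nohz_full=1-3 rcu_nocbs=1-3 default_hugepagesz=2M hugepagesz=2M hugepages=16000"
## then regenerate the grub config (e.g. grub2-mkconfig -o /boot/grub2/grub.cfg) and reboot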
On the vRouter side, this point might need to be taken care of:
- vRouter load-balances across its forwarding cores based on the 5-tuple, so for optimal performance the number of flows might need to be increased
Note: when vrouter-dpdk is used, using untagged packets might give somewhat more throughput (i.e., provisioning vrouter-dpdk without --vlan_tci)
Istio is working well; multi-cluster could be an interesting subject.