URGENT UPDATE 11/18/2015
Please, view https://github.com/beekhof/osp-ha-deploy/commit/b2e01e86ca93cfad9ad01d533b386b4c9607c60d
It looks like a work in progress.
See also https://www.redhat.com/archives/rdo-list/2015-November/msg00168.html
END UPDATE
Actually, the setup below closely follows https://github.com/beekhof/osp-ha-deploy/blob/master/HA-keepalived.md
To my knowledge, Cisco's schema has been implemented :-
Keepalived, HAProxy, Galera for MySQL, manual install, at least 3 controller nodes. I just highlighted several steps which, I believe, allowed me to bring this work to success. Javier uses a flat external network provider for the Controllers cluster, disabling NetworkManager and enabling the network service from the very start; still, there is one step which I was unable to skip. It is disabling the IPs of the eth0 interfaces and restarting the network service right before running `ovs-vsctl add-port br-eth0 eth0`, per the Neutron building instructions of the mentioned "Howto", which seems to be one of the best I've ever seen.
I (just) guess that due to this sequence of steps, even on an already built and seemingly healthy three-node Controller Cluster, the external network is still pingable :-
However, had I disabled the eth0 IPs from the start, I would have lost connectivity right away when switching from NetworkManager to the network service. In general, the external network is supposed to be pingable from the qrouter namespace due to the Neutron router's DNAT/SNAT iptables forwarding, but not from the Controller. I am also aware that when an Ethernet interface becomes an OVS port of an OVS bridge, its IP is supposed to be suppressed. When an external network provider is not used, br-ex gets an available IP on the external network. Using an external network provider changes the situation. Details may be seen here :-
https://www.linux.com/community/blogs/133-general-linux/858156-multiple-external-networks-with-a-single-l3-agent-testing-on-rdo-liberty-per-lars-kellogg-stedman
[root@hacontroller1 ~(keystone_admin)]# systemctl status NetworkManager
NetworkManager.service - Network Manager
Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled)
Active: inactive (dead)
[root@hacontroller1 ~(keystone_admin)]# systemctl status network
network.service - LSB: Bring up/down networking
Loaded: loaded (/etc/rc.d/init.d/network)
Active: active (exited) since Wed 2015-11-18 08:36:53 MSK; 2h 10min ago
Process: 708 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=0/SUCCESS)
Nov 18 08:36:47 hacontroller1.example.com network[708]: Bringing up loopback interface: [ OK ]
Nov 18 08:36:51 hacontroller1.example.com network[708]: Bringing up interface eth0: [ OK ]
Nov 18 08:36:53 hacontroller1.example.com network[708]: Bringing up interface eth1: [ OK ]
Nov 18 08:36:53 hacontroller1.example.com systemd[1]: Started LSB: Bring up/down networking.
[root@hacontroller1 ~(keystone_admin)]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::5054:ff:fe6d:926a prefixlen 64 scopeid 0x20<link>
ether 52:54:00:6d:92:6a txqueuelen 1000 (Ethernet)
RX packets 5036 bytes 730778 (713.6 KiB)
RX errors 0 dropped 12 overruns 0 frame 0
TX packets 15715 bytes 930045 (908.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.169.142.221 netmask 255.255.255.0 broadcast 192.169.142.255
inet6 fe80::5054:ff:fe5e:9644 prefixlen 64 scopeid 0x20<link>
ether 52:54:00:5e:96:44 txqueuelen 1000 (Ethernet)
RX packets 1828396 bytes 283908183 (270.7 MiB)
RX errors 0 dropped 13 overruns 0 frame 0
TX packets 1839312 bytes 282429736 (269.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 0 (Local Loopback)
RX packets 869067 bytes 69567890 (66.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 869067 bytes 69567890 (66.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@hacontroller1 ~(keystone_admin)]# ping -c 3 10.10.10.1
PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=2.04 ms
64 bytes from 10.10.10.1: icmp_seq=2 ttl=64 time=0.103 ms
64 bytes from 10.10.10.1: icmp_seq=3 ttl=64 time=0.118 ms
--- 10.10.10.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.103/0.754/2.043/0.911 ms
Both the management and external networks are emulated by corresponding Libvirt networks
on the F23 Virtualization Server. In total, four VMs were set up: 3 of them for Controller
nodes and one for Compute (4 VCPUS, 4 GB RAM).
[root@fedora23wks ~]# cat openstackvms.xml ( for eth1's)
<network>
<name>openstackvms</name>
<uuid>d0e9964a-f91a-40c0-b769-a609aee41bf2</uuid>
<forward mode='nat'>
<nat>
<port start='1024' end='65535'/>
</nat>
</forward>
<bridge name='virbr1' stp='on' delay='0' />
<mac address='52:54:00:60:f8:6d'/>
<ip address='192.169.142.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.169.142.2' end='192.169.142.254' />
</dhcp>
</ip>
</network>
[root@fedora23wks ~]# cat public.xml ( for external network provider )
<network>
<name>public</name>
<uuid>d0e9965b-f92c-40c1-b749-b609aed42cf2</uuid>
<forward mode='nat'>
<nat>
<port start='1024' end='65535'/>
</nat>
</forward>
<bridge name='virbr2' stp='on' delay='0' />
<mac address='52:54:00:60:f8:6d'/>
<ip address='10.10.10.1' netmask='255.255.255.0'>
<dhcp>
<range start='10.10.10.2' end='10.10.10.254' />
</dhcp>
</ip>
</network>
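For reference, a minimal sketch of how these two network definitions would be loaded and started on the virtualization host (standard virsh commands; the network names match the XML above):
virsh net-define openstackvms.xml
virsh net-start openstackvms
virsh net-autostart openstackvms
virsh net-define public.xml
virsh net-start public
virsh net-autostart public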
Only one file is a bit different on the Controller Nodes, and it is l3_agent.ini
[root@hacontroller1 neutron(keystone_demo)]# cat l3_agent.ini | grep -v ^# | grep -v ^$
[DEFAULT]
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
handle_internal_only_routers = True
send_arp_for_ha = 3
metadata_ip = controller-vip.example.com
external_network_bridge =
gateway_external_network_id =
[AGENT]
When "external_network_bridge = " , Neutron places the external interface of the router into the OVS bridge specified by the "provider_network" provider attribute in the Neutron network. Traffic is processed by Open vSwitch flow rules. In this configuration it is possible to utilize flat and VLAN provider networks.
*************************************************************************************
Due to posted "UPDATE" on the top of the blog entry in meantime
perfect solution is provided by https://github.com/beekhof/osp-ha-deploy/commit/b2e01e86ca93cfad9ad01d533b386b4c9607c60d
Per mentioned patch, assuming eth0 is your interface attached to the external network, create two files in /etc/sysconfig/network-scripts/ as follows (change MTU if you need):
cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=br-eth0
BOOTPROTO=none
VLAN=yes
MTU="9000"
NM_CONTROLLED=no
EOF
cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-br-eth0
DEVICE=br-eth0
DEVICETYPE=ovs
OVSBOOTPROTO=none
TYPE=OVSBridge
ONBOOT=yes
BOOTPROTO=static
MTU="9000"
NM_CONTROLLED=no
EOF
Restart the network for the changes to take effect.
systemctl restart network
The commit was made on 11/14/2015, right after a discussion on the RDO mailing list.
Details may be seen here: Nova and Neutron work-flow && CLI for HAProxy/Keepalived 3 Node Controller RDO Liberty
*************************************************************************************
One more step which I did (not sure it really has to be done at this point in time): the IPs on the eth0 interfaces were disabled just before running `ovs-vsctl add-port br-eth0 eth0`, as sketched right after this list :-
1. Updated the ifcfg-eth0 files on all Controllers
2. `service network restart` on all Controllers
3. `ovs-vsctl add-port br-eth0 eth0` on all Controllers
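A minimal sketch of those three steps, assuming the static address was configured via IPADDR/NETMASK/PREFIX/GATEWAY lines in ifcfg-eth0 (adjust to your actual files), run on every Controller:
sed -i '/^IPADDR/d; /^NETMASK/d; /^PREFIX/d; /^GATEWAY/d' /etc/sysconfig/network-scripts/ifcfg-eth0
service network restart
ovs-vsctl add-port br-eth0 eth0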
*****************************************************************************************
Targeting just a POC (to get floating IPs accessible from the Fedora 23 Virtualization
host), the resulting Controllers Cluster setup is :-
*****************************************************************************************
I installed only
Keystone
**************************
UPDATE to official docs
**************************
[root@hacontroller1 ~(keystone_admin)]# cat keystonerc_admin
export OS_USERNAME=admin
export OS_TENANT_NAME=admin
export OS_PROJECT_NAME=admin
export OS_REGION_NAME=regionOne
export OS_PASSWORD=keystonetest
export OS_AUTH_URL=http://controller-vip.example.com:35357/v2.0/
export OS_SERVICE_ENDPOINT=http://controller-vip.example.com:35357/v2.0
export OS_SERVICE_TOKEN=$(cat /root/keystone_service_token)
export PS1='[\u@\h \W(keystone_admin)]\$ '
Glance
Neutron
Nova
Horizon
Due to running Galera synchronous multi-master replication between the Controllers, commands like :-
# su keystone -s /bin/sh -c "keystone-manage db_sync"
# su glance -s /bin/sh -c "glance-manage db_sync"
# neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head
# su nova -s /bin/sh -c "nova-manage db sync"
are supposed to be run just once, from Controller node 1 (for instance).
************************
Compute Node setup:-
*************************
Compute setup
**********************
On all nodes
**********************
[root@hacontroller1 neutron(keystone_demo)]# cat /etc/hosts
192.169.142.220 controller-vip.example.com controller-vip
192.169.142.221 hacontroller1.example.com hacontroller1
192.169.142.222 hacontroller2.example.com hacontroller2
192.169.142.223 hacontroller3.example.com hacontroller3
192.169.142.224 compute.example.con compute
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
[root@hacontroller1 ~(keystone_admin)]# cat /etc/neutron/neutron.conf | grep -v ^$| grep -v ^#
[DEFAULT]
bind_host = 192.169.142.22(X)
auth_strategy = keystone
notification_driver = neutron.openstack.common.notifier.rpc_notifier
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
service_plugins = router,lbaas
router_scheduler_driver = neutron.scheduler.l3_agent_scheduler.ChanceScheduler
dhcp_agents_per_network = 2
api_workers = 2
rpc_workers = 2
l3_ha = True
min_l3_agents_per_router = 2
max_l3_agents_per_router = 2
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
[keystone_authtoken]
auth_uri = http://controller-vip.example.com:5000/
identity_uri = http://127.0.0.1:5000
admin_tenant_name = %SERVICE_TENANT_NAME%
admin_user = %SERVICE_USER%
admin_password = %SERVICE_PASSWORD%
auth_plugin = password
auth_url = http://controller-vip.example.com:35357/
username = neutron
password = neutrontest
project_name = services
[database]
connection = mysql://neutron:neutrontest@controller-vip.example.com:3306/neutron
max_retries = -1
[nova]
nova_region_name = regionOne
project_domain_id = default
project_name = services
user_domain_id = default
password = novatest
username = compute
auth_url = http://controller-vip.example.com:35357/
auth_plugin = password
[oslo_concurrency]
[oslo_policy]
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
rabbit_hosts = hacontroller1,hacontroller2,hacontroller3
rabbit_ha_queues = true
[qos]
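Once neutron-server and its agents are running on all three controllers, the scheduling configured above (two DHCP agents per network, HA L3 routers on two agents) can be spot-checked, for example with:
openstack-service status neutron
neutron agent-list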
[root@hacontroller1 haproxy(keystone_demo)]# cat haproxy.cfg
global
daemon
stats socket /var/lib/haproxy/stats
defaults
mode tcp
maxconn 10000
timeout connect 5s
timeout client 30s
timeout server 30s
listen monitor
bind 192.169.142.220:9300
mode http
monitor-uri /status
stats enable
stats uri /admin
stats realm Haproxy\ Statistics
stats auth root:redhat
stats refresh 5s
frontend vip-db
bind 192.169.142.220:3306
timeout client 90m
default_backend db-vms-galera
backend db-vms-galera
option httpchk
stick-table type ip size 1000
stick on dst
timeout server 90m
server rhos8-node1 192.169.142.221:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
server rhos8-node2 192.169.142.222:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
server rhos8-node3 192.169.142.223:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
# Note the RabbitMQ entry is only needed for CloudForms compatibility
# and should be removed in the future
frontend vip-rabbitmq
option clitcpka
bind 192.169.142.220:5672
timeout client 900m
default_backend rabbitmq-vms
backend rabbitmq-vms
option srvtcpka
balance roundrobin
timeout server 900m
server rhos8-node1 192.169.142.221:5672 check inter 1s
server rhos8-node2 192.169.142.222:5672 check inter 1s
server rhos8-node3 192.169.142.223:5672 check inter 1s
frontend vip-keystone-admin
bind 192.169.142.220:35357
default_backend keystone-admin-vms
timeout client 600s
backend keystone-admin-vms
balance roundrobin
timeout server 600s
server rhos8-node1 192.169.142.221:35357 check inter 1s on-marked-down shutdown-sessions
server rhos8-node2 192.169.142.222:35357 check inter 1s on-marked-down shutdown-sessions
server rhos8-node3 192.169.142.223:35357 check inter 1s on-marked-down shutdown-sessions
frontend vip-keystone-public
bind 192.169.142.220:5000
default_backend keystone-public-vms
timeout client 600s
backend keystone-public-vms
balance roundrobin
timeout server 600s
server rhos8-node1 192.169.142.221:5000 check inter 1s on-marked-down shutdown-sessions
server rhos8-node2 192.169.142.222:5000 check inter 1s on-marked-down shutdown-sessions
server rhos8-node3 192.169.142.223:5000 check inter 1s on-marked-down shutdown-sessions
frontend vip-glance-api
bind 192.169.142.220:9191
default_backend glance-api-vms
backend glance-api-vms
balance roundrobin
server rhos8-node1 192.169.142.221:9191 check inter 1s
server rhos8-node2 192.169.142.222:9191 check inter 1s
server rhos8-node3 192.169.142.223:9191 check inter 1s
frontend vip-glance-registry
bind 192.169.142.220:9292
default_backend glance-registry-vms
backend glance-registry-vms
balance roundrobin
server rhos8-node1 192.169.142.221:9292 check inter 1s
server rhos8-node2 192.169.142.222:9292 check inter 1s
server rhos8-node3 192.169.142.223:9292 check inter 1s
frontend vip-cinder
bind 192.169.142.220:8776
default_backend cinder-vms
backend cinder-vms
balance roundrobin
server rhos8-node1 192.169.142.221:8776 check inter 1s
server rhos8-node2 192.169.142.222:8776 check inter 1s
server rhos8-node3 192.169.142.223:8776 check inter 1s
frontend vip-swift
bind 192.169.142.220:8080
default_backend swift-vms
backend swift-vms
balance roundrobin
server rhos8-node1 192.169.142.221:8080 check inter 1s
server rhos8-node2 192.169.142.222:8080 check inter 1s
server rhos8-node3 192.169.142.223:8080 check inter 1s
frontend vip-neutron
bind 192.169.142.220:9696
default_backend neutron-vms
backend neutron-vms
balance roundrobin
server rhos8-node1 192.169.142.221:9696 check inter 1s
server rhos8-node2 192.169.142.222:9696 check inter 1s
server rhos8-node3 192.169.142.223:9696 check inter 1s
frontend vip-nova-vnc-novncproxy
bind 192.169.142.220:6080
default_backend nova-vnc-novncproxy-vms
backend nova-vnc-novncproxy-vms
balance roundrobin
timeout tunnel 1h
server rhos8-node1 192.169.142.221:6080 check inter 1s
server rhos8-node2 192.169.142.222:6080 check inter 1s
server rhos8-node3 192.169.142.223:6080 check inter 1s
frontend nova-metadata-vms
bind 192.169.142.220:8775
default_backend nova-metadata-vms
backend nova-metadata-vms
balance roundrobin
server rhos8-node1 192.169.142.221:8775 check inter 1s
server rhos8-node2 192.169.142.222:8775 check inter 1s
server rhos8-node3 192.169.142.223:8775 check inter 1s
frontend vip-nova-api
bind 192.169.142.220:8774
default_backend nova-api-vms
backend nova-api-vms
balance roundrobin
server rhos8-node1 192.169.142.221:8774 check inter 1s
server rhos8-node2 192.169.142.222:8774 check inter 1s
server rhos8-node3 192.169.142.223:8774 check inter 1s
frontend vip-horizon
bind 192.169.142.220:80
timeout client 180s
default_backend horizon-vms
backend horizon-vms
balance roundrobin
timeout server 180s
mode http
cookie SERVERID insert indirect nocache
server rhos8-node1 192.169.142.221:80 check inter 1s cookie rhos8-horizon1 on-marked-down shutdown-sessions
server rhos8-node2 192.169.142.222:80 check inter 1s cookie rhos8-horizon2 on-marked-down shutdown-sessions
server rhos8-node3 192.169.142.223:80 check inter 1s cookie rhos8-horizon3 on-marked-down shutdown-sessions
frontend vip-heat-cfn
bind 192.169.142.220:8000
default_backend heat-cfn-vms
backend heat-cfn-vms
balance roundrobin
server rhos8-node1 192.169.142.221:8000 check inter 1s
server rhos8-node2 192.169.142.222:8000 check inter 1s
server rhos8-node3 192.169.142.223:8000 check inter 1s
frontend vip-heat-cloudw
bind 192.169.142.220:8003
default_backend heat-cloudw-vms
backend heat-cloudw-vms
balance roundrobin
server rhos8-node1 192.169.142.221:8003 check inter 1s
server rhos8-node2 192.169.142.222:8003 check inter 1s
server rhos8-node3 192.169.142.223:8003 check inter 1s
frontend vip-heat-srv
bind 192.169.142.220:8004
default_backend heat-srv-vms
backend heat-srv-vms
balance roundrobin
server rhos8-node1 192.169.142.221:8004 check inter 1s
server rhos8-node2 192.169.142.222:8004 check inter 1s
server rhos8-node3 192.169.142.223:8004 check inter 1s
frontend vip-ceilometer
bind 192.169.142.220:8777
timeout client 90s
default_backend ceilometer-vms
backend ceilometer-vms
balance roundrobin
timeout server 90s
server rhos8-node1 192.169.142.221:8777 check inter 1s
server rhos8-node2 192.169.142.222:8777 check inter 1s
server rhos8-node3 192.169.142.223:8777 check inter 1s
frontend vip-sahara
bind 192.169.142.220:8386
default_backend sahara-vms
backend sahara-vms
balance roundrobin
server rhos8-node1 192.169.142.221:8386 check inter 1s
server rhos8-node2 192.169.142.222:8386 check inter 1s
server rhos8-node3 192.169.142.223:8386 check inter 1s
frontend vip-trove
bind 192.169.142.220:8779
default_backend trove-vms
backend trove-vms
balance roundrobin
server rhos8-node1 192.169.142.221:8779 check inter 1s
server rhos8-node2 192.169.142.222:8779 check inter 1s
server rhos8-node3 192.169.142.223:8779 check inter 1s
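With the haproxy.cfg above, the monitor and stats endpoints give a quick health view of the VIP and all backends (credentials as in the stats auth line):
curl http://192.169.142.220:9300/status
curl -u root:redhat http://192.169.142.220:9300/admin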
[root@hacontroller1 ~(keystone_demo)]# cat /etc/my.cnf.d/galera.cnf
[mysqld]
skip-name-resolve=1
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
max_connections=8192
query_cache_size=0
query_cache_type=0
bind_address=192.169.142.22(X)
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name="galera_cluster"
wsrep_cluster_address="gcomm://192.169.142.221,192.169.142.222,192.169.142.223"
wsrep_slave_threads=1
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_notify_cmd=
wsrep_sst_method=rsync
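With this galera.cnf, the sync state of the assembled cluster can be verified on any controller, e.g. (assuming root access to MariaDB via the local socket):
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"
A healthy 3-node cluster reports wsrep_cluster_size 3 and wsrep_local_state_comment Synced.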
[root@hacontroller1 ~(keystone_demo)]# cat /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
}
vrrp_instance VI_PUBLIC {
interface eth1
state BACKUP
virtual_router_id 52
priority 101
virtual_ipaddress {
192.169.142.220 dev eth1
}
track_script {
chk_haproxy
}
# Avoid failback
nopreempt
}
vrrp_sync_group VG1
group {
VI_PUBLIC
}
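To see which controller currently holds the VIP 192.169.142.220 defined above, check eth1 on each node, e.g.:
ip addr show eth1 | grep 192.169.142.220
systemctl status keepalived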
*************************************************************************
The most difficult procedure is re-syncing the Galera MariaDB cluster
*************************************************************************
https://github.com/beekhof/osp-ha-deploy/blob/master/keepalived/galera-bootstrap.md
Because the Nova services start without waiting for the Galera databases to get in sync, after the sync is done, and regardless of systemctl reporting that the services are up and running, a database update via `openstack-service restart nova` is required on every Controller. Also, the most likely reason for VMs failing to reach the Nova metadata server at boot is a failure of the neutron-l3-agent service on a Controller: by classical design, VMs access metadata via neutron-ns-metadata-proxy running in the qrouter namespace. The neutron-l3-agents usually start with no problems and sometimes just need to be restarted.
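A sketch of that post-sync routine, run on each controller:
openstack-service restart nova
systemctl status neutron-l3-agent     # if it failed to start, restart it:
systemctl restart neutron-l3-agent
ip netns | grep qrouter               # the qrouter namespace must exist for metadata access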
*****************************************
Creating Neutron Router via CLI.
*****************************************
[root@hacontroller1 ~(keystone_admin)]# cat keystonerc_admin
export OS_USERNAME=admin
export OS_TENANT_NAME=admin
export OS_PROJECT_NAME=admin
export OS_REGION_NAME=regionOne
export OS_PASSWORD=keystonetest
export OS_AUTH_URL=http://controller-vip.example.com:35357/v2.0/
export OS_SERVICE_ENDPOINT=http://controller-vip.example.com:35357/v2.0
export OS_SERVICE_TOKEN=$(cat /root/keystone_service_token)
export PS1='[\u@\h \W(keystone_admin)]\$ '
[root@hacontroller1 ~(keystone_admin)]# keystone tenant-list
+----------------------------------+----------+---------+
| id | name | enabled |
+----------------------------------+----------+---------+
| acdc927b53bd43ae9a7ed657d1309884 | admin | True |
| 7db0aa013d60434996585c4ee359f512 | demo | True |
| 9d8bf126d54e4d11a109bd009f54a87f | services | True |
+----------------------------------+----------+---------+
[root@hacontroller1 ~(keystone_admin)]# neutron router-create --ha True --tenant-id 7db0aa013d60434996585c4ee359f512 RouterDS
Created a new router:
+-----------------------+--------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------+
| admin_state_up | True |
| distributed | False |
| external_gateway_info | |
| ha | True |
| id | fdf540d2-c128-4677-b403-d71c796d7e18 |
| name | RouterDS |
| routes | |
| status | ACTIVE |
| tenant_id | 7db0aa013d60434996585c4ee359f512 |
+-----------------------+--------------------------------------+
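To make the router usable, its gateway would then be set to the external network and the HA scheduling verified, roughly as follows (the external network name "ext-net" is an assumption):
neutron router-gateway-set RouterDS ext-net
neutron l3-agent-list-hosting-router RouterDS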
Runtime snapshots. Keepalived status on the Controller nodes.
HA Neutron router belonging to tenant demo, created via the Neutron CLI.
***********************************************************************
At this point hacontroller1 goes down. On hacontroller2 run :-
***********************************************************************
[root@hacontroller2 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterHA
+--------------------------------------+---------------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+---------------------------+----------------+-------+----------+
| a03409d2-fbe9-492c-a954-e1bdf7627491 | hacontroller2.example.com | True | :-) | active |
| 0d6e658a-e796-4cff-962f-06e455fce02f | hacontroller1.example.com | True | xxx | active |
+--------------------------------------+---------------------------+----------------+-------+----------+
***********************************************************************
At this point hacontroller2 goes down. hacontroller1 goes up :-
***********************************************************************
Nova Services status on all Controllers
Neutron Services status on all Controllers
Compute Node status
******************************************************************************
Cloud VM (L3) at runtime . Accessibility from F23 Virtualization Host,
running HA 3 Nodes Controller and Compute Node VMs (L2)
******************************************************************************
[root@fedora23wks ~]# ping 10.10.10.103
PING 10.10.10.103 (10.10.10.103) 56(84) bytes of data.
64 bytes from 10.10.10.103: icmp_seq=1 ttl=63 time=1.14 ms
64 bytes from 10.10.10.103: icmp_seq=2 ttl=63 time=0.813 ms
64 bytes from 10.10.10.103: icmp_seq=3 ttl=63 time=0.636 ms
64 bytes from 10.10.10.103: icmp_seq=4 ttl=63 time=0.778 ms
64 bytes from 10.10.10.103: icmp_seq=5 ttl=63 time=0.493 ms
^C
--- 10.10.10.103 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 0.493/0.773/1.146/0.218 ms
[root@fedora23wks ~]# ssh -i oskey1.priv fedora@10.10.10.103
Last login: Tue Nov 17 09:02:30 2015
[fedora@vf23dev ~]$ uname -a
Linux vf23dev.novalocal 4.2.5-300.fc23.x86_64 #1 SMP Tue Oct 27 04:29:56 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
********************************************************************************
Verifying the Neutron workflow on the 3-node controller cluster built via the patch :-
********************************************************************************
[root@hacontroller1 ~(keystone_admin)]# ovs-ofctl show br-eth0
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000baf0db1a854f
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(eth0): addr:52:54:00:aa:0e:fc
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(phy-br-eth0): addr:46:c0:e0:30:72:92
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-eth0): addr:ba:f0:db:1a:85:4f
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
[root@hacontroller1 ~(keystone_admin)]# ovs-ofctl dump-flows br-eth0
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=15577.057s, table=0, n_packets=50441, n_bytes=3262529, idle_age=2, priority=4,in_port=2,dl_vlan=3 actions=strip_vlan,NORMAL
cookie=0x0, duration=15765.938s, table=0, n_packets=31225, n_bytes=1751795, idle_age=0, priority=2,in_port=2 actions=drop
cookie=0x0, duration=15765.974s, table=0, n_packets=39982, n_bytes=42838752, idle_age=1, priority=0 actions=NORMAL
Check `ovs-vsctl show`
Bridge br-int
fail_mode: secure
Port "tapc8488877-45"
tag: 4
Interface "tapc8488877-45"
type: internal
Port br-int
Interface br-int
type: internal
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
Port "tap14aa6eeb-70"
tag: 2
Interface "tap14aa6eeb-70"
type: internal
Port "qr-8f5b3f4a-45"
tag: 2
Interface "qr-8f5b3f4a-45"
type: internal
Port "int-br-eth0"
Interface "int-br-eth0"
type: patch
options: {peer="phy-br-eth0"}
Port "qg-34893aa0-17"
tag: 3
[root@hacontroller2 ~(keystone_demo)]# ovs-ofctl show br-eth0
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000b6bfa2bafd45
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(eth0): addr:52:54:00:73:df:29
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(phy-br-eth0): addr:be:89:61:87:56:20
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-eth0): addr:b6:bf:a2:ba:fd:45
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
[root@hacontroller2 ~(keystone_demo)]# ovs-ofctl dump-flows br-eth0
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=15810.746s, table=0, n_packets=0, n_bytes=0, idle_age=15810, priority=4,in_port=2,dl_vlan=2 actions=strip_vlan,NORMAL
cookie=0x0, duration=16105.662s, table=0, n_packets=31849, n_bytes=1786827, idle_age=0, priority=2,in_port=2 actions=drop
cookie=0x0, duration=16105.696s, table=0, n_packets=39762, n_bytes=2100763, idle_age=0, priority=0 actions=NORMAL
Check `ovs-vsctl show`
Bridge br-int
fail_mode: secure
Port "qg-34893aa0-17"
tag: 2
Interface "qg-34893aa0-17"
type: internal
The qrouter namespace's outgoing interface qg-xxxxxx sends VLAN-tagged packets to eth0 (which has VLAN=yes, see the link below), but the OVS bridge br-eth0 is not aware of the VLAN tagging; it strips the tags before sending packets out to the external flat network. In the case of external network providers, the qg-xxxxxx interfaces are on br-int, and that is normal. I believe this is the core reason why the patch https://github.com/beekhof/osp-ha-deploy/commit/b2e01e86ca93cfad9ad01d533b386b4c9607c60d
works pretty stably. This issue doesn't show up on a single controller and appears
to be critical for a HAProxy/Keepalived 3-node controller cluster, at least in my
experience.
Per Lars Kellogg-Stedman [ 1 ]:
- The packet exits the qg-... interface of the router (where it is assigned the VLAN tag associated with the external network). (N)
- The packet is delivered to the external bridge, where a flow rule strips the VLAN tag. (P)
- The packet is sent out the physical interface associated with the bridge.
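The local VLAN tag assigned to the qg- port and the corresponding strip_vlan flow on br-eth0 can be inspected directly, e.g.:
ovs-vsctl get Port qg-34893aa0-17 tag
ovs-ofctl dump-flows br-eth0 | grep strip_vlan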
References
1. http://blog.oddbit.com/2015/08/13/provider-external-networks-details/
2. https://ask.openstack.org/en/question/85055/how-does-external-network-provider-work-flatvlangre/