This posting is just a brief extract from https://docs.redhat.com/en/documentation/red_hat_openstack_platform/10/single/understanding_red_hat_openstack_platform_high_availability/index
What follows is also a comment on the posts of Zen@yandex.com authors who write enthusiastically about investing in cloud deployments, presenting the execution of virtual machines in the cloud as its major advantage. OpenStack's cloud fault-tolerance advantage over the traditional client-server Unix/Linux architecture seems to be completely ignored in those writings.
A three-node controller cluster is important in OpenStack because it provides high availability and fault tolerance for control plane services. With three nodes, the cluster can tolerate the failure of a single node without disrupting cloud functionality, according to Red Hat documentation.
This ensures that important services such as authentication, image storage, and networking remain operational even if one controller node experiences problems.
Here's a more detailed explanation:
High Availability (HA):
With three nodes, the cluster can maintain a quorum (majority vote) even if one node fails. This ensures that the remaining two nodes can continue to manage the cloud infrastructure.
Fault Tolerance:
If one controller node fails, the other two can take over its workload, preventing service disruption and data loss.
Redundancy:
The three nodes act as backups for each other, meaning if one node fails, another can immediately take over its responsibilities.
Quorum:
An OpenStack high availability setup relies on quorum (majority voting) to ensure that the cluster can make decisions and maintain consistency even if some nodes are down or unavailable. A three-node cluster is the smallest configuration that can lose a node and still hold a majority.
Production Environments:
Red Hat OpenStack Platform requires three controller nodes for a production-grade environment to ensure high availability and reliability. A three-node controller cluster is essential for a reliable and fault-tolerant OpenStack cloud, especially in production environments where downtime is unacceptable.
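The quorum arithmetic behind the points above can be sketched in a few lines. This is a minimal illustration of simple majority voting, not Pacemaker's actual implementation; the function names are my own:

```python
def has_quorum(total_nodes: int, alive_nodes: int) -> bool:
    """A cluster retains quorum while a strict majority of its
    configured nodes is still alive."""
    return alive_nodes > total_nodes // 2

def tolerated_failures(total_nodes: int) -> int:
    """Largest number of node failures that still leaves a majority."""
    return (total_nodes - 1) // 2

# A three-node controller cluster survives exactly one node failure:
assert has_quorum(3, 2)          # one node down: quorum held
assert not has_quorum(3, 1)      # two nodes down: quorum lost
assert tolerated_failures(3) == 1
# ...while a two-node cluster tolerates none, which is why two
# controllers do not give you high availability:
assert tolerated_failures(2) == 0
```

This also shows why even numbers of controllers add little: four nodes tolerate the same single failure as three.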
This, rather than the mere ability to run virtual machines, is the reason to invest seriously in cloud deployments. Running virtual machines on compute nodes has nothing to do with the fault tolerance of a cloud deployment.
Red Hat documentation is open source. The following is quoted from "Understanding Red Hat OpenStack Platform High Availability":
Most of the high availability (HA) coverage in this document pertains to controller nodes. The primary HA technologies used on Red Hat OpenStack Platform controller nodes are:
Pacemaker: By configuring virtual IP addresses, services, and other functions as resources in a cluster, Pacemaker ensures that a specific set of OpenStack cluster resources are up and available. When a service or an entire node in a cluster fails, Pacemaker can restart the service, remove the node from the cluster, or reboot the node. Requests to most of these services are made through HAProxy.
HAProxy: When you configure more than one controller node with the director in Red Hat OpenStack Platform, HAProxy is configured on those nodes to load balance traffic to some of the OpenStack services running on them.
Galera: Red Hat OpenStack Platform uses the MariaDB Galera cluster to manage database replication.
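Each of the three layers above can be inspected from a controller node. These commands are illustrative (the Galera one assumes local root access to MariaDB), not a prescribed procedure from the Red Hat document:

```shell
# Pacemaker: overall cluster membership and resource state
sudo pcs status

# HAProxy: which services are load balanced, and to which backends
less /etc/haproxy/haproxy.cfg

# Galera: how many nodes the database cluster currently sees
sudo mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
```

On a healthy three-controller deployment, wsrep_cluster_size should report 3.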
Highly available services in OpenStack operate in one of two modes:
Active/active: In this mode, Pacemaker starts the same service on multiple controller nodes; HAProxy then either distributes traffic among the nodes running the requested service or directs it to a specific controller via a single IP address. In some cases, HAProxy distributes traffic among active/active services on a round-robin basis. Performance can be improved by adding more controller nodes.
Active/passive: Services that are not capable or reliable enough to operate in active/active mode operate in active/passive mode. This means that only one instance of the service is active at a time. For Galera, HAProxy uses stick-table settings to ensure that incoming connections are directed to a single backend service. Galera master-master mode can become blocked when services access the same data from multiple Galera nodes at the same time.
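The stick-table behavior described for Galera can be illustrated with a minimal HAProxy listen block. This is a sketch, not the exact configuration the director generates; the IP addresses and server names are hypothetical. All incoming connections stick to a single backend, and the remaining Galera nodes serve only as backups:

```
listen mysql
    bind 172.16.0.10:3306
    # Pin every client to the same single backend, so the
    # multi-master-capable Galera cluster is used active/passive.
    stick-table type ip size 1000
    stick on dst
    timeout client 90m
    timeout server 90m
    server controller-0 172.16.0.11:3306 check inter 1s on-marked-down shutdown-sessions
    server controller-1 172.16.0.12:3306 backup check inter 1s on-marked-down shutdown-sessions
    server controller-2 172.16.0.13:3306 backup check inter 1s on-marked-down shutdown-sessions
```

If controller-0 fails its health check, HAProxy shuts down existing sessions and promotes a backup, which is the failover behavior the passage describes.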
As you begin exploring the high availability services described in this document, keep in mind that "the director system (called the undercloud) itself runs OpenStack. The purpose of the undercloud is to create and maintain the systems that will become your OpenStack running environment." The environment you create from the undercloud is called the overcloud. To get into your overcloud, this document asks you to log into your undercloud, then select which overcloud node you want to explore.