Istio: Multi-Cluster Federation and hybrid cloud
Your business is successful and you need to go global. How do you scale your app across multiple regions? How do you handle deployments to multiple clusters? How do you provide a great user experience with low latency and resilience while keeping costs low? One way is to use Kubernetes Federation. You deploy multiple clusters, join them in a federation and sync the API resources. Federation, however, is still immature (many of its features are alpha) and is not recommended for production use. If you go up another layer, you can manage the multi-cluster control plane with a service mesh like Istio.
What is the Kubernetes Federation?
Kubernetes Federation is an open-source project that focuses on making it easy to manage multiple clusters. It does so by providing two major building blocks:
- Sync resources across clusters: Federation provides the ability to keep resources in multiple clusters in sync. For example, you can ensure that the same app deployment exists in multiple clusters.
- Cross-cluster discovery: Federation provides the ability to auto-configure DNS servers and load balancers with backends from all clusters.
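To make the first building block more concrete, here is a minimal Python sketch of what "keeping resources in sync" boils down to: compare a desired state with what each cluster actually runs, then create or update whatever drifted. It is a toy model, not the Federation API. The `Cluster` class, the `sync` function and the deployment specs are all hypothetical names I made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a member cluster: deployment name -> spec."""
    name: str
    deployments: dict = field(default_factory=dict)


def sync(desired: dict, clusters: list) -> list:
    """Bring every cluster in line with the desired state.

    Returns a list of (cluster, deployment, action) tuples. Resources that
    exist in a cluster but not in `desired` are left alone in this sketch.
    """
    actions = []
    for cluster in clusters:
        for name, spec in desired.items():
            current = cluster.deployments.get(name)
            if current is None:
                cluster.deployments[name] = dict(spec)
                actions.append((cluster.name, name, "create"))
            elif current != spec:
                cluster.deployments[name] = dict(spec)
                actions.append((cluster.name, name, "update"))
    return actions


# Hypothetical desired state and three member clusters.
desired = {"web": {"image": "web:1.2", "replicas": 3}}
clusters = [
    Cluster("sf", {"web": {"image": "web:1.1", "replicas": 3}}),  # outdated
    Cluster("ny"),                                                # missing
    Cluster("berlin", {"web": {"image": "web:1.2", "replicas": 3}}),
]

first_pass = sync(desired, clusters)   # fixes sf and ny
second_pass = sync(desired, clusters)  # nothing left to do
print(first_pass)
print(second_pass)
```

The second pass returning an empty list is the important property: a sync loop like this has to be safe to run over and over again.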
Some other use cases that federation enables are:
- High availability: By spreading the load across clusters and auto-configuring DNS servers and load balancers, federation minimises the impact of cluster failure.
- Avoiding provider lock-in: By making it easier to migrate applications across clusters, federation prevents cluster provider lock-in.
The architecture of a Kubernetes Federation with clusters in SF, NY and Berlin. Image from CoreOS: https://coreos.com/blog/kubernetes-cluster-federation.html
Federation is not helpful unless you have multiple clusters. Some of the reasons why you might want multiple clusters are:
- Low latency: Having clusters in multiple regions minimises latency by serving users from the cluster that is closest to them.
- Fault isolation: It might be better to have multiple small clusters rather than a single large cluster for fault isolation (for example: multiple clusters in different availability zones of a cloud provider).
- Scalability: There are scalability limits to a single Kubernetes cluster (about 5,000 nodes per cluster the last time I checked).
- Hybrid cloud: You can have multiple clusters on different cloud providers or on-premises data centres.
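The low-latency and fault-isolation points can be illustrated with a few lines of Python: send each user to the closest cluster that is still healthy. This is only a sketch of the idea. The `pick_cluster` function and the latency numbers are hypothetical, and in a real setup DNS or a global load balancer makes this decision.

```python
def pick_cluster(latency_ms: dict, healthy: set) -> str:
    """Return the healthy cluster with the lowest latency to the user."""
    candidates = {name: ms for name, ms in latency_ms.items() if name in healthy}
    if not candidates:
        raise RuntimeError("no healthy cluster available")
    return min(candidates, key=candidates.get)


# Hypothetical round-trip times for a user sitting somewhere in Europe.
latency_ms = {"sf": 150, "ny": 90, "berlin": 10}

# All clusters are up: the user is served from Berlin.
print(pick_cluster(latency_ms, {"sf", "ny", "berlin"}))

# Berlin fails: the user is still served, just from further away.
print(pick_cluster(latency_ms, {"sf", "ny"}))
```

With a single large cluster the second case would simply be an outage.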
While there are a lot of attractive use cases for the federation, there are also some caveats:
- Increased network bandwidth and cost: The federation control plane watches all clusters to ensure that the current state is as expected. This can lead to significant network cost if the clusters are running in different regions on a cloud provider or on different cloud providers.
- Reduced cross-cluster isolation: A bug in the Federation control plane can impact all clusters. This is mitigated by keeping the logic in the Federation control plane to a minimum. It mostly delegates to the control plane in Kubernetes clusters whenever it can. The design and implementation also err on the side of safety and avoiding multi-cluster outages.
- Maturity: The Federation project is relatively new and is not very mature. Not all resources are available and many are still alpha.
Hybrid cloud capabilities
Federations of Kubernetes Clusters can include clusters running on different cloud providers (e.g. Google Cloud, AWS), and on-premises (e.g. on OpenStack). Kubefed is the recommended way to deploy federated clusters.
Thereafter, your API resources can span different clusters and cloud providers.
Should I go with Kubernetes Federation?
Kubernetes Federation is currently considered alpha for many of its features, and there is no clear path to evolve the API to GA. I would not recommend using Kubernetes Federation for your production systems. Ingresses typically don’t work even when you are using a simple federation of k8s clusters from one public provider. Managing multiple ingresses in a hybrid cloud could be an awful pain.
Federation uses public DNS and IP addresses with an external LoadBalancer for cross-cluster service discovery, which is usually quite an expensive option. I couldn’t find a way to make it work on a private network, as one cluster does not see the other cluster’s k8s services, only its pods. Moreover, development of the Kubernetes Federation project seems rather stale. 143 stars on GitHub? Seriously?
Multi-cluster federation with Istio
Do you want the same features as Kubernetes Federation with a more stable and mature solution? Check out Istio’s multi-cluster support.
Multi-cluster support works by letting Kubernetes clusters that run a remote configuration connect to a single Istio control plane. Once one or more remote Kubernetes clusters are connected to it, their Envoy proxies communicate with that single Istio control plane and form a mesh network across multiple Kubernetes clusters.
This guide describes how to install a multi-cluster Istio topology using the manifests and Helm charts provided within the Istio repository.
I made a GitHub repo for easy provisioning of the whole system on GCP based on the previously mentioned guide.
We will deploy the Bookinfo application to two GKE clusters. All the services will run in one cluster; only Reviews-3 (version 3 of the reviews service) will run in the other. We leverage GKE’s alias IPs feature, which lets pods in one cluster communicate with pods in the other cluster using just private IPs on a private network.
Product page requests will be load balanced across all versions of the reviews service, even though one of them runs in a different cluster, which could be in a different zone, region, continent…
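Here is a rough Python simulation of what that load balancing means for cross-cluster traffic. I am assuming plain round-robin and exactly one pod per reviews version, which is a simplification. The endpoint and cluster names are illustrative, not taken from the actual deployment.

```python
from collections import Counter
from itertools import cycle

# Assumed layout: two reviews versions in the first cluster, one in the second.
endpoints = [
    ("reviews-v1", "cluster-1"),
    ("reviews-v2", "cluster-1"),
    ("reviews-v3", "cluster-2"),
]


def simulate(requests: int) -> Counter:
    """Distribute requests over the endpoints in simple round-robin order."""
    rr = cycle(endpoints)
    return Counter(next(rr) for _ in range(requests))


hits = simulate(300)

# Requests from the product page (cluster-1) that have to leave the cluster.
cross_cluster = sum(n for (_, cluster), n in hits.items() if cluster != "cluster-1")

print(hits)
print(cross_cluster)
```

Under these assumptions a third of the reviews calls cross the cluster boundary, which is something to keep in mind when the clusters are far apart.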
As mentioned above, you typically don’t want your services to communicate cross-cluster to different zones/regions as it usually causes higher latency and network bandwidth fees. A typical use case would be if you had a central cluster close to your HQ - say in Frankfurt - and you had customers not only in Europe but in Brazil as well. You could deploy a smaller cluster to Brazil for public-facing frontend APIs and some subset of services and the rest (like payment-gateway APIs, databases, …) will run in Frankfurt only to save some costs.
You can autoscale the services in each cluster independently, depending on a local cluster’s traffic needs - there is no need for overprovisioning.
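As a quick illustration, the Kubernetes Horizontal Pod Autoscaler computes its target as ceil(currentReplicas * currentMetric / targetMetric). The sketch below applies that formula to two clusters separately. The replica counts and CPU figures are hypothetical, and I am ignoring the autoscaler's tolerance and min/max bounds.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA formula, without tolerance or min/max replica limits."""
    return math.ceil(current_replicas * current_metric / target_metric)


# Hypothetical numbers: (current replicas, current CPU %, target CPU %).
frankfurt = desired_replicas(10, 90, 60)  # busy cluster scales up
brazil = desired_replicas(2, 30, 60)      # quiet cluster scales down

print(frankfurt, brazil)
```

Each cluster reacts only to its own load, so a traffic spike in Frankfurt does not force Brazil to run extra pods.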
One disadvantage of this setup is that Istio’s ingress-gateway is deployed as a LoadBalancer only in the master cluster. That means all traffic is proxied through the master cluster: even if your client is in Brazil, their request goes to Frankfurt and back to Brazil. You could possibly avoid this by deploying more Istio masters.
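To get a feeling for how much this detour costs, here is a back-of-the-envelope calculation in Python. The round-trip times are made-up placeholders, not measurements, and the helper names are hypothetical.

```python
# Hypothetical round-trip times in milliseconds between two sites.
RTT_MS = {
    frozenset(["brazil"]): 5,                # inside Brazil
    frozenset(["frankfurt"]): 5,             # inside Frankfurt
    frozenset(["brazil", "frankfurt"]): 200, # across the Atlantic
}


def rtt(a: str, b: str) -> int:
    """Round-trip time between two sites, independent of direction."""
    return RTT_MS[frozenset([a, b])]


def path_rtt(path: list) -> int:
    """Total round-trip time along client -> ingress -> service."""
    return sum(rtt(a, b) for a, b in zip(path, path[1:]))


# Client in Brazil, ingress in Frankfurt, frontend service in Brazil.
hairpin = path_rtt(["brazil", "frankfurt", "brazil"])

# Same client and service, but with an ingress in Brazil as well.
local = path_rtt(["brazil", "brazil", "brazil"])

print(hairpin, local)
```

Whatever the real numbers are, the detour pays the intercontinental round trip twice, while a local ingress avoids it completely.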
I will play with this a little bit more in the future. I’d like to use Google’s HTTPS load balancer with Istio’s ingress-gateway and have all the frontends deployed to all clusters.
Even nowadays with all the clouds, k8s and service meshes, multiple clusters are still hard. But it's 2018 and we can do better! Leveraging the advantages of having multi-cluster setups can benefit our business greatly. Kubernetes Federations might not be the perfect way to set up such an ecosystem, so take a look at Istio and see for yourself. It is definitely worth trying!