Friday 19 February 2021

10 Key Considerations for Kubernetes Cluster Design & Setup

Design is a critical part of setting up a Kubernetes cluster.

This post covers 10 high-level things to take into consideration when setting up a Kubernetes cluster.

1. Kubernetes Networking (Cloud, Hybrid, or On-Prem):

The Kubernetes network has to be designed so that it can accommodate future cluster and application requirements.

One common mistake organizations make is using CIDR ranges that are not part of the organization’s network. When they later want the clusters to join a hybrid network, they end up having to migrate.

It is better to consult the organization’s network team before finalizing the network design. This way, you can carve out and reserve an IP range even if the cluster is not yet part of the hybrid network.
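Before that conversation, a proposed range can be checked against the ranges already in use. The following is a minimal sketch using Python's standard ipaddress module. The function name and all CIDR values are made-up examples, not recommendations.

```python
import ipaddress

def find_overlaps(candidate, reserved):
    """Return the reserved CIDRs that overlap the candidate CIDR."""
    cand = ipaddress.ip_network(candidate)
    return [r for r in reserved if cand.overlaps(ipaddress.ip_network(r))]

# Hypothetical ranges already allocated in the organization's network.
corporate_ranges = ["10.0.0.0/16", "10.20.0.0/16", "172.16.0.0/20"]

# A pod range carved out of an allocated block collides with it.
print(find_overlaps("10.20.8.0/21", corporate_ranges))  # ['10.20.0.0/16']

# A range outside every allocated block is free to reserve.
print(find_overlaps("10.64.0.0/14", corporate_ranges))  # []
```

An empty result only means the range is free among the ranges you listed, so the list still has to come from the network team.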

Each cloud provider gives multiple options for Node and Pod networking. For example, GKE offers multi-cluster services and VPC-native clusters whose pod IPs are routable within the same VPC and its peered VPCs.

But if you don’t want to expose your pod IPs, you might need to use something like an IP masquerading agent in your cluster so that the outgoing traffic will always have the Node IP as the source identity.
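The decision such an agent makes can be sketched as a lookup against a list of destination ranges that are exempt from masquerading. This is only an illustration of the idea, not the agent's actual code. The function name and the addresses are assumptions.

```python
import ipaddress

def is_masqueraded(destination, non_masquerade_cidrs):
    """Return True if traffic to the destination would leave with the node IP.

    Destinations inside an exempt range keep the pod IP as the source;
    everything else is rewritten to the node IP.
    """
    dest = ipaddress.ip_address(destination)
    return not any(
        dest in ipaddress.ip_network(cidr) for cidr in non_masquerade_cidrs
    )

# Hypothetical exempt range: traffic that stays inside the cluster network.
exempt = ["10.0.0.0/8"]

print(is_masqueraded("10.8.3.4", exempt))      # False: pod IP is preserved
print(is_masqueraded("203.0.113.10", exempt))  # True: node IP is used
```

Keeping the exempt list short means more outgoing traffic carries the Node IP, which is the behavior described above.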

Ingress and egress traffic design is also essential. There could be API gateways, on-prem systems, and third-party APIs that the cluster apps need to connect to.

Your design should include all the access requirements so that you won’t face any access restrictions during implementation.

2. Kubernetes Security, Compliance & Benchmarks

The following are generic security considerations.

  1. Understand the compliance requirements and security benchmarks as per your organization’s policy. If you are using managed services, make sure they comply with the organization’s compliance policies.
  2. Will there be any apps handling PCI/PII data? If yes, segregate these apps in terms of access and storage.
  3. Implement the standard pod security policies (disabling container root access and privileged access, enforcing a read-only file system, etc.).
  4. Access the container registries securely.
  5. Use network policies to control pod-to-pod traffic and isolate apps as per access requirements.
  6. Build a well-designed pipeline to make sure only approved container images are deployed in the cluster. Container images should be scanned for vulnerabilities, and the CI process should fail if the image scan does not meet the security requirements.
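The pod-level checks from item 3 can also run as an early pipeline step. The sketch below assumes the pod spec has already been parsed into a Python dict. The field names (securityContext, privileged, runAsNonRoot, readOnlyRootFilesystem) are the standard container-level settings, while the function name and sample spec are hypothetical. It only inspects container-level settings, so treat it as a starting point rather than a complete policy.

```python
def security_violations(pod_spec):
    """List basic security problems found in a pod spec given as a dict."""
    problems = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        # Only the container-level securityContext is checked here.
        ctx = container.get("securityContext", {})
        if ctx.get("privileged", False):
            problems.append(f"{name}: privileged mode is enabled")
        if not ctx.get("runAsNonRoot", False):
            problems.append(f"{name}: runAsNonRoot is not set")
        if not ctx.get("readOnlyRootFilesystem", False):
            problems.append(f"{name}: root file system is writable")
    return problems

# Hypothetical spec: "web" is hardened, "debug" is not.
sample_spec = {
    "containers": [
        {
            "name": "web",
            "securityContext": {
                "runAsNonRoot": True,
                "readOnlyRootFilesystem": True,
            },
        },
        {"name": "debug", "securityContext": {"privileged": True}},
    ]
}

for problem in security_violations(sample_spec):
    print(problem)
```

A pipeline could fail the build whenever the returned list is not empty.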

3. Kubernetes Cluster Access

It is very important to design and document how the Kubernetes cluster is accessed.

The following are the key considerations.

  1. Restrict manual cluster-admin access. Instead, cluster-admin access should only be allowed through automation.
  2. Implement RBAC authorization.
  3. Allow Kubernetes API access via service accounts with limited privileges.
  4. Implement Open Policy Agent for fine-grained access controls.
  5. Consider options for OpenID Connect.
  6. Have a good audit mechanism for checking the roles and removing unused users, roles, service accounts, etc.
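As one small piece of such an audit, the sketch below flags human subjects bound to cluster-admin. It assumes the bindings are dicts shaped like the items returned by kubectl get clusterrolebindings -o json. The function name and the sample bindings are hypothetical.

```python
def human_cluster_admins(bindings):
    """Return names of User/Group subjects bound to the cluster-admin role."""
    found = set()
    for binding in bindings:
        if binding.get("roleRef", {}).get("name") != "cluster-admin":
            continue
        # "subjects" may be missing or null on a binding, so default to [].
        for subject in binding.get("subjects") or []:
            if subject.get("kind") in ("User", "Group"):
                found.add(subject["name"])
    return sorted(found)

# Hypothetical bindings for illustration.
sample_bindings = [
    {
        "metadata": {"name": "ops-admin"},
        "roleRef": {"name": "cluster-admin"},
        "subjects": [{"kind": "User", "name": "alice@example.com"}],
    },
    {
        "metadata": {"name": "ci-admin"},
        "roleRef": {"name": "cluster-admin"},
        "subjects": [
            {"kind": "ServiceAccount", "name": "deployer", "namespace": "ci"}
        ],
    },
    {
        "metadata": {"name": "viewers"},
        "roleRef": {"name": "view"},
        "subjects": [{"kind": "Group", "name": "dev-team"}],
    },
]

print(human_cluster_admins(sample_bindings))  # ['alice@example.com']
```

Service accounts are left out on purpose, since the first item in the list routes cluster-admin access through automation.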

Design the access levels so that you can hand off responsibilities to other teams using the cluster. It would save time for everyone, and you can focus more on the engineering part rather than working on repetitive tasks.

4. Kubernetes High Availability & Scaling

High availability is another key factor in a Kubernetes cluster.

Here you need to consider the worker node availability across different availability zones.

Also, consider Pod Topology Spread Constraints to spread pods across different availability zones.
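A spread constraint limits the skew, which is the gap between the zone with the most matching pods and the zone with the fewest. The sketch below computes that gap for a given placement so you can see what a maxSkew setting would tolerate. The function name and zone names are made up for the example.

```python
from collections import Counter

def zone_skew(pod_zones, all_zones):
    """Return the difference between the fullest and emptiest zone.

    pod_zones lists the zone of each running pod; all_zones lists every
    eligible zone, including zones that currently run no pods.
    """
    counts = Counter(pod_zones)
    per_zone = [counts.get(zone, 0) for zone in all_zones]
    return max(per_zone) - min(per_zone)

zones = ["zone-a", "zone-b", "zone-c"]

# Three pods in one zone and none in another: skew of 3.
print(zone_skew(["zone-a", "zone-a", "zone-a", "zone-b"], zones))  # 3

# Pods spread as evenly as four pods allow: skew of 1.
print(zone_skew(["zone-a", "zone-b", "zone-c", "zone-a"], zones))  # 1
```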

When we talk about scaling, it’s not just autoscaling of instances or pods.

It’s about how gracefully you can scale down and scale up the apps without any service interruptions.

Depending on the type of apps that need to be hosted on Kubernetes, you can design deployments to evict the pods gracefully during scale down and patching activities.
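One common mechanism for this is a PodDisruptionBudget, which caps how many pods of an app may be evicted voluntarily at once. The arithmetic behind a budget with an integer minAvailable can be sketched as follows. The function name and the numbers are assumptions for illustration, and percentage values are not handled.

```python
def allowed_disruptions(healthy_pods, min_available):
    """Return how many pods may be evicted without breaking the budget."""
    return max(0, healthy_pods - min_available)

# With 5 healthy replicas and at least 4 required, a node drain can
# evict only one pod of this app at a time.
print(allowed_disruptions(healthy_pods=5, min_available=4))  # 1

# If a replica is already unhealthy, evictions are blocked entirely.
print(allowed_disruptions(healthy_pods=4, min_available=4))  # 0
```

This is why the replica count and the budget need to be designed together: a budget equal to the replica count blocks node patching altogether.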

Also, consider chaos engineering experiments before production to check the cluster and app stability.

5. Kubernetes Ingress

Ingress is an essential component of Kubernetes clusters. There are many ways to set up a Kubernetes ingress.

Also, there are different types of ingress controllers.

Try out the options and pick the one that best suits your organization’s compliance policies and scaling requirements.

A few considerations:

  1. Have separate ingress controllers for the platform tools.
  2. Plan SSL management for ingress endpoints.
  3. Do not try to route all the apps through the same ingress. If the number of apps keeps growing, you could end up with one big configuration file that creates issues.

6. Kubernetes Backup & Restore Strategy

Whether it is a managed service or a custom Kubernetes implementation, it is essential to back up the cluster.

When we say backup, it primarily means backing up etcd.

You should have a very good design to automate the backup of the Kubernetes cluster and its associated components.

You also need a design to restore the cluster if required.

There are also options to take a dump of existing objects in JSON format. You can use the dump to restore the objects in the same or a different cluster.
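Objects dumped this way, for example with kubectl get ... -o json, carry fields that the API server filled in and that belong only to the original cluster. A minimal sketch of cleaning a dumped object before re-applying it elsewhere follows. The function name and the sample object are hypothetical, and the list of stripped fields is a reasonable starting set rather than an exhaustive one.

```python
import copy
import json

# Metadata fields populated by the API server, not by the author.
SERVER_FIELDS = (
    "uid",
    "resourceVersion",
    "creationTimestamp",
    "generation",
    "managedFields",
)

def portable(obj):
    """Return a copy of a dumped object without cluster-specific fields."""
    cleaned = copy.deepcopy(obj)
    cleaned.pop("status", None)
    metadata = cleaned.get("metadata", {})
    for field in SERVER_FIELDS:
        metadata.pop(field, None)
    return cleaned

# Hypothetical dumped ConfigMap.
dumped = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
        "name": "app-config",
        "namespace": "shop",
        "uid": "0000-example",
        "resourceVersion": "12345",
        "creationTimestamp": "2021-02-19T10:00:00Z",
    },
    "data": {"LOG_LEVEL": "info"},
}

print(json.dumps(portable(dumped), indent=2))
```

The original dump is left untouched, so the same file can still serve as a record of the source cluster.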

7. Kubernetes Node & Container Image Patching & Lifecycle Management

Patching is a recurring process.

When it comes to Kubernetes, there is both node and container patching.

Make sure you implement DevSecOps principles in your CI/CD pipelines.

Here are some key considerations:

  1. An automated pipeline integrated with container scanning tools to patch container images on a monthly schedule.
  2. An automated pipeline to perform node patching without downtime.
  3. An automated pipeline to manage the lifecycle of container images. You don’t want to keep many outdated versions in your registry.
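The lifecycle rule in item 3 can be as simple as keeping the newest few versions of each image. The sketch below assumes the registry can list tags with their push dates. The function name, tags, and dates are invented for the example, and a real pipeline would also need to skip tags that are still deployed.

```python
from datetime import date

def tags_to_delete(images, keep=3):
    """Return the tags that fall outside the newest `keep` images.

    images is a list of (tag, pushed_on) tuples.
    """
    newest_first = sorted(images, key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in newest_first[keep:]]

# Hypothetical registry contents for one image.
registry = [
    ("1.0.0", date(2020, 10, 1)),
    ("1.1.0", date(2020, 11, 5)),
    ("1.2.0", date(2020, 12, 9)),
    ("1.3.0", date(2021, 1, 12)),
    ("1.4.0", date(2021, 2, 2)),
]

print(tags_to_delete(registry, keep=3))  # ['1.1.0', '1.0.0']
```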

8. Kubernetes Cluster Upgrades

Generally, you can perform a cluster upgrade in two ways:

  1. Upgrade the existing cluster.
  2. Create a new cluster and migrate the apps to the new cluster.

You need a very good automated pipeline design to perform a cluster upgrade.

There could be networking, DNS, and other component changes during an upgrade. It all depends on the design and organizational policies.

9. Kubernetes Cluster Capacity & Storage

Cluster capacity is a very important topic of discussion.

You need to decide on the number of clusters you need to run.

Some organizations prefer running multiple clusters to reduce the blast radius and ease maintenance, while others prefer one big cluster with a large number of worker nodes, or fewer nodes with large instance capacity.

You can decide on the cluster capacity based on your needs and the size of the team to manage the clusters.
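A rough first estimate of node count can be worked out from pod resource requests. The sketch below is a back-of-the-envelope calculation under stated assumptions: every pod has the same requests, a fixed percentage of each node is held back as headroom, and scheduling overhead is ignored. The function name and all numbers are hypothetical.

```python
import math

def nodes_needed(pod_count, pod_cpu_m, pod_mem_mi,
                 node_cpu_m, node_mem_mi, headroom_pct=20):
    """Estimate nodes needed for identical pods (CPU in millicores, memory in Mi)."""
    usable_cpu = node_cpu_m * (100 - headroom_pct) // 100
    usable_mem = node_mem_mi * (100 - headroom_pct) // 100
    # The scarcer resource decides how many pods fit on one node.
    pods_per_node = min(usable_cpu // pod_cpu_m, usable_mem // pod_mem_mi)
    if pods_per_node < 1:
        raise ValueError("a single pod does not fit on this node size")
    return math.ceil(pod_count / pods_per_node)

# 200 pods requesting 250m CPU and 512Mi each.
print(nodes_needed(200, 250, 512, node_cpu_m=4000, node_mem_mi=16384))   # 17
print(nodes_needed(200, 250, 512, node_cpu_m=16000, node_mem_mi=65536))  # 4
```

The two results illustrate the trade-off described above: many small nodes versus a few large ones for the same workload.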

Next comes the storage part.

Plan how you want to attach volumes to containers. Follow all the standard storage security practices on Kubernetes.

When it comes to the cloud, there is out-of-the-box support for provisioning storage.

If you are planning to run StatefulSets, it is very important to design the storage for high throughput and maximum availability.

10. Kubernetes Logging & Monitoring

Most organizations have a centralized logging and monitoring system and prefer to integrate Kubernetes with these systems.

Here are the key considerations:

  1. How much log data will be generated.
  2. Mechanisms to ingest Kubernetes logs into the logging systems, given the huge data volume.
  3. Scaling logging and monitoring components deployed in the cluster.
  4. Data retention as per the organization’s policy.
  5. Define and document the KPIs for monitoring.
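The first and fourth items can be combined into a quick storage estimate. The sketch below is simple arithmetic with invented inputs: the pod count, log rate, line size, and retention period are all assumptions you would replace with measured values, and compression and indexing overhead are ignored.

```python
SECONDS_PER_DAY = 86400
BYTES_PER_GIB = 2 ** 30

def log_storage_gib(pods, lines_per_sec, bytes_per_line, retention_days):
    """Return (daily volume, retained volume) in GiB for the given log rate."""
    daily_bytes = pods * lines_per_sec * bytes_per_line * SECONDS_PER_DAY
    daily_gib = daily_bytes / BYTES_PER_GIB
    return daily_gib, daily_gib * retention_days

# Hypothetical cluster: 500 pods, 20 lines/s each, 200 bytes per line,
# kept for 30 days.
daily, retained = log_storage_gib(500, 20, 200, retention_days=30)
print(f"daily: {daily:.0f} GiB, retained: {retained:.0f} GiB")
```

Even modest per-pod log rates add up quickly, which is why the ingestion mechanism and retention policy need to be decided at design time.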

Conclusion

These are some of the key considerations that often get missed while setting up a Kubernetes cluster.

Missing these aspects while implementing Kubernetes could lead to issues across the cluster and might force compromises on the business.

Ideally, the solution/technical architect should keep all the items mentioned here as a checklist (there could be many more worth considering) while designing the cluster architecture, to make sure they are implemented during IaC development.