Friday 19 February 2021

10 Key Considerations for Kubernetes Cluster Design & Setup

Design is a very important aspect when it comes to Kubernetes cluster setup.

This blog will talk about the 10 high-level things that have to be taken into consideration when setting up a Kubernetes cluster.

1. Kubernetes Networking (Cloud, Hybrid, or On-Prem):

Kubernetes network has to be designed in such a way that it can accommodate future cluster and application requirements.

One common mistake organizations make is using CIDR ranges that are not part of the organization’s network. Later, when they want the clusters to join a hybrid network, they end up having to migrate.

It is better to discuss with the organization’s network team before finalizing the network design. This way, you can carve out and reserve an IP range even if you are not part of the hybrid network.

Each cloud provider gives multiple options for Node and Pod networking. For example, GKE offers multi-cluster services and VPC-native clusters with pod IPs routable from the same VPC and from peered VPCs.

But if you don’t want to expose your pod IPs, you might need to use something like an IP masquerading agent in your cluster so that outgoing traffic always has the node IP as its source address.
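
If you go that route, here is a minimal sketch of the agent's configuration, assuming the standard ip-masq-agent DaemonSet is already running in kube-system; the CIDR value is a placeholder for your organization's ranges:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: ip-masq-agent           # the name the agent's DaemonSet mounts by default
  namespace: kube-system
data:
  config: |
    nonMasqueradeCIDRs:
      - 10.0.0.0/8              # placeholder: ranges that should still see real pod IPs
    masqLinkLocal: false
    resyncInterval: 60s
EOF

Traffic to every other destination is then masqueraded behind the node IP.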

Ingress and egress traffic design is also essential. There could be API gateways, on-prem systems, and third-party APIs that the cluster apps need to connect to.

Your design should include all the access requirements so that you won’t face any access restrictions during implementation.

2. Kubernetes Security, Compliance & Benchmarks

Following are the generic security considerations.

  1. Understand the compliance requirements and security benchmarks as per your organization’s policy. If you are using managed services, make sure it complies with the organization’s compliance policies.
  2. Will there be any apps handling PCI/PII data? If yes, segregate these apps in terms of access and storage.
  3. Implement the standard pod security policies (disabling container root access and privileged access, enforcing a read-only root filesystem, etc.); see the pod-level sketch after this list.
  4. Access the container registries securely.
  5. Use network policies to control pod-to-pod traffic and isolate apps as per access requirements.
  6. Build a well-designed pipeline to make sure only approved container images are deployed in the cluster. Containers should be scanned for vulnerabilities, and the CI process should fail if an image scan does not meet the security requirements.
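
For item 3, a minimal pod-level sketch of those restrictions (the image and names are placeholders; a PodSecurityPolicy or admission controller would enforce the same rules cluster-wide):

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: restricted-demo
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0.0   # placeholder image
    securityContext:
      runAsNonRoot: true                    # refuse to start if the image runs as root
      privileged: false
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
EOF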

3. Kubernetes Cluster Access

It is very important to design and document the way the kubernetes cluster is accessed.

Following are the key considerations.

  1. Restricting manual cluster-admin access. Instead, cluster-admin access should only be allowed through automation.
  2. Implement RBAC authorization.
  3. Allow Kubernetes API access via service accounts with limited privileges (see the sketch after this list).
  4. Implement Open Policy Agent for fine-grained access controls.
  5. Consider options for OpenID Connect.
  6. Have a good audit mechanism for reviewing roles and removing unused users, roles, service accounts, etc.
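
For item 3, a minimal sketch of a namespaced service account with limited privileges (all names and the namespace are hypothetical):

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deploy-bot
  namespace: team-a
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deploy-bot-role
  namespace: team-a
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "update", "patch"]   # no delete, no cluster-wide access
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deploy-bot-binding
  namespace: team-a
subjects:
- kind: ServiceAccount
  name: deploy-bot
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deploy-bot-role
EOF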

Design the access levels so that you can hand off responsibilities to other teams using the cluster. It would save time for everyone, and you can focus more on the engineering part rather than working on repeated tasks.

4. Kubernetes High Availability & Scaling

High availability is another key factor in Kubernetes cluster design.

Here you need to consider the worker node availability across different availability zones.

Also, consider Pod Topology Spread Constraints to spread pods in different availability zones.
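
A minimal sketch of such a constraint on a Deployment (the names, labels, and replica count are placeholders):

$ kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                                # zones may differ by at most one pod
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: web
      containers:
      - name: web
        image: registry.example.com/web:1.0.0     # placeholder image
EOF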

When we talk about scaling, it’s not just autoscaling of instances or pods.

It’s about how gracefully you can scale down and scale up the apps without any service interruptions.

Depending on the type of apps that need to be hosted on Kubernetes, you can design deployments to evict pods gracefully during scale-down and patching activities.
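
For example, a PodDisruptionBudget keeps a minimum number of replicas running during voluntary disruptions such as node drains (the name, labels, and threshold below are illustrative):

$ kubectl apply -f - <<EOF
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2              # never drain below two running pods
  selector:
    matchLabels:
      app: web
EOF

Combine this with an appropriate terminationGracePeriodSeconds and a preStop hook in the pod spec so the app can finish in-flight requests before it is killed.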

Also, consider chaos engineering experiments before production to check the cluster and app stability.

5. Kubernetes Ingress

Ingress is an essential component of Kubernetes clusters. There are many ways to set up a kubernetes ingress.

Also, there are different types of ingress controllers.

Evaluate the options and pick the one that best suits your organization’s compliance policies and scaling requirements.

A few considerations:

  1. Have separate ingress controllers for the platform-tools.
  2. SSL management for ingress endpoints.
  3. Do not try to route all the apps through the same ingress. As the number of apps grows, you could end up with one big configuration file that is hard to manage and debug (see the per-app example after this list).
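
As an illustration of points 2 and 3, a per-app Ingress with its own TLS secret keeps each configuration small and isolated (the hostnames, ingress class, and secret name are placeholders):

$ kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop-ingress
  namespace: shop
spec:
  ingressClassName: nginx                  # assumes an NGINX ingress controller is installed
  tls:
  - hosts:
    - shop.example.com
    secretName: shop-example-com-tls       # certificate managed separately, e.g. by cert-manager
  rules:
  - host: shop.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: shop
            port:
              number: 80
EOF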

6. Kubernetes Backup & Restore Strategy

Whether it is a managed service or custom kubernetes implementation, it is essential to back up the cluster.

When we say backup, it is primarily backing up etcd.

You should have a very good design to automate the backup of the kubernetes cluster and its associated components.

You also need a design to restore the cluster if required.

There are also options to take a dump of the existing objects in JSON format. You can use that dump to restore the objects in the same or a different cluster.
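
A rough sketch of both approaches, assuming a self-managed cluster where you can reach etcd directly and where the certificate paths below match your setup:

$ ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key

$ kubectl get all --all-namespaces -o json > cluster-dump-$(date +%F).json

Note that "kubectl get all" does not cover every resource type (ConfigMaps, secrets, CRDs, etc.), so list the resources you care about explicitly in the real backup job. On managed services, use the provider's backup features or a tool such as Velero instead of talking to etcd directly.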

7. Kubernetes Node & Container Image Patching & Lifecycle Management

Patching is a recurring process.

When it comes to Kubernetes, there is both node patching and container image patching.

Make sure you implement DevSecOps principles in your CI/CD pipelines.

Here are some key considerations,

  1. An automated pipeline integrated with container scanning tools to patch container images on a monthly schedule.
  2. An automated pipeline to perform node patching without downtime (see the drain sketch below).
  3. An automated pipeline to manage the lifecycle of container images, so you don’t keep lots of outdated versions in your registry.
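
The core of the node patching pipeline in item 2 is usually a cordon/drain/uncordon loop around the actual OS patching. A minimal sketch (the node name is a placeholder, and older kubectl versions use --delete-local-data instead of --delete-emptydir-data):

$ kubectl cordon worker-node-1                 # stop new pods from landing on the node
$ kubectl drain worker-node-1 --ignore-daemonsets --delete-emptydir-data
$ # ...patch and reboot the node here...
$ kubectl uncordon worker-node-1               # put the node back into rotation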

8. Kubernetes Cluster Upgrades

Generally, you can perform a cluster upgrade in two ways:

  1. Upgrading the existing cluster in place (see the sketch after this list).
  2. Creating a new cluster and migrating the apps to it.
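
For option 1 on a self-managed cluster built with kubeadm, the control-plane side of an in-place upgrade looks roughly like this (the target version is a placeholder; managed services expose the same operation as a single API call or console action):

$ kubeadm upgrade plan                 # lists the versions you can upgrade to
$ kubeadm upgrade apply v1.20.4        # upgrades the control plane components
$ # then upgrade kubelet and kubectl on each node, draining and uncordoning them one by one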

You need a very good automated pipeline design to perform a cluster upgrade.

There could be networking, DNS, and other component changes during an upgrade. It all depends on the design and organizational policies.

9. Kubernetes Cluster Capacity & Storage

Cluster capacity is a very important topic of discussion.

You need to decide on the number of clusters you need to run.

Some organizations prefer running multiple clusters to reduce the blast radius and ease maintenance, while others prefer one big cluster with a large number of worker nodes, or fewer nodes with large instance capacity.

You can decide on the cluster capacity based on your needs and the size of the team to manage the clusters.

Next comes the storage part.

Plan how you want to attach volumes to containers. Follow all the standard storage security practices on kubernetes.

When it comes to the cloud, there is out-of-the-box support for provisioning storage.

If you are planning to run StatefulSets, it is very important to design the storage for high throughput and maximum availability.
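
For example, an SSD-backed StorageClass referenced from the StatefulSet's volumeClaimTemplates gives each replica its own fast, independent volume. The provisioner and parameters below are GCE-flavoured placeholders, so adjust them for your platform:

$ kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/gce-pd          # placeholder: use your cloud or CSI provisioner
parameters:
  type: pd-ssd
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer    # provision the disk in the same zone as the pod
EOF

The StatefulSet then requests it with storageClassName: fast-ssd in its volumeClaimTemplates.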

10. Kubernetes Logging & Monitoring

Most organizations have a centralized logging and monitoring system, and they prefer to integrate Kubernetes with those systems.

Here are the key considerations.

  1. How much log data will be generated.
  2. Mechanisms to ingest Kubernetes logs into the logging systems, considering the huge data volume.
  3. Scaling the logging and monitoring components deployed in the cluster.
  4. Data retention as per the organization's policy.
  5. Define and document the KPIs for monitoring.

Conclusion

These are some of the key considerations which often get missed while setting up a kubernetes cluster.

Missing these aspects while implementing Kubernetes could lead to issues in the overall cluster and force compromises on the business.

Ideally, the solution/technical architect should keep all the items mentioned above (there could be more worth considering) as a checklist while designing the cluster architecture, to make sure they are implemented during IaC development.

Saturday 13 February 2021

Why is space not being freed from disk after deleting a file in Red Hat Enterprise Linux?

 

Issue

  • Why is space not being freed from disk after deleting a file in Red Hat Enterprise Linux?
  • When deleting a large file or files, the file is deleted successfully but the size of the filesystem does not reflect the change.
  • I've deleted some files but the amount of free space on the filesystem has not changed.
  • Several very large log files, some as large as ~30G, were being held open. The files had already been deleted, but only stopping and restarting the jvm/java process released the disk space. The lsof command shows the following output before restarting the java process:

    COMMAND     PID      USER   FD      TYPE    DEVICE   SIZE/OFF       NODE NAME
    : 
    java      49097    awdmw   77w      REG     253,6 33955068440    1283397 /opt/jboss/jboss-eap-5/jboss-as/server/all/log/server.log (deleted)
    
  • When you perform a df, the storage shows 90+% utilized; however, there is not really that much data written to that space.

Resolution

Graceful shutdown of relevant process

First, obtain a list of deleted files which are still held open by applications:

$ /usr/sbin/lsof | grep deleted
ora    25575 data   33u   REG      65,65  4294983680   31014933 /oradata/DATAPRE/file.dbf (deleted)

The lsof output shows the process with pid 25575 has kept file /oradata/DATAPRE/file.dbf open with file descriptor (fd) number 33.

After the file has been identified, free the used space by shutting down the affected process. If a graceful shutdown does not work, issue the kill command to forcefully stop it by referencing the PID.
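
Using the example above, and assuming the process with PID 25575 is safe to stop, the sequence would be:

$ kill 25575          # send SIGTERM and let the process shut down cleanly
$ kill -9 25575       # last resort: SIGKILL if the process ignores SIGTERM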

Truncate File Size

Alternatively, it is possible to force the system to de-allocate the space consumed by an in-use file by forcing the system to truncate the file via the proc file system. This is an advanced technique and should only be carried out when the administrator is certain that this will cause no adverse effects to running processes. Applications may not be designed to deal elegantly with this situation and may produce inconsistent or undefined behavior when files that are in use are abruptly truncated in this manner.

$ echo > /proc/pid/fd/fd_number

For example, from the lsof output above:

$ file /proc/25575/fd/33
/proc/25575/fd/33: broken symbolic link to `/oradata/DATAPRE/file.dbf (deleted)'
$ echo > /proc/25575/fd/33

The same root cause can make the du and df commands report different disk usage; please refer to "Why does df show bigger disk usage than du?"

To identify how much space is still held by deleted but open files, use the command below:

# lsof -Fn -Fs |grep -B1 -i deleted | grep ^s  | cut -c 2- | awk '{s+=$1} END {print s}'

Root Cause

On Linux or Unix systems, deleting a file via rm or through a file manager application will unlink the file from the file system's directory structure; however, if the file is still open (in use by a running process) it will still be accessible to this process and will continue to occupy space on disk. Therefore such processes may need to be restarted before that file's space will be cleared up on the filesystem.


No space left on device – running out of Inodes

One of our development servers went down today. Problems started with a deployment script that claimed “No space left on device”, although the partition was not nearly full. If you ever run into such trouble, most likely you have too many small or 0-sized files on your disk: while you have enough disk space, you have exhausted all available inodes. Below is the solution to this problem.

1. check available disk space to ensure that you still have some

$ df

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/xvda             33030016  10407780  22622236  32% /
tmpfs                   368748         0    368748   0% /lib/init/rw
varrun                  368748        56    368692   1% /var/run
varlock                 368748         0    368748   0% /var/lock
udev                    368748       108    368640   1% /dev
tmpfs                   368748         0    368748   0% /dev/shm

2. check available Inodes

$ df -i

Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/xvda            2080768 2080768       0  100% /
tmpfs                  92187       3   92184    1% /lib/init/rw
varrun                 92187      38   92149    1% /var/run
varlock                92187       4   92183    1% /var/lock
udev                   92187    4404   87783    5% /dev
tmpfs                  92187       1   92186    1% /dev/shm

If IUse% is at or near 100%, then a huge number of small files is the reason for the “No space left on device” errors.

3. find those little bastards

$ for i in /*; do echo $i; find $i |wc -l; done

This command will list directories and the number of files in them. Once you see a directory with an unusually high number of files (or the command just hangs for a long time), repeat the command for that directory to see where exactly the small files are.

$ for i in /home/*; do echo $i; find $i |wc -l; done
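
If the loop is too slow on a large filesystem, a single-pass alternative (assuming GNU find) counts entries per directory and shows the worst offenders first:

$ find / -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head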

4. once you've found the suspect, just delete the files

$ sudo rm -rf /home/bad_user/directory_with_lots_of_empty_files

You’re done. Check the results with the df -i command again. You should see something like this:

Filesystem            Inodes   IUsed   IFree IUse% Mounted on

/dev/xvda            2080768  284431 1796337   14% /
tmpfs                  92187       3   92184    1% /lib/init/rw
varrun                 92187      38   92149    1% /var/run
varlock                92187       4   92183    1% /var/lock
udev                   92187    4404   87783    5% /dev
tmpfs                  92187       1   92186    1% /dev/shm