When, why, and how to run databases in Kubernetes


Should you run your database in Kubernetes? The honest answer is: it depends. There can be significant benefits, but there are also trade-offs, and the right decision depends on what you need your data infrastructure to do in the future.

Kubernetes is the preferred platform for managing containerized workloads and services. Most executives and developers agree that the benefits far outweigh the challenges. Even the largest enterprises are using the platform to run stateless and stateful applications on-premises or as hybrid cloud deployments in production.

When we think about data and the Kubernetes ecosystem, things become more complicated. Stateful applications require a new database architecture that addresses the applications' scale, latency, availability, and security demands. What database architecture is the most appropriate to deal with these problems?

In this article, we'll discuss the advantages and possible trade-offs of running a database in Kubernetes, and how many of these trade-offs may be mitigated.


Improved resource utilization

The widespread adoption of microservices architecture results in a large number of relatively small databases with a finite number of nodes. This presents significant management difficulties, and companies often struggle to optimally allocate their databases. However, running Kubernetes provides an infrastructure-as-code strategy to overcome these issues. It makes it simple to manage multiple microservices deployments at scale while optimizing resource utilization on the available nodes.

This is definitely one of the greatest arguments for Kubernetes. It is useful when managing many databases in a multitenant environment. It allows businesses to not only save money, but also reduce the number of nodes required.
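In a multitenant setup, one concrete mechanism behind this saving is a per-namespace quota that caps what each team's databases can claim, letting many small databases pack onto shared nodes. A minimal sketch (the namespace name and figures here are illustrative, not from the article):

```yaml
# Hypothetical per-tenant quota: caps the CPU, memory, and pod count
# that one team's databases can request in their namespace, so many
# small databases can share the same pool of nodes safely.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: db-team-quota
  namespace: team-a        # assumed tenant namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "20"
```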

Dynamic, flexible pod scaling with very little effort

Kubernetes has the unique ability to modify memory, CPU, and disk to scale databases according to workload demands. This flexibility to scale up automatically without incurring downtime is critical for large corporations that often face demand spikes.
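As a sketch of how this resizing can be delegated to the platform: the Vertical Pod Autoscaler add-on (a separate component, not part of core Kubernetes) can adjust a database's CPU and memory requests as workload demands change. The object and target names below are illustrative:

```yaml
# Illustrative VPA object (requires the Vertical Pod Autoscaler add-on).
# It watches actual usage and adjusts the CPU/memory requests of the
# pods in the targeted database StatefulSet.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa        # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres          # assumed database StatefulSet
  updatePolicy:
    updateMode: "Auto"      # apply updated requests as load changes
```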

Consistency and portability between clouds, on-premises, and edge

Companies want to be able to build, deploy, and manage workloads irrespective of where they are located. Moreover, they want the ability to migrate workloads from one cloud to another. The problem is, most organizations have at least some legacy code they still use on-premises that they'd like to move to the cloud.

Kubernetes allows organizations to develop infrastructure as code consistently, regardless of where it is located. So, if the development team can write a bit of code describing the resource requirements, the platform will take care of it. This provides the same level of control in the cloud that one would previously have on bare metal servers.
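That "bit of code describing the resource requirements" can be as small as a StatefulSet manifest; the same file applies unchanged in any cloud or on-premises cluster. The names, image, and figures below are placeholders for illustration:

```yaml
# Minimal sketch of declaring a database's resource requirements as code.
# Kubernetes schedules the pods onto nodes that satisfy these requests,
# regardless of which cloud or data center the cluster runs in.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb                # hypothetical database name
spec:
  serviceName: mydb
  replicas: 3
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
      - name: db
        image: example/db:latest   # placeholder image
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
          limits:
            cpu: "4"
            memory: 8Gi
```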

Out-of-the-box infrastructure orchestration

Because Kubernetes can reschedule pods from one node to another, a pod can be started anywhere. For stateful workloads such as databases, this is a bigger concern, and it requires setting specific policies in Kubernetes. A few simple rules, however, allow your database instance to survive a hardware failure.
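Those few simple rules can be as small as an anti-affinity clause in the pod template, which keeps database replicas off the same physical node so one machine failure cannot take out every copy. A sketch (the `app: mydb` label is an assumed pod label, and this fragment sits under the pod spec):

```yaml
# Illustrative pod anti-affinity fragment for a database pod template:
# require that no two replicas with the same app label land on one node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: mydb          # assumed pod label
      topologyKey: kubernetes.io/hostname
```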

Automated day-2 operations

Periodic backups and software upgrades are important, but they are also costly. Kubernetes automates most day-2 operations, and, even better, performing these updates across a cluster is easy. For example, if you wanted to patch a security vulnerability across a cluster, Kubernetes makes that happen.
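One way this cluster-wide patching works in practice: a StatefulSet with a rolling update strategy replaces pods one at a time when the image tag is bumped to a patched version, so the database stays up throughout. A sketch (this fragment sits under a StatefulSet's `spec`; the partition value is illustrative):

```yaml
# Sketch of a StatefulSet update strategy: bumping the container image
# (e.g. to pick up a security patch) rolls pods one at a time.
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 0    # roll all pods; raise this to canary only the highest ordinals
```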

For a traditional relational database management system (RDBMS) running in Kubernetes, automated day-2 operations may be challenging. You typically have multiple copies of data, so when you lose a pod there's another copy elsewhere. Even so, the user is still responsible for migrating data between pods and resyncing replicas.

When migrating data manually, one would check that the cluster isn't under much load, wait until the load dissipates, and then transfer the data to another node. When migrating data automatically, however, a replica may believe it has the data when it really does not.

Important trade-offs and how to manage them

For all the advantages of running databases in Kubernetes, there are drawbacks to keep an eye on. For starters, pods are more likely to crash because of process affinity: if the process that starts a pod goes down, the whole pod may collapse.

Local storage is more common than external persistent storage. Local disks provide fast performance, but they can also cause problems: when you move a pod around, the storage doesn't go with it. External persistent storage, by contrast, offers a network-attached form of storage with a logical view of drives that can follow the pod.
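The external-storage option is usually expressed as a PersistentVolumeClaim against a network-backed StorageClass; if the pod is rescheduled to another node, the volume is reattached there. A sketch (the claim name, class name, and size are illustrative):

```yaml
# Illustrative PersistentVolumeClaim: storage provisioned through a
# network-backed StorageClass survives pod rescheduling, unlike a
# node-local disk that stays behind on the original machine.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mydb-data              # hypothetical claim name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard   # assumed network-backed StorageClass
  resources:
    requests:
      storage: 100Gi
```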

Organizations should also recognize the dangers associated with network restrictions in Kubernetes clusters. If an application does not need to be on the same cluster as the actual database, then a load balancer may be required. Network issues, sometimes related to the geographical location of the cluster, can create additional difficulties.

Finally, one must keep an eye on operational gaps, since acquiring Kubernetes expertise takes time, and organizations will need to invest in building it.

The advantages of running a database in Kubernetes are clear. There are roadblocks and trade-offs, but there are solutions.

Karthik Ranganathan is the CTO and cofounder of Yugabyte.
