Why We Built BenchTest: Solving the Container Resource Sizing Problem

Getting container resource sizing right is one of the most challenging aspects of running applications in Kubernetes. Request too few resources, and your application crashes or performs poorly. Request too many, and you're burning money on idle capacity.

According to Datadog's research, over 65% of containers use less than half of their requested CPU and memory [1]. This means most teams are significantly over-provisioning their workloads, leading to unnecessary costs and inefficient resource utilization.

This is exactly why we built BenchTest: to help developers solve this problem before it becomes a production issue.

The Resource Management Challenge

In Kubernetes, a container requests a set of resources as part of its pod specification. The scheduler takes these requests into account when deciding where to place pods in the cluster. These requests dictate how your cluster's capacity is allocated, affecting how all other pods will be scheduled going forward.

To see this in action, let’s imagine a situation where a cluster has two worker nodes, each with 2 cores of CPU.

A new pod gets created with a container that is requesting 1,500 millicores (1.5 cores) of CPU:

apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
    - name: frontend
      image: images.benchtest.dev/frontend:v1
      resources:
        requests:
          cpu: "1500m"

The Kubernetes scheduler selects a node with enough capacity to fulfill the request and reserves that amount for the pod.

Node 1: 1.5 / 2.0 Cores Reserved
Node 2: 0.0 / 2.0 Cores Reserved

Now, let’s create a second pod requesting 1,000 millicores (1 core) of CPU:

apiVersion: v1
kind: Pod
metadata:
  name: backend
spec:
  containers:
    - name: backend
      image: images.benchtest.dev/backend:v1
      resources:
        requests:
          cpu: "1000m"

The scheduler sees that Node 1 only has 0.5 cores left (2.0 - 1.5 = 0.5), which isn't enough to satisfy the 1.0 core request. So, it must place the second pod on Node 2.

Node 1: 1.5 / 2.0 Cores Reserved
Node 2: 1.0 / 2.0 Cores Reserved
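The placement above can be reproduced with a few lines of Python. This is only a minimal sketch: the `Node` class and `schedule` function are hypothetical names, and the first-fit rule is a simplification (the real scheduler also scores the nodes that pass the fit check). What it does show is that the decision is based on requests, never on actual usage.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_m: int      # CPU capacity in millicores
    reserved_m: int = 0  # sum of the CPU requests already placed here

    def fits(self, request_m: int) -> bool:
        # Only requests are compared against capacity, not real usage.
        return self.capacity_m - self.reserved_m >= request_m


def schedule(nodes, pod_name, request_m):
    """Place the pod on the first node whose unreserved CPU covers the request."""
    for node in nodes:
        if node.fits(request_m):
            node.reserved_m += request_m
            return node.name
    return None  # no node fits: the pod would stay Pending


nodes = [Node("node-1", 2000), Node("node-2", 2000)]
print(schedule(nodes, "frontend", 1500))  # node-1
print(schedule(nodes, "backend", 1000))   # node-2
```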

The scheduler has done its job correctly based on the requests. But what if the pods' actual usage is much lower than what they requested?

What if frontend only needs 250 millicores (0.25 cores) and backend only needs 500 millicores (0.5 cores)?

The problem becomes evident. The cluster state shows that 2.5 total cores are reserved, but only 0.75 cores are actually being used. Worse, the 0.5 cores left unreserved on Node 1 are now "stranded": they sit idle, yet no pod requesting more than 0.5 cores can be scheduled onto them.
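The arithmetic behind these figures, worked through as a short Python sketch. The numbers are the ones from the example; the variable names are illustrative only.

```python
# CPU figures from the two-node example, in millicores.
capacity = {"node-1": 2000, "node-2": 2000}
requested = {"node-1": 1500, "node-2": 1000}  # frontend and backend requests
used = {"node-1": 250, "node-2": 500}         # the assumed actual usage

total_reserved = sum(requested.values())
total_used = sum(used.values())
idle_reserved = total_reserved - total_used   # reserved but never used
utilization = total_used / total_reserved
unreserved = {name: capacity[name] - requested[name] for name in capacity}

print(total_reserved, total_used, idle_reserved)  # 2500 750 1750
print(f"{utilization:.0%}")                       # 30%
print(unreserved)                                 # {'node-1': 500, 'node-2': 1000}
```

Only 30% of the reserved CPU is doing work, and the 500 millicores left on Node 1 are too small for any pod that requests more than that.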

[Figure: Request Example]

This wasted, unschedulable capacity ultimately increases the need for more (or bigger) nodes, and the costs associated with them. This is why it is so critical to size your pods correctly [2].

How Kubernetes and Linux Handle Resource Management

To solve this, we need to understand the two layers of resource control. The core difference lies in their purpose:

Kubernetes Resource Requests and Limits are settings in the pod specification that the Kubernetes control plane uses for scheduling and high-level policy.
Linux Resource Controls (cgroups) are the low-level, kernel-based mechanisms that actually enforce those constraints on containers at runtime.

Kubernetes translates its requests and limits into Linux cgroup settings, but this translation isn't always intuitive.

A Deeper Dive into Kubernetes Resource Requests

As we saw, Resource Requests act as a scheduling guarantee. But how this guarantee is fulfilled at runtime differs significantly between CPU and memory.

How Memory Requests Work

When you set a memory request, you establish a guaranteed lower bound.

Scheduling: The scheduler finds a node with enough unallocated memory to satisfy the request. If no node qualifies, the pod remains Pending.
Runtime Guarantee: Once scheduled, the container is guaranteed access to its requested amount of memory.
The Risk of Bursting: A container can use more memory than requested if it's available on the node. However, this "borrowed" memory is not guaranteed: if the node runs low on memory, pods using more than they requested are among the first to be evicted or killed. This creates a painful trade-off between node density and application stability.

How CPU Requests Work

In contrast, CPU is a compressible resource. Instead of terminating a process, the system can throttle it.

Scheduling: The scheduler finds a node with enough unallocated CPU capacity.
Runtime Enforcement via CPU Shares: At runtime, a CPU request is translated into cgroup CPU shares, which function as a relative weight. A container requesting 1000m gets twice the CPU time of a container requesting 500m, but only when the CPU is under contention. If the CPU is idle, a container can burst and use as much as is available (up to its limit, if one is set), so the same workload can perform differently from node to node.
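To make the relative-weight idea concrete, here is a minimal Python sketch of the conversion as it works on cgroup v1, where one full core corresponds to 1024 shares. The function names are illustrative, and the sketch assumes every container is competing for the CPU at the same time. On cgroup v2 the value is written as a CPU weight on a different scale, but the proportional behavior is the same.

```python
SHARES_PER_CPU = 1024  # cgroup v1 cpu.shares value for one full core
MIN_SHARES = 2         # smallest value cgroup v1 accepts


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores into cgroup v1 CPU shares."""
    if milli_cpu <= 0:
        return MIN_SHARES
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // 1000)


def contended_split(requests_m: dict) -> dict:
    """Fraction of CPU time each container gets when all of them are busy."""
    shares = {name: milli_cpu_to_shares(m) for name, m in requests_m.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}


print(milli_cpu_to_shares(1000))  # 1024
print(milli_cpu_to_shares(500))   # 512
print(contended_split({"a": 1000, "b": 500}))
```

Under full contention, container `a` receives two thirds of the CPU time and `b` one third. When `b` is idle, nothing in the shares mechanism stops `a` from using the whole CPU.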

Resource Limits

Purpose: Hard cap (maximum). A limit defines the maximum amount of a resource the container is allowed to consume.

CPU Limit: A container that tries to use more CPU than its limit is throttled, slowing down your application.
Memory Limit: Exceeding the limit results in the container being terminated by the Out-Of-Memory (OOM) Killer.
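CPU throttling is easier to reason about with numbers. A CPU limit is enforced as a quota of CPU time per scheduling period (100 ms by default). The Python sketch below is illustrative only: the function names are hypothetical, and `wall_time_us` assumes a single busy thread that starts at a period boundary with nothing else competing for the CPU.

```python
DEFAULT_PERIOD_US = 100_000  # default CFS period (100 ms)
MIN_QUOTA_US = 1_000         # smallest quota applied (1 ms)


def cpu_limit_to_quota_us(limit_m, period_us=DEFAULT_PERIOD_US):
    """CPU time in microseconds the container may use per period."""
    return max(MIN_QUOTA_US, limit_m * period_us // 1000)


def wall_time_us(cpu_work_us, limit_m, period_us=DEFAULT_PERIOD_US):
    """Wall-clock time for one busy thread to finish cpu_work_us (> 0) of work."""
    # A single thread cannot consume more than one period of CPU per period.
    quota = min(cpu_limit_to_quota_us(limit_m, period_us), period_us)
    full_periods = (cpu_work_us - 1) // quota  # periods that end throttled
    return full_periods * period_us + (cpu_work_us - full_periods * quota)


print(cpu_limit_to_quota_us(500))  # 50000
print(wall_time_us(80_000, 500))   # 130000
```

With a 500m limit, a request that needs 80 ms of CPU time runs for 50 ms, sits throttled for the remaining 50 ms of the period, then finishes in the next one: 130 ms of wall-clock time instead of 80 ms.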

In essence, Kubernetes provides the user-friendly interface and cluster-wide intelligence, while Linux provides the raw enforcement engine [3].

The Cost of Getting Resource Sizing Wrong

Rightsizing your workload directly impacts your cost, performance, and reliability. The consequences fall into two main categories:

1. Over-Provisioning: Unnecessary Costs

This is the exact problem we saw in our two-node cluster example. When you request more resources than your container uses, you pay for idle capacity.

Increased Costs: You pay for resources that sit unused.
Cluster Inefficiency: The scheduler reserves these resources, creating stranded capacity and making them unavailable for other workloads.
Premature Scaling: Wasted resources prevent efficient pod placement, forcing unnecessary and costly node scale-up events.

2. Under-Provisioning: Performance Impact

When you request fewer resources than your container needs, you face severe performance and reliability issues.

CPU Throttling: Containers are throttled, leading to high latency and application errors under load.
OOMKilled Errors: Containers exceeding memory limits are abruptly killed, causing application crashes and service instability.
Pod Eviction: Kubernetes may evict under-provisioned pods when nodes run low on memory.

Beyond Guesswork: A New Standard for Resource Management

Sizing container resources is a major challenge, often stemming from a misunderstanding of how resource requests and limits work. When developers, who write the code, and operators, who run it in production, aren't aligned, it creates a vicious cycle. Teams are left guessing, leading to either over-provisioning and wasted money or under-provisioning and critical performance incidents.

BenchTest closes this gap by making resource awareness an integral part of the development lifecycle. It empowers developers to answer a critical question before their code is ever deployed: "What resources does my application actually need to perform its job effectively?"

The ultimate result is a more efficient and reliable software ecosystem.

For Engineers: It means building applications that are not only correct but also well-behaved and cost-effective by design.
For Businesses: It means unlocking significant cloud cost savings, improving application stability, and increasing the velocity of development teams who can ship features with confidence.

By shifting resource optimization left, you're not just fine-tuning a configuration file; you're adopting a proactive standard of excellence for building and running cloud-native applications. The era of over-provisioning for safety or under-provisioning into failure is over. The future is building efficiently from the start.

Key Benefits of BenchTest:

Early Detection: See container utilization metrics during development, not after a production deployment.
Cost Optimization: Identify over-provisioned resources before they impact your cloud bill.
Performance Assurance: Ensure your containers have adequate resources to handle production workloads.
Reliability: Prevent OOM kills and performance degradation by setting appropriate limits.
Data-Driven Decisions: Use performance test results to make informed resource allocation decisions.

How It Works:

Run Performance Tests: Execute realistic load tests against your application.
Monitor Resource Usage: BenchTest tracks CPU and memory consumption during tests.
Analyze Results: Get detailed insights into actual resource needs vs. current allocations.
Optimize Configuration: Adjust your Kubernetes resource requests and limits based on real data.
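As an illustration of the last step, here is one common way to turn usage samples from a load test into a suggested request: take a high percentile and add headroom. This is a hypothetical sketch, not a description of how BenchTest computes its results. The function names, the 95th percentile, the 15% headroom, and the sample values are all assumptions chosen for the example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_request(samples, pct=95, headroom_pct=15):
    """Percentile of observed usage plus headroom, rounded up to a whole unit."""
    base = percentile(samples, pct)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-base * (100 + headroom_pct) // 100)


# Made-up CPU samples in millicores, as might be collected during a load test.
cpu_samples_m = [180, 210, 250, 240, 230, 220, 260, 245, 235, 225]
print(suggest_request(cpu_samples_m))  # 299
```

For the frontend pod from the earlier example, a result in this range would suggest a CPU request of roughly 300m rather than 1500m.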

The Bottom Line

Resource sizing doesn't have to be guesswork. With BenchTest, you can:

Eliminate waste by identifying over-provisioned containers.
Prevent failures by ensuring adequate resource allocation.
Reduce costs by optimizing resource utilization.
Improve reliability by setting appropriate limits based on real usage patterns.

The relationship between Kubernetes scheduling and Linux cgroups is complex, but understanding it is crucial. BenchTest simplifies this process by providing the data you need to make informed decisions about resource allocation.

Start optimizing your container resources today!

References

[1] Datadog Container Report

[2] Practical tips for rightsizing your Kubernetes workloads

[3] Production Kubernetes: Building Successful Application Platforms