Vertical Pod Autoscaler (VPA) in Simple Container

Overview

Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory requests for your containers based on actual usage patterns. Simple Container provides built-in VPA support for both application deployments and infrastructure components like Caddy ingress controllers.

Key Benefits

Cost Optimization

  • Prevents over-provisioning: Reduces wasted resources and cloud costs
  • Right-sizing: Automatically adjusts to actual usage patterns
  • Resource efficiency: Optimizes cluster utilization

Performance Optimization

  • Prevents resource starvation: Ensures adequate resources during load spikes
  • Automatic scaling: Responds to changing workload demands
  • Reduced manual tuning: Eliminates guesswork in resource allocation

Operational Efficiency

  • Hands-off management: Reduces manual resource configuration
  • Data-driven decisions: Based on actual usage metrics
  • Continuous optimization: Adapts to changing application behavior

VPA Configuration Levels

Simple Container supports VPA configuration at two levels:

1. Application Level (client.yaml)

Configure VPA for your applications using cloudExtras:

# client.yaml
stacks:
  production:
    config:
      cloudExtras:
        vpa:
          enabled: true
          updateMode: "Auto"
          minAllowed:
            cpu: "100m"
            memory: "128Mi"
          maxAllowed:
            cpu: "2"
            memory: "4Gi"

2. Infrastructure Level (server.yaml)

Configure VPA for infrastructure components like Caddy:

# server.yaml
resources:
  production:
    resources:
      gke-cluster:
        type: gcp-gke-autopilot-cluster
        config:
          caddy:
            vpa:
              enabled: true
              updateMode: "Auto"
              minAllowed:
                cpu: "50m"
                memory: "64Mi"

VPA Update Modes

Understanding VPA update modes is crucial for production deployments:

Off Mode

vpa:
  updateMode: "Off"

  • Behavior: Only provides resource recommendations
  • Use case: Testing, analysis, and planning
  • Impact: No automatic changes to running pods

Initial Mode

vpa:
  updateMode: "Initial"

  • Behavior: Sets resources only when pods are created
  • Use case: Conservative approach for critical applications
  • Impact: New pods get optimized resources, existing pods unchanged

Auto Mode

vpa:
  updateMode: "Auto"

  • Behavior: Updates resources by recreating pods (equivalent to Recreate mode)
  • Use case: Recommended for stateless applications and ingress controllers
  • Impact: Brief service interruption during pod recreation

InPlaceOrRecreate Mode (Preview)

vpa:
  updateMode: "InPlaceOrRecreate"

  • Behavior: Updates resources in-place when possible, recreates if needed
  • Use case: Advanced scenarios with minimal disruption tolerance
  • Impact: May cause brief interruptions for some resource changes

Resource Boundaries

VPA resource boundaries prevent runaway resource allocation:

Minimum Allowed Resources

vpa:
  minAllowed:
    cpu: "100m"      # Prevent resource starvation
    memory: "128Mi"   # Ensure basic functionality

Maximum Allowed Resources

vpa:
  maxAllowed:
    cpu: "4"          # Control maximum costs
    memory: "8Gi"     # Prevent memory exhaustion
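
The effect of the two boundaries can be illustrated with a small calculation. This is only a sketch of the capping idea, not VPA's actual implementation: `cpu_to_millicores` and `clamp` are hypothetical helpers introduced here, and the raw recommendations are made-up inputs. The bounds are the `minAllowed` and `maxAllowed` CPU values shown above.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "100m" or "4" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def clamp(value: int, lower: int, upper: int) -> int:
    """Keep value inside the inclusive range [lower, upper]."""
    return max(lower, min(value, upper))


# Bounds taken from the minAllowed / maxAllowed examples above.
min_cpu = cpu_to_millicores("100m")
max_cpu = cpu_to_millicores("4")

# Hypothetical raw recommendations: below, inside, and above the bounds.
for raw in ("30m", "750m", "6"):
    applied = clamp(cpu_to_millicores(raw), min_cpu, max_cpu)
    print(f"raw recommendation {raw} -> applied {applied}m")
```

A recommendation below `minAllowed` is raised to the floor, one above `maxAllowed` is lowered to the ceiling, and anything in between passes through unchanged. Memory bounds behave the same way.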

Controlled Resources

vpa:
  controlledResources: ["cpu", "memory"]  # Specify which resources to manage

Controlled Values

By default VPA rewrites both requests and limits at admission, scaling the limit proportionally with the request. For workloads whose limits are sized to absorb cold-start CPU bursts (Django/gunicorn, Node SSR, JVM warmup), a low minAllowed.cpu paired with the default behaviour can shrink the CPU limit below what the cold-start path needs, causing startup-probe failures and SIGKILLs.

Set controlledValues: "RequestsOnly" to tell VPA to only rewrite requests and leave the deployment template's limits untouched. The deployment then keeps its full cold-start headroom while VPA still right-sizes the steady-state request.

vpa:
  enabled: true
  updateMode: "Auto"
  minAllowed:
    cpu: "50m"       # safe at this floor when limit is preserved
    memory: "64Mi"
  maxAllowed:
    cpu: "2"
    memory: "4Gi"
  controlledResources: ["cpu", "memory"]
  controlledValues: "RequestsOnly"     # leave deployment-template limits alone

Valid values:

  • RequestsAndLimits (default) — VPA scales both. Equivalent to omitting the field.
  • RequestsOnly — VPA scales only requests; limits stay at the values in the underlying deployment template.
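
To see why the default can erode cold-start headroom, the following sketch works through the arithmetic of proportional limit scaling. It assumes a deployment template with a 500m CPU request and a 2-core CPU limit, and a VPA recommendation of 50m; `applied_cpu` is a hypothetical helper written for this illustration, not part of VPA or Simple Container.

```python
def applied_cpu(template_request_m: int,
                template_limit_m: int,
                recommended_request_m: int,
                controlled_values: str = "RequestsAndLimits"):
    """Return the (request, limit) pair in millicores after VPA applies a recommendation."""
    if controlled_values == "RequestsOnly":
        # Only the request changes; the template limit is preserved.
        return recommended_request_m, template_limit_m
    # Default: the limit keeps the template's limit-to-request ratio.
    ratio = template_limit_m / template_request_m
    return recommended_request_m, round(recommended_request_m * ratio)


# Assumed template: request 500m, limit 2000m (ratio 4). Assumed recommendation: 50m.
print(applied_cpu(500, 2000, 50))                  # default behaviour
print(applied_cpu(500, 2000, 50, "RequestsOnly"))  # limit left alone
```

With the default, the limit shrinks from 2000m to 200m along with the request, which is the situation that can starve a cold-start path. With `RequestsOnly`, the request drops to 50m while the limit stays at 2000m.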

VPA Best Practices

1. Environment-Specific Configuration

Use different VPA settings for different environments:

# Production: Aggressive optimization
production:
  cloudExtras:
    vpa:
      updateMode: "Auto"
      maxAllowed:
        cpu: "4"
        memory: "8Gi"

# Staging: Conservative approach
staging:
  cloudExtras:
    vpa:
      updateMode: "Initial"
      maxAllowed:
        cpu: "2"
        memory: "4Gi"

# Development: Recommendation only
development:
  cloudExtras:
    vpa:
      updateMode: "Off"
      maxAllowed:
        cpu: "1"
        memory: "2Gi"

2. Combining VPA with Manual Limits

VPA works alongside manual resource specifications:

# Manual resource limits
resources:
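  limits:
    cpu: "2"       # Scaled with the request by default; set controlledValues: "RequestsOnly" to keep it fixed
    memory: "4Gi"  # Same behaviour as the CPU limit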
  requests:
    cpu: "500m"    # VPA will adjust this
    memory: "1Gi"  # VPA will adjust this

# VPA configuration
vpa:
  enabled: true
  maxAllowed:
    cpu: "2"      # Matches manual limit
    memory: "4Gi" # Matches manual limit

3. Ingress Controller Considerations

For critical infrastructure like Caddy ingress controllers:

caddy:
  vpa:
    enabled: true
    updateMode: "Auto"        # Recreates pods; recommended for stateless ingress controllers
    minAllowed:
      cpu: "50m"              # Ensure minimum availability
      memory: "64Mi"
    maxAllowed:
      cpu: "1"                # Reasonable upper bound
      memory: "1Gi"

Monitoring VPA

Check VPA Status

# List all VPAs in the cluster
kubectl get vpa

# Get detailed information about a specific VPA
kubectl describe vpa <vpa-name>

# View current recommendations
kubectl get vpa <vpa-name> -o yaml

Understanding VPA Output

status:
  recommendation:
    containerRecommendations:
    - containerName: web-app
      target:
        cpu: "250m"     # Recommended CPU request
        memory: "512Mi" # Recommended memory request
      upperBound:
        cpu: "500m"     # Upper confidence bound
        memory: "1Gi"
      lowerBound:
        cpu: "100m"     # Lower confidence bound
        memory: "256Mi"
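
When several containers are involved, it can be easier to condense this output into one line per resource. The sketch below assumes the object was fetched with `kubectl get vpa <vpa-name> -o json` and has the same shape as the status shown above; `SAMPLE_JSON` is a hand-written stand-in for that output and `summarize` is a hypothetical helper.

```python
import json

# Stand-in for the output of: kubectl get vpa <vpa-name> -o json
SAMPLE_JSON = """
{
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {
          "containerName": "web-app",
          "lowerBound": {"cpu": "100m", "memory": "256Mi"},
          "target": {"cpu": "250m", "memory": "512Mi"},
          "upperBound": {"cpu": "500m", "memory": "1Gi"}
        }
      ]
    }
  }
}
"""


def summarize(vpa: dict) -> list:
    """Return one summary line per container and resource."""
    recs = (vpa.get("status", {})
               .get("recommendation", {})
               .get("containerRecommendations", []))
    lines = []
    for rec in recs:
        name = rec.get("containerName", "?")
        for resource, target in rec.get("target", {}).items():
            low = rec.get("lowerBound", {}).get(resource, "?")
            high = rec.get("upperBound", {}).get(resource, "?")
            lines.append(f"{name} {resource}: target={target} range={low}..{high}")
    return lines


for line in summarize(json.loads(SAMPLE_JSON)):
    print(line)
```

An empty result means the VPA has not published a recommendation yet, which is covered under Troubleshooting.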

Common Patterns

Microservices Pattern

# Each microservice with appropriate VPA settings
web-api:
  cloudExtras:
    vpa:
      enabled: true
      updateMode: "Auto"
      maxAllowed:
        cpu: "1"
        memory: "2Gi"

background-worker:
  cloudExtras:
    vpa:
      enabled: true
      updateMode: "Auto"
      maxAllowed:
        cpu: "2"
        memory: "4Gi"

Multi-Tenant Pattern

# Different VPA settings per tenant
tenant-a:
  cloudExtras:
    vpa:
      maxAllowed:
        cpu: "2"
        memory: "4Gi"

tenant-enterprise:
  cloudExtras:
    vpa:
      maxAllowed:
        cpu: "8"
        memory: "16Gi"

Troubleshooting

VPA Not Providing Recommendations

  • Ensure the VPA controller is installed in the cluster
  • Check that pods have been running long enough to build a usage history (usually 24+ hours)
  • Verify that resource usage metrics exist for the workload
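
The first of these checks can be partly automated by inspecting the VPA object's status. The sketch below assumes the object was exported with `kubectl get vpa <vpa-name> -o json` and that, as in the upstream VPA API, `status.conditions` carries a condition of type `RecommendationProvided`. Confirm this against your cluster's VPA version; `recommendation_status` is a hypothetical helper.

```python
def recommendation_status(vpa: dict) -> str:
    """Report whether a VPA object says it has produced a recommendation."""
    conditions = vpa.get("status", {}).get("conditions", [])
    for cond in conditions:
        # Assumption: upstream VPA reports this condition type.
        if cond.get("type") == "RecommendationProvided":
            if cond.get("status") == "True":
                return "ok"
            detail = cond.get("message") or cond.get("reason") or "no details"
            return "missing: " + detail
    return "unknown: no RecommendationProvided condition reported"


# Hand-written examples of the parsed JSON, for illustration only.
ready = {"status": {"conditions": [
    {"type": "RecommendationProvided", "status": "True"}]}}
not_ready = {"status": {"conditions": [
    {"type": "RecommendationProvided", "status": "False"}]}}

print(recommendation_status(ready))
print(recommendation_status(not_ready))
print(recommendation_status({}))
```

An "unknown" result usually points at the first bullet above (no VPA components writing status), while a "missing" result suggests the recommender is running but does not yet have enough data.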

Recommendations Too High/Low

  • Adjust minAllowed and maxAllowed boundaries
  • Check if workload patterns are representative
  • Consider using controlledResources to limit scope

Pods Not Being Updated

  • Verify VPA update mode is not "Off"
  • Check VPA has sufficient RBAC permissions
  • Ensure resource boundaries allow for changes

Integration with Simple Container

VPA integrates seamlessly with Simple Container's architecture:

  • Parent Stack: DevOps configures VPA for infrastructure (Caddy, operators)
  • Client Stack: Developers configure VPA for applications
  • Environment Separation: Different VPA settings per environment
  • Resource Sharing: VPA optimizes shared infrastructure resources

This separation ensures that VPA configuration follows Simple Container's principle of separation of concerns while providing automatic resource optimization across the entire stack.

Next Steps