CPU tuning

Understanding CPU Bottlenecks

Before tuning, you must confirm the CPU is actually the bottleneck and understand how it is being used.

Key CPU Utilization Metrics

Most tools ultimately report the same core metrics (derived from /proc/stat):

- user: time running unprivileged application code
- nice: user time for processes with a positive nice value
- system: time spent in the kernel on behalf of processes
- idle: time doing nothing
- iowait: idle time while waiting for I/O completion
- irq/softirq: time servicing hardware and software interrupts
- steal: time a virtual CPU waited while the hypervisor ran other guests

High CPU usage alone is not always bad; what matters is why and whether it’s affecting latency/throughput.
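The per-field breakdown matters more than a single "CPU %" figure. As a minimal sketch (assuming the standard /proc/stat field order shown above), overall busy time across an interval can be computed from two samples:

```shell
# Sketch: derive overall CPU busy % from two "cpu ..." lines out of /proc/stat.
# Fields after "cpu" are user nice system idle iowait irq softirq steal.
cpu_busy_pct() {
  awk -v l1="$1" -v l2="$2" 'BEGIN {
    n1 = split(l1, a); n2 = split(l2, b)
    for (i = 2; i <= n1; i++) t1 += a[i]   # total jiffies, first sample
    for (i = 2; i <= n2; i++) t2 += b[i]   # total jiffies, second sample
    idle  = (b[5] + b[6]) - (a[5] + a[6])  # idle + iowait delta
    total = t2 - t1
    printf "%.1f\n", 100 * (total - idle) / total
  }'
}

# On a live system, sample twice one second apart:
#   s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$s1" "$s2"
```

Note that iowait counts as idle here: a CPU "waiting" on disk is available for other work.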

Typical patterns:

- High user time: the application itself is compute-bound.
- High system time: heavy syscall, interrupt, or other kernel overhead.
- High iowait: the CPU is idle waiting on storage, not actually busy.
- High steal (on VMs): the hypervisor is giving CPU time to other guests.

Identifying CPU Bottlenecks

Use monitoring tools (covered elsewhere) to answer:

- Is the CPU actually saturated, or merely busy in bursts?
- Which processes or threads consume the time, and is it user or system time?
- Are all cores loaded evenly, or is one core pegged while others sit idle?

Look specifically for:

- Run queue length consistently above the number of CPUs
- High context-switch or interrupt rates
- A single hot thread limiting an otherwise idle system
- High steal time on virtualized hosts

CPU Scheduler and Priorities

Tuning starts with controlling which tasks get CPU time and when.

Nice Levels (Static Priority for Normal Tasks)

nice controls relative CPU share among regular (non-RT) processes.

Run a command with lower priority:

nice -n 10 long_batch_job

Increase priority of an already running process (requires sudo for negative nice):

sudo renice -n -5 -p 12345

Use cases:

Avoid extreme negative nice on many processes; you can starve system daemons.
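The effect of nice is multiplicative, not linear: CFS gives each task a weight of roughly 1024 / 1.25^nice, so each nice step shifts relative CPU share by about 10%. A small sketch of the resulting split between two competing CPU-bound tasks (the 1.25 factor is the documented approximation; the kernel's exact weight table differs very slightly):

```shell
# Sketch: approximate CFS CPU split between two always-runnable tasks,
# given their nice values, using weight ~= 1024 / 1.25^nice.
nice_share() {
  awk -v n1="$1" -v n2="$2" 'BEGIN {
    w1 = 1024 / 1.25 ^ n1
    w2 = 1024 / 1.25 ^ n2
    printf "task1 %.1f%% task2 %.1f%%\n", 100 * w1 / (w1 + w2), 100 * w2 / (w1 + w2)
  }'
}

# nice 0 vs nice 10: the niced task gets only ~10% of the CPU under contention.
nice_share 0 10
```

This is why `nice -n 10` is usually enough for batch work: the gap to nice 0 is already ~9:1.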

Real-Time Scheduling Classes

Real-time (RT) policies force the scheduler to favor certain tasks above all normal ones.

Main policies:

- SCHED_FIFO: fixed priority; runs until it blocks, yields, or a higher-priority RT task preempts it
- SCHED_RR: like FIFO, but tasks of equal priority round-robin on a timeslice
- SCHED_DEADLINE: tasks declare runtime/deadline/period and the kernel admission-controls them

Set real-time scheduling:

sudo chrt -f -p 90 12345      # SCHED_FIFO, priority 90
sudo chrt -r -p 80 12345      # SCHED_RR, priority 80

Or start a command with RT:

sudo chrt -f 80 ./audio_engine

Warnings:

- A runaway SCHED_FIFO task can starve everything at lower priority, including shells and SSH; keep a rescue path.
- RT throttling (kernel.sched_rt_runtime_us, by default 950000 out of each 1000000 µs) reserves a slice for non-RT tasks; do not disable it casually.
- Use the lowest RT priority that works, not 99.

CFS Tuning via cgroups (CPU Shares and Quotas)

Using cgroups (v1/v2), you can partition CPU resources among groups of processes.

Typical knobs:

- cpu.weight (v2) / cpu.shares (v1): relative share under contention
- cpu.max (v2) / cpu.cfs_quota_us and cpu.cfs_period_us (v1): hard bandwidth cap
- cpuset.cpus: restrict a group to specific cores

Example with systemd (per-service CPU weight):

Edit a unit override:

sudo systemctl edit myservice.service

Add:

[Service]
CPUWeight=1000   # default 100; 1000 gives higher share

Or to cap CPU:

[Service]
CPUQuota=50%
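CPUWeight and CPUQuota behave differently: weight is proportional and only bites under contention, while quota is an absolute cap that applies even on an idle machine. As a rough sketch (pure arithmetic, not a systemd API), the share a weight works out to against sibling services when everyone is CPU-hungry:

```shell
# Sketch: expected CPU share (%) for a cgroup weight competing with siblings.
# First argument is this group's weight, the rest are sibling weights.
share_pct() {
  w=$1; shift
  sum=$w
  for s in "$@"; do
    sum=$((sum + s))            # total weight of all competing groups
  done
  awk -v w="$w" -v sum="$sum" 'BEGIN { printf "%.1f\n", 100 * w / sum }'
}

# A CPUWeight=1000 service competing with two default-weight (100) services:
share_pct 1000 100 100
```

If the siblings go idle, the weighted group can use 100% regardless; only CPUQuota prevents that.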

This is useful for:

- Keeping batch or backup jobs from squeezing latency-sensitive services
- Enforcing fair shares between services or tenants on shared hosts
- Capping a known CPU-hungry service to a predictable budget

Core and Thread Affinity

Controlling where processes run can reduce cache misses, migration overhead, and NUMA penalties.

Basic CPU Affinity

taskset binds a process or PID to specific cores:

# Run a program pinned to core 0
taskset -c 0 ./myapp
# Pin existing PID 12345 to cores 2–3
sudo taskset -cp 2-3 12345
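taskset accepts cpulist syntax such as 0,2-4. When scripting affinity it often helps to expand such a list into individual CPU ids; a small sketch:

```shell
# Sketch: expand a cpulist string (as accepted by taskset -c) into
# space-separated individual CPU ids, e.g. "0,2-4" -> "0 2 3 4".
expand_cpulist() {
  echo "$1" | awk -F, '{
    for (i = 1; i <= NF; i++) {
      n = split($i, r, "-")          # each element is "N" or "LO-HI"
      lo = r[1]; hi = (n == 2) ? r[2] : r[1]
      for (c = lo; c <= hi; c++)
        printf "%s%d", (out++ ? " " : ""), c
    }
    print ""
  }'
}

expand_cpulist "0,2-4"
```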

Use cases:

- Keeping a hot process on one core to preserve cache locality
- Separating latency-sensitive processes from noisy batch work
- Reserving cores for pinned workloads (often combined with isolcpus or cpusets)

Hyper-Threading Awareness

On CPUs with SMT/Hyper-Threading, each physical core presents 2+ logical CPUs.

Identify sibling threads:

lscpu | grep "Thread(s) per core"
lscpu -e   # lists CPUs with their core and socket IDs

Tuning approaches:

- For latency-critical work, pin one thread per physical core and leave SMT siblings idle.
- For throughput work, using all logical CPUs usually wins; benchmark both.
- Disabling SMT entirely (in firmware, or via /sys/devices/system/cpu/smt/control) trades peak throughput for more predictable per-thread performance.

NUMA-Aware Placement

On multi-socket or NUMA systems, memory is closer to one CPU node than another.

Tools:

- numactl: run a command with a specific CPU-node and memory-node policy
- numastat: show per-node allocation statistics and cross-node access counts

# Run on NUMA node 0 with memory allocated from node 0
numactl --cpunodebind=0 --membind=0 ./db_server

CPU Frequency and Power Management

CPU frequency scaling and power states affect both performance and latency.

CPU Frequency Governors

Linux exposes frequency-scaling governors that decide how fast each CPU runs.

Common governors:

- performance: pin to the highest frequency
- powersave: pin to the lowest frequency (with intel_pstate, it behaves more dynamically)
- ondemand / conservative: older dynamic scaling governors
- schedutil: scheduler-driven scaling, the modern default on many distributions

View current governor:

cpupower frequency-info

Set to performance (example for all cores):

sudo cpupower frequency-set -g performance

Or via /sys (per CPU):

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Tuning guidance:

- Latency-sensitive servers: performance avoids frequency ramp-up delays.
- General-purpose machines: schedutil (or the distro default) is usually fine.
- Laptops and dense datacenters: weigh power and thermals before forcing performance.

Turbo Boost and Thermal Limits

Modern CPUs can boost above base frequency (Intel Turbo Boost, AMD Precision Boost).

Check if turbo is enabled (Intel example):

cat /sys/devices/system/cpu/intel_pstate/no_turbo
# 0 = turbo enabled, 1 = disabled

Enable/disable (Intel example):

# Disable turbo
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
# Enable turbo
echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
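The intel_pstate path only exists with that driver; on AMD or acpi-cpufreq systems the equivalent knob is typically /sys/devices/system/cpu/cpufreq/boost. A sketch that reports turbo/boost state across both common locations (paths depend on the active driver):

```shell
# Sketch: print "enabled", "disabled", or "unknown" for turbo/boost state,
# checking the intel_pstate knob first, then the generic cpufreq boost knob.
turbo_state() {
  if [ -r /sys/devices/system/cpu/intel_pstate/no_turbo ]; then
    # no_turbo is inverted: 0 means turbo is enabled
    if [ "$(cat /sys/devices/system/cpu/intel_pstate/no_turbo)" = "0" ]; then
      echo enabled
    else
      echo disabled
    fi
  elif [ -r /sys/devices/system/cpu/cpufreq/boost ]; then
    if [ "$(cat /sys/devices/system/cpu/cpufreq/boost)" = "1" ]; then
      echo enabled
    else
      echo disabled
    fi
  else
    echo unknown
  fi
}

turbo_state
```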

Use cases:

- Disable turbo for reproducible benchmarks and consistent latency.
- Leave it enabled for bursty or throughput-oriented workloads.
- Disable it when thermal throttling causes erratic clock behavior.

C-States and Latency

C-states are CPU idle states (C0 is active; C1, C2, … are progressively deeper sleep). Deeper states save more power but take longer to wake from, which shows up as latency spikes for bursty workloads. Limiting how deep CPUs may sleep reduces wakeup latency.

This often requires:

- Boot parameters such as intel_idle.max_cstate=1 or processor.max_cstate=1
- Holding /dev/cpu_dma_latency open with a low latency target while the workload runs
- BIOS/firmware settings that limit package C-states

This is advanced: disabling power saving increases heat/power and may not be appropriate on general-purpose servers.

Kernel Parameters for CPU Scheduling

The Linux scheduler has tunables that affect latency vs throughput trade-offs.

Preemption Model and CONFIG Options

Chosen at kernel build time (not runtime tunable):

- CONFIG_PREEMPT_NONE: server default, best raw throughput
- CONFIG_PREEMPT_VOLUNTARY: the usual desktop/server compromise
- CONFIG_PREEMPT: more preemption points, lower latency
- CONFIG_PREEMPT_RT: fully preemptible real-time kernel

On distributions that ship RT or low-latency kernels (often for audio or trading), selecting them can dramatically reduce scheduling latency at some cost to raw throughput.

Scheduler Runtime Tunables

Runtime knobs live mostly under /proc/sys/kernel/ as sched_* entries (newer kernels move several of them to debugfs under /sys/kernel/debug/sched/). Examples (names vary by kernel version/distribution):

- sched_min_granularity_ns: minimum slice a task runs before it can be preempted
- sched_latency_ns: target period over which every runnable task gets a slice
- sched_wakeup_granularity_ns: how aggressively waking tasks preempt running ones
- sched_migration_cost_ns: how reluctant the scheduler is to move tasks between cores

You can experiment:

cat /proc/sys/kernel/sched_min_granularity_ns
echo 3000000 | sudo tee /proc/sys/kernel/sched_min_granularity_ns

But:

- These names change between kernel versions; the EEVDF rework in 6.6+ dropped some of them entirely.
- Gains are usually small and workload-specific, and wrong values hurt interactivity or throughput.
- Values written via /proc do not persist across reboots without sysctl configuration.

In practice, for most admins:

- Leave scheduler internals at their defaults.
- Reach for nice, cgroup weights/quotas, and affinity first; they are better targeted and easier to reason about.

Application-Level CPU Tuning

Many performance gains come from how applications use CPU:

- Thread and worker pool sizing
- Lock contention and synchronization overhead
- Algorithmic complexity and memory access patterns

Reducing Context Switching and Overheads

Avoid spawning excessive threads or processes:

- More runnable threads than CPUs means constant context switching and cache churn.
- A pool sized near the CPU count (somewhat larger for I/O-bound work) is a common starting point.

Tune application-level settings:

- Worker and thread counts in web servers, databases, and language runtimes
- Batch sizes and queue depths that amortize per-item overhead

Benchmark with different settings rather than assuming more threads = more performance.
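A sketch of such a benchmark loop, assuming a hypothetical program that takes its worker count as a trailing argument (substitute your own command and flag, e.g. `./myapp --threads`):

```shell
# Sketch: time a workload at several worker counts and print wall time for each.
# First argument is the command prefix (word-split on purpose), the rest are counts.
bench_threads() {
  cmd="$1"; shift
  for n in "$@"; do
    # `time -p` writes to stderr; capture it and keep the "real" seconds.
    t=$( { time -p $cmd "$n" >/dev/null; } 2>&1 | awk '/^real/ { print $2 }' )
    echo "threads=$n real=${t}s"
  done
}

# Example with a hypothetical app:
#   bench_threads "./myapp --threads" 1 2 4 8
```

Plotting the results usually shows a knee where adding threads stops helping; size the pool there, not at the maximum.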

Optimizing Workload Characteristics

Where you have control over the software:

- Choose better algorithms and data structures before micro-tuning.
- Cache expensive results instead of recomputing them.
- Batch small operations; avoid busy-wait loops.
- Use compiler optimizations and profile the hot paths before hand-tuning them.

These changes can drastically cut CPU time and are often more impactful than OS-level tweaks.

Benchmarking and Validation

Any CPU tuning must be validated to avoid “cargo cult” optimizations.

Establish a Baseline

Before changes:

- Run a representative workload long enough to get stable numbers.
- Repeat runs to understand normal variance.

Record:

- CPU utilization breakdown (user/system/iowait/steal)
- Latency percentiles (p50/p95/p99) and throughput
- Load average, run queue length, and context-switch rate
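Latency percentiles are easy to compute from raw measurements. A sketch, assuming a file with one latency value per line (nearest-rank method):

```shell
# Sketch: print p50/p95/p99 from a file containing one numeric latency per line.
percentiles() {
  sort -n "$1" | awk '
    { v[NR] = $1 }                       # collect sorted values
    END {
      printf "p50=%s p95=%s p99=%s\n",
        v[int(NR * 0.50 + 0.5)],         # nearest-rank index for each percentile
        v[int(NR * 0.95 + 0.5)],
        v[int(NR * 0.99 + 0.5)]
    }'
}

# Usage:
#   percentiles latencies.txt
```

Compare these percentiles, not averages, before and after each tuning step; tail latency is where regressions hide.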

Apply One Change at a Time

To understand impact:

- Change one knob, re-run the same workload, and compare against the baseline.
- Keep notes on what was changed, when, and why.

Avoid:

- Bundling several changes and guessing which one mattered
- Comparing runs taken under different load conditions

Watch for Regressions

After tuning, monitor:

- Tail latency, not just averages
- Starvation of background tasks (cron jobs, backups, logging)
- Thermal throttling and power draw after frequency changes

Be ready to revert:

- Keep the previous values recorded, apply settings through config management where possible, and roll back anything that does not show a clear win.

Putting It Together: Common CPU Tuning Scenarios

Scenario 1: Latency-Sensitive Web API on Bare Metal

Typical steps:

- Set the performance governor to avoid frequency ramp-up latency.
- Dedicate cores and raise the service's CPU share via a systemd override:

  [Service]
  CPUAffinity=0-7
  CPUWeight=1000

- Keep batch jobs off those cores with nice, affinity, or CPUQuota.

Scenario 2: Multi-Tenant VM Host

Goals: fairness, avoid noisy neighbors.

Typical steps:

- Give each guest or tenant slice a CPU weight matching its entitlement.
- Apply CPUQuota caps where a tenant must never exceed its budget.
- Monitor steal time from inside guests as the overcommit signal.

Scenario 3: High-Throughput Batch Processing

Goals: maximize throughput, tolerate higher latency.

Typical steps:

- Use every logical CPU; SMT usually helps throughput-bound work.
- Run jobs at positive nice or low CPU weight so system tasks stay responsive.
- Keep turbo enabled and let the governor scale frequencies.

Safety and Operational Considerations

- Test on non-production systems first; RT priorities and bad affinity masks can make a host unresponsive.
- Persist deliberate changes via sysctl files or systemd overrides; values echoed into /proc or /sys vanish at reboot.
- Re-validate after kernel upgrades, since tunable names and defaults change.
