Performance Tuning

Understanding Linux Performance

Performance tuning is about making a system do more useful work with the same (or fewer) resources, while staying stable and predictable. It is not just “making benchmarks faster”.

At a high level, tuning always follows this loop:

  1. Define the goal
    What do you care about?
    • Maximum throughput (requests/second, jobs/hour, MB/s)
    • Minimum latency (response time, jitter)
    • Capacity (how many users / VMs / containers)
    • Efficiency (work per watt, work per dollar)
  2. Measure the current state
    Use monitoring and profiling tools (covered in other chapters) to see:
    • Where time is spent
    • Which resources are saturated
    • How performance changes with load
  3. Form a hypothesis
    Examples:
    • “The CPU is saturated on core 0 by interrupt handling; spreading NIC IRQs across cores should help.”
    • “We’re I/O bound because of synchronous writes; enabling write‑back caching may help.”
    • “Context switches are high; reducing the number of processes may help.”
  4. Apply a change
    Change one thing at a time:
    • A kernel parameter
    • A scheduler setting
    • A service configuration
    • Hardware / topology layout
  5. Measure again
    Compare before/after:
    • If it improved the target metric, keep it.
    • If it didn’t, revert and try a different hypothesis.

This loop is the core of performance tuning regardless of subsystem (CPU, memory, disk, network, etc.). Later chapters in this section look at CPU/memory/disk tuning specifics; here we stay at the strategy and system‑wide level.
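
As a minimal sketch of steps 2 and 5, the snippet below times a workload several times and summarizes the results, so that a before/after comparison rests on more than a single run. The command `./run_benchmark.sh` is a hypothetical placeholder; point it at whatever actually exercises your workload.

```python
# Minimal before/after measurement helper (Python 3.8+).
# "./run_benchmark.sh" is a hypothetical placeholder for your own workload driver.
import statistics
import subprocess
import time

def measure(cmd, runs=10):
    """Run cmd several times and return per-run wall-clock times in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return samples

def summarize(label, samples):
    qs = statistics.quantiles(samples, n=20)   # qs[9] = median, qs[18] = 95th percentile
    print(f"{label}: median={qs[9]:.3f}s  p95={qs[18]:.3f}s  runs={len(samples)}")

# Measure the baseline, apply exactly one change, then run this again and compare.
summarize("baseline", measure(["./run_benchmark.sh"]))
```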

Key Performance Concepts

Throughput vs Latency vs Utilization

These three concepts often pull in different directions:

Typical relationships:
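
One way to see the tension is through a single‑server queueing model (M/M/1, used here purely as an illustration, not something the surrounding text prescribes): as utilization approaches 100%, average latency grows without bound even though throughput keeps inching up.

```python
# Average response time vs. utilization for an M/M/1 queue (illustrative model only).
service_time = 0.010  # 10 ms per request with no queueing (arbitrary example value)

for utilization in (0.50, 0.80, 0.90, 0.95, 0.99):
    avg_latency = service_time / (1 - utilization)   # W = S / (1 - rho)
    print(f"utilization {utilization:.0%}: average latency {avg_latency * 1000:.0f} ms")
```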

Bottlenecks and Amdahl’s Law

A bottleneck is the resource or code path that limits overall performance. Speeding up anything else yields little benefit until the bottleneck itself is addressed.

Amdahl’s Law quantifies this. If a fraction p of the total work benefits from a speed‑up of factor s, the overall speedup is:

$$
S_{\text{overall}} = \frac{1}{(1 - p) + \frac{p}{s}}
$$

For example, making 30% of the work 10× faster yields an overall speedup of only about 1.37×.

Implications for tuning:

This is why profiling and system‑wide tracing are more valuable than tweaking random kernel tunables.

Little’s Law and Queues

Whenever a resource is shared (CPU, disk, network, database), you essentially have:

$$
L = \lambda \times W
$$

Where:
  • L — the average number of requests in the system (being served plus waiting)
  • λ — the average arrival rate (e.g., requests per second)
  • W — the average time each request spends in the system (waiting plus service)

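A small worked example: if a service receives 200 requests per second and each request spends 50 ms in the system on average, about 10 requests are in flight at any moment; rearranged, a cap on concurrency implies a ceiling on the average time each request may take while still sustaining that arrival rate.

```python
# Little's Law: L = lambda * W (steady-state averages).
arrival_rate = 200      # lambda: requests arriving per second
time_in_system = 0.050  # W: average seconds each request spends in the system

in_flight = arrival_rate * time_in_system   # L: average requests in flight
print(f"average concurrency: {in_flight:.1f}")   # -> 10.0

# Rearranged: with at most 4 requests in flight, sustaining 200 req/s requires
# an average time in system of no more than L / lambda.
max_in_flight = 4
required_latency = max_in_flight / arrival_rate   # 0.02 s
print(f"required average latency: {required_latency * 1000:.0f} ms")
```
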
For performance tuning:

Your job includes:

A Systematic Tuning Workflow

1. Define the Workload

Performance depends heavily on what you’re running. Clarify:

Without this, you can’t meaningfully judge whether a change is an improvement.

2. Establish a Baseline

Before tuning, capture:

Keep this baseline somewhere versioned (Git, docs) so you can:
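
As one way to capture a baseline that can live in Git, the sketch below snapshots a few system‑wide numbers from /proc into a JSON file; the exact set of metrics and the file name are illustrative choices, not a prescribed format.

```python
# Capture a simple, versionable performance baseline (Linux; reads /proc only).
import json
import os
import time

def read_proc(path):
    with open(path) as f:
        return f.read().strip()

baseline = {
    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    "kernel": read_proc("/proc/version"),
    "loadavg": read_proc("/proc/loadavg"),
    "meminfo": read_proc("/proc/meminfo").splitlines()[:5],  # MemTotal, MemFree, ...
    "cpu_count": os.cpu_count(),
}

with open("baseline.json", "w") as f:   # commit this file alongside your configs
    json.dump(baseline, f, indent=2)
print(json.dumps(baseline, indent=2))
```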

3. Quick Health Checks

Before deep tuning, verify:

Often, fixing low‑hanging fruit yields large gains without complex tuning.
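
A few of these checks can be scripted; the sketch below flags an overloaded CPU, heavy swap usage, and a nearly full root filesystem using only the standard library. The thresholds are arbitrary example values.

```python
# Quick health checks (Linux). Thresholds are arbitrary example values.
import os
import shutil

load1, _, _ = os.getloadavg()
cpus = os.cpu_count() or 1
if load1 > 2 * cpus:
    print(f"WARN: 1-minute load {load1:.1f} is high for {cpus} CPUs")

# Swap usage from /proc/meminfo.
meminfo = {}
with open("/proc/meminfo") as f:
    for line in f:
        key, value = line.split(":")
        meminfo[key] = int(value.split()[0])   # values are in kB
swap_total, swap_free = meminfo["SwapTotal"], meminfo["SwapFree"]
if swap_total and (swap_total - swap_free) / swap_total > 0.5:
    print("WARN: more than half of swap is in use")

usage = shutil.disk_usage("/")
if usage.used / usage.total > 0.9:
    print("WARN: root filesystem is more than 90% full")
```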

4. Identify the Primary Bottleneck

For each major resource, answer:

Once you know which resource saturates first under load, it becomes your primary tuning target.
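
One increasingly common way to answer this is pressure stall information (PSI), available on kernels 4.20 and newer with PSI enabled; it reports, per resource, how much of the time tasks were stalled waiting on that resource. A minimal reader:

```python
# Read pressure stall information (PSI); requires Linux >= 4.20 with PSI enabled.
# Lines look like: "some avg10=1.23 avg60=0.80 avg300=0.40 total=123456"
for resource in ("cpu", "memory", "io"):
    try:
        with open(f"/proc/pressure/{resource}") as f:
            for line in f:
                kind, rest = line.split(maxsplit=1)        # "some" or "full"
                fields = dict(p.split("=") for p in rest.split())
                print(f"{resource:6s} {kind:4s} avg10={fields['avg10']}%")
    except FileNotFoundError:
        print(f"{resource}: PSI not available on this kernel")
```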

5. Plan and Prioritize Changes

Possible classes of changes:

System‑level tuning cannot fix fundamental design issues. If every request requires N synchronous disk writes, no amount of scheduler tuning will match a design that batches or avoids those writes.

6. Test, Validate, and Document

For each change:

Treat tuning like code: change‑controlled, reviewed, and reversible.

System‑Level Tuning Themes

Schedulers and Priorities

Linux uses multiple schedulers that all influence performance:

Tuning often involves:

Details of how to do this are covered in the CPU and Disk performance chapters, but here the main principle is:

Match scheduler behavior to workload needs: throughput vs latency, fairness vs priority.
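
At the process level, two of the simplest knobs are niceness and CPU affinity, both reachable from the standard library; the sketch below lowers the priority of the current process and pins it to two CPUs. The CPU numbers are arbitrary examples, and pinning only pays off when it matches your actual topology.

```python
# Adjust niceness and CPU affinity of the current process (Linux).
# CPU numbers below are arbitrary examples; choose them to match your topology.
import os

print("nice before:", os.nice(0))        # passing 0 just reads the current value
os.nice(10)                              # lower priority: be a better neighbour
print("nice after :", os.nice(0))

os.sched_setaffinity(0, {0, 1})          # pid 0 = this process; restrict to CPUs 0 and 1
print("allowed CPUs:", sorted(os.sched_getaffinity(0)))
```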

Caching and Locality

Most performance wins on modern systems come from better use of caches:

Core ideas:

From a tuning perspective:
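
To make locality concrete, here is a small, illustrative measurement (it assumes numpy is installed): the same array is summed through a sequential index and through a randomly permuted index. The data is deliberately larger than typical CPU caches, so the random pattern defeats caching and prefetching and usually runs several times slower.

```python
# Sequential vs. random memory access over the same data (assumes numpy).
import time
import numpy as np

n = 10_000_000                       # ~80 MB of float64, larger than typical CPU caches
data = np.random.rand(n)
seq_idx = np.arange(n)               # sequential, cache- and prefetch-friendly
rand_idx = np.random.permutation(n)  # random, poor locality

def timed_sum(indices):
    start = time.perf_counter()
    total = data[indices].sum()      # gather elements in the given order, then sum
    return time.perf_counter() - start

print(f"sequential access: {timed_sum(seq_idx):.3f} s")
print(f"random access:     {timed_sum(rand_idx):.3f} s")
```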

NUMA Considerations

On multi‑socket systems, memory is separated into NUMA nodes:

Performance tuning on NUMA systems often includes:

If you treat a NUMA machine like a uniform SMP, you may leave significant performance on the table for memory‑sensitive workloads.
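
A quick way to see the topology you are dealing with is sysfs; the sketch below lists each NUMA node with its CPUs and total memory. The node directories exist only on NUMA‑capable kernels, and single‑socket machines typically show just node0.

```python
# List NUMA nodes with their CPUs and memory (Linux, reads sysfs only).
import glob
import os

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node = os.path.basename(node_dir)
    with open(f"{node_dir}/cpulist") as f:
        cpus = f.read().strip()
    mem_total = "unknown"
    with open(f"{node_dir}/meminfo") as f:
        for line in f:
            if "MemTotal" in line:
                mem_total = line.split()[-2] + " kB"   # e.g. "Node 0 MemTotal: 32768 kB"
                break
    print(f"{node}: cpus={cpus}  MemTotal={mem_total}")
```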

Power Management vs Performance

Modern systems use power‑saving features:

Trade‑offs:

For performance‑critical systems:
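
To see where a machine currently sits on this trade‑off, check the active cpufreq governor; the sketch below reads it for every CPU. The cpufreq directory only exists where the subsystem is active, so it is often absent inside VMs and containers.

```python
# Report the cpufreq scaling governor per CPU (Linux; cpufreq must be active).
import glob
from collections import Counter

governors = Counter()
for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
    with open(path) as f:
        governors[f.read().strip()] += 1

if governors:
    for gov, count in governors.items():
        print(f"{count} CPUs using governor: {gov}")   # e.g. "powersave", "performance"
else:
    print("cpufreq not available (common in VMs and some containers)")
```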

Measuring and Thinking About Performance

1. Time Scales and Granularity

Performance issues can appear at:

Tune your measurement tools to:

Over‑aggregated metrics (e.g., 1‑minute averages) can hide:

2. Tail Latency

Real systems care about more than averages:

Tuning for tail latency often involves:
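
Tail behaviour only becomes visible once you compute percentiles rather than means; the sketch below does this for a synthetic set of latency samples in which a small fraction of slow requests barely moves the mean but dominates p99.

```python
# Mean vs. tail percentiles on synthetic latency samples.
import random
import statistics

random.seed(1)
# 98% of requests take ~10 ms, 2% take ~300 ms (synthetic, for illustration).
samples = [random.gauss(0.010, 0.002) for _ in range(9800)] + \
          [random.gauss(0.300, 0.050) for _ in range(200)]

qs = statistics.quantiles(samples, n=100)   # 99 cut points: qs[49]=p50, qs[94]=p95, qs[98]=p99
print(f"mean = {statistics.fmean(samples) * 1000:.1f} ms")
print(f"p50  = {qs[49] * 1000:.1f} ms")
print(f"p95  = {qs[94] * 1000:.1f} ms")
print(f"p99  = {qs[98] * 1000:.1f} ms")
```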

3. Holistic vs Local Optimization

Danger of local optimization:

Always ask:

Holistic tuning sometimes means intentionally slowing down one component to:

Practical Tuning Guidelines

General Principles

  1. Change one thing at a time
    Combining multiple tweaks makes it impossible to know which one helped or hurt.
  2. Prefer simple over clever
    Simple, well‑understood configurations are easier to debug and operate.
  3. Don’t over‑tune for a synthetic benchmark
    Benchmark‑friendly settings might not reflect real‑world usage.
  4. Automate and codify tuning
    Keep sysctl and service configs under version control and deploy them consistently (see the sketch after this list).
  5. Document intent
    Every non‑default parameter should have a comment:
    • What it does
    • Why it was changed
    • When it should be revisited
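
Principles 4 and 5 can be combined in a tiny drift check: keep the intended kernel parameters, with their rationale, in a version‑controlled dict and compare them against what the running system reports under /proc/sys. The specific parameters and values below are examples only, not recommendations.

```python
# Compare intended sysctl settings (version-controlled, with rationale) against the
# running system. The parameters and values below are examples, not recommendations.
DESIRED = {
    # key: (value, why it was set, when to revisit)
    "vm/swappiness": ("10", "favour page cache over swap on this host", "re-check after RAM upgrade"),
    "net/core/somaxconn": ("1024", "absorb bursts of new connections", "re-check if listener count changes"),
}

for key, (want, why, revisit) in DESIRED.items():
    try:
        with open(f"/proc/sys/{key}") as f:
            current = f.read().strip()
    except FileNotFoundError:
        print(f"{key}: not present on this kernel")
        continue
    status = "OK" if current == want else f"DRIFT (current={current}, want={want})"
    print(f"{key}: {status}  # {why}; {revisit}")
```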

When to Tune vs When to Scale

Tuning helps when:

Scaling (more or better hardware) might be better when:

Often, you do some tuning to use hardware efficiently, then scale out.

Safe vs Risky Changes

Relatively safe (when tested properly):

Riskier (require careful testing and rollback plans):

Always maintain a rollback procedure and test in non‑production environments first.

Performance Tuning in the Bigger Picture

Performance tuning isn't a one‑time activity; it’s part of:

By treating performance as a continuous, measured, and collaborative practice, you reduce firefighting and increase system reliability.

In the following chapters (CPU tuning, memory tuning, and disk/I/O optimization), these general principles will be applied to specific subsystems, with concrete tools and configuration examples.
