Kahibaro

7.3.2 Memory tuning

Understanding the Role of Memory in Performance

Memory tuning focuses on how Linux uses RAM, swap, and caches to keep applications responsive and the system stable under load. Good memory tuning does not simply try to “use less RAM.” Instead, it aims to use RAM in a way that reduces latency, avoids unnecessary swapping, and matches the workload characteristics.

Linux treats free memory as wasted opportunity. It actively fills unused RAM with caches to accelerate disk access, while still being able to reclaim it when applications need more space. Memory tuning is about influencing these policies and validating their effects with measurements.

Identifying Memory Bottlenecks

Before changing any settings, you must confirm that memory is the real source of performance issues. CPU or disk bottlenecks can look like memory problems, so you should rely on multiple indicators.

You will typically look for some combination of high memory utilization, frequent swapping, and increasing latency. If processes are killed by the Out Of Memory (OOM) killer, the system is clearly under severe memory pressure.

To confirm a memory bottleneck, examine current usage, paging activity, and how caches behave when the workload runs. If you only see large caches but little swap activity and no application slowdowns, you probably do not have a real memory problem.

Always measure before and after changing memory settings. Never apply tuning changes blindly in production.

Key Memory Metrics and Tools

Several standard tools help you observe memory behavior. These tools do not change configuration. They summarize aspects like total and free RAM, buffers and cache, and swap activity. Advanced performance tools are covered elsewhere, so here we focus on what is specific to memory.

free is a quick overview tool. The modern output shows memory broken into total, used, free, shared, buff/cache, and available. The available column estimates how much memory can be given to new processes without swapping, which is more meaningful than the raw free field.
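For example, a quick check with free (the -h flag prints human-readable units):

```shell
# Human-readable memory summary. The "available" column, not
# "free", estimates what new processes can get without swapping.
free -h
```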

vmstat shows memory and paging in one view. The si and so columns show swap in and swap out per second, which are crucial when diagnosing whether the system is actively paging. High and sustained si and so usually indicate that the working set does not fit in RAM.
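A typical invocation samples at a fixed interval (in seconds) for a given number of reports:

```shell
# Report every 2 seconds, 5 samples. The first line is an
# average since boot, so judge paging from the later lines.
# Sustained non-zero "si"/"so" columns mean active swapping.
vmstat 2 5
```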

top and htop allow you to see which processes consume the most memory. The RES value corresponds to resident memory, the part of a process that is actually in RAM. The VIRT value includes all virtual mappings and may look large without necessarily being a problem.
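For a non-interactive snapshot, ps can sort by resident size directly:

```shell
# Top 5 processes by resident set size (RSS, in KiB), with
# virtual size (VSZ) shown alongside for comparison.
ps -eo pid,rss,vsz,comm --sort=-rss | head -n 6
```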

/proc/meminfo exposes detailed kernel memory statistics. You can read it directly or with grep to find fields such as MemTotal, MemFree, MemAvailable, Buffers, Cached, SwapTotal, and SwapFree. Data from this file is also used by higher level tools.
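For instance, the fields named above can be pulled out with grep:

```shell
# Extract the headline fields from /proc/meminfo (values in kB).
grep -E '^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree):' /proc/meminfo
```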

These metrics should be observed over time, ideally under realistic load. A single snapshot rarely tells the full story.

Page Cache and File System Caching

Linux uses part of RAM as a page cache for file data and metadata. This cache improves performance because reading from RAM is much faster than accessing physical disks or even SSDs. When you read a file, its contents are placed in the page cache. Later reads can be served from memory instead of the disk.

If applications free memory or close files, the kernel often keeps the corresponding pages cached as long as there is no tighter memory pressure. When other processes allocate more memory, the kernel can reclaim some cached pages. Cache is therefore “soft” memory usage. It speeds up I/O but is not required by applications.

You should avoid trying to keep the page cache artificially small. A large cache with low swap activity is usually a sign of a healthy system that is using available RAM effectively. Problems appear when cache must be aggressively reclaimed and swapped data must be frequently read back into RAM.

For testing purposes, you can temporarily drop caches using /proc/sys/vm/drop_caches. This is not a tuning knob for regular operation. It is mostly used in controlled benchmarking to compare cold cache and warm cache behavior.
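As a sketch, a cold-cache benchmark run might clear caches like this (root required; benchmarking only, per the warning below):

```shell
# Flush dirty pages to disk first so dropping caches is safe.
sync
# 1 = page cache, 2 = dentries and inodes, 3 = both.
echo 3 | sudo tee /proc/sys/vm/drop_caches
```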

Do not routinely clear the page cache on production systems. Dropping caches can degrade performance and hide real tuning issues.

Swap and Swappiness

Swap is a disk based extension of memory. It allows the kernel to move inactive pages from RAM to disk so that RAM can be used for other purposes. While this prevents immediate out of memory conditions, accessing swapped data is significantly slower than using RAM.

Swappiness controls the balance between reclaiming file cache and swapping out anonymous memory. It is a value from 0 to 100 (recent kernels accept values up to 200). A higher value means the kernel will more willingly move anonymous pages to swap and keep more cache in RAM. A lower value means the kernel will prefer to drop cache first and avoid swapping anonymous pages.

The current swappiness can be viewed in /proc/sys/vm/swappiness. You can adjust it temporarily with a command like:

sudo sysctl vm.swappiness=10

To make the change persistent, you can add a line to the sysctl configuration. The exact mechanism for persistent configuration is explained elsewhere, so here we only note that it exists.

Choosing a value depends on workload. Systems that run large in memory databases or applications that are sensitive to latency often prefer a lower swappiness value such as 1 or 10 to reduce anonymous page swapping. Workstations that benefit from larger file caches and run many background applications may perform better with a moderate value.

If you set swappiness too low and the system truly runs short of RAM, the kernel may have fewer choices and you may encounter the Out Of Memory killer more frequently. If you set it too high, the system might swap out application memory even when there is still page cache to reclaim, which can cause interactive lag.

Transparent Huge Pages

Transparent Huge Pages, often shortened to THP, allow the kernel to use larger memory pages automatically (typically 2 MB on x86-64) instead of the standard 4 KB pages. On some workloads, especially large in memory applications that access memory sequentially or in predictable patterns, huge pages can reduce TLB misses and CPU overhead, which can speed up memory intensive tasks.

THP behavior is controlled by files in /sys/kernel/mm/transparent_hugepage. There are usually modes such as always, madvise, and never. In always mode, the kernel actively tries to allocate huge pages everywhere possible. In madvise mode, applications must explicitly request huge pages via system calls. In never mode, THP is disabled.
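On most distributions the current mode can be read, and temporarily changed, like this (the active mode is shown in brackets):

```shell
# Show the current THP mode, e.g. "always [madvise] never".
cat /sys/kernel/mm/transparent_hugepage/enabled

# Switch to madvise until the next reboot (root required).
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```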

Many database vendors and performance guides recommend switching THP from always to madvise or never, because automatic huge page promotions can sometimes create latency spikes, especially when large pages are collapsed or split at runtime.

You can inspect and temporarily change the THP mode by echoing values into the appropriate control files. As with other kernel parameters, permanent changes typically involve configuration files or boot parameters. You should verify the effect using both performance benchmarks and application specific metrics.

THP can be useful when carefully validated but it is not a guaranteed improvement for all workloads. Memory tuning must consider whether the particular applications running on the system actually benefit from huge pages.

Overcommit and Out Of Memory Behavior

Linux allows processes to request more virtual memory than physically exists. This is called overcommit. The rationale is that many applications reserve more address space than they actually use. Overcommit makes better use of resources at the cost of some risk if all applications decide to use their full reservations at the same time.

Overcommit behavior is controlled by vm.overcommit_memory and vm.overcommit_ratio. The vm.overcommit_memory setting selects the policy: 0 applies a heuristic (the default), 1 always permits allocations, and 2 enforces strict accounting against a commit limit derived from swap space plus a vm.overcommit_ratio percentage of RAM.
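The current policy can be inspected directly from /proc:

```shell
# 0 = heuristic overcommit (default), 1 = always allow,
# 2 = strict accounting against the commit limit.
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio

# Under strict accounting the limit is roughly
# swap + RAM * overcommit_ratio / 100; the kernel reports it as:
grep -E '^Commit' /proc/meminfo
```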

In strict modes, allocations can fail if the system believes that they might not be backed by RAM and swap. This can protect servers from catastrophic memory exhaustion at the cost of occasional allocation failures. In permissive modes, the system may allow allocations that later cause genuine out of memory conditions if the workload spikes unexpectedly.

When the system runs completely out of memory, the Out Of Memory killer is invoked. The kernel chooses a process to kill in order to free memory. The selection is based on heuristics that consider memory usage and process importance. While this protects the system from a total stall, it is often disruptive. Tuning overcommit parameters and carefully sizing workloads can help reduce the risk of unexpected OOM events.

Memory tuning in this area focuses on choosing an overcommit policy that matches the reliability requirements of your workloads. Environments that host many untrusted or unpredictable applications may benefit from stricter settings, while controlled workloads with known memory behavior may tolerate more aggressive overcommit.

Controlling Application Memory Usage

System wide tuning is only part of memory optimization. Many memory issues are better solved by controlling the behavior of individual processes or classes of workloads.

On systems that use cgroups and modern init systems, you can restrict how much memory a service can consume. Memory limits at the cgroup level can prevent a single misbehaving process from exhausting RAM and causing swapping or OOM events for the whole system. You can choose to have excess memory usage result in the process being killed or throttled, depending on configuration.
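On a systemd-based system this can be sketched as follows; myapp.service and ./batch-job are placeholders for your own unit and command:

```shell
# Cap an existing service at 2 GiB (cgroup memory controller).
# With MemoryMax the service is OOM-killed if it exceeds the
# limit; MemoryHigh instead throttles and reclaims its memory.
sudo systemctl set-property myapp.service MemoryMax=2G

# Run an ad-hoc command under a temporary 512 MiB cap.
sudo systemd-run --scope -p MemoryMax=512M ./batch-job
```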

For interactive desktops or development machines, you might instead choose to monitor memory usage and adjust application settings manually. Many applications such as database servers, JVM based software, and cache servers have their own internal memory limits and cache sizes. Tuning those application level settings is often more effective than only changing kernel parameters.

Combined strategies use cgroup limits as a safety net while tuning the application configuration to keep memory usage within a predictable envelope under normal conditions.

NUMA Considerations on Multi Socket Systems

On systems with multiple CPU sockets, memory is divided into NUMA nodes, each attached to a particular socket. Accessing memory on the local node is faster than reaching a remote node across the interconnect, so a process that runs on one socket while its memory is allocated on another experiences higher memory access latency.

Memory tuning on NUMA systems must consider both CPU affinity and memory locality. Tools such as numactl can be used to bind processes to specific nodes or to enforce memory allocation policies per node. System services can be configured to stay within a specific NUMA node to reduce cross node traffic.
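With the numactl package installed, a sketch looks like this (./db-server is a placeholder for your own binary):

```shell
# Show node topology, CPUs per node, and free memory per node.
numactl --hardware

# Per-node allocation and miss statistics.
numastat

# Confine both CPU scheduling and memory allocation to node 0.
numactl --cpunodebind=0 --membind=0 ./db-server
```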

You can view NUMA statistics and topology to determine whether memory allocations are balanced or imbalanced. If one node is heavily used while another is mostly idle, overall performance may suffer even if total RAM is sufficient.

For memory intensive applications such as large databases, pinning both CPU and memory to the same node can significantly improve performance. However, strict pinning may reduce flexibility when other workloads need resources, so the tuning must always consider the full system usage pattern.

Measurement Driven Memory Tuning

Effective memory tuning is an iterative and measurement driven process. You start by gathering baseline metrics under normal and peak workloads. You then adjust one parameter or setting at a time, such as swappiness or THP mode, and observe the effects over a meaningful period.

The goal is to improve tangible outcomes such as response times, throughput, or the absence of swap storms and OOM events, not just to maximize free memory. Different workloads respond differently, so guidelines must be validated in each environment.

You also need to track long term behavior, because memory fragmentation, leaks, or gradual changes in load can alter the impact of your tuning decisions. Regular reviews of memory metrics, logs that indicate OOM events, and application performance indicators help ensure that your tuning remains appropriate as the system evolves.

Change only one significant memory tuning parameter at a time, and keep records of the old and new values and their observed effects.

With a disciplined, measurement focused approach, you can align Linux memory behavior with the specific needs of your workloads and maintain both performance and stability under varying conditions.
