
7.5.2 Memory management

Big Picture: What Linux Memory Management Actually Does

Linux memory management is about deciding:

- how virtual addresses are mapped onto limited physical memory,
- when to allocate physical pages and when to reclaim them,
- what to keep in RAM as cache and what to evict or swap out,
- how to share memory safely between processes.

You'll see the same ideas repeated at different levels: in hardware (TLBs and page tables), in the kernel (allocators, page cache, reclaim), and in userspace (malloc implementations managing memory within a process).

This chapter looks at how Linux implements those ideas internally and how you can observe and influence them.


Virtual Memory and Address Spaces

Each process in Linux sees its own virtual address space (e.g. on x86_64, up to 128 TB of virtual addresses, typically far more than the physical RAM available).

Key points:

- Virtual addresses are translated to physical addresses via per-process page tables.
- Processes are isolated: one process cannot touch another's memory through ordinary loads and stores.
- Allocation is lazy: a mapping reserves address space, but physical pages are usually assigned only on first access.

User vs Kernel Address Space

On most 64-bit Linux systems:

- the lower half of the address space belongs to userspace,
- the upper half is reserved for the kernel and is shared across all processes.

User processes cannot directly access kernel space; transitions happen via syscalls, interrupts, etc.

The kernel describes each process's virtual address layout using the mm_struct and VMAs (virtual memory areas):

- mm_struct is the per-process top-level descriptor of the address space,
- each VMA (struct vm_area_struct) covers a contiguous virtual range with the same permissions and backing (a file or anonymous memory).

Each VMA has flags like:

- VM_READ, VM_WRITE, VM_EXEC — access permissions,
- VM_SHARED — shared vs. private (copy-on-write) mapping,
- VM_GROWSDOWN — automatically growing stacks.

Paging, TLBs, and Page Faults

Pages and Page Frames

- Virtual memory is divided into fixed-size pages (4 KiB by default on x86_64).
- Physical memory is divided into page frames of the same size; page tables map pages to frames.

There are also huge pages:

- 2 MiB and 1 GiB pages on x86_64,
- which need far fewer page table and TLB entries for the same amount of memory.

TLB (Translation Lookaside Buffer)

Hardware caches recent address translations in a TLB:

- a TLB hit avoids walking the page tables,
- a TLB miss triggers a page table walk (done in hardware on x86_64),
- changing mappings may require TLB flushes, and on multi-CPU systems TLB shootdowns to invalidate stale entries on other CPUs.

Page Faults

When a process accesses a virtual address that isn't currently mapped, the CPU raises a page fault and hands control to the kernel:

- Minor fault: the page is already in memory (e.g. in the page cache) and only a page table entry needs to be set up.
- Major fault: the data must first be read from disk (a file or swap), which is orders of magnitude slower.
- Invalid access: no VMA covers the address, or permissions are violated; the process receives SIGSEGV.

Userspace tools like top, ps, pidstat, perf can show minor/major page fault counts.
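As a rough illustration, a process can also inspect its own fault counters with getrusage(2). The sketch below (assuming Linux and glibc) maps an anonymous region, touches it, and prints the minor/major fault counts along the way:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

static void print_faults(const char *label) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    /* ru_minflt: minor faults, ru_majflt: major faults */
    printf("%s: minor=%ld major=%ld\n", label, ru.ru_minflt, ru.ru_majflt);
}

int main(void) {
    size_t len = 16 * 1024 * 1024; /* 16 MiB */
    print_faults("before mmap");

    /* Anonymous mapping: address space is reserved, but no
     * physical pages are allocated yet (demand paging). */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;
    print_faults("after mmap (no faults yet)");

    /* Touching each page triggers one minor fault per page. */
    memset(p, 1, len);
    print_faults("after touching pages");

    munmap(p, len);
    return 0;
}
```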


Physical Memory: Zones, Nodes, and `struct page`

To abstract over diverse hardware, Linux introduces three layers:

- NUMA nodes — groups of CPUs with their own local memory,
- memory zones — subdivisions of a node by physical address range and hardware constraints,
- struct page — a descriptor for every physical page frame.

NUMA Nodes

On NUMA (Non-Uniform Memory Access) systems:

- each node consists of CPUs and the memory directly attached to them,
- accessing local memory is faster than accessing memory on a remote node,
- the kernel prefers to satisfy allocations from the node the requesting thread runs on.

NUMA-aware allocation interfaces exist in both the kernel and userspace (e.g. numactl, libnuma).

Memory Zones

Within each node, memory is divided into zones (logical groupings by physical address range and hardware constraints).

Typical zones:

- ZONE_DMA / ZONE_DMA32 — memory reachable by devices with limited DMA addressing (e.g. below 4 GB for ZONE_DMA32),
- ZONE_NORMAL — ordinary memory used for most allocations,
- ZONE_MOVABLE — pages that can be migrated, which helps compaction and memory hot-remove,
- ZONE_HIGHMEM — 32-bit systems only: memory not permanently mapped into the kernel's address space.

Allocations use GFP flags (e.g. GFP_KERNEL, GFP_USER) that indicate which zones and constraints apply.

The `struct page` Abstraction

Each physical page frame is described by a struct page, which tracks:

- reference counts and state flags (dirty, locked, on an LRU list, ...),
- what the frame is used for (page cache, anonymous memory, slab, ...),
- a link back to its mapping (file or anonymous region).

Most kernel code never deals with "bare" physical addresses directly; it works via these struct page objects.


Allocators: From Bytes to Pages and Slabs

Linux uses different allocators layered on top of each other:

  1. Buddy allocator — manages pools of pages
  2. Slab subsystem — manages objects inside pages
  3. Per-CPU caches — reduce contention on global structures

Buddy Allocator

Manages memory in powers of two:

- allocations are blocks of 2^order contiguous pages (order 0 = a single page),
- a free block can be split into two "buddies" of half the size,
- when both buddies become free again, they are merged back into a larger block.

Key properties:

- allocation and freeing are fast,
- coalescing buddies limits external fragmentation,
- large physically contiguous allocations still become difficult once memory is fragmented.

The GFP flags you see in kernel code influence which zones may be used, how reclaim behaves, and what the allocation is allowed to do (e.g. whether it may block or dip into emergency reserves).

Slab / SLUB / SLOB Allocators

Above the buddy allocator, Linux uses slab-based allocators for small, frequently allocated objects:

- a slab cache packs many objects of one fixed size into a few pages,
- common kernel objects (inodes, dentries, network buffers) and the generic kmalloc size classes are all backed by slab caches.

Historically three implementations existed — SLAB (the classic design), SLUB, and SLOB (minimal, for tiny systems) — with SLUB the default on most modern distros; recent kernels have dropped SLOB and SLAB entirely.

slabtop lets you inspect kernel slab usage from userspace.
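On the kernel side, creating a dedicated slab cache looks roughly like the following module sketch (the object type my_obj and cache name are hypothetical; error handling is abbreviated):

```c
#include <linux/module.h>
#include <linux/slab.h>

struct my_obj {
    int id;
    char name[32];
};

static struct kmem_cache *my_cache;

static int __init my_init(void)
{
    /* One cache, many fixed-size objects packed into slabs. */
    my_cache = kmem_cache_create("my_obj_cache", sizeof(struct my_obj),
                                 0, SLAB_HWCACHE_ALIGN, NULL);
    if (!my_cache)
        return -ENOMEM;

    struct my_obj *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
    if (obj) {
        obj->id = 1;
        kmem_cache_free(my_cache, obj);
    }
    return 0;
}

static void __exit my_exit(void)
{
    kmem_cache_destroy(my_cache);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");
```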


Process Memory Layout: Heap, Stack, and Mappings

Inside a process's virtual address space, typical regions include:

- program text (code) and read-only data,
- initialized data and BSS,
- the heap, grown via brk()/sbrk() or anonymous mmap(),
- memory-mapped files and shared libraries,
- thread stacks (the main stack grows downward on demand).

The kernel doesn't care about C-level notions like "heap" or "stack"; it just knows:

- a set of VMAs, each with permissions and a backing (file or anonymous),
- which pages inside them are currently populated.

Userspace allocators (glibc malloc, jemalloc, tcmalloc, etc.) decide:

- how to carve these regions into individual allocations,
- when to request more memory from the kernel (brk or mmap) and when to give it back,
- how to reuse freed memory to limit fragmentation.

/proc/<pid>/maps shows VMAs; /proc/<pid>/smaps gives detailed per-VMA stats.
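A quick way to see this layout is to print addresses from different regions and compare them with the kernel's VMA view. A minimal sketch:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int global_var = 42;             /* data segment */

int main(void) {
    int stack_var = 0;           /* stack */
    void *heap_ptr = malloc(64); /* heap (or an mmap'd arena) */

    printf("code : %p\n", (void *)main);
    printf("data : %p\n", (void *)&global_var);
    printf("heap : %p\n", heap_ptr);
    printf("stack: %p\n", (void *)&stack_var);

    /* Compare with the kernel's view of the VMAs: */
    char cmd[64];
    snprintf(cmd, sizeof(cmd), "cat /proc/%d/maps", getpid());
    system(cmd);

    free(heap_ptr);
    return 0;
}
```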


Copy-on-Write (CoW) and Fork

Linux optimizes memory usage during fork() using copy-on-write:

- parent and child initially share all physical pages,
- the shared pages are marked read-only in both page tables,
- the first write by either process faults, and the kernel copies just that one page.

This allows:

- fork() to be fast even for processes with huge address spaces,
- the common fork()+exec() pattern to avoid copying memory that would be discarded anyway.

CoW is also used by:

- private file mappings (MAP_PRIVATE),
- KSM (Kernel Samepage Merging), which deduplicates identical pages and breaks the sharing on write.
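The semantics are easy to demonstrate: after fork(), writes in the child are invisible to the parent, because the shared page is copied on the first write. A small sketch:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char *buf = malloc(4096);
    strcpy(buf, "original");

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the write triggers a CoW fault; the kernel
         * copies the page, so the parent's copy is untouched. */
        strcpy(buf, "child wrote this");
        printf("child sees : %s\n", buf);
        exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent sees: %s\n", buf); /* still "original" */
    free(buf);
    return 0;
}
```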

Page Cache and Disk I/O

Linux treats otherwise unused RAM as a cache for file data to speed up I/O.

Page Cache

Properties:

- regular file reads and writes go through the page cache; repeated reads are served from RAM,
- mmap()ed files share the same cached pages,
- written ("dirty") pages are flushed to disk later by writeback,
- the cache shrinks automatically under memory pressure, so low "free" memory is usually not a problem in itself.

Writeback

The kernel's writeback subsystem:

- tracks dirty pages per file and per backing device,
- flushes them in the background based on their age and the amount of dirty memory,
- throttles writers when too much memory becomes dirty.

Tunable parameters in /proc/sys/vm/ (e.g. dirty_ratio, dirty_background_ratio) influence when writeback kicks in.
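Applications that need stronger guarantees than "written back eventually" can force writeback themselves. A minimal sketch using fdatasync(2):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return 1;

    /* write() only dirties page cache pages; the data is not
     * necessarily on disk when write() returns. */
    const char msg[] = "important record\n";
    if (write(fd, msg, sizeof(msg) - 1) < 0) return 1;

    /* Force writeback of the file's data (fsync() would also
     * flush metadata such as the modification time). */
    if (fdatasync(fd) < 0) return 1;

    close(fd);
    return 0;
}
```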


Reclaim, Swapping, and Memory Pressure

Reclaimable Memory

When the system is low on free pages, the kernel tries to reclaim memory:

- clean page cache pages can simply be dropped (they can be re-read from disk),
- dirty pages must be written back first,
- reclaimable slab caches (dentries, inodes) can be shrunk,
- anonymous pages can only be reclaimed by swapping them out.

The reclaim logic uses:

- LRU-style lists separating active from inactive pages, and anonymous from file-backed pages,
- kswapd, a per-node kernel thread that reclaims in the background before memory runs out,
- direct reclaim, where an allocating task has to reclaim memory itself and stalls while doing so.

Swapping

If reclaiming caches isn't enough, Linux may use swap:

- rarely used anonymous pages are written out to a swap device or swap file,
- the freed page frames can then back more useful data.

Swap is controlled by:

- vm.swappiness, which biases reclaim between page cache and anonymous memory,
- per-device priorities set at swapon time.

Swapping is essential for:

- surviving transient memory spikes without killing processes,
- pushing genuinely cold anonymous pages out of RAM,
- hibernation (suspend-to-disk), which stores the memory image in swap space.

But excessive swapping causes thrashing and poor performance.
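Latency-sensitive programs sometimes opt critical buffers out of swapping entirely with mlock(2), subject to RLIMIT_MEMLOCK. A minimal sketch:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    static char secret[4096]; /* e.g. key material */

    /* Pin the page in RAM: it will not be swapped out,
     * so it cannot end up on disk via the swap device. */
    if (mlock(secret, sizeof(secret)) != 0) {
        perror("mlock");
        return 1;
    }

    memset(secret, 0x42, sizeof(secret));
    /* ... use the buffer ... */

    munlock(secret, sizeof(secret));
    return 0;
}
```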


Overcommit and the OOM Killer

Linux allows memory overcommit by default:

- malloc() and mmap() can succeed even if the total of all allocations exceeds RAM plus swap,
- physical pages are only committed when the memory is actually touched.

The overcommit behavior is tunable via:

- /proc/sys/vm/overcommit_memory (0 = heuristic, 1 = always overcommit, 2 = strict accounting),
- /proc/sys/vm/overcommit_ratio (the fraction of RAM counted in strict mode).
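You can observe this first-hand: with the default heuristic, an allocation far larger than physical RAM often succeeds, and memory is consumed only as pages are touched. A sketch (deliberately touching only a small part of the allocation):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    /* With default overcommit this often succeeds even on a
     * machine with far less than 64 GiB of RAM + swap. */
    size_t len = 64UL * 1024 * 1024 * 1024;
    char *p = malloc(len);
    if (!p) {
        puts("allocation refused (strict overcommit?)");
        return 1;
    }
    printf("got %zu GiB of virtual memory at %p\n",
           len >> 30, (void *)p);

    /* Only now do physical pages get committed, one per
     * touched page. Touch just the first 16 MiB. */
    memset(p, 0, 16 * 1024 * 1024);
    puts("touched 16 MiB; compare RSS vs. VSZ in ps/top");

    free(p);
    return 0;
}
```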

OOM (Out-Of-Memory) Killer

When the kernel cannot satisfy an allocation even after reclaim and swap, it may trigger the OOM killer:

- it computes a "badness" score for each process, favoring large, non-essential ones,
- the selected victim is killed with SIGKILL to free its memory.

Files:

- /proc/<pid>/oom_score — the process's current score,
- /proc/<pid>/oom_score_adj — an adjustment from -1000 (never kill) to +1000 (prefer to kill).

Services like databases often set:

- oom_score_adj to a strongly negative value for the main process, so that auxiliary processes are sacrificed first.

The kernel logs OOM events; check dmesg or journalctl -k.
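Adjusting the score from within a process is a simple file write (lowering it below 0 requires CAP_SYS_RESOURCE). A sketch:

```c
#include <stdio.h>

int main(void) {
    /* Make this process a less attractive OOM victim.
     * Negative values need CAP_SYS_RESOURCE / root. */
    FILE *f = fopen("/proc/self/oom_score_adj", "w");
    if (!f) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "-500\n");
    fclose(f);
    return 0;
}
```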


Kernel Memory vs User Memory

The kernel has its own memory needs independent of processes:

- page tables, slab caches, network buffers, driver and filesystem state.

Unlike userspace:

- most kernel memory cannot be swapped out,
- allocation failures must be handled explicitly; there is no transparent overcommit safety net.

Allocation APIs:

- kmalloc()/kfree() — small, physically contiguous allocations,
- vmalloc()/vfree() — larger, virtually contiguous (but possibly physically scattered) allocations,
- alloc_pages() — raw page allocations straight from the buddy allocator,
- kmem_cache_alloc() — allocations from a dedicated slab cache.

Kernel memory leaks or fragmentation can starve the system even if user processes look small.
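In kernel code, the choice between these APIs looks roughly like this (a sketch, not a complete driver):

```c
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/gfp.h>

static int example_allocations(void)
{
    /* Small object, physically contiguous, may sleep: */
    void *buf = kmalloc(256, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;

    /* Large buffer where physical contiguity is not needed: */
    void *big = vmalloc(4 * 1024 * 1024);
    if (!big) {
        kfree(buf);
        return -ENOMEM;
    }

    /* Two physically contiguous pages from the buddy allocator: */
    struct page *pages = alloc_pages(GFP_KERNEL, 1); /* order 1 */
    if (pages)
        __free_pages(pages, 1);

    vfree(big);
    kfree(buf);
    return 0;
}
```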


NUMA-Aware Memory Management

On NUMA machines, memory locality matters: a thread running on node 0 that constantly accesses memory attached to node 1 pays a latency and bandwidth penalty.

Policies:

- default — allocate from the node the thread is currently running on (local allocation),
- bind — restrict allocations to a given set of nodes,
- interleave — spread pages round-robin across nodes (useful for large shared structures),
- preferred — try one node first, fall back to others.

Key concepts:

- first touch: a page is placed on the node of the CPU that first writes it,
- automatic NUMA balancing: the kernel can migrate pages (and tasks) over time to improve locality.

Tools like numastat and numactl --hardware show NUMA-related memory statistics.
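From userspace, libnuma wraps the underlying set_mempolicy()/mbind() syscalls. A minimal sketch (link with -lnuma; assumes a NUMA-capable kernel):

```c
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        puts("no NUMA support on this system");
        return 1;
    }

    size_t len = 8 * 1024 * 1024;

    /* Allocate memory bound to node 0 ... */
    void *local = numa_alloc_onnode(len, 0);
    /* ... or interleaved across all allowed nodes. */
    void *spread = numa_alloc_interleaved(len);

    if (local) memset(local, 0, len);
    if (spread) memset(spread, 0, len);

    numa_free(local, len);
    numa_free(spread, len);
    return 0;
}
```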


Huge Pages and THP

Memory management at 4 KiB page granularity has overhead:

- large working sets need many TLB entries, causing TLB misses,
- the page tables themselves consume memory,
- millions of individual page faults add up.

Huge pages (2 MiB or 1 GiB on x86_64) reduce all three at the cost of coarser granularity.

Transparent Huge Pages (THP)

With THP, the kernel transparently backs eligible anonymous memory with huge pages, collapsing and splitting them as needed.

Control:

- /sys/kernel/mm/transparent_hugepage/enabled — always, madvise, or never,
- madvise(MADV_HUGEPAGE) / madvise(MADV_NOHUGEPAGE) as per-mapping hints.
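In madvise mode, an application must opt in per mapping. A sketch:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 64 * 1024 * 1024; /* multiple of 2 MiB */

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;

    /* Hint: back this range with transparent huge pages
     * (honored when THP is in "madvise" or "always" mode). */
    if (madvise(p, len, MADV_HUGEPAGE) != 0)
        perror("madvise");

    memset(p, 0, len);
    /* AnonHugePages in /proc/self/smaps shows the result. */
    munmap(p, len);
    return 0;
}
```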

Explicit Huge Pages

Applications that want fine-grained control can:

- reserve a huge page pool at boot or runtime via vm.nr_hugepages,
- mount hugetlbfs and map files from it,
- pass MAP_HUGETLB to mmap() for anonymous huge page mappings.

Benefit: more predictable behavior and a strictly reserved huge page pool (allocations fail cleanly instead of silently falling back to small pages).
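A minimal MAP_HUGETLB sketch (fails with ENOMEM unless huge pages have been reserved, e.g. via sysctl vm.nr_hugepages=64):

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 2 * 1024 * 1024; /* one 2 MiB huge page */

    /* Explicit huge page mapping from the hugetlb pool. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
                   -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)"); /* no pool reserved? */
        return 1;
    }

    memset(p, 0, len);
    munmap(p, len);
    return 0;
}
```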


Zswap, Zram, and Compressed Memory

To extend effective RAM without hitting disk as often, Linux can compress pages.

Zswap

Zswap is a compressed cache in front of the swap device: pages on their way to swap are compressed and kept in a RAM pool, and only written to the actual swap device when the pool fills up.

Enabled via kernel parameters (e.g. zswap.enabled=1) and tunable through /sys/module/zswap/parameters/*.

Zram

Zram creates a compressed block device in RAM (e.g. /dev/zram0), most commonly used as a swap device; "swapped out" pages then stay in RAM in compressed form, requiring no disk at all.

Observing Memory Internals from Userspace

Linux exposes a lot of memory information in /proc and /sys:

- /proc/meminfo — system-wide totals (MemTotal, MemAvailable, Cached, Dirty, ...),
- /proc/vmstat — detailed VM event counters,
- /proc/<pid>/status, /proc/<pid>/maps, /proc/<pid>/smaps — per-process views.

Common tools:

- free, vmstat — quick system-wide overviews,
- top, htop, ps — per-process resident and virtual sizes,
- pidstat, perf — page fault rates and deeper profiling,
- smem — proportional set size (PSS) accounting for shared memory.

Interpreting these outputs effectively is a core skill for performance tuning and debugging.
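Programs can read the same data directly; for example, a sketch that extracts MemAvailable from /proc/meminfo:

```c
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) return 1;

    char line[256];
    while (fgets(line, sizeof(line), f)) {
        long kb;
        /* Lines look like: "MemAvailable:    1234567 kB" */
        if (sscanf(line, "MemAvailable: %ld kB", &kb) == 1) {
            printf("MemAvailable: %.1f MiB\n", kb / 1024.0);
            break;
        }
    }
    fclose(f);
    return 0;
}
```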


Memory Control Groups (Overview Only)

Cgroups v2 (and v1 memory controller) provide resource control over memory:

Key ideas:

The detailed mechanics of cgroups are covered elsewhere; the important point here is that many memory management decisions now operate both globally and per-cgroup.
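Setting a limit is just a file write into the cgroup filesystem. A sketch, assuming a v2 hierarchy mounted at /sys/fs/cgroup and a hypothetical group named demo:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical cgroup "demo"; create it first with
     * mkdir /sys/fs/cgroup/demo (requires root). */
    FILE *f = fopen("/sys/fs/cgroup/demo/memory.max", "w");
    if (!f) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "536870912\n"); /* 512 MiB hard limit */
    fclose(f);
    return 0;
}
```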


Summary

Linux memory management revolves around:

- virtual address spaces built from VMAs and page tables,
- demand paging, page faults, and TLBs,
- layered allocators (buddy, slab, per-CPU caches),
- the page cache and writeback,
- reclaim, swapping, overcommit, and the OOM killer,
- NUMA awareness, huge pages, and compressed memory,
- per-cgroup limits layered on top of the global mechanisms.

Understanding these mechanisms is crucial for advanced performance analysis, debugging subtle memory issues, and building efficient, reliable Linux systems at scale.
