
7.3.4 Profiling tools

Overview

Profiling tools help you understand where time and resources are actually spent in your system and applications. In contrast to the broader monitoring focus of the parent chapter, here the goal is fine‑grained measurement for diagnosis and optimization.

This chapter focuses on the tools themselves and how to apply them; you won't find full performance theory here.

Types of Profiling

Before picking a tool, clarify what you're profiling: on-CPU time (hot code paths), off-CPU/blocked time, memory allocation and leaks, disk and block I/O, or network latency. Many tools cover several of these with different commands or options.

`perf`: General‑Purpose Kernel and CPU Profiler

perf is a standard profiling and tracing tool integrated with the Linux kernel. It works with hardware performance counters (cycles, cache misses, branches, etc.) and kernel tracepoints.

Installation and Setup

On common distros:
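
For example (package names are typical defaults, treat them as assumptions that vary by release):

sudo apt install linux-tools-common linux-tools-$(uname -r)   # Debian/Ubuntu
sudo dnf install perf                                          # Fedora/RHEL-family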

You often need debug symbols for meaningful function names:
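
How to get them differs per distro; as an illustration (package and command names below are assumptions):

sudo dnf debuginfo-install your_package    # Fedora/RHEL-family
sudo apt install your_package-dbgsym       # Debian/Ubuntu, needs the debug symbol archives enabled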

CPU Sampling (`perf record` / `perf report`)

  1. Profile a command:
   sudo perf record -g -- ./your_program --arg1 --arg2
  2. Profile an already running process (this samples the target PID for 30 seconds):
   sudo perf record -g -p <PID> -- sleep 30
  3. View the report:
   perf report

Key options:

  • -g: record call graphs (stack traces)
  • -F <hz>: sampling frequency in Hz (e.g. -F 99 as used below)
  • -p <PID>: attach to an existing process
  • -a: sample all CPUs system-wide

System‑Wide Sampling

To capture everything on the system:

sudo perf record -g -a -- sleep 10
sudo perf report

Flame Graphs from `perf`

perf integrates well with Flame Graphs, which make “hot paths” visually obvious.

  1. Generate folded stacks:
   sudo perf record -F 99 -a -g -- sleep 30
   sudo perf script > out.perf
  2. Use Brendan Gregg’s FlameGraph scripts (clone the repo):
   ./stackcollapse-perf.pl out.perf > out.folded
   ./flamegraph.pl out.folded > perf.svg

Open perf.svg in a browser: wide frames show where most CPU time is spent; stack height only reflects call depth.

perf Top‑Like View (`perf top`)

For live CPU usage by symbol:

sudo perf top

Other `perf` Subcommands

perf stat runs a command to completion and prints overall counts for the selected hardware and software events:

  perf stat -e cycles,instructions,cache-misses ./your_program

`ftrace` and `trace-cmd`: Kernel Function Tracing

ftrace is a low‑level kernel tracing framework; trace-cmd is a wrapper tool that makes it easier to use.

Basic `trace-cmd` Usage

Install:
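
Typical package names (assumptions; adjust for your distro):

sudo apt install trace-cmd   # Debian/Ubuntu
sudo dnf install trace-cmd   # Fedora/RHEL-family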

Trace sched and irq events for 10 seconds:

sudo trace-cmd record -e sched -e irq -- sleep 10
sudo trace-cmd report

Useful for investigating scheduler behavior (run-queue delays, migrations), wakeup latency, and interrupt activity.

Function Tracing with `ftrace` (sysfs Interface)

ftrace lives under /sys/kernel/debug/tracing (ensure debugfs is mounted):

sudo mount -t debugfs none /sys/kernel/debug
cd /sys/kernel/debug/tracing

Example: trace vfs_read only:

echo 0 | sudo tee tracing_on
echo vfs_read | sudo tee set_ftrace_filter
echo function | sudo tee current_tracer
echo 1 | sudo tee tracing_on
sleep 5
echo 0 | sudo tee tracing_on
sudo cat trace

Use this carefully; it can generate large trace logs on busy systems.
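
When you are done, reset the tracer, clear the filter, and empty the trace buffer; a minimal sketch using a root shell for the redirections:

sudo sh -c 'echo nop > current_tracer'    # back to the default (no) tracer
sudo sh -c 'echo > set_ftrace_filter'     # clear the function filter
sudo sh -c 'echo > trace'                 # clear the trace buffer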

eBPF and BCC / bpftrace

eBPF (extended BPF) allows powerful, low-overhead dynamic tracing and profiling inside the kernel. You can attach probes to kernel functions (kprobes/kretprobes), user-space functions (uprobes/uretprobes), static kernel tracepoints, and perf events.

BCC Tools: Ready‑Made Profilers

Install BCC (name varies by distro):
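
Typical package names (assumptions; on Debian/Ubuntu the tools also carry a -bpfcc suffix, e.g. profile-bpfcc):

sudo apt install bpfcc-tools   # Debian/Ubuntu
sudo dnf install bcc-tools     # Fedora/RHEL-family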

Useful BCC scripts include profile (CPU sampling), offcputime (blocked-time stacks), runqlat (scheduler run-queue latency), biolatency and biosnoop (block I/O), and the tcp* tools (tcpconnect, tcpretrans, and friends).

Example: CPU profile (system‑wide, 49 Hz, 10 seconds):

sudo profile -F 49 -d 10

Example: off‑CPU time (which stacks account for blocked time):

sudo offcputime -d 10

`bpftrace`: One‑Liners for Tracing

bpftrace offers an awk-like scripting language for ad-hoc one-liners and short tracing scripts.

Install:
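
Typical package names (assumptions; adjust for your distro):

sudo apt install bpftrace
sudo dnf install bpftrace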

Example: measure time spent in a user‑space function (uprobes):

sudo bpftrace -e '
uprobe:/usr/bin/myapp:myfunc {
  @start[tid] = nsecs;
}
uretprobe:/usr/bin/myapp:myfunc /@start[tid]/ {
  @time = hist((nsecs - @start[tid]) / 1000000);
  delete(@start[tid]);
}'

This builds a latency histogram (in ms) of myfunc.
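
For comparison, a genuine one-liner: counting system calls per process is a common bpftrace idiom (a sketch; press Ctrl-C to print the counts):

sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'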

eBPF tools are excellent for “black box” investigation when you can’t modify or rebuild code.

Memory Profiling Tools

`valgrind` / `massif` / `callgrind`

valgrind instruments programs for detailed memory behavior at the cost of heavy slowdown.

Install:
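
The package is usually just called valgrind:

sudo apt install valgrind
sudo dnf install valgrind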

Leak Checking (`memcheck`)

valgrind --leak-check=full --show-leak-kinds=all ./your_program

Provides detailed leak backtraces; useful in development, less so in production.

Heap Profiling (`massif`)

valgrind --tool=massif ./your_program
ms_print massif.out.<pid> | less

Shows heap usage over time and where peak usage occurs.

CPU Simulation (`callgrind`)

For function‑level CPU cost (without relying on hardware counters):

valgrind --tool=callgrind ./your_program
callgrind_annotate callgrind.out.<pid> | less

This is slower than perf but sometimes easier to interpret in development environments.

`heaptrack`

heaptrack records all allocations and provides GUI and CLI analysis.

Install (names vary):
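
Roughly (assumptions; the GUI may be a separate package or unavailable on some distros):

sudo apt install heaptrack heaptrack-gui   # Debian/Ubuntu
sudo dnf install heaptrack                 # Fedora-family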

Usage:

heaptrack ./your_program
heaptrack_print heaptrack.your_program.<pid>.zst | less

Or open the output file with heaptrack_gui for interactive analysis.

`jemalloc` / `tcmalloc` Profiling

Some alternative allocators (jemalloc, tcmalloc) expose built-in heap profiling: jemalloc via its MALLOC_CONF options (analyzed with jeprof), tcmalloc via the HEAPPROFILE environment variable (analyzed with pprof).

These are advanced but extremely powerful in long‑running servers.
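
A minimal sketch of jemalloc heap profiling, assuming a jemalloc build with profiling enabled (the library path below is illustrative):

# preload jemalloc and dump a final heap profile at exit
LD_PRELOAD=/usr/lib/libjemalloc.so.2 MALLOC_CONF=prof:true,prof_final:true ./your_program
jeprof --text ./your_program jeprof.*.heap | less   # summarize where allocations came from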

I/O and Block‑Level Profiling

`iostat`, `pidstat`, `iotop`

While not “profilers” in the strict sense, these tools are essential for correlating I/O with processes.

Install examples:
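
Typical packages (iostat and pidstat ship in sysstat; iotop is its own package):

sudo apt install sysstat iotop
sudo dnf install sysstat iotop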

Use these to validate: “Is the disk actually saturated?” before going deeper.
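
For example, the same invocations reused in the workflows later in this chapter:

iostat -x 1     # extended per-device stats, refreshed every second
pidstat -d 1    # per-process disk I/O, every second
sudo iotop      # interactive per-process I/O view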

Block I/O Latency via BCC/eBPF

biolatency (BCC):

sudo biolatency 1

Prints histograms of block I/O latency for each interval (here every second); add -D to break them out per device.

biosnoop shows individual requests with process names and latency.

These help differentiate genuine device-level latency (slow or saturated storage) from application-side problems such as excessive or poorly batched I/O.

Application‑Level Profilers

Different language ecosystems have their own profilers; this section only covers how they fit into a Linux tuning workflow.

C/C++ with `gprof` and Perf‑Aware Compilers

`gprof` (Legacy but sometimes useful)

  1. Compile with -pg:
   gcc -pg -O2 -o myprog myprog.c
  2. Run:
   ./myprog
  3. Analyze:
   gprof ./myprog gmon.out | less

gprof gives call graph and per‑function statistics but is less accurate than sampling tools in optimized builds.

Compiler Support for Perf

Modern compilers emit DWARF debug information and frame data; for better perf results, build with -g and consider -fno-omit-frame-pointer so that call stacks recorded with perf record -g resolve cleanly.
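
A minimal sketch (the flags shown are typical; adjust to your build system):

gcc -O2 -g -fno-omit-frame-pointer -o myprog myprog.c   # keep symbols and frame pointers
sudo perf record -g -- ./myprog
perf report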

Python Profiling

Built-ins include cProfile (a C-accelerated deterministic profiler), the slower pure-Python profile module, and tracemalloc for tracking allocations.

Run via command:

python3 -m cProfile -o stats.out your_script.py

View stats:

python3 -m pstats stats.out

For more advanced use (sampling, remote): look at tools like py-spy, scalene, yappi.
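
As an illustration of sampling a live interpreter without modifying it, py-spy (a third-party tool, installed separately, e.g. via pip) can be used roughly like this:

sudo py-spy top --pid <PID>                                 # live top-like view per function
sudo py-spy record -o pyspy.svg --pid <PID> --duration 30   # flame graph over 30 seconds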

Java, JVM Languages

Use JVM tools such as async-profiler, JDK Flight Recorder (driven via jcmd), and jmap/jstack for heap and thread snapshots.

Example (async-profiler):

./profiler.sh -d 30 -e cpu -f profile.svg <PID>

Open profile.svg for a flame graph.
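
If async-profiler is not at hand, JDK Flight Recorder (bundled with modern JDKs) can capture a similar recording; a rough sketch:

jcmd <PID> JFR.start duration=60s filename=recording.jfr   # 60 s recording written on completion

Open the resulting .jfr file in JDK Mission Control or a compatible viewer.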

Network and Latency Profiling

`perf` + Network Tracepoints

You can attach perf to networking tracepoints; e.g.:

sudo perf record -e net:net_dev_xmit -a -- sleep 10
sudo perf script

For more detailed packet‑level analysis use dedicated tracing (via eBPF tools, or tools in the Network Services / DevOps parts of the course).

eBPF Network Tools

BCC provides TCP-oriented tools such as tcpconnect, tcpaccept, tcplife, and tcpretrans.

Example:

sudo tcpretrans

Helps correlate packet loss, RTT, or congestion with observed application slowness.

GUI and Integrated Profiling Tools

While much of Linux performance work is CLI‑driven, certain use cases benefit from GUI tooling.

`sysprof` (GNOME)

Useful for GNOME apps and general system tracing.
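
The package is typically just sysprof (treat the names as assumptions):

sudo apt install sysprof
sudo dnf install sysprof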

Launch via:

sysprof

KDE / Qt: `kcachegrind`, `hotspot`

Install:
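
Typical package names (assumptions; availability varies by distro and release):

sudo apt install kcachegrind hotspot
sudo dnf install kcachegrind hotspot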

Use them to explore data from perf record or valgrind --tool=callgrind.

Typical Profiling Workflows

This section ties multiple tools together into practical sequences.

Workflow 1: High CPU Usage

  1. Confirm with top/htop.
  2. System-wide CPU profile:
   sudo perf record -F 99 -a -g -- sleep 30
   sudo perf report
  3. If results are unclear, generate a flame graph as described earlier.
  4. If the workload is in Python/Java/etc., switch to a language-specific profiler to zoom in.

Workflow 2: System “Feels Slow” but CPU Not Maxed

  1. Check I/O:
    • iostat -x 1 for disk
    • pidstat -d 1 / iotop for per‑process I/O
  2. If disk latency seems high:
    • sudo biolatency 1
    • sudo biosnoop
  3. If CPU is mostly idle and I/O is fine, check for scheduling / lock problems:
    • sudo offcputime -d 10
    • sudo runqlat 1
  4. If still unclear, capture a broader eBPF or ftrace trace around the problematic period.

Workflow 3: Memory Growth / Out‑of‑Memory

  1. Track memory per process (smem, ps, top/htop).
  2. For reproducible dev workloads:
    • Use valgrind --leak-check=full or heaptrack.
  3. For production:
    • Use allocator‑specific profiles (jemalloc / tcmalloc) if available.
    • Use language‑specific tools (e.g. Java heap dumps, Python tracemalloc).

Practical Tips and Pitfalls

  • Profilers have overhead: valgrind-based tools slow programs down dramatically, while perf and eBPF sampling are comparatively cheap; choose accordingly for production versus development.
  • Without debug symbols and frame pointers, reports show incomplete stacks and unresolved function names.
  • Tracing tools (ftrace, system-wide perf) can generate very large logs on busy systems, so keep capture windows short.

Profiling is iterative: measure, hypothesize, change, re-measure. The tools covered here are your primary instruments for that cycle on Linux.
