
3.5.4 Boot performance

Understanding Boot Performance

Boot performance describes how quickly a Linux system goes from powered off to a usable state. For interactive systems this usually means the time from pressing the power button to seeing a login prompt or desktop. On servers it often means the time until required services are fully started and ready to accept connections.

In practice, most boot performance work concerns what happens after the firmware finishes and the bootloader hands control to the kernel. The focus is on how long the system takes to initialize the kernel, start systemd or another init system, and launch all required services and user sessions.

Improving boot performance starts with understanding how long each stage takes and which components are responsible for delays. Once you can see these details clearly, it becomes straightforward to target the worst offenders and make informed tradeoffs between speed and functionality.

Measuring Boot Time at a High Level

Before looking at detailed breakdowns, it is useful to know the total time a system needs to boot.

One simple way to see a basic summary is with the systemd-analyze command. On a system that uses systemd as its init system, running:

systemd-analyze

prints a line that reports the time spent in firmware, bootloader, kernel, and userspace. A typical output might look like:

Startup finished in 3.001s (firmware) + 1.872s (loader) + 4.320s (kernel) + 9.450s (userspace) = 18.643s

In this summary, userspace time is the portion that is most under your control as a Linux administrator. Firmware and bootloader times are influenced more by hardware and firmware settings, while kernel time has a lot to do with drivers and hardware detection. When you work on optimizing boot performance within Linux, you primarily focus on what happens in userspace after systemd starts.

If you simply need a rough measurement without details, you can note the time to reach the login prompt or the graphical display manager. However, for any systematic work on performance you will need tools that show per-service timings.

Using systemd-analyze for Detailed Boot Analysis

On systems that use systemd, systemd-analyze is the primary tool for detailed boot performance analysis. It can show which services start, when they start, and how long each one takes to complete initialization.

The command:

systemd-analyze blame

prints a list of units sorted by the amount of time they took to initialize. Units at the top of this list are usually the best candidates for investigation. A typical excerpt might read:

12.345s NetworkManager-wait-online.service
 8.210s plymouth-quit-wait.service
 5.602s docker.service
 4.150s snapd.service
 3.099s apache2.service

Here, you can clearly see which services are the slowest, along with their startup durations. Network online wait services or container engines often appear near the top on many systems.

Another useful view is the critical chain. This shows the units that form the longest dependency path to reaching the default target. Use:

systemd-analyze critical-chain

This command prints each unit on the critical path, when it started, and how long it took. The output illustrates which services directly affect the moment when the system is considered fully booted. If a service is slow but not on the critical path, it might not actually be delaying user logins or core networking.
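
In the output, the time at which a unit became active is printed after the "@" character and the time it took to start is printed after the "+" character. A purely illustrative excerpt (the unit names are typical, but the figures are made up and are not meant to line up with the earlier examples) might look like:

multi-user.target @9.448s
└─docker.service @3.846s +5.602s
  └─network-online.target @3.840s
    └─NetworkManager-wait-online.service @1.391s +2.449s
      └─NetworkManager.service @1.204s +0.182s
        └─basic.target @1.198s

Reading from the bottom up, each unit had to wait for the one below it, so in this illustration the system is only considered booted once docker.service finishes initializing.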

Finally, systemd-analyze can generate a boot timeline visualization. With:

systemd-analyze plot > boot.svg

the tool generates an SVG file that you can open in a web browser or image viewer. The graphic shows each service as a colored bar, aligned along a time axis. The left edge of each bar indicates when the service started, and the length represents its initialization time. This visual representation makes it easy to spot large slow blocks or many services stacked in sequence instead of running concurrently.

For accurate diagnosis, always use systemd-analyze blame and systemd-analyze critical-chain together. Do not rely on one alone when making decisions about what to optimize or disable.

Interpreting Service Startup Times

When you look at service startup times, it is important to understand what the numbers really mean. A unit that took a long time to start is not always a problem, and a unit with a small duration is not always harmless.

In systemd terminology, a service counts as started when its startup is complete according to the Type= setting in its unit file. For Type=simple services, this simply means that the main process was spawned. For Type=forking services, it means that the initial process has exited after forking the daemon into the background. For Type=notify services, it means that the service has sent a notification that it is ready. The reported time is the duration from when systemd began starting the unit to the moment it considers it active or failed.
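
As a hedged sketch, the hypothetical unit below uses Type=notify, so systemd keeps the startup clock running until the daemon itself reports readiness (for example via sd_notify with READY=1); with Type=simple the clock would stop as soon as the process was spawned, even if the daemon were still initializing:

[Unit]
Description=Example daemon (hypothetical unit for illustration)

[Service]
Type=notify
ExecStart=/usr/local/bin/example-daemon
# With Type=notify, the unit only counts as started once the daemon
# signals READY=1, so its entry in the blame list reflects real
# initialization work rather than just process creation.

[Install]
WantedBy=multi-user.target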

When a service appears slow in the blame list, consider whether that service is essential for early boot. If it is not on the critical chain, it might be acceptable to keep it as is, or even to configure it to start later. On the other hand, if a service is on the critical chain and its initialization time is large, then any optimization you make will directly improve overall boot time.

You must also differentiate between services that genuinely require time, such as disk checks or network discovery, and services that are blocked by configuration issues, timeouts, or missing resources. If a service has a large duration that roughly matches its timeout value, then the slowness might be caused by failures or network timeouts rather than by legitimate work.

The system journal is very useful in this context. You can inspect messages from a specific service with:

journalctl -u service-name.service -b

This lets you see what happened during the time that systemd-analyze reports. Combining timing data with log messages often reveals configuration issues or unnecessary waits.
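
If you suspect that a large duration is really a timeout, you can compare the reported time with the unit's configured start limit and look at precise timestamps in its logs. A small sketch, with service-name.service standing in for the unit under investigation:

systemctl show -p TimeoutStartUSec service-name.service

journalctl -u service-name.service -b -o short-precise

The first command prints the start timeout systemd enforces for the unit, and the second shows its messages from the current boot with high-resolution timestamps.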

Identifying and Reducing Boot Bottlenecks

Once you have identified slow services and critical paths, you can work on reducing bottlenecks. The goal is not necessarily to make every service fast, but to reduce or remove delays that do not provide corresponding value.

The most straightforward technique is to disable services that you do not need. If a unit appears high in the blame list and it provides a feature that you never use, then disabling it is a quick way to reclaim boot time. You can inspect a unit to understand what it does with systemctl status. If you decide it is unnecessary, you can disable it with:

sudo systemctl disable service-name.service

and stop it immediately with:

sudo systemctl stop service-name.service

Some services are needed, but they are configured with options that make them wait longer than necessary. A common example is network wait services that block until a full network configuration is available. On systems where you do not need to wait for a network before logging in locally, you can adjust or remove these waits. This might involve masking a wait service or changing distribution specific networking settings.
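
For example, on a machine that uses NetworkManager and does not need to block boot until the network is fully configured, the wait service can be disabled or masked; check first with systemctl status that nothing you rely on needs network-online.target:

systemctl status NetworkManager-wait-online.service

sudo systemctl disable NetworkManager-wait-online.service

sudo systemctl mask NetworkManager-wait-online.service

Disabling stops the unit from being started at boot, while masking additionally prevents anything else from starting it.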

Other services can be reconfigured to start on demand rather than during boot. Socket activation is one mechanism that allows a service to start when the first connection arrives instead of at boot time. On systems that use systemd, some services already use this feature. For services that do not, you can sometimes enable socket activation by installing alternative unit files, or by using distribution packages that support it.
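
As a rough sketch of what socket activation looks like, a hypothetical example-daemon.socket unit could own the listening port on behalf of example-daemon.service, so that systemd starts the service only when the first client connects. The unit names and port number are placeholders, and the daemon itself must support socket activation, for instance via sd_listen_fds:

# /etc/systemd/system/example-daemon.socket  (hypothetical)
[Unit]
Description=Listening socket for example-daemon

[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target

With the socket enabled via sudo systemctl enable --now example-daemon.socket and the service itself disabled, the daemon no longer contributes to boot time at all.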

There are also cases where you can optimize underlying resource usage. For example, a storage service might be slow because it performs full filesystem checks on each boot, or because it tries to mount a network filesystem that is not always reachable. In such cases, you can adjust filesystem options and fstab settings, carefully ensuring that you preserve data integrity while reducing unnecessary work.
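
One hedged example is turning a network mount into an automount point, so the share is mounted on first access instead of delaying boot or failing when the server is unreachable. In this /etc/fstab sketch the server name and paths are placeholders:

# /etc/fstab excerpt (hypothetical NFS export)
nfs-server:/export/data  /mnt/data  nfs  noauto,x-systemd.automount,x-systemd.idle-timeout=60,_netdev  0  0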

Never disable or mask services blindly just because they look slow. Always understand the purpose of a service before changing it, and verify after each change that the system still boots correctly and provides required functionality.

Parallelization and Ordering of Units

Systemd is designed to start units in parallel as much as possible while respecting declared dependencies and orderings. Boot performance is strongly affected by how well these dependencies are modeled.

Each unit can specify dependencies such as Requires, Wants, Before, and After. Units with no dependency relationship can start in parallel. If many units declare strict ordering or hard dependencies when they do not need them, systemd is forced to start them sequentially. This leads to longer boot times.

When you inspect the critical chain, pay attention to units that are started strictly after others but that might not actually need such ordering. For example, a service that only uses local filesystem access might unnecessarily be ordered after a network target. If you create or modify custom units, you should use the weakest dependencies that still ensure correctness. For tasks that are only needed for optional features, Wants is often preferable to Requires. For ordering, remove Before or After relationships that are not necessary.
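
As a sketch of this principle, a hypothetical maintenance service that only touches local files needs ordering against local filesystems, not the network, so it can run in parallel with network bring-up:

[Unit]
Description=Clean local cache directory (hypothetical)
After=local-fs.target
# Deliberately no After=network-online.target: the task never uses the network.

[Service]
Type=oneshot
ExecStart=/usr/local/bin/clean-cache

[Install]
WantedBy=multi-user.target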

You can examine the dependencies of a unit with:

systemctl list-dependencies unit-name

This command shows which units are required, wanted, or part of the unit's dependency tree. By studying these relationships, you can identify ordering constraints that might be relaxed. Always test any change to unit files carefully, since incorrect dependencies can result in services starting too early or too late, which can create subtle failures.
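
To focus on the ordering and dependency properties of a single unit instead of the whole tree, you can also query them directly, or ask for that unit's own critical chain (service-name.service is a placeholder):

systemctl show -p Requires -p Wants -p After -p Before service-name.service

systemd-analyze critical-chain service-name.service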

Parallelization also interacts with hardware. Tasks that wait on slow disks or network links will cause delays if other services depend on them. When possible, design custom services so that they tolerate missing resources and retry later instead of blocking boot.
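
A hedged sketch of such a design: the hypothetical sync service below deliberately declares no ordering against network-online.target and instead retries on failure, and because it is Type=simple it counts as started as soon as its process is spawned, so it adds essentially nothing to the critical path:

[Unit]
Description=Sync data from a remote host (hypothetical)
# No dependency on network-online.target: the program retries until
# the remote host becomes reachable, so boot is never held up.

[Service]
Type=simple
ExecStart=/usr/local/bin/sync-data --retry-forever
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target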

Boot Time and System Resource Usage

Boot performance is not only about time. It is also about how the system uses CPU, memory, and storage during the boot process. A system that tries to start many heavy services in parallel might fully load the CPU and disk, which can cause contention and extended delays.

During boot experiments, you can watch resource usage with tools such as top or htop immediately after login. Although they do not show activity before the login prompt, they help you observe how long the system remains busy after the first login. If the system is still starting many services and disk activity is high, then users might experience sluggishness even though the boot is technically complete.
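
If you want to keep a record of this post-login activity instead of watching it live, a simple sketch is to sample top in batch mode for a minute or so after logging in (the interval, count, and output path are arbitrary choices):

top -b -d 2 -n 30 > /tmp/post-boot-activity.log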

On servers, where multiple services might compete for resources at startup, you can stagger startup or adjust resource limits. Systemd supports options such as CPUQuota and MemoryMax in unit files, which can prevent a single service from monopolizing resources at boot. In some situations it may be beneficial to reduce parallelization by setting default limits, but this is only appropriate when the hardware is constrained and the workload is well understood.
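
A hedged sketch of capping a heavy but non-critical service so it cannot saturate the machine during boot: running sudo systemctl edit example-batch.service (a placeholder unit name) opens a drop-in file where you can add limits such as:

[Service]
CPUQuota=50%
MemoryMax=512M

The values must be tuned to your hardware, and MemoryMax relies on the cgroup v2 memory controller used by modern distributions; restart the service after saving the drop-in so the limits take effect.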

Disk I/O is a frequent bottleneck, especially on systems with spinning disks. Switching to solid state storage or placing heavily used directories on faster devices often has a significant impact on boot performance without any change to software configuration.

Persistent Boot Performance Monitoring

For ongoing administration, boot performance should be monitored over time, not just measured once. Sudden increases in boot time can indicate new services, configuration mistakes, or emerging hardware problems.

You can keep a simple record of systemd-analyze output after major updates or configuration changes. Comparing these records shows whether userspace time has grown and by how much. Similarly, you can track changes in the blame list to see when new slow services appear.
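
A minimal, low-tech sketch of such a record is to append the summary and the worst offenders to a log file after each boot or major change (the log path is an arbitrary choice and must be writable by the user running the commands):

systemd-analyze | sed "s/^/$(date '+%F %T')  /" >> ~/boot-times.log
systemd-analyze blame | head -n 10 >> ~/boot-times.log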

The system journal also keeps timestamps for service starts and completions, and if persistent journaling is enabled it retains them across reboots, which makes it possible to analyze boot behavior from past boots. With:

journalctl -b -1 -u service-name.service

you can inspect the logs of a specific unit from the previous boot, and journalctl --list-boots shows which boots are available. Note that systemd-analyze itself reports only on the current boot, so for historical timing comparisons you rely on the journal or on records like those described above.

In more advanced setups, you can export these measurements to centralized monitoring systems and create alerts if boot times exceed certain thresholds. While that falls under broader monitoring and automation topics, the core idea is that boot performance is another important aspect of system health that benefits from systematic observation.

Treat any significant increase in boot time as a performance regression to investigate. Changes in boot behavior often reveal misconfigurations, failing hardware, or inappropriate defaults introduced by new software.

By understanding how to measure boot performance, how to read systemd analysis tools, and how to reason about dependencies and resource usage, you gain the ability to keep your systems responsive from the moment they start.
