Introduction to Docker Swarm
Docker Swarm is Docker’s built-in container orchestration solution. It groups multiple Docker hosts into a single virtual cluster called a swarm. From the outside, you can interact with this swarm as if it were one Docker Engine. This makes it easier to run containers in production across several machines, keep them running when something fails, and scale services up and down.
In this overview you will see what a swarm is, what roles nodes can have, how services and tasks work, and where Swarm fits compared to other orchestration tools. Deep configuration topics and Kubernetes details are discussed elsewhere.
Swarm as a Cluster of Docker Engines
A Docker swarm is a collection of Docker hosts that cooperate. Each host runs the normal Docker Engine plus swarm features. When you initialize or join a swarm, the host becomes a node in the cluster.
From your perspective as a user, the main idea is that you talk to a manager node using the same Docker CLI you already know. When you run a Swarm specific command on the manager, such as creating a service, the manager decides on which nodes to run containers and keeps track of their state.
The swarm behaves like one logical engine. Instead of starting individual containers on specific machines, you define a desired state for the whole cluster, such as how many replicas of a service you want. The swarm control plane then enforces that state.
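A swarm is created and joined from the ordinary Docker CLI. As a minimal sketch (the address 192.0.2.10 is a placeholder for the first manager's reachable IP, and the token placeholder stands in for the value that `init` prints):

```shell
# On the machine that will become the first manager
docker swarm init --advertise-addr 192.0.2.10

# "init" prints a ready-made join command including a token; run it on
# each additional host to add that host to the swarm as a worker
docker swarm join --token <worker-token> 192.0.2.10:2377
```

After this, Swarm-specific commands such as creating services are issued against the manager with the same CLI.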
Node Roles in a Swarm
Inside a swarm, each node has a role that defines what it does. There are two main roles: manager and worker.
Manager nodes are responsible for the control plane. They handle the swarm’s configuration, run a distributed database of cluster state, and make scheduling decisions. When you use the CLI to create or update a service, you are talking to a manager. Managers can also run application containers, but their primary responsibility is coordination.
Worker nodes focus on running application workloads. They receive tasks from managers and execute the associated containers. Workers do not make scheduling decisions and they do not store the full swarm state. This separation helps with scalability and security, because you can limit how many nodes hold control data and API access.
In a larger swarm, there are usually several managers to provide fault tolerance. Internally, managers use a consensus algorithm based on Raft to agree on the cluster state. You do not need to understand Raft details to use Swarm, but you should know that there is a single leader among the managers that coordinates updates, and followers that replicate its decisions.
Always keep an odd number of manager nodes to maintain a healthy Raft quorum and avoid split-brain situations.
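Node roles can be inspected and changed from a manager. A brief sketch (the node name `worker-2` is a hypothetical example):

```shell
# List all nodes with their role, availability, and manager status
docker node ls

# Promote a worker to manager, for example to reach an odd count of three
docker node promote worker-2
```

The `MANAGER STATUS` column in `docker node ls` shows which manager is currently the Raft leader.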
Services and Tasks
In normal Docker usage you start individual containers. In Swarm, the main unit you define is a service. A service describes how you want to run containers in the cluster. It includes the image to use, environment configuration, ports to publish, and the number of replicas.
The swarm breaks a service into tasks. Each task is the instruction to run one container instance of that service on a particular node. The manager assigns tasks to worker nodes according to resource availability and constraints. If a task fails or the node running it disappears, the manager creates a new task elsewhere to keep the service at its desired state.
There are two main service modes. In replicated mode, you specify a fixed number of identical replicas, for example 5 instances of a web service. The swarm then keeps 5 tasks running across the cluster. In global mode, the swarm runs one task on every eligible node. Global services are often used for logging agents, monitoring services, or node level tools that must be present everywhere.
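Both modes are selected at service creation time. A minimal sketch (the service names and the `example/agent` image are hypothetical):

```shell
# Replicated mode: keep exactly 5 identical tasks running across the cluster
docker service create --name web --replicas 5 nginx

# Global mode: run one task on every eligible node, e.g. a node-level agent
docker service create --name agent --mode global example/agent
```

If `--replicas` and `--mode` are both omitted, Swarm defaults to replicated mode with one replica.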
Desired State and Self-Healing
A core concept in Docker Swarm is desired state. When you create or update a service, you declare how the service should look, such as the image version, number of replicas, and placement rules. The swarm’s control plane continuously tries to match the actual state of the cluster to this desired state.
If a container crashes, the node fails, or a task stops for any reason, the manager detects the change and schedules replacement tasks to restore the desired count. If you update a service to use a new image version, the swarm gradually replaces tasks according to the update policy, while still aiming to keep the required number of healthy replicas running.
Never modify swarm-managed containers manually. Always change the desired state at the service level and let the swarm reconcile the cluster.
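In practice, changing the desired state and watching the swarm converge looks like this (the service name `web` and the image tag are illustrative):

```shell
# Declare a new desired state; the swarm replaces tasks to match it
docker service update --image nginx:1.25 web

# Show the service's tasks, including replaced and failed ones,
# to observe actual state converging toward desired state
docker service ps web
```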
Overlay Networking and Service Discovery
In a multi-host cluster, containers need to communicate across node boundaries. Docker Swarm uses overlay networks to provide this. An overlay network spans all participating nodes and allows containers on different machines to talk as if they were on the same local network, using virtual network interfaces and an encrypted data plane if you enable that option.
Swarm also provides built-in service discovery. Each service you create gets a stable DNS name within the swarm network. When a container resolves that name, the swarm routes the traffic to one of the tasks belonging to that service. This gives you load balancing between replicas without external tools.
You do not need to manage static IPs or manually configure hostnames. As tasks are rescheduled, the service name remains constant. Internally, the swarm updates where that name points so that communication continues without application changes.
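As a sketch of how this fits together (the network and service names, and the `example/api` image, are hypothetical):

```shell
# Create an overlay network that spans the swarm
docker network create --driver overlay backend

# Attach services to the same overlay network
docker service create --name api --network backend example/api
docker service create --name db  --network backend postgres

# Inside an "api" container, the hostname "db" resolves to the db service;
# the special name "tasks.db" resolves to the individual task IPs
```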
Load Balancing and Ingress
Swarm can expose services to the outside world through a feature known as the ingress network. When you publish a port on a service, Swarm ensures that traffic arriving on that port on any node can reach the service, even if the actual containers are running on different nodes.
Incoming requests are distributed among the service tasks using a load balancing mechanism. From the client's point of view, connecting to any swarm node on the published port is sufficient. From the application's point of view, each container simply listens on its internal port, while Swarm handles external distribution and routing.
This built-in routing makes it easier to deploy replicated services without using a separate load balancer for simple setups. For more complex environments, external load balancers can still be combined with Swarm to handle features such as advanced routing, TLS termination, or global traffic distribution.
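Publishing a port is a single flag on the service. A minimal sketch (service name and ports are illustrative):

```shell
# Publish port 8080 on every swarm node; arriving traffic is routed
# through the ingress network to one of the three web tasks
docker service create --name web --replicas 3 --publish 8080:80 nginx
```

A client can then connect to port 8080 on any node in the swarm, regardless of where the tasks actually run.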
Scaling and Rolling Updates
One of the central reasons to use Docker Swarm is the ability to scale services and roll out changes gradually. Since a service describes a desired replica count, scaling a service becomes a simple state change. You adjust the number of replicas and the swarm creates or removes tasks to match the new count. The scheduler tries to spread tasks across nodes to use resources efficiently.
Swarm also supports rolling updates. When you change a service to use a new image or configuration, the swarm can replace tasks in small batches, waiting between steps and checking for failures. If too many updates fail, the swarm can pause or roll back the change to keep the service stable.
These mechanisms let you adjust capacity and deploy updates without manually stopping and starting containers on individual machines. They do not remove the need for careful application design, monitoring, and testing, but they do provide automation for the repetitive operational work.
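Scaling and update policies are both plain CLI operations. A sketch, assuming a replicated service named `web` (the batch size, delay, and image tag are illustrative values):

```shell
# Change the desired replica count; the swarm adds or removes tasks to match
docker service scale web=10

# Roll out a new image two tasks at a time, waiting 10s between batches,
# and roll back automatically if the update fails
docker service update \
  --update-parallelism 2 \
  --update-delay 10s \
  --update-failure-action rollback \
  --image nginx:1.26 web
```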
Security and Node Trust in Swarm
A swarm cluster needs secure communication between nodes so that only trusted machines participate. Docker Swarm includes a built-in public key infrastructure where each node gets a certificate. When a node joins the swarm, it uses a join token to authenticate, and the managers issue it a certificate that proves its identity.
Swarm can also enable mutual TLS between nodes, which encrypts traffic and confirms that each node is who it claims to be. Certificates can rotate automatically after a configurable interval. From an operator’s perspective, this provides a basic but integrated trust system without external certificate management for internal control traffic.
Service traffic on overlay networks can be encrypted as well, though this has a performance cost. For many production scenarios, you would combine swarm features with additional security layers such as firewalls, hardened hosts, and secure image practices, which are covered in other chapters.
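The trust-related operations described above map to a handful of commands. A sketch (the network name is hypothetical):

```shell
# Print the current join token (and full join command) for workers;
# use "manager" instead to get the manager token
docker swarm join-token worker

# Rotate a token so that previously shared tokens stop working
docker swarm join-token --rotate worker

# Rotate the swarm's root CA and reissue node certificates
docker swarm ca --rotate

# Encrypt application traffic on an overlay network (with a performance cost)
docker network create --driver overlay --opt encrypted secure-net
```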
When Docker Swarm Makes Sense
Docker Swarm is tightly integrated with Docker and is relatively easy to adopt if you already use Docker on single hosts. The command line is familiar and the learning curve is gentle compared to more complex orchestration systems. For small to medium clusters, internal teams, labs, demos, or environments where simplicity is more important than an extensive feature set, Swarm can be a suitable choice.
On the other hand, Docker Swarm has a smaller ecosystem and fewer advanced features than larger orchestration platforms. For organizations that expect rapid growth, multi-cloud strategies, or a need for a rich ecosystem of plugins and controllers, other tools may be more appropriate. How Swarm compares to those tools is explored separately, but you should already understand that Swarm's strengths lie in simplicity and direct Docker integration.
Summary
Docker Swarm turns multiple Docker hosts into a single clustered environment. It introduces nodes with manager and worker roles, services that describe desired state, and tasks that represent individual containers across the cluster. Through overlay networks and built in service discovery, it connects containers across hosts and balances traffic between replicas. The control plane keeps services running, scales them, and performs rolling updates while maintaining the declared configuration.
This chapter only introduced the main ideas of Swarm at a high level. Implementation details, alternatives, and decisions about when to move beyond Swarm are discussed in related chapters within this section.