5.2.2 PostgreSQL

Table of Contents

Overview

PostgreSQL is a powerful, open source, relational database management system that focuses strongly on correctness, standards compliance, and extensibility. It is often shortened to “Postgres”. For Linux server administrators it is one of the primary choices when a reliable transactional SQL database is required, especially for applications that need advanced SQL features, complex queries, and strong guarantees around data integrity.

While earlier chapters cover general database concepts and MySQL or MariaDB, this chapter focuses on what is particular about PostgreSQL in a Linux server context. You will see how PostgreSQL is organized on disk, how its server processes behave, and how to perform common administrative tasks such as connecting, creating databases, managing roles, and handling basic backups, all in a way that matches how PostgreSQL expects to be used.

Installing and Service Layout on Linux

On most Linux distributions PostgreSQL is available from the default repositories, but the details differ. The precise installation commands are covered elsewhere in the course under package management. From an administration perspective, what matters is where the service runs, how it starts, and where it stores its data.

Typically PostgreSQL is installed as a system service. On modern distributions using systemd you control it with systemctl. For example, you start, stop, and check status with commands like:

sudo systemctl start postgresql
sudo systemctl stop postgresql
sudo systemctl status postgresql

The actual service unit name might differ slightly depending on version and distribution, for example postgresql, postgresql-16, or postgresql@16-main. You can check the available units with systemctl list-units | grep postgres.

The data directory, which PostgreSQL often calls PGDATA, is where it stores all database files. Package defaults vary, but common locations are /var/lib/postgresql/<version>/main, /var/lib/pgsql/<version>/data, or similar paths. You typically never edit files inside the data directory by hand. Instead you configure PostgreSQL using its configuration files and SQL commands.

PostgreSQL processes run under a dedicated system account such as postgres. You do not normally run the database daemon as root. Administrative tasks that operate inside the database can be carried out either as the postgres system user or as a PostgreSQL superuser role, which is also commonly named postgres but is distinct from the system user account.

Server Processes and Connections

When PostgreSQL starts it creates a master server process often called the postmaster. This process listens for incoming connections on a TCP port. The default port is 5432, but you can change it in the configuration. For each client connection PostgreSQL forks a new backend process that handles all queries for that connection. You can observe these processes with standard Linux tools such as ps or top while the service is running.

By default PostgreSQL listens on the local machine only, through a local Unix domain socket and often through 127.0.0.1 as well. To accept remote connections from other hosts you must adjust configuration, which is covered later in this chapter. This separation means that after installation, many PostgreSQL servers are secured by default because no external network client can reach the database until you deliberately enable it.

Configuration Files and Key Parameters

PostgreSQL is configured mainly through text files in its data directory. The three most important files for a server administrator are postgresql.conf, pg_hba.conf, and pg_ident.conf.

The main configuration file is postgresql.conf. It controls core server parameters such as memory usage, logging, and listen addresses. You usually edit it with a text editor as root or the postgres system user, then reload the service.

One of the most important settings for connectivity is listen_addresses. This parameter determines which network interfaces the server will bind to. It can hold a comma separated list of addresses, for example localhost, a specific IP address, or * for all addresses. For instance, a typical line in postgresql.conf might look like:

listen_addresses = 'localhost'
port = 5432

If you want the database to accept connections from other machines on the network you would change listen_addresses to include the server’s IP address or use '*' and then secure access through pg_hba.conf and a firewall.

Another important parameter is max_connections, which sets how many concurrent client connections are allowed. Memory related settings like shared_buffers, work_mem, and maintenance_work_mem influence performance and resource usage, but deep performance tuning is addressed in another part of the course. For now it is enough to know that PostgreSQL uses these configuration values at startup and that many of them require a full service restart instead of a simple reload.

Logging behavior is also controlled from postgresql.conf. You can configure the logging collector, log file directory, log rotation, and which statements are logged. On many Linux systems PostgreSQL logs go to a subdirectory of the data directory, such as pg_log or log, but some package builds send logs to the system journal instead.

After making changes to postgresql.conf, you can apply them with either a reload or a restart. A reload, such as sudo systemctl reload postgresql, tells the server to reread configuration without interrupting connections. A restart stops and starts the service, which will disconnect clients but allows settings that require a fresh start to take effect.

Access Control with pg_hba.conf

While postgresql.conf controls general behavior, pg_hba.conf controls which clients are allowed to connect, from where, and using which authentication methods. This file is critical for security and is specific to PostgreSQL.

Each line in pg_hba.conf describes one rule that matches a kind of connection. Fields typically include the connection type, the database, the user, the client address, and the authentication method. Rules are evaluated in order from top to bottom. When a connection attempt matches the first applicable rule, that rule determines its fate, whether access is granted or denied, and which authentication method is required.

A simple example line might look like this:

host    all     all     192.168.1.0/24    md5

This line says that for any database, any PostgreSQL role, and any client host in the IPv4 subnet 192.168.1.0/24, the server permits connections using the md5 password authentication method. If you want to restrict which databases or which users can connect from that subnet you would replace all with specific names.

Remember that both pg_hba.conf and the listen_addresses setting must permit access before a connection is accepted. Even if listen_addresses allows the server to listen on a network interface, if pg_hba.conf has no matching rule for a client then authentication will fail.

After editing pg_hba.conf, you must reload the PostgreSQL service for changes to take effect. You can do this using sudo systemctl reload postgresql or through the pg_ctl utility if you manage the service manually.

In PostgreSQL, client authentication is controlled by pg_hba.conf. Rules are evaluated from top to bottom. The first matching rule decides whether a connection is allowed and which authentication method is used. A reload is required after editing this file.

Connecting with psql

The primary command line client for PostgreSQL is psql. It connects to the server as a PostgreSQL role, runs SQL commands, and displays their results. For an administrator, psql is the main interface to manage databases and roles.

On a typical Linux installation the postgres system user corresponds to a PostgreSQL superuser role named postgres. To connect to the local server as this superuser you might run:

sudo -u postgres psql

This will usually connect over a local Unix socket without using TCP. Once inside psql you see a prompt that shows the database name and the role you are using. For example, you might see a prompt like postgres=#, which means you are in the postgres database with superuser privileges.

You can also connect to a specific database or as a particular role. For example:

psql -h 192.168.1.10 -p 5432 -U appuser -d appdb

Here, -h selects the server host, -p the port, -U the PostgreSQL role, and -d the database name. If password authentication is configured, psql will prompt you for the password.

Inside psql you use normal SQL statements, but also backslash commands that control psql itself. For example, \l lists databases, \du lists roles, and \q quits. These meta commands are executed by psql and do not go to the server as SQL. Learning the essential ones speeds up administrative work significantly.

Managing Databases

In PostgreSQL each collection of tables, indexes, and other objects lives inside a database. A PostgreSQL server instance can host many databases at once. Certain SQL commands, such as CREATE DATABASE, can be run only from specific contexts, often as a superuser or as a role with sufficient privileges.

To create a new database you typically connect as a superuser or as a role with the CREATEDB privilege and run a command like:

CREATE DATABASE appdb;

You can also specify an owner, character encoding, collation, and template database. For example:

CREATE DATABASE appdb
    OWNER appuser
    ENCODING 'UTF8';

The owner of a database has broad control over it, including the ability to create and drop tables, indexes, and other objects, unless further restrictions are put in place through permissions. When you want to remove a database you use DROP DATABASE databasename; but must ensure no one is connected to it.

PostgreSQL creates some standard databases automatically, such as a maintenance database called postgres and a template database named template1. You usually do not drop these. Many administrative operations are run from the postgres database, but you can connect to any database to manage the objects inside it.

Database level operations are usually performed through SQL rather than external tools. This is a major difference from file based databases, where you might manipulate data as files. In PostgreSQL you treat the database as a logical entity. Physical storage is handled by the server in the data directory.

Roles, Privileges, and Authentication

PostgreSQL uses roles rather than distinct user and group types. A role can act like a user that can log in, or like a group that grants privileges to other roles, or both. For server administration it is critical to understand that the login capability and the superuser capability are both attributes of a role, not separate account types.

To create a role that can log in and own objects you use a statement like:

CREATE ROLE appuser LOGIN PASSWORD 'secret';

This makes appuser a role that can connect, provided pg_hba.conf and other settings allow it. To give this role permission to create databases, you can do:

ALTER ROLE appuser CREATEDB;

PostgreSQL superuser roles have unrestricted access. They can override all permission checks and can perform maintenance operations such as creating other roles, changing server level settings, and more. You grant superuser status with:

ALTER ROLE adminuser SUPERUSER;

This should be done rarely and only for trusted accounts because a PostgreSQL superuser can effectively access or destroy all data in the cluster.

Privileges in PostgreSQL are granted on objects such as databases, schemas, tables, sequences, and functions. A common case is granting access to all tables in a database schema for an application user. For example:

GRANT CONNECT ON DATABASE appdb TO appuser;
GRANT USAGE ON SCHEMA public TO appuser;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO appuser;

Such grants control what an authenticated role can do inside the database. They work together with pg_hba.conf rules, which control who can authenticate and from where. Both parts must permit an action before it succeeds.

PostgreSQL supports several authentication methods. Two common choices are md5, which uses a hashed password exchange, and peer, which maps the local system user name to a PostgreSQL role without a password. Network based connections almost always use a password method such as md5 or scram-sha-256, with the specific method chosen in pg_hba.conf and related configuration.

Basic Administration: Startup, Shutdown, and Reloads

On Linux with systemd you normally manage PostgreSQL through systemctl. However, PostgreSQL also ships a tool called pg_ctl that can start, stop, and control the server directly if you run it as the postgres system user. Which method is appropriate depends on how your distribution integrated PostgreSQL with the init system.

A normal shutdown waits for active transactions to complete. Commands such as sudo systemctl stop postgresql or pg_ctl stop -m fast perform an orderly shutdown, ensuring all data is flushed to disk and that crash recovery will not be required on the next start. A more abrupt shutdown method exists but should only be used when the server is stuck or unresponsive.

When you edit configuration files you usually prefer to reload rather than fully restart the instance. A reload applies configuration changes that can be changed on the fly, such as authentication rules or logging parameters, without disconnecting users. A restart is necessary for changes to settings that are read only at startup, such as max_connections or some memory parameters.

Knowing when a reload is enough and when a restart is required is part of routine PostgreSQL administration. The documentation notes for each setting whether it is applied at startup or can be reloaded.

Simple Backup and Restore Using SQL Dumps

PostgreSQL provides tools for logical backups that extract schema and data in a form that can be restored on another server or on a fresh instance. The most common utilities are pg_dump and pg_dumpall. These tools connect to a running server and generate an SQL script or a custom format archive.

A basic full database backup with pg_dump might look like:

pg_dump -U appuser -h localhost -d appdb > appdb.sql

This stores the definition of tables, indexes, and data from appdb in the file appdb.sql. Restoring it on another PostgreSQL server is as simple as creating an empty database and then feeding the dump into psql:

createdb -U appuser appdb
psql -U appuser -d appdb < appdb.sql

A complete cluster level logical backup, including all databases and global objects like roles, can be produced with pg_dumpall. For example:

pg_dumpall -U postgres > full_cluster.sql

Logical dumps are easy to move between servers and PostgreSQL versions, but they are usually slower than physical backups for large data volumes. They also represent the state of the database at a point in time and do not include the write ahead log segments that PostgreSQL uses for crash recovery and replication. For this beginner level view, it is enough to understand that logical dumps give you a portable and human readable backup method that integrates with PostgreSQL itself.

Extensions and Advanced Features Overview

One of PostgreSQL’s distinctive strengths is its extensibility. Many features are provided as extensions that you can enable in a database with a SQL command. Extensions might add data types, functions, indexes, or even whole subsystems. Examples include full text search helpers, geospatial support through PostGIS, and advanced indexing methods.

To use an installed extension in a particular database you run a command like:

CREATE EXTENSION hstore;

This activates the extension in that database. Not all extensions are installed by default at the operating system level. Some require additional packages, which you install through your Linux package manager.

PostgreSQL also supports advanced features such as transactional DDL, robust ACID compliance, complex joins, window functions, and advanced indexing strategies. It can participate in replication setups, high availability clusters, and sophisticated backup strategies. These topics belong to more specialized chapters, but it is important to know that PostgreSQL is designed for serious production use and is not limited to simple relational storage.

Summary

PostgreSQL on Linux is a multi process database server that stores its data in a dedicated data directory and is controlled largely through configuration files and SQL commands. As a server administrator you work with systemctl or pg_ctl to control the service, edit postgresql.conf and pg_hba.conf for behavior and access control, connect using psql and roles, and manage databases and privileges with SQL. For backups you use tools such as pg_dump and pg_dumpall to create logical dumps. PostgreSQL’s architecture and extensibility make it a strong choice for many server workloads, and understanding its basic organization prepares you for the more advanced topics that follow in later chapters.

Comments

Please login to add a comment.

Don't have an account? Register now!