What LAPACK Is and What Problems It Solves
LAPACK (Linear Algebra PACKage) is a widely used numerical library for dense linear algebra on CPUs. It provides routines to solve:
- Systems of linear equations: $Ax = b$
- Least-squares problems
- Eigenvalue and eigenvector problems
- Singular value decompositions (SVD)
- Matrix factorizations (LU, Cholesky, QR, Schur, etc.)
Unlike BLAS, which provides basic vector and matrix operations, LAPACK builds higher-level algorithms on top of BLAS, especially Level‑3 BLAS, to exploit cache and achieve high performance on modern architectures.
LAPACK is written in Fortran, but it is routinely used from C, C++, Python, and many other languages via bindings or wrappers.
Design Principles and Relationship to BLAS
LAPACK is designed to:
- Reuse optimized BLAS for performance-critical kernels
- Use blocked algorithms that operate on submatrices to maximize cache reuse
- Provide numerically stable algorithms where possible
- Offer a standardized, portable API
Performance-critical parts (e.g., matrix–matrix multiplications inside factorizations) are delegated to BLAS. That means the speed of LAPACK on a given system largely depends on the quality of the BLAS implementation (e.g., OpenBLAS, Intel oneMKL, vendor-tuned BLAS).
Major Classes of LAPACK Routines
LAPACK organizes functionality by matrix type (general, symmetric, Hermitian, banded, etc.) and by problem type. Common categories:
Linear Systems and Factorizations
These routines solve $Ax = b$ directly or via matrix factorizations.
Typical factorizations:
- LU factorization for general matrices: $A = PLU$
- Cholesky factorization for symmetric positive definite matrices: $A = R^T R$ or $A = LL^T$
- QR factorization for rectangular matrices: $A = QR$
LAPACK separates:
- “Driver” routines: complete solutions to common problems (e.g., solve a system in one call)
- “Computational” routines: building blocks (e.g., factorization, refinement) that you can use in custom algorithms
Examples (double precision real):
- dgesv: Solve a general linear system using LU factorization with partial pivoting
- dgetrf: Compute the LU factorization
- dgetrs: Solve using a previously computed LU factorization
- dposv: Solve a symmetric positive definite system via Cholesky
- dpotrf: Compute the Cholesky factorization
- dpotrs: Solve using the Cholesky factorization
- dgels: Solve linear least-squares problems using QR
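To make the driver/computational split concrete, here is a minimal Fortran sketch (the 3x3 matrix and right-hand sides are made-up example data) that calls the computational routines dgetrf and dgetrs separately; the dgesv driver shown later performs both steps in one call.

program factor_then_solve
  implicit none
  integer, parameter :: n = 3, nrhs = 2
  integer :: ipiv(n), info
  double precision :: a(n, n), b(n, nrhs)

  ! Example problem data: a small symmetric, diagonally dominant matrix
  a = reshape([4.d0, 1.d0, 0.d0, &
               1.d0, 4.d0, 1.d0, &
               0.d0, 1.d0, 4.d0], [n, n])
  b(:, 1) = [1.d0, 2.d0, 3.d0]
  b(:, 2) = [0.d0, 1.d0, 0.d0]

  ! Computational routine: LU factorization with partial pivoting (overwrites a)
  call dgetrf(n, n, a, n, ipiv, info)
  if (info /= 0) stop 'dgetrf failed'

  ! Computational routine: solve using the stored factors; b is overwritten with x
  call dgetrs('N', n, nrhs, a, n, ipiv, b, n, info)
  if (info /= 0) stop 'dgetrs failed'

  print *, 'solutions:', b
end program factor_then_solve

Factoring once and reusing the factors is the usual pattern when the same matrix must be solved against many right-hand sides.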
Eigenvalue and Eigenvector Problems
For dense matrices, LAPACK offers:
- Symmetric/Hermitian eigenproblems (real eigenvalues, nice numerical properties)
- Nonsymmetric / non-Hermitian eigenproblems
- Generalized eigenvalue problems (e.g., $Ax = \lambda Bx$)
Examples:
- dsyev: Eigenvalues (and optionally eigenvectors) of a real symmetric matrix
- dsyevx, dsyevr: More advanced symmetric eigenproblem drivers (subset of the spectrum, improved performance)
- dgeev: Eigenvalues (and optionally left and right eigenvectors) of a general real matrix
- dsygv: Generalized symmetric-definite eigenproblem
Internally, these use reductions to simpler canonical forms (e.g., tridiagonal, Hessenberg) and then solve the reduced problem.
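As an illustration, a small Fortran sketch (the 3x3 test matrix is an assumption) that calls the dsyev driver, using the LWORK = -1 workspace query described in the tips section below:

program symmetric_eigs
  implicit none
  integer, parameter :: n = 3
  integer :: info, lwork
  double precision :: a(n, n), w(n), wquery(1)
  double precision, allocatable :: work(:)

  ! Assumed symmetric test matrix; with 'U' only the upper triangle is referenced
  a = reshape([2.d0, 1.d0, 0.d0, &
               1.d0, 2.d0, 1.d0, &
               0.d0, 1.d0, 2.d0], [n, n])

  ! Workspace query: LWORK = -1 returns the optimal workspace size in wquery(1)
  call dsyev('V', 'U', n, a, n, w, wquery, -1, info)
  lwork = int(wquery(1))
  allocate(work(lwork))

  ! Actual computation: eigenvalues in w (ascending); eigenvectors overwrite a
  call dsyev('V', 'U', n, a, n, w, work, lwork, info)
  if (info /= 0) stop 'dsyev failed'
  print *, 'eigenvalues:', w
end program symmetric_eigs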
Singular Value Decomposition (SVD) and Least Squares
LAPACK’s SVD routines compute:
$$A = U \Sigma V^T$$
where $\Sigma$ is diagonal with nonnegative entries, and $U$ and $V$ are orthogonal (or unitary in the complex case).
Examples:
- dgesvd: Compute the SVD of a general dense matrix
- dgesdd: Compute the SVD using a divide-and-conquer algorithm (usually faster for large problems)
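A minimal Fortran sketch of the dgesvd driver (the matrix dimensions and random test data are assumptions):

program svd_example
  implicit none
  integer, parameter :: m = 4, n = 3
  integer, parameter :: lwork = 5 * (m + n)   ! generous fixed size; an LWORK = -1 query is tighter
  integer :: info
  double precision :: a(m, n), s(min(m, n)), u(m, m), vt(n, n), work(lwork)

  ! Assumed data: any dense m-by-n matrix
  call random_number(a)

  ! Full SVD: A = U * Sigma * V^T; singular values land in s in descending order
  call dgesvd('A', 'A', m, n, a, m, s, u, m, vt, n, work, lwork, info)
  if (info /= 0) stop 'dgesvd failed'
  print *, 'singular values:', s
end program svd_example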
SVD-based methods are often used for:
- Robust least-squares solutions
- Rank-revealing factorization
- Condition number estimation
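For the least-squares use case, here is a sketch using the SVD-based driver dgelss, which also reports the effective numerical rank (the problem sizes, random data, and RCOND threshold are assumptions):

program svd_least_squares
  implicit none
  integer, parameter :: m = 5, n = 3, nrhs = 1
  integer, parameter :: lwork = 10 * (m + n)   ! generous; an LWORK = -1 query is tighter
  integer :: info, rank
  double precision :: a(m, n), b(m, nrhs), s(n), work(lwork), rcond

  ! Assumed overdetermined problem: minimize ||A x - b||_2 with random data
  call random_number(a)
  call random_number(b)

  ! Singular values below rcond * s(1) are treated as zero (rank determination)
  rcond = 1.d-12
  call dgelss(m, n, nrhs, a, m, b, m, s, rcond, rank, work, lwork, info)
  if (info /= 0) stop 'dgelss failed'

  print *, 'effective rank:', rank
  print *, 'least-squares solution:', b(1:n, 1)
end program svd_least_squares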
Specialized Matrix Types
LAPACK includes dedicated routines for:
- Symmetric/Hermitian positive definite matrices (dposv, dpotrf, dpotrs)
- Symmetric/Hermitian indefinite matrices (dsytrf, dsytrs)
- Banded matrices (dgbsv, dptsv)
- Tridiagonal matrices (the 'ste' family for symmetric tridiagonal eigenproblems)
- Packed storage formats (dpptrf, etc.)
Exploiting the structure significantly reduces storage and computational cost.
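As an example of how much structure can save, a short Fortran sketch solving a symmetric positive definite tridiagonal system with dptsv, which stores only the diagonal and off-diagonal (the particular system is made up):

program tridiagonal_solve
  implicit none
  integer, parameter :: n = 5, nrhs = 1
  integer :: info
  double precision :: d(n), e(n - 1), b(n, nrhs)

  ! Assumed SPD tridiagonal system: 2 on the diagonal, -1 on the off-diagonals
  d = 2.d0
  e = -1.d0
  b = 1.d0

  ! Structure-aware solve: only O(n) storage and O(n) work
  call dptsv(n, nrhs, d, e, b, n, info)
  if (info /= 0) stop 'dptsv failed'
  print *, 'solution:', b(:, 1)
end program tridiagonal_solve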
Naming Conventions and Data Types
LAPACK routine names are highly systematic. A typical name:
- First letter: precision and data type
  - s: single precision real
  - d: double precision real
  - c: single precision complex
  - z: double precision complex
- Next two letters: matrix type or high-level operation (varies by family)
  - ge: general
  - sy: symmetric (real)
  - he: Hermitian (complex)
  - po: positive definite
  - gb: general banded
- Trailing letters: specific task (extra letters such as x, r, or d typically select a variant of the algorithm)
  - sv: driver to solve a system
  - trf: factorization
  - trs: solve using a previously computed factorization
  - ev, evx, evr, evd: eigenvalue problem drivers with different algorithms/features
  - gels: least squares
  - gesvd, gesdd: SVD drivers
For instance:
- dgesv = d (double) + ge (general matrix) + sv (solve system)
- zheev = z (complex double) + he (Hermitian) + ev (eigenvalues/vectors)
Basic Calling Pattern and Data Layout
LAPACK assumes Fortran-style column-major storage. In C/C++ you must either:
- Use column-major layout explicitly (e.g., via libraries or proper leading dimension), or
- Transpose/convert data, or
- Use a wrapper that hides the details.
Canonical arguments in many LAPACK routines include:
- Matrix dimensions: M, N
- Leading dimension: LDA (the stride between columns in memory)
- Input matrices/vectors: A, B, etc.
- Output results overwriting inputs (e.g., a factorization overwrites the original matrix)
- Workspace arrays: WORK, LWORK
- Integer return status: INFO (0 means success, a positive value often indicates a numerical issue or convergence problem, a negative value gives the index of an invalid argument)
A typical pattern (Fortran-like pseudocode for solving $Ax=b$ using dgesv):
integer, parameter :: N = 100, NRHS = 1   ! example problem size
integer, parameter :: LDA = N, LDB = N
integer :: INFO
integer, dimension(N) :: IPIV
double precision, dimension(LDA, N) :: A
double precision, dimension(LDB, NRHS) :: B

! Fill A and B with problem data

call dgesv(N, NRHS, A, LDA, IPIV, B, LDB, INFO)

if (INFO .eq. 0) then
   ! On exit, B contains the solution X and A holds the LU factors
end if

In C, you usually call LAPACK via a C interface (e.g., LAPACKE), which adapts the calling conventions and can provide row-major APIs.
Using LAPACK in HPC Environments
On HPC systems, LAPACK is commonly obtained via:
- Vendor libraries (Intel oneMKL, Cray LibSci, IBM ESSL, NVIDIA HPC SDK)
- Open-source tuned libraries (OpenBLAS, AOCL, etc.)
- Reference LAPACK build (functional but typically slower than tuned variants)
Typical access patterns:
- Linking against a BLAS/LAPACK implementation in your compiler command line, e.g.:
  - Intel oneAPI: ifort mycode.f90 -qmkl
  - GCC + OpenBLAS: gfortran mycode.f90 -lopenblas
- Using environment modules to select the desired math library, e.g.:
  - module load intel-oneapi-mkl
  - module load openblas

The same LAPACK interface is used regardless of the backend; only the implementation changes.
When to Use LAPACK vs. BLAS vs. ScaLAPACK
In dense linear algebra on HPC systems:
- Use BLAS directly when you only need basic operations (e.g., dgemm, daxpy).
- Use LAPACK when you need higher-level factorizations or solvers on a single shared-memory node (or single process).
- Use ScaLAPACK (or similar distributed libraries) when you need to scale dense problems across multiple nodes/processes; ScaLAPACK provides “parallel LAPACK” on top of distributed BLAS.
LAPACK itself is not a distributed-memory library; it is typically used within a node (possibly with multithreaded BLAS underneath).
Numerical Considerations and Practical Tips
- Choose routines that match your matrix structure:
  - If your matrix is symmetric positive definite, prefer Cholesky (potrf, posv) over general LU (getrf, gesv) for better performance and stability.
  - For banded or tridiagonal matrices, use the corresponding specialized routines, which can reduce complexity from $O(n^3)$ to $O(n^2)$ or even $O(n)$.
- Be aware of conditioning (see the sketch after this list):
  - LAPACK offers routines to estimate condition numbers (e.g., dgecon, dpocon) and to compute equilibration scalings (geequ, poequ).
  - Some expert drivers combine solve + condition number estimate + error bounds (e.g., dgesvx, dposvx).
- Workspace management:
  - Many routines support a workspace query: call with LWORK = -1, and they return the optimal size in the first element of the WORK array.
  - Correct workspace sizing can affect performance and sometimes algorithm choices.
- Threading:
  - LAPACK itself is mainly single-threaded at the algorithm level; parallelism commonly comes from the multithreaded BLAS it calls.
  - Controlling thread count is typically done via environment variables for the BLAS provider (e.g., OMP_NUM_THREADS, MKL_NUM_THREADS), not via LAPACK.
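A sketch of the conditioning workflow mentioned above (the test matrix and sizes are assumptions): take the 1-norm with dlange before factorization, factor with dgetrf, then estimate the reciprocal condition number with dgecon.

program condition_estimate
  implicit none
  integer, parameter :: n = 4
  integer :: ipiv(n), iwork(n), info
  double precision :: a(n, n), work(4 * n), anorm, rcond
  double precision, external :: dlange

  ! Assumed dense test matrix
  call random_number(a)

  ! The 1-norm of A must be taken before dgetrf overwrites A with its factors
  anorm = dlange('1', n, n, a, n, work)

  call dgetrf(n, n, a, n, ipiv, info)
  if (info /= 0) stop 'dgetrf failed'

  ! Estimate the reciprocal condition number in the 1-norm from the LU factors
  call dgecon('1', n, a, n, anorm, rcond, work, iwork, info)
  print *, 'reciprocal condition estimate (rcond):', rcond
end program condition_estimate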
High-Level Language Interfaces
In practice, many HPC users access LAPACK through higher-level environments:
- Python: scipy.linalg wraps LAPACK and BLAS; functions like scipy.linalg.lu, eigh, svd, and solve call LAPACK under the hood.
- MATLAB/Octave: core linear algebra (e.g., A\b, eig, svd) is implemented on top of BLAS/LAPACK.
- R, Julia: built-in linear algebra uses LAPACK, often giving direct control over algorithm variants if needed.
Understanding the LAPACK routine families and naming conventions helps in mapping these higher-level calls to specific LAPACK algorithms, which is important when diagnosing performance or numerical issues on HPC systems.