What LAPACK Is and What Problems It Solves
LAPACK (Linear Algebra PACKage) is a widely used numerical library for dense linear algebra on CPUs. It provides routines to solve:
- Systems of linear equations: $Ax = b$
- Least-squares problems
- Eigenvalue and eigenvector problems
- Singular value decompositions (SVD)
- Matrix factorizations (LU, Cholesky, QR, Schur, etc.)
Unlike BLAS, which provides basic vector and matrix operations, LAPACK builds higher-level algorithms on top of BLAS, especially Level‑3 BLAS, to exploit cache and achieve high performance on modern architectures.
LAPACK is written in Fortran, but it is routinely used from C, C++, Python, and many other languages via bindings or wrappers.
Design Principles and Relationship to BLAS
LAPACK is designed to:
- Reuse optimized BLAS for performance-critical kernels
- Use blocked algorithms that operate on submatrices to maximize cache reuse
- Provide numerically stable algorithms where possible
- Offer a standardized, portable API
Performance-critical parts (e.g., matrix–matrix multiplications inside factorizations) are delegated to BLAS. That means the speed of LAPACK on a given system largely depends on the quality of the BLAS implementation (e.g., OpenBLAS, Intel oneMKL, vendor-tuned BLAS).
Major Classes of LAPACK Routines
LAPACK organizes functionality by matrix type (general, symmetric, Hermitian, banded, etc.) and by problem type. Common categories:
Linear Systems and Factorizations
These routines solve $Ax = b$ directly or via matrix factorizations.
Typical factorizations:
- LU factorization for general matrices: $A = PLU$
- Cholesky factorization for symmetric positive definite matrices: $A = R^T R$ or $A = LL^T$
- QR factorization for rectangular matrices: $A = QR$
LAPACK separates:
- “Driver” routines: complete solutions to common problems (e.g., solve a system in one call)
- “Computational” routines: building blocks (e.g., factorization, refinement) that you can use in custom algorithms
Examples (double precision real):
- dgesv: Solve a general linear system using LU factorization with partial pivoting
- dgetrf: Compute the LU factorization
- dgetrs: Solve using a previously computed LU factorization
- dposv: Solve a symmetric positive definite system via Cholesky
- dpotrf: Compute the Cholesky factorization
- dpotrs: Solve using the Cholesky factorization
- dgels: Solve linear least-squares problems using QR
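To make the driver/computational split concrete, here is a minimal Fortran sketch (the 3x3 matrix and right-hand sides are made-up example data) that calls the computational routines dgetrf and dgetrs separately; the dgesv driver shown later performs both steps in one call.

program factor_then_solve
  implicit none
  integer, parameter :: n = 3, nrhs = 2
  integer :: ipiv(n), info
  double precision :: a(n, n), b(n, nrhs)

  ! Example problem data: a small symmetric, diagonally dominant matrix
  a = reshape([4.d0, 1.d0, 0.d0, &
               1.d0, 4.d0, 1.d0, &
               0.d0, 1.d0, 4.d0], [n, n])
  b(:, 1) = [1.d0, 2.d0, 3.d0]
  b(:, 2) = [0.d0, 1.d0, 0.d0]

  ! Computational routine: LU factorization with partial pivoting (overwrites a)
  call dgetrf(n, n, a, n, ipiv, info)
  if (info /= 0) stop 'dgetrf failed'

  ! Computational routine: solve using the stored factors; b is overwritten with x
  call dgetrs('N', n, nrhs, a, n, ipiv, b, n, info)
  if (info /= 0) stop 'dgetrs failed'

  print *, 'solutions:', b
end program factor_then_solve

Factoring once and reusing the factors is the usual pattern when the same matrix must be solved against many right-hand sides.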
Eigenvalue and Eigenvector Problems
For dense matrices, LAPACK offers:
- Symmetric/Hermitian eigenproblems (real eigenvalues, nice numerical properties)
- Nonsymmetric / non-Hermitian eigenproblems
- Generalized eigenvalue problems (e.g., $Ax = \lambda Bx$)
Examples:
- dsyev: Eigenvalues (and optionally eigenvectors) of a real symmetric matrix
- dsyevx, dsyevr: More advanced symmetric eigenproblem drivers (subset of the spectrum, improved performance)
- dgeev: Eigenvalues (and optionally left and right eigenvectors) of a general real matrix
- dsygv: Generalized symmetric-definite eigenproblem
Internally, these use reductions to simpler canonical forms (e.g., tridiagonal, Hessenberg) and then solve the reduced problem.
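As an illustration, a small Fortran sketch (the 3x3 test matrix is an assumption) that calls the dsyev driver, using the LWORK = -1 workspace query described in the tips section below:

program symmetric_eigs
  implicit none
  integer, parameter :: n = 3
  integer :: info, lwork
  double precision :: a(n, n), w(n), wquery(1)
  double precision, allocatable :: work(:)

  ! Assumed symmetric test matrix; with 'U' only the upper triangle is referenced
  a = reshape([2.d0, 1.d0, 0.d0, &
               1.d0, 2.d0, 1.d0, &
               0.d0, 1.d0, 2.d0], [n, n])

  ! Workspace query: LWORK = -1 returns the optimal workspace size in wquery(1)
  call dsyev('V', 'U', n, a, n, w, wquery, -1, info)
  lwork = int(wquery(1))
  allocate(work(lwork))

  ! Actual computation: eigenvalues in w (ascending); eigenvectors overwrite a
  call dsyev('V', 'U', n, a, n, w, work, lwork, info)
  if (info /= 0) stop 'dsyev failed'
  print *, 'eigenvalues:', w
end program symmetric_eigs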
Singular Value Decomposition (SVD) and Least Squares
LAPACK’s SVD routines compute:
$$A = U \Sigma V^T$$
where $\Sigma$ is diagonal with nonnegative entries, and $U$ and $V$ are orthogonal (or unitary in the complex case).
Examples:
- dgesvd: Compute the SVD of a general dense matrix
- dgesdd: Compute the SVD using a divide-and-conquer algorithm (usually faster for large problems)
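A minimal Fortran sketch of the dgesvd driver (the matrix dimensions and random test data are assumptions):

program svd_example
  implicit none
  integer, parameter :: m = 4, n = 3
  integer, parameter :: lwork = 5 * (m + n)   ! generous fixed size; an LWORK = -1 query is tighter
  integer :: info
  double precision :: a(m, n), s(min(m, n)), u(m, m), vt(n, n), work(lwork)

  ! Assumed data: any dense m-by-n matrix
  call random_number(a)

  ! Full SVD: A = U * Sigma * V^T; singular values land in s in descending order
  call dgesvd('A', 'A', m, n, a, m, s, u, m, vt, n, work, lwork, info)
  if (info /= 0) stop 'dgesvd failed'
  print *, 'singular values:', s
end program svd_example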
SVD-based methods are often used for:
- Robust least-squares solutions
- Rank-revealing factorization
- Condition number estimation
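For the least-squares use case, here is a sketch using the SVD-based driver dgelss, which also reports the effective numerical rank (the problem sizes, random data, and RCOND threshold are assumptions):

program svd_least_squares
  implicit none
  integer, parameter :: m = 5, n = 3, nrhs = 1
  integer, parameter :: lwork = 10 * (m + n)   ! generous; an LWORK = -1 query is tighter
  integer :: info, rank
  double precision :: a(m, n), b(m, nrhs), s(n), work(lwork), rcond

  ! Assumed overdetermined problem: minimize ||A x - b||_2 with random data
  call random_number(a)
  call random_number(b)

  ! Singular values below rcond * s(1) are treated as zero (rank determination)
  rcond = 1.d-12
  call dgelss(m, n, nrhs, a, m, b, m, s, rcond, rank, work, lwork, info)
  if (info /= 0) stop 'dgelss failed'

  print *, 'effective rank:', rank
  print *, 'least-squares solution:', b(1:n, 1)
end program svd_least_squares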
Specialized Matrix Types
LAPACK includes dedicated routines for:
- Symmetric/Hermitian positive definite matrices (dposv, dpotrf, dpotrs)
- Symmetric/Hermitian indefinite matrices (dsytrf, dsytrs)
- Banded matrices (dgbsv, dptsv)
- Tridiagonal matrices (the 'ste' family for symmetric tridiagonal eigenproblems)
- Packed storage formats (dpptrf, etc.)
Exploiting the structure significantly reduces storage and computational cost.
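As an example of how much structure can save, a short Fortran sketch solving a symmetric positive definite tridiagonal system with dptsv, which stores only the diagonal and off-diagonal (the particular system is made up):

program tridiagonal_solve
  implicit none
  integer, parameter :: n = 5, nrhs = 1
  integer :: info
  double precision :: d(n), e(n - 1), b(n, nrhs)

  ! Assumed SPD tridiagonal system: 2 on the diagonal, -1 on the off-diagonals
  d = 2.d0
  e = -1.d0
  b = 1.d0

  ! Structure-aware solve: only O(n) storage and O(n) work
  call dptsv(n, nrhs, d, e, b, n, info)
  if (info /= 0) stop 'dptsv failed'
  print *, 'solution:', b(:, 1)
end program tridiagonal_solve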
Naming Conventions and Data Types
LAPACK routine names are highly systematic. A typical name:
- First letter: precision and data type
  - s: single precision real
  - d: double precision real
  - c: single precision complex
  - z: double precision complex
- Next two letters: matrix type or high-level operation (varies by family)
  - ge: general
  - sy: symmetric (real)
  - he: Hermitian (complex)
  - po: positive definite
  - gb: general banded
- Trailing letters: specific task (extra letters such as x, r, or d typically select a variant of the algorithm)
  - sv: driver to solve a system
  - trf: factorization
  - trs: solve using a previously computed factorization
  - ev, evx, evr, evd: eigenvalue problem drivers with different algorithms/features
  - gels: least squares
  - gesvd, gesdd: SVD drivers
For instance:
- dgesv = d (double) + ge (general matrix) + sv (solve system)
- zheev = z (complex double) + he (Hermitian) + ev (eigenvalues/vectors)
Basic Calling Pattern and Data Layout
LAPACK assumes Fortran-style column-major storage. In C/C++ you must either:
- Use column-major layout explicitly (e.g., via libraries or proper leading dimension), or
- Transpose/convert data, or
- Use a wrapper that hides the details.
Canonical arguments in many LAPACK routines include:
- Matrix dimensions: M, N
- Leading dimension: LDA (the stride between columns in memory)
- Input matrices/vectors: A, B, etc.
- Output results overwriting inputs (e.g., a factorization overwrites the original matrix)
- Workspace arrays: WORK, LWORK
- Integer return status: INFO (0 means success, a positive value often indicates a numerical issue or convergence problem, a negative value gives the index of an invalid argument)
A typical pattern (Fortran-like pseudocode for solving $Ax=b$ using dgesv):
integer, parameter :: N = 100, NRHS = 1   ! example problem size
integer, parameter :: LDA = N, LDB = N
integer :: INFO
integer, dimension(N) :: IPIV
double precision, dimension(LDA, N) :: A
double precision, dimension(LDB, NRHS) :: B

! Fill A and B with problem data

call dgesv(N, NRHS, A, LDA, IPIV, B, LDB, INFO)

if (INFO .eq. 0) then
   ! On exit, B contains the solution X and A holds the LU factors
end if

In C, you usually call LAPACK via a C interface (e.g., LAPACKE), which adapts the calling conventions and can provide row-major APIs.
Using LAPACK in HPC Environments
On HPC systems, LAPACK is commonly obtained via:
- Vendor libraries (Intel oneMKL, Cray LibSci, IBM ESSL, NVIDIA HPC SDK)
- Open-source tuned libraries (OpenBLAS, AOCL, etc.)
- Reference LAPACK build (functional but typically slower than tuned variants)
Typical access patterns:
- Linking against a BLAS/LAPACK implementation in your compiler command line, e.g.:
  - Intel oneAPI: ifort mycode.f90 -qmkl
  - GCC + OpenBLAS: gfortran mycode.f90 -lopenblas
- Using environment modules to select the desired math library, e.g.:
  - module load intel-oneapi-mkl
  - module load openblas

The same LAPACK interface is used regardless of the backend; only the implementation changes.
When to Use LAPACK vs. BLAS vs. ScaLAPACK
In dense linear algebra on HPC systems:
- Use BLAS directly when you only need basic operations (e.g., dgemm, daxpy).
- Use LAPACK when you need higher-level factorizations or solvers on a single shared-memory node (or single process).
- Use ScaLAPACK (or similar distributed libraries) when you need to scale dense problems across multiple nodes/processes; ScaLAPACK provides “parallel LAPACK” on top of distributed BLAS.
LAPACK itself is not a distributed-memory library; it is typically used within a node (possibly with multithreaded BLAS underneath).
Numerical Considerations and Practical Tips
- Choose routines that match your matrix structure:
  - If your matrix is symmetric positive definite, prefer Cholesky (potrf, posv) over general LU (getrf, gesv) for better performance and stability.
  - For banded or tridiagonal matrices, use the corresponding specialized routines, which can reduce complexity from $O(n^3)$ to $O(n^2)$ or even $O(n)$.
- Be aware of conditioning (see the sketch after this list):
  - LAPACK offers routines to estimate condition numbers (e.g., dgecon, dpocon) and to compute equilibration scalings (geequ, poequ).
  - Some expert drivers combine solve + condition number estimate + error bounds (e.g., dgesvx, dposvx).
- Workspace management:
  - Many routines support a workspace query: call with LWORK = -1, and they return the optimal size in the first element of the WORK array.
  - Correct workspace sizing can affect performance and sometimes algorithm choices.
- Threading:
  - LAPACK itself is mainly single-threaded at the algorithm level; parallelism commonly comes from the multithreaded BLAS it calls.
  - Controlling thread count is typically done via environment variables for the BLAS provider (e.g., OMP_NUM_THREADS, MKL_NUM_THREADS), not via LAPACK.
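A sketch of the conditioning workflow mentioned above (the test matrix and sizes are assumptions): take the 1-norm with dlange before factorization, factor with dgetrf, then estimate the reciprocal condition number with dgecon.

program condition_estimate
  implicit none
  integer, parameter :: n = 4
  integer :: ipiv(n), iwork(n), info
  double precision :: a(n, n), work(4 * n), anorm, rcond
  double precision, external :: dlange

  ! Assumed dense test matrix
  call random_number(a)

  ! The 1-norm of A must be taken before dgetrf overwrites A with its factors
  anorm = dlange('1', n, n, a, n, work)

  call dgetrf(n, n, a, n, ipiv, info)
  if (info /= 0) stop 'dgetrf failed'

  ! Estimate the reciprocal condition number in the 1-norm from the LU factors
  call dgecon('1', n, a, n, anorm, rcond, work, iwork, info)
  print *, 'reciprocal condition estimate (rcond):', rcond
end program condition_estimate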
High-Level Language Interfaces
In practice, many HPC users access LAPACK through higher-level environments:
- Python: scipy.linalg wraps LAPACK and BLAS; functions like scipy.linalg.lu, eigh, svd, and solve call LAPACK under the hood.
- MATLAB/Octave: core linear algebra (e.g., A\b, eig, svd) is implemented on top of BLAS/LAPACK.
- R, Julia: built-in linear algebra uses LAPACK, often giving direct control over algorithm variants if needed.
Understanding the LAPACK routine families and naming conventions helps in mapping these higher-level calls to specific LAPACK algorithms, which is important when diagnosing performance or numerical issues on HPC systems.