
BLAS

What BLAS Is (In Practice)

BLAS (Basic Linear Algebra Subprograms) is a standardized collection of low-level routines for common linear algebra operations, especially on vectors and matrices.

At the HPC user level, BLAS is a set of well-optimized building blocks (e.g., dot products, matrix–vector multiplies, matrix–matrix multiplies) that are highly tuned for each architecture and shared by virtually all dense linear algebra software.

BLAS Levels: Level-1, Level-2, Level-3

The BLAS interface is traditionally divided into three “levels,” mostly by the shape and computational intensity of operations.

Level-1 BLAS: Vector–Vector

Level-1 routines operate on vectors and do $O(n)$ work.

Typical operations:

  * vector scaling: $x \leftarrow \alpha x$
  * "AXPY" update: $y \leftarrow \alpha x + y$
  * dot product: $x^T y$
  * Euclidean norm and index of the largest-magnitude element

Examples of routine names: daxpy, ddot, dnrm2, dscal, idamax.

Key properties:

  * $O(n)$ work on $O(n)$ data, so performance is bound by memory bandwidth
  * essentially no data reuse, so optimized implementations help only modestly
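The Level-1 operations can be sketched in NumPy. This is illustrative only: the expressions below show what the named BLAS routines compute, and NumPy itself dispatches much of this work to a BLAS library.

```python
import numpy as np

n = 5
x = np.arange(1.0, n + 1)          # [1, 2, 3, 4, 5]
y = np.ones(n)
alpha = 2.0

y_axpy = alpha * x + y             # daxpy: y <- alpha*x + y
dot = x @ x                        # ddot:  x^T x
nrm2 = np.sqrt(x @ x)              # dnrm2: Euclidean norm of x

print(y_axpy)   # [ 3.  5.  7.  9. 11.]
print(dot)      # 55.0
```

Each of these touches every element exactly once, which is why Level-1 performance is set by how fast memory can deliver the data, not by arithmetic speed.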

Level-2 BLAS: Matrix–Vector

Level-2 routines operate on matrix–vector combinations and do $O(n^2)$ work.

Typical operations:

  * matrix–vector multiply: $y \leftarrow \alpha A x + \beta y$
  * rank-1 update: $A \leftarrow \alpha x y^T + A$
  * triangular solve with a single right-hand side

Examples of routine names: dgemv, dger, dtrsv, dsymv.

Key properties:

  * $O(n^2)$ work on $O(n^2)$ data, so still memory-bandwidth bound
  * each matrix element is touched only once per call, which limits cache reuse
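A NumPy sketch of the two most common Level-2 operations (again illustrative, not a direct BLAS call):

```python
import numpy as np

# dgemv computes y <- alpha*A@x + beta*y
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # 3x2 matrix
x = np.array([1.0, 1.0])
y = np.zeros(3)
alpha, beta = 1.0, 0.0

y = alpha * (A @ x) + beta * y
print(y)                          # [ 3.  7. 11.]

# dger computes the rank-1 update A <- alpha*u v^T + A
u = np.array([1.0, 0.0, 0.0])
v = np.array([10.0, 20.0])
A_updated = alpha * np.outer(u, v) + A
```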

Level-3 BLAS: Matrix–Matrix

Level-3 routines work with matrices and do $O(n^3)$ work; these are the main performance workhorses.

Typical operations:

  * matrix–matrix multiply: $C \leftarrow \alpha A B + \beta C$
  * triangular solve with many right-hand sides
  * symmetric rank-$k$ update

Examples of routine names: dgemm, dtrsm, dsyrk.

Key properties:

  * $O(n^3)$ work on only $O(n^2)$ data: high arithmetic intensity
  * blocking allows heavy reuse of data from cache, so good implementations approach the machine's peak floating-point throughput
  * dense linear algebra software (e.g., LAPACK factorizations) is typically organized so that Level-3 calls dominate the runtime
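A NumPy sketch of what dgemm computes. The comment on flop count is the reason Level-3 is the performance workhorse: the work grows as $O(n^3)$ while the data only grows as $O(n^2)$.

```python
import numpy as np

# dgemm computes C <- alpha*A@B + beta*C.
# For m-by-k times k-by-n this is about 2*m*n*k flops on only
# m*k + k*n + m*n matrix elements: lots of reuse per byte moved.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])
C = np.zeros((2, 2))
alpha, beta = 1.0, 0.0

C = alpha * (A @ B) + beta * C
print(C)
# [[19. 22.]
#  [43. 50.]]
```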

Naming Conventions and Data Types

BLAS routine names encode both the data type of the operands and the operation (usually including the matrix structure).

Common type prefixes:

  * s: single-precision real
  * d: double-precision real
  * c: single-precision complex
  * z: double-precision complex

Common suffixes:

  * mv: matrix–vector multiply
  * mm: matrix–matrix multiply
  * sv / sm: triangular solve with one / many right-hand sides
  * r / rk: rank-1 / rank-k update

Between prefix and suffix, two letters name the matrix type: ge (general), sy (symmetric), he (Hermitian), tr (triangular), and banded variants such as gb.

So for example:

  * dgemm: double-precision general matrix–matrix multiply
  * sgemv: single-precision general matrix–vector multiply
  * ztrsv: double-complex triangular solve with a vector right-hand side
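The naming scheme is regular enough to decode mechanically. The helper below is a toy, purely for illustration (the routine names are real; the `decode` function is not part of any BLAS API):

```python
# Toy decoder for Level-2/3 BLAS names of the form <type><matrix><op>.
PREFIXES = {"s": "single real", "d": "double real",
            "c": "single complex", "z": "double complex"}
MATRIX = {"ge": "general", "sy": "symmetric", "he": "Hermitian",
          "tr": "triangular", "gb": "general banded"}
OPS = {"mm": "matrix-matrix multiply", "mv": "matrix-vector multiply",
       "sv": "solve (vector RHS)", "sm": "solve (matrix RHS)",
       "r": "rank-1 update", "rk": "rank-k update"}

def decode(name):
    """Split a name like 'dgemm' into (type, matrix kind, operation)."""
    prefix, matrix, op = name[0], name[1:3], name[3:]
    return PREFIXES[prefix], MATRIX[matrix], OPS[op]

print(decode("dgemm"))  # ('double real', 'general', 'matrix-matrix multiply')
print(decode("ztrsv"))  # ('double complex', 'triangular', 'solve (vector RHS)')
```

(Level-1 names such as daxpy and ddot follow looser, historical conventions and do not fit this pattern.)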

Storage Conventions: Column-Major and Leading Dimension

BLAS was historically defined for Fortran, which uses column-major order. Most BLAS implementations preserve this convention even for C interfaces.

Key concepts:

  * Column-major order: the elements of each column are contiguous in memory, so element $(i, j)$ of a matrix is stored at offset $i + j \cdot \text{lda}$.
  * Leading dimension (lda): the stride, in elements, between consecutive columns. It equals the row count of the full allocated array (so $\text{lda} \ge m$), which lets a routine operate on a submatrix embedded in a larger matrix.

In C/C++ code, you can:

  * store your matrices in column-major order yourself and call the Fortran BLAS interface, or
  * use the CBLAS interface, which takes an explicit order argument selecting row-major or column-major layout.

Most BLAS routines have arguments like dimensions (M, N, K), scalars (ALPHA, BETA), array pointers each paired with a leading dimension (A, LDA, B, LDB, C, LDC), and strides for vector arguments (INCX, INCY).
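The leading-dimension idea can be demonstrated with NumPy's Fortran-ordered arrays, since a column-major stride plays exactly the role of lda:

```python
import numpy as np

# A 5x4 matrix in column-major (Fortran) order.
m, n = 5, 4
A = np.asfortranarray(np.arange(m * n, dtype=float).reshape(m, n))

# The column stride in elements is the leading dimension: lda = m here.
lda = A.strides[1] // A.itemsize
assert lda == m

# A submatrix view keeps the parent's column stride. Passing this block
# to a BLAS routine means passing a pointer to its first element plus
# lda = 5, even though the block itself is only 3x2.
sub = A[1:4, 1:3]
assert sub.strides[1] // sub.itemsize == m
```

This is why BLAS routines take lda separately from the matrix dimensions: the block being operated on and the array it lives in need not be the same size.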

BLAS Implementations in HPC

BLAS is a specification, not a single library. Many optimized implementations exist:

Common vendor and open-source libraries:

  * Reference BLAS (Netlib): correct but unoptimized
  * OpenBLAS and BLIS: open-source, well optimized for many CPUs
  * Intel MKL (oneMKL): highly tuned for Intel and other x86 CPUs
  * AMD AOCL: AMD's optimized CPU libraries
  * cuBLAS (NVIDIA) and rocBLAS (AMD): GPU implementations
  * Apple Accelerate: macOS

On many HPC clusters, an optimized BLAS is provided through environment modules (e.g., an OpenBLAS or MKL module), and several implementations may be installed side by side.

Key points:

  * all implementations share the same interface, so code is portable between them
  * performance differences between implementations can be enormous
  * for production runs, use the optimized library recommended for your cluster's hardware, not the reference BLAS

Linking and Using BLAS in HPC Applications

In HPC, even if you don’t call BLAS directly, you often link to it via other packages. For basic usage, you should understand which implementation you are linking against, which flags the link line needs, and how the library’s threading is controlled.

Linking from C/C++ or Fortran

Typical link flags are cluster-specific: -lblas and -llapack for the reference libraries, -lopenblas for OpenBLAS, or a longer vendor-specific link line for MKL.

Example (generic, not cluster-specific):

gcc mycode.c -lblas -llapack -o myprog

On real HPC systems, check the documentation or module help; link lines can be more complex for MKL or other vendor libraries.

Threading Considerations

Most modern BLAS implementations are multi-threaded: OpenBLAS reads OPENBLAS_NUM_THREADS, MKL reads MKL_NUM_THREADS, and many builds also respect OMP_NUM_THREADS.

In HPC, the main concern is avoiding core oversubscription: when combining MPI with a threaded BLAS, either run one BLAS thread per MPI rank or size the BLAS thread pool to the cores available to each rank.

This choice is application-dependent and often determined by profiling.
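One common pattern, sketched below, is to pin the BLAS thread pool from the environment before the numerical library is imported, since many implementations size their pool at load time. The variable names are the standard ones listed above; whether your particular BLAS honors each of them depends on how it was built.

```python
import os

# Set these *before* importing NumPy/SciPy so the BLAS thread pool
# is sized accordingly (assumption: your BLAS reads these variables).
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"       # Intel MKL
os.environ["OMP_NUM_THREADS"] = "1"       # OpenMP-based builds

import numpy as np  # BLAS calls in this process now run single-threaded
```

In batch scripts the same variables are usually exported in the job script rather than set from inside the program.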

BLAS in the Linear Algebra Stack

Within the numerical libraries ecosystem, BLAS is the foundation: LAPACK (factorizations, linear solves, eigenproblems) is built on top of BLAS, ScaLAPACK distributes LAPACK-style computations across MPI processes, and higher-level environments (PETSc, Trilinos, NumPy/SciPy, MATLAB, R, Julia) ultimately route their dense linear algebra down to BLAS calls.

Why this matters: the performance of the entire stack depends on the BLAS at the bottom, so switching to a faster BLAS implementation can speed up everything built on it without changing your code.

When and How to Use BLAS Directly

As an HPC beginner you don’t need to memorize all routines, but you should recognize common use-cases: dense matrix–matrix and matrix–vector products, rank updates, triangular solves, and any hand-written nested loops over array elements that compute one of these.

Typical practical steps:

  1. Identify linear algebra kernels in your code.
  2. Replace hand-written loops with appropriate BLAS calls.
  3. Link against the optimized BLAS library available on your cluster.
  4. Profile to ensure that BLAS calls dominate runtime (a good sign in many dense linear algebra algorithms).
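Steps 1 and 2 in miniature, using NumPy as the BLAS-backed replacement: a hand-written triple loop is swapped for a single call that dispatches to whatever dgemm NumPy was built against. (The naive function below is illustrative, the kind of kernel you would replace, not something BLAS provides.)

```python
import numpy as np

def naive_matmul(A, B):
    """Hand-written triple loop: correct, but slow at scale."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 30))
B = rng.standard_normal((30, 20))

# The BLAS-backed version gives the same answer up to rounding.
assert np.allclose(naive_matmul(A, B), A @ B)
```

At these tiny sizes the difference is invisible; for large matrices the blocked, vectorized, multi-threaded dgemm underneath `A @ B` is typically orders of magnitude faster than the triple loop.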

BLAS in High-Level Languages (User Perspective)

Many high-level environments already use BLAS internally: NumPy/SciPy, MATLAB, R, and Julia all dispatch their dense linear algebra to a BLAS library.

Even if you never call dgemm yourself, understanding BLAS helps you interpret where your runtime goes, control thread counts through the environment variables above, and choose builds of these environments that are linked against a fast BLAS.
