
Spam filtering systems

Understanding Email Spam Filtering

Spam filtering systems analyze incoming (and sometimes outgoing) email and decide whether to accept, reject, quarantine, or tag it as spam. In a mail server stack, filters typically sit:

Common goals:

In this chapter, the focus is on how spam filtering systems work and how to integrate the main tools in Linux mail environments.

Types of Spam Filtering Approaches

Most systems combine several techniques:

1. Header and Content Rule-Based Filters

These rely on a large set of rules:

Rules give positive or negative scores. For example:

The final spam score is the sum:

$$
\text{score} = \sum_{i=1}^{n} w_i \cdot r_i
$$

where $r_i$ is rule $i$’s result (0 or 1, or sometimes a numeric measure) and $w_i$ its weight.

If the score reaches or exceeds the threshold (e.g. 5.0), the mail is marked as spam.
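As a minimal sketch of the sum above, the following Python snippet adds up the weights of the rules that fired and compares the total with a threshold. The rule names and weights are invented for illustration and are not taken from any real rule set.

```python
from typing import Dict

# Hypothetical rule weights, for illustration only.
WEIGHTS: Dict[str, float] = {
    "SUBJECT_ALL_CAPS": 1.5,
    "URIBL_LISTED": 3.0,
    "HTML_ONLY": 0.5,
    "KNOWN_GOOD_SENDER": -2.0,  # negative weights pull the score down
}


def spam_score(hits: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Return sum(w_i * r_i) over the rules that produced a result."""
    return sum(weights.get(rule, 0.0) * result for rule, result in hits.items())


def is_spam(score: float, threshold: float = 5.0) -> bool:
    """A message is marked spam when its score reaches the threshold."""
    return score >= threshold


hits = {"SUBJECT_ALL_CAPS": 1, "URIBL_LISTED": 1, "HTML_ONLY": 1}
score = spam_score(hits)
print(score, is_spam(score))
```

Adding a hit for the negative-weight rule would bring the same message back under the threshold, which is how trusted-sender rules offset content rules.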

2. Bayesian and Statistical Filters

Bayesian filters learn from examples of spam and ham (legitimate mail):

The effect is that the filter adapts to your own mail flow and to the languages your users actually receive.
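To make the learning idea concrete, here is a toy token-based classifier in Python. It is only a sketch of the general Bayesian approach, not the algorithm used by any particular filter: the class name `TinyBayes`, the tokenizer, and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy Bayesian classifier; real filters use far richer tokenization."""

    def __init__(self) -> None:
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> list:
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text: str, label: str) -> None:
        """Record which tokens appeared in one spam or ham message."""
        self.messages[label] += 1
        self.tokens[label].update(set(self.tokenize(text)))

    def spam_probability(self, text: str) -> float:
        # Work in log-odds, starting from the (smoothed) class prior.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in set(self.tokenize(text)):
            # Add-one smoothing keeps unseen tokens from producing zero.
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.learn("cheap pills online now", "spam")
bayes.learn("win cheap prize now", "spam")
bayes.learn("meeting agenda for monday", "ham")
bayes.learn("lunch on monday", "ham")
print(bayes.spam_probability("cheap pills offer"))
print(bayes.spam_probability("monday meeting"))
```

Because the token counts come from your own training mail, the same code trained on a different mailbox would give different probabilities for the same words.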

3. Reputation and Blacklist-Based Filtering

Uses external data sources:

Lookups are done via DNS queries during spam scanning. A hit usually adds a significant amount to the score.

4. Sender Authentication Results (SPF, DKIM, DMARC)

Spam filters consume the results of sender-authentication checks (usually performed by a separate component):

Spam filters then apply rules like:
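As a rough illustration of how a filter might consume such results, the sketch below pulls the `spf`, `dkim`, and `dmarc` verdicts out of an `Authentication-Results` header value and maps them to score adjustments. The header value, the host names, and every number in `AUTH_SCORES` are hypothetical; real filters ship their own rules and weights.

```python
import re

# Hypothetical score adjustments keyed by (method, result).
AUTH_SCORES = {
    ("spf", "fail"): 2.0,
    ("dkim", "fail"): 1.5,
    ("dmarc", "fail"): 3.0,
    ("dkim", "pass"): -0.1,
}


def parse_auth_results(header_value: str) -> dict:
    """Extract method=result pairs for spf, dkim and dmarc."""
    results = {}
    pairs = re.findall(r"\b(spf|dkim|dmarc)=([a-z]+)", header_value.lower())
    for method, result in pairs:
        results.setdefault(method, result)  # keep the first result per method
    return results


def auth_score(results: dict) -> float:
    """Sum the adjustments for the verdicts that have a rule."""
    return sum(AUTH_SCORES.get(item, 0.0) for item in results.items())


value = (
    "mx.example.org; spf=pass smtp.mailfrom=example.com; "
    "dkim=fail header.d=example.com; dmarc=fail header.from=example.com"
)
results = parse_auth_results(value)
print(results, auth_score(results))
```

The point of the split is that the authentication check itself happens elsewhere; the filter only turns the recorded verdicts into score contributions.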

5. Heuristics and Structural Analysis

Heuristics detect suspicious structure:

6. Collaborative and Cloud-Based Filters

Some systems submit message fingerprints or metadata to central services:

Common in commercial gateways and some open-source projects with optional online services.

Core Open-Source Spam Filtering Tools

SpamAssassin Overview

Apache SpamAssassin is one of the most widely used open-source spam filtering engines.

Key characteristics:

Typical outcomes:

Basic SpamAssassin Configuration Concepts

Main config files (paths depend on distro):

Common settings to adjust:

  required_score 5.0
  rewrite_header Subject ***** SPAM *****
  add_header all  Score _SCORE_
  add_header all  Status _YESNO_, score=_SCORE_ required=_REQD_
  score  BAYES_99  4.0
  score  HTML_MESSAGE  0.0

You normally start with defaults and adjust thresholds after monitoring for false positives/negatives.

Training the Bayesian Filter

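Bayes contributes nothing until it has been trained: with default settings SpamAssassin does not use the Bayes result until the database has learned at least 200 spam and 200 ham messages. To enable and train it: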

  1. Enable Bayes:
     use_bayes 1
     bayes_auto_learn 1
  2. Feed sample spam and ham:
     sa-learn --spam /path/to/spam/
     sa-learn --ham  /path/to/ham/
  3. Check stats:
     sa-learn --dump magic

For multi-user servers, be careful who controls training data; centralizing training or using per-domain training may be necessary.

Rspamd Overview

Rspamd is a newer, high-performance spam filtering system designed as a full policy and filtering engine.

Key properties:

Compared to SpamAssassin, Rspamd is more of an all-in-one policy engine than a pure content filter.

Rspamd Architecture Basics

Components:

Configuration is modular, usually in /etc/rspamd/:

Rspamd assigns symbols and scores; each message ends up with:

Example snippet from log-like output (illustrative; here the reject threshold is 7.0):

Action: reject
Score: 12.3 (required 7.0)
Symbols: BAYES_SPAM(4.20), RBL_SPAMHAUS(3.50), DKIM_INVALID(2.50), ...

Admins typically control thresholds for actions:

actions {
  reject = 15;
  add_header = 7;
  greylist = 4;
}
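The relationship between a score and these thresholds can be sketched in a few lines of Python. This is a simplified model that assumes an action applies once the score reaches its threshold and that the highest matching threshold wins; the function name `choose_action` is made up for this example.

```python
# Thresholds copied from the actions block above.
ACTIONS = {"reject": 15.0, "add_header": 7.0, "greylist": 4.0}


def choose_action(score: float, actions: dict = ACTIONS) -> str:
    """Return the action with the highest threshold that the score reaches."""
    ordered = sorted(actions.items(), key=lambda kv: kv[1], reverse=True)
    for name, threshold in ordered:
        if score >= threshold:
            return name
    return "no action"


for s in (1.0, 4.0, 12.3, 16.0):
    print(s, choose_action(s))
```

With these particular thresholds a score of 12.3 maps to add_header rather than reject, which shows how much the outcome for the same message depends on local policy.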

Amavis and Content Filter Stacks

Amavis (amavisd-new) is a “content filter controller” that often runs between the MTA (e.g., Postfix) and the actual engines:

Typical email flow with Amavis:

  1. Incoming SMTP → Postfix.
  2. Postfix passes message to Amavis based on a content_filter setting.
  3. Amavis runs SpamAssassin + antivirus.
  4. Amavis returns cleaned and annotated message, or rejects/quarantines.

Amavis is configuration-heavy but standard in many “all-in-one” mail server solutions.

Integration with MTAs

Filtering at SMTP Time vs After Delivery

Two main strategies:

High-volume and security-focused setups tend to prefer SMTP-time filtering.

Milter-Based Integration (Postfix, Sendmail, etc.)

Milters are filter daemons speaking the milter protocol:

Typical setup:

Example Postfix configuration (conceptual):

smtpd_milters = inet:localhost:11332
non_smtpd_milters = $smtpd_milters
milter_default_action = accept
milter_protocol = 6

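Here localhost:11332 is the milter socket. Port 11332 is the default for the Rspamd proxy worker; a SpamAssassin milter such as spamass-milter would instead listen on whatever socket it has been configured with.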

Policy and Header-Based Delivery

Even when filtering is done pre-queue, delivery agents (Dovecot, local delivery programs) often use headers or flags to route spam:

Mailbox rules can move spam-tagged messages into a “Junk” folder based on these headers.

Example Dovecot Sieve rule:

if header :contains "X-Spam-Flag" "YES" {
  fileinto "Junk";
  stop;
}

This separates the act of scoring from where the message finally lands.
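The same decision can be expressed outside Sieve. The Python sketch below mirrors the rule above for a delivery script: it only reads the header that the filter already added and picks a folder. The function name and the `INBOX` default are assumptions for illustration.

```python
from email import message_from_string


def target_folder(raw_message: str) -> str:
    """Pick a folder from the X-Spam-Flag header set by the spam filter."""
    msg = message_from_string(raw_message)
    flag = msg.get("X-Spam-Flag", "")
    # Case-insensitive substring match, like Sieve's default :contains test.
    return "Junk" if "yes" in flag.lower() else "INBOX"


tagged = "X-Spam-Flag: YES\nSubject: You won\n\nClaim your prize.\n"
clean = "Subject: Minutes\n\nSee attached.\n"
print(target_folder(tagged), target_folder(clean))
```

No scoring happens here at all, which is the separation the Sieve example relies on.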

Supporting Technologies in Spam Filtering

DNSBLs and DNS Lookups

Filters frequently use DNSBLs:

If the query returns an address (usually in 127.0.0.x), the IP is listed; if the name does not exist (NXDOMAIN), it is not.
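A DNSBL lookup for an IPv4 address is conventionally made by reversing the octets and appending the list's zone. The sketch below builds that query name and performs the lookup with the standard resolver. The zone `dnsbl.example.org` is a placeholder, not a real list, and treating every resolver error as "not listed" is a simplification that a production filter would refine.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL answers with a 127.x.x.x address."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # Simplification: NXDOMAIN and lookup failures are both "not listed".
        return False
    return answer.startswith("127.")


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Calling `is_listed` triggers a live DNS query, so in practice it would sit behind the caching layer discussed next.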

Spam filters cache these results to avoid excessive DNS load. Using DNSBLs at scale requires:

Greylisting

Greylisting is often used alongside content filters:

Filters track a triplet:

Once a sender has been seen retrying after the required delay, mail from that triplet is accepted.
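The greylisting logic can be sketched as a small in-memory class. This example assumes the classic triplet of client IP, envelope sender, and envelope recipient, and a hypothetical minimum delay of 300 seconds; a real implementation would also persist and expire its entries.

```python
import time
from typing import Dict, Optional, Set, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


class Greylist:
    def __init__(self, min_delay: float = 300.0) -> None:
        self.min_delay = min_delay          # seconds a sender must wait before retrying
        self.first_seen: Dict[Triplet, float] = {}
        self.passed: Set[Triplet] = set()

    def check(self, client_ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        """Return 'accept' or 'tempfail' for one delivery attempt."""
        now = time.time() if now is None else now
        triplet = (client_ip, sender.lower(), recipient.lower())
        if triplet in self.passed:
            return "accept"
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.min_delay:
            self.passed.add(triplet)        # retried after the delay: trust it
            return "accept"
        return "tempfail"                   # ask the client to try again later


g = Greylist(min_delay=300)
print(g.check("192.0.2.1", "a@example.com", "b@example.org", now=0))
print(g.check("192.0.2.1", "a@example.com", "b@example.org", now=400))
```

The `now` parameter exists only to make the behaviour easy to demonstrate and test without waiting for real time to pass.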

Effects:

Rate Limiting and Abuse Detection

Spam filtering stacks may also implement rate limits:

Rate limits help contain compromised accounts or abuse from a single client before content filters even run.
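One common way to implement such a limit is a sliding window per key, where the key might be an authenticated user or a client IP. The sketch below is a generic in-memory version; the class name, the limit of 3 messages, and the 60-second window are arbitrary values chosen for the example.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class RateLimiter:
    """Allow at most max_messages per key within a sliding time window."""

    def __init__(self, max_messages: int, window_seconds: float) -> None:
        self.max_messages = max_messages
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> timestamps of accepted messages

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.events[key]
        # Forget timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_messages:
            return False
        recent.append(now)
        return True


limiter = RateLimiter(max_messages=3, window_seconds=60)
print([limiter.allow("user@example.com", now=t) for t in (0, 1, 2, 3, 61)])
```

Rejected attempts are not recorded, so a client that keeps hammering the server is allowed again as soon as its earlier accepted messages age out.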

Tuning and Managing Spam Filters

Thresholds and Policies

Key spam policy questions:

Common patterns:

Whitelists and Blacklists

To reduce false positives:

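  whitelist_from   *@trustedpartner.com

Note that whitelist_from trusts the claimed sender address alone, which is easy to forge; whitelist_auth applies the same exemption only when SPF or DKIM confirms the sender.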

Use blacklists with care; it is usually better to rely on reputation systems than on manual blacklists, except when dealing with very persistent sources.

Per-User vs Global Settings

On multi-tenant or hosting servers:

IMAP clients often expose spam settings (such as a “Junk” folder), but the actual logic usually lives in server-side filters (Sieve, Rspamd, SpamAssassin).

Monitoring Effectiveness

Key metrics:

Practical monitoring methods:

Log and Header Analysis

Headers added by filters are your main diagnostic tool:

For example:

X-Spam-Status: Yes, score=8.1 required=5.0
    tests=BAYES_95,DKIM_INVALID,HTML_IMAGE_ONLY_32,URIBL_BLACK ...

From this, you can:
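For bulk analysis it helps to parse such headers programmatically. The sketch below handles the X-Spam-Status layout shown in the example (a Yes/No flag, `score=`, `required=`, and a comma-separated `tests=` list, possibly folded across lines). It is a minimal parser written for that layout only, and the function name is made up for this example.

```python
import re

STATUS_RE = re.compile(
    r"^(?P<flag>Yes|No),\s*score=(?P<score>-?\d+(?:\.\d+)?)"
    r"\s+required=(?P<required>-?\d+(?:\.\d+)?)"
)


def parse_spam_status(value: str) -> dict:
    """Parse an X-Spam-Status value into flag, score, threshold and rule names."""
    # Unfold continuation lines and close up gaps after commas.
    unfolded = re.sub(r",\s+", ",", " ".join(value.split()))
    match = STATUS_RE.match(unfolded)
    if match is None:
        raise ValueError("unrecognized X-Spam-Status value")
    tests_match = re.search(r"tests=(\S+)", unfolded)
    tests = [t for t in tests_match.group(1).split(",") if t] if tests_match else []
    return {
        "is_spam": match.group("flag") == "Yes",
        "score": float(match.group("score")),
        "required": float(match.group("required")),
        "tests": tests,
    }


header = (
    "Yes, score=8.1 required=5.0\n"
    "    tests=BAYES_95,DKIM_INVALID,HTML_IMAGE_ONLY_32,URIBL_BLACK"
)
print(parse_spam_status(header))
```

Counting how often each rule name appears across misclassified messages is a simple way to find rules whose scores need adjusting.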

Security and Anti-Abuse Considerations

Spam filtering interacts closely with security:

Typical hardening measures:

Deployment Patterns and Examples

Simple On-Box SpamAssassin on a Small Server

Scenario: Single Postfix server for a small organization.

Advantages:

High-Performance Stack with Rspamd

Scenario: Larger installation or ISP.

Advantages:

Gateway Scanning with Amavis + Back-End Mail Store

Scenario: Separate MX gateway servers.

Advantages:

This chapter focused on the concepts, main tools, and integration patterns of spam filtering systems in Linux-based mail environments. Configuration details for the MTA itself and for other email components are covered in their respective chapters.
