Scalable SMTP E-Mail Filtering
Methods and Techniques
UCCSC 2007, Santa Cruz

Jon Kuroda


Some Alternate Titles

"Mail Filtering Deconstructed"

"Content Filtering on the Cheap"

"Content Filtering that Works (Better)"

"Virus Scanning and Spam Tagging That Sucks Less"

"Virus Scanning for a Non-Ideal World"

"Mail Filtering for the Masses"

"Anti-Virus Is Hard. Let's Go Shopping!"

Topics (and Non-Topics)

What I will be (or have been) talking about:
- Background/Historical Information
- Personal Caveats
- How different filtering systems work
- Ways to deploy email filtering, including examples
- Crazy Ideas and Odds and Ends
What I will not be talking about as much:
- My (MTA|OS) is better than your (MTA|OS)
- Every single implementation detail
- Measures outside of a filtering context
What I hope you (and I) will get out of this:
- Some understanding of how e-mail filters work and how to use them
- Some tools and ideas to take home and try
- If I am lucky, a good laugh

What's this all about?

Scalable:
- Cheap and easy to get more capacity (pay what you want as you go)
- Does More. Costs Less. Doesn't Suck (as much)
Flexible:
- Useful in situations other than my own
- Looking for a modular/toolkit approach
Server Based:
- Primarily interested in the MTA/SMTP side
- Less so in delivery-time systems such as Sieve, .forward/procmail, or MUA filters
Content-Filtering:
- Anti-Virus/Spam
- Data Scrubbing/Retention
- Auto-Spell-Checking / Auto-Translation

A (Very) Brief History of E-mail Servers

In the beginning ...
- One big server: Shell/Mail/FTP/everything
- Servers became (more) affordable: Proliferation and decentralization
- Whoa Nellie: Consolidation

Results:
- Viruses, Spam, Server Attacks, SMTP relay abuse, ...
- Partially consolidated services
- Legacy servers
- Market for anti-(bad stuff) software and systems
- Market for people who can manage all of this
Mmmm, Job Security ...

Once upon a time ...

It's 2003 in 399 Cory Hall ...

20+ supported separate, disjoint systems accepting email
- Numerous legacy research group mailservers - maillists and mailspools
- Research systems that accept input via SMTP
- One Exchange server.
Policy (then a draft policy) requiring anti-virus measures on mailservers

2. Anti-virus software:

Anti-virus software for any particular type of device currently listed on the Approved Software website must be running and up-to-date on every level of device, including clients, file servers, mail servers, and other types of campus networked devices.
Departmental (@EECS / @CS) MX hosts already had it
I was the new guy
"Hey Jon ..." Beware these words.

Caveats

I'm a *nix/sendmail guy who installed anti-virus software
- Examples involving these will have the most detail
- I don't do MS Exchange, but I will talk about it (a little)
I'm a realist, not an idealist
- I don't work in an ideal IT world, so I try not to assume one.
There is nothing new here
- I didn't actually think this was that novel
- No (in my opinion) out-of-the-ordinary ideas
- But, as always, the work is in the documentation

There is no spoon. But I have some lovely sporks.

Note for online readers: I meant to have some plastic sporks to hand out as random prizes for questions, but 1) I forgot to bring them and 2) I had too high a slide/time ratio.

Filters: An (Over) Simplified Look

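Pseudocode describing what a filter does. Note that there are both I/O and side effects.

while (my $input = <INPUT>) {       # read the message line by line
    if ($input =~ /PATTERN/) {      # the engine's decision
        mangle($input);             # e.g. strip or rewrite the match
        print OUTPUT $input;
        some_side_effect();         # e.g. log it, notify someone
    } else {
        print OUTPUT $input;        # pass through unmodified
        some_other_side_effect();
    }
}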

Filters: The Engine

The "brains": does the work of making filtering and other decisions
- Anti-virus/spam
- "Scrubbing" messages for sensitive data
- Anything from rejecting a message to passing it on unmodified
Side-Effects
- Logging/Notifications
- Updating cached information
- Sometimes, all we care about are the side-effects
It may also depend on databases that require periodic* updating
- Virus/Spam databases
- SpamAssassin Bayesian Analysis (sa-learn)
- Spam Host lists

* How periodic? How paranoid are you? More on this later.
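A minimal Python sketch of such an engine, assuming a toy whose "database" is just a set of byte-string signatures. `FilterEngine`, `Verdict`, and the scrub list are hypothetical names for illustration; real engines (ClamAV, SpamAssassin, commercial scanners) are far more involved. It shows the three points above: a decision ranging from reject to pass-unmodified, side effects (here, a log), and a database that needs periodic updates.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Verdict(Enum):
    PASS = "pass"        # deliver unmodified
    MODIFY = "modify"    # deliver, but with content scrubbed
    REJECT = "reject"    # refuse the message outright


class FilterEngine:
    """Toy filter engine; signatures and scrub patterns are hypothetical."""

    def __init__(self, signatures: Iterable[bytes], scrub: Iterable[bytes] = ()):
        self.signatures = set(signatures)   # the "virus database"
        self.scrub = set(scrub)             # e.g. sensitive strings to redact
        self.log: List[str] = []            # side effect: a record of decisions

    def update_signatures(self, new: Iterable[bytes]) -> None:
        # Stand-in for the periodic database update discussed above.
        self.signatures |= set(new)
        self.log.append(f"db updated: {len(self.signatures)} signatures")

    def decide(self, message: bytes) -> Tuple[Verdict, bytes]:
        for sig in self.signatures:
            if sig in message:
                self.log.append(f"reject: matched {sig!r}")
                return Verdict.REJECT, b""
        cleaned = message
        for pattern in self.scrub:
            cleaned = cleaned.replace(pattern, b"[scrubbed]")
        if cleaned != message:
            self.log.append("modify: scrubbed content")
            return Verdict.MODIFY, cleaned
        self.log.append("pass")
        return Verdict.PASS, message
```

Note that a message carrying a brand-new virus passes until `update_signatures` runs, which is exactly why the "how periodic?" question matters.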

The Ins and Outs of Filter I/O

Great, we have a filter engine, but how do we get email in and out of the filter?

First, a detour to talk about Pre/Post-queue filtering: Do we have to let people in the door just so we can kick them out?
MTA Plugins: Embrace the MTA extensions
SMTP-aware Filter: It's an MTA, it's a Filter, no wait ... it's both!! Well, sorta
API/Protocol: Good fences make for good filtering.
Network Proxy (Extra Slide): Black Magic that no one has done, probably

Filter I/O: Pre/Post-queue Filtering, a Slide Without a Home

Filter before or after queueing email? Pre-queue filtering lets one reject mail during an SMTP connection, but it can cause timeouts. Postfix has some good notes on the pros and cons of pre-queue filtering.

Pre-queue filtering:

220-mail.example.com ESMTP JavaMail 6.2 Mon, 31 Jul 2006 20:01:23 -0700
ehlo poland.example.com
250-mail.example.com Hello poland.example.com [192.0.34.166]
Mail From: <exile@poland.example.com>
250 OK
Rcpt To: <president@example.com>
250 Accepted
data
354 Enter message, ending with "." on a line by itself
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
.
550 DENIED!! Message contains malware (ClamAV:Eicar-Test-Signature)

Versus post-queue, where filtering occurs but not till after email is accepted

220-post.example.com ESMTP JavaMail 6.2 Mon, 31 Jul 2006 20:02:23 -0700
ehlo siberia.example.com
250-post.example.com Hello siberia.example.com [192.0.34.166]
Mail From: <exile@siberia.example.com>
250 OK
Rcpt To: <president@example.com>
250 Accepted
data
354 Enter message, ending with "." on a line by itself
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
.
250 2.0.0 k713JO1n055810 Message accepted for delivery
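What separates the two transcripts is when the scan runs relative to the reply to the terminating ".". A minimal Python sketch, assuming a one-signature "scanner" that only looks for the EICAR marker text; `end_of_data`, `post_queue_worker`, and the reply strings are illustrative, not any MTA's actual code.

```python
from collections import deque
from typing import Deque, List

# Hypothetical one-signature "scanner": it only looks for the EICAR marker text.
EICAR_MARKER = b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE"

queue: Deque[bytes] = deque()    # accepted, not yet filtered
delivered: List[bytes] = []      # clean mail handed onward
quarantine: List[bytes] = []     # infected mail caught after acceptance


def is_infected(body: bytes) -> bool:
    return EICAR_MARKER in body


def end_of_data(body: bytes, pre_queue: bool) -> str:
    """Return the reply an SMTP server sends after the client's final '.'."""
    if pre_queue:
        # Scan while the client waits: we can still refuse the message.
        if is_infected(body):
            return "550 5.7.1 Message contains malware"
        delivered.append(body)
        return "250 2.0.0 Message accepted for delivery"
    # Post-queue: accept first, scan later; the sender never sees the verdict.
    queue.append(body)
    return "250 2.0.0 Message accepted for delivery"


def post_queue_worker() -> int:
    """Drain the queue; return how many infected messages were quarantined."""
    caught = 0
    while queue:
        body = queue.popleft()
        if is_infected(body):
            quarantine.append(body)
            caught += 1
        else:
            delivered.append(body)
    return caught
```

The 550 in the pre-queue branch is only possible because the connection is still open, which is also why the scan's running time matters for the timeout discussion that follows.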

Filter I/O: Pre-queue Filters: SMTP Timeouts and Consequences

Are SMTP timeouts a real cause for concern with pre-queue filtering?

RFCs 2821 and 1123 say one should wait 10 minutes after sending the terminating "."
Not "shall" or "must" but merely should.
Sendmail defaults to 1hr, Postfix to 10min, No clue about QMail or Exchange
It's an uncertain world out there. Best to be on your toes.

What does this mean?

It means don't run a filter pre-queue just because it's possible. Consider its impact on overall performance and its ability to stay within those suggested timeouts on SMTP connections.

But do I need to worry?

In practice, not ahead of time.
Test filters before putting them into production.
Keep tabs on performance to see if a filter is causing timeouts.
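One way to keep tabs on performance is to time every filter call against the client's patience. A Python sketch, assuming the 10-minute figure above as the budget and an arbitrary 10% warning threshold; `timed_filter` and `warn_fraction` are hypothetical names, and `clock` is injectable only so the example can be tested.

```python
import logging
import time
from typing import Callable, Tuple

# The 10 minutes the RFCs suggest a client wait after the terminating "."
DATA_TERMINATION_TIMEOUT_S = 600

log = logging.getLogger("prequeue-timing")


def timed_filter(scan: Callable[[bytes], str],
                 warn_fraction: float = 0.1,
                 clock: Callable[[], float] = time.monotonic
                 ) -> Callable[[bytes], Tuple[str, float, bool]]:
    """Wrap a pre-queue filter and flag calls that use too much of the budget.

    warn_fraction is a hypothetical threshold: 0.1 means "warn if a single
    filter call used 10% of the client's 10-minute patience".
    """
    def wrapper(message: bytes) -> Tuple[str, float, bool]:
        start = clock()
        verdict = scan(message)
        elapsed = clock() - start
        slow = elapsed >= warn_fraction * DATA_TERMINATION_TIMEOUT_S
        if slow:
            log.warning("filter took %.1fs of a %ds budget",
                        elapsed, DATA_TERMINATION_TIMEOUT_S)
        return verdict, elapsed, slow
    return wrapper
```

In production, `elapsed` would feed whatever statistics are already being collected rather than only a log line.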

Filter I/O: Pre-queue Filters: What to run pre-queue?

Must be pre-queue

Anything that issues SMTP Return Codes
Example: Greylisting. "Deferred delivery" and "Message accepted" don't mix: once a message is queued, it is too late to ask the sender to try again later.
Any filter that a pre-queue filter depends on (see below)

Very good ideas

Anything quick (a relative term) that can cut down load for bigger filters later on

DNSBL
SPF checks
SMTP Compliance

Not so Good

Message Body Inspection/filtering

Anti-spam/virus
Unless depended on by a pre-queue filter
Example: Greylist all mail with a high spam-score.

Anything slow or heavy on CPU/IO/memory use

Anything that has only side-effects
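Greylisting is the textbook "must be pre-queue" case: the whole trick is a 4xx "try again later" that the sending MTA sees during the SMTP session. A toy in-memory Python sketch, assuming the common (client IP, sender, recipient) triplet and answering at RCPT TO time; the class name, delays, and reply texts are illustrative assumptions, and a real greylisting daemon would persist and expire its state.

```python
from typing import Dict, Set, Tuple

Triplet = Tuple[str, str, str]   # (client IP, envelope sender, recipient)

TEMPFAIL = "451 4.7.1 Greylisted, please try again later"


class Greylist:
    """Toy in-memory greylist; delays below are hypothetical defaults."""

    def __init__(self, min_delay_s: float = 300.0, max_age_s: float = 4 * 3600.0):
        self.min_delay_s = min_delay_s   # how long a new triplet must wait
        self.max_age_s = max_age_s       # forget triplets never retried
        self.first_seen: Dict[Triplet, float] = {}
        self.passed: Set[Triplet] = set()

    def rcpt_reply(self, ip: str, sender: str, rcpt: str, now: float) -> str:
        key = (ip, sender.lower(), rcpt.lower())
        if key in self.passed:
            return "250 2.1.5 Ok"
        seen = self.first_seen.get(key)
        if seen is None or now - seen > self.max_age_s:
            self.first_seen[key] = now    # first attempt: defer it
            return TEMPFAIL
        if now - seen < self.min_delay_s:
            return TEMPFAIL               # retried too soon
        self.passed.add(key)              # a patient, real MTA came back
        return "250 2.1.5 Ok"
```

Many spam-sending tools never retry after a 4xx, while legitimate MTAs do; that asymmetry only exists if the deferral happens before the message is accepted.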

Filter I/O: MTA Plug-in

Designed for a particular MTA (such as Microsoft Exchange w/VSAPI?)

Becomes part of the MTA (MS Exchange VSAPI uses DLLs)
Integration can increase performance
Filter can make use of MTA specific features to provide non-SMTP filtering

Essentially yields a 'full-featured' standalone SMTP-aware Filter

Can accept and deliver incoming email
Retains all of the MTA's own features, adding in filtering

Operates in MTA's process space

Often shares access privileges
Problem with filter can affect MTA from within
Even if somehow separate processes, often share system resources

I don't find them that flexible, but these can work well for specific situations.

Filter I/O: SMTP-aware Filter

It speaks SMTP, but do we call it an 'MTA'?

May not have all the features found in your actual MTA:
- DNSBL, RCPT Verification, LDAP, Full SMTP routing, running hot water ...
Can relay to a "Real MTA" or be part of a "Dual MTA" setup, described later

SMTP usually means post-queue filtering only

Postfix has a workaround which is useful for Dual-MTA setups.
Use Postfix as a pre-queue-capable frontend to a post-queue-only filter.

Examples of COTS SMTP-aware filters

Hardware Boxes:
- Barracuda Networks Spam Firewall*
- IronPort Mail Appliances
- Borderware "Mxtreme"**
Software Only:
- TrendMicro Viruswall**
- Kaspersky Security Software Products

* denotes what we use in our group or our department
** denotes what we have used

No room for pretty pictures here :( Next slide

Filter I/O: SMTP-aware Filter Diagram

Filter I/O: API/Protocol

MTA and filter are separate and communicate via a defined protocol or API

Filter is a separate process, possibly on another system
Capable of supporting any MTA modulo API support
Combined with an MTA, one can build a Full Featured SMTP-aware Filter

Like an MTA Plugin, this augments an MTA instead of replacing it, but, unlike plugins, filters and MTA are explicitly separate. For high performance computing types, think shared memory versus message passing.

Two well known API/Protocols:

SMTP aka RFC 2821 (or LMTP/QMTP/...)
Sendmail's Content Management API aka Milter
Lots of for-pay and open/cheap-source options here
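The "separate process talking over a defined protocol" idea can be sketched in Python with a socket pair and a thread standing in for the MTA and the filter process. The length-prefixed framing below is a hypothetical toy format, NOT the milter wire protocol; it only shows the message-passing shape: the MTA writes a message, and the filter replies with a verdict over the same connection.

```python
import socket
import struct
import threading

# Hypothetical toy wire format: 4-byte big-endian length, then the payload.


def send_frame(sock: socket.socket, payload: bytes) -> None:
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf


def recv_frame(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def filter_server(sock: socket.socket) -> None:
    """The 'filter process': read one message, answer with a verdict."""
    message = recv_frame(sock)
    verdict = b"REJECT" if b"EICAR" in message else b"ACCEPT"
    send_frame(sock, verdict)
    sock.close()


def mta_ask_filter(message: bytes) -> bytes:
    """The 'MTA side': hand the message to the filter and wait for a verdict."""
    mta_end, filter_end = socket.socketpair()
    worker = threading.Thread(target=filter_server, args=(filter_end,))
    worker.start()
    try:
        send_frame(mta_end, message)
        return recv_frame(mta_end)
    finally:
        mta_end.close()
        worker.join()
```

In a real deployment the filter would listen on a UNIX-domain or TCP socket, which is what lets it run on a different host from the MTA.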

Filter I/O: Filtering Protocol Diagram

Filter I/O: SMTP and Milter compared

SMTP - Everyone speaks it, some better than others

Example: Any SMTP-aware Filter
Protocol was not designed for this task; it just happens to work
- Unidirectional, may require "dual MTA" setup
- Usually implies post-queue filtering only
- Postfix has some workarounds.

Milter - The new cool thing

Examples:
- Trend Viruswall (Sendmail Edition) or their new Antivirus for Sendmail
- SpamAssassin and ClamAV milters (many of these)
- Sophos Puremessage for Unix
- Kaspersky Anti-Virus for Sendmail with Milter API
Open API, though only Sendmail & Postfix (≥ v2.3) currently have support
Protocol designed for filtering
- bidirectional - no need for "dual MTA" setup
- can do pre-queue and post-queue filtering, but Sendmail and Postfix use it only for pre-queue

Filter I/O: Dual MTA and Milter Compared

Dual MTA needs at least one extra* MTA listening on a different port/socket

MTA-A accepts mail and relays it via SMTP/LMTP on an arbitrary port/socket to the filter
Filter sends filtered email via SMTP/LMTP to MTA-D on an arbitrary port/socket
MTA-D continues processing e-mail
For additional filters, chain filters or add more MTA-D's
- limited filter configurations can limit chaining
- adding MTA-D's is messy and can interfere with pre-queue filtering

* Postfix lets you get around this (choice of pre- or post-queue), sorta ...

Milter setup only needs one MTA running

MTA accepts incoming mail, sends to milter over a socket
The milter sends filtered email back over the same socket
MTA continues processing e-mail, perhaps to other milters
Add milters till you run out of resources, still need only one MTA
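With one MTA and several milters, the MTA hands the message to each filter in turn and stops at the first rejection. A Python sketch of that loop, assuming hypothetical filters that return an action string plus a possibly-modified body; the action names are made up here and do not correspond to the real milter API's constants.

```python
from typing import Callable, List, Optional, Tuple

# Each hypothetical filter returns (action, possibly-modified body).
Filter = Callable[[bytes], Tuple[str, bytes]]


def run_filters(body: bytes,
                filters: List[Filter]) -> Tuple[str, bytes, Optional[str]]:
    """One MTA, many filters: pass the message through each filter in turn.

    Returns (final action, final body, name of the rejecting filter or None).
    """
    for f in filters:
        action, body = f(body)
        if action == "reject":
            return "reject", body, getattr(f, "__name__", repr(f))
    return "accept", body, None


def virus_filter(body: bytes) -> Tuple[str, bytes]:
    return ("reject", body) if b"EICAR" in body else ("continue", body)


def spam_tagger(body: bytes) -> Tuple[str, bytes]:
    # Tag instead of rejecting: prepend a (hypothetical) spam header.
    if b"viagra" in body.lower():
        return "continue", b"X-Spam-Flag: YES\r\n" + body
    return "continue", body
```

Adding another filter is just another list entry; in a Dual MTA setup the same change means another hop or another MTA instance.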

Filter I/O: Dual MTA and Milter Compared Diagram

Filter I/O: Dual MTA and Milter Compared - Multi-Filter Diagram

Filter: Side Effects and Other Random Bits

Notification E-mails

"You sent a virus" - useless and annoying. Turn it off
"You got a virus" - almost as useless and annoying, maybe amusing
"Virus Deleted" (Cleaned email sent out) - useless/annoying, maybe amusing
- Not a Side-effect but actual bona-fide filter I/O

Saving Viruses/Spam

Potentially amusing, harmful, and profitable

Logging

How else will you know if this works?
How else will you fix it when it doesn't?
How else will you know if you need more capacity?
How else will you get that raise?
Right. Raise. UC system

"Virus Deleted" emails: In EECS, we actually send the cleaned e-mails. A default Sieve rule on our IMAP server auto-files all such cleaned e-mails into a special folder where users can ignore them or be impressed by how much virus-laden email we're catching for them.

Saving viruses: A user in our department who was doing work with Windows viruses asked if we had any he could get his hands on. We save viruses mostly for our amusement and to run stats, but we were able to give him a CD of viruses and get paid some T&M for it.

Deployment: What to do with these tools

We now have some pieces that can be combined in many ways; how can we use them?

Install it Everywhere
- The Simple (But Stupid?) Life

MX Filter
- The "Big Guy at the Entrance" way.

Network Filter Service (The Other Other NFS)
- It seems slick, but is it really useful?

Deployment: Install It Everywhere

Pretty self-explanatory

Pros

Braindead simple
Fine if you have only one or two systems to manage
But that is not the focus here.

Cons

Multiple points of (mis)management
May conflict with licensing schemes
Filter may not be supported/available on some platforms

This does not count as scalable, mmm'kay?

Deployment: MX Filter in a Nutshell

Essentially an SMTP relay that filters along the way

Major Steps

Build (or buy) one or more SMTP-aware Filters
Set up SMTP-aware Filters for handoff to 'client MTAs'
MX all of your 'client MTAs' to the SMTP-aware Filters
Sit back, relax, have a tasty beverage

Optional

IP firewalls to limit access to client MTAs
Collect statistics on performance (that whole raise thing)
RCPT verification
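The MX Filter works because sending MTAs follow MX preferences: lowest preference value first, and RFC 2821 (section 5) requires senders to randomize among hosts of equal preference so load spreads across them. A Python sketch of that ordering, with hypothetical host names; the higher-numbered entries are what give you backup MXs.

```python
import random
from itertools import groupby
from typing import List, Optional, Tuple


def mx_try_order(records: List[Tuple[int, str]],
                 rng: Optional[random.Random] = None) -> List[str]:
    """Order MX hosts the way a sending MTA is expected to try them.

    records are (preference, hostname) pairs; lower preference is tried first,
    and hosts sharing a preference are shuffled to spread the load.
    """
    rng = rng or random.Random()
    ordered: List[str] = []
    for _pref, group in groupby(sorted(records), key=lambda r: r[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


# Two filter hosts share the load; a backup MX only sees mail if both are down.
example = [(20, "backup-mx.example.edu"),
           (10, "filter2.example.edu"),
           (10, "filter1.example.edu")]
```

Because every client MTA's MX records point at the filters, the client MTAs themselves never need to appear in DNS as mail exchangers.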

Deployment: MX Filter Diagram

Deployment: MX Filter - Pros and Cons

Pros

Only need a few machines to run the filter
Redundant backup MXs in case of downtime
- Control over mail queue during/after downtime
We can arbitrarily alter the destination MTA
Filter for many MTAs: Exchange, Sendmail, Postfix, ... but only the SMTP vector
Allows us to firewall off client MTAs from the world-at-large

Cons

One Big One: accepting mail for non-existent@client-mta
- Lots of postmaster mail. We hatesssss that. So, some solutions:
  - Make undergrad students read it
  - Make list of valid addresses available to Filter (cron jobs, LDAP, etc)
  - SnertSoft milter-ahead (thanks to UCI for this; I was going to use milter-cli)

Small SMTP delay introduced
Presence of filter system revealed in Received: headers
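A sketch of the address-list option above: a hypothetical cron job exports one `user@domain` per line, and the filter answers RCPT TO from that list, so unknown users are refused during the SMTP session instead of generating postmaster mail later. This is not how milter-ahead works (it asks the downstream MTA instead); the function names and reply texts are illustrative.

```python
from typing import Dict, Iterable, Set


def load_valid_rcpts(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse a hypothetical export: one 'user@domain' per line, '#' comments."""
    valid: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        local, _, domain = line.rpartition("@")
        # Local parts are technically case-sensitive; folding case here is a
        # simplifying assumption that matches most real mail systems.
        valid.setdefault(domain.lower(), set()).add(local.lower())
    return valid


def rcpt_reply(rcpt: str, valid: Dict[str, Set[str]]) -> str:
    local, _, domain = rcpt.strip("<>").rpartition("@")
    users = valid.get(domain.lower())
    if users is None:
        return "550 5.7.1 Relaying denied"     # not a domain we filter for
    if local.lower() in users:
        return "250 2.1.5 Recipient ok"
    return "550 5.1.1 User unknown"            # refuse now, no bounce later
```

The trade-off is freshness: a user added on the client MTA is rejected until the next export runs.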

Deployment: Network Filter Service in a Nutshell

Not a Network File System, nor a Number Field Sieve

Major Steps

Build (or buy) one or more systems running your filter(s)
Make them available via milter or other API
Configure 'protected' MTAs to access filter via API
Sit back, relax, have a tasty beverage

Optional

SMTP-aware Filter(s) as lower priority MXs to queue mail during downtime
IP firewalls to control access to Network Filter Service
Again, collect statistics

Deployment: Network Filter Service Diagram

Deployment: Network Filter Service - Pros and Cons

Pros

E-mail goes directly to destination server
- Fewer worries about accepting email for bogus addresses
- One fewer SMTP hop
- Presence of filtering system not announced as openly
Like the MX Filter, it only needs a few systems for filtering, but only protects SMTP

Cons

Requires Filtering API support
May mean more network traffic per filter
MTAs get e-mail filtering but are still open to outside world
Don't get mail queue control during downtime without setting up other MXs

It seems cooler, but maybe not better when supporting disjoint heterogeneous mail servers. It may work better in a more uniform managed environment, say an end-to-end mail service as opposed to "just" protecting someone else's servers.

Deployment: A Detour for Exchange

Microsoft Exchange is, for better or for worse, not going to go away anytime soon. The question is "How best to keep the viruses away from it?"

First, and perhaps only, relevant thing to remember:
You cannot rely upon SMTP filtering as a sole method of anti-virus for Exchange

For example, users can upload files to Exchange via HTTP, from desktops, PDAs, anything. Where have your users' Crackberrys been?

While it is always a good idea to "pre-filter" mail inbound to an Exchange server, you should also make use of Exchange's VSAPI to provide virus scanning of an item whenever a client requests it, not just when the item (message) is accepted and enqueued. Additionally, items are continually rescanned when virus definitions/signatures are updated.
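The key difference from SMTP filtering is scanning on access with the current signatures, not just once at delivery. A Python model of that behavior, assuming a hypothetical store where each item remembers which signature version last scanned it; this is not VSAPI's interface, only the idea behind it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoredItem:
    body: bytes
    scanned_with: int = 0        # signature version used for the last scan
    infected: bool = False


class ScanOnAccessStore:
    """Hypothetical store that rescans items whenever signatures are newer."""

    def __init__(self, scan: Callable[[bytes, int], bool]):
        self.scan = scan             # scan(body, signature_version) -> infected?
        self.sig_version = 1
        self.scans_run = 0

    def update_signatures(self) -> None:
        self.sig_version += 1        # every stored item is now "stale"

    def open_item(self, item: StoredItem) -> bytes:
        if item.scanned_with < self.sig_version:
            item.infected = self.scan(item.body, self.sig_version)
            item.scanned_with = self.sig_version
            self.scans_run += 1
        if item.infected:
            raise PermissionError("item blocked: infected")
        return item.body
```

An item uploaded over HTTP or synced from a handheld never passes an SMTP filter at all, yet in this model it is still checked the first time a client opens it.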

Implementation: Our version

We went with Filtering MXs for deploying our virus filter.

Our guiding principle was "Free Beer Good".

Our tools:

OS: Solaris 10 on Sparc (Originally 8 and 9)
MTA: Open Source Sendmail
Filter I/O: milter
Filters:
- TrendMicro Viruswall (Sendmail Edition)
- SnertSoft Milter-Ahead (for RCPT Verification)
Perl
Caffeine

Scalable E-Mail Filtering, © 2007

Jon Kuroda

UC Berkeley EECS, CUSG

jkuroda[at]EECS[dot]Berkeley[dot]EDU

http://www.EECS.Berkeley.EDU/~jkuroda/talks/mailfiltering/

Deployment: Our Version Diagram

Our Way: Hardware/OS (Free Beer)

Our Way: Sendmail (Cheap Beer)

Our Way: Dealing with the SMTP parts

Our Way: Dealing with the SMTP parts - meta-config file

Our Way: Milters - a detour

Our Way: Virus Filter Software (Almost Free Beer)

Our Way: RCPT Verification Software (Almost Free Beer)

Our Way: RCPT verification milter and Sendmail

Our Way: Connecting Sendmail to the Milters

Implementation: The Big Mail Server Farm

Deployment: The Big Mail Server Farm - Diagram

The Big Mail Server Farm

Crazy Ideas 1

Crazy Ideas 2: Transparent (or is it Opaque?) Proxy

Odds and Ends

References/Resources

Thanks and Acknowledgements
