Title: Filtering spam using Rspamd and OpenSMTPD on OpenBSD
Author: Solène
Date: 13 July 2021
Tags: openbsd mail spam
Description: 

# Introduction

I recently used Spamassassin to get rid of the spam I started to
receive, but it proved quite useless against some kinds of spam, so I
decided to give rspamd a try and write about it.

rspamd can filter spam but also sign outgoing messages with DKIM.  I
will only cover the anti-spam aspect here.

rspamd project website

# Setup

The rspamd setup for spam filtering was incredibly easy on OpenBSD (6.9
when I wrote this).  We need to install the rspamd service, the
connector for opensmtpd, and redis, which is mandatory to make rspamd
work.

```shell instructions
pkg_add opensmtpd-filter-rspamd rspamd redis
rcctl enable redis rspamd
rcctl start redis rspamd
```

Modify your /etc/mail/smtpd.conf file to add this new line:

```smtpd.conf file
filter rspamd proc-exec "filter-rspamd"
```

Then modify your "listen on ..." lines to add 'filter "rspamd"' to each
of them, like in this example:

```smtpd.conf file
listen on em0 pki perso.pw tls auth-optional   filter "rspamd"
listen on em0 pki perso.pw smtps auth-optional filter "rspamd"
```

Restart smtpd with "rcctl restart smtpd" and you should have rspamd
working!
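
A quick way to check that rspamd itself is scanning is to hand it a
GTUBE message, the standard anti-spam test string that rspamd is
expected to reject.  The sketch below only builds such a message; the
function name and the example.org addresses are placeholders I made up.
Note that rspamc talks to rspamd directly, so this does not exercise the
smtpd filter.

```shell
# Hypothetical helper: print a minimal test email containing the GTUBE
# string.  The addresses are placeholders, not real accounts.
gtube_message() {
    printf 'From: test@example.org\nTo: me@example.org\nSubject: GTUBE test\n\n%s\n' \
        'XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UCE-TEST-EMAIL*C.34X'
}
```

Running "gtube_message | rspamc" on the mail server should then report
a reject action if the daemon is up.  To test the whole chain, the same
message would have to be sent through SMTP from another host.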

# Using rspamd

Rspamd automatically checks multiple criteria to assign a score to each
incoming email.  Above a high score the email is rejected, and between
a lower threshold and that reject threshold it may be tagged with a
header "X-Spam" with the value "yes".

If you want to automatically move tagged emails to your Junk directory,
either use a sieve filter on the server side or use a local filter in
your email client.  The sieve filter would look like this:

```sieve rule

if header :contains "X-Spam" "yes" {
        fileinto "Junk";
        stop;
}
```

# Feeding rspamd

If you want better results, the filter needs to learn what is spam and
what is not spam (named ham).  You need to regularly scan new emails to
increase the effectiveness of the filter.  In my example, I have a
single user with a Junk directory and an Archives directory within the
maildir storage, and I use crontab to run the learning on mails newer
than 24h.

```crontab
0  1 * * * find /home/solene/maildir/.Archives/cur/ -mtime -1 -type f -exec rspamc learn_ham {} +
10 1 * * * find /home/solene/maildir/.Junk/cur/     -mtime -1 -type f -exec rspamc learn_spam {} +
```

# Getting statistics

rspamd comes with very nice reporting tools.  You can get a WebUI on
port 11334, which listens on localhost by default, so you either need
to configure rspamd to listen on other addresses or use an SSH tunnel.
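
For the SSH tunnel, a local port forward is enough.  Here is a minimal
sketch, where rspamd_tunnel is a hypothetical helper name and
mail.example.com a placeholder for your mail server:

```shell
# Hypothetical helper: forward local port 11334 to the rspamd WebUI
# listening on localhost on the mail server.
rspamd_tunnel() {
    host="${1:-mail.example.com}"   # placeholder host name
    # -N: do not run a remote command, -L: local port forwarding
    ssh -N -L 11334:localhost:11334 "$host"
}
```

While "rspamd_tunnel user@yourserver" is running, the WebUI should be
reachable at http://localhost:11334/ from your own computer.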

You can get the same statistics on the command line using the command
"rspamc stat" which should have an output similar to this:

```command line output
Results for command: stat (0.031 seconds)
Messages scanned: 615
Messages with action reject: 15, 2.43%
Messages with action soft reject: 0, 0.00%
Messages with action rewrite subject: 0, 0.00%
Messages with action add header: 9, 1.46%
Messages with action greylist: 6, 0.97%
Messages with action no action: 585, 95.12%
Messages treated as spam: 24, 3.90%
Messages treated as ham: 591, 96.09%
Messages learned: 4167
Connections count: 611
Control connections count: 5190
Pools allocated: 5824
Pools freed: 5801
Bytes allocated: 31.17MiB
Memory chunks allocated: 158
Shared chunks allocated: 16
Chunks freed: 0
Oversized chunks: 575
Fuzzy hashes in storage "rspamd.com": 2936336370
Fuzzy hashes stored: 2936336370
Statfile: BAYES_SPAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 344; users: 1; languages: 0
Statfile: BAYES_HAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 3822; users: 1; languages: 0
Total learns: 4166
```
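
If you only care about a few of these numbers, for monitoring for
example, the output is easy to parse.  The following sketch assumes the
line format shown above, and stat_summary is a name I made up:

```shell
# Hypothetical helper: read "rspamc stat" output on stdin and print the
# scanned, spam and ham counters on a single line.
stat_summary() {
    # Split fields on ":" or "," followed by optional spaces
    awk -F'[:,] *' '
        /^Messages scanned:/         { scanned = $2 }
        /^Messages treated as spam:/ { spam = $2 }
        /^Messages treated as ham:/  { ham = $2 }
        END { printf "scanned=%d spam=%d ham=%d\n", scanned, spam, ham }
    '
}
```

It would be used as "rspamc stat | stat_summary".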

# Conclusion

For me, rspamd is a huge improvement in terms of efficiency.  When I
tag an email as spam, the next one that looks similar goes straight
into the Junk directory once the learning cron job has run.  It also
uses less memory than Spamassassin and reports nice statistics.  My
Spamassassin setup was directly rejecting emails, so I didn't have a
good understanding of its effectiveness, but over several weeks I got
too many identical messages that were never filtered.  So far, rspamd
has proved to be better here.

I recommend looking at the configuration files.  Everything in them is
disabled by default, but they offer many comments with explanations,
which is a nice introduction to the features of rspamd.  I preferred to
keep the defaults and see how it goes before tweaking more.