Switching from SpamAssassin+Bogofilter+Exim to Rspamd+Exim+Dovecot

For more than ten years, I used SpamAssassin and Bogofilter with Exim to filter spam, along with SPF and greylisting done directly within Exim.

Why change?

I must say that almost no spam reached me unflagged for years. Why change anything then?

First, I now have more users and the system was not really multiuser-aware. For instance, the Bayesian filter training cron job had a hard-coded SPAMDIR, etc.

Second, my whole setup relied on specific transports and routers in Exim to send mail first to Bogofilter, then to SpamAssassin. This means that filtering was done after SMTP time, when the mail had already been accepted. You filter, but you do not discourage or block spam sources.

Rspamd?

General

Feature | Rspamd | SpamAssassin | DSPAM
Written in | C/Lua | Perl | C
Process model | event driven | pre-forked pool | LDA and pre-forked
MTA integration | milter, LDA, custom | milter, custom (Amavis) | LDA
Web interface | embedded | 3rd party |
Languages support | full, UTF-8 conversion/normalisation, lemmatization | naïve (ASCII lowercase) | naïve
Scripting support | Lua API | Perl plugins |
Licence | Apache 2 | Apache 2 | GPL
Development status | very active | active | abandoned

Rspamd seems actively developed and easy to integrate not only with Exim, the SMTP server, but also with Dovecot, which I use as IMAPS server.

Instead of having:

Exim SMTP accept with greylist -> bogofilter -> spamassassin -> procmail -> dovecot 

The idea is to have:

Exim SMTP accept with greylist and rspamd -> dovecot with sieve filtering 

It rejects or discards spam earlier and makes filtering easier in a multiuser environment (sieve is not dangerous, unlike procmail, and can be managed by clients, if desirable).

My new setup is contained in my rien-mx package: the initial greylist system is still there.

Exim

What matters most is the acl_check_rcpt definition (already used in the previous version) and the new acl_check_data definition:

### acl/41_rien-check_data_spam
#################################
# based on https://rspamd.com/doc/integration.html
# -  using CHECK_DATA_LOCAL_ACL_FILE included in the acl_check_data instead of creating a new acl
# - and scan all the messages no matter the source:
#    because some might be forwarded by smarthost client, requiring scanning with no defer/reject

## process earlier scan

# find out if a (positive) spam level is already set
warn
  condition = ${if match{$h_X-Spam-Level:}{\N\*|\+\N}}
  set acl_m_spamlevel = $h_X-Spam-Level:
warn
  condition = ${if match{$h_X-Spam-Bar:}{\N\*|\+\N}}
  set acl_m_spamlevel = $h_X-Spam-Bar:
warn
  condition = ${if match{$h_X-Spam_Bar:}{\N\*|\+\N}}
  set acl_m_spamlevel = $h_X-Spam_Bar:

# discard high probability spam identified by earlier scanner
# (probably forwarded by a friendly server, since it is unlikely that a spam source would shoot
# itself in the foot, no point to generate bounces)
discard
  condition = ${if >={${strlen:$acl_m_spamlevel}}{15}}
  log_message = discard as high-probability spam announced

# at least make sure X-Spam-Status is set if relevant
warn
  condition = ${if and{{ !def:h_X-Spam-Status:}{ >={${strlen:$acl_m_spamlevel}}{6} }}}
  add_header = X-Spam-Status: Yes, earlier scan ($acl_m_spamlevel)

# accept content from relayed hosts with no spam check
# unless registered in final_from_hosts (they are outside the local network)
accept
  hosts = +relay_from_hosts
  !hosts = ${if exists{CONFDIR/final_from_hosts}\
		      {CONFDIR/final_from_hosts}\
		      {}}

# rename earlier reports and score
warn
  condition = ${if def:h_X-Spam-Report:}
  add_header = X-Spam-Report-Earlier: $h_X-Spam-Report:
warn
  condition = ${if def:h_X-Spam_Report:}
  add_header = X-Spam-Report-Earlier: $h_X-Spam_Report:
warn
  condition = ${if def:h_X-Spam-Score:}
  add_header = X-Spam-Score-Earlier: $h_X-Spam-Score:
warn
  condition = ${if def:h_X-Spam_Score:}
  add_header = X-Spam-Score-Earlier: $h_X-Spam_Score:


# scan the message with rspamd
warn spam = nobody:true
# This will set variables as follows:
# $spam_action is the action recommended by rspamd
# $spam_score is the message score (we are unlikely to need it)
# $spam_score_int is spam score multiplied by 10
# $spam_report lists symbols matched & protocol messages
# $spam_bar is a visual indicator of spam/ham level

# remove foreign headers except spam-status, because it is better to have it twice than not at all
warn
  remove_header = x-spam-bar : x-spam_bar : x-spam-score : x-spam_score : x-spam-report : x-spam_report : x-spam_score_int : x-spam_action : x-spam-level
  
# add spam-score and spam-report header
# (it is possible to add a condition to only add the headers when rspamd recommends it:
#   condition  = ${if eq{$spam_action}{add header}})
warn
  add_header = X-Spam-Score: $spam_score
  add_header = X-Spam-Report: $spam_report

# add x-spam-status header if message is not ham
# do not match when $spam_action is empty (e.g. when rspamd is not running)
warn
  ! condition  = ${if match{$spam_action}{^no action\$|^greylist\$|^\$}}
  add_header = X-Spam-Status: Yes

# add x-spam-bar header if score is positive
warn
  condition = ${if >{$spam_score_int}{0}}
  add_header = X-Spam-Bar: $spam_bar

## delay/discard/deny depending on the scan
  
# use greylisting with rspamd
# (unless coming from authenticated or relayed host)
defer message    = Please try again later
   condition  = ${if eq{$spam_action}{soft reject}}
   !hosts = ${if exists{CONFDIR/final_from_hosts}\
		       {CONFDIR/final_from_hosts}\
		       {}}
   !authenticated = *
   log_message  = greylist $sender_host_address according to soft reject spam filtering

# high probability spam get silently discarded if 
# coming from authenticated or relayed host
discard
   condition  = ${if eq{$spam_action}{reject}}
   hosts = ${if exists{CONFDIR/final_from_hosts}\
		       {CONFDIR/final_from_hosts}\
		       {}}
   log_message  = discard as high-probability spam from final from host

discard
   condition  = ${if eq{$spam_action}{reject}}
   authenticated = *
   log_message  = discard as high-probability spam from authenticated
   
# refuse high probability spam from other sources
deny  message    = Message discarded as high-probability spam
   condition  = ${if eq{$spam_action}{reject}}
   log_message	= reject mail from $sender_host_address as high-probability spam

These two ACLs take care of passing mail through Rspamd and of accepting, rejecting or discarding it.
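
To make the last part of the data ACL easier to follow, here is a minimal shell sketch of the same decision table. It is only an illustration, not something Exim runs: the function name acl_verdict and the origin labels "trusted" (authenticated, or listed in final_from_hosts) and "other" are assumptions made for this example, while the action strings are the ones tested against $spam_action in the ACL. Mail from relay_from_hosts that is not listed in final_from_hosts is accepted before the scan and never reaches this stage.

```shell
#!/bin/sh
# acl_verdict ACTION ORIGIN  (hypothetical helper, for illustration only)
#   ACTION: value of $spam_action as returned by rspamd
#   ORIGIN: "trusted" = authenticated or listed in final_from_hosts,
#           "other"   = anything else
# Prints the ACL verb that ends processing, or "continue" when the
# message falls through to the rest of acl_check_data.
acl_verdict() {
  case "$1:$2" in
    "soft reject:other") echo defer ;;     # greylist: ask to retry later
    "reject:trusted")    echo discard ;;   # drop silently, no bounce
    "reject:other")      echo deny ;;      # refuse at SMTP time
    *)                   echo continue ;;  # headers added, mail goes on
  esac
}
```

For example, acl_verdict "soft reject" other prints defer, while acl_verdict "reject" trusted prints discard and acl_verdict "add header" other prints continue.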

A dovecot_lmtp transport is also necessary:

dovecot_lmtp:   
  debug_print = "T: dovecot_lmtp for $local_part@$domain"   
  driver = lmtp   
  socket = /var/run/dovecot/lmtp   
  #maximum number of deliveries per batch, default 1   
  batch_max = 200   
  # remove suffixes/prefixes   
  rcpt_include_affixes = false 

There are also other internal files, especially in conf.d/main. If you want to follow my setup, you are encouraged to download at least the whole mx/etc/exim folder. Most files are commented, so it is easy to find out whether they are relevant or not. Or you can just copy/paste the relevant settings into etc/conf.d/main/10_localsettings, for instance:

# path of rspamd 
spamd_address = 127.0.0.1 11333 variant=rspamd 

# data acl definition 
CHECK_DATA_LOCAL_ACL_FILE =  /etc/exim4/conf.d/acl/41_rien-check_data_spam

# memcache traditional greylisting
GREY_MINUTES  = 0.4
GREY_TTL_DAYS = 25
# we greylist servers, so we keep it to the minimum required to cross-check with SPF
#   sender IP, sender domain
GREYLIST_ARGS = {${quote:$sender_host_address}}{${quote:$sender_address_domain}}{GREY_MINUTES}{GREY_TTL_DAYS}
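
The greylisting ACL itself lives in acl_check_rcpt and is not reproduced here. As a rough idea of what a delay and a TTL usually mean in greylisting, here is a generic rule sketched in shell. It is an assumption about how such settings are commonly applied, not a copy of the actual macro, and greylist_verdict and its arguments are names invented for this example.

```shell
#!/bin/sh
# greylist_verdict FIRST_SEEN NOW DELAY TTL   (hypothetical, all in seconds)
#   FIRST_SEEN: epoch time at which the (sender IP, sender domain) pair
#               was first recorded, empty if the pair is unknown
#   NOW:        current epoch time
#   DELAY:      minimum wait before a retry is accepted
#   TTL:        how long a recorded pair is remembered
greylist_verdict() {
  first_seen=$1 now=$2 delay=$3 ttl=$4
  if [ -z "$first_seen" ]; then
    echo defer                      # unknown pair: record it, ask to retry
    return
  fi
  age=$((now - first_seen))
  if [ "$age" -ge "$ttl" ]; then
    echo defer                      # record expired: start over
  elif [ "$age" -lt "$delay" ]; then
    echo defer                      # retried too quickly
  else
    echo accept                     # waited long enough, still remembered
  fi
}
```

With a delay of 24 seconds (0.4 minute) and a TTL of 2160000 seconds (25 days), greylist_verdict 1000 1030 24 2160000 prints accept, while a retry after only 10 seconds is deferred.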

Other files in exim4/conf.d/ are useful for other local features a bit outside the scope of this article (business per-target email aliases, specific handling of friendly relays, SMTP forwarding to a specific authenticated SMTP server for specific domains when sending mail).

Dovecot

This assumes that Dovecot already works (with all components installed). Nonetheless, you need to adjust LMTP delivery by editing /etc/dovecot/conf.d/20-lmtp.conf as follows:

# to be added
lmtp_proxy = no
lmtp_save_to_detail_mailbox = no
lmtp_rcpt_check_quota = no
lmtp_add_received_header = no 

protocol lmtp {
  # Space separated list of plugins to load (default is global mail_plugins).
  mail_plugins = $mail_plugins
  # remove domain from user name
  auth_username_format = %n
}
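
As a quick illustration of the auth_username_format = %n setting above, which keeps only the user part of user@domain, here is a shell equivalent. The function name lmtp_username is made up for this example, and the snippet only mimics the effect; it is not how Dovecot implements it.

```shell
#!/bin/sh
# lmtp_username ADDRESS  (hypothetical helper, illustration only)
# Print the part before the first "@", or the whole string if there is none.
lmtp_username() {
  printf '%s\n' "${1%%@*}"
}
```

For example, lmtp_username userx@example.org prints userx, which is the name LMTP delivery will then look up.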

You also need to edit /etc/dovecot/conf.d/90-sieve.conf:

# to be added

 # editheader is restricted to admin global sieve
 sieve_global_extensions = +editheader

 # run global sieve (sievec must be run manually every time the scripts are updated)
 sieve_before = /etc/dovecot/sieve.d/

You also need to edit /etc/dovecot/conf.d/20-imap.conf:

protocol imap {   
  mail_plugins = $mail_plugins imap_sieve
}

You also need to edit /etc/dovecot/conf.d/90-plugin.conf:

plugin {
  sieve_plugins = sieve_imapsieve sieve_extprograms
  sieve_extensions = +vnd.dovecot.pipe +vnd.dovecot.environment

  imapsieve_mailbox4_name = Spam
  imapsieve_mailbox4_causes = COPY APPEND
  imapsieve_mailbox4_before = file:/usr/local/lib/dovecot/report-spam.sieve

  imapsieve_mailbox5_name = *
  imapsieve_mailbox5_from = Spam
  imapsieve_mailbox5_causes = COPY
  imapsieve_mailbox5_before = file:/usr/local/lib/dovecot/report-ham.sieve

  imapsieve_mailbox3_name = Inbox
  imapsieve_mailbox3_causes = APPEND
  imapsieve_mailbox3_before = file:/usr/local/lib/dovecot/report-ham.sieve

  sieve_pipe_bin_dir = /usr/local/lib/dovecot/
}

You need custom scripts, both shell scripts and sieve filters, to train Rspamd from Dovecot. /usr/local/lib/dovecot/report-ham.sieve:

require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];

if environment :matches "imap.mailbox" "*" {
  set "mailbox" "${1}";
}

if string "${mailbox}" "Trash" {
  stop;
}

if environment :matches "imap.user" "*" {
  set "username" "${1}";
}

pipe :copy "sa-learn-ham.sh" [ "${username}" ];

/usr/local/lib/dovecot/report-spam.sieve:

require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];

if environment :matches "imap.user" "*" {
  set "username" "${1}";
}

pipe :copy "sa-learn-spam.sh" [ "${username}" ];

/usr/local/lib/dovecot/sa-learn-ham.sh:

#!/bin/sh
exec /usr/bin/rspamc learn_ham

/usr/local/lib/dovecot/sa-learn-spam.sh:

#!/bin/sh 
exec /usr/bin/rspamc learn_spam

Then you need a /etc/dovecot/sieve.d folder, similar to mine, to hold all site-wide sieve scripts. Mine are shown as an example of what can easily be done with sieve. Regarding spam, they only flag it; what then happens to flagged mail is decided by the end user's sieve filter:

#; -*-sieve-*-
require ["editheader", "regex", "imap4flags", "vnd.dovecot.pipe", "copy"];

# simple flagging for easy per-user sorting
# chained, so only a single X-Sieve-Mark is possible

## flag Spam
if anyof (
	  header :regex "X-Spam-Status" "^Yes",
	  header :regex "X-Spam-Flag" "^YES",
	  header :regex "X-Bogosity" "^Spam",
	  header :regex "X-Spam_action" "^reject")
{
  # flag for the mail client
  addflag "Junk";
  # header for further parsing
  addheader "X-Sieve-Mark" "Spam";
  # autolearn
  pipe :copy "sa-learn-spam.sh";
}
## sysadmin
elsif address :localpart ["from", "sender"] ["root", "netdata", "mailer-daemon"]
{
  addheader "X-Sieve-Mark" "Sysadmin";
} 
## social network
elsif address :domain :regex ["to", "from", "cc"] ["^twitter\.",
						   "^facebook\.",
						   "^youtube\.",
						   "^mastodon\.",
						   "instagram\."]
{
  addheader "X-Sieve-Mark" "SocialNetwork";
}
## computer related
elsif address :domain :regex ["to", "from", "cc"] ["debian\.",
						   "devuan\.",
						   "gnu\.",
						   "gitlab\.",
						   "github\."]
{
  addheader "X-Sieve-Mark" "Cpu";
}

Each time scripts are modified in these folders, sievec must be run by root (otherwise sieve scripts are compiled by the current user, who cannot write to /etc for obvious reasons):

sievec -D /etc/dovecot/sieve.d
sievec -D /usr/local/lib/dovecot

Finally, an example of an end-user sieve script (to put in ~/.dovecot.sieve):

#; -*-sieve-*-
require ["fileinto", "regex", "vnd.dovecot.pipe", "copy"];

if header :is "X-Sieve-Mark" "Spam"
{
   # no care for undisclosed recipients potential false positive
  if address :contains ["to", "cc", "bcc"] ["undisclosed recipients", "undisclosed-recipients"]
		{
		  discard;
		  stop;
		}

  # otherwise just put in dedicated folder		
  fileinto "Spam";
  stop;
}

Rspamd

Rspamd was installed from the Devuan/Debian package (it is not clear to me why the Rspamd people discourage using these packages on their website, as context is lacking). It works out of the box.

I also installed ClamAV, Razor and Redis. Rspamd requires a lot of small tuning; check the folder /etc/rspamd.

To get Razor running, requests are passed through xinetd, using /etc/xinetd.d/razor:

service razor
{
#	disable		= yes
	type		= UNLISTED
	socket_type     = stream
	protocol	= tcp
	wait		= no
	user		= _rspamd
	bind            = 127.0.0.1
	only_from	= 127.0.0.1
	port		= 11342
	server		= /usr/local/bin/razord
}

together with the wrapper script /usr/local/bin/razord:

#!/bin/sh
/usr/bin/razor-check && echo -n "spam" || echo -n "ham"

It is configured to work in the same way with Pyzor, but so far that does not work (not clear to me why; it also seems to be an IPv6 issue, see below).

I noticed issues with IPv6: so far my mail servers are still IPv4-only, yet Rspamd sometimes tries to connect over IPv6. I solved the issue by commenting out ::1 localhost in /etc/hosts.
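
Before editing /etc/hosts, it can help to see which active lines map localhost to an IPv6 address. The following is a minimal sketch, assuming awk is available; the function name ipv6_localhost_lines is invented for this example, and it only reads the file without changing anything.

```shell
#!/bin/sh
# ipv6_localhost_lines FILE  (hypothetical helper, illustration only)
# Print the active (uncommented) lines of a hosts file that map the
# name "localhost" to an IPv6 address.
ipv6_localhost_lines() {
  awk '
    $1 ~ /^#/ { next }                 # skip commented lines
    $1 ~ /:/ {                         # IPv6 addresses contain a colon
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break           # stop at a trailing comment
        if ($i == "localhost") { print; break }
      }
    }
  ' "$1"
}
```

Running ipv6_localhost_lines /etc/hosts before and after the change should show the ::1 line disappearing once it is commented out.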

Results

So far it works as expected (except for the IPv4 vs IPv6 and Pyzor issues). Rspamd required a bit more work than expected, but once it is running, it seems good.

Obviously, in the process, I lost the benefit of the well-trained Bogofilter, but I hope Rspamd's own Bayesian filters will kick in soon enough.

In my setup there are extra files related to replication over multiple servers that I might cover in another article (replication of email, of users' sieve filters through Nextcloud and of the Redis shared database via stunnel). The switch to Rspamd+Exim+Dovecot made this replication across multiple servers much better.

UPDATE: pipe :copy vs execute :pipe

Using pipe :copy in a sieve script actually causes issues. Sieve pipe is a disposition-type action: it is intended to deliver the message, similarly to a fileinto or redirect command. As such, if the command returns a failure, the sieve filter stops. That is not desirable if we use Rspamd with learn_condition (defined in statistic.conf) to avoid learning the same message several times, etc. It leads to errors like these in the logs, and to sieve scripts ending prematurely:

[dovecot log]
Apr  9 21:08:10 mx dovecot: lmtp(userx)<31807><11nYLZrZUWI/fAAA4k3FvQ>: program exec:/usr/local/lib/dovecot/sa-learn-spam.sh (31810): Terminated with non-zero exit code 1
Apr  9 21:08:10 mx dovecot: lmtp(userx)<31807><11nYLZrZUWI/fAAA4k3FvQ>: Error: sieve: failed to execute to program `sa-learn-spam.sh': refer to server log for more information.
Apr  9 21:08:10 mx dovecot: lmtp(userx)<31807><11nYLZrZUWI/fAAA4k3FvQ>: sieve: msgid=<20220409190707.0A81E808EB@xxxxxxxxxxxxxxxxxxx>: stored mail into mailbox 'INBOX'
Apr  9 21:08:10 mx dovecot: lmtp(userx)<31807><11nYLZrZUWI/fAAA4k3FvQ>: Error: sieve: Execution of script /etc/dovecot/sieve.d/20_rien-mark.sieve failed, but implicit keep was successful

[rspamd log]
2022-04-09 21:08:10 #4264(controller) <eb2984>; csession; rspamd_stat_classifier_is_skipped: learn condition for classifier bayes returned: already in class spam; probability 93.38%; skip classifier
2022-04-09 21:08:10 #4264(controller) <eb2984>; csession; rspamd_task_process: learn error: all learn conditions denied learning spam in default classifier

We got an implicit keep: an already known and identified spam was forcefully delivered to INBOX because learning failed, precisely since the message was already known.

Using execute :pipe instead solves the issue and matches what we really want: the spam/ham learning process is an extra step, involved neither in the filtering nor in the delivery of the message. Its failure or success is irrelevant to the delivery process.

With execute, the non-zero return code from the executed script is still logged, but without any other effect; in particular, it does not stop further sieve processing:

[dovecot log]
Apr 10 15:12:09 mx dovecot: lmtp(userx)<3450><iqoWCKnXUmJ6DQAA4k3FvQ>: program exec:/usr/local/lib/dovecot/sa-learn-spam.sh (3451): Terminated with non-zero exit code 1
Apr 10 15:12:09 mx dovecot: lmtp(userx)<3450><iqoWCKnXUmJ6DQAA4k3FvQ>: sieve: msgid=<6252b1ee.1c69fb81.8a4e2.f847@xxxxxxxxxxxx>: fileinto action: stored mail into mailbox 'Spam'

[rspamd log]
2022-04-10 15:12:09 #9025(controller) <820d3e>; csession; rspamd_task_process: learn error: all learn conditions denied learning spam in default classifier
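
A complementary safeguard, which is not what this article settled on, is to make the learning wrapper itself tolerant, so that a refused learn never surfaces as a failure to the caller. The sketch below is only an assumption of how such a wrapper could look: learn_quietly and the RSPAMC override variable are hypothetical names, and the drawback is that real rspamc failures are hidden behind a zero exit status as well.

```shell
#!/bin/sh
# learn_quietly CLASS  (hypothetical helper, illustration only)
#   CLASS: "spam" or "ham"; the message is read from stdin by rspamc
# RSPAMC can be overridden to try the function without a running rspamd.
learn_quietly() {
  class=$1
  if ! "${RSPAMC:-/usr/bin/rspamc}" "learn_$class" >/dev/null 2>&1; then
    # a learn refused by learn_condition ends up here too
    echo "learn_$class refused or failed, ignoring" >&2
  fi
  return 0                          # never report a failure to the caller
}
```

A wrapper script would then call learn_quietly spam (or learn_quietly ham) as its last command, leaving the delivery decision entirely to sieve.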

Check the Dovecot-related files for an up-to-date version: /etc/dovecot and /usr/local/lib/dovecot.
