Replacing PowerDNS with Knot DNS and Knot Resolver+supervisor, with DNSSEC, DNS over TLS and domain name spoofing

I had been considering using Knot DNS for a while. Switching to DNS over TLS for the resolver queries was the push I needed. It turns out that transposing my setup to Knot DNS is very easy and fast.

Knot Resolver does not provide init scripts and suggests using supervisor as an alternative to systemd's omnipresent features. I was wary of this idea, but it turns out that supervisor is very easy to put in place and I might use it more in the future, for instance to replace some xinetd services.

For the record, my setup is as follows: a local DNS server serves the HERE.ici domain of the local area network, and a resolver caches requests. All requests are sent to the resolver which, if it cannot answer, asks the relevant DNS server. Nothing too fancy, even if LANs are sometimes set up the other way around, where clients query the local DNS server by default and this one queries the local resolver when it cannot answer.

Installation requires the following:

apt install knot/testing
apt install knot-resolver supervisor

DNS for the local area network

The Knot DNS server will not be queried directly by clients, only by the Knot Resolver and DHCPd. Edit /etc/knot/knot.conf by adding:

server:
    # meant to be called only on loopback
    # by knot-resolver and dhcpd on update
    listen: 127.0.1.1@53

acl:
  - id: update_acl
    # restrict by IP is enough, no need for a ddns key stored on the same host
    address: 127.0.0.1
    action: update

zone:
  - domain: HERE.ici
    dnssec-signing: on
    acl: update_acl

  - domain: 10.in-addr.arpa
    dnssec-signing: on
    acl: update_acl

Create the zones (edit serverhostname and HERE.ici according to your setup):

invoke-rc.d knot restart

knotc zone-begin HERE.ici
knotc zone-set HERE.ici @ 7200 SOA serverhostname hostmaster 1 86400 900 691200 3600
knotc zone-set HERE.ici serverhostname 3600 A 10.10.10.1
knotc zone-set HERE.ici @ 3600 NS serverhostname
knotc zone-set HERE.ici @ 3600 MX 10 mx.HERE.ici.
knotc zone-set HERE.ici jeden 3600 CNAME serverhostname
knotc zone-commit HERE.ici

knotc zone-begin 10.in-addr.arpa
knotc zone-set 10.in-addr.arpa @ 7200 SOA serverhostname.HERE.ici. hostmaster.HERE.ici. 1 86400 900 691200 3600
knotc zone-set 10.in-addr.arpa 1.10.10 3600 PTR serverhostname.HERE.ici.
knotc zone-set 10.in-addr.arpa @ 3600 NS serverhostname.HERE.ici.
knotc zone-commit 10.in-addr.arpa
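
To check that the authoritative server answers as expected, kdig (from the knot-dnsutils package, assuming it is installed) can be pointed at the loopback address it listens on:

kdig @127.0.1.1 serverhostname.HERE.ici A
kdig @127.0.1.1 -x 10.10.10.1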

Zones will be updated by the DHCP server, in this case ISC dhcpd. Edit /etc/dhcp/dhcpd.conf accordingly:

# dynamic update
ddns-updates on;
ddns-update-style standard;
ignore client-updates; # restrict to domain name

# option definitions common to all supported networks...
option domain-name "HERE.ici";
option domain-search "HERE.ici";
# you can add extra name servers if you consider direct external
# queries acceptable in case the resolver is dead
option domain-name-servers 10.0.0.1;
option routers 10.0.0.1;
default-lease-time 600;
max-lease-time 6000;
update-static-leases on;
authoritative;

 [...]

zone HERE.ici. {
  primary 127.0.1.1;
}
zone 10.in-addr.arpa. {
  primary 127.0.1.1;
}

No dynamic update keys, everything goes through the loopback. You might want to erase the DHCPd leases (usually in /var/lib/dhcp/) so it does not get confused.
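
Once a client has obtained a lease, you can check that the dynamic updates actually landed in the zones with knotc:

knotc zone-read HERE.ici
knotc zone-read 10.in-addr.arpa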

DNS Resolver

The Knot Resolver will handle all client queries, contacting Internet DNS servers over TLS when needed and caching results. Edit /etc/knot-resolver/kresd.conf to contain:

-- Network interface configuration
-- (knot dns should be using 127.0.1.1)
net.listen('127.0.0.1', 53, { kind = 'dns' })
net.listen('127.0.0.1', 853, { kind = 'tls' })
net.listen('10.0.0.1', 53, { kind = 'dns' })
net.listen('10.0.0.1', 853, { kind = 'tls' })

-- drop privileges (check /var/lib/knot-resolver modes/owner)
user('knot-resolver', 'knot-resolver')

-- Load useful modules
modules = {
   'hints > iterate',  -- Load /etc/hosts and allow custom root hints
   'stats',            -- Track internal statistics
   'predict',          -- Prefetch expiring/frequent records
   'view',             -- required to limit access
}

-- Cache size
cache.size = 500 * MB

-- whitelist queries identified by subnet
view:addr('127.0.0.0/24', policy.all(policy.PASS))
view:addr('10.0.0.0/24', policy.all(policy.PASS))
-- drop everything that hasn't matched
view:addr('0.0.0.0/0', policy.all(policy.DROP))

-- Custom hints: local spoofed address and antispam/ads
hints.add_hosts("/etc/knot-resolver/redirect-spoof")
hints.add_hosts("/etc/knot-resolver/redirect-ads")

-- internal domain: use knot dns listening on loopback
internalDomains = policy.todnames({'HERE.ici', '10.in-addr.arpa'})
policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), internalDomains))
policy.add(policy.suffix(policy.STUB({'127.0.1.1@53'}), internalDomains))

-- forward queries over TLS
policy.add(policy.all(policy.TLS_FORWARD({
			 {'208.67.222.222', hostname='dns.opendns.com'},
			 {'208.67.220.220', hostname='dns.opendns.com'},
			 {'1.1.1.1', hostname='cloudflare-dns.com'},
			 {'1.0.0.1', hostname='cloudflare-dns.com'},
})))

redirect-spoof and redirect-ads are in /etc/hosts format: they allow domain spoofing or ad domain filtering. They conveniently replace the extra Lua script that my setup was using with PowerDNS.
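
For illustration, such a file simply maps, in plain /etc/hosts format, a destination address to the domain names to spoof or to sink (the names below are made up):

# /etc/knot-resolver/redirect-spoof
10.0.0.1     cloud.example.com webmail.example.com

# /etc/knot-resolver/redirect-ads
127.0.0.1    ads.example.net tracker.example.org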

Update Feb 19 2023: check the recent files on gitlab, I now use RPZ instead of hints/hosts files to block hostile domains. No real change in principle, but knot-resolver seems to handle very long lists better in this form.
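
The RPZ variant boils down to replacing the hints.add_hosts() lines in kresd.conf with something along these lines (the file path is just an example):

-- block hostile domains listed in an RPZ zone file
policy.add(policy.rpz(policy.DENY, '/etc/knot-resolver/block.rpz'))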

Finally, the resolver needs to be started by supervisord, with an /etc/supervisor/conf.d/knot-resolver.conf such as:

[program:knot-resolver]
command=/usr/sbin/kresd -c /etc/knot-resolver/kresd.conf --noninteractive
priority=0
autostart=true
autorestart=true
stdout_syslog=true
stderr_syslog=true
directory=/var/lib/knot-resolver

[program:knot-resolver-gc]
command=/usr/sbin/kres-cache-gc -c /var/lib/knot-resolver -d 120000
user=knot-resolver
autostart=true
autorestart=true
stdout_syslog=true
stderr_syslog=true
directory=/var/lib/knot-resolver
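
To load the new program definitions and check that supervisord keeps both processes alive, the stock supervisorctl client can be used, for instance:

supervisorctl reread
supervisorctl update
supervisorctl status knot-resolver knot-resolver-gc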

Restart supervisor and check the logs. Everything should be fine. You can then clean up:

rc-update add supervisor
rc-update add knot
apt --purge remove pdns-*

# check if there is still traffic on DNS port 53 on the public network interface (there should be none)
tcpdump -ni eth0 -p port 53
# check if there is traffic on DNS over TLS port 853 (there should be some whenever a query cannot be answered from the cache or the LAN)
tcpdump -ni eth0 -p port 853

(My default files are in my rien-host package; if you have on your network a mail server using DNS blacklists, whose queries will inevitably be blocked, you might want to install knot-resolver on that server too, in recursive mode.)

Banning IP on two iptables chains with fail2ban

If you use LXC containers with IPv4, it is very likely you use NAT with iptables. I found no immediate way to get fail2ban with iptables/ipset to apply bans on both INPUT (for the LXC master) and FORWARD (for the LXC slaves).

In /etc/fail2ban/jail.local

banaction = iptables-ipset-proto6-allports

(proto6 refers to ipset itself)

In /etc/fail2ban/action.d/iptables.local

[Init]
blocktype=DROP
chain=INPUT
chain2=FORWARD

# brute force: add a FORWARD rule,
# leaving INPUT as the default chain for the relevant tests
[Definition]
_ipt_add_rules = <_ipt_for_proto-iter>
              { %(_ipt_check_rule)s >/dev/null 2>&1; } || { <iptables> -I <chain> %(_ipt_chain_rule)s; <iptables> -I <chain2> %(_ipt_chain_rule)s; }
              <_ipt_for_proto-done>

_ipt_del_rules = <_ipt_for_proto-iter>
              <iptables> -D <chain> %(_ipt_chain_rule)s
              <iptables> -D <chain2> %(_ipt_chain_rule)s
              <_ipt_for_proto-done>

After restarting fail2ban, you should find the relevant rules running:

fail2ban-client stop ; fail2ban-client start
iptables-save  | grep match
-A INPUT -p tcp -m set --match-set f2b-ssh src -j DROP
-A INPUT -p tcp -m set --match-set f2b-saslauthd src -j DROP
-A INPUT -p tcp -m set --match-set f2b-banstring src -j DROP
-A INPUT -p tcp -m set --match-set f2b-xinetd-fail src -j DROP
-A FORWARD -p tcp -m set --match-set f2b-ssh src -j DROP
-A FORWARD -p tcp -m set --match-set f2b-saslauthd src -j DROP
-A FORWARD -p tcp -m set --match-set f2b-banstring src -j DROP
-A FORWARD -p tcp -m set --match-set f2b-xinetd-fail src -j DROP
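
Since both chains match the same ipset, a ban only has to reach the set; you can check that by banning a test address by hand (192.0.2.1 is a documentation address) and then unbanning it:

fail2ban-client set ssh banip 192.0.2.1
ipset test f2b-ssh 192.0.2.1
fail2ban-client set ssh unbanip 192.0.2.1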

I would like to act at the PREROUTING level but found no faster way to do it. I would welcome any suggestion to get the same result with fewer changes to the default fail2ban setup.

Switching from SpamAssassin+Bogofilter+Exim to Rspamd+Exim+Dovecot

For more than ten years, I used SpamAssassin and Bogofilter along with Exim to filter spam, together with SPF and greylisting directly within Exim.

Why change?

I must say that almost no spam reached me unflagged for years. Why change anything then?

First, I have more users and the system was not really multiuser-aware. For instance, the Bayesian filter training cronjob relied on a single configured SPAMDIR, and so on.

Second, my whole setup was based on using specific transports and routers in Exim to send mails first to Bogofilter, then to SpamAssassin. It means that filtering is done after SMTP time, when the mail has already been accepted. You filter but do not discourage or block spam sources.

Rspamd?

General              Rspamd                        SpamAssassin              DSPAM
Written in           C/Lua                         Perl                      C
Process model        event driven                  pre-forked pool           LDA and pre-forked
MTA integration      milter, LDA, custom           milter, custom (Amavis)   LDA
Web interface        embedded                      3rd party                 -
Languages support    full, UTF-8 conversion/       naïve (ASCII lowercase)   naïve
                     normalisation, lemmatization
Scripting support    Lua API                       Perl plugins              -
Licence              Apache 2                      Apache 2                  GPL
Development status   very active                   active                    abandoned

Rspamd seems actively developed and easy to integrate not only with Exim, the SMTP server, but also with Dovecot, used here as the IMAPS server.

Instead of having:

Exim SMTP accept with greylist -> bogofilter -> spamassassin -> procmail -> dovecot 

The idea is to have:

Exim SMTP accept with greylist and rspamd -> dovecot with sieve filtering 

It rejects or discards spam earlier and makes filtering easier in a multiuser environment (sieve is not dangerous, unlike procmail, and can be managed by clients, if desirable).

My new setup is contained in my rien-mx package: the initial greylist system is still there.

Exim

What matters most are the acl_check_rcpt definition (already used in the previous version) and the new acl_check_data definition:

### acl/41_rien-check_data_spam
#################################
# based on https://rspamd.com/doc/integration.html
# -  using CHECK_DATA_LOCAL_ACL_FILE included in the acl_check_data instead of creating a new acl
# - and scan all the messages no matter the source:
#    because some might be forwarded by smarthost client, requiring scanning with no defer/reject

## process earlier scan

# find out if a (positive) spam level is already set
warn
  condition = ${if match{$h_X-Spam-Level:}{\N\*|\+\N}}
  set acl_m_spamlevel = $h_X-Spam-Level:
warn
  condition = ${if match{$h_X-Spam-Bar:}{\N\*|\+\N}}
  set acl_m_spamlevel = $h_X-Spam-Bar:
warn
  condition = ${if match{$h_X-Spam_Bar:}{\N\*|\+\N}}
  set acl_m_spamlevel = $h_X-Spam_Bar:

# discard high probability spam identified by earlier scanner
# (probably forwarded by a friendly server, since it is unlikely that a spam source would shoot
# itself in the foot, no point to generate bounces)
discard
  condition = ${if >={${strlen:$acl_m_spamlevel}}{15}}
  log_message = discard as high-probability spam announced

# at least make sure X-Spam-Status is set if relevant
warn
  condition = ${if and{{ !def:h_X-Spam-Status:}{ >={${strlen:$acl_m_spamlevel}}{6} }}}
  add_header = X-Spam-Status: Yes, earlier scan ($acl_m_spamlevel)

# accept content from relayed hosts with no spam check
# unless registered in final_from_hosts (they are outside the local network)
accept
  hosts = +relay_from_hosts
  !hosts = ${if exists{CONFDIR/final_from_hosts}\
		      {CONFDIR/final_from_hosts}\
		      {}}

# rename earlier reports and score
warn
  condition = ${if def:h_X-Spam-Report:}
  add_header = X-Spam-Report-Earlier: $h_X-Spam-Report:
warn
  condition = ${if def:h_X-Spam_Report:}
  add_header = X-Spam-Report-Earlier: $h_X-Spam_Report:
warn
  condition = ${if def:h_X-Spam-Score:}
  add_header = X-Spam-Score-Earlier: $h_X-Spam-Score:
warn
  condition = ${if def:h_X-Spam_Score:}
  add_header = X-Spam-Score-Earlier: $h_X-Spam_Score:


# scan the message with rspamd
warn spam = nobody:true
# This will set variables as follows:
# $spam_action is the action recommended by rspamd
# $spam_score is the message score (we unlikely need it)
# $spam_score_int is spam score multiplied by 10
# $spam_report lists symbols matched & protocol messages
# $spam_bar is a visual indicator of spam/ham level

# remove foreign headers except spam-status, because it is better to have it twice than not at all
warn
  remove_header = x-spam-bar : x-spam_bar : x-spam-score : x-spam_score : x-spam-report : x-spam_report : x-spam_score_int : x-spam_action : x-spam-level
  
# add spam-score and spam-report headers
# (it is possible to add them only when rspamd recommends it, with a condition like:
#   condition  = ${if eq{$spam_action}{add header}})
warn
  add_header = X-Spam-Score: $spam_score
  add_header = X-Spam-Report: $spam_report

# add x-spam-status header if message is not ham
# do not match when $spam_action is empty (e.g. when rspamd is not running)
warn
  ! condition  = ${if match{$spam_action}{^no action\$|^greylist\$|^\$}}
  add_header = X-Spam-Status: Yes

# add x-spam-bar header if score is positive
warn
  condition = ${if >{$spam_score_int}{0}}
  add_header = X-Spam-Bar: $spam_bar

## delay/discard/deny depending on the scan
  
# use greylisting with rspamd
# (unless coming from authenticated or relayed host)
defer message    = Please try again later
   condition  = ${if eq{$spam_action}{soft reject}}
   !hosts = ${if exists{CONFDIR/final_from_hosts}\
		       {CONFDIR/final_from_hosts}\
		       {}}
   !authenticated = *
   log_message  = greylist $sender_host_address according to soft reject spam filtering

# high probability spam gets silently discarded if
# coming from an authenticated or relayed host
discard
   condition  = ${if eq{$spam_action}{reject}}
   hosts = ${if exists{CONFDIR/final_from_hosts}\
		       {CONFDIR/final_from_hosts}\
		       {}}
   log_message  = discard as high-probability spam from final from host

discard
   condition  = ${if eq{$spam_action}{reject}}
   authenticated = *
   log_message  = discard as high-probability spam from authenticated host
   
# refuse high probability spam from other sources
deny  message    = Message discarded as high-probability spam
   condition  = ${if eq{$spam_action}{reject}}
   log_message	= reject mail from $sender_host_address as high-probability spam

These two ACLs take care of sending messages through rspamd and accepting/rejecting/discarding them.

A dovecot_lmtp transport is also necessary:

dovecot_lmtp:   
  debug_print = "T: dovecot_lmtp for $local_part@$domain"   
  driver = lmtp   
  socket = /var/run/dovecot/lmtp   
  #maximum number of deliveries per batch, default 1   
  batch_max = 200   
  # remove suffixes/prefixes   
  rcpt_include_affixes = false 
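
This transport is only used if a router points to it; a minimal sketch of such a router for the Debian split configuration (the router name and file placement are only illustrative):

# e.g. conf.d/router/350_rien-dovecot_lmtp
dovecot_lmtp_user:
  debug_print = "R: dovecot_lmtp_user for $local_part@$domain"
  driver = accept
  domains = +local_domains
  check_local_user
  transport = dovecot_lmtp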

There are also other relevant files, especially in conf.d/main. If you want to follow my setup, you are encouraged to download at least the whole mx/etc/exim folder. Most files have comments, making it easy to find out whether they are relevant or not. Or you can just copy/paste the relevant settings into etc/conf.d/main/10_localsettings, for instance:

# path of rspamd 
spamd_address = 127.0.0.1 11333 variant=rspamd 

# data acl definition 
CHECK_DATA_LOCAL_ACL_FILE =  /etc/exim4/conf.d/acl/41_rien-check_data_spam

# memcache traditional greylisting
GREY_MINUTES  = 0.4
GREY_TTL_DAYS = 25
# we greylist servers, so we keep it to the minimum required to cross-check with SPF
#   sender IP, sender domain
GREYLIST_ARGS = {${quote:$sender_host_address}}{${quote:$sender_address_domain}}{GREY_MINUTES}{GREY_TTL_DAYS}

Other files in exim4/conf.d/ are useful for other local features a bit outside the scope of this article (per-target business email aliases, specific handling of friendly relays, forwarding to a specific authenticated SMTP server for specific domains when sending mails).

Dovecot

This assumes that Dovecot already works (with all components installed). Nonetheless, you need to set up LMTP delivery by editing /etc/dovecot/conf.d/20-lmtp.conf as follows:

# to be added
lmtp_proxy = no
lmtp_save_to_detail_mailbox = no
lmtp_rcpt_check_quota = no
lmtp_add_received_header = no 

protocol lmtp {
  # Space separated list of plugins to load (default is global mail_plugins).
  mail_plugins = $mail_plugins
  # remove domain from user name
  auth_username_format = %n
}

You also need to edit /etc/dovecot/conf.d/90-sieve.conf:

# to be added

 # editheader is restricted to admin global sieve
 sieve_global_extensions = +editheader

 # run global sieve (sievec must be run manually every time they are updated)
 sieve_before = /etc/dovecot/sieve.d/

You also need to edit /etc/dovecot/conf.d/20-imap.conf:

protocol imap {   
  mail_plugins = $mail_plugins imap_sieve
}

You also need to edit /etc/dovecot/conf.d/90-plugin.conf:

plugin {
  sieve_plugins = sieve_imapsieve sieve_extprograms
  sieve_extensions = +vnd.dovecot.pipe +vnd.dovecot.environment

  imapsieve_mailbox4_name = Spam
  imapsieve_mailbox4_causes = COPY APPEND
  imapsieve_mailbox4_before = file:/usr/local/lib/dovecot/report-spam.sieve

  imapsieve_mailbox5_name = *
  imapsieve_mailbox5_from = Spam
  imapsieve_mailbox5_causes = COPY
  imapsieve_mailbox5_before = file:/usr/local/lib/dovecot/report-ham.sieve

  imapsieve_mailbox3_name = Inbox
  imapsieve_mailbox3_causes = APPEND
  imapsieve_mailbox3_before = file:/usr/local/lib/dovecot/report-ham.sieve

  sieve_pipe_bin_dir = /usr/local/lib/dovecot/
}

You need custom scripts to train rspamd: both shell scripts and sieve filters. /usr/local/lib/dovecot/report-ham.sieve:

require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];

if environment :matches "imap.mailbox" "*" {
  set "mailbox" "${1}";
}

if string "${mailbox}" "Trash" {
  stop;
}

if environment :matches "imap.user" "*" {
  set "username" "${1}";
}

pipe :copy "sa-learn-ham.sh" [ "${username}" ];

/usr/local/lib/dovecot/report-spam.sieve:

require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];

if environment :matches "imap.user" "*" {
  set "username" "${1}";
}

pipe :copy "sa-learn-spam.sh" [ "${username}" ];

/usr/local/lib/dovecot/sa-learn-ham.sh

#!/bin/sh
exec /usr/bin/rspamc learn_ham

/usr/local/lib/dovecot/sa-learn-spam.sh

#!/bin/sh 
exec /usr/bin/rspamc learn_spam

Then you need an /etc/dovecot/sieve.d similar to mine to hold all the site-wide sieve scripts. Mine are shown as examples of what can be done easily with sieve. Regarding spam, they only flag it; the end users' sieve filters decide what to do with it:

#; -*-sieve-*-
require ["editheader", "regex", "imap4flags", "vnd.dovecot.pipe", "copy"];

# simple flagging for easy per-user sorting
# chained, so only a single X-Sieve-Mark is possible

## flag Spam
if anyof (
	  header :regex "X-Spam-Status" "^Yes",
	  header :regex "X-Spam-Flag" "^YES",
	  header :regex "X-Bogosity" "^Spam",
	  header :regex "X-Spam_action" "^reject")
{
  # flag for the mail client
  addflag "Junk";
  # header for further parsing
  addheader "X-Sieve-Mark" "Spam";
  # autolearn
  pipe :copy "sa-learn-spam.sh";
}
## sysadmin
elsif address :localpart ["from", "sender"] ["root", "netdata", "mailer-daemon"]
{
  addheader "X-Sieve-Mark" "Sysadmin";
} 
## social network
elsif address :domain :regex ["to", "from", "cc"] ["^twitter\.",
						   "^facebook\.",
						   "^youtube\.",
						   "^mastodon\.",
						   "instagram\."]
{
  addheader "X-Sieve-Mark" "SocialNetwork";
}
## computer related
elsif address :domain :regex ["to", "from", "cc"] ["debian\.",
						   "devuan\.",
						   "gnu\.",
						   "gitlab\.",
						   "github\."]
{
  addheader "X-Sieve-Mark" "Cpu";
}

Each time scripts are modified in this folder, sievec must be run by root (otherwise sieve scripts are compiled by the current user, who cannot write in /etc for obvious reasons):

sievec -D /etc/dovecot/sieve.d
sievec -D /usr/local/lib/dovecot

Finally, here is an example of an end user sieve script (to put in ~/.dovecot.sieve):

#; -*-sieve-*-
require ["fileinto", "regex", "vnd.dovecot.pipe", "copy"];

if header :is "X-Sieve-Mark" "Spam"
{
   # no care for undisclosed recipients potential false positive
  if address :contains ["to", "cc", "bcc"] ["undisclosed recipients", "undisclosed-recipients"]
		{
		  discard;
		  stop;
		}

  # otherwise just put in dedicated folder		
  fileinto "Spam";
  stop;
}

Rspamd

Rspamd was installed from the Devuan/Debian package (it is not clear to me why the Rspamd people discourage using these packages on their website, no context is given). It works out of the box.

I also installed clamav, razor and redis. Rspamd requires a lot of small tuning; check the folder /etc/rspamd.
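
Since redis is installed, one of those small tunings can be to store the Bayes statistics in redis through a local.d override; a minimal sketch, assuming redis listens on its default local port:

# /etc/rspamd/local.d/classifier-bayes.conf
backend = "redis";
servers = "127.0.0.1:6379";
autolearn = true;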

To get razor running, requests are passed through xinetd, with /etc/xinetd.d/razor:

service razor
{
#	disable		= yes
	type		= UNLISTED
	socket_type     = stream
	protocol	= tcp
	wait		= no
	user		= _rspamd
	bind            = 127.0.0.1
	only_from	= 127.0.0.1
	port		= 11342
	server		= /usr/local/bin/razord
}

which goes along with the wrapper script /usr/local/bin/razord:

#!/bin/sh
/usr/bin/razor-check && echo -n "spam" || echo -n "ham"
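
On the rspamd side, this socket is then declared in the external_services module; a sketch of the matching override (check that your rspamd version supports razor in this module):

# /etc/rspamd/local.d/external_services.conf
razor {
  servers = "127.0.0.1:11342";
}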

It is configured to work in the same way with pyzor, but so far that does not work (not clear to me why; it also seems to be an IPv6 issue, see below).

I noticed issues with IPv6: my mail servers are still IPv4 only and Rspamd nonetheless sometimes tries to connect over IPv6. I solved the issue by commenting out ::1 localhost in /etc/hosts.

Results

So far it works as expected (except for the IPv4 vs IPv6 issues and pyzor). Rspamd required a bit more work than expected, but once it is going, it seems good.

Obviously, in the process, I lost the benefit of the well-trained Bogofilter, but I hope Rspamd's own Bayesian filters will kick in soon enough.

In my setup there are extra files related to replication over multiple servers that I might cover in another article (replication of email, of users' sieve filters through Nextcloud, and of the shared redis database via stunnel). The switch to Rspamd+Exim+Dovecot made this multi-server replication much better.

UPDATE: pipe :copy vs execute :pipe

Using pipe :copy in sieve scripts actually causes issues. Sieve's pipe is a disposition-type action: it is intended to deliver the message, similarly to a fileinto or redirect command. As such, if the command returns a failure, the sieve filter stops. That is not desirable if we use rspamd with learn_condition (defined in statistic.conf) to avoid learning the same message several times, etc. It leads to errors in the logs and prematurely ended sieve scripts:

[dovecot log]
Apr  9 21:08:10 mx dovecot: lmtp(userx)<31807><11nYLZrZUWI/fAAA4k3FvQ>: program exec:/usr/local/lib/dovecot/sa-learn-spam.sh (31810): Terminated with non-zero exit code 1
Apr  9 21:08:10 mx dovecot: lmtp(userx)<31807><11nYLZrZUWI/fAAA4k3FvQ>: Error: sieve: failed to execute to program `sa-learn-spam.sh': refer to server log for more information.
Apr  9 21:08:10 mx dovecot: lmtp(userx)<31807><11nYLZrZUWI/fAAA4k3FvQ>: sieve: msgid=<20220409190707.0A81E808EB@xxxxxxxxxxxxxxxxxxx>: stored mail into mailbox 'INBOX'
Apr  9 21:08:10 mx dovecot: lmtp(userx)<31807><11nYLZrZUWI/fAAA4k3FvQ>: Error: sieve: Execution of script /etc/dovecot/sieve.d/20_rien-mark.sieve failed, but implicit keep was successful

[rspamd log]
2022-04-09 21:08:10 #4264(controller) <eb2984>; csession; rspamd_stat_classifier_is_skipped: learn condition for classifier bayes returned: already in class spam; probability 93.38%; skip classifier
2022-04-09 21:08:10 #4264(controller) <eb2984>; csession; rspamd_task_process: learn error: all learn conditions denied learning spam in default classifier

We get an implicit keep: a message already known and identified as spam is forcibly delivered to INBOX because the learning step failed, precisely since it was already known.

Using execute :pipe instead solves the issue and matches what we really want: the spam/ham learning process is an extra step, involved neither in the filtering nor in the delivery of the message. Its failure or success is irrelevant to the delivery process.
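
Concretely, in the global sieve scripts, this means requiring vnd.dovecot.execute (it must also be allowed in sieve_global_extensions or sieve_extensions) and replacing the pipe line, roughly:

require ["vnd.dovecot.execute"];

# autolearn: failure or success has no effect on delivery
execute :pipe "sa-learn-spam.sh";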

Using execute, the non-zero return code from the executed script is still logged, but has no other effect; in particular it does not stop further sieve processing:

[dovecot log]
Apr 10 15:12:09 mx dovecot: lmtp(userx)<3450><iqoWCKnXUmJ6DQAA4k3FvQ>: program exec:/usr/local/lib/dovecot/sa-learn-spam.sh (3451): Terminated with non-zero exit code 1
Apr 10 15:12:09 mx dovecot: lmtp(userx)<3450><iqoWCKnXUmJ6DQAA4k3FvQ>: sieve: msgid=<6252b1ee.1c69fb81.8a4e2.f847@xxxxxxxxxxxx>: fileinto action: stored mail into mailbox 'Spam'

[rspamd log]
2022-04-10 15:12:09 #9025(controller) <820d3e>; csession; rspamd_task_process: learn error: all learn conditions denied learning spam in default classifier

Check the dovecot-related files for up-to-date examples/versions: /etc/dovecot and /usr/local/lib/dovecot.

Preventing stored files deletion without assigning to root or using chattr

To prevent accidental or malicious file deletion (for instance in a local collection of images, or a collection of movies on a Samba server), one option is to assign the directory to root or to use chattr to make these files immutable (which also requires root privileges).

That works. But any further modification would then require root privileges.

The proposed approach is, instead, to change the ownership of files that have reached a certain age (one week, one month or one year) to a dedicated "read-only" user, in a way that regular users can still add new files to the collection directories but can no longer remove the old ones we want to safekeep.

This is not opposed to backups or filesystem snapshots, it is a step to prevent data loss instead of curing it.

Say that on your video storage library on a Samba server, files added by guest users are forcibly assigned to nobody:smbusers. You would then create a dedicated nobody-ro user and configure in /etc/read-only-this.conf the library path to be handled by the read-only-this.pl script. Run by a daily cronjob, the read-only-this.pl script reassigns all files older than, say, one week to nobody-ro:smbusers, with no group write privilege. Directories get the special T sticky bit so Samba guest users are still able to add new files but not remove old ones.
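
Stripped of the configuration parsing and mime-type handling done by the real script, the principle boils down to something like this (the path and age below are only illustrative; the actual logic lives in read-only-this.pl):

# files older than a week become owned by the read-only user, with group write removed
find /srv/video-library -type f -mtime +7 ! -user nobody-ro \
     -exec chown nobody-ro:smbusers {} \; -exec chmod g-w {} \;
# sticky bit on directories: guests can still add files but not delete others'
find /srv/video-library -type d -exec chmod +t {} \;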

It would be possible to actually allow nobody-ro to log in through Samba, a shell or whatever scripts, to enable file removal or directory reorganisation. But the video storage library is protected from regular users' mistakes and from malicious deletion by scripts running under regular user accounts.

(Note that the read-only-this.pl script only cares about video/image/audio/document mime types; the mime-type selection might later be added as a configuration option.)

It is included in the rien-common package.

Using PowerDNS (server and recursor) with DNSSEC and domain name spoofing/caching

update, October 2021: I am considering no longer using PowerDNS. I do not want to rely on unreliable people (debian bug #997054 and debian bug #997056). update, February 2023: done, with Knot DNS and Knot Resolver!

I updated my earlier PowerDNS (server and recursor) setup along with domain name spoofing/caching. This is a short update to allow DNSSEC usage and an unlimited list of destinations in the spoof/cache list. The files described here can be found on gitlab.

DNSSEC

Adding DNSSEC support can be done easily by creating /etc/powerdns/recursor.d/10-dnssec.conf:

# dnssec	DNSSEC mode: off/process-no-validate 
#                    (default)/process/log-fail/validate
dnssec=validate

#################################
# dnssec-log-bogus	Log DNSSEC bogus validations
dnssec-log-bogus=no
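
A quick way to check that validation is effective, assuming the recursor listens on the loopback, is to query a domain that is intentionally published with broken signatures:

# must now fail with SERVFAIL
dig +dnssec dnssec-failed.org @127.0.0.1
# must still succeed, with the ad flag set
dig +dnssec isc.org @127.0.0.1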

Every local zone must be excluded from validation by adding to /etc/powerdns/recursor.lua:

addNTA("10.10.10.in-addr.arpa", "internal zone")

New redirect.lua renewal script

The earlier version provided a static /etc/powerdns/redirect.lua which depended on redirect-cached.lua, redirect-ads.lua and redirect-blacklisted.lua, containing the lists of domains to either blacklist (meaning: redirect to loopback) or spoof.

Now, the script redirect-rebuild.pl uses the configuration file redirect-spooflist.conf to generate redirect.lua. The ads blacklist part is unchanged.

The configuration syntax is as follows:

# IP:	domain domain 

# redirect thisdomain.lan and thisother.lan to 192.168.0.1,
# except if 192.168.0.1 is asking 
192.168.0.1: thisdomain.lan thisother.lan 

# redirect anotherthisdomain.lan and anotherthisother.lan to 10.0.0.1,
# even if 10.0.0.1 is asking 
10.0.0.1+:    anotherthisdomain.lan anotherthisother.lan 

# you can use 127.0.0.1: to blacklist domains

It is enough to run the redirect-rebuild.pl script and restart the recursor:

#!/usr/bin/perl

use strict;
use Fcntl ':flock';

my $spooflist = "redirect-spooflist.conf";
my $ads_lua = "redirect-ads.lua";
my $ads_pl = "redirect-ads-rebuild.pl";
my $main_lua = "redirect.lua";

# disallow concurrent run
open(LOCK, "< $0") or die "Failed to ask lock. Exiting";
flock(LOCK, LOCK_EX | LOCK_NB) or die "Unable to lock. This daemon is already alive. Exiting";

# first check if we have a ads list to block
# if not, run the local script to build it
unless (-e $ads_lua) {
    print "$ads_lua missing\n";
    print "run $ads_pl\n" and do "./$ads_pl" if -x $ads_pl;
}

my %cache;
# read conf
open(LIST, "< $spooflist");
while (<LIST>) {
    next if /^#/;
    next unless s/^(.*?):\s*//;
    $cache{$1} = [ split ];
}
close(LIST);

# build lua
open(NEWCONF, "> $main_lua");
printf NEWCONF ("-- Generated on %s by $0\n", scalar localtime);
print NEWCONF '-- IPv4 only script

-- ads kill list
ads = newDS()
adsdest = "127.0.0.1"
ads:add(dofile("/etc/powerdns/redirect-ads.lua"))

-- spoof lists
';

foreach my $ip (keys %cache) {
    # special handling of IP+, + meaning we spoof even to the destination host
    my $name = $ip;
    $name =~ s/(\.|\+)//g;  
    print NEWCONF "spoof$name = newDS()\n";
    print NEWCONF "spoof$name:add{", join(", ", map "\"$_\"", sort@{$cache{$ip}}), "}\n";
    $ip =~ s/(\+)//g;
    print NEWCONF "spoofdest$name = \"$ip\"\n";
}

print NEWCONF '
function preresolve(dq)
   -- DEBUG
   --pdnslog("Got question for "..dq.qname:toString().." from "..dq.remoteaddr:toString().." to "..dq.localaddr:toString(), pdns.loglevels.Error)
   
   -- spam/ads domains
   if(ads:check(dq.qname)) then
     if(dq.qtype == pdns.A) then
       dq:addAnswer(dq.qtype, adsdest)
       return true
     end
   end
    ';

foreach my $ip (keys %cache) {
    my $always = 0;
    $always = 1 if ($ip =~ s/(\+)//g);     # + along with IP means always spoof no matter who is asking
    my $name = $ip;
    $name =~ s/\.//g;

    print NEWCONF '
   -- domains spoofed to '.$ip.'
   if(spoof'.$name.':check(dq.qname)) then';
    print NEWCONF '
     dq.variable = true
     if(dq.remoteaddr:equal(newCA(spoofdest'.$name.'))) then
       -- request coming from the spoof/cache IP itself, no spoofing
       return false
     end' unless $always;
    print NEWCONF '   
     if(dq.qtype == pdns.A) then
       -- redirect to the spoof/cache IP
       dq:addAnswer(dq.qtype, spoofdest'.$name.')
       return true
     end
   end
	';
}

print NEWCONF '
   return false
end
';
close(NEWCONF);


# EOF

Fetching mails from gmail.com with lieer/notmuch instead of fetchmail

Google is gradually making traditional IMAPS access to gmail.com impossible, in its usual opaque way. It is claimed to be a security plus, though if your data is already on gmail.com, you are already accepting that it can be and is spied on, so whether the extra fuss is justified is not so obvious.

Nonetheless, I have a few old secondary boxes on gmail that I would still like to extract mails from, so as not to miss any, without bothering to connect with a web browser. But the usual fetchmail setup is no longer reliable.

The easiest alternative is to use lieer, notmuch and procmail together.

apt install links notmuch lieer procmail

# set up as regular user
su enduser

boxname=mygoogleuser
mkdir -p ~/mail/$boxname
notmuch
notmuch new
cd ~/mail/$boxname
gmi init $boxname@gmail.com

# at this moment, you'll get an URL to connect to gmail.com 
# with a web browser
# 
# if it started links instead, exit cleanly (no control-C)
# to get the URL
#
# that will then redirect to an URL on localhost:8080,
# which is of no use if you are on a remote server;
# in this case, just run, in another terminal:
links "localhost:8080/..."
# or  run `gmi --noauth_local_webserver auth`  and use the URL
# on another browser on another computer

The setup should be OK. You can check it and toy a bit with it:

gmi sync
notmuch new
notmuch search tag:unread
notmuch show --format=mbox thread:000000000000009c

Then you should mark all previous messages as read:

for thread in `notmuch search --output=threads tag:unread`; do echo "$thread" && notmuch tag -unread "$thread" ; done
gmi push

Finally, we set up the mail forward (in my case, the fetching is done in a dedicated container, so mail is forwarded to 'user' by SMTP on the destination host, but procmail allows any setup) and the fetchmail script: each new unread mail is forwarded and then marked as read:

echo ':0
! user' > ~/.procmailrc

echo '#!/bin/bash

BOXES=""

# no concurrent run
if pidof -o %PPID -x "fetchmail.sh">/dev/null; then
        exit 1
fi

for box in $BOXES; do
    cd ~/mail/$box/
    # get data
    gmi pull --quiet
    notmuch new --quiet >/dev/null
    for thread in `notmuch search --output=threads tag:unread`; do
	# send unread through procmail
	notmuch show --format=mbox "$thread" | procmail
	# mark as read
	notmuch tag -unread "$thread"
    done
    # send data
    gmi push --quiet
    cd
done

# EOF' > ~/fetchmail.sh
chmod +x ~/fetchmail.sh

Then it is enough to call the script (with BOXES="boxname" properly set) and to include it in a cronjob, for instance with crontab -e.
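
A possible crontab entry, fetching every ten minutes:

# m h dom mon dow command
*/10 * * * * $HOME/fetchmail.sh >/dev/null 2>&1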

notmuch does not remove mails.

Downgrading Nextcloud 18.0.3.0 to 17.0.5

I upgraded Nextcloud to the latest version without realizing that the gallery app has been replaced by a half-baked "photos" app, completely useless for sharing pictures in any way relevant to me, in addition to a bug with uBlock Origin making the whole "sharing" interface disappear.

Rolling back is not as easy as it seems: the "occ" php tool checks whether the version in your config matches the software version, and if it does not match it just plainly refuses to work with:

An unhandled exception has been thrown:
OC\HintException: [0]: Downgrading is not supported and is likely to cause unpredictable issues (from 18.0.3.0 to 17.0.5.0) ()

So you need to update config/config.php. Then, playing only with occ app:list, occ app:remove and occ app:install, you can get back to a working install.
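
The version check reads the version key in config/config.php, so the first step amounts to editing it back to the target release; something like this (adapt to your exact versions, this is obviously not a supported path):

  'version' => '17.0.5.0',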

update: since then, nothing changed; Nextcloud 20 has been released and it looks like the developers have no plan to deal with this issue. Since users were pissed at their refusal to consider that there is an issue to fix, they have now lost interest in fixing the issue they did not want to consider in the first place.

Resize completely a wordpress.com blog’s media gallery

I found no convenient way to resize a whole media gallery on wordpress.com with the free plan, which does not allow installing plugins. Aside from that, I find it strange that wordpress itself still does not prevent duplicate media using a checksum or similar.

I had a blog with a media gallery reaching the upload limit of the free plan. And it contained tons of very high-res pictures that could actually be downsized without posing any problem.

I found no convenient way to replace images (with the free plan and no plugins). If you re-upload the same file after deleting it, it gets an extra suffix -1/-2, etc.: wordpress clearly keeps the deleted media names in the database and prevents them from being reused, so it is a no go.

The only solution I found was to:

  • export both post and media files;
  • delete all files and post;
  • run scripts to update images files and xml export/import files;
  • reimport everything with new filenames.

It is not perfect, some data will be lost, namely old galleries and some post front image choices, etc. Below are the first approach (1, 2, 3) and the second one (1, 2+3, 4), which tries to do smarter things (but is then more likely to break sooner).

1. Renaming and downsizing the images

# to run in the exported media directory (extracted in a single directory)    
date=`date +%H%M%S`
backup=`mktemp --directory --tmpdir=$PWD -t $date-XXX.bak`
    
for file in  *.png *.jpg *.jpeg; do
	# skip if not a file
	if [ ! -f "$file" ]; then continue; fi

	# rename:
	newfile="BLOGNAME.wordpress.com-"$file
	echo "$file => $newfile"
	
	# limit to 1600 pixels in the largest dimension - adjust to what remains a decent full-size view
	convert "$file" -resize 1600x1600\> "$newfile"
	mv "$file" $backup/
	
done

2. Updating post export/import with new images filenames

#!/usr/bin/perl
# to be saved as as perl script file, 
# edit it (especially THISBLOG and 2020/03, the YYYY/MM of the new upload, which will be added automatically to the URL of the media file during upload)
# and then 
# run against the exports files like
# chmod +x ./thiscript.pl
# ./thiscript.pl 

# note: wordpress.com renames -- into - during upload

use strict;


open(IN, "< $ARGV[0]");
open(OUT, "> edited.$ARGV[0]");
while (<IN>) {
    s@\.files\.wordpress.com\/\d{4}\/\d{2}\/@.files.wordpress.com/2020/03/THISBLOG.wordpress.com-re.up-@ig;
    print OUT $_;
}
close(IN);
close(OUT);

3. Checking if images appear in posts

#!/usr/bin/perl
#
# make sure every image downloaded actually exists in posts
# also to download and run against the xml import/export files and the media directory
# ./thiscript.pl THISBLOG.wordpress.2020-03-29.001.xml #../images_updated

use strict;
use warnings;
use Regexp::Common qw/URI/;
use File::Basename;
use File::Copy;

my %images;

open(IN, "< $ARGV[0]");
while (<IN>) {
    my $url;
    next unless /($RE{URI}{HTTP}{-keep})/;
    $url = $1;
    next unless $url =~ /THISBLOG\.files\./;
    my ($basename, $parentdir, $extension) = fileparse($url, qr/\.[^.]*$/);

    if ($extension  =~ /^\.(png|jpg|jpeg|gif)$/i) {
	#print "$basename$extension ($url)\n";
	$images{"$basename$extension"} = "$basename$extension"
	    unless $images{"$basename$extension"};
    } else {
	#print "IGNORE $basename$extension ($url)\n";
    }

}

while (my($image, ) = sort(each(%images))) {
    print "$image\n";
    if ($ARGV[1]) {
	if ((-d "$ARGV[1]") and (-e "$ARGV[1]/$image")) {
	    mkdir("$ARGV[1]/valide") unless -d "$ARGV[1]/valide";
	    copy("$ARGV[1]/$image", "$ARGV[1]/valide/") or print "failed to copy $image to $ARGV[1]/valide/\n";
	}
    }
}

These scripts are primitive (sure, even the blog name and upload YYYY/MM are hardcoded). Since platforms like wordpress.com change often, this might no longer work at some later time. Many pages on the Internet claim you can simply erase and re-upload an image with the same name: this clearly no longer works. This page could save you some of the trial-and-error process by giving a solution that works as of today.

Note also that some wordpress.com themes have front page images and other selections that won't be carried over.

2+3. Updated script to update xml and check images

I wrote the following script for a smoother process. Being more complex, it is also more likely to be fragile. It still misses handling/removing some specific outdated wp:metadata. This one lists duplicated, unused and missing files.

#!/usr/bin/perl
#
# ./renew-url+checkimages.pl FOLDER_OF_XML FOLDER_OF_IMAGES (no subdirs)

use strict;
use warnings;
use Regexp::Common qw/URI/;
use File::Basename;
use File::Copy;
use File::Slurp;
use Cwd 'abs_path';
use MIME::Base64;
use URI;

my $blog = "zukowka";
my $newprefix = "zukowka.wordpress.com-re.up-";
my $upload_yyyymm = "2020/03"; # time of the new upload
my $images_types = "png|jpg|jpeg|gif";

my %oldurl_newurl;       # {original} = new  
my %imagebase64_newurl;    # {base64} = new_url


die "$0 FOLDER_OF_XML FOLDER_OF_IMAGES (no subdirs)" unless $ARGV[0] and$ARGV[1];



# get dir full path 
my $xmldir;
$xmldir = abs_path($ARGV[0])
    if $ARGV[0];
# get dir full path 
my $imgdir;
$imgdir = abs_path($ARGV[1])
    if $ARGV[1];


my ($unused,$dupes,$missing,$mapped);
open(DUPES, "> $0.duplicatedimages");  # same image found with different names/URL
open(MISSING, "> $0.missingimages");   # file listed in xml not found in the image directory
open(UNUSED, "> $0.unusedimages");   #  file found in the image directory but not listed
open(MAPPING, "> $0.mapping");


# test access to dir
print "XML dir (ARG1): $xmldir
images dir (ARG2): $imgdir\n";
chdir($xmldir) if -d $xmldir or die "unable to enter $xmldir, exiting"; 
chdir($imgdir) if -d $imgdir or die "unable to enter $imgdir, exiting";

# - slurp all urls in import files
# - check if the image listed exist, store with base64 so we keep only one
chdir($xmldir);
while (defined(my $file = glob("*.xml"))) {
    print "### slurp $file ###\n";
    open(IN, "< $file");
    while (<IN>) {
	while (/($RE{URI}{HTTP}{-scheme => qr@https?@}{-keep})/g) {
	    # remove the http/https and possible args to keep the most portable url 
	    my $uri = URI->new($1);
	    my $url = $uri->authority.$uri->path;

	    
	    # images URL always start with $blog.files.wordpress.com
	    next unless $url =~ /^$blog\.files\./;
	    
	    my ($basename, $parentdir, $extension) = fileparse($url, qr/\.[^.]*$/);
	    my $newurl = "$blog.files.wordpress.com/$upload_yyyymm/$newprefix$basename$extension";
	    
	    # skip if the url was already mapped
	    if ($oldurl_newurl{$url}) {
		#print "SEEN ALREADY (skipping) $url\n";
		next;
	    }
	    
	    # work only on images
	    next unless lc($extension) =~ /^\.($images_types)$/i;
	    
	    # check if the relevant image exists in the image folder
	    my $newimage = "$imgdir/$newprefix$basename$extension";
	    unless (-e $newimage) { 
		print "MISSING (skipping) $newimage\n";
		print MISSING "$newimage\n";
		$missing++;
		next;
	    }
	    
	    # get image base64
	    my $base64 = encode_base64(read_file($newimage));
	    
	    # find out if this exact image is already known
	    if ($imagebase64_newurl{$base64}) {
		# already known, will point to the first one found
		print "DUPES $newprefix$basename$extension\n";
		print DUPES "$newprefix$basename$extension:\n\t$url => $imagebase64_newurl{$base64}\n";
		$dupes++;
		
		# URLs will point to the first image found
		#   full form like http://blog.files.wordpress.com/YYYY/MM/file.jpg
		$oldurl_newurl{$url} = $imagebase64_newurl{$base64};
		#   short form like http://blog.wordpress.com/file/
		my ($base64_basename, $base64_parentdir, $base64_extension) = fileparse($imagebase64_newurl{$base64}, qr/\.[^.]*$/);
		$oldurl_newurl{"$blog.wordpress.com/$basename/"} = "$blog.wordpress.com/$base64_basename/" if
		    "$blog.wordpress.com/$basename/" ne "$blog.wordpress.com/$base64_basename/";
		
	    } else {
		# store base64 with full form url to
		$imagebase64_newurl{$base64} = $newurl;
		
		# URLs will point to the first image found
		#   full form like http://blog.files.wordpress.com/YYYY/MM/file.jpg
		$oldurl_newurl{$url} = $newurl;
		#   short form like http://blog.wordpress.com/file/
		$oldurl_newurl{"$blog.wordpress.com/$basename/"} = "$blog.wordpress.com/$newprefix$basename/" if
		    "$blog.wordpress.com/$basename/" ne "$blog.wordpress.com/$newprefix$basename/";
			
	    }
	}
    }
    close(IN);
}

# store mappings
my %used;
while (my($oldurl,$newurl) = sort(each(%oldurl_newurl))) {
    print MAPPING "$oldurl => $newurl\n";
    my ($basename, $parentdir, $extension) = fileparse($newurl, qr/\.[^.]*$/);
    $used{"$basename$extension"} = 1;
    $mapped++;
}


# build import xml 
chdir($xmldir);
mkdir("$xmldir/renew") unless -d "$xmldir/renew";
while (defined(my $file = glob("*.xml"))) {
    print "### rewrite $file ###\n";
    open(OUT, "> renew/renewed-$file");
    open(IN, "< $file");
    while (<IN>) {
	my $line = $_;
	# check every url
	while (/($RE{URI}{HTTP}{-scheme => qr@https?@}{-keep})/g) {
	    # remove the http/https and possible args to keep the most portable url 
	    my $uri = URI->new($1);
	    my $url = $uri->authority.$uri->path;
	    
	    # update if mapping registered
	    if ($oldurl_newurl{$url}) {
		$line =~ s/$url/$oldurl_newurl{$url}/g;
		#print "$url -> $oldurl_newurl{$url}\n"
	    }
	}
	print OUT $line;
    }
    close(OUT);
    close(IN);	
}

# finally list useless images
chdir($imgdir);
mkdir("$xmldir/renew") unless -d "$xmldir/renew";
while (defined(my $file = glob("*"))) {    
    # work only on images
    my ($basename, $parentdir, $extension) = fileparse($file, qr/\.[^.]*$/);
    next unless lc($extension) =~ /^\.($images_types)$/i;

    # check if registered yet
    next if $used{$file};

    # if we reach this point, this media is unknown
    $unused++;
    print UNUSED $file, "\n";
}

close(MAPPING);
close(DUPES);
close(MISSING);
close(UNUSED);


print "=============================
$mapped mapped URL
$missing missing files (!)
$dupes duplicated files/links
$unused unused files\n";

# EOF

Grabbing new images post_id

These are used in galleries, in the form of [gallery ids="..."] shortcodes or <!-- wp:gallery --> blocks. To use this script, you must compare the new XML produced by the previous script with a new XML export made by wordpress.com AFTER uploading the new images.

The point is to get the post_id of the newly uploaded images and to map them to the post_id of the old, removed images.


#!/usr/bin/perl
#
# ./update-post_id.pl FOLDER_OF_XML_UPDATED FOLDER_OF_XML_AFTER_IMAGE_REUPLOAD
#
#
# galleries are [gallery ids="27,28,29,30,31,32"...
#   and later <!-- wp:gallery {"ids":[4741]} -->
# referring to the <wp:post_id>27</wp:post_id> of image attachments.
#
# reuploaded files have new post_id along with new metadata hardcoded in the database
# 	<guid isPermaLink="false">http://XX.files.wordpress.com/2020/03/XXX-img_20160814_125558.jpg</guid>
#       <wp:post_type>attachment</wp:post_type>
# should match
#       <wp:attachment_url>https://XX.files.wordpress.com/2020/03/XXX-img_20160814_125558.jpg</wp:attachment_url>


use strict;
use warnings;
use Regexp::Common qw/URI/;
use File::Basename;
use File::Copy;
use File::Slurp;
use Cwd 'abs_path';
use MIME::Base64;
use URI;
use XML::LibXML;

die "$0 FOLDER_OF_XML_UPDATED FOLDER_OF_XML_AFTER_IMAGE_REUPLOAD" unless $ARGV[0] and$ARGV[1];



# get dir full path 
my $xmldir;
$xmldir = abs_path($ARGV[0])
    if $ARGV[0];
# get dir full path 
my $xmlafterreuploaddir;
$xmlafterreuploaddir = abs_path($ARGV[1])
    if $ARGV[1];

# test access to dir
print "XML dir (ARG1): $xmldir
XML after reupload dir (ARG2): $xmlafterreuploaddir\n";
chdir($xmldir) if -d $xmldir or die "unable to enter $xmldir, exiting"; 
chdir($xmlafterreuploaddir) if -d $xmlafterreuploaddir or die "unable to enter $xmlafterreuploaddir, exiting";


my %guid_postid;  # {guid} = postid



chdir($xmlafterreuploaddir);
# get postid after reupload
while (defined(my $file = glob("*.xml"))) {
    print "### read $file ###\n";
    my $dom = XML::LibXML->load_xml(location=>$file);

    foreach my $e ($dom->findnodes('//item')) {	
	#	print $e->to_literal();

	# only care for attachement type post
	next unless $e->findvalue('./wp:post_type') eq 'attachment';
	
	# store new post_id with the guid as key 	
	$guid_postid{$e->findvalue('./guid')} = $e->findvalue('./wp:post_id');
    }   
}


my %old2new_postid; # {old} = new


chdir($xmldir);
# get postid in first export
while (defined(my $file = glob("*.xml"))) {
    print "### read $file ###\n";
    my $dom = XML::LibXML->load_xml(location=>$file);

    foreach my $e ($dom->findnodes('//item')) {
	# only care for attachement type post
	next unless $e->findvalue('./wp:post_type') eq 'attachment';

	# ignore if this guid was not found/replaced
	next unless $guid_postid{$e->findvalue('./guid')};

	# map postids
	$old2new_postid{$e->findvalue('./wp:post_id')} = $guid_postid{$e->findvalue('./guid')};	
	print $e->findvalue('./wp:post_id')." -> ".$guid_postid{$e->findvalue('./guid')}."\n";
    }   
}


# finally, with this mapping, edit xml gallery entries:
# build import xml 
chdir($xmldir);
mkdir("$xmldir/newpostid") unless -d "$xmldir/newpostid";
while (defined(my $file = glob("*.xml"))) {
    print "### rewrite $file ###\n";
    open(OUT, "> newpostid/$file");
    open(IN, "< $file");
    while (<IN>) {
	my $line = $_;
	# older galleries
	# [gallery ids="1495,1496" type="rectangular" link="file"]	
	while (/\[gallery ids\=\"([\d|,]*)".*]/g) {
	    print "$1\n";
	    my $original = $1;
	    my @new_ids;
	    foreach my $id (split(",", $original)) {
		if ($old2new_postid{$id}) {
		    push(@new_ids, $old2new_postid{$id});
		} else {
		    # if not found, push back the original one
		    push(@new_ids, $id);
		}
	    }
	    my $new = join(",", @new_ids);
	    print " => $new\n";

	    $line =~ s/\[gallery ids\=\"$original\" /[gallery ids="$new" /g;
	}
	# newer galleries
	# <!-- wp:gallery {"ids":[4744,4745],"columns":2} -->	
	while (/\<\!\-\- wp\:gallery \{\"ids\"\:\[([\d|,]*)\].*\}/g) {
	    print "$1\n";
	    my $original = $1;
	    my @new_ids;
	    foreach my $id (split(",", $original)) {
		if ($old2new_postid{$id}) {
		    push(@new_ids, $old2new_postid{$id});
		} else {
		    # if not found, push back the original one
		    push(@new_ids, $id);
		}
	    }
	    my $new = join(",", @new_ids);
	    print " => $new\n";

	    $line =~ s/\<\!\-\- wp\:gallery \{\"ids\"\:\[$original\]/<!-- wp:gallery {"ids":[$new]/g;
	    print $line;
	}

	print OUT $line;
    }
    close(OUT);
    close(IN);	
}


# EOF

SPF-aware greylisting with Exim and memcache

This is a follow-up to my 2011 article about avoiding spam with SPF and greylisting within Exim. What changed since then? I am not more harassed by spam than I was before; it works, I have been spam free for a decade now. However, several important mail providers tend to send mail through multiple SMTPs, so many that it takes a while before any one of them makes at least two delivery attempts. So some mails take ages to pass the greylist.

Contemplating the idea of using OpenSMTPD, I incidentally found an interesting proposal to mix greylisting of IPs with SPF-validated domains.

The idea is that you greylist either an SMTP IP or a domain including any SMTP IP approved by SPF.

I updated the memcached-exim.pl script previously used and described. It was simplified, because I do not think it useful to greylist per sender and recipient, only per IP or domain. Now it either greylists only the IP, if not validated by SPF, or the domain and the IP on success (to save a few further SPF tests).
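
The gist of the new keying logic, as a rough sketch and not the actual memcached-exim.pl code (variable names are made up):

# greylist the whole domain when SPF validated the sending host, the single IP otherwise
my $key = ($spf_result eq 'pass')
    ? "greylist-domain:$sender_domain"
    : "greylist-ip:$sender_ip";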

I do not think it should have any noticeable impact on the server's behaviour: SPF is checked anyway, and the extra lookups cost next to nothing since there is a local caching DNS on my mail servers.

The earlier /etc/exim4/memcached.conf is actually no longer required (defaults are enough). You still need exim configuration counterparts:  /etc/exim4/conf.d/main/00_stalag13-config_0greylist and /etc/exim4/conf.d/acl/26_stalag13-config_check_rcpt.

Delisting an Exim4 server from Office365 ban list

Ever tried to get delisted from the Office365 ban list, for whatever reason you might have to try (a new IP for a server that was abused in the past, or something else, you will not know since they will not tell, and it even looks like they probably do not really know themselves)?

It is a funny process, because it involves receiving a mail from their servers, a mail that will probably be flagged as spam, with clues so big that it might even be blocked at SMTP time.

With Exim4, you’ll probably get in the log something like:

2019-08-20 22:18:09 1i0Aa1-0004Hu-8h H=mail-eopbgr740042.outbound.protection.outlook.com (NAM01-BN3-obe.outbound.protection.outlook.com) [40.107.74.42] X=TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256 CV=no F=<no-reply@microsoft.com> rejected after DATA: maximum allowed line length is 998 octets, got 3172

Long story short (this length test is not welcomed by all users), add to /etc/exim4/conf.d/main/00_localoptions:

IGNORE_SMTP_LINE_LENGTH_LIMIT=1

and then restart the server.

Try delisting and check your spam folder. You should now get the relevant mail. Whatever we think about Exim4's length limit test (based on an RFC, isn't it?), you still end up with a mail sent by Office365 like this:

X-Spam-Flag: YES
X-Spam-Level: *****
X-Spam-Status: Yes, score=5.2 required=3.4 tests=BASE64_LENGTH_79_INF,
	HTML_IMAGE_ONLY_08,HTML_MESSAGE,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,
	MPART_ALT_DIFF,SPF_HELO_PASS,SPF_PASS autolearn=no autolearn_force=no
	version=3.4.2
X-Spam-Report: 
	* -0.0 SPF_PASS SPF: sender matches SPF record
	* -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
	*  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
	*  0.7 MPART_ALT_DIFF BODY: HTML and text parts are different
	*  0.0 HTML_MESSAGE BODY: HTML included in message
	*  1.8 HTML_IMAGE_ONLY_08 BODY: HTML: images with 400-800 bytes of
	*      words
	*  2.0 BASE64_LENGTH_79_INF BODY: base64 encoded email part uses line
	*      length greater than 79 characters
	*  0.0 MIME_HTML_ONLY_MULTI Multipart message only has text/html MIME
	*      parts

Considering the context, it screams incompetence.