Running Debian GNU with kFreeBSD

As you could have guessed from my latest update to my iPXE setup, I’m currently giving a try to Debian GNU with the FreeBSD kernel: Debian GNU/kFreeBSD.

The hardware I’m trying this on is neither simple nor complicated: it’s old, but it’s also a laptop, a Dell Latitude C640 with a mobile P4 CPU and 1GB of RAM.

The install was made over the network. There’s nothing overly complicated but, to avoid wasting time, it’s always good to properly RTFM. For instance, I learned too late that kFreeBSD does not handle a / partition set on a logical one. I did not understand exactly why, but I had to get my / partition on ufs (ext2 for /home was ok though). I did not even get into ZFS, as it looks like it’s not recommended with a simple i686 CPU. After a while, I still found no way to get my NFS4 partitions mounted as usual from /etc/fstab, or even with mount; I had to add a dirty call to /sbin/mount_nfs -o nfsv4 gate:/all /path in /etc/rc.local. And when it came to Xorg, I found the mouse to be sometimes working, sometimes not, and plenty of overly complicated and confusing info on the web, to finally come up with a working /etc/X11/xorg.conf containing only:

Section "ServerFlags"
  Option "AutoAddDevices" "False"
EndSection

These are little inconveniences that you would not expect with a recent GNU/Linux system install, and the debian-installer does nothing to prevent you from hitting or creating them. I’m not even sure that I found the best fixes for them. It feels a bit like installing RedHat 5.2 🙂 which is more than what I actually expected.

So far I have not encountered any issue getting things to work, but suspend/sleep and general energy management look much less reliable (with xfce4). On a side note, the fact that only OSS is available with kFreeBSD pushed me to update my wakey.pl script; I expect it to run on any BSD now.

Booting over the network to install the system (improved, with iPXE instead of PXE)

A few months ago, I published my setup using pxelinux, isc-dhcpd and tftpd-hpa to make booting over the LAN possible. I improved this setup to chainload iPXE instead. I’m not interested in overwriting the ROMs of the ethernet devices I have, so I still use PXE, but only in order to get to iPXE, which is way more powerful as it allows direct access over http and much more.

The README in my PXE directory explains the whole (actually quite short) install-from-scratch process. If you had the previous version running, note that the DHCPD configuration and update script changed (and the case of subdirectories changed too).

Modifying preinst and postinst scripts before installing a package with dpkg

Ever found yourself in the situation where you’d like to ignore or edit a postinst or preinst script of a Debian package?

As Debian froze Wheezy, I decided it would be a good time to upgrade my home server, to help catch bugs and because it’s Sandy Bridge based, with sensors not well supported by Squeeze’s kernel. Unfortunately, I had weird stuff regarding EGLIBC: I had the 2.13 version installed from scratch, unknown to the dpkg database, while dpkg only knew about the cleanly installed 2.11. So the upgrade failed with:

A copy of the C library was found in an unexpected directory:
  '/lib/x86_64-linux-gnu/libc-2.13.so'
It is not safe to upgrade the C library in this situation;
please remove that copy of the C library or get it out of
'/lib/x86_64-linux-gnu' and try again.

dpkg : erreur de traitement de libc6_2.13-33_amd64.deb (--install) :
 le sous-processus nouveau script pre-installation a retourné une erreur de sortie d'état 1
Des erreurs ont été rencontrées pendant l'exécution :
 libc6_2.13-33_amd64.deb

Nasty. EGLIBC/GLIBC is a major piece of the system; you cannot simply “remove” it or “get it out” and expect the system to continue to work. Moreover, in this specific case, these files were not truly an issue: they were about to be replaced during the upgrade process. But dpkg does not provide any means to ignore maintainer scripts such as preinst (and probably never will). So one easy workaround is to unpack, edit, rebuild and install the package as follows:

aptitude download libc6
dpkg-deb --extract libc6_2.13-33_amd64.deb libc
dpkg-deb --control libc6_2.13-33_amd64.deb libc/DEBIAN

Then we can edit libc/DEBIAN/preinst (I commented out the exit 1 after the safety warning)

dpkg-deb --build libc
dpkg -i libc.deb

Yes, it’s fast 🙂
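
The manual edit of preinst can also be scripted. What follows is only a sketch, assuming the failing branch is a bare "exit 1" line that comes after the warning text; comment_exit_after is a helper name made up for this example, not something dpkg provides.

```shell
#!/bin/sh
# comment_exit_after PATTERN FILE  (hypothetical helper, not a dpkg tool)
# Print FILE, commenting out the first bare "exit 1" line that follows
# a line matching PATTERN. Everything else is passed through unchanged.
comment_exit_after() {
    awk -v pat="$1" '
        $0 ~ pat { seen = 1 }
        seen && !patched && /^[ \t]*exit 1[ \t]*$/ {
            print "#" $0
            patched = 1
            next
        }
        { print }
    ' "$2"
}

# Possible use, after the two dpkg-deb extraction commands above:
#   comment_exit_after 'not safe to upgrade' libc/DEBIAN/preinst > preinst.new
#   cat preinst.new > libc/DEBIAN/preinst   # cat keeps the executable bit
```

Writing the result back with cat rather than mv keeps the original file's permissions, which matters because dpkg-deb --build checks that maintainer scripts are executable.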

Booting over the network to install the system

Do you still have CD/DVD players installed on your boxes? Well, I mostly don’t; why would I anyway?

Actually, apart from system installation or access to the rescue mode of the system installer, there’s nothing you cannot do without one, and nothing that is not better done without one (nothing is slower and noisier in today’s computers). But even that is not really true anymore: most mainboards now include an ethernet card capable of network booting, even if it hides behind confusing names like NVIDIA Boot Agent.

Usually, it supports the Preboot Execution Environment (PXE), which combines DHCP and TFTP. That’s nice because it’s then easy with GNU/Linux to run DHCP and TFTP servers. So here comes my PXE setup, using ISC DHCPD and TFTPD-HPA, both shipped by Debian.

As described in the README, on the server (you have a home server, right? *plonk*), put this PXE directory somewhere clever, like /srv/pxe for instance (yes, that’s what I did; but you can put it in /opt/my/too/long/path/i/cannot/remember if you really really want).

Run the gnulinux/update.sh script to get kernels and initrds. By default, it fetches Debian and Ubuntu stuff. If it went well, you should have several *-linux and *-initrd.gz files in gnulinux/ plus a generated config file named default inside pxelinux.cfg/.
You may add a symlink to this script inside /etc/cron.monthly so you keep stuff up to date.
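
One detail worth checking when adding that symlink: on Debian, /etc/cron.monthly is run through run-parts, which by default skips files whose names contain anything other than letters, digits, underscores and hyphens. A link named update.sh would therefore be silently ignored. A small sketch, where runparts_name_ok and the link name pxe-update are made up for the example and /srv/pxe is the path suggested above:

```shell
#!/bin/sh
# runparts_name_ok NAME  (hypothetical helper)
# Succeed if NAME only uses characters that run-parts accepts by default.
runparts_name_ok() {
    case "$1" in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example: pick a link name without a dot, then create the symlink.
#   runparts_name_ok pxe-update && \
#       ln -s /srv/pxe/gnulinux/update.sh /etc/cron.monthly/pxe-update
```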

Then, you must install a “Trivial FTP Daemon” on your local server which will, in the context of PXE (Preboot Execution Environment), serve the files you just got:

apt-get install tftpd-hpa
update-rc.d tftpd-hpa defaults

Edit /etc/default/tftpd-hpa, especially TFTP_DIRECTORY setting (you know, /opt/my/what/the/…).
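
For reference, the file is a handful of shell variable assignments. The values below are a sketch: /srv/pxe is the example path used earlier, and the other three lines are what I would expect a stock Debian tftpd-hpa to ship, so check them against your own file rather than copying them blindly.

```shell
# /etc/default/tftpd-hpa (sketch; compare with the file shipped on your system)
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/srv/pxe"   # the example path used above
TFTP_ADDRESS="0.0.0.0:69"
TFTP_OPTIONS="--secure"
```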

Finally, you must update your DHCP Daemon so it advertises we’re running PXE (filename and next-server options). With ISC dhcpd, in /etc/dhcp/dhcpd.conf, for my subnet, I have now:

subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.100 192.168.1.200;

  # PXE / boot on lan
  filename "pxelinux.0";
  next-server 192.168.1.1;
}

Obviously, you won’t forget to do:

invoke-rc.d isc-dhcp-server restart
invoke-rc.d tftpd-hpa start

That’s all. Now on your client, go into the BIOS, look for “boot on lan” or whatever crap it may be called (it varies greatly), and activate it. Then boot. It’ll do some DHCP magic to find the path to the PXE files and the menu should be printed on your screen at some point.

We can actually do plenty of things with this simple stuff. We could, for instance, use it to boot diskless terminals on a specifically designed distro.

Package: amarok 2.4beta1 for Debian testing

Amarok is a nice music player for KDE. Inspired by the iTunes interface, it features a clever random mode, provides Wikipedia/lyrics/photos pages for the currently played song, handles MTP devices and works with lastfm so you can have online stats of what you listen to and find people that listen to the same crap too. That’s definitely a nice piece of software. But, unfortunately, it’s quite buggy.

This morning, Amarok would not start, no matter what. Well, it started once I erased .kde/share/config/amarok and .kde/share/apps/amarok. Then, shortly afterwards, it failed to start once more. Quite frankly, I’m not prepared to remove my Amarok config twice a day. So I decided I would just give a shot to a more recent version, and the first beta for 2.4 seemed a good pick.

Here’s amarok 2.4beta1 (2.3.90) packages for Debian testing amd64.

It was built with amarok Debian experimental directory (a new entry added in debian/changelog, usr/share/doc/kde/HTML/* removed from debian/amarok-common.install, usr/lib/strigi* removed from debian/amarok.install, usr/lib/kde4/*.so and usr/lib/*.so* added to debian/amarok.install, target override_dh_shlibdeps: added to debian/rules and all patches removed from debian/patches/series) and amarok official latest source package (renamed amarok_2.3.90.orig.tar.bz2) using the command dpkg-buildpackage -rfakeroot inside the amarok-2.3.90 directory (source tarball decompressed) containing also the debian directory.

Slaying Spams with both Bogofilter and SpamAssassin embedded in exim

Ads are spam. Good thing with the internet’s ads is that you can set up countermeasures.

(Disclaimer: yes, there is nothing new here, just an example of setup)

I have plenty of email addresses from different providers; some are definitely history. I could go through the websites of all of these and set up forwarding for the ones I no longer use but still want to be able to get mail from, just in case. Well, I would do that if I were using my mail client to fetch mails, because otherwise fetching mails would actually take ages.

But, as I have a local home underclocked 🙂 server, I find it way easier and more powerful to use ESR’s fetchmail instead, to download them all to a single account that is accessed by my mail client through IMAPS. I have a /etc/fetchmailrc like:

poll pop.free.fr with proto POP3
user 'XXX' there with password 'XXX' is 'localuser' here
poll imap.gmail.com with proto IMAP
user 'XXX@gmail.com' there with password 'XXX' is 'localuser' here with ssl
user 'XXZ@gmail.com' there with password 'XXZ' is 'localuser' here with ssl

Fetchmail downloads the mails, then relies on the installed SMTP server, which is Exim, to deliver them to the end user’s mailbox, accessible through IMAPS.
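
Since that file holds passwords in clear text, fetchmail is picky about its permissions: as far as I know, it refuses to start when group or others have any access to its run control file. A small check, only a sketch, with rc_private being a helper name made up for the example:

```shell
#!/bin/sh
# rc_private FILE  (hypothetical helper)
# Succeed if FILE is a regular file with no permission bits set for
# group or others, going by the mode string printed by ls.
rc_private() {
    case "$(ls -ld "$1" 2>/dev/null)" in
        -???------*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   chmod 600 /etc/fetchmailrc
#   rc_private /etc/fetchmailrc || echo "fetchmailrc is too open" >&2
```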

What’s so nifty about it? Well, mails will also be filtered for spam. As it happens on the local home server, it will be unnoticeable for the end user, that is, me. We’ll use several anti-spam tools, not caring about redundancy and time consumption: DNSBLs, Bogofilter, SpamAssassin, razor2.

So, here we go. Note that Exim (exim4) in Debian uses the user Debian-exim. localuser is the recipient end user; it belongs to the group localuser, named after itself.
We will add Debian-exim to the group localuser and create a system group dedicated to spam checking, to easily share the Bayesian databases:

# addgroup --system spamslayer
# adduser Debian-exim spamslayer
# adduser Debian-exim localuser
# adduser localuser spamslayer

* Bogofilter is a Bayesian spam filter. It is said to be faster and less time-consuming than SpamAssassin’s own Bayesian filter, so we will run mails through it first. It is installed with the Debian package.

Edit /etc/bogofilter.cf as follows:

bogofilter_dir=/var/lib/bogofilter
db_transaction=yes

The bayes directory must be created by hand:

# mkdir /var/lib/bogofilter
# chgrp spamslayer /var/lib/bogofilter
# chmod 2777 /var/lib/bogofilter

* SpamAssassin is a powerful spam-killer, at the cost of time consumption. It is installed with the Debian package.

In the following site-wide config /etc/spamassassin/local.cf, I use bayesian filters, razor2, several DNSBLs and I adjust some tests according to my needs:

# Save spam messages as a message/rfc822 MIME attachment instead of
# modifying the original message (0: off, 2: use text/plain instead)
#
# Keep as it is because bogofilter would not learn properly otherwise,
# as it cannot distinguish report from the spam.
report_safe 0
# Set which networks or hosts are considered 'trusted' by your mail
# server (i.e. not spammers)
#
trusted_networks 192.168.1.
# Locales
#
# (I only receive mails in English or French)
ok_locales en fr
# Set the threshold at which a message is considered spam (default: 5.0)
#
required_score 3.3
# Use Bayesian classifier (default: 1)
#
# (I created the relevant directory)
use_bayes 1
bayes_file_mode 0777
bayes_path /var/lib/spamassassin-bayes/bayes
score BAYES_20 0.3
score BAYES_40 0.5
score BAYES_50 0.8
score BAYES_60 1
score BAYES_80 2
score BAYES_95 2.5
score BAYES_99 6
# Bayesian classifier auto-learning (default: 1)
#
# (I may change that, not sure about it)
bayes_auto_learn 1
# Set headers which may provide inappropriate cues to the Bayesian
# classifier
#
bayes_ignore_header X-Bogosity
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status
# use razor
# (/etc/razor is the standard debian path)
use_razor2 1
razor_config /etc/razor/razor-agent.conf
score RAZOR2_CF_RANGE_51_100 3.2
# some rbl checks are already made by exim, at RCPT time, not all.
skip_rbl_checks 0
rbl_timeout 30
score RCVD_IN_SBL 15
score RCVD_IN_XBL 15
score RCVD_IN_SORBS_HTTP 15
score RCVD_IN_SORBS_SOCKS 15
score RCVD_IN_SORBS_MISC 15
score RCVD_IN_SORBS_SMTP 15
score RCVD_IN_SORBS_ZOMBIE 15
# adjust some tests scores: lower DUL test
score FROM_ENDS_IN_NUMS 0.2
score FROM_HAS_MIXED_NUMS 0.2
score FROM_HAS_MIXED_NUMS3 0.2
score RCVD_IN_NJABL_DUL 0.1
score RCVD_IN_SORBS_DUL 0.1
# lower stupid test
score DNS_FROM_SECURITYSAGE 0.0
# adjust some tests scores
score FAKE_HELO_HOTMAIL 3
score FORGED_HOTMAIL_RCVD 3
score HTML_FONT_BIG 2.4
score NO_REAL_NAME 2
score RCVD_IN_BL_SPAMCOP_NET 3
score SUBJ_ILLEGAL_CHARS 4.8
score EXTRA_MPART_TYPE 2.8
score SUBJ_ALL_CAPS 2.6
# increase all scores related to drugs: what do I care, duh
score DRUGS_ANXIETY 5
score DRUGS_ANXIETY_EREC 5
score DRUGS_ANXIETY_OBFU 5
score DRUGS_DIET 5
score DRUGS_DIET_OBFU 5
score DRUGS_ERECTILE 5
score DRUGS_ERECTILE_OBFU 5
score DRUGS_MANYKINDS 10
score DRUGS_MUSCLE 5
score DRUGS_PAIN 5
score DRUGS_PAIN_OBFU 5
score DRUGS_SLEEP 5
score DRUGS_SLEEP_EREC 5
score DRUGS_SMEAR1 5
# same goes for porn
score AMATEUR_PORN 5
score BEST_PORN 5
score DISGUISE_PORN 5
score DISGUISE_PORN_MUNDANE 5
score FREE_PORN 5
score HARDCORE_PORN 5
score LIVE_PORN 5
score PORN_15 5
score PORN_16 5
score PORN_URL_MISC 5
score PORN_URL_SEX 5
score PORN_URL_SLUT 5

The bayes directory must be created:

# mkdir /var/lib/spamassassin-bayes
# chown Debian-exim /var/lib/spamassassin-bayes
# chmod 0777 /var/lib/spamassassin-bayes

Obviously, it implies that razor2 must be properly installed. We install the debian package then set it up. Remember it must run with user Debian-exim, so we do:

# chown -R Debian-exim:spamslayer /etc/razor
# su Debian-exim
$ razor-admin -home=/etc/razor -register
$ razor-admin -home=/etc/razor -create
$ razor-admin -home=/etc/razor -discover

To save resources, we start SpamAssassin as a daemon (spamd) that will be called using its specific client (spamc). Before using the initd script, edit /etc/default/spamassassin as follows:

# Change to one to enable spamd
ENABLED=1
# SpamAssassin uses a preforking model, so be careful! You need to
# make sure --max-children is not set to anything higher than 5,
# unless you know what you're doing.
OPTIONS="--create-prefs --max-children 5 --helper-home-dir -u Debian-exim -g spamslayer"
# Cronjob
# Set to anything but 0 to enable the cron job to automatically update
# spamassassin's rules on a nightly basis
CRON=1

All that being done, you’ll want to (re)start the daemon with the relevant initd script (/etc/init.d/spamassassin restart here).
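
Once spamd is up, a quick way to check that it answers is to feed it the GTUBE test string through spamc. A sketch, assuming spamc is installed and spamd listens on its default local port; gtube_message is a name made up here, and the addresses in the sample mail are placeholders.

```shell
#!/bin/sh
# gtube_message  (hypothetical helper)
# Print a minimal mail whose body is the GTUBE test string, which
# SpamAssassin is designed to flag as spam.
gtube_message() {
    printf '%s\n' \
        'From: test@example.org' \
        'To: localuser@localhost' \
        'Subject: GTUBE test' \
        '' \
        'XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X'
}

# Example: "spamc -c" only prints score/threshold and exits with
# status 1 when spamd considers the message spam.
#   gtube_message | spamc -c
```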

* Now we’ll tune Exim so that it calls, all by itself, Bogofilter first and then SpamAssassin, only if necessary. We use the split configuration in /etc/exim4/conf.d/. That is Debian-specific, I think, but it does not make any difference anyway.

First we define useful transports in /etc/exim4/conf.d/transport/35_spamblock (the name 35_spamblock is arbitrary and the number does not matter here):

spamslay_bogofilter:
driver = pipe
command = /usr/sbin/exim4 -oMr spamslayed-bogofilter -bS
use_bsmtp = true
transport_filter = /usr/bin/bogofilter -l -p -e
home_directory = "/tmp"
current_directory = "/tmp"
# must use a privileged user to set $received_protocol
# on the way back in!
user = Debian-exim
group = spamslayer
log_output = true
return_fail_output = true
return_path_add = false
message_prefix =
message_suffix =
#
spamslay_spamd:
driver = pipe
command = /usr/sbin/exim4 -oMr spamslayed-spamd -bS
use_bsmtp = true
transport_filter = /usr/bin/spamc
home_directory = "/tmp"
current_directory = "/tmp"
# must use a privileged user to set $received_protocol
# on the way back in!
user = Debian-exim
group = spamslayer
log_output = true
return_fail_output = true
return_path_add = false
message_prefix =
message_suffix =

Second we define routers, here in /etc/exim4/conf.d/router/450_spamblock – the order matters, here it is just after 400_exim4-config_system_aliases and before 500_exim4-config_hubuser:

# spam checking
# first bogofilter
spamslay_router_bogofilter:
debug_print = "R: bogofilter for $local_part@$domain received with protocol $received_protocol with X-Spam-Flag=$h_X-Spam-Flag and X-Bogosity=$h_X-Bogosity"
# When to scan a message :
# - it isn't already flagged as spam
# - it has not yet been spamslayed at all
# - it isn't local ($received_protocol eq "" or local)
condition = "${if and{ {!eqi{$h_X-Spam-Flag:}{yes}} {!eq{$received_protocol}{spamslayed-bogofilter}} {!eq{$received_protocol}{spamslayed-spamd}} {!eq{$received_protocol}{local}} {!eq{$received_protocol}{}} }}"
driver = accept
transport = spamslay_bogofilter
#
# second spamd
spamslay_router_spamd:
debug_print = "R: spamd for $local_part@$domain received with protocol $received_protocol with X-Spam-Flag=$h_X-Spam-Flag and X-Bogosity=$h_X-Bogosity"
# When to scan a message :
# - it isn't already flagged as spam
# - it has not yet been spamslayed with SA
# - it isn't local ($received_protocol eq "" or local)
condition = "${if and { {!eqi{$h_X-Spam-Flag:}{yes}} {!match{$h_X-Bogosity:}{^Spam}} {!eq {$received_protocol}{spamslayed-spamd}} {!eq{$received_protocol}{local}} {!eq{$received_protocol}{}} }}"
driver = accept
transport = spamslay_spamd
#
# This route will send any mail that got here to the devnull alias, that
# should be configured in /etc/aliases to be a real link to /dev/null.
# This route should get only mails that have spam score higher than 14.
# This will affect users mails!
spamslay_killit:
condition = "${if ge{$h_X-Spam-Level:}{\*\*\*\*\*\*\*\*\*\*\*\*\*\*} {1}{0} }"
driver = redirect
data = spam
file_transport = address_file
pipe_transport = address_pipe
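
To make the chaining easier to follow, here is the decision logic of the first two routers restated as a plain shell function. This is a reading aid only, not something Exim runs; route_for and level_at_least are names made up for the example. The second helper mimics what spamslay_killit relies on: Exim's ge is a string comparison, and between two strings made only of asterisks that boils down to comparing their lengths.

```shell
#!/bin/sh
# route_for PROTOCOL X_SPAM_FLAG X_BOGOSITY  (hypothetical helper)
# Print which of the routers above would pick up a message:
# "bogofilter", "spamd", or "next" when both conditions fail.
route_for() {
    proto=$1
    flag=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    bogosity=$3

    # Both routers skip flagged spam and local or already rescanned mail.
    [ "$flag" = "yes" ] && { echo next; return 0; }
    case "$proto" in
        ''|local|spamslayed-spamd) echo next; return 0 ;;
    esac

    # Not yet through bogofilter: the first router takes it.
    if [ "$proto" != "spamslayed-bogofilter" ]; then
        echo bogofilter
        return 0
    fi

    # Back from bogofilter: spamd only sees what bogofilter did not flag.
    case "$bogosity" in
        Spam*) echo next ;;
        *) echo spamd ;;
    esac
}

# level_at_least STARS N  (hypothetical helper)
# Succeed if the X-Spam-Level value STARS holds at least N asterisks.
level_at_least() {
    stars=$(printf '%s' "$1" | tr -cd '*')
    [ "${#stars}" -ge "$2" ]
}
```

A message thus goes through Exim up to three times: once as received, once re-injected with protocol spamslayed-bogofilter, and once more with spamslayed-spamd if bogofilter did not already call it spam.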

* Next step, now that spams are flagged, it makes sense to put them apart in the Maildir that will be accessed through IMAPS. I do this with procmail. We set umask for procmail (the IMAP server is configured as such too) to make sure Debian-exim can access stored mails (we want mode 0640, group read access, so the umask is 666-640=026). Here’s the relevant bit of /home/localuser/.procmailrc:

UMASK=026
IMAPDIR=$HOME/.Maildir/
SPAM=$IMAPDIR".Poubelle.Spam/"
#
:0
* ^X-Spam-Status: Yes
$SPAM
:0
* ^X-Spam-Flag: YES
$SPAM
#
:0
* ^X-Bogosity: Spam
$SPAM
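
About the umask value: the subtraction works here because no digit needs a borrow, but the general rule is a bitwise one. The umask is whatever bits of 666 are absent from the wanted mode. A small sketch, with umask_for_mode being a name made up for the example:

```shell
#!/bin/sh
# umask_for_mode MODE  (hypothetical helper)
# Print the umask that makes newly created files come out with the
# octal MODE, i.e. the bits of 666 that MODE does not contain.
umask_for_mode() {
    printf '%03o\n' $(( 0666 & ~0$1 ))
}

# Example: umask_for_mode 640 prints 026, the value used above.
```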

At the same time, we make sure Debian-exim can access mails already there (so not affected by umask):

# cd /home/localuser
# chmod 750 -Rv .Maildir
# chmod 0640 -v `find .Maildir -type f`

(PS: you may want to enforce a more restrictive policy, depending on how your server is accessed – but, anyway, Debian-exim is by essence able to tamper with mails you receive, so it won’t make a big difference)

* Training bayesian filters.

Now that spam ends up in a specific Maildir, both the SpamAssassin and Bogofilter Bayesian filters must be trained to be effective.

We add the following in /etc/cron.d/bayes:

# trains bayesian filters
SPAMDIR="/home/localuser/.Maildir/.Poubelle.Spam/cur/ /home/localuser/.Maildir/.Poubelle.Spam/new/"
#
# spamd: can handle by itself bogofiltered headers
25 * * * * Debian-exim /usr/bin/sa-learn --spam $SPAMDIR
#
# bogofilter: not able to clean inappropriate cues from spamd, will do it
# by removing:
# - informational SpamAssassin headers
# - SpamAssassin score and decision (irrelevant)
# (-u was not set as it is discouraged perf-wise in bogofilter's manual)
28 * * * * Debian-exim for file in `find $SPAMDIR -type f`; do cat $file; done | grep -v -E "^X-Spam-(Checker|Flag|Level|Report)" | sed s/"^X-Spam-Status.*score.*required.*tests="//g | /usr/bin/bogofilter --register-spam
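
The bogofilter line is dense, so here is its header-cleaning stage pulled out into a function that can be tried on a single message before trusting it in cron. Only a sketch: strip_sa_headers is a made-up name and some-message stands for any file picked from the spam Maildir.

```shell
#!/bin/sh
# strip_sa_headers  (hypothetical helper)
# Read a mail on stdin; drop the SpamAssassin informational headers and
# cut the verdict and score out of X-Spam-Status, leaving only the
# list of tests that matched.
strip_sa_headers() {
    grep -v -E '^X-Spam-(Checker|Flag|Level|Report)' |
        sed 's/^X-Spam-Status.*score.*required.*tests=//'
}

# Example:
#   strip_sa_headers < some-message | less
```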

Obviously, if you want it to learn from plenty of different users, you’ll have to think of something more elaborate 🙂
Anyway, regarding plenty of users, it would probably be wise to think twice about the whole concept of sharing Bayesian filters, which may not be accurate at all for very different users.

One alternative would have been to avoid meddling with Exim and to run both bogofilter and spamd via procmail. Sure, it would not have been a site-wide setup, but for a few users ~/.procmailrc can be replicated easily. But actually I enjoy messing with Exim; that’s kind of a hobby. I skipped here the part where we call DNSBLs in Exim (working out of the box anyway). And on a production server, with SMTP wide open to the internet, it is possible to follow this approach to shut off spammers at SMTP time, which saves a huge amount of resources, and even to ban them.