Sharing graphs of multiple Munin (master) instances

Munin is a convenient monitoring tool. Even if it is getting old, it is easy to set up and to extend with custom scripts.

It works with the notion of a master munin process that grabs data from nodes (devices within the network), stores it in round-robin databases (RRD) and processes the data to generate static images and HTML pages. These steps are split into several scripts: munin-update, munin-limits, munin-graph, munin-html.

It’s fine (overkill?) for a small local network, despite the fact that RRD is I/O-hungry, to the point that it may be required to use a caching daemon like rrdcached.

It’s a different story if you want to monitor, at once, several small networks that are connected through the internet. Why would you? First, because it might be convenient to get graphs from different networks side by side. Also because if one network disappears from the internet, its munin data might actually be meaningful, provided you can still access it.

The problem is that munin updates are synchronous: any disconnect between the two would cause the data to be inconsistent. It leads to many issues that munin-async can help with. But even though you might be able to use munin-async, one of your servers will lack a munin master: the setup will work only when both are up.

So I’m actually much more interested in having a master munin process, for each network.

How to achieve that? Sharing RRD via NFS over the web is not an option. I’m also not a fan of the notion of having both master munin processes read through all the RRD and generate graphs in parallel, re-generating exactly the same data with no added value.

I went for an alternative approach with a modified version of the munin-mergedb.pl script. We do not merge RRD trees. We simply synchronize the db files to merge and the generated graphs. So if there are graphs from another munin master process to include in the HTML output, they’ll be there. But each munin master process will go undisturbed by the unavailability of any other process and won’t have more RRD to process or more graphs to produce.

Graphs and db files replication:

On both (master munin process) hosts, you need a user dedicated to replication, here SYNCUSER, which must belong to the munin group:

adduser SYNCUSER munin

This user needs ssh access from one host to the other (private/public key sharing, whatever).

Directories setup:

mkdir -p /var/lib/munin-mergedb/
chown munin:munin -R /var/lib/munin-mergedb/
# the +s is very important so directory group ownership is preserved
chmod g+rws -R /var/lib/munin-mergedb/
chmod g+rws /var/lib/munin/
chmod g+rws -R /var/www/html/munin/

On one host (the one allowed to connect through ssh), synchronize the HTML files two-way with unison:

su - SYNCUSER --shell=/bin/bash

DISTANT_HOST=DISTANTHOST
DISTANT_PORT=22
LOCAL_HTML=/var/www/html/munin/DOMAIN
DISTANT_HTML=/var/www/html/munin/DOMAIN

LOCAL_DB=/var/lib/munin
DISTANT_LOCAL_DB=/var/lib/munin-mergedb/THISHOST
LOCAL_DISTANT_DB=/var/lib/munin-mergedb/DISTANTHOST


# step one, get directories
unison -batch -auto -ignore="Name *.html" -ignore="Name *.png" "$LOCAL_HTML" "ssh://$DISTANT_HOST:$DISTANT_PORT/$DISTANT_HTML"
# step two, get directories img content 
cd "$LOCAL_HTML" && for DIR in *; do [ -d "$DIR" ] && unison -batch -auto -ignore="Name *.html" "$LOCAL_HTML/$DIR" "ssh://$DISTANT_HOST:$DISTANT_PORT/$DISTANT_HTML/$DIR"; done

On one host (the same), synchronize the database files one-way with rsync:

LOCAL_DB=/var/lib/munin
DISTANT_LOCAL_DB=/var/lib/munin-mergedb/THISHOST
LOCAL_DISTANT_DB=/var/lib/munin-mergedb/DISTANTHOST

# push our db (one way action, easier with rsync)
rsync -a --include='datafile*' --include='limits*' --exclude='*' -e "ssh -p $DISTANT_PORT" "$LOCAL_DB/" "$DISTANT_HOST:$DISTANT_LOCAL_DB/"
# get theirs (one way action, easier with rsync)
rsync -a --include='datafile*' --include='limits*' --exclude='*' -e "ssh -p $DISTANT_PORT" "$DISTANT_HOST:$LOCAL_DB/" "$LOCAL_DISTANT_DB/"

If it works fine, set up /etc/cron.d/munin-sync:

# supposed to assist munin-mergedb.pl

DISTANT_HOST=DISTANTHOST
DISTANT_PORT=22

LOCAL_HTML=/var/www/html/munin/DOMAIN
DISTANT_HTML=/var/www/html/munin/DOMAIN

LOCAL_DB=/var/lib/munin
DISTANT_LOCAL_DB=/var/lib/munin-mergedb/THISHOST
LOCAL_DISTANT_DB=/var/lib/munin-mergedb/DISTANTHOST

# m h dom mon dow user command
# every 5 hours update dir list
01 */5 * * *  SYNCUSER unison -batch -auto -silent -log=false -ignore="Name *.html" -ignore="Name *.png" "$LOCAL_HTML" "ssh://$DISTANT_HOST:$DISTANT_PORT/$DISTANT_HTML" 2>/dev/null

# update content roughly twice per hour (*/28 fires at minutes 0, 28 and 56)
# (the rsync calls follow a ';' so they run even when the last entry checked by the loop is not a directory)
*/28 * * * *  SYNCUSER cd "$LOCAL_HTML" && for DIR in *; do [ -d "$DIR" ] && unison -batch -auto -silent -log=false -ignore="Name *.html" "$LOCAL_HTML/$DIR" "ssh://$DISTANT_HOST:$DISTANT_PORT/$DISTANT_HTML/$DIR" 2>/dev/null; done; rsync -a --include='datafile*' --include='limits*' --exclude='*' -e "ssh -p $DISTANT_PORT" "$LOCAL_DB/" "$DISTANT_HOST:$DISTANT_LOCAL_DB/" 2>/dev/null && rsync -a --include='datafile*' --include='limits*' --exclude='*' -e "ssh -p $DISTANT_PORT" "$DISTANT_HOST:$LOCAL_DB/" "$LOCAL_DISTANT_DB/" 2>/dev/null
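
The cron line above is getting long. As a sketch only, the same steps could live in a small script that cron calls instead. The script name /usr/local/bin/munin-sync, the run helper and the DRY_RUN variable are assumptions of mine, not part of munin; DISTANTHOST, THISHOST and DOMAIN are the same placeholders as above. With DRY_RUN=1 the functions only print the commands, which helps checking paths before anything is transferred.

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/munin-sync: same steps as the cron entries.
DISTANT_HOST=DISTANTHOST
DISTANT_PORT=22
LOCAL_HTML=/var/www/html/munin/DOMAIN
DISTANT_HTML=/var/www/html/munin/DOMAIN
LOCAL_DB=/var/lib/munin
DISTANT_LOCAL_DB=/var/lib/munin-mergedb/THISHOST
LOCAL_DISTANT_DB=/var/lib/munin-mergedb/DISTANTHOST

# run CMD...: execute the command, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# two-way sync of the graphs, one unison call per host directory
# (body in a subshell so the cd does not leak to the caller)
sync_graphs() (
    cd "$LOCAL_HTML" || exit 1
    for DIR in *; do
        [ -d "$DIR" ] || continue
        run unison -batch -auto -silent -log=false -ignore="Name *.html" \
            "$LOCAL_HTML/$DIR" "ssh://$DISTANT_HOST:$DISTANT_PORT/$DISTANT_HTML/$DIR"
    done
)

# one-way copies of the datafile* and limits* files: push ours, get theirs
sync_db() {
    run rsync -a --include='datafile*' --include='limits*' --exclude='*' \
        -e "ssh -p $DISTANT_PORT" "$LOCAL_DB/" "$DISTANT_HOST:$DISTANT_LOCAL_DB/"
    run rsync -a --include='datafile*' --include='limits*' --exclude='*' \
        -e "ssh -p $DISTANT_PORT" "$DISTANT_HOST:$LOCAL_DB/" "$LOCAL_DISTANT_DB/"
}

# In the real script, the last line would be:  sync_graphs; sync_db
```

The cron entry would then shrink to the schedule, SYNCUSER and the script path.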

Updated scripts:

Once the data is there, you will need the munin-mergedb script to handle it: use a munin-cron script like my munin-cron-plus.pl instead of munin-cron so it actually calls munin-mergedb.pl. Plus, you’ll need a fixed version of munin-graph so --host arguments are not blatantly ignored (lacking RRD, it would fail to actually write graphs for the distant munin master process, but it would nonetheless delete existing graphs).

(Where these files go depends on your munin installation packaging. I have the munin processes in /usr/local/share/munin and munin-cron-plus.pl in /usr/local/bin, which mirrors the fact that the original similar files are in /usr/share/munin or /usr/bin. Beware: if you change the name of any munin process, update the log rotation files, otherwise you may easily fill up a disk drive, since munin is kind of noisy, especially when issues arise.)

For convenience, you can get these with my stalag13-utils-munin Debian/Devuan package:

wget apt.rien.pl/stalag13-keyring.deb
dpkg -i stalag13-keyring.deb
apt-get update
apt-get install stalag13-utils-munin

Once everything is set up, you can test/debug it by typing:

su - munin --shell=/bin/bash

/usr/local/bin/munin-cron-plus.pl

What next?

Actually I’d welcome improvements to munin-cron-plus.pl since it extracts --host information in the most barbaric way. I am sure it can be done cleanly using Munin::Master::Config or something else.

Then I’d welcome any insight about why munin-graph’s --host option does not work the way I’d like it to. Maybe I misunderstand its exact purpose. The help reads:

 --host <host>  Limit graphed hosts to <host>. Multiple --host options
               may be supplied.

To me, it really means that it should not do anything at all to any files of hosts excluded this way. If it meant something else, maybe this should be explained.

Removing car’s error messages with an ELM327 device and AndrOBD

Removing a car’s error message: am I insane? Well, indeed, in a perfect world where no faulty design exists, I would be. Fixing an error message would really mean fixing not even a symptom but a warning, and that can only be wrong.

But in the world of French automobiles, it is not so (I cannot tell for expensive German or Asian cars, I don’t own any). Namely, with Peugeot-Citroën HDI (and strangely not so much with the similar Fiat JTD and Ford TDCi), you easily end up with the infamous Anti Pollution Fault error code after firing up the engine. Sometimes it really means something is very wrong, often it only means that a probe is faulty. Sometimes a car shop does not replace or fix the probe but just resets it, so the problem stops only for a time. Later it pops up again and causes the engine to work in degraded mode, stuck below 2500 RPM or so: not great. On my HDI-based car, the mechanic decided to completely deactivate the probe, which went faulty when the car was only a few years old and had less than 50000 km, considering it was not worth replacing with a new one that might die as early as the original part anyway. Since then, the engine works nicely, but on startup there is this Anti Pollution Fault error code that stays on. Not really dramatic, but it causes you to actually pay less attention to any error message.

So all modern cars are electronics-based or even computer-based. But it is unlikely that you’ll manage to access any of the code running. For your security, they might say. Convenient for faking gas emission tests, nonetheless.

Still, these days, you can get OBD-II devices for cheap, OBD standing for on-board diagnostics. It is quite limited in scope and capabilities; still, it can be used to clear error codes.

I tested a few (libre) software and cheap hardware options. What worked for me (Peugeot car with HDI engine) is a Bluetooth ELM327 device (10 €) along with AndrOBD (available through F-Droid). It provides seemingly accurate data and resetting error codes actually works (when the ignition is on but the engine is off).

I also tried a WiFi ELM327 device: the dedicated software failed to connect or was not providing any usable info. I’d be interested in any other option (for instance with a GNU/Linux laptop instead of an F-Droid phone).

 

Fixing black screen during boot caused by LVDS-panel presence assumption by GMA 3650 drivers

On an Intel DN2800MT-based system, hence with an integrated Graphics Media Accelerator 3650 graphics processor, if you connect a VGA screen, the screen turns black/off during the boot process, exactly when the system switches to the framebuffer (no problem so far with HDMI).

Passing nomodeset or any similar option is of no help.

You could not make this up: apparently the GMA 3600 kernel DRM driver always assumes there is an LVDS panel, as there would be on a laptop but probably not on a home server, and defaults to a 1920×1080 panel.

So you need to add to the grub kernel line:

video=LVDS-1:d

Or, in /etc/default/grub :

GRUB_CMDLINE_LINUX_DEFAULT="quiet video=LVDS-1:d"

And run update-grub afterwards.
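
After rebooting, it is easy to check that the option really reached the kernel. A minimal sketch, assuming a shell function I call has_kernel_arg (the name is mine): it looks for an exact word in /proc/cmdline, or in a string given as second argument.

```shell
# has_kernel_arg ARG [CMDLINE]: succeed if ARG is one of the space-separated
# words of CMDLINE (defaults to the running kernel's /proc/cmdline).
has_kernel_arg() {
    arg=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    case " $cmdline " in
        *" $arg "*) return 0 ;;
    esac
    return 1
}

if has_kernel_arg "video=LVDS-1:d"; then
    echo "LVDS-1 is disabled on the kernel command line"
else
    echo "video=LVDS-1:d is not on the kernel command line"
fi
```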

Using same soundcard among users with PulseAudio not in system mode

Sound on GNU/Linux has never been convenient. Right now, the de facto standard is PulseAudio: yeah, made by the same people who make this nightmare of systemd. When it works, it is better than just ALSA (Advanced Linux Sound Architecture). When it doesn’t, you’re in for a headache.

Anyway, I had this situation where I wanted user whatever to be able to use the soundcard. But the soundcard was blocked and reserved by PulseAudio started by my regular user account.

The first option is to make PulseAudio work as a system daemon. The UNIX-style option. Quite obviously, that would be too easy to implement for these systemd people. So they implemented the option while advising not to use it. I did not care about the advice, though, so I tried. And then I understood why, while advising not to use it, they said they would not be accountable for problems using it. Because it is utter trash, unreliable, giving out endless error messages and, in the end, not working at all.

So the system mode is a no-go, in the short run and definitely not in the long run either.

Alternate option is to open PulseAudio through the loopback network device. To do so, in /etc/pulse/default.pa add the TCP module with 127.0.0.1:

load-module module-native-protocol-tcp auth-ip-acl=127.0.0.1

Obviously, by default tcpwrapper will refuse access, so you also have to add the relevant counterpart in /etc/hosts.allow:

pulseaudio-native: 127.0.0.1

From now on, after restarting PulseAudio, you should be able to access it as any user (in the audio group).
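
If the clients of the other user do not find the server by themselves, they can be pointed at it. A minimal sketch, assuming the TCP module listens on its default port 4713 and that pactl is installed; PULSE_SERVER is the standard PulseAudio client variable, the rest is only an illustration:

```shell
# Run as the other user: tell PulseAudio clients to use the TCP socket
# opened on the loopback interface instead of spawning their own daemon.
export PULSE_SERVER=tcp:127.0.0.1:4713

# Quick check, only if pactl is available on this machine.
if command -v pactl >/dev/null 2>&1; then
    pactl info || echo "could not reach PulseAudio at $PULSE_SERVER" >&2
fi
```

Putting the export in that user's shell profile would make it permanent.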

Update: some comments on reddit made me think there has been a misunderstanding on the scope of this post. It is not meant to describe the inner workings of audio on common GNU/Linux systems with PulseAudio. The following does, and almost perfectly explains why I did not bother to get specific on the topic:

1000px-Pulseaudio-diagram.svg.png

Improving Qualys SSL server test results regarding weak Diffie-Hellman and Logjam attack

Follow-up to the earlier Improving Qualys SSL server test results regarding Poodle attack and SHA1: the following should secure the servers I use (openssh/nginx/exim/dovecot on Debian/Devuan) against the Logjam attack on the TLS protocol, tied to weak Diffie-Hellman.

OpenSSH shell server

Run the following and look for the line KEX algorithms. It is fine unless diffie-hellman-group1-sha1 shows up.

ssh -vvv serverhostname

Debian default is ok.
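
Reading the -vvv output by eye works, but the client prints its own list too, which is easy to mix up with the server's. A sketch that only keeps the server side, assuming a recent OpenSSH client that prints a "peer server KEXINIT proposal" line before the server's "KEX algorithms" line (older clients format this differently); the function names are mine:

```shell
# server_kex: read `ssh -vvv` debug output on stdin and print the key
# exchange algorithms offered by the server, one per line.
server_kex() {
    awk '
        /peer server KEXINIT proposal/ { peer = 1; next }
        peer && /KEX algorithms:/ {
            sub(/.*KEX algorithms: */, "")
            n = split($0, algo, ",")
            for (i = 1; i <= n; i++) print algo[i]
            exit
        }
    '
}

# has_weak_kex: succeed if the server offers diffie-hellman-group1-sha1
has_weak_kex() {
    server_kex | grep -qx 'diffie-hellman-group1-sha1'
}

# Usage (debug output goes to stderr, hence 2>&1):
#   ssh -vvv serverhostname exit 2>&1 | has_weak_kex && echo "weak KEX offered"
```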

Nginx HTTPs server

cd /etc/ssl
openssl dhparam -out dhparams.pem 2048

Edit /etc/nginx/nginx.conf:

##
# SSL Settings
##

ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';
ssl_prefer_server_ciphers on;
ssl_dhparam /etc/ssl/dhparams.pem;

Restart:

invoke-rc.d nginx restart
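
Once restarted, you may want to see which Diffie-Hellman size the server really hands out. A sketch, assuming an OpenSSL s_client recent enough to print a "Server Temp Key" line; the dh_bits name is mine and HOST is a placeholder:

```shell
# dh_bits: read `openssl s_client` output on stdin and print the size in
# bits of the ephemeral DH key used by the server (nothing if the key
# exchange was not plain DH, for instance ECDH).
dh_bits() {
    sed -n 's/^Server Temp Key: DH, \([0-9][0-9]*\) bits.*/\1/p'
}

# Usage, forcing a DHE cipher suite over TLS 1.2:
#   openssl s_client -connect HOST:443 -tls1_2 -cipher EDH </dev/null 2>/dev/null | dh_bits
```

With the dhparams.pem generated above, this should print 2048. Note that a recent client may refuse to complete the handshake at all if the server still offers a very small group.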

Dovecot IMAPs server

Edit /etc/dovecot/conf.d/10-ssl.conf:

# How often to regenerate the SSL parameters file. Generation is quite CPU      
# intensive operation. The value is in hours, 0 disables regeneration           
# entirely.                                                                     
ssl_parameters_regenerate = 168h
ssl_dh_parameters_length = 2048

# SSL protocols to use                                                          
ssl_protocols = !SSLv2 !SSLv3

# SSL ciphers to use                                                            
ssl_cipher_list = ALL:!LOW:!SSLv2:!EXP:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA

Restart:

invoke-rc.d dovecot restart

Exim SMTPs server

In my case, the relevant file /etc/exim4/conf.d/main/00_stalag13-config_0ssl is managed by my -exim package.

# deactivate outdated SSLv3 (compiled with TLS)
# deactivate weak diffie-hellman
tls_require_ciphers = NORMAL:!DHE-RSA:!DHE-DSS:!DHE-PSK:!ANON-DH:!MD5:!SRP:!PSK:!VERS-SSL3.0

Restart:

invoke-rc.d exim4 restart

This should give an A rating as of today, except possibly regarding a self-signed certificate. There is a convenient tool to check Logjam vulnerability at keycdn.com.

Using a fast and reliable, still not obsolete, desktop environment with Fluxbox

Something like 6 years ago, I already described the desktop environments I used over the years. I have been, since then, decently satisfied with KDE/Plasma/however you name it. Mostly because kmail works properly with IMAPS and handles a CardDAV cloud server out of the box, because Dolphin is the best file browser (immediate filtering and grouping files and directories by day are the features I enjoy most, and no other file browser I know got them right) and because the rest (akregator, korganizer with CalDAV) is ok.

But systemd arrived. I was curious at first and encouraged, on this very blog, people to give it a try. But soon enough I found out I was faring better without it, reaching the conclusion that “Point is with systemd, I’m able to do less and it takes me more time”. I moved away from systemd and then, as a consequence, from Debian. And I found out that KDE was getting increasingly dependent on systemd, with bug reports about regressions in their software being closed with the advice to use systemd. It led me, in 2015, to ask on /. Will You Be Able To Run a Modern Desktop Environment In 2016 Without Systemd? as follows:

Early this year, David Edmundson from KDE, concluded that “In many cases [systemd] allows us to throw away large amounts of code whilst at the same time providing a better user experience. Adding it [systemd] as an optional extra defeats the main benefit“. A perfectly sensible explanation. But, then, one might wonder to which point KDE would remain usable without systemd?

Recently, on one Devuan box, I noticed that KDE power management (Powerdevil) no longer supported suspend and hibernate. Since pm-utils was still there, for a while, I resorted to call pm-suspend directly, hoping it would get fixed at some point. But it did not. So I wrote a report myself. I was not expecting much. But neither was I expecting it to be immediately marked as RESOLVED and DOWNSTREAM, with a comment accusing the “Debian fork” I’m using to “ripe out” systemd without “coming with any of the supported solutions Plasma provides“. I searched beforehand about the issue so I knew that the problem also occurred on some other Debian-based systems and that the bug seemed entirely tied to upower, an upstream software used by Powerdevil. So if anything, at least this bug should have been marked as UPSTREAM.

While no one dares (yet) to claim to write software only for systemd based operating system, it is obvious that it is now getting quite hard to get support otherwise. At the same time, bricks that worked for years without now just get ruined, since, as pointed out by Edmunson, adding systemd as “optional extra defeats its main benefit”. So, is it likely that we’ll still have in 2016 a modern desktop environment, without recent regressions, running without systemd?

I replied once to comments on this article, for instance (not that I like to quote myself, but I’d rather avoid repeating myself):

Yeah and no. As pointed out in the article, the culprit is upower. But upower is mandatory for KDE power management. So it does not really matter whether it is Powerdevil that requires systemd or upower. ConsoleKit2 recently gained support? Was ConsoleKit2 actually been packaged? Does upower supporting ConsoleKit2 been packaged? If not, user experience wise, that is not palatable. And moreover, what to expect from upower? Did they not purposefully removed pm-utils support, that worked until then, in favor of systemd? Why removing support for a working solution (pm-utils) and, later, much later, adding support for some ConsoleKit2? What is the exact plan of ConsoleKit2? Providing some systemd-like interface without being systemd? Is that what ConsoleKit2 offers that pm-utils could not? If so, how long will it work, to attempt to write a parallel to systemd, in order to make sure that all the software that in the past worked without systemd can now work with the systemd alternative? Just as a reminder, ConsoleKit2 exists “because there isn’t currently a standard for system actions like suspend/hibernate anymore. We use these features in Xfce and it would be nice to keep the session manager and power manager in sync (i.e. you inhibit something and the session manager doesn’t see it). Obviously there’s systembsd in the works, so this is a stop gap until that matures (however long that may be). But I’ll happily continue to maintain and support ConsoleKit2 as long as someone finds it useful”. https://erickoegel.wordpress.c… [wordpress.com] The acknowledged benefit of systemd, as pointed out by Edmunson (link in the article) was to drop code. If ConsoleKit2 and al needs to write code to compensate from all the dropped code, following systemd, that unlikely sustainable. The stop gap project won’t do.
And it is really the funny thing now with systemd: if you don’t want it, you need to write everything that it does because all the anterior/historical parts, good or bad, are getting deprecated and removed. So in order not to use systemd, you need to clone it. Bonkers. Hence the question: will KDE be still usable in 2016 without systemd.

Since then, I noticed a few other small issues which I did not bother to report: the answer would have been the same. So it is now nearly one year after my question on /. and the answer is grim. More KDE parts got broken for me (sound, etc).

So I went back to old answers: desktops I had tested previously. I found Fluxbox to be the easiest to set up in a way that suits my needs.

Along with fluxbox, you need tint2 and xcompmgr, all properly packaged in Devuan. Then it is just a matter of editing files in ~/.fluxbox (after starting it once):

~/.fluxbox/startup :

#!/bin/sh
#
# fluxbox startup-script:
#
# Lines starting with a '#' are ignored.

# background image
fbsetbg ~/.fluxbox/backgrounds/selje.png &
# modern panel
tint2 &
# desktop transparency
xcompmgr -c &
# sysinfo panel
conky &
# required for dolphin to show up cleanly
export XDG_CURRENT_DESKTOP=kde
# desktop pager
fbpager -w &
# XMPP client
pidgin &
# cloud sync client
owncloud &
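# gpg/ssh agents (no trailing &: eval must run in this shell,
# otherwise the agent variables are set in a subshell and lost)
eval "$(gpg-agent --daemon)"
eval "$(ssh-agent)"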
# screen temperature
redshift &

[...]

~/.fluxbox/keys :

Control Mod1 A :Exec urxvtc
Control Mod1 I :Exec firefox
Control Mod1 M :Exec kmail
Control Mod1 E :Exec emacs
Control Mod1 D :Exec XDG_CURRENT_DESKTOP=kde dolphin

[...]

# if these don't work, use xev to find out your real keycodes
XF86AudioRaiseVolume :Exec amixer sset Master,0 2%+
XF86AudioLowerVolume :Exec amixer sset Master,0 2%-
XF86AudioMute :Exec amixer sset Master,0 toggle
#XF86AudioPlay
XF86AudioPrev :Exec /usr/local/bin/switch-sound
XF86AudioNext :Exec /usr/local/bin/switch-redshift

[...]

# sleep fluxbox CTRL-ALT pause
Control Mod1 127 :Exec sudo hibernate-ram

[...]

~/.fluxbox/init (just to turn off the fluxbox toolbar, since we use tint2 instead):

session.screen0.toolbar.visible: false

~/.conkyrc (needs to be edited, for instance the eth device names, etc):

conky.config = {
 alignment = 'bottom_left',
 background = true,
 border_width = 1,
 cpu_avg_samples = 2,
 default_color = 'white',
 default_outline_color = 'white',
 default_shade_color = 'white',
 draw_borders = false,
 draw_graph_borders = true,
 draw_outline = false,
 draw_shades = false,
 double_buffer = true,
 use_xft = true,
 font = 'Oxygen Mono:size=10',
 gap_x = 25,
 gap_y = 25,
 minimum_height = 5,
 minimum_width = 5,
 net_avg_samples = 2,
 no_buffers = true,
 out_to_console = false,
 out_to_stderr = false,
 extra_newline = false,
 own_window = true,
 own_window_class = 'Conky',
 own_window_type = 'override',
 own_window_colour = '#3d3d3d',
 stippled_borders = 0,
 update_interval = 2.5,
 uppercase = false,
 use_spacer = 'none',
 show_graph_scale = false,
 show_graph_range = false
}

conky.text = [[
# in red if sound off
${if_match "[on]" == "${exec amixer get Master | egrep -o '\[on\]' | tail -1}"}${color #4d4d4d}${else}${color Dark Salmon}${endif}
# assume left/right channels have same volume level
${execbar amixer get Master | egrep -o '[0-9]+%'| sed s/\%// | tail -1}
############
#${color #4d4d4d}$hr
${color grey}↑${color #4d4d4d}${upspeedgraph eth2 25,140} ${color grey}↓${color #4d4d4d}${downspeedgraph eth2 25,140}
###########
#${color #4d4d4d}$hr
#${color grey}CPU: ${color white}${i2c isa-0228 temp 2}°C$color - MB: ${color white}${i2c 9191-0290 temp 1}°C
###########
#${color #4d4d4d}$hr
${color grey}Name PID CPU% MEM%
${color lightgrey} ${top name 1} ${top pid 1} ${top cpu 1} ${top mem 1}
${color lightgrey} ${top name 2} ${top pid 2} ${top cpu 2} ${top mem 2}
${color lightgrey} ${top name 3} ${top pid 3} ${top cpu 3} ${top mem 3}
${color lightgrey} ${top name 4} ${top pid 4} ${top cpu 4} ${top mem 4}
${color lightgrey} ${top name 5} ${top pid 5} ${top cpu 5} ${top mem 5}
${color lightgrey} ${top name 6} ${top pid 6} ${top cpu 6} ${top mem 6}
${color lightgrey} ${top name 7} ${top pid 7} ${top cpu 7} ${top mem 7}
]]

shot.png

With this setup, the only thing I actually miss is an icon-tasklist.

Avoiding GPG issues while submitting to popularity-contest on Devuan

For some reason, on Devuan, popularity-contest submission fails with:

gpg: 4383FF7B81EEE66F: skipped: public key not found
gpg: /var/log/popularity-contest.new: encryption failed: public key not found

The Debian Popularity Contest being described as an attempt to map the usage of Debian packages, I think it useful that it also gets stats from disgruntled Debian users forced to use a fork of the same general scope.

I do not think the data transmitted in this context is really sensitive. So the simplest hack is just to turn off encryption by adding to /etc/popularity-contest.conf:

ENCRYPT="no"
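
If you have several boxes to fix, the edit can be scripted. A minimal sketch, assuming a helper I call disable_popcon_encryption (my name, not something shipped with popularity-contest): it replaces an existing ENCRYPT line or appends one, so it is safe to run twice.

```shell
# disable_popcon_encryption [FILE]: make sure FILE holds exactly one
# ENCRYPT line, set to "no". FILE defaults to the system-wide config.
disable_popcon_encryption() {
    conf=${1:-/etc/popularity-contest.conf}
    if grep -q '^ENCRYPT=' "$conf" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        # rewrite through a temporary file, then copy the content back
        # so that ownership and permissions of FILE are left untouched
        sed 's/^ENCRYPT=.*/ENCRYPT="no"/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
        status=$?
        rm -f "$tmp"
        return $status
    else
        printf 'ENCRYPT="no"\n' >> "$conf"
    fi
}

# Usage, as root:  disable_popcon_encryption
```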