Sharing graphs of multiple Munin (master) instances

Munin is a convenient monitoring tool. Even if it gets old, it is easy to set up and agrement with custom scripts.

It works with the notion of having a master munin process that will grab data from nodes (a device within the network), store it in Round-robin databases (RRD) and process the data  to generate static images and HTML pages. These sequences are split in several scripts: munin-update, munin-limits, munin-graph, munin-html.

It’s fine -overkill?- for a small local network, despite the fact RRD is a bit I/O consuming to the point it may be require to use a caching daemon like rrdcached.

It’s a different story if you want to monitor several small networks that are connected through the internet at once. Why would you? First because it might be convenient to get graphs from different networks side by side. Also because if one network disappear from the internet, data from munin might actually be meaningful, provided you can still access it.

muninex

Problem is munin updates are synchronous: any disconnect between the two would cause the data to be inconsistent. It leads  to many issues that munin-async can help with. But even though you might be able to use munin-async, one of your servers will lack a munin master: the setup will works only when both are up.

So I’m actually much more interested in having a master munin process, for each network.

How to achieve that? It is not an option to share RRD via NFS over the web. I’m also not fan of the notion of having both master munin process read through all RRD and generate graphs in parallel, re-generating exactly the same data with no value added.

I went for an alternative approach with a modified version of the munin-mergedb.pl script. We do not merge RRD trees. We simply synchronize the db files to merge and the generated graphs. So if there are graphs from another munin master process to include in the HTML output, they’ll be there. But munin master process will go undisturbed by any other process unavailability and wont have more RRD to process, more graphs to produce.

Graphs and db files replication:

On both (master munin process) hosts, you need an user dedicated to replication: here.

adduser SYNCUSER munin

This user need ssh access from one host to the other (private/public key sharing, whatever).

Directories setup:

mkdir -p /var/lib/munin-mergedb/
chown munin:munin -R /var/lib/munin-mergedb/
# the +s is very important so directory group ownership is preserved
chmod g+rws -R /var/lib/munin-mergedb/
chmod g+rws /var/lib/munin/
chmod g+rws -R /var/www/html/munin/

On one host (the one allowed to connect through ssh), synchronized two way with unison HTML files:

su - SYNCUSER --shell=/bin/bash

DISTANT_HOST=DISTANTHOST
DISTANT_PORT=22
LOCAL_HTML=/var/www/html/munin/DOMAIN
DISTANT_HTML=/var/www/html/munin/DOMAIN

LOCAL_DB=/var/lib/munin
DISTANT_LOCAL_DB=/var/lib/munin-mergedb/THISHOST
LOCAL_DISTANT_DB=/var/lib/munin-mergedb/DISTANTHOST


# step one, get directories
unison -batch -auto -ignore="Name *.html" -ignore="Name *.png" "$LOCAL_HTML" "ssh://$DISTANT_HOST:$DISTANT_PORT/$DISTANT_HTML"
# step two, get directories img content 
cd "$LOCAL_HTML" && for DIR in *; do [ -d "$DIR" ] && unison -batch -auto -ignore="Name *.html" "$LOCAL_HTML/$DIR" "ssh://$DISTANT_HOST:$DISTANT_PORT/$DISTANT_HTML/$DIR"; done

On one host (the same), synchronized one way with rsync database files:

LOCAL_DB=/var/lib/munin
DISTANT_LOCAL_DB=/var/lib/munin-mergedb/THISHOST
LOCAL_DISTANT_DB=/var/lib/munin-mergedb/DISTANTHOST

# push our db (one way action, easier with rsync)
rsync -a --include='datafile*' --include='limits*' --exclude='*' -e "ssh -p $DISTANT_PORT" "$LOCAL_DB/" "$DISTANT_HOST:$DISTANT_LOCAL_DB/"
# get theirs (one way action, easier with rsync)
rsync -a --include='datafile*' --include='limits*' --exclude='*' -e "ssh -p $DISTANT_PORT" "$DISTANT_HOST:$LOCAL_DB/" "$LOCAL_DISTANT_DB/"

If it works fine, set up /etc/cron.d/munin-sync:

# supposed to assist munin-mergedb.pl

DISTANT_HOST=DISTANTHOST
DISTANT_PORT=22

LOCAL_HTML=/var/www/html/munin/DOMAIN
DISTANT_HTML=/var/www/html/munin/DOMAIN

LOCAL_DB=/var/lib/munin
DISTANT_LOCAL_DB=/var/lib/munin-mergedb/THISHOST
LOCAL_DISTANT_DB=/var/lib/munin-mergedb/DISTANTHOST

# m h dom mon dow user command
# every 5 hour update dir list
01 */5 * * *  SYNCUSER unison -batch -auto -silent -log=false -ignore="Name *.html" -ignore="Name *.png" "$LOCAL_HTML/$DIR" "ssh://$DISTANT_HOST:$DISTANT_PORT/$DISTANT_HTML/$DIR" 2>/dev/null

#  update content twice per hour
*/28 * * * *  SYNCUSER cd "$LOCAL_HTML" && for DIR in *; do [ -d "$DIR" ] && unison -batch -auto -silent -log=false -ignore="Name *.html" "$LOCAL_HTML/$DIR" "ssh://$DISTANT_HOST:$DISTANT_PORT/$DISTANT_HTML/$DIR" 2>/dev/null; done && rsync -a --include='datafile*' --include='limits*' --exclude='*' -e "ssh -p $DISTANT_PORT" "$LOCAL_DB/" "$DISTANT_HOST:$DISTANT_LOCAL_DB/" 2>/dev/null && rsync -a --include='datafile*' --include='limits*' --exclude='*' -e "ssh -p $DISTANT_PORT" "$DISTANT_HOST:$LOCAL_DB/" "$LOCAL_DISTANT_DB/"2>/dev/null

Updated scripts:

Once data there, you will need munin-mergedb script to handle them, use a munin-cron script like my munin-cron-plus.pl instead of munin-cron so it actually calls munin-mergedb.pl. Plus you’ll need a fixed version of munin-graph so –host arguments are not blattlanly ignored (lacking RRD, it would fail to actually write graph for distant munin master process, but it would nonetheless delete existing graphs).

(Where these files go depends on your munin installation packaging. I have the munin processes in /usr/local/share/munin  and munin-cron-plus.pl in /usr/local/bin – it reflects the fact that original similar files are either in /usr/share/munin or /usr/bin. Beware, if you change the name of any munin process, update log rotation files otherwise you may easily fill up a disk drive, since it is kind of noisy especially when issues arise)

As conveniency, you can download these with my -utils-munin debian/devuan packages:

wget apt.rien.pl/stalag13-keyring.deb
dpkg -i apt.rien.pl/stalag13-keyring.deb
apt-get update
apt-get install stalag13-utils-munin

Once everything set up, you can test/debug it by typing:

su - munin --shell=/bin/bash

/usr/local/bin/munin-cron-plus.pl

What next?

Actually I’d welcome improvements munin-cron-plus.pl since it extract –host information in the most barbaric way. I am sure it can be done cleanly using Munin::Master::Config/else.

Then I’d welcome any insight about why munin-graph’s –host option does not works the way I’d like it. Maybe I misunderstand it’s exact purpose. The help reads:

 --host  Limit graphed hosts to . Multiple --host options
               may be supplied.

To me, it really means that it should not do anything at all to any files of hosts excluded this way. If it meant something else, maybe this should be explained.