Setting up LXC containers with mapped GID/UID

The output of ps aux on an LXC host is quite messy! But that can be improved, with the added benefit of having each LXC container use its own user namespace: for instance « having a process is unprivileged for operations outside the user namespace but with root privileges inside the namespace ». Easier to check on and likely to be more secure.

A reply to the question « what is an unprivileged LXC container » provides a working howto. The following is a proposal to implement it even more easily.

For each LXC container, you need to pick a UID/GID range. For instance, for container test1, let’s pick 100000 65536, that is 65536 IDs starting at 100000. It means that root in test1 will actually be 100000 on the main host, user 1001 in test1 will be 101001 on the main host, and so on.
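The arithmetic is simple enough to sketch in shell. The function name host_id is only illustrative, it is not part of LXC:

```shell
# host_id BASE ID: print the host-side UID/GID for a container-side ID,
# assuming a contiguous map that starts at BASE (hypothetical helper).
host_id() {
    echo $(( $1 + $2 ))
}

host_id 100000 0      # root in test1
host_id 100000 1001   # user 1001 in test1
```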

So you must add the map on the main host:

 usermod --add-subuids 100000-165535 root
 usermod --add-subgids 100000-165535 root

Then you must edit the relevant LXC container configuration file, whose location varies according to your lxc.lxcpath.

# require userns.conf associated to the distribution used
lxc.include = /usr/share/lxc/config/debian.userns.conf

# specific user map
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

Then you need to update file ownership according to the new mapping. The original poster proposed a few shell commands, but those would only be enough to start the container. Files within the container would not get appropriate ownership: most likely, files that belong to root/0 on the host would show up as owned by nobody/65534. For proper ownership by root/0 within the LXC container, they need to belong to 100000 on the host.

Here comes my script: it’ll take as arguments your LXC container name (or alternatively a path, useful for mounts that reside outside of it) and the value to increase IDs by. For the first container, it’ll be 100000:

# shutting down the container before touching it
lxc-stop --name test1 

# obtain the script
chmod +x ~/

# chown files +100000
~/ --lxc=test1 --increase=100000

# start the container
lxc-start --name test1

That’s all. Obviously, you should check that every daemon is still functioning properly. If not, either a file ownership change was missed (happened once to a container with transmission-daemon) or a file mode was not properly set beforehand (happened once to a container with exim4 that was not setuid: it led to failure with procmail_pipe).
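When a daemon misbehaves, one way to look for missed files is to list everything whose owner or group is still below the container's base ID. A minimal sketch, relying on the "less than N" form of find's -uid and -gid tests; the function name is made up for illustration:

```shell
# unshifted_files DIR BASE: list files under DIR whose UID or GID is still
# below BASE, i.e. files that the chown pass missed (hypothetical helper).
unshifted_files() {
    find "$1" \( -uid "-$2" -o -gid "-$2" \) -print
}

# e.g. for test1, once mapped to 100000:
# unshifted_files "$(lxc-config lxc.lxcpath)/test1/rootfs" 100000
```

An empty output means every file under the directory is already within or above the mapped range.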

Next container test2? Edit `lxc-config lxc.lxcpath`/test2/config:

# require userns.conf associated to the distribution used
lxc.include = /usr/share/lxc/config/debian.userns.conf

# specific user map
lxc.id_map = u 0 200000 65536
lxc.id_map = g 0 200000 65536

Then run:

lxc-stop --name test2
usermod --add-subuids 200000-265535 root
usermod --add-subgids 200000-265535 root
~/ --lxc=test2 --increase=200000
lxc-start  --name test2
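The end of each range is always the base plus 65536 minus 1, which is easy to get wrong by hand. A small sketch that prints the matching commands and config lines for a given base; the function name is made up for illustration and it only prints, it changes nothing:

```shell
# print_userns_setup BASE [COUNT]: print the usermod commands and the
# lxc.id_map lines for one container (hypothetical helper, prints only).
print_userns_setup() {
    base=$1
    count=${2:-65536}
    last=$(( base + count - 1 ))
    echo "usermod --add-subuids $base-$last root"
    echo "usermod --add-subgids $base-$last root"
    echo "lxc.id_map = u 0 $base $count"
    echo "lxc.id_map = g 0 $base $count"
}

print_userns_setup 200000
```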

I tested the script on 16 LXC containers with no problem so far.

If you need to deal with extra mounted directories (lxc.mount.entry=…), use the --path option.

If you need to deal with a container that was already mapped (for instance already 100000 65536 but you would like it to be 300000 65536), you’ll need to raise the --limit, which by default equals the increase value: that would be --increase=200000 --limit=300000. This limit exists so you can re-run the script on the same container with no risk of having files getting out of range.
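The rule behind the limit can be sketched as follows. This mirrors the behaviour described here, not necessarily the script line by line, and shifted_id is an illustrative name:

```shell
# shifted_id ID INCREASE [LIMIT]: print the ID after one run of the script.
# IDs at or above LIMIT are treated as already shifted and left alone.
shifted_id() {
    id=$1
    increase=$2
    limit=${3:-$increase}
    if [ "$id" -lt "$limit" ]; then
        echo $(( id + increase ))
    else
        echo "$id"
    fi
}

shifted_id 0 100000                 # first run: shifted to 100000
shifted_id 100000 100000            # second run, default limit: unchanged
shifted_id 100000 200000 300000     # remap from 100000 to 300000
```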

For the record, here is the script as it stands today (but it is always best to get the latest version from gitlab, because I won’t carry bugfixes or improvements over to this page):


#!/usr/bin/perl
use strict;
use File::Find;
use Getopt::Long;

### options
my ($getopt, $help, $path, $lxc, $increase, $limit);
eval {
    $getopt = GetOptions("help" => \$help,
			 "lxc=s" => \$lxc,
	                 "path=s" => \$path,
	                 "increase=i" => \$increase,
	                 "limit=i" => \$limit);
};

if ($help or
    !$increase or
    (!$path and !$lxc)) {
    # increase is mandatory
    # either path or lxc also
    # print help if missing
        print STDERR "
  Usage: $0 [OPTIONS] --lxc=name --increase=100000
         $0 [OPTIONS] --path=/directory/ --increase=100000

Will increase all files UID/GID by the value set.

      --lxc=name    LXC container name, will be used to determine path
      --path=/dir   No LXC assumption, just work on a given path
      --increase=n  How much to increment
      --limit=n     Increase limit, by default equal to increase

Useful for instance when you add to a LXC container such config:
  lxc.id_map = u 0 100000 65536
  lxc.id_map = g 0 100000 65536

And the host system having the relevant range set: 
  usermod --add-subuids 100000-165535 root
  usermod --add-subgids 100000-165535 root

It would update UID/GID within rootfs to match the proper range. Note that
additional configured mount must also be updated accordingly, using --path 
for instance.

By default, limit is set to increase value so you can run it several times on 
the same container, the increase will be effective only once. You can set the
limit to something else, for instance if you want to increase by 100000 a 
container already within the 100000-165535 range, you would have to 
use --increase=100000 --limit=200000.

This script is primitive: it should work in most cases, but if some service
fails to work after the LXC container restart, it is probably because one or
several files were missed.

Author: yeupou
";
    exit;
}

# limit set to increase by default
$limit = $increase unless $limit;

# if lxc set, use it to define path
if ($lxc) {
    my $lxcpath = `lxc-config lxc.lxcpath`;
    # drop the trailing newline returned by the command
    chomp($lxcpath);
    $path = "$lxcpath/$lxc/rootfs";
}

# in any case, path must be given and found
die "path $path: not found, exit" unless -e $path;
print "path: $path\n";

### run
find(\&wanted, $path);

# if lxc, check main container config
if ($lxc) {
    my $lxcpath = `lxc-config lxc.lxcpath`;
    chomp($lxcpath);
    # directory for the container
    chown(0,0, "$lxcpath/$lxc");
    chmod(0775, "$lxcpath/$lxc");
    # container config
    chown(0,0, "$lxcpath/$lxc/config");
    chmod(0644, "$lxcpath/$lxc/config");
    # container rootfs - chown will be done during the wanted()
    chmod(0775, "$lxcpath/$lxc/rootfs");
}


sub wanted {
    print $File::Find::name;
    # find out current UID/GID
    my $originaluid = (lstat $File::Find::name)[4];
    my $newuid = $originaluid;
    my $originalgid = (lstat $File::Find::name)[5];
    my $newgid = $originalgid;
    # increment but only if we are below the limit (otherwise already done)
    $newuid += $increase if ($originaluid < $limit);
    $newgid += $increase if ($originalgid < $limit);

    # update if there is at least one change
    if ($originaluid != $newuid or
	$originalgid != $newgid) {
	chown($newuid, $newgid, $File::Find::name);
	print " set to UID:$newuid GID:$newgid\n";
    } else {
	print " kept to UID:$originaluid GID:$originalgid\n";
    }
}


Using PowerDNS (server and recursor) instead of Bind9, along with domain name spoofing/caching

Recently, I wrote an article about how to use Bind9 with LXC containers, a setup including domain name spoofing/caching. This setup was using Bind9 views so that only the caching LXC container would get the real IP for cached domains, so nginx, on this system, could mirror accordingly.

Then Debian 9.0 was released and I found out that two views were no longer allowed to share write access to the same zone definition file.

You would then get an error like “writeable file ‘/etc/bind/…’: already in use: /etc/bind/…”.

As sacrifial-spam-address wrote:

At first, I thought this was a bug (how can a config file line conflict
with itself?), then I realized that the conflict was between the
two views.

There does not appear to be any simple workaround. The best solution
appears to be to use the new BIND 9.10 “in-view” feature, which allows a
zone in one view to be a reference to the same zone in a different view.
When this is done, both views may share the same cache file.

The down side is that this violates one of the important principles of
programming: only specify something in one place. Instead, I have to have
a “master” definition and several “in-view” declarations referencing
the master.

I wish BIND would either deal with the problem after noticing it (by
automatically doing the equivalent of the in-view), or provide a way to
import every zone in a view, avoiding the need for a long list of in-view

Then I fixed my setup to work with in-view, updating the article already linked. But the experience was clearly unsatisfying, adding one more layer of complexity to something already quite dense.

Plus I got some errors with this new setup: it seemed that in-view, at least the way I configured it, caused the different views to behave as if they shared the same cache. Say, after Bind9 startup, I pinged a cached domain from any LXC container but the cache one: I would get the IP of the LXC cache container, as it should be. But then, if I pinged the same domain from the LXC cache container, I would still get its own IP as answer, as if there were not two different views set up. Same with the opposite test: if the first ping was from within the LXC cache container, then from any other, I would get the result wanted only for the LXC cache container.

So it led me to the point where I had to better understand the in-view feature, which I did not want to use in the first place, in order to get it to behave like view did.

You got it: I found it much easier to use PowerDNS instead.

PowerDNS (pdns) is composed of an authoritative DNS server plus a DNS recursor. The first one I only need on this setup for the LAN domain name (quite used: LXC containers + connected devices). The recursor is doing some caching. And can be easily scripted.

apt-get install pdns-server pdns-recursor pdns-backend-sqlite3

Often, when both the name server and the recursor are installed on the same machine, people set up the name server to listen on port 53 on the network and to pass to the recursor, listening on another port, the requests it cannot handle (those it is not authoritative for and that need a recursor to be resolved).

Ok, why not. Except that I want specific answers to be given depending on the querier’s IP for domains outside of the LAN, which are handled by the recursor. That would not work if the recursor gets queries sent over the loopback device by the authoritative server.

Aside from that, just as a general principle, I prefer the notion of querying a recursor by default, which asks the local DNS server only when necessary, over the notion of asking a DNS server to handle queries that it is most of the time unlikely to have an authoritative answer for and that it will have to pass to a recursor.

So instead of the usually proposed:

  • client ->  local DNS server  -> DNS recursor if non authoritative  -> distant authoritative DNS server

It’ll be:

  • client -> DNS recursor -> authoritative DNS server (local or distant).


authoritative PowerDNS server

First we deal with the DNS server that will serve YOURDOMAIN.LAN. The sqlite3 backend should be installed and set up (or another backend of your liking).

By default, the sqlite3  database is in /var/lib/powerdns/pdns.sqlite3

Easiest way is to convert Bind9 zone config to set it up:

zone2sql --named-conf=/etc/bind/named.conf.local --gsqlite | sqlite3 /var/lib/powerdns/pdns.sqlite3

That’s all!

As an alternative, you can also create zones from scratch with pdnsutil:

cd /var/lib/powerdns
sqlite3 pdns.sqlite3 < /usr/share/doc/pdns-backend-sqlite3/schema.sqlite3.sql
chown pdns:pdns pdns.sqlite3
# main zone
pdnsutil create-zone YOURDOMAIN.LAN ns1.YOURDOMAIN.LAN
pdnsutil add-record YOURDOMAIN.LAN main A 192.168.1.1
pdnsutil add-record YOURDOMAIN.LAN @ MX "10 mx.YOURDOMAIN.LAN"
# first reverse zone 192.168.1
pdnsutil create-zone 1.168.192.in-addr.arpa ns1.YOURDOMAIN.LAN
pdnsutil add-record 1.168.192.in-addr.arpa 1 PTR main.YOURDOMAIN.LAN
# to be continued


In our previous setup, we had DNS updates automated by ISC DHCPd: we want any new host on the local network to be given an IP. Nothing changed regarding ISC DHCPd, so read the relevant ISC DHCPd setup part. For the record, to generate the relevant update key:

cd /etc/dhcp
dnssec-keygen -a hmac-md5 -b 256 -n USER ddns

The secret will be a string like XXXXX== within the ddns.key generated file.

Obviously, powerdns needs this data. You need to register the key and its secret, then grant the key rights on each zone (YOURDOMAIN.LAN plus the reverse zones for the IP ranges, below for 192.168.1 and 10.0.0):

sqlite3 /var/lib/powerdns/pdns.sqlite3

# XXXXX== = the secret string
insert into tsigkeys (name, algorithm, secret) values ('ddns', 'hmac-md5','XXXXX==');

# find out ids of zones
select id from domains where name='YOURDOMAIN.LAN';
select id from domains where name='1.168.192.in-addr.arpa';
select id from domains where name='0.0.10.in-addr.arpa';

 # authorize the key for each
 insert into domainmetadata (domain_id, kind, content) values (1, 'TSIG-ALLOW-DNSUPDATE', 'ddns');
 insert into domainmetadata (domain_id, kind, content) values (2, 'TSIG-ALLOW-DNSUPDATE', 'ddns');
 insert into domainmetadata (domain_id, kind, content) values (3, 'TSIG-ALLOW-DNSUPDATE', 'ddns');
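The reverse zones for the 192.168.1 and 10.0.0 ranges are named by reversing the octets of the prefix and appending in-addr.arpa. A small sketch of that rule, with a function name made up for illustration:

```shell
# reverse_zone PREFIX: print the in-addr.arpa zone name for a dotted IPv4
# network prefix such as 192.168.1 (hypothetical helper).
reverse_zone() {
    echo "$1" | awk -F. '{
        out = ""
        for (i = NF; i >= 1; i--) out = out $i "."
        print out "in-addr.arpa"
    }'
}

reverse_zone 192.168.1   # 1.168.192.in-addr.arpa
reverse_zone 10.0.0      # 0.0.10.in-addr.arpa
```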

Finally, you need to configure powerdns itself. You can directly edit /etc/powerdns/pdns.conf but I find it easier to create a specific /etc/powerdns/pdns.d/00-pdns.conf so you do not edit the default example:

# base

# dynamic updates

Note that it is an IPv4-only setup. It’ll listen only on the loopback interface, since no one is supposed to contact it directly besides the recursor sitting on the same loopback.

You can restart the daemon (rc-service pdns restart  with OpenRC, else depending on your init).

PowerDNS recursor :

It is quite straightforward to configure in /etc/powerdns/recursor.conf. This one will listen on LAN addresses (not the loopback):

# restrict netmask allowed to query 


# for local domain, via loopback device, forward queries to the PowerDNS local authoritative server

# list of IP to listen to

# that is how we will spoof/cache

So all the magic will be done in the /etc/powerdns/redirect.lua script, where the cache LXC container IP is hardcoded (that could be changed in a future version if necessary):

-- (requires pdns-recursor 4 at least)
-- cached servers
cached = newDS()
cachedest = ""

-- ads kill list
ads = newDS()
adsdest = ""

-- hand maintained black list
blacklisted = newDS()
blacklistdest = ""

function preresolve(dq)
   -- DEBUG
   --pdnslog("Got question for "..dq.qname:toString().." from "..dq.remoteaddr:toString().." to "..dq.localaddr:toString(), pdns.loglevels.Error)
   -- handmade domains blacklist
   if(blacklisted:check(dq.qname)) then
      if(dq.qtype == pdns.A) then
	 dq:addAnswer(dq.qtype, blacklistdest)
	 return true
      end
   end
   -- spam/ads domains
   if(ads:check(dq.qname)) then
      if(dq.qtype == pdns.A) then
	 dq:addAnswer(dq.qtype, adsdest)
	 return true
      end
   end
   -- cached domains
   if(not cached:check(dq.qname)) then
      -- not cached
      return false
   else
      -- cached: variable answer
      dq.variable = true
      -- request coming from the cache itself
      if(dq.remoteaddr:equal(newCA(cachedest))) then
	 return false
      end
      --  redirect to the cache
      if(dq.qtype == pdns.A) then
	 dq:addAnswer(dq.qtype, cachedest)
      end
   end
   return true
end

-- load the three lists (assuming they sit next to this script)
cached:add(dofile("/etc/powerdns/redirect-cached.lua"))
ads:add(dofile("/etc/powerdns/redirect-ads.lua"))
blacklisted:add(dofile("/etc/powerdns/redirect-blacklisted.lua"))

This script relies on three files to do its magic.

redirect-blacklisted.lua is a hand-made blacklist; the default content is:


redirect-cached.lua is to be generated by a script that you should edit before running, to list which domains you want to cache:



#!/bin/sh
# output file, loaded by redirect.lua
out=redirect-cached.lua
# space-separated list of domains to cache
DOMAINS=""

# comment this if you dont cache steam
# (note: nginx cache must also cover this)
# comment this if you dont cache debian
# (useful read: )
# comment this if you dont cache devuan
# comment this if you dont cache ubuntu

echo "-- build by ${0}" > "$out"
echo "-- re-run it commenting relevant domains if you dont cache them all" >> "$out"
echo "return{" >> "$out"
for domain in $DOMAINS; do
    echo "\"$domain\"," >> "$out"
done
echo "}" >> "$out"


Finally, redirect-ads.lua is to be generated by a script that you put in a weekly cronjob (followed by a pdns-recursor restart):

#!/usr/bin/perl
use strict;
use Fcntl ':flock';

# disallow concurrent run
open(LOCK, "< $0") or die "Failed to ask lock. Exiting";
flock(LOCK, LOCK_EX | LOCK_NB) or die "Unable to lock. This daemon is already alive. Exiting";
open(OUT, "> redirect-ads.lua") or die "Cannot write redirect-ads.lua: $!";

# You can choose between wget or curl. Both rock!
# my $snagger = "curl -q";
my $snagger = "wget -q -O - ";

# List of URLs to find ad servers.
my @urls = (";hostformat=one-line;mimetype=plaintext");

print OUT "return{\n";
# Grab the list of domains and add them to the realm file
foreach my $url (@urls) {
    # Open the curl command
    open(CURL, "$snagger \"$url\" |") || die "Cannot execute $snagger: $!\n";

    printf OUT ("--- Added domains on %s --\n", scalar localtime);

    while (<CURL>) {
	# strip the newline so it does not end up inside a quoted domain
	chomp;
	next if /^#/;
	next if /^$/;
	foreach my $domain (split(",")) {
	    print OUT "\"$domain\",\n";
	}
    }
    close(CURL);
}

print OUT "}\n";
close(OUT);

So before starting the recursor, run both scripts to generate redirect-cached.lua and redirect-ads.lua.

Then, after restart, everything should be up and running, with no concerns about inconsistent answers. And, as you can see for yourself, Lua scripting is as easy as it is extensible.

Replacing sysv-rc by OpenRC on Debian

I won’t comment on systemd once again; I think my mind is set on the topic and the future will tell if I was right or wrong. Still, disliking the humongous scope self-assigned by systemd does not mean considering that sysv-rc cannot be improved, nor being stuck with sysv-rc for life.

The good thing is that OpenRC is still being developed (latest release: 7 days ago) and, being portable to the BSDs, is very unlikely to follow the systemd path. Switching Debian 8.8 from sysv-rc to OpenRC (assuming you already got rid of systemd) is just a matter of installing it:

apt-get install openrc

for file in /etc/rc0.d/K*; do s=`basename $(readlink "$file")` ; /etc/init.d/$s stop; done
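Before running the loop above, it can be worth listing which services it would stop. A minimal dry-run sketch using the same basename/readlink approach; the function name is made up for illustration:

```shell
# list_kill_services DIR: print the init script name behind every K* symlink
# in DIR (default /etc/rc0.d), without stopping anything.
list_kill_services() {
    dir=${1:-/etc/rc0.d}
    for file in "$dir"/K*; do
        # skip anything that is not a symlink (including an unmatched glob)
        [ -L "$file" ] || continue
        basename "$(readlink "$file")"
    done
}

list_kill_services
```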

Then after reboot, removing sysv-rc:

apt-get --purge remove sysv-rc

rm -rf /etc/rc*.d /etc/rc.local

You might then want to edit /etc/rc.conf (for instance, the rc_sys="lxc" option for LXC containers), check status with rc-status, and check services according to runlevels in /etc/runlevels.

Update: this also works with Debian 9.0, just released. But do not bother removing /etc/rc*.d: these directories now belong to the package init-system-helpers, which “contains helper tools that are necessary for switching between the various init systems that Debian contains (e. g. sysvinit or systemd)”, an essential package on which other packages (like cron) depend. I actually asked for init-system-helpers not to force people to have in /etc directories that make sense only for systemd or sysv-rc but, so far, got no real reply, only a WONTFIX closure. Apparently the fact that update-rc.d would require /etc/rc*.d is the main issue, while update-rc.d handles perfectly well the fact that /etc/runlevels may or may not exist.

For more details about OpenRC, I suggest reading Debian’s Debate initsystem openrc page, which lists pros and cons.