Pick a font able to properly render a string composed of Unicode characters with Perl

In the case of automated watermarking with randomly picked fonts within a Perl script, it is quite annoying to stumble on fonts missing many non-basic unicode characters (accents, etc). In French, you’ll likely miss the ê or ü or even é or à. In Polish, while the ł is often provided, you’ll like miss ź.

The Perl module Font::FreeType is quite convenient in this regard. The sample code here will try to find a font, within the @fonts list, able to render the $string.  It will pick the fonts randomly, one by one, and check every character of the string against the characters provided by the font. It will stop to pick the first one that actually can fully render the string:

use Font::FreeType;
use utf8; # must occur before any string definition!
use strict;

my @image_tags = "~ł ajàüd&é)=ù\$;«~źmn";
my @fonts = ("/usr/share/fonts/truetype/ttf-bitstream-vera/Vera.ttf", "/usr/share/fonts/truetype/zeppelin.ttf", "/usr/share/fonts/truetype/Barrio-Regular.ttf");
my %fonts_characters;
my $watermark_font;

# we want a random font: but we also want a font that can print every character
# (not obvious with utf8)
# loop until we find a suitable one (all chars are valid, so the chars counter reached 0) or,
# worse case scenario, until we checked them all (means more suitable fonts should be added)
my $chars_to_check = length("#".@image_tags[0]);
my $fonts_to_check = scalar(@fonts);
my %fonts_checked;
while ($chars_to_check > 0 and $fonts_to_check > 0) {

 # pick a random font
 srand();
 $watermark_font = $fonts[rand @fonts];
 
 # if this font was already probed, pick another one
 next if $fonts_checked{$watermark_font};
 $fonts_checked{$watermark_font} = 1; 

 # always reset the chars counter each time we try a font
 $chars_to_check = length("#".@image_tags[0]);
 
 print "Selected font $watermark_font (to check: $fonts_to_check)\n";
 
 # if not yet already, build list of available chars with this font
 unless ($fonts_characters{$watermark_font}) {
 Font::FreeType->new->face($watermark_font)->foreach_char(
 sub {
 my $char_chr = chr($_->char_code);
 my $char_code = $_->char_code;
 $fonts_characters{$watermark_font}{$char_chr} = $char_code;
 });
 print "Slurped $watermark_font chars\n";
 }
 
 # then check if every available character of the watermark exists in this font
 for (split //, "#".@image_tags[0]) {
 print "Check $_\n";
 # breaks out if missing char
 last unless $fonts_characters{$watermark_font}{$_};
 # otherwise decrement counter of chars to check: if we reach 0, they are all valid
 # and we should get out of the font picking loop 
 $chars_to_check--;
 print "Chars still to check $chars_to_check\n";
 }
 
 # we also record there is one less font to check
 $fonts_to_check--;
 
}


print "FONT PICKED $watermark_font\n";

This code is actually included in my post-image-to-tumblr.pl script (hence the variables name).

Obviously, if no font is suitable, it’ll take the last one tested. It won’t go as far as comparing which one is the most suitable, since in the context of this script, if no fonts can fully render a tag, the only sensible course is to add more (unicode capable) fonts to the fonts/ directory.

Advertisements