Sorting and moving automatically videos from download to storage directory

In the spirit the earlier scripts to clean up ogg/mp3 collection (tags, filenames) with lltag, the following script is a proposal to automatically move, from download directory to storage directory, the video files that deserved to be kept. This is especially useful when both directories are on different physical drives and, as such, take a while, and typically with implies heavy IO usage at the moment you are actually using the computer that host the drivers.

The idea is to put a simple mark on directories or files, in the download area, that should be move to the storage area, assuming storage area contains top directories to sort files. The mark is arbitrary: ##Mark##, Mark matching a top directory within the download area.

For example, say we have in the download area the file Vabank II, czyli riposta (1985) DVDRip XviD AC3-BR.avi, associated with a subtitle file Vabank II, czyli riposta (1985) DVDRip XviD AC3-BR.en.srt, within a directory named Vabank II. It must go in storage area top directory Action Espionnage.

The way the script sort-download-area.pl works requires only the Vabank II or Vabank II, czyli riposta (1985) DVDRip XviD AC3-BR.avi to be renamed to include ##Action Espionnage##. And, obviously, to make it more practical, if can be also ##Action##, ##action##, ##Espionnage##, ##AE##, ##ActionEspionnage##, and others aliases as long as they are not confusing regarding other top directories of the download area.

Then running sort-download-area.pl --download /mnt/download --storage /mnt/storage (assuming these are the relevant directories) will take care of moving the found video and text/subtitles files (based on mime-type and filenames). The old directory will remain, the script won’t take any risk to erase any data by itself.

It can be run with --debug option to make a dry-run, to check if everything is in order, list possible marks, etc. If run as root, it will take care of changing mode and ownership to match the relevant download area top directory.

Once a satisfying setup is in place (assuming the script is in /usr/local/bin), it is enough to add a /etc/cron.daily/sort-download-area like:

#!/bin/sh
/usr/local/bin/sort-download-area.pl  --download /mnt/download --storage /mnt/storage

Here the current version of the sort-download-area.pl (but you are advised to always take the latest gitlab version) :

#!/usr/bin/perl

use strict "vars";
use Fcntl ':flock';
use POSIX qw(strftime);
use File::Find;
use File::Basename;
use File::Path qw(make_path);
use File::Copy qw(move);
use File::MimeInfo;
use File::Slurp qw(read_dir);
use Getopt::Long;
use Term::ANSIColor qw(:constants);

# config:
my $user = "nobody";
my $group = "nobody";
my ($download, $storage);
my $debug = 0;
my ($getopt, $help);

# get standard opts with getopt
eval {
    $getopt = GetOptions("debug" => \$debug,
			 "help" => \$help,
			 "download|d:s" => \$download,
			 "storagedir:s" => \$storage);
};

if ($help) {
    print STDERR <<EOF;
Usage: $0 [OPTIONS]

    -d DIR, --download DIR   (mandatory) path to the download/input area
    -s DIR, --storage DIR    (mandatory) path to the storage/output area

    --debug                  Dry-run debug test


Author: yeupou\@gnu.org
       https://yeupou.wordpress.com/

EOF
    exit(1);    
}

unless ($download and $storage) {
    die "Both --download INDIR and --storage OUTDIR must be provided.\nExiting";
}
unless (-d $download and -d $storage) {
    die "Both $download (--download) and $storage (--storage) must exists.\nExiting";
}

sub debug {
    return unless $debug;
    print $_[1] if $_[1]; 
    print $_[0];
    print RESET if $_[1];
    print "\n";
}

########################################################################
## run

# silently forbid concurrent runs
# (http://perl.plover.com/yak/flock/samples/slide006.html)
open(LOCK, "< $0") or die "Failed to ask lock. Exit";
flock(LOCK, LOCK_EX | LOCK_NB) or exit;


####
#### Find out current possible storage top-dirs
#### (with their respective uid/gid)
# value equal to the real-top dir
my %storage_topdir;
# uid/gid of the real-top dir
my %storage_topdir_uid;
my %storage_topdir_gid;
# keep a list of confusing marks
my %storage_topdir_confusingmark;


debug("\n\nStorage ($storage) top-dirs:\n", ON_CYAN);

for my $dir (read_dir($storage)) {
    next unless -d "$storage/$dir";
    next if ($dir =~ /^\./);   # skip hidden dirs

    # store top dir details
    $storage_topdir{$dir} = $dir;
    $storage_topdir_uid{$dir} = (lstat "$storage/$dir")[4];
    $storage_topdir_gid{$dir} = (lstat "$storage/$dir")[5];    
    debug("\t$storage_topdir{$dir}", GREEN);
    debug("\t($storage_topdir_uid{$dir}:$storage_topdir_gid{$dir})");

    # store also top dir useful aliases (end user might want to use shortcuts)
    # but no checks will be made in case of confusing aliases (ie two top dirs shortened in the name way)
    # for instance, Action Espionnage would also accept:
    #           action espionnage (lowercased)
    #		ActionEspionnage (removal of non-word chars)
    #		actionespionnage (lowercased removal of non-word chars)
    #		AE (only capital letters)
    #           ea (lowercase only capital letters)
    #           Action (single word apart)
    #           action (lowercased single word apart)
    #           Espionnage (single word apart)
    #           espionnage (lowercased single word apart)
    

    # alias as lowercased : WesteRn eq western
    my $alias = lc($dir);
    if ($alias ne $dir) {
	unless ($storage_topdir{$alias} or $storage_topdir_confusingmark{$alias}) {
	    debug("\t\t$alias (lowercased)");
	    $storage_topdir{$alias} = $dir;
	} else {
	    debug("\t\tlowercased alias ($alias) is confusing regarding earlier items, skipping", ON_RED);
	    $storage_topdir_confusingmark{$alias} = 1;
	    delete($storage_topdir{$alias});
	}
    }
    # alias with space in place of any non word characters
    $alias = $dir;
    $alias =~ s/[^[:alnum:]]//g;
    
    if ($alias ne $dir) {
	unless ($storage_topdir{$alias} or $storage_topdir_confusingmark{$alias}) {
	    debug("\t\t$alias (removal of non-word chars)");
	    $storage_topdir{$alias} = $dir;
	} else {
	    debug("\t\tremoval of non-word chars alias ($alias) is confusing regarding earlier items, skipping", ON_RED);
	    $storage_topdir_confusingmark{$alias} = 1;
	    delete($storage_topdir{$alias});
	}
	
	# same lowercased
	$alias = lc($alias);
	if ($alias ne $dir) {
	    unless ($storage_topdir{$alias} or $storage_topdir_confusingmark{$alias}) {
		debug("\t\t$alias (lowercased removal of non-word chars)");
		$storage_topdir{$alias} = $dir;
	    } else {
		debug("\t\tlowercased removal of non-word chars alias ($alias) is confusing regarding earlier items, skipping", ON_RED);
		$storage_topdir_confusingmark{$alias} = 1;
		delete($storage_topdir{$alias});
	    }
	}
    }
    # alternatively, only keep the capitalized letters
    $alias = $dir;
    $alias =~ s/[^[:alnum:]]//g;
    $alias =~ s/[^[:upper:]]//g;
    if ($alias ne $dir) {
	unless ($storage_topdir{$alias} or $storage_topdir_confusingmark{$alias}) {
	    debug("\t\t$alias (only capital letters)");
	    $storage_topdir{$alias} = $dir;
	} else {
	    debug("\t\tonly capital letter alias ($alias) is confusing regarding earlier items, skipping", ON_RED);
	    $storage_topdir_confusingmark{$alias} = 1;
	    delete($storage_topdir{$alias});
	}
	# same lowercased     
	$alias = lc($alias);
	unless ($storage_topdir{$alias} or $storage_topdir_confusingmark{$alias}) {
	    debug("\t\t$alias (lowercased only capital letter alias)");
	    $storage_topdir{$alias} = $dir;
	} else {
	    debug("\t\tlowercased only capital letter alias ($alias) is confusing regarding earlier items, skipping", ON_RED);
	    $storage_topdir_confusingmark{$alias} = 1;
	    delete($storage_topdir{$alias});
	}
    }
    # finally, if several worlds compose a string, try to register each
    # (this is where it is most likely to find confusing aliases)
    if (split(" ", $dir) > 1) {
	foreach my $word (split(" ", $dir)) {
	    $alias = $word;
	    unless ($storage_topdir{$alias} or $storage_topdir_confusingmark{$alias}) {
		debug("\t\t$alias (single word apart)");
		$storage_topdir{$alias} = $dir;
	    } else {
		debug("\t\tsingle word apart alias ($alias) is confusing regarding earlier items, skipping", ON_RED);
		$storage_topdir_confusingmark{$alias} = 1;
		delete($storage_topdir{$alias});
	    }
	    $alias = lc($alias);
	    unless ($storage_topdir{$alias} or $storage_topdir_confusingmark{$alias}) {
		debug("\t\t$alias (lowercased single word apart)");
		$storage_topdir{$alias} = $dir;
	    } else {
		debug("\t\tlowercased single word apart alias ($alias) is confusing regarding earlier items, skipping", ON_RED);
		$storage_topdir_confusingmark{$alias} = 1;
		delete($storage_topdir{$alias});
	    }
	}
    }
        
}


####
#### Find out any file or directory that we should be moving
#### (do not start moving files unless we checked everything)

# build an hash of files to move
# (with a secondary hash to keep track of the storage topdir) 
my %tomove;
my %tomove_topdir;


debug("\n\nDownload ($download) files:\n", ON_CYAN);

sub wanted {
    # $File::Find::dir is the current directory name,
    # $_ is the current filename within that directory
    # $File::Find::name is the complete pathname to the file.

    # check if we have a ##STRING## inside
    my $mark;
    $mark = $1 if $File::Find::name =~ m/##(.*)##/;

    # none found, skipping
    next unless $mark;

    # string refers to non-existant directory, skipping
    unless ($storage_topdir{$mark}) {
	debug("Mark $mark found for $File::Find::name while no such storage directory exists in $storage", ON_RED);
	# this is an issue that requires manual handling, print ont STDERR
	print STDERR ("Mark $mark found for $File::Find::name while no such storage directory exists in $storage\n");
	next;
    }

    # take into account only videos and text files
    my $suffix;
    $suffix = $1 if $_ =~ /([^\.]*)$/;
    my ($mime_type,$mime_type2) = split("/", mimetype($File::Find::name));
    if ($mime_type ne "video" and
	$mime_type ne "text") {
	# second pass to allow even more text files based on extension
	# (subtitles : srt sub ssa ass idx txt smi)
		
	unless ($suffix eq "srt" or
		$suffix eq "sub" or
		$suffix eq "txt" or
		$suffix eq "ssa" or
		$suffix eq "ass" or
		$suffix eq "idx" or
		$suffix eq "smi") {
	    debug("\tskip $_ ($mime_type/$mime_type2 type)");
	    next;		
	}
    }

    my $destination_dir = "$storage/$storage_topdir{$mark}";
    my $destination_file = $_;
    $destination_file =~ s/##(.*)##//g;
    $destination_file =~ s/^\s*//;
    $destination_file =~ s/\s*$//;

    # now handle the special S00E00 case of series, like 30 Rock (2006) - S05E16 or 30 Rock S05E16
    my ($season, $before_season, $show);
    $before_season = $1 and $season = $2 if $_ =~ m/^(.*)S(\d\d)\ ?E\d\d[^\d]/i;
    if ($season) {
	# there is a season, we must determine the show name
	#    30 Rock (2006) - S05E16 => 30 Rock
	# end user must pay attention to have consistent names
	$show = $1 if $before_season  =~ m/^([\w|\s|\.|\'|\,]*)/g;
	# dots often are used in place of white spaces
	$show =~ s/\./ /g;    
	# keep only spaces in shows name, nothing else
	$show =~ s/[^[:alnum:]|\ ]//g;    
	$show =~ s/^\s*//;
	$show =~ s/\s*$//;
	# capitalize first letter
	$show =~ s/\b(\w)/\U$1/g;
	
	# if we managed to find the show name, then set up the specific series tree
	last unless $show;
	debug("found show: $show", MAGENTA);
	$destination_dir = "$storage/$storage_topdir{$mark}/$show/S$season";	    	
    }

    
    # if we reach this point, everything seems in order, plan the move
    debug("plan -> $destination_dir/$destination_file");
    $tomove{$File::Find::name} = "$destination_dir/$destination_file";
    $tomove_topdir{$File::Find::name} = $storage_topdir{$mark};


    # additionally, if we deal with a video, look for any possibly related file to add also that would not have been picked
    # otherwise
    if ($mime_type eq "video") {
	my $other_files_path = $File::Find::name;
	$other_files_path =~ s/\.$suffix$//g;

	debug("glob $other_files_path*");
	my @other_files =
	    glob('$other_files_path*.srt'),
	    glob('$other_files_path*.sub'),
	    glob('$other_files_path*.txt'),
	    glob('$other_files_path*.ssa'),
	    glob('$other_files_path*.ass'),
	    glob('$other_files_path*.idx'),
	    glob('$other_files_path*.smi');
	foreach my $file (@other_files) {
	    debug("plan -> $destination_dir/$file");
	    $tomove{"$File::Find::dir/$file"} = "$destination_dir/$file";
	    $tomove_topdir{"$File::Find::name/$file"} = $storage_topdir{$mark};
	}		    
    }

    debug();
      
}
find(\&wanted, $download);

####
#### Actually move files now
####

debug("\n\nMove from download ($download) to storage ($storage):\n", ON_CYAN);

foreach my $file (sort keys %tomove) {

    debug(basename($file), YELLOW);

    my $uid = $storage_topdir_uid{$tomove_topdir{$file}};
    my $gid = $storage_topdir_gid{$tomove_topdir{$file}};    

    # create directory if needed
    my $dir = dirname($tomove{$file});
    unless (-e $dir) {
	make_path($dir, { chmod => 0770, user => $uid, group => $gid }) unless $debug;
	debug("make_path $dir (chmod => 0770, user => $uid, group => $gid)");
    }

    # then move the file (chown if root)
    # avoid overwriting, add number in the end, no extension saving
    my $copies;
    if (-e $tomove{$file}) {
	while (-e "$tomove{$file}.$copies") {
	    $copies++;
	    # stop at 10, makes no sense to keep more than that amount of copies
	    last if $copies > 9;
	}
    }    
    $tomove{$file} .= ".$copies" if $copies;
    move($file, $tomove{$file}) unless $debug;
    chown($uid, $gid, $tomove{$file}) unless $debug or $< ne 0;
    debug("$file -> $tomove{$file}");

    debug();	  
}

# EOF

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s