Whenever I download a file whose name contains dangerous characters, or receive such a file as an email attachment, I get mildly frustrated. To address this, I wrote a Perl script called fix-file-names that renames such files. The script is given below:
#!/usr/bin/perl
# fix-file-names - change file names to safe names, e.g. space to _ etc.
# 2009-2020 Vlado Keselj vlado@dnlp.ca http://vlado.ca last update:2020-12-08
# Usage: fix-file-names f1 f2 ...
for my $fnold (@ARGV) {
  my $fnnew = &fix_filename($fnold);
  if ($fnnew eq $fnold) { print "$fnnew \t\tthe same file name kept!\n" }
  else {
    # Refuse to overwrite an existing file.
    if (-e $fnnew) { die "$fnnew already exists!" }
    print "$fnold \t-> $fnnew\n";
    rename($fnold,$fnnew) or die "cannot rename $fnold to $fnnew: $!";
  }
}
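sub fix_filename {
  local $_ = shift;
  s/^-/F-/;                        # a leading '-' could be mistaken for an option
  s/ +- +/-/g;                     # " - " becomes "-"
  s/''+/--/g; s/'/-/g;             # apostrophes
  s/[[(<{]/_-/g; s/[])>}]/-_/g;    # opening and closing brackets
  s/[,:;]\s*/--/g; s/&/and/g; s/ /_/g;
  s/__+/_/g; s/---+/--/g;          # collapse runs of '_' and '-'
  s/\xE2\x80\x99/-/g;              # Single right quote (UTF-8 bytes)
  s/[^\w.-]/"0x".uc unpack("H2",$&)/ge;  # anything else becomes 0xHH
  return $_;
}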
# 2020-12-06
# - =HH encoding is replaced with 0xHH since '=' is a special character in
# shell (bash)
The script first rewrites various common constructs in file names to
roughly similar but safe equivalents, and finally replaces any remaining
potentially unsafe character with a hexadecimal 0xHH code.
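To make these rules concrete, here is a small dry-run sketch that applies the same fix_filename subroutine to a few sample names and prints the result without renaming anything. The sample names are made up for illustration, and the results in the comments come from tracing the substitutions by hand.

```perl
#!/usr/bin/perl
# Dry-run sketch: show what fix_filename would produce, without renaming.
use strict;
use warnings;

# Same substitutions as in fix-file-names.
sub fix_filename {
    local $_ = shift; s/^-/F-/; s/ +- +/-/g;
    s/''+/--/g; s/'/-/g; s/[[(<{]/_-/g; s/[])>}]/-_/g;
    s/[,:;]\s*/--/g; s/&/and/g; s/ /_/g;
    s/__+/_/g; s/---+/--/g;
    s/\xE2\x80\x99/-/g;    # Single right quote (UTF-8 bytes)
    s/[^\w.-]/"0x".uc unpack("H2",$&)/ge;
    return $_;
}

# Hypothetical sample names, chosen only for illustration.
my @samples = (
    'My Report (final).pdf',        # -> My_Report_-final-_.pdf
    'Tom & Jerry - episode 1.avi',  # -> Tom_and_Jerry-episode_1.avi
    '-notes, draft.txt',            # -> F-notes--draft.txt
    'Q&A?.txt',                     # -> QandA0x3F.txt
);

for my $name (@samples) {
    printf "%-30s -> %s\n", $name, fix_filename($name);
}
```

The first three names are handled entirely by the "similar but safe" rewrites, while the question mark in the last one has no such rule and falls through to the final 0xHH replacement.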
The package fix-filenames by Martin Zagora is an interesting open-source program written in TypeScript (JavaScript) that fixes filenames by recoding some non-ASCII characters. It includes a mapping of non-ASCII characters to equivalent ASCII strings. Its GitHub location is https://github.com/zaggino/fix-filenames.