I somehow missed this issue. A rune cannot be smaller than the left
range delimiter and bigger than the right range delimiter at the
same time.
The correct check has to test whether either condition applies.
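
A minimal sketch of the corrected check, not the actual code from the
tree. The Rune typedef is a stand-in for the libutf type and
outside_range() is a hypothetical helper name:

```c
#include <stdint.h>

typedef uint32_t Rune; /* assumption: stand-in for libutf's Rune */

/* Hypothetical helper: returns 1 if r lies outside [lo, hi]. */
static int
outside_range(Rune r, Rune lo, Rune hi)
{
	/*
	 * "r < lo && r > hi" can never be true when lo <= hi,
	 * so the two conditions have to be joined with ||.
	 */
	return r < lo || r > hi;
}
```

With lo = 'b' and hi = 'd', this reports 'a' and 'e' as outside the
range and 'c' as inside.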
This is a particularly interesting program.
I managed to implement everything according to POSIX except how
octal escapes are specified in the standard, which is yet another
format compared to the one demanded for tr(1).
This not only confuses people, it also adds unnecessary cruft
for no real gain.
So in order to be able to use unescape() easily and for consistency,
I used our initial format \o[oo] instead of \0[ooo].
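
To illustrate the \o[oo] form (a backslash followed by one to three
octal digits), here is a rough sketch. parse_octal() is a hypothetical
helper written for this note and is not the real unescape():

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the actual unescape(): s points at the
 * first character after the backslash. Reads one to three octal
 * digits, stores the value in *out and returns the number of
 * digits consumed (0 if s does not start with an octal digit).
 */
static size_t
parse_octal(const char *s, unsigned *out)
{
	size_t n = 0;
	unsigned v = 0;

	while (n < 3 && s[n] >= '0' && s[n] <= '7') {
		v = v * 8 + (unsigned)(s[n] - '0');
		n++;
	}
	*out = v;
	return n;
}
```

Under this sketch "\101" yields 65 ('A'), while in "\0101" only "010"
is consumed, which is where the two formats differ.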
The POSIX specification marks UTF-8 support for %c as optional.
Given how well-developed libutf has become, doing this here was more
or less trivial, putting us yet again ahead of the competition.
Store the result in an int and do the comparison. This is always
safe without using strange constructs like "signed char".
wc(1) would go into an infinite loop when executed on an ARM
system.
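
A sketch of the general pattern, assuming the value in question comes
from a function that returns an int with a negative sentinel, such as
getc(). Plain char is typically unsigned on ARM, so a char holding the
result never compares equal to EOF and the loop never ends.
count_bytes() is a hypothetical function for illustration only:

```c
#include <stdio.h>

/* Hypothetical byte counter; shows the int-vs-char pitfall only. */
static long
count_bytes(FILE *fp)
{
	int c;      /* int, not char: EOF must stay distinguishable */
	long n = 0;

	while ((c = getc(fp)) != EOF)
		n++;
	return n;
}
```

Declaring c as char would happen to work where char is signed, which
is why the problem only showed up on ARM.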
This is a special third kind of structure found in Unicode, besides
singletons and ranges.
This dramatically reduces the number of explicit singletons in the
lookup tables.
Also, I changed the awk script so that it can sort trivial
translations as well, reducing the LOC even further.
The binary size of tr dropped from 67K to 51K.
Previously, the to*rune function had to juggle two arrays, and it
somehow evaded me that it is actually way simpler to just add
another entry to the arrays if needed.
Binary size goes slightly down, e.g. tr statically linked against
musl: 68072 -> 67688
Behind the scenes, though, the conversion should be a bit faster and,
more importantly, the scary case-conversion function is simpler
and easier to understand.
It also drops nearly half the LOC in upperrune.c and lowerrune.c.