Well, isspacerune() has been fixed and some other FIXME's were also easy
to do.
There are some places where maybe some util-functions could be helpful.
In some cases, like for instance in regard to escape-sequences, I'm all
for consistency rather than adhering to the POSIX-standard too much.
Relying on centralized util-functions also makes it possible to keep
this consistency across the board.
Previously, the string-length was limited to BUFSIZ, which is an
obvious deficiency.
Now the buffer only needs to be as long as the user specifies the
minimal string length.
I added UTF-8-support, because that's how POSIX wants it and there
are cases where you need this. It doesn't add ELF-barf compared to
the previous implementation.
The t-flag is also pretty important for POSIX-compliance, so I added
it.
The only trouble previously was the a-flag, but given that POSIX
leaves undefined what the a-flag actually does, we set it as default
and don't care about parsing ELF-headers, which has already
turned out to be a security issue in GNU coreutils[0].
[0]: http://lcamtuf.blogspot.ro/2014/10/psa-dont-run-strings-on-untrusted-files.html
I somehow missed this issue. A rune can not be smaller than the left
range-delimiter and bigger than the right range-delimiter at the
same time.
The real check has to check if either condition applies.
This is a particularly interesting program.
I managed to implement everything according to POSIX except how
octal escapes are specified in the standard, which is yet another
format compared to the one demanded for tr(1).
This not only confuses people, it also adds unnecessary cruft
for no real gain.
So in order to be able to use unescape() easily and for consistency,
I used our initial format \o[oo] instead of \0[ooo].
Marked as optional is UTF-8 support for %c in the POSIX specification.
Given how well-developed libutf has become, doing this here was more
or less trivial, putting us yet again ahead of the competition.
Store the result in an int and do the comparison. This is always
safe without using strange constructs like "signed char".
wc(1) would go into an infinite loop when executed on an ARM
system.