If we are just copying data from one file to another, we don't need to
fill a complete buffer, just read a chunk at a time, and write it to the
output.
fread reads the entire requested size (BUFSIZ), which causes tools to
block if only small amounts of data are available at a time. At best,
this causes unnecessary copies and inefficiency, at worst, tools like
tee and cat are almost unusable in some cases since they only display
large chunks of data at a time.
writeall makes successive write calls to write an entire buffer to the
output file descriptor. It returns the number of bytes written, or -1 on
the first error.
Previously, with -p, the specified directory and all of its parents
would be 0777&~filemask (regardless of the -m flag). POSIX says parent
directories must created as (0300|~filemask)&0777, and of course if -m
is set, the specified directory should be created with those
permissions.
Additionally, POSIX says that for symbolic_mode strings, + and - should
be interpretted relative to a default mode of 0777 (not 0).
Without -p, previously the directory would be created first with
0777&~filemask (before a chmod), but POSIX says that the directory shall
at no point in time have permissions less restrictive than the -m mode
argument.
Rather than dealing with mkdir removing the filemask bits by calling
chmod afterward, just clear the umask and remove the bits manually.
In the description of 3111908b03, it says
that the functions must be able to handle st being NULL, but recurse
always passes a valid pointer. The only function that was ever passed
NULL was rm(), but this was changed to go through recurse in
2f4ab52739, so now the checks are
pointless.
In commit 30fd43d7f3, unescape was
simplified significantly, but the new version failed to NULL terminate
the resulting string.
This causes bad behavior in various utilities, for example tr:
Broken:
$ echo b2 | tr '\142' '\143'
c3
Fixed:
$ echo b2 | tr '\142' '\143'
c2
This bug breaks libtool's usage of tr, causing gcc to fail to build with
sbase.
Previously, if a file failed to read in a checksum list, it would be
reported as not matched rather than a read failure.
Also, if reading from stdin failed, previously a bogus checksum would be
printed anyway.
Also, since parsemode exits on failure, don't bother checking return
value in xinstall (this would never trigger anyway because mode_t can be
unsigned).
For sort(1) we need memmem(), which I imported from OpenBSD.
Inside sort(1), the changes involved working with the explicit lengths
given by getlines() earlier and rewriting some of the functions.
Now we can handle NUL-characters in the input just fine.
strmem() was not very well thought out. The thing is the following:
If the string contains a zero character, we want to match it, and not
stop right there in place.
The "real" solution is to use memmem() where needed and replace all
functions that assume zero-terminated-strings from standard input, which
could lead to early string-breakoffs.
This requires a strict tracking of string lengths.
We want our delimiters to also contain 0 characters and have them
handled gracefully.
To accomplish this, I wrote a function strmem(), which looks for a
certain, arbitrarily long memory subset in a given string.
memmem() is a GNU extension and forces you to call strlen every time.
Yeah well, the old topic. POSIX allows \0123 and \123 octals in
different tools, in printf, depending on %b or other things.
We'll just keep it simple and just allow 4 digits. the 0 does not make
a difference anyway.
When we move the exit() out of venprintf(), we can reuse it for
weprintf(), which basically had duplicate code.
I also renamed venprintf() to xvprintf (extended vprintf) so it's
more obvious what it actually does.
This reverts commit a564a67c4ea70e90a4dc543814458e4903869d3e.
Not as trivial as I thought. This breaks cp when used as:
cp -r /foo/bar /baz
The old code expands this to:
cp -r /foo/bar /baz/bar
This is a utility function to allow easy parsing of file or other
offsets, automatically taking in regard suffixes, proper bases and
so on, for instance used in split(1) -b or od -j, -N(1).
Of course, POSIX is very arbitrary when it comes to defining the
parsing rules for different tools.
The main focus here lies on being as flexible and consistent as
possible. One central utility-function handling the parsing makes
this stuff a lot more trivial.
Otherwise, we run into problems in a typical autoconf-based build
system:
- config.status is created at some point between two seconds.
- config.status is run, generating Makefile by first writing to a file
in /tmp, and then mv-ing it to Makefile.
- If this mv happens before the beginning of the next second, Makefile
will be created with the same tv_sec as config.status, but with
tv_nsec = 0.
- When make runs, it sees that Makefile is older than config.status,
and re-runs config.status to generate Makefile.
In general, POSIX does not define /dev/std{in, out, err} because it
does not want to depend on the dev-filesystem.
For utilities, it thus introduced the '-'-keyword to denote standard
input (and output in some cases) and the programs have to deal with
it accordingly.
Sadly, the design of many tools doesn't allow strict shell-redirections
and many scripts don't even use this feature when possible.
Thus, we made the decision to implement it consistently across all
tools where it makes sense (namely those which read files).
Along the way, I spotted some behavioural bugs in libutil/crypt.c and
others where it was forgotten to fshut the files after use.
General convention is to use size_t to store sizes of all kinds.
Internally, the function uses double anyway, but at least this
doesn't clobber up the API any more and there's a chance in the
future to make this function a bit cleaner and not use this dirty
static buffer hack any more.