Okay, why yet another recurse()-refactor?
The last one added the recursor-struct, which simplified things
on the user-end, but there was still one thing that bugged me a lot:
Previously, all fn()'s were forced to (l)stat the paths themselves.
This does not work well when you try to keep up with H-, L- and P-
flags at the same time, as each utility-function would have to set
the right function-pointer for (l)stat every single time.
This is not desirable. Furthermore, recurse should be easy to use
and not involve trouble finding the right (l)stat-function to do it
right.
So, what we needed was a stat-argument for each fn(), so it is
directly accessible. This was impossible to do though when the
fn()'s are still directly called by the programs to "start" the
recurse.
Thus, the fundamental change is to make recurse() the function to
go, while designing the fn()'s in a way they can "live" with st
being NULL (we don't want a null-pointer-deref).
What you can see in this commit is the result of this work. Why
all this trouble instead of using nftw?
The special thing about recurse() is that you tell the function
when to recurse() in your fn(). You don't need special flags to
tell nftw() to skip the subtree, just to give an example.
The only single downside to this is that now, you are not allowed
to unconditionally call recurse() from your fn(). It has to be
a directory.
However, that is a cost I think is easily weighed up by the
advantages.
Another thing is the history: I added a procedure at the end of
the outmost recurse to free the history. This way we don't leak
memory.
A simple optimization on the side:
- if (h->dev == st.st_dev && h->ino == st.st_ino)
+ if (h->ino == st.st_ino && h->dev == st.st_dev)
First compare the likely difference in inode-numbers instead of
checking the unlikely condition that the device-numbers are
different.
Be more pedantic about the error-checking, fread can also return
values > 0 even though there has been a read-error.
We want to write the last incoming data and then bail.
pathconf() is just an insane interface to use. All sane operating-
systems set sane values for PATH_MAX. Due to the by-runtime-nature of
pathconf(), it actually weakens the programs depending on its values.
Given over 3 years it has still not been possible to implement a sane
and easy to use apathmax()-utility-function, and after discussing this
on IRC, we'll dump this garbage.
We are careful enough not to overflow PATH_MAX and even if, any user
is able to set another limit in config.mk if he so desires.
I marked out -m, -s and -x, because they are either visual flags
for interactive mode, which are better solved with tools made for this
job, or superfluous in another sense.
For example, -s basically "steals" the job from du.
In general, some of these options might still be easy to implement.
The options -S and -f are important though, as they are sorting-options
with real use.
Only add empty lines before returns, everything else is ok.
Also add the STANDARDS-section to the manpage, which was only
present as a heading until now.
1) Specify default in manpage under flag.
2) Boolean and return value style fixes.
3) argv-argc-centric loop.
4) No need to check for argc == 1 before the fflag-subroutine.
5) Remove indentation.
6) Empty line before return.
1) Get rid of strtop(), which was a NiH-version of estrtonum().
2) Boolean-style-fixes.
3) Update usage, reflecting num-idiom, also update manpage accordingly.
4) Don't break after usage().
5) Rewrite main loop with *argv instead of argv[i].
6) Don't play around with who < 0 and stuff.
7) Rename status to ret for consistency.
It has become a common idiom in sbase to check strlcat() and strlcpy()
using
if (strl{cat, cpy}(dst, src, siz) >= siz)
eprintf("path too long\n");
However, this was not carried out consistently and to this very day,
some tools employed unchecked calls to these functions, effectively
allowing silent truncations to happen, which in turn may lead to
security issues.
To finally put an end to this, the e*-functions detect truncation
automatically and the caller can lean back and enjoy coding without
trouble. :)
1) Add usage().
2) Idiomatic argv0-setter. We don't use arg.h, as we do not process
flags or arguments.
3) Remove program-name from eprintf-call. This is done in the eprintf-
function itself when the DEBUG-define is set.
We'll activate it by default later.
4) Add empty line before return.
After the audit, I had this noted down as a TODO-item, but
considered the function to be tested enough to hold the line
until I came to rewrite it.
Admittedly, I didn't take a closer look at the previous loop
and there probably were some edge-cases which caused trouble, but
so far so good, the new version of this commit should be safe
and considered audited.
1) Refactor the manpage with num-options, optimize wording to be more
concise and to the point, pid also specifies process groups.
2) Make int sig const.
3) Remove prototypes.
4) /* not reached */ consistency.
5) Refactor usage() with eprintf.
6) Refactor arg-parser with a switch, use estrtonum
7) Use return instead of exit() in main()
8) argc-argv-correctness.
1) Use num-wording in the manpage, remove offensive remark against
the beloved -num-syntax <3.
2) Style changes.
3) Report errors of getline.
4) argv-argc-centric argument loop.
5) Rename r to ret for consistency.
We just take the raw argument list as is. Using arg.h, arguments
beginning with - would have been "eaten up".
Writing a special "bailout" for arg.h was not a good option,
not because it's not impossible (done in 6 LOC), but because it
is a shoehorning around a corner case present for a few programs
which are broken by design by POSIX.
1) Any path passed to mkdir -p beginning with '/' failed, because
it would cut out the first '/' immediately, passing "" to mkdir.
2) Running mkdir -p with a path/to/dir without trailing '/' would
not create the directory.
This is due to a wrong flag-check I added in the main-loop.
It should now work as expected.
3) With the p-flag given, don't report an error in case the last
dir also exists.
For loop detection, a history is mandatory. In the process of also
adding a flexible struct to recurse, the recurse-definition was moved
to fs.h.
The motivation behind the struct is to allow easy extensions to the
recurse-function without having to change the prototypes of all
functions in the process.
Adding flags is really simple as well now.
Using the recursor-struct, it's also easier to see which defaults
apply to a program (for instance, which type of follow, ...).
Another change was to add proper stat-lstat-usage in recurse. It
was wrong before.
While auditing du(1) I realized that there's no way the over 100 lines
of procedures in du() would pass the audit.
Instead, I decided to rewrite this section using recurse() from libutil.
However, the issue was that you'd need some kind of payload to count
the number of bytes in the subdirectories and use them in the higher
hierarchies.
The solution is to add a "void *data" data pointer to each recurse-
function-prototype, which we might also be able to use in other
recurse-applications.
recurse() itself had to be augmented with a recurse_samedev-flag, which
basically prevents recurse from leaving the current device.
Now, let's take a closer look at the audit:
1) Removing the now unnecessary util-functions push, pop, xrealpath,
rename print() to printpath(), localize some global variables.
2) Only pass the block count to nblks instead of the entire stat-
pointer.
3) Fix estrtonum to use the minimum of LLONG_MAX and SIZE_MAX.
4) Use idiomatic argv+argc-loop
5) Report proper exit-status.
1) Add check to parselist() to warn about an empty list.
2) Remove all "cut: "-prefixes from error-messages and other style
changes.
3) != -1 --> >= 0 and check for ferror on fp after getline.
4) Update usage with argv0.
5) argv-centric loop refactor
6) Properly report exit-status.
7) Add empty line before return.
1) Use the LIMIT()-macro in util.h instead of defining our own.
2) Drop nextline() and finish(), not needed anymore. Use
fputs in printline instead of printf.
--> BUGFIX: Finish exited with status 1, but actually should
exit with status 0 if ferror(f) == 0.
3) Don't use /dev/fd/0 and use idiomatic <stdin> and fp = stdin
instead.
4) Refactor loop to use getline() instead of some handrolled
nextline-function.
--> BUGFIX: Line-length was limited to LINE_MAX before, now
it's factually unlimited.
5) Combine diff >= 0 and diff <= 0 into one loop with a beginning
continue-condition (diff && i == (diff < 0)).
6) BUGFIX: If diff == 0, don't print one buffer after EOFing on the
other.
1) Remove the return-value-enum, which is not necessary for a simple
program like this.
2) Don't disallow both l and s to be specified. This is undefined
behaviour defined by POSIX, so we don't start demanding things
from the user.
3) Replace exit() with return (we are in main).
4) Refactor main loop to never return in the loop, but actually
set the same-value and break, which increases readability.
5) Remove the final fclose()'s. The OS will take care of them, no
need to become cleansy here.
6) Use idiomatic return-value using same. This concludes the
increase of readability in the main-loop.
After a short correspondence with Otto Moerbeek it turned out
mallocarray() is only in the OpenBSD-Kernel, because the kernel-
malloc doesn't have realloc.
Userspace applications should rather use reallocarray with an
explicit NULL-pointer.
Assuming reallocarray() will become available in c-stdlibs in the
next few years, we nip mallocarray() in the bud to allow an easy
transition to a system-provided version when the day comes.