81 Commits

Author SHA1 Message Date
Eivind Uggedal
3c2f8b0cfa tar: don't change modes for hardlinks on extraction
Changing timestamps, modes and ownership of hardlinks
makes no sense.
2016-02-15 09:41:58 +00:00
Quentin Rameau
6e7743eb56 Cleanup usage() across sbase
Some tools didn't use argv0 for tool name, or usage() at all.
2015-12-21 18:07:25 +00:00
Brad Barden
211c565b3d tar: improve file status handling on extract
by re-ordering when chmod/chown is done, only a list of directories (not
all files) need be kept for fixing mtime.

this also fixes an issue where set-user-id files in a tar may not work. chmod
is done before chown and before the file is written. if ownership changes, or
the file is being written as a normal user, the setuid bit would be cleared.

also fixes ownership of symbolic links. previously a chown() was called,
which would change the ownership of the link target. lchown() is now
used for symbolic links.

renamed all ent, ent* functions to dir* as it better describes what they
do.

use timespec/utimensat instead of timeval/utimes to get AT_SYMLINK_NOFOLLOW
2015-11-20 09:58:38 +00:00
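A minimal sketch of the link-safe status calls this commit describes, assuming a helper named restore_status() (hypothetical, not taken from sbase). It uses lchown() and utimensat() with AT_SYMLINK_NOFOLLOW so that a symbolic link itself is changed rather than its target:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>   /* utimensat(), lstat() */
#include <sys/types.h>
#include <time.h>       /* struct timespec */
#include <unistd.h>     /* lchown() */

/*
 * Hypothetical helper: apply ownership and mtime to path without
 * following a trailing symbolic link.
 * Returns 0 on success, -1 if either call failed.
 */
int
restore_status(const char *path, uid_t uid, gid_t gid, time_t mtime)
{
	struct timespec times[2];
	int ret = 0;

	assert(path != NULL);

	times[0].tv_sec = mtime;	/* atime */
	times[0].tv_nsec = 0;
	times[1] = times[0];		/* mtime */

	/* lchown() changes the link itself, chown() would change the target */
	if (lchown(path, uid, gid) < 0)
		ret = -1;
	/* utimes() has no way to express "do not follow the link" */
	if (utimensat(AT_FDCWD, path, times, AT_SYMLINK_NOFOLLOW) < 0)
		ret = -1;

	return ret;
}
```

Both calls are attempted even if the first one fails, so one failure does not hide the other.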
Brad Barden
85a9254d3a tar: extract creation mode
mode for newly-created files should be restrictive. chmod is always
called soon after to set correct mode from the archive.
2015-11-20 09:58:38 +00:00
sin
2366164de7 No need for semicolon after ARGEND
This is also the style used in Plan 9.
2015-11-01 10:18:55 +00:00
sin
f14a896891 Stop defining major()/minor() and makedev()
Rely on what the system provides.  These are not standardized macros
but any relevant UNIX system will provide them.

We can revisit this in the future if something breaks.
2015-10-04 16:39:28 +01:00
Hiltjo Posthuma
590f34c4a9 tar: compatibility, treat reserved type as regular file
References:
- http://www.gnu.org/software/tar/manual/html_node/Standard.html
- http://pubs.opengroup.org/onlinepubs/009695399/basedefs/tar.h.html
2015-05-10 12:58:38 +01:00
Hiltjo Posthuma
9d95321f0b tar: compatibility, allow empty magic as well 2015-05-10 12:58:38 +01:00
Hiltjo Posthuma
d41095299a tar: ignore more crazy GNU PAX header crap
don't fail, but maybe we should give a warning for this?
2015-05-08 21:36:40 +01:00
Hiltjo Posthuma
deb8a16527 tar: fix checksum calculation (signed/unsigned issue)
some archives gave the error: "malformed tar archive"

test file where this occurred:
   http://nl.alpinelinux.org/alpine/v3.1/main/x86_64/apk-tools-static-2.5.0_rc1-r0.apk
2015-05-08 21:36:40 +01:00
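The commit does not show the fix itself, so the following is only a sketch of how a ustar header checksum can be computed with unsigned bytes. The function name hdr_chksum() is an assumption. The field offset (148) and length (8) follow the ustar header layout:

```c
#include <assert.h>
#include <stddef.h>

enum { BLKSIZ = 512, CHKSUM_OFF = 148, CHKSUM_LEN = 8 };

/*
 * Sum all 512 header bytes as unsigned values. The checksum field
 * itself is counted as if it were filled with spaces.
 */
unsigned long
hdr_chksum(const char *hdr)
{
	const unsigned char *p = (const unsigned char *)hdr;
	unsigned long sum = 0;
	size_t i;

	assert(hdr != NULL);

	for (i = 0; i < BLKSIZ; i++) {
		if (i >= CHKSUM_OFF && i < CHKSUM_OFF + CHKSUM_LEN)
			sum += ' ';
		else
			sum += p[i];	/* unsigned: bytes >= 0x80 stay positive */
	}
	return sum;
}
```

Summing through a plain char pointer on a platform where char is signed would subtract for every byte with the high bit set, which can make a valid archive look malformed.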
Hiltjo Posthuma
1d9d17eba2 tar: add support for compressing with an external tool
... and add xz, compress and lzma as options
2015-05-08 15:56:20 +01:00
Hiltjo Posthuma
7dff7d4c83 tar: add verbose flag (-v) 2015-05-08 15:56:20 +01:00
FRIGN
5a52154a47 tar: Fix remove(3)-error-message 2015-04-24 10:09:26 +01:00
sin
fbd39af1fa tar: Minor style fixes 2015-04-24 10:09:25 +01:00
sin
ac694e6c4a tar: Add routine to test if the tar archive is "legit" 2015-04-23 16:34:12 +01:00
sin
74f680948e tar: Rename field to better match spec 2015-04-23 16:34:12 +01:00
sin
624bf64ac5 tar: Use more conventional name for iterator 2015-04-23 16:34:12 +01:00
sin
76eb6bdf42 tar: Match type like everywhere else 2015-04-23 15:32:00 +01:00
sin
d1e19e972a tar: No need to zero-fill variables with global storage 2015-04-23 15:32:00 +01:00
sin
ab267a87eb tar: Use raw I/O instead of standard file streams
As part of refactoring tar to add support for compression through
gzip/bzip2, it makes sense to avoid intermixing file stream I/O with
raw I/O.
2015-04-23 15:32:00 +01:00
sin
fb1595a69c tar: Use remove() instead of unlink() so we can work on directories too 2015-04-23 15:32:00 +01:00
sin
3419f94d83 tar: Staticise symbols 2015-04-23 12:38:53 +01:00
sin
ce4a10abe7 tar: Apply mtime at the end otherwise it gets reverted
Consider the following scenario:

1) create a/
2) apply mtime to a/
3) create a/b # reverts mtime on a

TODO: utimes() does not work on symlinks.
2015-04-23 12:37:39 +01:00
sin
aab2e273bd tar: Add skipblk() and simplify code 2015-04-23 12:37:38 +01:00
sin
201e71be2b tar: Skip over git's global pax header crap 2015-04-23 00:05:23 +01:00
sin
7a0d9fb3ea tar: Skip over data before processing the next entry
When we selectively process entries from the archive, ensure that
we jump over the data section of each uninteresting entry before going
on to process the next entry.  Not doing so, leaves the file stream
pointer in the wrong place.
2015-04-22 23:24:39 +01:00
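A small sketch of the arithmetic behind skipping an uninteresting entry, assuming the usual 512-byte tar block size. The names data_blocks() and next_header_offset() are hypothetical and not taken from sbase:

```c
#include <assert.h>

enum { BLKSIZ = 512 };

/* Number of 512-byte blocks occupied by size bytes of entry data. */
long
data_blocks(long size)
{
	assert(size >= 0);
	return size / BLKSIZ + (size % BLKSIZ != 0);
}

/*
 * Offset of the next header, given the offset where the current
 * entry's data starts and the size recorded in its header.
 */
long
next_header_offset(long data_start, long size)
{
	return data_start + data_blocks(size) * BLKSIZ;
}
```

The data section is always padded to a full block, so a 513-byte file occupies two blocks and the next header starts 1024 bytes after the data begins.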
sin
0925bf95ac tar: Cast to proper type, no functional change 2015-04-21 16:20:31 +01:00
sin
22c0ae67a4 tar: Don't error out if we can't pull pw/gr entries 2015-04-21 16:18:46 +01:00
sin
e6c532a47a tar: Briefly update manpage and usage for the latest changes 2015-04-21 15:43:52 +01:00
sin
258d0793ac tar: Allow extracting only a given list of files
tar -xf foo.tar a b c
2015-04-21 15:43:52 +01:00
sin
fde9e29d05 tar: Don't assume that name, linkname and prefix are null-terminated 2015-04-21 15:43:52 +01:00
sin
f1261b57d9 Add support to tar multiple files in a single run 2015-04-21 15:43:52 +01:00
sin
542f645bc2 Convert chown() failure to a warning in tar(1)
This particular change does not have any immediate shortcomings.
We still print a warning to alert the user.

Exiting on a chown() failure caused problems when untarring
inside a restricted user namespace on Linux where the uid/gid
mappings were limited.
2015-04-21 09:17:26 +01:00
sin
b9d60bee87 Move mkdirp() to libutil 2015-04-20 18:04:08 +01:00
Hiltjo Posthuma
9c03736696 tar: error if strdup fails (estrdup) 2015-04-20 17:32:54 +01:00
sin
71eeb21feb Use strtol() instead of strtoul() in tar(1) 2015-04-20 16:36:03 +01:00
sin
97905f6991 Fix tar(1) handling of archives with improper internal order
Not all archives are packed in such a way that they can be extracted without
having to recursively create the output path.

For now, reuse the function from mkdir.c and later move it to
libutil.
2015-04-20 16:36:03 +01:00
sin
3ef6d4e4c9 Fix tar(1) handling of <space> terminated fields
Numeric fields can be <space> terminated.  Ensure those are
patched with NULs so we can perform string operations.

There is more work to be done in this area, namely some fields like
name, linkname and prefix are not always null-terminated.
2015-04-20 16:36:03 +01:00
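A sketch of parsing a numeric header field that may be terminated by a space, a NUL, or nothing at all. The function name parse_octal() and the copy into a local buffer are assumptions for illustration. The commit itself only says the fields are patched with NULs:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse an octal header field of len bytes. Trailing spaces and NULs
 * are patched to NUL in a private copy before calling strtol().
 * Returns the value, or -1 if the field is empty or malformed.
 */
long
parse_octal(const char *field, size_t len)
{
	char buf[32], *end;
	long val;

	assert(field != NULL);

	if (len >= sizeof(buf))
		return -1;
	memcpy(buf, field, len);
	buf[len] = '\0';

	/* strip <space> and NUL terminators from the end */
	while (len > 0 && (buf[len - 1] == ' ' || buf[len - 1] == '\0'))
		buf[--len] = '\0';
	if (len == 0)
		return -1;

	errno = 0;
	val = strtol(buf, &end, 8);
	if (*end != '\0' || val < 0 || errno)
		return -1;	/* junk after the digits, negative, or overflow */
	return val;
}
```

Working on a copy also covers fields that fill their whole width without any terminator.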
FRIGN
7b2465c101 Add maxdepth to recurse()
This also makes more sense.
2015-04-20 11:12:40 +01:00
FRIGN
11e2d472bf Add *fshut() functions to properly flush file streams
This has been a known issue for a long time. Example:

printf "word" > /dev/full

wouldn't report there's not enough space on the device.
This is due to the fact that every libc has internal buffers
for stdout which store fragments of written data until they reach
a certain size or on some callback to flush them all at once to the
kernel.
You can force the libc to flush them with fflush(). In case flushing
fails, you can check the return value of fflush() and report an error.

However, previously, sbase didn't have such checks and without fflush(),
the libc silently flushes the buffers on exit without checking the errors.
No offense, but there's no way for the libc to report errors in the exit-
condition.

GNU coreutils solve this by having onexit-callbacks to handle the flushing
and report issues, but they have obvious deficiencies.
After long discussions on IRC, we came to the conclusion that checking the
return value of every io-function would be a bit too much, and having a
general-purpose fclose-wrapper would be the best way to go.

It turned out that fclose() alone is not enough to detect errors. The right
way to do it is to fflush() + check ferror on the fp and then do an fclose().
This is what fshut does and that's how it's done before each return.
The return value is obviously affected, reporting an error in case a flush
or close failed, but also when reading failed for some reason, the error-
state is caught.

The !!( ... + ...) construction is used to call all functions inside the
parentheses instead of stopping at the first one that fails.
We want errors to be reported, but there's no reason to stop flushing buffers
when one other file buffer has issues.
Obviously, the functional code comes before the flush and the ret-logic comes
after, to prevent early exits without reporting warnings if there are any.

One more advantage of fshut() is that it is even able to report errors
on obscure NFS-setups which the other coreutils are unable to detect,
because they only check the return-value of fflush() and fclose(),
not ferror() as well.
2015-04-05 09:13:56 +01:00
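The flush, check and close sequence described above can be sketched as follows. This is not the sbase implementation: fshut_sketch() and shut_all() are hypothetical names, and the messages go through plain fprintf() instead of the sbase error helpers:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Flush the stream, check its error flag, then close it.
 * Returns 0 on success, 1 if any step reported an error.
 */
int
fshut_sketch(FILE *fp, const char *name)
{
	int ret = 0;

	assert(fp != NULL);

	/* ferror() also catches earlier read or write failures */
	if (fflush(fp) == EOF || ferror(fp)) {
		fprintf(stderr, "write error: %s\n", name);
		ret = 1;
	}
	/* always close, but report only the first problem */
	if (fclose(fp) == EOF && !ret) {
		fprintf(stderr, "close error: %s\n", name);
		ret = 1;
	}
	return ret;
}

/*
 * The !!(a + b) idiom: '+' evaluates both operands, unlike '||',
 * so both streams are shut even if the first one fails.
 */
int
shut_all(FILE *a, const char *an, FILE *b, const char *bn)
{
	return !!(fshut_sketch(a, an) + fshut_sketch(b, bn));
}
```

On Linux, writing a few bytes to a stream opened on /dev/full and then calling fshut_sketch() is one way to see the error path, since the data only reaches the kernel during the flush.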
FRIGN
1f0f1dd320 Show usage() when filtermode is given for tar-creation
We only allow decompression for extraction. Thus, it may be confusing
for the user and break scripts silently when the j- or z-flag are given
even though this is not supported.
2015-03-21 14:04:49 +01:00
FRIGN
b6b977f63d Audit tar(1), add DIRFIRST-flag to recurse()
I've been wanting to do this for a while now, as tar(1) used to
be one of the messiest and cruftiest tools.
First off, before walking through the audit, I'll talk about
what the DIRFIRST-flag for recurse() does.
It basically calls fn() on the first-level-dir before calling
it on its subentries. It's necessary here, because otherwise the
order of the tar-files would've been wrong (it would try to create
dir/file before creating dir/).

Now, to the audit:
1)  Update manpage, fix mistake that compression is also available
    for compressing. It's only available for extracting.
2)  Define the major, minor and makedev macros from glibc by ourselves.
    No need to rely on them, as they are common sense.

decomp()
3)  Simple refactorization.

putoctal()
4)  Add a truncation check for snprintf().

archive()
5)  BUGFIX: Add checks to any checkable function, don't blindly call
    them, this is harmful and there are 100 ways to exploit that.
6)  Use estrlcpy() instead of snprintf() wherever possible, fix
    alignment.
7)  BUGFIX: Terminate the result-buffer of readlink(), check if
    it even succeeded.
8)  Fix sizeof()-formatting.

unarchive()
9)  BUGFIX: Add checks to any checkable function, don't blindly call
    them, this is harmful and there are 100 ways to exploit that.
10) BUGFIX: strtoul can happily return negative numbers. Add checks
    for that and also if the full string has been processed.
11) Remove calls to perror(). We have eprintf, use it.
12) BUGFIX: "minor = strtoul(h->mode, 0, 8);". We need h->minor of
    course.
13) Fix typo "usupported", remove fprintf-call.

print()
14) Check fread().

xt()
15) Get rid of snprintf-magic. Use estrlcat().
16) BUGFIX: check for ferror() on the tarfile.

usage()
17) Update it. The old usage() was like 1000 years old.

main()
18) Add DIRFIRST-flag to the recursor.
19) Don't print usage() when a mode is re-set. We allow this in
    general.
20) Add function checks and fix error messages.
21) Add tarfilename-global for proper error-messages.
2015-03-21 01:30:47 +01:00
FRIGN
3111908b03 Refactor recurse() again
Okay, why yet another recurse()-refactor?
The last one added the recursor-struct, which simplified things
on the user-end, but there was still one thing that bugged me a lot:
Previously, all fn()'s were forced to (l)stat the paths themselves.
This does not work well when you try to keep up with H-, L- and P-
flags at the same time, as each utility-function would have to set
the right function-pointer for (l)stat every single time.

This is not desirable. Furthermore, recurse should be easy to use
and not involve trouble finding the right (l)stat-function to do it
right.
So, what we needed was a stat-argument for each fn(), so it is
directly accessible. This was impossible to do though when the
fn()'s are still directly called by the programs to "start" the
recurse.
Thus, the fundamental change is to make recurse() the function to
go, while designing the fn()'s in a way they can "live" with st
being NULL (we don't want a null-pointer-deref).

What you can see in this commit is the result of this work. Why
all this trouble instead of using nftw?
The special thing about recurse() is that you tell the function
when to recurse() in your fn(). You don't need special flags to
tell nftw() to skip the subtree, just to give an example.

The only single downside to this is that now, you are not allowed
to unconditionally call recurse() from your fn(). It has to be
a directory.
However, that is a cost I think is easily outweighed by the
advantages.

Another thing is the history: I added a procedure at the end of
the outermost recurse to free the history. This way we don't leak
memory.

A simple optimization on the side:

-		if (h->dev == st.st_dev && h->ino == st.st_ino)
+		if (h->ino == st.st_ino && h->dev == st.st_dev)

First compare the likely difference in inode-numbers instead of
checking the unlikely condition that the device-numbers are
different.
2015-03-19 01:08:19 +01:00
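A sketch of a history list for loop detection with the inode-first comparison from the diff above. The dev and ino field names follow the diff. The struct layout, the prev pointer and the function names are assumptions and may not match libutil:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

struct history {
	struct history *prev;
	dev_t dev;
	ino_t ino;
};

/* Return 1 if (dev, ino) was already visited, 0 otherwise. */
int
visited(const struct history *h, dev_t dev, ino_t ino)
{
	for (; h; h = h->prev)
		/* inode numbers differ far more often, so test them first */
		if (h->ino == ino && h->dev == dev)
			return 1;
	return 0;
}

/* Push (dev, ino) onto the list. Returns 0 on success, -1 on failure. */
int
remember(struct history **head, dev_t dev, ino_t ino)
{
	struct history *h;

	assert(head != NULL);

	if (!(h = malloc(sizeof(*h))))
		return -1;
	h->prev = *head;
	h->dev = dev;
	h->ino = ino;
	*head = h;
	return 0;
}

/* Free the whole list, as done at the end of the outermost recurse. */
void
forget(struct history *h)
{
	struct history *prev;

	while (h) {
		prev = h->prev;
		free(h);
		h = prev;
	}
}
```

A directory walker would call visited() with the st_dev and st_ino of each directory before descending, and skip the directory when it returns 1.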
FRIGN
9fd4a745f8 Add history and config-struct to recurse
For loop detection, a history is mandatory. In the process of also
adding a flexible struct to recurse, the recurse-definition was moved
to fs.h.
The motivation behind the struct is to allow easy extensions to the
recurse-function without having to change the prototypes of all
functions in the process.
Adding flags is really simple as well now.

Using the recursor-struct, it's also easier to see which defaults
apply to a program (for instance, which type of follow, ...).

Another change was to add proper stat-lstat-usage in recurse. It
was wrong before.
2015-03-13 00:29:48 +01:00
FRIGN
01de5df8e6 Audit du(1) and refactor recurse()
While auditing du(1) I realized that there's no way the over 100 lines
of procedures in du() would pass the audit.
Instead, I decided to rewrite this section using recurse() from libutil.
However, the issue was that you'd need some kind of payload to count
the number of bytes in the subdirectories and use them in the higher
hierarchies.
The solution is to add a "void *data" data pointer to each recurse-
function-prototype, which we might also be able to use in other
recurse-applications.
recurse() itself had to be augmented with a recurse_samedev-flag, which
basically prevents recurse from leaving the current device.

Now, let's take a closer look at the audit:
1) Removing the now unnecessary util-functions push, pop, xrealpath,
   rename print() to printpath(), localize some global variables.
2) Only pass the block count to nblks instead of the entire stat-
   pointer.
3) Fix estrtonum to use the minimum of LLONG_MAX and SIZE_MAX.
4) Use idiomatic argv+argc-loop
5) Report proper exit-status.
2015-03-11 23:21:52 +01:00
Hiltjo Posthuma
066a0306a1 fork: no need to _exit() on the error case 2015-03-10 20:05:18 +01:00
FRIGN
a8bd21c0ab Use switch with fork()
Allows dropping a local variable if the explicit PID is not needed
and it makes it clearer what happens.
Also, one should always strive for consistency for cases like these.
2015-03-09 15:01:29 +01:00
FRIGN
6f207dac5f Don't return but _exit after failed exec*() and fork()
Quoting POSIX[0]:
"Care should be taken, also, to call _exit() rather than exit() if exec cannot be used, since
exit() flushes and closes standard I/O channels, thereby damaging the parent process' standard
I/O data structures. (Even with fork(), it is wrong to call exit(), since buffered data would
then be flushed twice.)"

[0]: http://pubs.opengroup.org/onlinepubs/009695399/functions/vfork.html
2015-03-09 01:12:59 +01:00
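A minimal sketch of the pattern, written with the switch style on fork(). The helper name spawn_wait() and the exit code 127 for a failed exec are assumptions for illustration, not taken from sbase:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with its arguments and wait for it.
 * Returns the child's exit status, or -1 on fork/wait failure.
 */
int
spawn_wait(char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);

	switch ((pid = fork())) {
	case -1:
		return -1;
	case 0:
		execvp(argv[0], argv);
		/*
		 * Only reached if exec failed. _exit() leaves without
		 * flushing stdio buffers inherited from the parent, so
		 * buffered data is not written twice.
		 */
		_exit(127);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

The pid variable is kept here only because waitpid() needs it. When the parent does not need the child's PID, switch (fork()) alone is enough.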
FRIGN
c8f2b068f6 Fix segmentation fault in tar(1) 2015-03-03 11:26:59 +01:00
FRIGN
8dc92fbd6c Refactor enmasse() and recurse() to reflect depth
The HLP-changes to sbase have been a great addition of functionality,
but they kind of "polluted" the enmasse() and recurse() prototypes.
As this will come in handy in the future, knowing at which "depth"
you are inside a recursing function is an important functionality.

Instead of having a special HLP-flag passed to enmasse, each sub-
function needs to provide it on its own and can calculate results
based on the current depth (for instance, 'H' implies 'P' at
depth > 0).
A special case is recurse(), because it actually depends on the
follow-type. A new flag "recurse_follow" brings consistency into
what used to be spread across different naming conventions (fflag,
HLP_flag, ...).

This also fixes numerous bugs with the behaviour of HLP in the
tools using it.
2015-03-02 22:50:38 +01:00