65 Commits

Author SHA1 Message Date
FRIGN
eb9bda8787 Support NUL-containing lines in sort(1)
For sort(1) we need memmem(), which I imported from OpenBSD.
Inside sort(1), the changes involved working with the explicit lengths
given by getlines() earlier and rewriting some of the functions.

Now we can handle NUL-characters in the input just fine.
2016-03-10 08:48:09 +00:00
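To illustrate why explicit lengths matter here, the following is a minimal sketch, not the code from this commit. The `struct line` type and the `linecmp_len()` name are hypothetical. It compares two lines with memcmp() over their known lengths, so a NUL byte no longer ends the comparison the way it would with strcmp().

```c
#include <string.h>

/* Hypothetical line type: data plus explicit length (an assumption,
 * not the structure used in sbase). */
struct line {
	const char *data;
	size_t len;
};

/* Compare two lines byte-wise, NUL bytes included. */
int
linecmp_len(const struct line *a, const struct line *b)
{
	size_t n = a->len < b->len ? a->len : b->len;
	int r = memcmp(a->data, b->data, n);

	if (r)
		return r;
	/* common prefix is equal: the shorter line sorts first */
	return (a->len > b->len) - (a->len < b->len);
}
```

With strcmp(), the inputs "a\0b" and "a\0c" would compare equal, because both look like "a"; with explicit lengths they are ordered by the byte after the NUL.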
pekka.jylha.ollila@gmail.com
fad1d35357 Add -d, -f and -i flags to sort(1)
Here's the patch with updated manpage and usage().
2016-02-16 09:56:48 +00:00
sin
2366164de7 No need for semicolon after ARGEND
This is also the style used in Plan 9.
2015-11-01 10:18:55 +00:00
FRIGN
51390a3c51 Audit sort(1) and mark it as finished
1) Remove the function prototypes. No need for them, as the
   functions are ordered.
2) Add fieldseplen, so the length of the field-separator is not
   calculated nearly each time skipcolumn() is called.
3) Rename next_col to skip_to_next_col so the purpose is clear,
   also reorder the conditional accordingly.
4) Put parentheses around certain ternary expressions.
5) BUGFIX: Don't just exit() in check(), but make it return something,
   so we can cleanly fshut() everything.
6) OFF-POSIX: POSIX for no apparent reason does not allow more than
   one file when the -c or -C flags are given.
   This can be problematic when you want to check multiple files.
   With the change 5), rewriting check() to return a value, I went
   off-posix after discussing this with Dimitris to just allow
   arbitrary numbers of files. Obviously, this does not break scripts
   and is convenient for everybody who wants to quickly check a big
   amount of files.
   As soon as 1 file is "unsorted", the return value is 1, as expected.
   For convenience reasons, check()'s warning now includes the filename.
7) BUGFIX: Set ret to 2 instead of 1 when the fshut(fp, *argv) fails.
8) BUGFIX: Don't forget to fshut stderr at the end. This would improperly
   return 1 in the following case:
   $ sort -c unsorted_file 2> /dev/full
9) Other style changes, line length, empty line before return.
2015-08-04 12:08:13 +01:00
FRIGN
e153447657 Make sort(1) utf-compliant and update README
Make it clear that <blank> characters just are spaces or tabs and
not a special group which needs special treatment for wide characters.

Also, and that was the only problem here, correctly calculate the
offset given by the key definitions for the start- and end-characters
using libutf-utility-functions.

Mark the progress in the README and put parentheses around the missing
flags which are insane to implement for no real gain.
2015-08-03 19:14:52 +01:00
FRIGN
1622089a21 Reorder functions in sort(1)
I kind of missed that the sorting was still not properly done.
parse_flags() and addkeydef() are independent of everything else,
so they can be put at the bottom.
Sorting the other functions reveals the true hierarchy much better.
2015-08-03 10:00:00 +01:00
FRIGN
61ee561728 Factor out parse_keydef() into addkeydef() and reorder functions
Add a small comment explaining the data-structure and sort the
functions according to usage, not alphabetically.
2015-08-03 10:00:00 +01:00
FRIGN
e00cdf226a Use queue.h-macros in sort(1)
This is much easier to read than having yet another handrolled
list implementation.
Tested and more or less clearly equivalent.

Now that I have uni-vac, I'll have enough time to refactor more.
2015-08-02 23:32:17 +01:00
FRIGN
d23cc72490 Simplify return & fshut() logic
Get rid of the !!()-constructs and use ret where available (or introduce it).

In some cases, there would be an "abort" on the first fshut-error, but we want
to close all files and report all warnings and then quit, not just the warning
for the first file.
2015-05-26 16:41:43 +01:00
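The "close everything, then report" idea can be sketched roughly as follows. This is an illustration under assumptions, not the sbase code: `close_all()` is a hypothetical helper, and plain fclose() stands in for fshut(). The point is that a failure is recorded in `ret` while the loop keeps going, so every file is closed and every warning is printed.

```c
#include <stdio.h>

/*
 * Hypothetical helper: close every stream and remember whether any
 * close failed, instead of bailing out on the first error.
 */
int
close_all(FILE **fps, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (fclose(fps[i]) == EOF) {
			perror("fclose");
			ret = 1; /* note the failure, keep closing the rest */
		}
	}
	return ret;
}
```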
FRIGN
9a074144c9 Remove handrolled strcmp()'s
Favor readability over bare-metal.
2015-05-21 15:43:38 +01:00
FRIGN
0545d32ce9 Handle '-' consistently
In general, POSIX does not define /dev/std{in, out, err} because it
does not want to depend on the dev-filesystem.
For utilities, it thus introduced the '-'-keyword to denote standard
input (and output in some cases) and the programs have to deal with
it accordingly.

Sadly, the design of many tools doesn't allow strict shell-redirections
and many scripts don't even use this feature when possible.

Thus, we made the decision to implement it consistently across all
tools where it makes sense (namely those which read files).

Along the way, I spotted some behavioural bugs in libutil/crypt.c and
others where it was forgotten to fshut the files after use.
2015-05-16 13:34:00 +01:00
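A minimal sketch of the '-' convention described above, assuming a hypothetical helper named `open_input()` (sbase's real tools handle this inline and differ in detail):

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: treat the operand "-" as standard input and
 * everything else as a path to open for reading.
 * Returns NULL if the file cannot be opened.
 */
FILE *
open_input(const char *path)
{
	if (!strcmp(path, "-"))
		return stdin;
	return fopen(path, "r");
}
```

A caller would have to avoid closing stdin twice when "-" is given more than once; that bookkeeping is left out here.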
Hiltjo Posthuma
72250324b1 sort: reuse buffer in columns()
speeds up sorting for huge input as well.
2015-05-07 18:18:35 +01:00
Jakob Kramer
403b047a30 sort: allow keys where start_col > end_col
Useful in (rare) cases like:

	$ printf 'aaaa c\nx a\n0 b\n' | sort -k 2,1.3

And this is how POSIX wants it.
2015-04-06 17:15:54 +01:00
Jakob Kramer
061932a31b sort: allow 0 as key's end_char 2015-04-06 17:15:54 +01:00
Jakob Kramer
bddb7200b8 sort: apply -b only to "custom" keys 2015-04-06 17:15:54 +01:00
Jakob Kramer
2d9d224a1b sort: add support for delimiter strings
Instead of just single characters.  This also fixes
some bugs in columns().  Example bug:

	$ printf "a b\nc b x\n" | sort -k 2,2 -k 1,1
2015-04-06 17:15:54 +01:00
FRIGN
11e2d472bf Add *fshut() functions to properly flush file streams
This has been a known issue for a long time. Example:

printf "word" > /dev/full

wouldn't report there's not enough space on the device.
This is due to the fact that every libc has internal buffers
for stdout which store fragments of written data until they reach
a certain size or on some callback to flush them all at once to the
kernel.
You can force the libc to flush them with fflush(). In case flushing
fails, you can check the return value of fflush() and report an error.

However, previously, sbase didn't have such checks and without fflush(),
the libc silently flushes the buffers on exit without checking the errors.
No offense, but there's no way for the libc to report errors in the exit-
condition.

GNU coreutils solve this by having onexit-callbacks to handle the flushing
and report issues, but they have obvious deficiencies.
After long discussions on IRC, we came to the conclusion that checking the
return value of every io-function would be a bit too much, and having a
general-purpose fclose-wrapper would be the best way to go.

It turned out that fclose() alone is not enough to detect errors. The right
way to do it is to fflush() + check ferror on the fp and then do an fclose().
This is what fshut does and that's how it's done before each return.
The return value is obviously affected, reporting an error in case a flush
or close failed, but also when reading failed for some reason, the error-
state is caught.

The !!( ... + ...) construction is used to call all functions inside the
brackets and not "terminating" on the first.
We want errors to be reported, but there's no reason to stop flushing buffers
when one other file buffer has issues.
Obviously, functionales come before the flush and ret-logic comes after to
prevent early exits as well without reporting warnings if there are any.

One more advantage of fshut() is that it is even able to report errors
on obscure NFS-setups which the other coreutils are unable to detect,
because they only check the return-value of fflush() and fclose(),
not ferror() as well.
2015-04-05 09:13:56 +01:00
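The flush, check, close sequence described in this commit message can be sketched as below. This is an assumption-laden illustration, not the actual fshut() from sbase: the name `shut_stream()`, its return convention and its error message are all hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the fshut() idea: returns 0 on success and
 * 1 if flushing failed, the stream carries an error flag, or closing
 * failed. All three steps always run.
 */
int
shut_stream(FILE *fp, const char *name)
{
	int ret = 0;

	/* push buffered data to the kernel so write errors surface now */
	if (fflush(fp) == EOF)
		ret = 1;
	/* catch errors left over from earlier reads or writes */
	if (ferror(fp))
		ret = 1;
	if (fclose(fp) == EOF)
		ret = 1;
	if (ret)
		fprintf(stderr, "%s: I/O error\n", name);
	return ret;
}
```

On Linux, calling this on a stream opened on /dev/full after writing to it should return 1, which is the case the commit message uses as its example.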
FRIGN
9144d51594 Check getline()-return-values properly
It's not useful when 0 is returned anyway, so be sure that we have a
string with length > 0, this also solves some indexing-gotchas like
"len - 1" and so on.
Also, add checked getline()'s whenever it has been forgotten and
clean up the error-messages.
2015-03-27 14:49:48 +01:00
FRIGN
df8529f0a1 Fix syntax error in sort(1)
Somehow went unnoticed...
2015-03-23 20:30:07 +01:00
FRIGN
49e27c1b0c Add -m and -o flags to sort(1)
Sort comes pretty much automatically: as no script relies on the
undefined behaviour of the input _not_ being sorted, we might as well
sort the already sorted input.
The only downside is memory usage, which can be an issue for large
files.
The o-flag was trivial to implement.
2015-03-22 23:39:48 +01:00
Hiltjo Posthuma
ad6776e9a1 grep, kill, renice, sort: style: put main at bottom 2015-03-08 12:51:33 +01:00
Hiltjo Posthuma
31f0624f3d code-style: minor cleanup and nitpicking 2015-02-20 13:29:38 +01:00
FRIGN
31572c8b0e Clean up #includes 2015-02-14 21:12:23 +01:00
Jakob Kramer
0fcad66c75 make use of en*alloc functions 2015-02-11 01:17:21 +00:00
Jakob Kramer
4769b47dd7 Use size_t for number of lines in linebuf
.nlines and .capacity are used as array indices and
should therefore be of type size_t.
2015-01-31 22:49:43 +00:00
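As a rough illustration of the point about array indices, here is a sketch of a growable line buffer whose counters are size_t. Only the names linebuf, .nlines and .capacity come from the commit message; the `lines` member, the `addline()` function and the growth policy are assumptions.

```c
#include <stdlib.h>

/* Sketch of a line buffer; counters are size_t because they index
 * the array. The lines member is an assumed name. */
struct linebuf {
	const char **lines;
	size_t nlines;
	size_t capacity;
};

/* Hypothetical append: doubles the capacity when full.
 * Returns 0 on success, -1 if allocation fails. */
int
addline(struct linebuf *b, const char *line)
{
	if (b->nlines == b->capacity) {
		size_t ncap = b->capacity ? b->capacity * 2 : 16;
		const char **p = realloc(b->lines, ncap * sizeof(*p));

		if (!p)
			return -1;
		b->lines = p;
		b->capacity = ncap;
	}
	b->lines[b->nlines++] = line;
	return 0;
}
```

The sketch does not guard against overflow in `ncap * sizeof(*p)`, which a real implementation would need to consider.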
Jakob Kramer
572ad27110 sort: support sorting decimal numbers correctly
sorry not to have used strtold from the beginning
2015-01-31 19:19:55 +00:00
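A minimal sketch of numeric comparison via strtold(), which is the function the commit message names. The `numcmp()` wrapper is hypothetical and ignores field extraction and error handling.

```c
#include <stdlib.h>

/*
 * Hypothetical numeric comparison: parse both strings as long double
 * and order them by value. Unparsable input counts as 0.
 */
int
numcmp(const char *a, const char *b)
{
	long double x = strtold(a, NULL);
	long double y = strtold(b, NULL);

	return (x > y) - (x < y);
}
```

An integer-only parse would see "1.5" and "1.25" as the same value, 1; parsing as a floating-point number keeps the fractional part in the comparison.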
sin
153b8428b1 Nuke another freelist() 2014-12-16 21:02:03 +00:00
Michael Forney
cb427d553a sort: Implement -c and -C flags 2014-11-23 19:42:14 +00:00
FRIGN
1436518f9d Use < 0 instead of == -1 2014-11-19 20:09:29 +00:00
FRIGN
7fc5856e64 Tweak NULL-pointer checks
Use !p and p when comparing pointers as opposed to explicit
checks against NULL.  This is generally easier to read.
2014-11-14 10:54:30 +00:00
FRIGN
ec8246bbc6 Un-boolify sbase
It actually makes the binaries smaller, the code easier to read
(gems like "val == true", "val == false" are gone) and actually
predictable in the sense of that we actually know what we're
working with (one bitwise operator was quite adventurous and
should now be fixed).

This is also more consistent with the other suckless projects
around which don't use boolean types.
2014-11-14 10:54:20 +00:00
FRIGN
eee98ed3a4 Fix coding style
It was about damn time. Consistency is very important in such a
big codebase.
2014-11-13 18:08:43 +00:00
sin
0c5b7b9155 Stop using EXIT_{SUCCESS,FAILURE} 2014-10-02 23:46:59 +01:00
sin
b712ef44ad Fix warning 'array subscript of type char' 2014-09-02 13:32:32 +01:00
Jakob Kramer
7d1fd2621e add -t flag to sort 2014-06-02 13:35:59 +01:00
Jakob Kramer
9366f48b1f sort: simplify linecmp, rename curr => tail 2014-05-06 18:01:44 +01:00
Jakob Kramer
6f7e9a5078 sort: add support for "per-keydef" flags 2014-05-06 16:21:50 +01:00
Jakob Kramer
109e8963f5 sort: ignore trailing newline while sorting 2014-05-06 16:21:45 +01:00
Jakob Kramer
0723c8d32e sort: work with signed integers as well 2014-05-06 16:21:39 +01:00
Jakob Kramer
814b04e710 sort: document -b 2014-05-04 00:16:24 +01:00
Jakob Kramer
a2da9edb99 sort: simplify skip_columns 2014-05-04 00:15:57 +01:00
Jakob Kramer
d965985a52 sort: add -b flag; don't use it as default 2014-05-04 00:15:46 +01:00
Jakob Kramer
a62a2197a8 sort: don't evaluate if clause
this fixes a bug where an invalid key definition
like "-k 1.2.3" was accepted.
2014-05-04 00:15:33 +01:00
Jakob Kramer
e535e8d88a sort: replace loop with MIN() 2014-05-04 00:15:23 +01:00
Jakob Kramer
56e1616486 sort: remove 'rest' variable 2014-05-04 00:15:10 +01:00
Jakob Kramer
56b9a26de9 sort: don't repeat skipping columns logic 2014-05-04 00:14:58 +01:00
Jakob Kramer
c4e5354a32 sort: linebuf is no global 2014-05-04 00:14:19 +01:00
Jakob Kramer
0bc6b1377b sort: readability; check strndup return value 2014-04-30 15:17:01 +01:00
Hiltjo Posthuma
a1b62b2282 sort: style, explicitly state type
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-04-23 15:25:21 +01:00
sin
1d5663672e Minor style changes to sort 2014-04-18 17:24:59 +01:00