Subject: sorting with -u and -k
From: Joost Helberg <joost at s n o w . n l>
To: mike at g n u . o r g, eggert at t w i n s u n . c o m
Date: Thu, 24 Jan 2008 13:50:56 +0100 (CET)
X-Mailer: Mew version 5.2 on Emacs 23.0.0 / Mule 6.0 (HANACHIRUSATO)
Hi,
I'm not sure whether I have tripped over a bug in sort or over a mistaken
expectation of its behaviour.
In coreutils-6.9, sort.c contains (from line 2078 onwards):
    /* If uniquified output is turned on, output only the first of
       an identical series of lines. */
    if (unique)
      {
        if (savedline && compare (savedline, smallest))
          {
            savedline = NULL;
            write_bytes (saved.text, saved.length, ofp, output_file);
          }
        if (!savedline)
          {
            ...
The function `compare' is called to decide whether a line is output
when -u is used. When -k is used, `compare' compares the keys first.
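Stripped of sort.c's buffers, the -u rule in the excerpt amounts to the
loop below. This is a minimal, hypothetical sketch (`unique_first' and
`line_cmp' are invented names, not sort.c's). It assumes the lines are
already sorted with the same comparison function and keeps only the first
line of each run that the function reports as equal. So whatever `compare'
treats as equal is exactly what -u collapses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical comparison callback type, strcmp-like. */
typedef int (*line_cmp)(const char *, const char *);

/* Sketch of the -u output rule: `lines' must already be sorted with `cmp'.
   Only the first line of each run that `cmp' reports as equal is stored in
   `out' (which needs room for `n' entries).  Returns the number kept. */
static size_t unique_first(const char *const *lines, size_t n,
                           line_cmp cmp, const char **out)
{
    assert(out != NULL || n == 0);
    size_t kept = 0;
    const char *saved = NULL;        /* first line of the current run */
    for (size_t i = 0; i < n; i++) {
        if (saved != NULL && cmp(saved, lines[i]) != 0) {
            out[kept++] = saved;     /* run ended: emit its first line */
            saved = NULL;
        }
        if (saved == NULL)
            saved = lines[i];        /* start a new run */
    }
    if (saved != NULL)
        out[kept++] = saved;         /* flush the final run */
    return kept;
}
```

With strcmp as the comparison, the input a, a, b, c, c, c yields a, b, c.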
This means that, given the options '-t: -k2 -k3', the following lines
compare equal:
    foo:1:2
    oof:1:2
hence the oof line does not appear in the output when -u is used.
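The effect can be modelled in a few lines of C. This is a simplified,
hypothetical sketch (`key_start', `compare_keys' and `compare_lines' are
invented names). It handles only -t with keys that name a start field,
such as -k2 (the key then runs from that field to the end of the line). It
compares bytes with strcmp as in the C locale and ignores -b, -n, -r and
the other modifiers. It also assumes, from my reading of sort.c, that
compare() treats the key result as final when the keys differ or when -u
(or -s) is given, and otherwise falls back to comparing the whole lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Start of the key for `-t<tab> -k<field>' with no end field: the key runs
   from the start of field `field' (1-based) to the end of the line.
   A missing field yields an empty key. */
static const char *key_start(const char *line, char tab, int field)
{
    assert(field >= 1);
    const char *p = line;
    for (int i = 1; i < field; i++) {
        const char *sep = strchr(p, tab);
        if (sep == NULL)
            return p + strlen(p);   /* fewer fields: empty key */
        p = sep + 1;                /* skip the separator */
    }
    return p;
}

/* Compare two lines key by key, in the order the keys were given. */
static int compare_keys(const char *a, const char *b, char tab,
                        const int *fields, size_t nfields)
{
    for (size_t i = 0; i < nfields; i++) {
        int diff = strcmp(key_start(a, tab, fields[i]),
                          key_start(b, tab, fields[i]));
        if (diff != 0)
            return diff;
    }
    return 0;
}

/* Simplified model of the decision in compare() (assumption based on my
   reading of sort.c): if the keys differ, or -u is in effect, the key
   result is final; otherwise the whole lines decide. */
static int compare_lines(const char *a, const char *b, char tab,
                         const int *fields, size_t nfields, bool unique)
{
    int diff = compare_keys(a, b, tab, fields, nfields);
    if (diff != 0 || unique)
        return diff;
    return strcmp(a, b);            /* last-resort whole-line comparison */
}
```

With fields {2, 3} and tab ':', both lines have the keys "1:2" and "2".
So with unique set, foo:1:2 and oof:1:2 compare equal. Without it, the
whole-line fallback orders foo:1:2 before oof:1:2.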
This surprised me: I expected these two lines to compare as unequal.
I cannot find any explicit statement in the manual that this is the
intended behaviour (only one example is given there: -n).
Please consider my argument and let me know why I'm wrong (or not, of
course). I'm happy to write code that implements my expectations.
Many regards,
Joost, CTO Snow B.V.
--
Snow B.V. http://snow.nl Tel 0418-653333 Fax 0418-653666