Section 4.3. Sorting Files: sort



The sort utility sorts a file in ascending or descending order based on one or more sort fields, and works as described in Figure 4-7.

Figure 4-7. Description of the sort command.

Utility: sort -tc -r { sortField -bfMn }* { fileName }*

sort is a utility that sorts lines in one or more files based on a sorting criterion. By default, lines are sorted into ascending order. The -r option specifies descending order instead. Input lines are split into fields separated by spaces and/or tabs. To specify a different character for the field separator, use the -t option. By default, all of a line's fields are considered when the sort is being performed. This may be overridden by specifying one or more sort fields, whose format is described later in this section. Individual sort fields may be customized by following them with one or more options. The -f option causes sort to ignore the case of the field. The -M option sorts the field in month order. The -n option sorts the field in numeric order. The -b option ignores leading blanks in the field.
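As a quick illustration of the -f and -r options described in the figure, here is a minimal sketch on four one-letter lines. It assumes the C locale (set through LC_ALL) so that comparisons follow byte values; the variable names are only illustrative.

```shell
# Assumption: LC_ALL=C forces byte-value ordering so the results are repeatable.
export LC_ALL=C

# Default ordering: every uppercase letter sorts before any lowercase letter.
default_order=$(printf 'b\nA\na\nB\n' | sort | paste -s -d ' ' -)

# -f ignores case while comparing, so "A" and "a" end up next to each other.
folded_order=$(printf 'b\nA\na\nB\n' | sort -f | paste -s -d ' ' -)

# -r reverses the default ordering.
reversed_order=$(printf 'b\nA\na\nB\n' | sort -r | paste -s -d ' ' -)

echo "default: $default_order"
echo "folded:  $folded_order"
echo "reverse: $reversed_order"
```

With -f, the two spellings of a letter compare as equal on the key, so their relative order is decided by whatever tie-breaking rule the local sort applies.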




Individual fields are ordered lexicographically, which means that corresponding characters are compared based on their ASCII value (see man ascii for a list of all characters and their corresponding values). Two consequences of this are that an uppercase letter is "less" than its lowercase equivalent, and a space is "less" than a letter. In the following example, I sorted a text file in ascending order and descending order using the default ordering rule:

$ cat sortfile          ...list the file to be sorted.
jan  Start chapter 3  10th
Jan  Start chapter 1  30th
 Jan  Start chapter 5  23rd
 Jan  End chapter 3  23rd
Mar  Start chapter 7  27
 may  End chapter 7  17th
Apr  End Chapter 5  1
 Feb  End chapter 1  14
$ sort sortfile         ...sort it.
 Feb  End chapter 1  14
 Jan  End chapter 3  23rd
 Jan  Start chapter 5  23rd
 may  End chapter 7  17th
Apr  End Chapter 5  1
Jan  Start chapter 1  30th
Mar  Start chapter 7  27
jan  Start chapter 3  10th
$ sort -r sortfile      ...sort it in reverse order.
jan  Start chapter 3  10th
Mar  Start chapter 7  27
Jan  Start chapter 1  30th
Apr  End Chapter 5  1
 may  End chapter 7  17th
 Jan  Start chapter 5  23rd
 Jan  End chapter 3  23rd
 Feb  End chapter 1  14
$ _
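To reproduce this session, the sample file can be recreated with a here-document. The sketch below assumes the C locale (set through LC_ALL), because many current systems default to a locale whose collation does not follow plain ASCII values. The temporary file and variable names are hypothetical.

```shell
# Assumption: LC_ALL=C makes sort compare raw byte values (the ASCII order
# described above); other locales may order these lines differently.
export LC_ALL=C

# Hypothetical scratch copy of the sample file; four lines start with a space.
sortfile=$(mktemp)
cat > "$sortfile" <<'EOF'
jan  Start chapter 3  10th
Jan  Start chapter 1  30th
 Jan  Start chapter 5  23rd
 Jan  End chapter 3  23rd
Mar  Start chapter 7  27
 may  End chapter 7  17th
Apr  End Chapter 5  1
 Feb  End chapter 1  14
EOF

ascending=$(sort "$sortfile")      # space < uppercase < lowercase
descending=$(sort -r "$sortfile")  # the same comparison, reversed

printf '%s\n' "$ascending" | head -n 1    # first line in ascending order
printf '%s\n' "$descending" | head -n 1   # first line in descending order

rm -f "$sortfile"
```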


To sort on a particular field, you must specify the starting field number using a + prefix, followed by the noninclusive stop field number using a - prefix. Field numbers start at index 0. If you leave off the stop field number, all fields following the start field are included. In the next example, I sorted the same text file on the first field only, which is number zero:

$ sort +0 -1 sortfile        ...sort on first field only.
 Feb  End chapter 1  14
 Jan  End chapter 3  23rd
 Jan  Start chapter 5  23rd
 may  End chapter 7  17th
Apr  End Chapter 5  1
Jan  Start chapter 1  30th
Mar  Start chapter 7  27
jan  Start chapter 3  10th
$ _
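The +start -stop notation is the older, zero-based way of naming a sort field, and some current versions of sort (GNU coreutils among them) may not accept it by default. The POSIX -k option expresses the same key with field numbers that start at 1 and an inclusive stop field, so +0 -1 corresponds to -k 1,1. The following is a minimal sketch of that equivalent, assuming the C locale and a hypothetical scratch copy of the sample file:

```shell
# Assumption: LC_ALL=C so that keys are compared by byte value.
export LC_ALL=C

# Hypothetical scratch copy of the sample file used in this section.
sortfile=$(mktemp)
cat > "$sortfile" <<'EOF'
jan  Start chapter 3  10th
Jan  Start chapter 1  30th
 Jan  Start chapter 5  23rd
 Jan  End chapter 3  23rd
Mar  Start chapter 7  27
 may  End chapter 7  17th
Apr  End Chapter 5  1
 Feb  End chapter 1  14
EOF

# -k 1,1 : the key starts at field 1 and stops at field 1 (1-based, inclusive),
# which matches the older "+0 -1" (0-based start, exclusive stop).
# Without -b, a leading space still counts as part of the first field.
by_first_field=$(sort -k 1,1 "$sortfile")
printf '%s\n' "$by_first_field"

rm -f "$sortfile"
```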


Note that the leading spaces were counted as being part of the first field, which resulted in a strange sorting sequence. Additionally, I would have preferred the months to be sorted in calendar order, with "Jan" before "Feb", and so on. The -b option ignores leading blanks and the -M option sorts a field in month order. Here's an example that worked better:

$ sort +0 -1 -bM sortfile       ...sort on first month.
 Jan  End chapter 3  23rd
 Jan  Start chapter 5  23rd
Jan  Start chapter 1  30th
jan  Start chapter 3  10th
 Feb  End chapter 1  14
Mar  Start chapter 7  27
Apr  End Chapter 5  1
 may  End chapter 7  17th
$ _
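In -k notation, the per-field options are appended to the field specifier, so the month sort above might be written as -k 1,1bM. This sketch assumes a sort that supports -M (GNU and BSD versions do; POSIX does not require it) and the C locale, where month abbreviations are the English ones.

```shell
# Assumption: LC_ALL=C, so month names are the English abbreviations.
export LC_ALL=C

# Hypothetical scratch copy of the sample file used in this section.
sortfile=$(mktemp)
cat > "$sortfile" <<'EOF'
jan  Start chapter 3  10th
Jan  Start chapter 1  30th
 Jan  Start chapter 5  23rd
 Jan  End chapter 3  23rd
Mar  Start chapter 7  27
 may  End chapter 7  17th
Apr  End Chapter 5  1
 Feb  End chapter 1  14
EOF

# -k 1,1bM : key is field 1 only; b ignores leading blanks and
# M compares month abbreviations in calendar order, ignoring case.
by_month=$(sort -k 1,1bM "$sortfile")
printf '%s\n' "$by_month"

rm -f "$sortfile"
```

The four January lines compare as equal on this key, so their relative order depends on how the local sort breaks ties.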


The example text file was correctly sorted by month, but the dates were still out of order. You may specify multiple sort fields on the command line to deal with this problem. The sort utility first sorts all of the lines based on the first sort specifier, and then uses the second sort specifier to order lines that compare equal under the first specifier. Therefore, to sort the example text file by month and date, it had to be sorted based on the first field and then the fifth. In addition, the fifth field had to be sorted numerically by using the -n option.

$ sort +0 -1 -bM +4 -n sortfile
jan  Start chapter 3  10th
 Jan  End chapter 3  23rd
 Jan  Start chapter 5  23rd
Jan  Start chapter 1  30th
 Feb  End chapter 1  14
Mar  Start chapter 7  27
Apr  End Chapter 5  1
 may  End chapter 7  17th
$ _
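The two-key sort can be sketched in -k notation as well: +0 -1 -bM becomes -k 1,1bM, and +4 -n (field index 4 through the end of the line) becomes -k 5n. As before, the C locale, -M support, and the scratch file name are assumptions of this example.

```shell
# Assumption: LC_ALL=C, and a sort that supports the -M month ordering.
export LC_ALL=C

# Hypothetical scratch copy of the sample file used in this section.
sortfile=$(mktemp)
cat > "$sortfile" <<'EOF'
jan  Start chapter 3  10th
Jan  Start chapter 1  30th
 Jan  Start chapter 5  23rd
 Jan  End chapter 3  23rd
Mar  Start chapter 7  27
 may  End chapter 7  17th
Apr  End Chapter 5  1
 Feb  End chapter 1  14
EOF

# First key: field 1 in month order. Second key: field 5 onward, numeric.
# -n reads the leading digits of "10th" or "23rd" and ignores the suffix.
by_month_then_day=$(sort -k 1,1bM -k 5n "$sortfile")
printf '%s\n' "$by_month_then_day"

rm -f "$sortfile"
```

The second key only matters for lines whose months compare equal, which is why the January lines come out in the order 10th, 23rd, 23rd, 30th while the remaining months keep their own dates.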


Characters other than spaces often delimit fields. For example, the "/etc/passwd" file contains user information stored in fields separated by colons. You may use the -t option to specify an alternative field separator. In the following example, I sorted a file based on fields separated by : characters.



$ cat sortfile2           ...look at the test file.
jan:Start chapter 3:10th
Jan:Start chapter 1:30th
Jan:Start chapter 5:23rd
Jan:End chapter 3:23rd
Mar:Start chapter 7:27
may:End chapter 7:17th
Apr:End Chapter 5:1
Feb:End chapter 1:14
$ sort -t: +0 -1 -bM +2 -n sortfile2      ...colon delimiters.
jan:Start chapter 3:10th
Jan:End chapter 3:23rd
Jan:Start chapter 5:23rd
Jan:Start chapter 1:30th
Feb:End chapter 1:14
Mar:Start chapter 7:27
Apr:End Chapter 5:1
may:End chapter 7:17th
$ _
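The colon-delimited sort might look like this in -k notation, where +2 -n becomes -k 3n. This is a sketch under the same assumptions as the earlier ones (C locale, a sort with -M support, and a hypothetical scratch file standing in for sortfile2).

```shell
# Assumption: LC_ALL=C, and a sort that supports the -M month ordering.
export LC_ALL=C

# Hypothetical scratch copy of the colon-delimited sample file.
sortfile2=$(mktemp)
cat > "$sortfile2" <<'EOF'
jan:Start chapter 3:10th
Jan:Start chapter 1:30th
Jan:Start chapter 5:23rd
Jan:End chapter 3:23rd
Mar:Start chapter 7:27
may:End chapter 7:17th
Apr:End Chapter 5:1
Feb:End chapter 1:14
EOF

# -t : makes the colon the field separator, so "Start chapter 3" is one field.
# First key: field 1 in month order. Second key: field 3 onward, numeric.
by_month_then_day=$(sort -t : -k 1,1bM -k 3n "$sortfile2")
printf '%s\n' "$by_month_then_day"

rm -f "$sortfile2"
```

The same pattern applies to "/etc/passwd": for example, sort -t : -k 3,3n /etc/passwd would order the entries by the numeric user ID stored in the third field.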


sort has several other options that are too detailed to describe here; I suggest that you use the man utility to find out more about them.




Linux for Programmers and Users
ISBN: 0131857487
Year: 2007
Pages: 339
