- Awk uses the $ sign extensively in its own language, so it is common practice to enclose awk statements in single quotes ('....') to protect them from the shell.
- Awk's default "field separator" is any run of blanks, i.e. spaces and tabs.
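As a small sketch of the default field separator, the following uses a made-up input line (the sample text and variable names are assumptions for illustration only). The fields are separated by several spaces in one place and by a tab in another, yet awk splits them the same way:

```shell
# Made-up sample line: three spaces between the first two fields, a tab before the third.
sample=$(printf 'alpha   beta\tgamma')

# With the default separator, any run of spaces and tabs counts as one separator.
second=$(printf '%s\n' "$sample" | awk '{print $2}')
third=$(printf '%s\n' "$sample" | awk '{print $3}')

echo "second field: $second"
echo "third field:  $third"
```

This is why the columns of ls -l output, which are padded with a varying number of spaces, can be addressed by number without any extra options.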
root@linwarg:/etc# ls -l
total 1996
drwxr-xr-x 8 root root 4096 2007-04-16 10:03 acpi
-rw-r--r-- 1 root root 2077 2006-09-09 21:08 adduser.conf
-rw-r--r-- 1 root root 46 2007-05-26 15:24 adjtime
-rw-r--r-- 1 root root 50 2006-09-09 21:24 aliases
drwxr-xr-x 2 root root 8192 2007-05-26 16:01 alternatives
-rw-r--r-- 1 root root 395 2007-03-05 08:38 anacrontab
drwxr-xr-x 7 root root 4096 2007-04-16 10:05 apm
...
We see that the file size is always listed in the 5th column. So let's print just the sizes:
ls -l | awk '{print ($5)}'
This highlights some more aspects of the awk scripting language.
- The awk language encloses the statements to execute in curly braces.
- The "$5" is reminiscent of the shell's variable for the 5th command-line argument, but here, being between a pair of single quotes, it is interpreted by awk, not by the shell. As is immediately evident, the above command prints only the 5th column of every row of input.
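To make the quoting point concrete, here is a minimal sketch. The sample line and the positional parameters set with `set --` are made up for illustration. With single quotes awk receives `$5` untouched; with double quotes the shell substitutes its own 5th argument before awk ever runs:

```shell
# Made-up line in the shape of 'ls -l' output (illustrative only).
line='-rw-r--r-- 1 root root 2077 2006-09-09 21:08 adduser.conf'

# Give the shell five positional parameters; the 5th is the literal text $1.
set -- a b c d '$1'

# Single quotes: the shell leaves $5 alone, so awk sees {print $5}.
size=$(printf '%s\n' "$line" | awk '{print $5}')

# Double quotes: the shell replaces $5 with its own 5th argument,
# so awk actually sees {print $1} and prints the first field instead.
shell_expanded=$(printf '%s\n' "$line" | awk "{print $5}")

echo "single quotes: $size"
echo "double quotes: $shell_expanded"
```

The second result depends entirely on what the shell's own $5 happens to contain, which is exactly the kind of surprise the single quotes are there to prevent.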
The sizes can also be added up:
ls -l | awk '{TOTAL=TOTAL+$5} END {print (TOTAL)}'
The awk script consists of two separate statements. The second statement is prefixed by the END keyword. For each line of input, the script adds the value of the 5th field (implicitly converted to a number) to the variable TOTAL. After all lines of input have been read, it prints the value of TOTAL (implicitly converted to a text string). Note that awk initializes variables to zero/null values on first reference. The above example prints only a single line, showing the TOTAL obtained.
In fact, all awk commands take addresses (called patterns in awk's own documentation). The above examples both use a default (blank) address that matches all input lines. Addresses can be specified as simple regular expressions, or as complex conditions. A typical use would be to add up the 5th column only for files with something in common in the name, e.g. to add up all the files with names like "*txt", you can use:
ls -l | awk '/txt$/ {TOTAL=TOTAL+$5} END {print ("Total for txt files:", TOTAL)}'
The above line only executes the TOTAL=TOTAL+$5 statement when the expression /txt$/ is matched by the current input line. The expression is the "address", indicating when to execute the statement. Statements can be complex. For example, to print the list of files being added up, with the total at the end, we can add an extra step to the command like this:
ls -l | awk '/txt$/ {print;TOTAL+=$5} END {print ("Total for txt files:", TOTAL)}'
Note: I used the short form of the add instruction to keep the command from becoming too long. The print command, used with no arguments as in the above example, simply emits the complete input line. Awk also has a printf statement which can be used with great results, for example to simply re-format the input from the ls -l command:
ls -l | awk '{printf ("%-30s %20d %s %3d %8s %8s %s %s\n", $8, $5, $1, $2, $3, $4, $6, $7)}'
There are (probably many) better ways to do this, but this example shows some of the mathematical capability of awk:
echo 5 3 | awk '{printf ("%5.3f\n", $1/$2)}'
Awk has a few built-in variables which are updated automatically. The NF variable, for example, holds the number of fields on the current line. The NR variable contains the current input line (or record) number. Interestingly, the NF variable can be used in conjunction with a $, e.g. "$NF", to refer to the last field (or word) on a line, even when every line has a different number of fields, e.g.:
ls -l | awk '{print ($NF)}'
This can be taken further, for example to get the second-to-last field, one can use something like:
ls -l | awk '{print ($(NF-1))}'
Note the extra set of parentheses in the above example.
The awk language is rich enough that scripts are sometimes better placed in their own separate files, allowing the awk statements to be formatted with indentation, etc. This is especially useful when there are many complex statements, and it eliminates the problems sometimes experienced with the shell expanding/interpreting special characters in the awk script.
I will in the future do a follow-up article on awk, as this only just scratches the surface of its capabilities.
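As a minimal sketch of the separate-file approach described above, the following writes the "txt" totalling script to a temporary file and runs it with awk's -f option. The input lines are made up in the shape of ls -l output, and the variable names and the use of mktemp are assumptions for illustration, not part of the original examples:

```shell
# Temporary file to hold the awk script (in practice this would be a named file).
script=$(mktemp)

# The quoted 'EOF' stops the shell from touching $5, $NF etc. inside the script.
cat > "$script" <<'EOF'
# Runs only for input lines ending in "txt"
/txt$/ {
    printf("%-20s %8d\n", $NF, $5)
    TOTAL += $5
}

# Runs once, after the last line of input
END {
    printf("%-20s %8d\n", "Total for txt files:", TOTAL)
}
EOF

# Made-up input lines in the shape of 'ls -l' output.
report=$(printf '%s\n' \
    '-rw-r--r-- 1 root root 100 2007-05-26 15:24 notes.txt' \
    '-rw-r--r-- 1 root root 50 2006-09-09 21:24 aliases' \
    '-rw-r--r-- 1 root root 250 2007-03-05 08:38 todo.txt' \
  | awk -f "$script")

echo "$report"
rm -f "$script"
```

With the script in its own file there is no shell quoting to worry about at all, and each address and its statements can sit on separate, indented lines with comments.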