Sunday, 26 February 2017

awk command examples for practice for Linux beginners


The basic function of awk is to search files for lines or other text units containing one or more patterns. When a line matches one of the patterns, special actions are performed on that line.

Programs in awk are different from programs in most other languages, because awk programs are "data-driven": you describe the data you want to work with and then what to do when you find it. Most other languages are "procedural": you have to describe, in great detail, every step the program is to take, and it is usually much harder to describe clearly the data the program will process. For this reason, awk programs are often refreshingly easy to read and write.

There are several ways to run awk. If the program is short, it is easiest to run it on the command line:
awk PROGRAM inputfile(s)
If multiple changes have to be made, possibly regularly and on multiple files, it is easier to put the awk commands in a script file. That file is passed to awk with the -f option:
awk -f PROGRAM-FILE inputfile(s)
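
As a minimal sketch of the two invocation styles, the snippet below runs the same one-line program first inline and then from a script file. The file names (sample.txt, first.awk) and the sample data are made up for illustration and live in a scratch directory.

```shell
# Hypothetical sample data and script file, created in a scratch directory.
workdir=$(mktemp -d)
printf 'one two\nthree four\n' > "$workdir/sample.txt"
printf '{ print $1 }\n' > "$workdir/first.awk"

# 1) Program given directly on the command line.
inline_out=$(awk '{ print $1 }' "$workdir/sample.txt")

# 2) The same program read from a file with -f.
file_out=$(awk -f "$workdir/first.awk" "$workdir/sample.txt")

echo "$inline_out"
echo "$file_out"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Both runs should print the first field of every line, so the two results are expected to be identical.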

Printing selected fields
When awk reads a line of a file, it divides the line into fields based on the input field separator, FS. By default, fields are separated by runs of spaces and tabs.
The variables $1, $2, $3, ..., $N hold the values of the first, second, third, and so on up to the last field of an input line. The variable $0 (zero) holds the value of the entire line.
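
To make the field variables concrete, here is a small sketch on a made-up three-word line. NF is not mentioned above; it is awk's built-in count of fields in the current line, so $NF is the last field. The function name show_fields is only for illustration.

```shell
# Hypothetical input line; the default FS splits it on blanks.
show_fields() {
    echo "alpha beta gamma" |
        awk '{
            print "first=" $1    # first field
            print "last=" $NF    # last field (NF is the field count)
            print "count=" NF
            print "line=" $0     # the whole line
        }'
}
show_fields
```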

In the output of ls -l, there are 9 columns. The print statement uses these fields as follows:

ls -l | awk '{ print $5, $9 }'
4096 jenkins_upgrade
57 shellPractice
120 venky.sh
4096 wcpjars

Without formatting, using only the output field separator, the output looks rather poor. Inserting descriptive strings, and tabs where columns need aligning, makes it a lot more readable:
ls -ld * | awk '{ print "Size is " $5 " bytes for " $9 }'

Size is 4096 bytes for jenkins_upgrade
Size is 57 bytes for shellPractice
Size is 120 bytes for venky.sh
Size is 4096 bytes for wcpjars

df -h | sort -rnk 5 | head -3 | awk '{ print "Partition " $6 "\t: " $5 " full!" }'
Partition /boot : 39% full!
Partition /     : 10% full!
Partition /home : 3% full!

Strings in awk recognize escape sequences, including:
\n Newline character
\t Tab
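
A short sketch of both escape sequences in a single print statement, using a made-up input line (the function name show_escapes is hypothetical):

```shell
# \t inserts a tab between label and value; \n starts a new output line.
show_escapes() {
    echo "venky.sh 120" | awk '{ print "Name:\t" $1 "\nSize:\t" $2 }'
}
show_escapes
```

One input line produces two output lines here, each with a tab after the label.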

The print command and regular expressions

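A regular expression enclosed in slashes can be placed in front of the action; the action then runs only for the lines that match:
awk 'EXPRESSION { PROGRAM }' file(s)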

df -h | awk '/dev/ { print $6 "\t: " $5 }'
/       : 10%
/dev    : 0%
/dev/shm        : 0%
/home   : 3%
/boot   : 39%

ls -l | awk '/\<(s|u|O).*\.jar$/ { print $9 }'
OnlineDataModel.jar
sharedserviceslib.jar
utilityframework.jar
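
The same kind of filter can be tried without a real directory. This sketch feeds a made-up list of file names, one per line, to awk. Because each name is the whole line, it anchors with ^ instead of the \< word-boundary operator used above, which is an extension not every awk supports. The function name list_jars is hypothetical.

```shell
# Hypothetical file names; only those starting with s or u and ending in .jar match.
list_jars() {
    printf '%s\n' notes.txt sharedlib.jar build.jar utility.jar.bak utility.jar |
        awk '/^(s|u).*\.jar$/ { print $1 }'
}
list_jars
```

build.jar is skipped because of its first letter, and utility.jar.bak because the $ anchor requires the name to end in .jar.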

In order to precede output with comments, use the BEGIN statement:
ls -l | awk 'BEGIN { print "Files found:\n" } /\<[ax].*\.conf$/ { print $9 }'

The END statement can be added to insert text after the entire input has been processed:
ls -l | \
awk '/\<[ax].*\.conf$/ { print $9 } END { print \
"Can I do anything else for you, mistress?" }'

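BEGIN and END can be combined in one program. The sketch below, on a made-up list of file names, prints a header first, then the matching names, then a count. The counter variable n and the function name report_conf are illustrative additions, not part of the examples above.

```shell
# Hypothetical file names; names starting with a or x and ending in .conf match.
report_conf() {
    printf '%s\n' apache.conf readme.txt xinetd.conf zebra.conf |
        awk 'BEGIN { print "Files found:" }            # runs before any input
             /^[ax].*\.conf$/ { print $1; n++ }        # runs for each matching line
             END { print "Matches: " n }'              # runs after all input
}
report_conf
```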
The input field separator is held in the built-in variable FS. Setting it in a BEGIN block makes it take effect before the first line is read:
awk 'BEGIN { FS=":" } { print $1 "\t" $5 }' /etc/passwd
root    root
bin     bin
daemon  daemon
adm     adm
lp      lp
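
The separator can also be set from the command line with awk's -F option, which is not used above. This sketch runs both forms on two made-up passwd-style lines and prints the login name and shell; the helper name passwd_lines is hypothetical.

```shell
# Hypothetical passwd-style records with ":" as the separator.
passwd_lines() {
    printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
                  'lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin'
}

# FS assigned in a BEGIN block.
with_begin=$(passwd_lines | awk 'BEGIN { FS=":" } { print $1 "\t" $7 }')

# The same separator given with the -F option.
with_flag=$(passwd_lines | awk -F: '{ print $1 "\t" $7 }')

echo "$with_begin"
```

Both forms should give the same result; -F is simply shorter for one-liners.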

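The output from an entire print statement is called an output record. Each print command results in one output record, followed by a string called the output record separator, ORS (a newline by default). Items separated by commas in a print statement are joined by the output field separator, OFS (a single space by default):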

awk 'BEGIN { OFS=";" ; ORS="\n-->\n" } { print $1,$2}' test
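
Since the contents of the file test are not shown, here is a sketch of the same separators on made-up input. It also contrasts print with and without a comma: only the comma form uses OFS. The function names are hypothetical.

```shell
# Comma between items: fields are joined by OFS, each record ends with ORS.
show_separators() {
    printf 'a b c\nd e f\n' |
        awk 'BEGIN { OFS=";" ; ORS="\n-->\n" } { print $1, $2 }'
}

# No comma: the two fields are simply concatenated and OFS is not used.
show_concat() {
    printf 'a b c\n' | awk 'BEGIN { OFS=";" } { print $1 $2 }'
}

show_separators
show_concat
```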

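User-defined variables can accumulate values across input lines, for example to compute a total. Given this input file:
cat revenues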
20021009 20021013 consultancy BigComp 2500
20021015 20021020 training EduComp 2000
20021112 20021123 appdev SmartComp 10000
20021204 20021215 training EduComp 5000

cat total.awk
{ total=total + $5 }
{ print "Send bill for " $5 " dollar to " $4 }
END { print "---------------------------------\nTotal revenue: " total }

awk -f total.awk revenues
Send bill for 2500 dollar to BigComp
Send bill for 2000 dollar to EduComp
Send bill for 10000 dollar to SmartComp
Send bill for 5000 dollar to EduComp
---------------------------------
Total revenue: 19500
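
For readers who want to reproduce the total without creating files, this sketch pipes the same four records into an inline program. NR, which does not appear above, is awk's built-in count of records read so far; the function names are hypothetical.

```shell
# The four records from the revenues file above.
revenues() {
    printf '%s\n' \
        '20021009 20021013 consultancy BigComp 2500' \
        '20021015 20021020 training EduComp 2000' \
        '20021112 20021123 appdev SmartComp 10000' \
        '20021204 20021215 training EduComp 5000'
}

# Sum the fifth field; in END, NR holds the number of lines read.
total_revenue() {
    revenues |
        awk '{ total = total + $5 }
             END { print "Total revenue: " total " from " NR " bills" }'
}
total_revenue
```

The variable total needs no initialization: an unset awk variable counts as zero in arithmetic.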


