Filters

awk

awk programs are the counterpart of sed "instructions": they can be defined inline or in a program file (also called a "source file"). If no input files are specified, awk accepts input from standard input.

# Inline
awk $OPTIONS $PROGRAM $INPUTFILES

# Program file
awk $OPTIONS -f $PROGRAMFILE $INPUTFILES

awk programs combine patterns and actions; all three pattern types listed below are combined in the sketch after the list.

Patterns can be:

  • regular expressions or fixed strings
  • line numbers using builtin variable NR
  • predefined patterns BEGIN or END, whose actions are executed before and after processing any lines of the data file, respectively
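A minimal sketch combining all three pattern types (the /error/ pattern and input file are assumptions):

awk 'BEGIN {print "start"} NR==1 {print "first line: "$0} /error/ {print "match at "NR": "$0} END {print NR" lines processed"}' list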

Convert ":" to newlines in $PATH environment variable

echo $PATH | awk 'BEGIN {RS=":"} {print}'

For all files in the current directory, using semicolon (;) as the field separator, print the first field of every line matching "enable", prefixed by the filename and line number with a colon (:) between them

awk 'BEGIN {FS=";"} /enable/ {print FILENAME ":" FNR,$1}' *
search for string MA in all files, outputting filename, line number, and the matching line
awk '/MA/ {OFS=" "; print FILENAME OFS FNR OFS $0}' *
change field separator (FS) to a colon (:) and run awkscr
awk -F: -f awkscr /etc/passwd
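A minimal sketch of what such a program file might contain (the contents of awkscr are an assumption):

# awkscr: print the login name and shell of each /etc/passwd entry
{ print $1, $NF }
END { print NR, "entries" }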
The -f flag also works for awk
awk -f script files
print the first field of each line in the input file
awk '{ print $1 }' list
equivalent to grep MA * ({print} is implied when the action is omitted)
awk '/MA/' *
awk '/MA/ {print}' *
-F flag is followed by field separator
awk -F, '/MA/ { print $1 }' list
pipe output of free to awk to get used and total memory
free -h | awk '/^Mem/ {print $3 "/" $2}'
pipe output of sensors to awk to get CPU temperature
sensors | awk '/^temp1/ {print $2}'
replace initial "fake." with "real;" in file fake_isbn
awk 'sub(/^fake\./,"real;")' fake_isbn
print all lines
awk '1 { print }' file
remove file header
awk 'NR>1' file
remove file header
awk 'NR>1 { print }' file
print lines in a range
awk 'NR>1 && NR < 4' file
remove whitespace-only lines
awk 'NF' file
remove all blank lines
awk '1' RS='' file
extract fields
awk '{ print $1, $3}' FS=, OFS=, file
perform column-wise calculations
awk '{ SUM=SUM+$1 } END { print SUM }' FS=, OFS=, file
count the number of nonempty lines
awk '/./ { COUNT+=1 } END { print COUNT }' file
count the number of nonempty lines
awk 'NF { COUNT+=1 } END { print COUNT }' file
count the number of nonempty lines
awk '+$1 { COUNT+=1 } END { print COUNT }' file
Arrays
awk '+$1 { CREDITS[$3]+=$1 } END { for (NAME in CREDITS) print NAME, CREDITS[NAME] }' FS=, file
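For example (a sketch; the sample records and the credits,expdate,username layout are assumptions), the array tallies credits per username:

printf '10,01 jun 2018,alice\n5,01 jul 2018,bob\n3,01 aug 2018,alice\n' |
  awk -F, '+$1 { CREDITS[$3]+=$1 } END { for (NAME in CREDITS) print NAME, CREDITS[NAME] }'
# prints alice 13 and bob 5 (order is unspecified)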
Identify duplicate lines
awk 'a[$0]++' file
Remove duplicate lines
awk '!a[$0]++' file
Remove multiple spaces
awk '$1=$1' file
Join lines
awk '{ print $3 }' FS=, ORS=' ' file; echo
Compute the average of the first field
awk '+$1 { SUM+=$1; NUM+=1 } END { printf("AVG=%f",SUM/NUM); }' FS=, file
Format the average with a fixed width and precision
awk '+$1 { SUM+=$1; NUM+=1 } END { printf("AVG=%6.1f",SUM/NUM); }' FS=, file
Convert to uppercase
awk '$3 { print toupper($0); }' file
Change part of a string
awk '{ $3 = toupper(substr($3,1,1)) substr($3,2) } $3' FS=, OFS=, file
Split the second field ("EXPDATE") by spaces, storing the result into the array DATE; then print credits ($1) and username ($3) as well as the month (DATE[2]) and year (DATE[3])
awk '+$1 { split($2, DATE, " "); print $1,$3, DATE[2], DATE[3] }' FS=, OFS=, file
awk '+$1 { split($4, GRP, ":"); print $3, GRP[1], GRP[2] }' FS=, file
awk '+$1 { split($4, GRP, /:+/); print $3, GRP[1], GRP[2] }' FS=, file
Search and replace within a field (replace runs of spaces in the second field with dashes)
awk '+$1 { gsub(/ +/, "-", $2); print }' FS=, file
Adding date
awk 'BEGIN { printf("UPDATED: "); system("date") } /^UPDATED:/ { next } 1' file
Modify a field externally
awk '+$1 { CMD | getline $5; close(CMD); print }' CMD="uuid -v4" FS=, OFS=, file
Invoke dynamically generated command
awk '+$1 { cmd = sprintf(FMT, $2); cmd | getline $2; close(cmd); print }' FMT='date -I -d "%s"'  FS=, file
Join data
awk '+$1 { CMD | getline $5; print }' CMD='od -vAn -w4 -t x /dev/urandom' FS=, file
Add up the first field of every record into sum, then print the total at the end
awk '{sum += $1} END {print sum}' file

cat

cut

grep

grep -R $TEXT $DIRECTORY
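For example (a sketch; the pattern and directory are placeholders), search recursively and show line numbers:

grep -Rn "TODO" src/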

head

Print first 8 characters of $FILE
head -c8 $FILE

paste

Merge lines of files

Make a .csv file from two lists

paste -d ',' file1 file2
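For example (a sketch; the file contents are assumptions):

printf 'alice\nbob\n' > file1
printf 'a@example.org\nb@example.org\n' > file2
paste -d ',' file1 file2
# alice,a@example.org
# bob,b@example.org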
Transpose rows
paste -s file1 file2
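With the same hypothetical file1 and file2, each file's lines are joined into one tab-separated row:

paste -s file1 file2
# alice   bob
# a@example.org   b@example.org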

sed

sed ("stream-oriented editor") is typically used to apply repetitive edits across all lines of one or more files. Alongside awk, it is one of the two primary commands that accept regular expressions on Unix systems.

sed instructions can be defined inline or in a command file (i.e. script).

Inline
sed $OPTIONS $INSTRUCTION $FILE
Command file
sed $OPTIONS -f $SCRIPT $FILE

sed instructions are made of two components: addresses (i.e. patterns) and procedures (i.e. actions).

Run sed commands in $SCRIPT on $FILE

sed -f $SCRIPT $FILE
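A minimal sketch of what such a script might contain (the file name and its instructions are assumptions):

# edits.sed: one sed instruction per line
/^#/d
s/foo/bar/g

Run it with: sed -f edits.sed input.txt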
Suppress automatic printing of pattern space
sed -n   # --quiet, --silent

Zero, one, or two addresses can precede a procedure. In the absence of an address, the procedure is executed over every line of input. With one address, the procedure will be executed over every line of input that matches.

With two addresses, the procedure will be executed over groups of lines (see the example after this list), whereby:

  • The first address selects the first line in the first group
  • The second address selects the next subsequent line that it matches, which becomes the last line in the first group
  • If no match for the second address is found, it points to the end of the file
  • After the match, the selection process for the next group begins by searching for a match to the first address
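For example (a sketch; the /BEGIN/ and /END/ markers and the file name are hypothetical), print each group of lines from a start marker through the next end marker:

sed -n '/BEGIN/,/END/p' file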

Addressing can be done in one of two ways:

  • Line addressing, specifying line numbers separated by a comma (e.g. 3,7p); $ represents the last line of input
  • Context addressing, using a regular expression enclosed by forward slashes (e.g. /From:/p)

Edit the file in place, but save a backup copy of the original with suffix appended to the filename (GNU sed; BSD sed takes the suffix as a separate argument)

-isuffix      # or --in-place=suffix
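For example (a sketch; the substitution is illustrative), edit file in place and keep the original as file.bak:

sed -i.bak 's/foo/bar/g' file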

In some circles, sed is recommended as a replacement for other filters like head. Here, the first 10 lines of a file are displayed.

sed 10q $FILE

Display the top 10 processes by memory or CPU usage.

ps axch -o cmd,%mem --sort=-%mem | sed 10q
ps axch -o cmd:15,%cpu --sort=-%cpu | sed 10q

Replace angle brackets with their HTML codes, reading from a heredoc (the body shown is a sample):

sed -e 's/</\&lt;/g' -e 's/>/\&gt;/g' << EOF
<b>sample</b>
EOF

Display the first two lines of a file. Without -n, each selected line would be printed twice

sed -n '1,2p' emp.lst

Prepending ! to the procedure reverses the sense of the command (YUG: 450)

sed -n '3,$!p' emp.lst

Display a range of lines

sed -n '9,11p' emp.lst
Use the -e flag to precede multiple instructions
sed -n -e '1,2p' -e '7,9p' -e '$p' emp.lst
Delete lines: delete the second line alone
sed '2d' myfile
Delete a range of lines: from the 2nd through the 3rd
sed '2,3d' myfile
Delete a range of lines, from the first occurrence of 'second' to the line with the first occurrence of 'fourth'
sed '/second/,/fourth/d' myfile
Print all of a file except for specific lines: suppress any line with 'test' in it
sed '/test/d' myfile

Suppress from the 3rd line to EOF

sed '3,$d' myfile

Replace the first instance of the | character with : and display the first two lines [YUG:455]

sed 's/|/:/' emp.lst | head -2
Replace all instances of the | character with :, displaying the first two lines [YUG:455]
sed 's/|/:/g' emp.lst | head -2
Substitute HTML tags:
sed 's/<I>/<EM>/g'
These commands will replace "director" with "executive director"
sed 's/director/executive director/' emp.lst
sed 's/director/executive &/' emp.lst
sed '/director/s//executive &/' emp.lst

Searching for text

Equivalent to grep MA *

sed -n '/MA/p' *
Stringing sed statements together with a pipe: take lines beginning with "fake", remove the string "fake." from each, then remove parenthesized content and count the lines of output (results)
sed -n '/^fake/s/fake\.//p' * | sed -nr 's/\(.*\)//p' | wc -l
Take lines of all files in CWD beginning with "fake" and remove all instances of string "fake." Then remove all parentheses with any content within them and print only the top 10 lines
sed -ne '/^fake/p' * | sed -n 's/fake\.//p' | sed -nr 's/\(.*\)//p' | sed 10q
Count the number of pipes replaced by comparing the result against the original with cmp -l (which lists the byte positions of differing values) and counting the lines of output (YUG:456)
sed 's/|/:/g' emp.lst | cmp -l - emp.lst | wc -l

tail

Output everything from the 30th line to the end of the file

tail -n +30
tail --lines=+30

tr

Options: -c (complement the set), -d (delete characters), -s (squeeze repeated characters)

Change the case of a string

tr '[:upper:]' '[:lower:]'
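For example (a sketch; the input string is illustrative):

echo "Hello World" | tr '[:upper:]' '[:lower:]'
# hello world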
Remove a character or set of characters from a string or line of output
tr -d "text"
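For example (a sketch; the characters to delete are illustrative):

echo "hello world" | tr -d "lo"
# he wrd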

watch

Execute $CMD every $N seconds, watching its output
watch -n $N $CMD
Check memory usage in megabytes (-m) every 5 seconds
watch -n 5 free -m