Filters

awk

awk programs are the counterpart of sed "instructions": they can be defined inline or in a program file (also called a "source file"). If no input files are specified, awk accepts input from standard input.

# Inline
awk $OPTIONS $PROGRAM $INPUTFILES

# Program file
awk $OPTIONS -f $PROGRAMFILE $INPUTFILES

awk programs combine patterns and actions; all three pattern types listed below are combined in the sketch after the list.

Patterns can be:

  • regular expressions or fixed strings
  • line numbers using builtin variable NR
  • predefined patterns BEGIN or END, whose actions are executed before and after processing any lines of the data file, respectively
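A minimal sketch combining all three pattern types (the /error/ pattern and input file are assumptions):

awk 'BEGIN {print "start"} NR==1 {print "first line: "$0} /error/ {print "match at "NR": "$0} END {print NR" lines processed"}' list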

Convert ":" to newlines in $PATH environment variable

echo $PATH | awk 'BEGIN {RS=":"} {print}'

For all files in the current directory, using semicolon (;) as the field separator, print the first field of every line matching "enable", prefixed by the filename and line number with a colon (:) between them

awk 'BEGIN {FS=";"} /enable/ {print FILENAME ":" FNR,$1}' *
search for string MA in all files, outputting filename, line number, and the matching line
awk '/MA/ {OFS=" "; print FILENAME OFS FNR OFS $0}' *
change field separator (FS) to a colon (:) and run awkscr
awk -F: -f awkscr /etc/passwd
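A minimal sketch of what such a program file might contain (the contents of awkscr are an assumption):

# awkscr: print the login name and shell of each /etc/passwd entry
{ print $1, $NF }
END { print NR, "entries" }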
The -f flag also works for awk
awk -f script files
print the first field of each line in the input file
awk '{ print $1 }' list
equivalent to grep MA * ({print} is implied when the action is omitted)
awk '/MA/' *
awk '/MA/ {print}' *
-F flag is followed by field separator
awk -F, '/MA/ { print $1 }' list
pipe output of free to awk to get used and total memory
free -h | awk '/^Mem/ {print $3 "/" $2}'
pipe output of sensors to awk to get CPU temperature
sensors | awk '/^temp1/ {print $2}'
replace initial "fake." with "real;" in file fake_isbn
awk 'sub(/^fake\./,"real;")' fake_isbn
print all lines
awk '1 { print }' file
remove file header
awk 'NR>1' file
remove file header
awk 'NR>1 { print }' file
print lines in a range
awk 'NR>1 && NR < 4' file
remove whitespace-only lines
awk 'NF' file
remove all blank lines
awk '1' RS='' file
extract fields
awk '{ print $1, $3}' FS=, OFS=, file
perform column-wise calculations
awk '{ SUM=SUM+$1 } END { print SUM }' FS=, OFS=, file
count the number of nonempty lines
awk '/./ { COUNT+=1 } END { print COUNT }' file
count the number of nonempty lines
awk 'NF { COUNT+=1 } END { print COUNT }' file
count the number of nonempty lines
awk '+$1 { COUNT+=1 } END { print COUNT }' file
Arrays
awk '+$1 { CREDITS[$3]+=$1 } END { for (NAME in CREDITS) print NAME, CREDITS[NAME] }' FS=, file
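For example (a sketch; the sample records and the credits,expdate,username layout are assumptions), the array tallies credits per username:

printf '10,01 jun 2018,alice\n5,01 jul 2018,bob\n3,01 aug 2018,alice\n' |
  awk -F, '+$1 { CREDITS[$3]+=$1 } END { for (NAME in CREDITS) print NAME, CREDITS[NAME] }'
# prints alice 13 and bob 5 (order is unspecified)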
Identify duplicate lines
awk 'a[$0]++' file
Remove duplicate lines
awk '!a[$0]++' file
Remove multiple spaces
awk '$1=$1' file
Join lines
awk '{ print $3 }' FS=, ORS=' ' file; echo
Compute the average of the first field
awk '+$1 { SUM+=$1; NUM+=1 } END { printf("AVG=%f",SUM/NUM); }' FS=, file
Format the average with a fixed width and precision
awk '+$1 { SUM+=$1; NUM+=1 } END { printf("AVG=%6.1f",SUM/NUM); }' FS=, file
Convert to uppercase
awk '$3 { print toupper($0); }' file
Change part of a string
awk '{ $3 = toupper(substr($3,1,1)) substr($3,2) } $3' FS=, OFS=, file
Split the second field ("EXPDATE") by spaces, storing the result into the array DATE; then print credits ($1) and username ($3) as well as the month (DATE[2]) and year (DATE[3])
awk '+$1 { split($2, DATE, " "); print $1,$3, DATE[2], DATE[3] }' FS=, OFS=, file
awk '+$1 { split($4, GRP, ":"); print $3, GRP[1], GRP[2] }' FS=, file
awk '+$1 { split($4, GRP, /:+/); print $3, GRP[1], GRP[2] }' FS=, file
Search and replace within a field (replace runs of spaces in the second field with dashes)
awk '+$1 { gsub(/ +/, "-", $2); print }' FS=, file
Adding date
awk 'BEGIN { printf("UPDATED: "); system("date") } /^UPDATED:/ { next } 1' file
Modify a field externally
awk '+$1 { CMD | getline $5; close(CMD); print }' CMD="uuid -v4" FS=, OFS=, file
Invoke dynamically generated command
awk '+$1 { cmd = sprintf(FMT, $2); cmd | getline $2; close(cmd); print }' FMT='date -I -d "%s"'  FS=, file
Join data
awk '+$1 { CMD | getline $5; print }' CMD='od -vAn -w4 -t x /dev/urandom' FS=, file
Add up the first field of every record into sum, then print the total at the end
awk '{sum += $1} END {print sum}' file

cat

cut

grep

grep -R $TEXT $DIRECTORY
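For example (a sketch; the pattern and directory are placeholders), search recursively and show line numbers:

grep -Rn "TODO" src/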

head

Print first 8 characters of $FILE
head -c8 $FILE

paste

Merge lines of files

Make a .csv file from two lists

paste -d ',' file1 file2
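For example (a sketch; the file contents are assumptions):

printf 'alice\nbob\n' > file1
printf 'a@example.org\nb@example.org\n' > file2
paste -d ',' file1 file2
# alice,a@example.org
# bob,b@example.org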
Transpose rows
paste -s file1 file2
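With the same hypothetical file1 and file2, each file's lines are joined into one tab-separated row:

paste -s file1 file2
# alice   bob
# a@example.org   b@example.org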

sed

sed ("stream-oriented editor") is typically used to apply repetitive edits across all lines of one or more files. Alongside awk, it is one of the two primary commands that accept regular expressions on Unix systems.

sed instructions can be defined inline or in a command file (i.e. script).

Inline
sed $OPTIONS $INSTRUCTION $FILE
Command file
sed $OPTIONS -f $SCRIPT $FILE

sed instructions are made of two components: addresses (i.e. patterns) and procedures (i.e. actions).

Run sed commands in $SCRIPT on $FILE

sed -f $SCRIPT $FILE
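A minimal sketch of what such a script might contain (the file name and its instructions are assumptions):

# edits.sed: one sed instruction per line
/^#/d
s/foo/bar/g

Run it with: sed -f edits.sed input.txt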
Suppress automatic printing of pattern space
sed -n   # --quiet, --silent

Zero, one, or two addresses can precede a procedure. In the absence of an address, the procedure is executed over every line of input. With one address, the procedure will be executed over every line of input that matches.

With two addresses, the procedure will be executed over groups of lines (see the example after this list), whereby:

  • The first address selects the first line in the first group
  • The second address selects the next subsequent line that it matches, which becomes the last line in the first group
  • If no match for the second address is found, it points to the end of the file
  • After the match, the selection process for the next group begins by searching for a match to the first address
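For example (a sketch; the /BEGIN/ and /END/ markers and the file name are hypothetical), print each group of lines from a start marker through the next end marker:

sed -n '/BEGIN/,/END/p' file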

Addressing can be done in one of two ways:

  • Line addressing, specifying line numbers separated by a comma (e.g. 3,7p); $ represents the last line of input
  • Context addressing, using a regular expression enclosed by forward slashes (e.g. /From:/p)

Edit the file in place, but save a backup copy of the original with suffix appended to the filename (GNU sed; BSD sed takes the suffix as a separate argument)

-isuffix      # or --in-place=suffix
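For example (a sketch; the substitution is illustrative), edit file in place and keep the original as file.bak:

sed -i.bak 's/foo/bar/g' file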

In some circles, sed is recommended as a replacement for other filters like head. Here, the first 10 lines of a file are displayed.

sed 10q $FILE

Display the top 10 processes by memory or CPU usage.

ps axch -o cmd,%mem --sort=-%mem | sed 10q
ps axch -o cmd:15,%cpu --sort=-%cpu | sed 10q

Replace angle brackets with their HTML codes, reading from a heredoc (the body shown is a sample):

sed -e 's/</\&lt;/g' -e 's/>/\&gt;/g' << EOF
<b>sample</b>
EOF

Display the first two lines of a file. Without -n, each selected line would be printed twice

sed -n '1,2p' emp.lst

Prepending ! to the procedure reverses the sense of the command (YUG: 450)

sed -n '3,$!p' emp.lst

Display a range of lines

sed -n '9,11p' emp.lst
Use the -e flag to precede multiple instructions
sed -n -e '1,2p' -e '7,9p' -e '$p' emp.lst
Delete lines: delete the second line alone
sed '2d' myfile
Delete a range of lines: from the 2nd through the 3rd
sed '2,3d' myfile
Delete a range of lines, from the first occurrence of 'second' to the line with the first occurrence of 'fourth'
sed '/second/,/fourth/d' myfile
Print all of a file except for specific lines: suppress any line with 'test' in it
sed '/test/d' myfile

Suppress from the 3rd line to EOF

sed '3,$d' myfile

Replace the first instance of the | character with : and display the first two lines [YUG:455]

sed 's/|/:/' emp.lst | head -2
Replace all instances of the | character with :, displaying the first two lines [YUG:455]
sed 's/|/:/g' emp.lst | head -2
Substitute HTML tags:
sed 's/<I>/<EM>/g'
These commands will replace "director" with "executive director"
sed 's/director/executive director/' emp.lst
sed 's/director/executive &/' emp.lst
sed '/director/s//executive &/' emp.lst

Searching for text

Equivalent to grep MA *

sed -n '/MA/p' *
Stringing sed statements together with a pipe: take lines beginning with "fake", remove the string "fake." from each, then remove parenthesized content and count the lines of output (results)
sed -n '/^fake/s/fake\.//p' * | sed -nr 's/\(.*\)//p' | wc -l
Take lines of all files in CWD beginning with "fake" and remove all instances of string "fake." Then remove all parentheses with any content within them and print only the top 10 lines
sed -ne '/^fake/p' * | sed -n 's/fake\.//p' | sed -nr 's/\(.*\)//p' | sed 10q
Count the number of pipes replaced by comparing the result against the original with cmp -l (which lists the byte positions of differing values) and counting the lines of output (YUG:456)
sed 's/|/:/g' emp.lst | cmp -l - emp.lst | wc -l

tail

Output everything from the 30th line to the end of the file

tail -n +30
tail --lines=+30

tr

Options: -c (complement the set), -d (delete characters), -s (squeeze repeated characters)

Change the case of a string

tr '[:upper:]' '[:lower:]'
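For example (a sketch; the input string is illustrative):

echo "Hello World" | tr '[:upper:]' '[:lower:]'
# hello world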
Remove a character or set of characters from a string or line of output
tr -d "text"
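For example (a sketch; the characters to delete are illustrative):

echo "hello world" | tr -d "lo"
# he wrd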

watch

Execute $CMD every $N seconds, watching its output
watch -n $N $CMD
Check memory usage in megabytes (-m) every 5 seconds
watch -n 5 free -m