Filters
awk
-
Awk programs are the equivalent of sed "instructions" and can be defined inline or in a program file (also called a "source file"). If no input files are specified, awk reads from standard input.
# Inline
awk $OPTIONS $PROGRAM $INPUTFILES
# Program file
awk $OPTIONS -f $PROGRAMFILE $INPUTFILES
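As a minimal sketch of the two invocation styles, the following runs the same one-line program inline, from a program file, and on standard input. The sample data, the temporary directory, and the file names input.txt and first.awk are assumptions made for illustration only.

```shell
# Create a small sample input (hypothetical data).
tmpdir=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$tmpdir/input.txt"

# Inline: the program is passed as a quoted argument.
inline_out=$(awk '{ print $1 }' "$tmpdir/input.txt")

# Program file: the same program is stored in a file and loaded with -f.
printf '{ print $1 }\n' > "$tmpdir/first.awk"
file_out=$(awk -f "$tmpdir/first.awk" "$tmpdir/input.txt")

# With no input file given, awk reads standard input.
stdin_out=$(printf 'gamma 3\n' | awk '{ print $1 }')

printf '%s\n' "$inline_out" "$file_out" "$stdin_out"
rm -r "$tmpdir"
```

Both file-based runs should print the same first fields, since only the way the program is supplied differs.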
awk programs combine patterns and actions
Patterns can be:
- regular expressions or fixed strings
- line numbers using the builtin variable NR
- predefined patterns BEGIN or END, whose actions are executed before and after processing any lines of the data file, respectively
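The pattern types listed above can be combined in one program. The following is a small sketch, assuming three lines of made-up input, that uses a BEGIN pattern, a regular expression, a line number test on NR, and an END pattern.

```shell
# Hypothetical three-line input.
sample=$(printf 'apple\nbanana\ncherry\n')

result=$(printf '%s\n' "$sample" | awk '
BEGIN    { print "start" }          # runs before any input is read
/an/     { print "regex:", $0 }     # lines matching the regular expression
NR == 3  { print "line 3:", $0 }    # line selected by number via NR
END      { print "total:", NR }     # runs after the last line
')

printf '%s\n' "$result"
```

Each input line is tested against every pattern in order, so a single line may trigger more than one action.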
Convert ":" to newlines in $PATH environment variable
echo $PATH | awk 'BEGIN {RS=":"} {print}'
For lines matching "enable" in all files in the current directory, print the filename, line number, and first field, taking semicolon ; as the field separator, with colon : between the filename and line number
awk 'BEGIN {FS=";"} /enable/ {print FILENAME ":" FNR,$1}' *
Search for string MA in all files, outputting filename, line number, and line for matches
awk '/MA/ {OFS=" "; print FILENAME OFS FNR OFS $0}' *
Change field separator (FS) to a colon (:) and run awkscr
awk -F: -f awkscr /etc/passwd
-f flag also works for awk
awk -f script files
Print the first field of each line in the input file
awk '{ print $1 }' list
Equivalent to grep MA * ({print} is implied)
awk '/MA/' *
awk '/MA/ {print}' *
-F flag is followed by field separator
awk -F, '/MA/ { print $1 }' list
Pipe output of free to awk to get used memory and total memory
free -h | awk '/^Mem/ {print $3 "/" $2}'
Pipe output of sensors to awk to get CPU temperature
sensors | awk '/^temp1/ {print $2}'
Replace initial "fake." with "real;" in file fake_isbn (only lines where the substitution succeeds are printed)
awk 'sub(/^fake\./,"real;")' fake_isbn
Print all lines
awk '1 { print }' file
Remove file header
awk 'NR>1' file
Remove file header (with an explicit print)
awk 'NR>1 { print }' file
Print lines in a range
awk 'NR>1 && NR < 4' file
Remove whitespace-only lines
awk 'NF' file
Remove all blank lines
awk '1' RS='' file
Extract fields
awk '{ print $1, $3}' FS=, OFS=, file
Perform column-wise calculations
awk '{ SUM=SUM+$1 } END { print SUM }' FS=, OFS=, file
Count the number of nonempty lines
awk '/./ { COUNT+=1 } END { print COUNT }' file
Count the number of nonempty lines (ignoring whitespace-only lines)
awk 'NF { COUNT+=1 } END { print COUNT }' file
Count the number of lines whose first field is a nonzero number
awk '+$1 { COUNT+=1 } END { print COUNT }' file
Arrays
awk '+$1 { CREDITS[$3]+=$1 } END { for (NAME in CREDITS) print NAME, CREDITS[NAME] }' FS=, file
Identify duplicate lines
awk 'a[$0]++' file
Remove duplicate lines
awk '!a[$0]++' file
Remove multiple spaces
awk '$1=$1' file
Join lines
awk '{ print $3 }' FS=, ORS=' ' file; echo
Compute the average of the first field and print it with printf
awk '+$1 { SUM+=$1; NUM+=1 } END { printf("AVG=%f",SUM/NUM); }' FS=, file
Format the average to one decimal place in a field six characters wide
awk '+$1 { SUM+=$1; NUM+=1 } END { printf("AVG=%6.1f",SUM/NUM); }' FS=, file
Convert to uppercase
awk '$3 { print toupper($0); }' file
Change part of a string
awk '{ $3 = toupper(substr($3,1,1)) substr($3,2) } $3' FS=, OFS=, file
Split the second field ("EXPDATE") by spaces, storing the result into the array DATE; then print credits ($1) and username ($3) as well as the month (DATE[2]) and year (DATE[3])
awk '+$1 { split($2, DATE, " "); print $1,$3, DATE[2], DATE[3] }' FS=, OFS=, file
Split the fourth field on ":" into the array GRP; then print the username ($3) and the first two parts
awk '+$1 { split($4, GRP, ":"); print $3, GRP[1], GRP[2] }' FS=, file
The same, with a regular expression as separator so that a run of colons counts as one
awk '+$1 { split($4, GRP, /:+/); print $3, GRP[1], GRP[2] }' FS=, file
Search and replace in comma-separated data (runs of spaces in the second field become "-")
awk '+$1 { gsub(/ +/, "-", $2); print }' FS=, file
Adding date
awk 'BEGIN { printf("UPDATED: "); system("date") } /^UPDATED:/ { next } 1' file
Modify a field externally
awk '+$1 { CMD | getline $5; close(CMD); print }' CMD="uuid -v4" FS=, OFS=, file
Invoke dynamically generated command
awk '+$1 { cmd = sprintf(FMT, $2); cmd | getline $2; close(cmd); print }' FMT='date -I -d "%s"' FS=, file
Join data
awk '+$1 { CMD | getline $5; print }' CMD='od -vAn -w4 -t x /dev/urandom' FS=, file
Add up the first field of every record into sum, then print that number out at the end
awk '{sum += $1} END {print sum}' file
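The array one-liners above rely on awk's associative arrays, which are indexed by arbitrary strings. The following is a rough sketch, assuming a made-up comma-separated layout of credits, expiry date, and username. The data and the variable names totals, unique, and unique_count are illustrative only.

```shell
# Hypothetical comma-separated data: credits,expiry date,username
data='10,01 jan 2024,alice
5,01 feb 2024,bob
7,01 mar 2024,alice
5,01 feb 2024,bob'

# Sum credits per user. The order of "for (k in array)" is unspecified,
# so the output is sorted to make it stable.
totals=$(printf '%s\n' "$data" |
  awk -F, '{ CREDITS[$3] += $1 } END { for (NAME in CREDITS) print NAME, CREDITS[NAME] }' |
  sort)

# Keep only the first occurrence of each line: seen[$0]++ is 0 (false)
# the first time a line appears, so !seen[$0]++ is true only then.
unique=$(printf '%s\n' "$data" | awk '!seen[$0]++')
unique_count=$(printf '%s\n' "$unique" | awk 'END { print NR }')

printf '%s\n' "$totals"
printf 'unique lines: %s\n' "$unique_count"
```

The deduplication idiom keeps the input order, which sort piped to uniq would not.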
cat
cut
grep
grep -R $TEXT $DIRECTORY
head
- Print the first 8 bytes of $FILE
head -c8 $FILE
paste
- Merge lines of files
Make a .csv file from two lists
paste -d ',' file1 file2
Transpose rows
paste -s file1 file2
sed
-
sed ("Stream-oriented editor") is typically used for applying repetitive edits across all lines of multiple files. In particular it is, alongside awk, one of the two primary commands which accept regular expressions in Unix systems. sed instructions can be defined inline or in a command file (i.e. script).
Inline
sed $OPTIONS $INSTRUCTION $FILE
Command file
sed $OPTIONS -f $SCRIPT $FILE
sed instructions are made of two components: addresses (i.e. patterns) and procedures (i.e. actions).
Run sed commands in $SCRIPT on $FILE
sed -f $SCRIPT $FILE
Suppress automatic printing of pattern space
sed -n # --quiet , --silent
Zero, one, or two addresses can precede a procedure. In the absence of an address, the procedure is executed over every line of input. With one address, the procedure will be executed over every line of input that matches.
With two addresses, the procedure will be executed over groups of lines whereby:
- The first address selects the first line in the first group
- The second address selects the next subsequent line that it matches, which becomes the last line in the first group
- If no match for the second address is found, the group extends to the end of the file
- After the match, the selection process for the next group begins by searching for a match to the first address
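The two-address rules above can be seen on a small input. This is a sketch using made-up marker lines START and END. The second group has no closing END, so it should run to the end of the input.

```shell
# Hypothetical input with one complete group and one unterminated group.
input='a
START
b
END
c
START
d'

# Print only the lines inside each /START/,/END/ group.
groups=$(printf '%s\n' "$input" | sed -n '/START/,/END/p')

printf '%s\n' "$groups"
```

Lines a and c fall outside every group and are dropped, while d is kept because the second group never finds its closing address.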
Addressing can be done in one of two ways:
- Line addressing, specifying line numbers separated by a comma (e.g. 3,7p); $ represents the last line of input
- Context addressing, using a regular expression enclosed by forward slashes (e.g. /From:/p)
Edit the file in-place, but save a backup copy of the original with {suffix} appended to the filename (the suffix is attached directly to the short flag, with no equals sign)
sed -i{suffix} # --in-place={suffix}
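A minimal sketch of in-place editing with a backup follows, assuming a sed that accepts the suffix attached directly to -i (as in -i.bak). The temporary directory, the file name note.txt, and the .bak suffix are arbitrary choices for illustration.

```shell
# Work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
printf 'hello world\n' > "$workdir/note.txt"

# Edit note.txt in place; the original is kept as note.txt.bak.
sed -i.bak 's/world/there/' "$workdir/note.txt"

edited=$(cat "$workdir/note.txt")
backup=$(cat "$workdir/note.txt.bak")

printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"
rm -r "$workdir"
```

Keeping the backup makes it possible to diff the two files before deleting the original.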
In some circles, sed is recommended as a replacement for other filters like head. Here, the first 10 lines of a file are displayed.
sed 10q $FILE
Display the top 10 processes by memory or cpu usage.
ps axch -o cmd,%mem --sort=-%mem | sed 11q
ps axch -o cmd:15,%cpu --sort=-%cpu | sed 11q
Replace angle brackets with their HTML codes, piped in from a heredoc:
sed -e 's/</\&lt;/g' -e 's/>/\&gt;/g' << EOF
Display first two lines of file. Without -n, each selected line will be printed twice
sed -n '1,2p' emp.lst
Prepending ! to the procedure reverses the sense of the command (YUG: 450)
sed -n '3,$!p' emp.lst
Display a range of lines
sed -n '9,11p' emp.lst
Use the -e flag to precede multiple instructions
sed -n -e '1,2p' -e '7,9p' -e '$p' emp.lst
Delete lines
Delete second line alone
sed '2d' myfile
Delete a range of lines: from the 2nd through the 3rd
sed '2,3d' myfile
Delete a range of lines, from the first occurrence of 'second' to the line with the first occurrence of 'fourth'
sed '/second/,/fourth/d' myfile
Print all of a file except for specific lines
Suppress any line with 'test' in it
sed '/test/d' myfile
Suppress from the 3rd line to EOF
sed '3,$d' myfile
Replace the first instance of the | character with : and display the first two lines [YUG:455]
sed 's/|/:/' emp.lst | head -2
Replace all instances of the | character with :, displaying the first two lines [YUG:455]
sed 's/|/:/g' emp.lst | head -2
Substitute HTML tags:
sed 's/<I>/<EM>/g'
These commands will replace "director" with "executive director"
sed 's/director/executive director/' emp.lst
sed 's/director/executive &/' emp.lst
sed '/director/s//executive &/' emp.lst
Searching for text
Equivalent to grep MA *
sed -n '/MA/p' *
Stringing sed statements together with pipe
Take lines beginning with "fake" and remove all instances of "fake.", piping them... remove all parentheses with content and count lines of output (results)
sed -n '/^fake/s/fake\.//p' * | sed -nr 's/\(.*\)//p' | wc -l
Take lines of all files in CWD beginning with "fake" and remove all instances of string "fake." Then remove all parentheses with any content within them and print only the top 10 lines
sed -ne '/^fake/p' * | sed -n 's/fake\.//p' | sed -nr 's/\(.*\)//p' | sed 10q
Count the number of pipes replaced by piping output to cmp, which will use the -l option to output byte numbers of differing values, then counting the lines of output (YUG:456)
sed 's/|/:/g' emp.lst | cmp -l - emp.lst | wc -l
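The three forms of the "director" substitution above can be compared directly. This sketch assumes a single made-up record in the pipe-delimited style of emp.lst. In the replacement, & stands for the matched text, and an empty regular expression // reuses the most recent one, here the address /director/.

```shell
# Hypothetical record; the real emp.lst contents are not shown in this document.
line='jai sharma|director|production'

# Literal replacement text.
a=$(printf '%s\n' "$line" | sed 's/director/executive director/')

# & in the replacement expands to the matched text.
b=$(printf '%s\n' "$line" | sed 's/director/executive &/')

# The address selects the line; the empty pattern // reuses /director/.
c=$(printf '%s\n' "$line" | sed '/director/s//executive &/')

printf '%s\n' "$a" "$b" "$c"
```

All three commands should print the same line; the later forms simply avoid repeating the search string.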
tail
- Output lines beginning at the 30th line from the start
tail -n +30
tail --lines=+30
tr
- Change the case of a string
tr '[:upper:]' '[:lower:]'
Remove a character or set of characters from a string or line of output
tr -d "text"
watch
- Execute $CMD at periods of $N seconds, watching its output CLKF
watch -n $N $CMD
Check memory usage in megabytes (-m) every 5 seconds Enki
watch -n 5 free -m