
grep: Globally search a Regular Expression and Print

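A powerful text search tool: it searches text for lines that match a given pattern (a fixed string or a regular expression) and, by default, prints the matching lines to STDOUT.

Basic usage

$ grep [-abcEFGhHilLnqrsvVwxy] [-A <NUM>] [-B <NUM>] [-C <NUM>] [-d <ACTION>] [-e <PATTERN>] [-f <PATTERN-FILE>] [--help] [PATTERN] [FILE or DIRECTORY...]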
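
A minimal sketch of the synopsis in action, assuming GNU grep and a POSIX shell; the files `a.txt` and `b.txt` and their contents are made up and live in a temporary directory. With more than one file, grep prefixes each match with the file name; `-e` may be repeated to search for several patterns at once, and `-E` turns on extended regular expressions, where `|` means alternation.

```shell
# Create two small sample files in a temporary directory (hypothetical data)
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1
printf 'apple\nbanana\ncherry\n' > a.txt
printf 'avocado\nblueberry\n' > b.txt

# One pattern, one file: matching lines are printed as-is
single=$(grep 'an' a.txt)

# Several files: each match is prefixed with "FILE:"
multi=$(grep 'a' a.txt b.txt)

# Two patterns given with -e: a line matching either one is selected
either=$(grep -e 'apple' -e 'berry' a.txt b.txt)

# -E: extended regex, so "|" means alternation without a backslash
alt=$(grep -E '^(apple|cherry)$' a.txt)

printf '%s\n---\n' "$single" "$multi" "$either" "$alt"
```

Here `single` is `banana`, `multi` is `a.txt:apple`, `a.txt:banana`, `b.txt:avocado`, and `alt` is `apple`, `cherry`.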

Options

$ grep --help
Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE.
Example: grep -i 'hello world' menu.h main.c

Pattern selection and interpretation:
  -E, --extended-regexp     PATTERN is an extended regular expression
  -F, --fixed-strings       PATTERN is a set of newline-separated strings
  -G, --basic-regexp        PATTERN is a basic regular expression (default)
  -P, --perl-regexp         PATTERN is a Perl regular expression
  -e, --regexp=PATTERN      use PATTERN for matching                            # -e: specify PATTERN explicitly; repeat -e to search for several patterns
  -f, --file=FILE           obtain PATTERN from FILE
  -i, --ignore-case         ignore case distinctions                            # -i: ignore case
  -w, --word-regexp         force PATTERN to match only whole words
  -x, --line-regexp         force PATTERN to match only whole lines
  -z, --null-data           a data line ends in 0 byte, not newline

Miscellaneous:
  -s, --no-messages         suppress error messages
  -v, --invert-match        select non-matching lines                           # -v: invert the match; print lines that do NOT match PATTERN
  -V, --version             display version information and exit
      --help                display this help text and exit

Output control:
  -m, --max-count=NUM       stop after NUM selected lines
  -b, --byte-offset         print the byte offset with output lines
  -n, --line-number         print line number with output lines                 # -n: prefix each output line with its line number
      --line-buffered       flush output on every line
  -H, --with-filename       print file name with output lines
  -h, --no-filename         suppress the file name prefix on output
      --label=LABEL         use LABEL as the standard input file name prefix
  -o, --only-matching       show only the part of a line matching PATTERN
  -q, --quiet, --silent     suppress all normal output
      --binary-files=TYPE   assume that binary files are TYPE;
                            TYPE is 'binary', 'text', or 'without-match'
  -a, --text                equivalent to --binary-files=text                   # -a: search binary files as if they were text
  -I                        equivalent to --binary-files=without-match
  -d, --directories=ACTION  how to handle directories;
                            ACTION is 'read', 'recurse', or 'skip'
  -D, --devices=ACTION      how to handle devices, FIFOs and sockets;
                            ACTION is 'read' or 'skip'
  -r, --recursive           like --directories=recurse                          # -r: search directories recursively
  -R, --dereference-recursive  likewise, but follow all symlinks
      --include=FILE_PATTERN  search only files that match FILE_PATTERN
      --exclude=FILE_PATTERN  skip files and directories matching FILE_PATTERN
      --exclude-from=FILE   skip files matching any file pattern from FILE
      --exclude-dir=PATTERN  directories that match PATTERN will be skipped.
  -L, --files-without-match  print only names of FILEs with no selected lines   # -L: print only the names of files with NO matching line
  -l, --files-with-matches  print only names of FILEs with selected lines       # -l: print only the names of files with at least one matching line
  -c, --count               print only a count of selected lines per FILE       # -c: print only the number of matching lines
  -T, --initial-tab         make tabs line up (if needed)
  -Z, --null                print 0 byte after FILE name

Context control:
  -B, --before-context=NUM  print NUM lines of leading context                  # -B: print each matching line plus the NUM lines before it
  -A, --after-context=NUM   print NUM lines of trailing context                 # -A: print each matching line plus the NUM lines after it
  -C, --context=NUM         print NUM lines of output context                   # -C: print each matching line plus NUM lines before and after it
  -NUM                      same as --context=NUM
      --color[=WHEN],
      --colour[=WHEN]       use markers to highlight the matching strings;
                            WHEN is 'always', 'never', or 'auto'
  -U, --binary              do not strip CR characters at EOL (MSDOS/Windows)

When FILE is '-', read standard input.  With no FILE, read '.' if
recursive, '-' otherwise.  With fewer than two FILEs, assume -h.
Exit status is 0 if any line is selected, 1 otherwise;
if any error occurs and -q is not given, the exit status is 2.

Report bugs to: bug-grep@gnu.org
GNU grep home page: <http://www.gnu.org/software/grep/>
General help using GNU software: <http://www.gnu.org/gethelp/>
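
The annotated options can be tried on a small sample file. This is a sketch assuming GNU grep; `log.txt` and its contents are hypothetical. The `-q` part relies on the exit status described above (0 if a line is selected, 1 otherwise), which is how grep is usually used in shell conditionals; `|| true` and `|| status=$?` keep a "no match" result from aborting a script run with `set -e`.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1
printf 'INFO start\nERROR disk full\ninfo retry\nERROR timeout\nINFO done\n' > log.txt

# -i: case-insensitive; -n: prefix each line with its line number
ci=$(grep -in 'info' log.txt)
# -v -c: count the lines that do NOT contain ERROR
not_error=$(grep -vc 'ERROR' log.txt)
# -o: print only the matched part of each line
only=$(grep -o 'ERROR [a-z]*' log.txt)
# -A 1: also print one line of trailing context after the match
after=$(grep -A 1 'disk' log.txt)
# -w: whole words only; "star" occurs only inside "start", so the count is 0
word=$(grep -cw 'star' log.txt || true)
# -l: list only the names of files that contain a match
files=$(grep -l 'timeout' log.txt)

printf '%s\n---\n' "$ci" "$not_error" "$only" "$after" "$word" "$files"

# -q: print nothing, only set the exit status (0 = found, 1 = not found)
if grep -q 'ERROR' log.txt; then
    echo "log.txt contains errors"
fi
status=0
grep -q 'panic' log.txt || status=$?
echo "exit status for a missing pattern: $status"
```

For example, `ci` holds `1:INFO start`, `3:info retry` and `5:INFO done`, and `status` ends up as 1.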

sed: Stream Editor

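Edits text with a script of editing commands. It is mainly used to edit one or more files automatically, to simplify repetitive edits, and to write text-conversion programs. sed works as follows:

Read one line of input at a time;
Match the line against the supplied editing commands;
Modify the data in the stream as the commands specify;
Write the result to STDOUT, leaving the original file unchanged (unless `-i` is used).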

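Basic usage

$ sed [-e <script>] [-f <script-file>] [input-file...]

<script> is a string of editing commands; separate multiple commands with `;`, or put each on its own line (in bash, press Enter inside the quotes and continue at the `>` secondary prompt);
<script-file> is the name of a file containing editing commands; to distinguish it from a shell script, it conventionally uses the `.sed` extension.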
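
A sketch showing that the ways of supplying several commands described above give the same result; `input.txt` and `edit.sed` are hypothetical names created in a temporary directory, and only POSIX sed features are used. The final `cat` also confirms that, without `-i`, the input file is left unchanged.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1
printf 'one\ntwo\nthree\n' > input.txt

# 1. Several commands in one script, separated by ';'
semi=$(sed -e 's/one/1/; s/two/2/' input.txt)

# 2. The same commands separated by a newline inside the quotes
#    (typed interactively, bash shows its '>' secondary prompt here)
newline=$(sed -e 's/one/1/
s/two/2/' input.txt)

# 3. One command per -e option
multi_e=$(sed -e 's/one/1/' -e 's/two/2/' input.txt)

# 4. Commands stored one per line in a .sed file and loaded with -f
printf 's/one/1/\ns/two/2/\n' > edit.sed
from_file=$(sed -f edit.sed input.txt)

echo "$semi"
# Without -i the original file is untouched
cat input.txt
```

All four variables hold `1`, `2`, `three`, while `input.txt` still reads `one`, `two`, `three`.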

Options

$ sed --help
Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...

  -n, --quiet, --silent
                 suppress automatic printing of pattern space
  -e script, --expression=script                                            # -e: take the script from the command line; -e may be omitted when it is the only script
                 add the script to the commands to be executed
  -f script-file, --file=script-file                                        # -f: read the script from a file
                 add the contents of script-file to the commands to be executed
  --follow-symlinks
                 follow symlinks when processing in place
  -i[SUFFIX], --in-place[=SUFFIX]                                           # -i: edit the file itself instead of writing to STDOUT
                 edit files in place (makes backup if SUFFIX supplied)
  -l N, --line-length=N
                 specify the desired line-wrap length for the `l' command
  --posix
                 disable all GNU extensions.
  -E, -r, --regexp-extended
                 use extended regular expressions in the script
                 (for portability use POSIX -E).
  -s, --separate
                 consider files as separate rather than as a single,
                 continuous long stream.
      --sandbox
                 operate in sandbox mode.
  -u, --unbuffered
                 load minimal amounts of data from the input files and flush
                 the output buffers more often
  -z, --null-data
                 separate lines by NUL characters
      --help     display this help and exit
      --version  output version information and exit

If no -e, --expression, -f, or --file option is given, then the first
non-option argument is taken as the sed script to interpret.  All
remaining arguments are names of input files; if no input files are
specified, then the standard input is read.

GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed@gnu.org>.
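
A sketch of `-n` and `-E` on made-up input, assuming GNU sed. `-n` switches off the automatic printing of the pattern space, so only explicit `p` output appears; `-E` lets the script use extended regular expressions, where groups `(...)` and `+` need no backslashes (the help text recommends `-E` over `-r` for portability). The last command shows the equivalent basic-regex spelling for comparison.

```shell
# Without -n, every line is printed automatically, so line 2 appears twice
auto=$(printf 'a\nb\n' | sed '2p')
# With -n, only lines printed explicitly by 'p' are output
quiet=$(printf 'a\nb\n' | sed -n '2p')

# -E: swap "name value" into "value=name" using two capture groups
swapped=$(printf 'width 80\nheight 24\n' | sed -E 's/^([a-z]+) ([0-9]+)$/\2=\1/')
# The same substitution as a basic regex needs escaped groups and \{1,\}
swapped_bre=$(printf 'width 80\nheight 24\n' | sed 's/^\([a-z]\{1,\}\) \([0-9]\{1,\}\)$/\2=\1/')

printf '%s\n---\n' "$auto" "$quiet" "$swapped" "$swapped_bre"
```

`auto` is `a`, `b`, `b`; `quiet` is just `b`; both substitutions yield `80=width` and `24=height`.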

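Editing commands

# `a`: append line(s) after the addressed line(s); to append several lines, separate them with `\n` (no `\n` is needed at the start or end; `\n` in the added text is a GNU sed extension)
$ sed -e "FROM[,TO] a [CONTENT]" FILENAME

# `i`: insert line(s) before the addressed line(s)
$ sed -e "FROM[,TO] i [CONTENT]" FILENAME

# `d`: delete the addressed line(s)
$ sed -e "FROM[,TO] d" FILENAME

# `c`: replace (change) the addressed line(s) with CONTENT
$ sed -e "FROM[,TO] c [CONTENT]" FILENAME

# `s`: search for PATTERN within a line and replace it with CONTENT; the `g` flag replaces every match on the line, not just the first
$ sed -e "FROM[,TO] s/[PATTERN]/[CONTENT]/g" FILENAME

# `p`: print the addressed line(s); usually combined with `-n`
$ sed -n -e "FROM[,TO] p" FILENAME

# `q`: quit; the current line is still printed (unless `-n` is given), then sed stops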
$ sed -e "[COMMANDS;]q" FILENAME
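
A sketch of what several of these commands actually print, on a made-up three-line input. It assumes GNU sed: the one-line `a text`, `i text` and `c text` forms and `\n` inside the added text are GNU extensions. Note that `c` applied to a range replaces the whole range with a single copy of the text, and that `q` still prints the line it stops on.

```shell
input=$(printf 'l1\nl2\nl3')

# a: append after line 2; "\n" in the text starts a new line
appended=$(printf '%s\n' "$input" | sed '2 a new1\nnew2')
# i: insert before line 1
inserted=$(printf '%s\n' "$input" | sed '1 i header')
# c on a range: lines 1-2 are replaced by ONE line of text
changed=$(printf '%s\n' "$input" | sed '1,2 c replaced')
# d: delete the last line ($)
deleted=$(printf '%s\n' "$input" | sed '$ d')
# q: print line 2, then quit (like "head -n 2")
quit=$(printf '%s\n' "$input" | sed '2 q')

printf '%s\n---\n' "$appended" "$inserted" "$changed" "$deleted" "$quit"
```

With this input, `changed` is just `replaced` followed by `l3`, and `quit` is `l1`, `l2`.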

Examples

# Create the text file `test_sed.txt`
$ for (( i=1; i<=5; i++ )); do
> echo "line $i" >> test_sed.txt
> done
$ cat test_sed.txt
line 1
line 2
line 3
line 4
line 5

# =============== Basic operations ===============
# ----------------- Print lines ------------------
# Print lines 3-5; without `-n`, sed would also print every line automatically
$ sed -n -e "3,5 p" test_sed.txt
# ----------------- Append lines -----------------
# Append a line after line 3
$ sed -e "3 a newline" test_sed.txt
# Append a line after each of lines 3-5
$ sed -e "3,5 a newline" test_sed.txt
# ----------------- Insert lines -----------------
# Insert a line before line 3
$ sed -e "3 i newline" test_sed.txt
# Append two lines after line 3
$ sed -e "3 a newline1\nnewline2" test_sed.txt
# ----------------- Delete lines -----------------
# Delete line 3
$ sed -e "3 d" test_sed.txt
# Delete lines 3-5
$ sed -e "3,5 d" test_sed.txt
# Delete from line 3 to the last line
$ sed -e "3,$ d" test_sed.txt
# ----------------- Change lines -----------------
# Replace line 3
$ sed -e "3 c replace" test_sed.txt
# Replace lines 3-5 (as a range, they become a single "replace" line)
$ sed -e "3,5 c replace" test_sed.txt
# ------- Search and replace within a line -------
# Replace `li` with `LI` on line 3
$ sed -e "3 s/li/LI/g" test_sed.txt
# --------------- Multiple commands --------------
# Delete line 3 through the last line, and replace `line` with `LINE` in the remaining lines
$ sed -e "3,$ d; s/line/LINE/g" test_sed.txt
# or
$ sed -e "3,$ d" -e "s/line/LINE/g" test_sed.txt

# ====== Search, then run commands on matches ======
# ------------- Print matching lines -------------
# Print lines containing `3`; without `-n`, all lines would be printed as well
$ sed -n -e "/3/p" test_sed.txt
# ------------- Delete matching lines ------------
# Delete lines containing `3`
$ sed -e "/3/d" test_sed.txt
# ----------- Replace in matching lines ----------
# On lines containing `3`, replace `line` with `this line`
$ sed -e "/3/{s/line/this line/}" test_sed.txt
# Same, but print only the modified line
$ sed -n -e "/3/{s/line/this line/; p; }" test_sed.txt

# =============== In-place editing ===============
# Modify the file itself, replacing `line` with `LINE`
$ sed -i -e "s/line/LINE/g" test_sed.txt
# Beware of redirecting sed's output back into its own input file
$ sed -e "s/line/LINE/g" test_sed.txt >  test_sed.txt   # the shell truncates the file before sed reads it: the result is empty
$ sed -e "s/line/LINE/g" test_sed.txt >> test_sed.txt   # appends; works for this small file, but reading and appending to the same file is risky for larger ones
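
Two safer alternatives to redirecting into the input file, sketched on a throwaway `test_sed.txt` in a temporary directory and assuming GNU sed. `-i.bak` (the `SUFFIX` form in the help text) edits in place but keeps the original as `test_sed.txt.bak`; the portable route writes to a temporary file and replaces the original only if sed succeeded.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1
printf 'line 1\nline 2\n' > test_sed.txt

# GNU sed: edit in place and keep a backup with the given suffix
sed -i.bak -e 's/line/LINE/g' test_sed.txt
edited=$(cat test_sed.txt)
backup=$(cat test_sed.txt.bak)

# Portable alternative: write to a temp file, then move it over the original
sed -e 's/LINE/Row/' test_sed.txt > test_sed.txt.tmp && mv test_sed.txt.tmp test_sed.txt
portable=$(cat test_sed.txt)

printf '%s\n---\n' "$edited" "$backup" "$portable"
```

Afterwards the backup still holds `line 1` and `line 2`, while `test_sed.txt` reads `Row 1` and `Row 2`.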

awk: Alfred Aho, Peter Weinberger, Brian Kernighan

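Scans the given files line by line, looks for lines that match a given pattern, and performs the requested actions on them. If no pattern is given, the action is applied to every line; if no action is given, the matching lines are printed to STDOUT (the default action is `print`).

Basic usage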


awk [options] 'script' var=value file(s)
or
awk [options] -f scriptfile var=value file(s)
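
A sketch of both invocation forms, assuming any POSIX awk (gawk, mawk or BWK awk) and made-up data; `nums.txt` and `filter.awk` are hypothetical files. A `var=value` operand is applied when awk reaches it in the argument list, i.e. before the following file is read but after `BEGIN` has already run; `-v` assigns the variable before `BEGIN`.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1
printf '1\n2\n3\n4\n' > nums.txt

# Form 1: inline program; n=2 is assigned just before nums.txt is read
inline=$(awk '$1 > n' n=2 nums.txt)

# Form 2: the same program kept in a file and loaded with -f
printf '$1 > n\n' > filter.awk
from_file=$(awk -f filter.awk n=2 nums.txt)

# A var=value operand is NOT yet set inside BEGIN, but a -v assignment is
in_begin=$(awk 'BEGIN { print "[" n "]" }' n=2 nums.txt)
with_v=$(awk -v n=2 'BEGIN { print "[" n "]" }')

printf '%s\n---\n' "$inline" "$from_file" "$in_begin" "$with_v"
```

Both forms print `3` and `4`; `in_begin` is `[]` while `with_v` is `[2]`.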

Options

$ awk --help
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options: (standard)
        -f progfile             --file=progfile         # read the awk program from a file
        -F fs                   --field-separator=fs    # field separator: split each line into fields at fs, e.g. the `:` in $PATH
        -v var=val              --assign=var=val
Short options:          GNU long options: (extensions)
        -b                      --characters-as-bytes
        -c                      --traditional
        -C                      --copyright
        -d[file]                --dump-variables[=file]
        -D[file]                --debug[=file]
        -e 'program-text'       --source='program-text'
        -E file                 --exec=file
        -g                      --gen-pot
        -h                      --help
        -i includefile          --include=includefile
        -l library              --load=library
        -L[fatal|invalid]       --lint[=fatal|invalid]
        -M                      --bignum
        -N                      --use-lc-numeric
        -n                      --non-decimal-data
        -o[file]                --pretty-print[=file]
        -O                      --optimize
        -p[file]                --profile[=file]
        -P                      --posix
        -r                      --re-interval
        -S                      --sandbox
        -t                      --lint-old
        -V                      --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
        gawk '{ sum += $1 }; END { print sum }' file
        gawk -F: '{ print $1 }' /etc/passwd
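
A sketch of `-F` on made-up, `/etc/passwd`-style lines, so the result does not depend on the accounts on a given machine. `-F:` splits each line on `:`; a separator longer than one character is treated as a regular expression, so `-F'[,;]'` splits on either `,` or `;`. The last command combines `-F` with `-v` to pass a shell variable (the hypothetical `min_uid`) into the program.

```shell
passwd_like=$(printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000::/home/alice:/bin/sh\n')

# -F: split on ':'; print the user name (field 1) and login shell (last field, $NF)
users=$(printf '%s\n' "$passwd_like" | awk -F: '{ print $1, $NF }')

# -F with a regular expression: either ',' or ';' separates fields
mixed=$(printf 'a,b;c\n' | awk -F'[,;]' '{ print $3, $2, $1 }')

# Combine -F with -v to pass a shell value into the program
min_uid=500
regular=$(printf '%s\n' "$passwd_like" | awk -F: -v min="$min_uid" '$3 >= min { print $1 }')

printf '%s\n---\n' "$users" "$mixed" "$regular"
```

This prints `root /bin/bash` and `alice /bin/sh`, then `c b a`, then `alice`.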

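Common built-in variables

Variable	Description
$0	The current record
$1 ... $n	The nth field of the current record, as split by FS
NF	Number of fields in the current record
NR	Number of records read so far (counted across all input files)
FS	Input field separator; the default, a single space, splits on runs of spaces and tabs and ignores leading and trailing blanks
RS	Record separator; defaults to a newline
OFS	Output field separator; defaults to a space
ORS	Output record separator; defaults to a newline

By default, records are separated by newlines and fields by whitespace, so each record is one line of text and each field is one word.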
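
A sketch exercising the variables in the table on made-up input. One detail worth knowing: changing `OFS` alone does not change `$0`; awk rebuilds the record with the new separator only after a field is assigned, and `$1 = $1` is the usual idiom for that.

```shell
data=$(printf 'alpha beta gamma\n  delta   epsilon\n')

# NR = record number, NF = field count, $NF = last field
summary=$(printf '%s\n' "$data" | awk '{ print NR ": " NF " fields, last=" $NF }')

# Default FS: runs of blanks separate fields and leading blanks are ignored
second=$(printf '%s\n' "$data" | awk '{ print $2 }')

# OFS only takes effect once the record is rebuilt, e.g. by "$1 = $1"
unchanged=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { print $0 }')
rebuilt=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print $0 }')

# ORS replaces the newline written after each print
joined=$(printf '1\n2\n3\n' | awk 'BEGIN { ORS = "," } { print $1 }')

printf '%s\n---\n' "$summary" "$second" "$unchanged" "$rebuilt" "$joined"
```

`unchanged` stays `a b c` while `rebuilt` becomes `a-b-c`, and `joined` is `1,2,3,`.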

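Syntax

Operators

Operator	Description
=	Assignment
+=, -=, *=, /=, %=, ^=	Compound assignment (`x += y` means `x = x + y`, and so on)
||, &&, !	Logical OR, logical AND, logical NOT
~, !~	Matches / does not match a regular expression
<, <=, >, >=, !=, ==	Relational operators; they compare numerically when both operands are numbers (including input fields that look like numbers), otherwise as strings in dictionary order
+, -, *, /	Addition, subtraction, multiplication, division; operands are converted to numbers automatically, and a string without a leading number becomes 0
%	Remainder (modulo)
^, **	Exponentiation (`**` is not in POSIX; `^` is portable)
++, --	Prefix or postfix increment and decrement
$n	Field reference
(space)	String concatenation
?:	Conditional (ternary) operator
in	Tests whether a key exists in an array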
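
A sketch exercising several operators from the table inside a `BEGIN` block, so no input file is needed. It sticks to POSIX operators (`^` rather than `**`), so it should behave the same under gawk, mawk or BWK awk. Comparisons are stored in variables before printing because, inside a `print` statement, an unparenthesized `>` would be taken as output redirection.

```shell
result=$(awk 'BEGIN {
    x = 7; y = 3
    print x % y, x ^ 2                 # remainder and power
    s = "ab" "cd"                      # juxtaposition (a space) concatenates
    print s
    m1 = ("foo123" ~ /[0-9]+$/)        # regex match
    m2 = ("foo" !~ /^f/)               # "foo" does match ^f, so !~ yields 0
    print m1, m2
    c1 = (10 < 9)                      # two numbers: numeric comparison
    c2 = ("10" < "9")                  # two string constants: string comparison
    print c1, c2
    t = (x > y) ? "x bigger" : "y bigger"   # conditional (ternary) operator
    print t
    arr["k"] = 1
    k1 = ("k" in arr); k2 = ("z" in arr)    # key test; does not create "z"
    print k1, k2
    n = 5; n += 2; n *= 3; n--         # compound assignment, then decrement
    print n
}')
echo "$result"
```

The seven output lines are `1 49`, `abcd`, `1 0`, `0 1`, `x bigger`, `1 0` and `20`.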

BEGIN/END

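Commands in a BEGIN block run once, before the first input line is read; commands in an END block run once, after the last input line has been processed. BEGIN is typically used to set FS, print a header, or initialize global variables; END is typically used to print computed results or a summary.

# Count the records (lines) in `/etc/passwd`
$ awk 'BEGIN{count = 0} {count++} END{print count}' /etc/passwd

# Count the total number of `:`-separated fields in `/etc/passwd`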
$ awk 'BEGIN{count = 0; FS=":"} {count += NF} END{print count}' /etc/passwd
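
A self-contained sketch of the header/summary pattern described above, on made-up `name score` lines. Uninitialized variables such as `sum` start at 0, so no explicit initialization is needed; the `NR > 0` check avoids dividing by zero when the input is empty.

```shell
report=$(printf 'amy 80\nbob 90\ncat 70\n' | awk '
BEGIN { print "name score" }          # runs once, before any input
      { sum += $2; print $1, $2 }     # runs for every record
END   {                               # runs once, after the last record
    if (NR > 0) printf "avg %.1f\n", sum / NR
}')
echo "$report"
```

The report ends with `avg 80.0`, since the three scores sum to 240.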

Branches, loops, and arrays

Branching: if

Works like the `if` statement in C.

$ cat test.awk
BEGIN {
        FS = ":"
}
{
        if ($1 == "louishsu"){
                if ($2 == "x"){
                        print "louishsu x"
                } else {
                        print "louishsu _"
                }
        } else if ( $1 == "mysql"){
                print "mysql"
        }
}

$ awk -f test.awk /etc/passwd
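
The same `if` / `else if` / `else` chain can be tried without depending on the accounts in `/etc/passwd`. A sketch on made-up scores; the grade boundaries are arbitrary.

```shell
grades=$(printf '95\n72\n40\n' | awk '{
    if ($1 >= 90) {
        grade = "A"
    } else if ($1 >= 60) {
        grade = "pass"
    } else {
        grade = "fail"
    }
    print $1, grade
}')
echo "$grades"
```

This prints `95 A`, `72 pass` and `40 fail`.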

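Loops: do-while, for

As in C, break and continue can be used to control a loop. Both examples below print $0 (the whole line) and then the first two fields, because `count` starts at 0.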

$ cat test.awk
BEGIN {
        FS = ":"
}
{
        print "----------------"
        count = 0
        do {
                print $count
                count++
        } while (count < 3)
}

$ awk -f test.awk /etc/passwd

$ cat test.awk
BEGIN {
        FS = ":"
}
{
        print "----------------"
        for (count = 0; count < 3; count++) {
                print $count
        }
}
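
awk also has a plain `while` loop, and `break`/`continue` behave as in C. A sketch on a made-up line of numbers: `continue` skips negative fields and `break` stops at the first `0`. The counter is incremented before `continue`, so skipping a field cannot cause an endless loop.

```shell
out=$(printf '3 -1 4 0 5\n' | awk '{
    sum = 0
    i = 0
    while (i < NF) {
        i++
        if ($i < 0) continue   # skip negative numbers
        if ($i == 0) break     # stop at the first zero
        sum += $i
    }
    print "sum before first zero:", sum
}')
echo "$out"
```

Only 3 and 4 are added, so the result is `sum before first zero: 7`.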

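Arrays

Arrays in awk are always associative: numeric subscripts are converted to strings, so `cities[1]` and `cities["1"]` refer to the same element.

$ cat test.awk
BEGIN {
    cities[1] = "beijing"
    cities[2] = "shanghai"
    cities["three"] = "guangzhou"
    # the order in which "for (c in cities)" visits keys is unspecified
    for (c in cities) {
        print cities[c]
    }
    print cities[1]         # numeric subscript 1 is converted to "1",
    print cities["1"]       # so both lines print "beijing"
    print cities["three"]
}

$ awk -f test.awk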
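
Counting is the most common use of associative arrays. A sketch on made-up words: `in` tests for a key without creating it, `delete` removes an element, and the output is piped through `sort` because the order of `for (key in array)` is unspecified.

```shell
counts=$(printf 'red blue\nred green red\n' | awk '
{
    for (i = 1; i <= NF; i++)
        count[$i]++                  # key = word, value = occurrences
}
END {
    if ("blue" in count)
        delete count["blue"]         # drop one key
    for (w in count)
        print w, count[w]            # order is unspecified
}' | sort)
echo "$counts"
```

After `blue` is deleted, the sorted output is `green 1` and `red 3`.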

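Common string functions

Function	Description
`sub(r, s, [t])`	Replace the first match of regular expression `r` in `t` with `s`; `t` defaults to `$0`; returns the number of substitutions made (0 or 1)
`gsub(r, s, [t])`	Like `sub`, but replaces every (global) match of `r`; returns the number of substitutions made
`index(s1, s2)`	Return the position (1-based) of the first occurrence of `s2` in `s1`, or 0 if it does not occur
`match(s, r)`	Search `s` for regular expression `r`; return the 1-based position where the match starts, or 0 if there is none; also sets `RSTART` (that position) and `RLENGTH` (the match length, or -1 if there is no match)
`length [(s)]`	Return the length of string `s`; `s` defaults to `$0`
`substr(s, m, [n])`	Return the substring of `s` that starts at position `m` and is `n` characters long; without `n`, it runs to the end of the string
`split(s, a, [r])`	Split `s` into array elements `a[1], a[2], ..., a[n]` at matches of the extended regular expression `r` (or `FS` if `r` is omitted); return `n`
`tolower(s), toupper(s)`	Convert all letters to lowercase/uppercase; the case mapping is defined by the `LC_CTYPE` category of the current locale
`sprintf(fmt, ...)`	Format the remaining arguments according to `fmt` (as `printf` does) and return the resulting string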
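
A sketch calling each function in the table on a made-up string, inside a `BEGIN` block so no input is needed (any POSIX awk should do). It also shows that `sub`/`gsub` modify their third argument in place and return a count, and that `match` sets `RSTART`/`RLENGTH` besides returning the start position.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    t = s; n1 = sub(/o/, "0", t)        # first match only; t is modified in place
    u = s; n2 = gsub(/o/, "0", u)       # every match
    print t, n1
    print u, n2
    print index(s, "world"), index(s, "xyz")
    p = match(s, /wor+/)                # found: returns the start position
    print p, RSTART, RLENGTH
    p = match(s, /zzz/)                 # not found: returns 0, RLENGTH = -1
    print p, RSTART, RLENGTH
    print length(s), substr(s, 1, 5), substr(s, 7)
    n = split("a:b:c", parts, ":")
    print n, parts[1], parts[3]
    print toupper("abc"), tolower("XyZ")
    print sprintf("%-5s|%03d", "ab", 7)
}')
echo "$out"
```

Running it prints `hell0 world 1`, `hell0 w0rld 2`, `7 0`, `7 7 3`, `0 0 -1`, `11 hello world`, `3 a c`, `ABC xyz` and `ab   |007`.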