Linux Text File Processing

Published 2019-03-21, updated 2022-04-04, category: linux

Reading file contents

more/less: display a file one screen at a time

more test.txt  # a single file
more *.txt     # several .txt files
ls -l | more   # no file: page the output of a pipe
less test.txt

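Input	Action
Space	Next screen
Enter	Scroll forward one line
`q`	Quit
`/pattern`	Search for a string matching the pattern; the pattern is a regular expression
`/`	Repeat the search for the previous pattern
`h`	Help
`Ctrl-L`	Redraw the screen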

less
- Much better support for scrolling backward
- Supports the up and down arrow keys, j and k (vi-style cursor keys), and PgUp, PgDn, Home, and End

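cat/od: list file contents

cat (concatenate): joins files and prints them as text (option -n numbers the output lines)
od (octal dump): prints a file byte by byte (options -c, -t c, -t x1, -t d1, -t u1)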

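cat -n test.txt    # print with line numbers (-n takes no argument)
cat >test1.txt     # read from stdin until Ctrl-D and write it to test1.txt

od -t x1 a.dat     # print each byte in hexadecimal
od -t x1 a.dat | more
od -c b.file       # print byte by byte as characters; non-printable bytes are shown as escapes or octal codes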
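
To make the od options concrete, here is a small sketch that dumps three known bytes. The `-An` option (suppress the offset column) and the use of `printf` to produce the input are additions for illustration, not part of the original examples.

```shell
# Dump the three bytes "AB\n" in two formats.
# -An hides the offset column so only the data is printed.
hex=$(printf 'AB\n' | od -An -t x1)    # bytes 41 42 0a in hexadecimal
chars=$(printf 'AB\n' | od -An -c)     # A, B, and the newline shown as \n

# printf '%s\n' is used instead of echo so the backslash in \n stays literal.
printf '%s\n' "$hex"
printf '%s\n' "$chars"
```

Comparing the two outputs shows why -c is convenient for spotting stray control characters, while -t x1 gives the exact byte values.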

head/tail: display the beginning or the end of a file

Both print 10 lines by default; the -n option sets the number of lines.

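head -n 15 test.txt
head -n 15 test1.txt test2.txt |more  # first 15 lines of each file, 30 lines in total (plus a header per file)
tail -n 15 test.txt

head -n -20 test.txt  # everything except the last 20 lines (GNU head)
tail -n +20 test.txt  # start output at line 20, i.e. skip the first 19 lines

tail -f debug.txt     # keep printing lines as they are appended to the file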
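
The line arithmetic of `-n N` versus `-n +N` is easy to get wrong, so the following sketch checks it on a known input. It assumes `seq` (from GNU coreutils) is available to generate the numbers 1 to 10; the variable names are only illustrative.

```shell
# seq 1 10 prints the numbers 1..10, one per line.
first3=$(seq 1 10 | head -n 3)    # lines 1-3
last2=$(seq 1 10 | tail -n 2)     # lines 9-10
from8=$(seq 1 10 | tail -n +8)    # lines 8-10: output starts AT line 8

# Lines 4-6: take the first 6 lines, then the last 3 of those.
middle=$(seq 1 10 | head -n 6 | tail -n 3)

printf '%s\n' "$middle"
```

Chaining head and tail like this is a common way to extract a range of lines from the middle of a file.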

tee: a T-junction

Copies the data read from standard input (stdin) to standard output (stdout) and, at the same time, saves it to a file on disk.

cat a.sh
echo a
echo b

sh a.sh | tee a.log
a
b

cat a.log
a
b
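
Because tee passes its input through unchanged, it can also sit in the middle of a longer pipeline. The sketch below shows that, along with the `-a` (append) option, which the example above does not use. The temporary directory and file names are assumptions made for the demonstration.

```shell
workdir=$(mktemp -d)

# tee writes a.log and still forwards both lines to wc -l.
count=$(printf 'a\nb\n' | tee "$workdir/a.log" | wc -l)

# -a appends to the file instead of overwriting it.
printf 'c\n' | tee -a "$workdir/a.log" > /dev/null

log=$(cat "$workdir/a.log")
printf '%s\n' "$log"

rm -r "$workdir"
```

Without -a, the second tee would have truncated a.log and left only the line c in it.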

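wc: word count

Reports how many lines, words, and bytes a file contains (use -m to count characters instead of bytes).
When more than one file is given, a total is printed at the end.
Common option -l: print only the line count.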

wc test.txt             # one file

wc test1.txt test2.txt  # several files
# Output
# 6  7 13 test1.txt
# 6  7 13 test2.txt
# 12 14 26 total

wc -l test1.txt test2.txt
# Output
# 6  test1.txt
# 6  test2.txt
# 12 total
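
In scripts it is often more useful to get just the number, without the file name. One way to do that, sketched here under the assumption of a small throwaway file, is to feed the file to wc through stdin redirection.

```shell
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/t.txt"

# When wc reads from stdin there is no file name to print.
lines=$(wc -l < "$workdir/t.txt")    # 2 lines
words=$(wc -w < "$workdir/t.txt")    # 3 words
bytes=$(wc -c < "$workdir/t.txt")    # 14 bytes, newlines included

echo "lines=$lines words=$words bytes=$bytes"

rm -r "$workdir"
```

Some wc implementations pad the number with leading spaces, so trim the value before comparing it as a string.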

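sort: sort the contents of a file

-n (numeric): compare numbers by arithmetic value rather than by string comparison rules; for example, as strings 123 sorts before 67
A part of each line can be chosen as the sort key (-k, with -t to set the field separator)
Ascending or descending order can be selected (-r reverses the order)
String comparison can be case-sensitive or case-insensitive (-f ignores case)
Algorithm parameters, such as in-memory versus external sorting, can be tuned (performance tuning for large inputs)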

sort text1.txt > text3.txt
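
The difference between string and numeric comparison, and sorting on a chosen key, can be sketched as follows. The colon-separated sample records are made up for illustration.

```shell
# String comparison: "123" sorts before "67" because '1' < '6'.
lexical=$(printf '123\n67\n' | sort)

# Numeric comparison: 67 sorts before 123.
numeric=$(printf '123\n67\n' | sort -n)

# Sort colon-separated records by the 2nd field only,
# numerically (n) and in descending order (r).
by_field=$(printf 'a:3\nb:10\nc:1\n' | sort -t: -k2,2nr)

printf '%s\n' "$by_field"
```

Writing the key as `-k2,2` limits it to the second field; a bare `-k2` would extend the key from field 2 to the end of the line.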

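tr: translate characters

tr string1 string2
Copies standard input to standard output, replacing each character that appears in string1 with the character at the same position in string2.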

cat test1.txt
# Output
# a
# b
# c
# d

cat test1.txt |tr 'abc' 'XYZ'   # no brackets needed: tr takes character lists, not regular expressions
# Output
# X
# Y
# Z
# d
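
Beyond one-to-one replacement, tr also accepts character ranges and can delete or squeeze characters. The `-d` and `-s` options below are not covered in the text above; this is a brief sketch of how they behave on made-up input.

```shell
# Ranges: map every lowercase letter to its uppercase counterpart.
upper=$(printf 'hello\n' | tr 'a-z' 'A-Z')      # HELLO

# -d deletes every character listed in string1.
no_digits=$(printf 'a1b2c3\n' | tr -d '0-9')    # abc

# -s squeezes each run of a repeated character into one.
squeezed=$(printf 'a    b\n' | tr -s ' ')       # a b

printf '%s\n' "$upper" "$no_digits" "$squeezed"
```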

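uniq: filter repeated lines in a file
uniq options
uniq options input-file
uniq options input-file output-file

Repeated lines: adjacent lines with identical content (sort the input first to catch repeats that are not adjacent)
Options
-u (unique): keep only the lines that are not repeated
-d (duplicated): keep only the lines that are repeated (each printed once)
-c (count): prefix each line with the number of times it occurs
With neither -u nor -d, uniq prints both the non-repeated lines and the repeated lines (each repeated line only once).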
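
Since uniq only compares adjacent lines, it is usually combined with sort. The sketch below runs each option on the same made-up five-line input to show the difference.

```shell
input='b
a
c
a
a'

# Unsorted: only the two adjacent "a" lines at the end are merged.
adjacent_only=$(printf '%s\n' "$input" | uniq)          # b a c a

# Sorted first: all identical lines become adjacent.
unique_only=$(printf '%s\n' "$input" | sort | uniq -u)  # b c
dup_only=$(printf '%s\n' "$input" | sort | uniq -d)     # a
counts=$(printf '%s\n' "$input" | sort | uniq -c)       # 3 a, 1 b, 1 c (padded with leading spaces)

printf '%s\n' "$counts"
```

A common follow-up is `sort | uniq -c | sort -nr`, which lists the most frequent lines first.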