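sort is slow on large files. coreutils 8.11 supports the --parallel option for multi-threaded sorting (sadly, my Xubuntu 11.10 still ships 8.5).
Here are a few write-ups on how to do a parallel sort with sort itself: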
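To see which situation a given machine is in, something like the following should work. This is a small sketch that assumes GNU sort; `sort_supports_parallel` is a hypothetical helper name, not part of coreutils.

```shell
# Print the installed sort version (first line of the --version output).
sort --version | head -n 1

# Hypothetical helper: succeeds only if this sort accepts --parallel.
sort_supports_parallel() {
    printf 'b\na\n' | sort --parallel=2 >/dev/null 2>&1
}

if sort_supports_parallel; then
    echo "sort accepts --parallel"
else
    echo "sort does not accept --parallel"
fi
```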
http://linuxwebdev.blogspot.com/2009/02/howto-simple-parallel-sort-in-linux.html
http://bashcurescancer.com/sorting-large-files-faster-with-a-shell-script.html
http://stackoverflow.com/questions/930044/why-unix-sort-command-could-sort-a-very-large-file
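The general idea behind this kind of approach can be sketched as follows: split the input into chunks, sort each chunk in a background job, then merge the sorted chunks with `sort -m`. This is my own minimal sketch, not the exact script from any of the links. `parallel_sort` and the default chunk count are hypothetical, and `split -n l/N` assumes a reasonably recent GNU split.

```shell
# Hypothetical helper: parallel_sort INPUT OUTPUT [CHUNKS]
parallel_sort() {
    input=$1
    output=$2
    chunks=${3:-4}
    tmpdir=$(mktemp -d) || return 1

    # Split into N pieces on line boundaries (GNU split).
    split -n "l/$chunks" "$input" "$tmpdir/chunk."

    # Sort every chunk in its own background process.
    for f in "$tmpdir"/chunk.*; do
        sort -o "$f.sorted" "$f" &
    done
    wait

    # Merge the already-sorted chunks without re-sorting them.
    sort -m -o "$output" "$tmpdir"/chunk.*.sorted
    rm -rf "$tmpdir"
}
```

Usage would be `parallel_sort big.txt big.sorted.txt 4`. The chunk sorts and the final merge must run under the same locale, otherwise the merge step sees input that is not ordered the way it expects.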
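On commandlinefu I also saw that sort lets you set the buffer size (-S / --buffer-size). Making the buffer larger should help on machines with plenty of memory, though I don't know what the default buffer size is.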
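A minimal sketch of passing an explicit buffer size, assuming GNU sort. `sort_with_buffer` is a hypothetical wrapper and the sizes are placeholders to tune per machine, not recommendations.

```shell
# Hypothetical wrapper: sort_with_buffer INPUT OUTPUT SIZE
# SIZE takes the forms accepted by -S, e.g. 512M, 2G or 50%.
sort_with_buffer() {
    sort -S "$3" -o "$2" "$1"
}
```

For example, `sort_with_buffer big.txt big.sorted.txt 2G` asks sort to use up to about 2 GB of main memory before spilling to temporary files.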
Update: I looked at the just-released Ubuntu 12.04 beta, which ships coreutils 8.13. I downloaded the source, and sort.c does have the parallel option.
Here is a comment from the source:
/*
Heuristic value for the number of lines for which it is worth creating
a subthread, during an internal merge sort. I.e., it is a small number
of "average" lines for which sorting via two threads is faster than
sorting via one on an "average" system. On a dual-core 2.0 GHz i686
system with 3GB of RAM and 2MB of L2 cache, a file containing 128K
lines of gensort -a output is sorted slightly faster with --parallel=2
than with --parallel=1. By contrast, using --parallel=1 is about 10%
faster than using --parallel=2 with a 64K-line input.
*/
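Note what it says: with a 128K-line input, --parallel=2 is slightly faster than --parallel=1, but with a 64K-line input, --parallel=1 is about 10% faster than --parallel=2. I haven't tested this myself yet, so I need to run a comparison before drawing any conclusions.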
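One way such a comparison could be set up is sketched below. It assumes GNU sort, shuf and date (for `%N` nanoseconds), and uses shuffled integers from `seq | shuf` as a stand-in for `gensort -a` output, so the numbers would not be directly comparable with those in the comment. `bench_sort` is a hypothetical helper.

```shell
# Hypothetical helper: bench_sort LINES
# Times sort with --parallel=1 and --parallel=2 on LINES shuffled integers.
bench_sort() {
    lines=$1
    data=$(mktemp) || return 1
    seq "$lines" | shuf > "$data"

    for p in 1 2; do
        start=$(date +%s%N)
        sort --parallel="$p" "$data" > /dev/null
        end=$(date +%s%N)
        # Convert elapsed nanoseconds to milliseconds.
        echo "lines=$lines parallel=$p ms=$(( (end - start) / 1000000 ))"
    done

    rm -f "$data"
}
```

Running `bench_sort 65536` and `bench_sort 131072` covers the two input sizes mentioned in the comment. A single run is noisy, so each case should be repeated several times before comparing.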