Newer versions of sort support the --parallel option

sort is slow on large files. As of version 8.11 (sadly, my Xubuntu 11.10 still ships 8.5) it supports the --parallel option, which sorts with multiple threads.
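Since not every installed sort is new enough, here is a minimal sketch of a wrapper that uses --parallel only when the local sort accepts it. The function name sort_maybe_parallel is my own invention for illustration, and probing by sorting /dev/null is just one way to detect support.

```shell
# Hypothetical helper: use --parallel=N if this sort supports it,
# otherwise fall back to a plain single-threaded sort.
sort_maybe_parallel() {
    threads=$1
    shift
    # Probe: sorting an empty input fails only if the option is unknown.
    if sort --parallel="$threads" /dev/null >/dev/null 2>&1; then
        sort --parallel="$threads" "$@"
    else
        sort "$@"
    fi
}

# Usage: the first argument is the thread count, the rest go to sort.
printf '3\n1\n2\n' | sort_maybe_parallel 2 -n
```

On a real workload you would pass the file name after the thread count instead of piping data in.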

Here are some write-ups on other ways to do a parallel sort with sort:
http://linuxwebdev.blogspot.com/2009/02/howto-simple-parallel-sort-in-linux.html
http://bashcurescancer.com/sorting-large-files-faster-with-a-shell-script.html
http://stackoverflow.com/questions/930044/why-unix-sort-command-could-sort-a-very-large-file

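On commandlinefu I also saw that you can set the buffer size (the -S / --buffer-size option). Giving sort a larger buffer should help on machines with plenty of memory, though I don't know what the default buffer size is.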
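A small sketch of setting the buffer size explicitly. The 64M value and the generated test data are arbitrary choices for illustration, not a recommendation; on a large-memory machine you would presumably pick something much bigger.

```shell
# Build a throwaway input file: the numbers 1000 down to 1.
input=$(mktemp)
sorted=$(mktemp)
seq 1000 -1 1 > "$input"

# -S / --buffer-size sets the main-memory buffer sort may use.
# 64M is an arbitrary value chosen for this example.
sort -n --buffer-size=64M -o "$sorted" "$input"

smallest=$(head -n 1 "$sorted")
count=$(wc -l < "$sorted")
echo "smallest=$smallest lines=$count"

rm -f "$input" "$sorted"
```

The size accepts suffixes such as K, M and G, or a percentage of physical memory (for example -S 50%).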

Update: I had a look at the just-released Ubuntu 12.04 beta, which ships coreutils 8.13. I downloaded the source, and sort.c does have the parallel option.
Here is the relevant comment:

/* Heuristic value for the number of lines for which it is worth creating
   a subthread, during an internal merge sort.  I.e., it is a small number
   of "average" lines for which sorting via two threads is faster than
   sorting via one on an "average" system.  On a dual-core 2.0 GHz i686
   system with 3GB of RAM and 2MB of L2 cache, a file containing 128K
   lines of gensort -a output is sorted slightly faster with --parallel=2
   than with --parallel=1.  By contrast, using --parallel=1 is about 10%
   faster than using --parallel=2 with a 64K-line input.  */

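Note what it says: with a 128K-line file, --parallel=2 is slightly faster than --parallel=1, but with a 64K-line file, --parallel=1 is about 10% faster than --parallel=2. I haven't tested this myself yet, so I can't say more until I have compared the two.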
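A rough sketch of how that comparison could be run. It assumes bash (for the time keyword) and a sort that supports --parallel. The input is random numbers generated with awk as a stand-in for the gensort -a records mentioned in the comment, so the timings will not be directly comparable to the ones quoted there. The function name bench_sort is made up for this example.

```shell
# Hypothetical benchmark helper: time sort with 1 and 2 threads
# on a generated file with the given number of lines.
bench_sort() {
    lines=$1
    data=$(mktemp)
    # Assumption: random integers stand in for `gensort -a` output.
    awk -v n="$lines" 'BEGIN {
        srand(42)
        for (i = 0; i < n; i++) printf "%d\n", rand() * 1000000000
    }' > "$data"
    for p in 1 2; do
        echo "lines=$lines parallel=$p"
        # Timing is reported on stderr by the shell.
        time sort --parallel="$p" "$data" > /dev/null
    done
    rm -f "$data"
}

bench_sort 65536    # the 64K-line case
bench_sort 131072   # the 128K-line case
```

A single run is noisy, so repeating each measurement several times would give a fairer picture.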
