Tag archive: unix

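Process state while being debugged

The state characters that ps prints on Mac/BSD can mean something different from the same characters on Linux. For example, ps on my Mac shows my java process in state "TX+":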

➜  ps aux | grep "[j]ava"
hongjiang  60387  0.0  0.4  6072548  36516 s001  TX+  7:28PM  0:00.19 java Daemon

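On Linux, man ps says X means "dead (should never be seen)", whereas on Mac/BSD it means "The process is being traced or debugged". On Linux, a process that is being debugged is shown with T, the same as a stopped process. Some newer Linux kernels tell the two apart and use a lowercase t for a process stopped by a debugger that is tracing it.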
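As a small illustration of the difference, the sketch below maps the X flag to the meaning quoted from each platform's man page. The function name explain_x_flag is made up for this example, and the platform names are assumed to be what uname -s prints.

```shell
#!/bin/bash
# Explain the "X" flag in a ps STAT string for a given OS name (as printed by uname -s).
# The two meanings are the ones quoted from the man pages above.
explain_x_flag() {
    local os="$1" stat="$2"
    case "$stat" in
        *X*) ;;                                   # the flag is present, keep going
        *)   echo "no X flag"; return 0 ;;
    esac
    case "$os" in
        Linux)       echo "dead (should never be seen)" ;;
        Darwin|*BSD) echo "being traced or debugged" ;;
        *)           echo "unknown platform" ;;
    esac
}

# Interpret the state from the ps output above on the current machine.
explain_x_flag "$(uname -s)" "TX+"
```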

SIGTTIN?

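I have a script that always ran fine on my own Mac, where zsh is the default shell. Today a colleague ran it under bash and it misbehaved. I tracked the problem down and reduced it to the following: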

#!/bin/bash
zsh -ic "which mvn"
zsh -ic "which mvn"

The script above runs without problems, but replacing zsh with bash inside it causes trouble:

#!/bin/bash
bash -ic "which mvn"
bash -ic "which mvn"

On a Mac, the first bash -ic "which mvn" succeeds, but the second one hangs:

 ➜  ./b.sh
/usr/local/bin/mvn
[1]  + 24649 suspended (tty input)  ./b.sh

/tmp/dd   [23:43:21]
[jobs:1] ➜    

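Then it occurred to me that which is a builtin in zsh but an external command under bash, which might make a difference, so I made zsh run the external command explicitly: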

#!/bin/bash
zsh -ic "/usr/bin/which mvn"
zsh -ic "/usr/bin/which mvn"

This time the script blocks on the second call, and even Ctrl-C cannot stop it.
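To confirm how each shell resolves a name, bash's type -t builtin reports whether it is a builtin or a file on disk. This is a minimal sketch: the helper name kind_in_bash is hypothetical, and the result for which depends on whether it is installed on the machine.

```shell
#!/bin/bash
# Report how bash resolves a command name:
# "builtin", "file", "alias", "function" or "keyword".
kind_in_bash() {
    bash --noprofile --norc -c 'type -t "$1"' _ "$1"
}

kind_in_bash cd                                        # cd is a bash builtin
kind_in_bash which || echo "which is not installed here"

# zsh has no type -t; there, whence -w gives the equivalent answer:
#   zsh -c 'whence -w which'
```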

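This is a strange problem: when a shell is invoked interactively twice to run a command (it has to be an external command), the first call succeeds and the second gets suspended. My guess is that the first interactive run changes some piece of shared context when it finishes, and that makes the second run hang. I traced the script with strace on Linux, and it looks like the SIGTTIN signal is responsible, but I don't understand why. I'm posting the code and the strace output here in the hope that someone who does understand can explain it.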

$ cat b.sh
#!/bin/bash
bash -ic "ls"
bash -ic "ls"

[hongjiang@localhost dd]$ strace ./b.sh
execve("./b.sh", ["./b.sh"], [/* 31 vars */]) = 0
brk(0)                                  = 0x1175000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa545ddf000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=32094, ...}) = 0
mmap(NULL, 32094, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fa545dd7000
close(3)                                = 0
open("/lib64/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\316\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=174520, ...}) = 0
mmap(NULL, 2268928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa545995000
mprotect(0x7fa5459ba000, 2097152, PROT_NONE) = 0
mmap(0x7fa545bba000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7fa545bba000
close(3)                                = 0
open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=19512, ...}) = 0
mmap(NULL, 2109744, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa545791000
mprotect(0x7fa545794000, 2093056, PROT_NONE) = 0
mmap(0x7fa545993000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7fa545993000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2107760, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa545dd6000
mmap(NULL, 3932736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa5453d0000
mprotect(0x7fa545586000, 2097152, PROT_NONE) = 0
mmap(0x7fa545786000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b6000) = 0x7fa545786000
mmap(0x7fa54578c000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fa54578c000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa545dd4000
arch_prctl(ARCH_SET_FS, 0x7fa545dd4740) = 0
mprotect(0x7fa545786000, 16384, PROT_READ) = 0
mprotect(0x7fa545993000, 4096, PROT_READ) = 0
mprotect(0x7fa545bba000, 16384, PROT_READ) = 0
mprotect(0x6dc000, 4096, PROT_READ)     = 0
mprotect(0x7fa545de0000, 4096, PROT_READ) = 0
munmap(0x7fa545dd7000, 32094)           = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/dev/tty", O_RDWR|O_NONBLOCK)     = 3
close(3)                                = 0
brk(0)                                  = 0x1175000
brk(0x1196000)                          = 0x1196000
brk(0)                                  = 0x1196000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106065056, ...}) = 0
mmap(NULL, 106065056, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fa53eea9000
close(3)                                = 0
brk(0)                                  = 0x1196000
getuid()                                = 1000
getgid()                                = 1000
geteuid()                               = 1000
getegid()                               = 1000
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/proc/meminfo", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa545dde000
read(3, "MemTotal:        1017160 kB\nMemF"..., 1024) = 1024
close(3)                                = 0
munmap(0x7fa545dde000, 4096)            = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7fa545405650}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7fa545405650}, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7fa545405650}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fa545405650}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fa545405650}, {SIG_DFL, [], SA_RESTORER, 0x7fa545405650}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fa545405650}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fa545405650}, {SIG_DFL, [], SA_RESTORER, 0x7fa545405650}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fa545405650}, {SIG_DFL, [], SA_RESTORER, 0x7fa545405650}, 8) = 0
uname({sys="Linux", node="localhost.localdomain", ...}) = 0
stat("/tmp/dd", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat(".", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
getpid()                                = 2945
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=26254, ...}) = 0
mmap(NULL, 26254, PROT_READ, MAP_SHARED, 3, 0) = 0x7fa545dd8000
close(3)                                = 0
getppid()                               = 2942
getpgrp()                               = 2942
rt_sigaction(SIGCHLD, {0x441090, [], SA_RESTORER|SA_RESTART, 0x7fa545405650}, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7fa545405650}, 8) = 0
getrlimit(RLIMIT_NPROC, {rlim_cur=3909, rlim_max=3909}) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("./b.sh", O_RDONLY)                = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff54edec40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "#!/bin/sh\n\nbash -ic \"ls\"\nbash -i"..., 80) = 39
lseek(3, 0, SEEK_SET)                   = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=4*1024}) = 0
fcntl(255, F_GETFD)                     = -1 EBADF (Bad file descriptor)
dup2(3, 255)                            = 255
close(3)                                = 0
fcntl(255, F_SETFD, FD_CLOEXEC)         = 0
fcntl(255, F_GETFL)                     = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fstat(255, {st_mode=S_IFREG|0775, st_size=39, ...}) = 0
lseek(255, 0, SEEK_CUR)                 = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "#!/bin/sh\n\nbash -ic \"ls\"\nbash -i"..., 39) = 39
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
stat(".", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/hongjiang/.local/bin/bash", 0x7fff54ede900) = -1 ENOENT (No such file or directory)
stat("/home/hongjiang/bin/bash", 0x7fff54ede900) = -1 ENOENT (No such file or directory)
stat("/data/program/scala/bin/bash", 0x7fff54ede900) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/qt-3.3/bin/bash", 0x7fff54ede900) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/bash", 0x7fff54ede900) = -1 ENOENT (No such file or directory)
stat("/usr/bin/bash", {st_mode=S_IFREG|0755, st_size=960384, ...}) = 0
stat("/usr/bin/bash", {st_mode=S_IFREG|0755, st_size=960384, ...}) = 0
geteuid()                               = 1000
getegid()                               = 1000
getuid()                                = 1000
getgid()                                = 1000
access("/usr/bin/bash", X_OK)           = 0
stat("/usr/bin/bash", {st_mode=S_IFREG|0755, st_size=960384, ...}) = 0
geteuid()                               = 1000
getegid()                               = 1000
getuid()                                = 1000
getgid()                                = 1000
access("/usr/bin/bash", R_OK)           = 0
stat("/usr/bin/bash", {st_mode=S_IFREG|0755, st_size=960384, ...}) = 0
stat("/usr/bin/bash", {st_mode=S_IFREG|0755, st_size=960384, ...}) = 0
geteuid()                               = 1000
getegid()                               = 1000
getuid()                                = 1000
getgid()                                = 1000
access("/usr/bin/bash", X_OK)           = 0
stat("/usr/bin/bash", {st_mode=S_IFREG|0755, st_size=960384, ...}) = 0
geteuid()                               = 1000
getegid()                               = 1000
getuid()                                = 1000
getgid()                                = 1000
access("/usr/bin/bash", R_OK)           = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, -14, SEEK_CUR)               = 25
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa545dd4a10) = 2946
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x43e500, [], SA_RESTORER, 0x7fa545405650}, {SIG_DFL, [], SA_RESTORER, 0x7fa545405650}, 8) = 0
wait4(-1, b.sh
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2946
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2946, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, 0x7fff54ede450, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn()                          = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fa545405650}, {0x43e500, [], SA_RESTORER, 0x7fa545405650}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "bash -ic \"ls\"\n", 39)      = 14
stat("/usr/bin/bash", {st_mode=S_IFREG|0755, st_size=960384, ...}) = 0
stat("/usr/bin/bash", {st_mode=S_IFREG|0755, st_size=960384, ...}) = 0
geteuid()                               = 1000
getegid()                               = 1000
getuid()                                = 1000
getgid()                                = 1000
access("/usr/bin/bash", X_OK)           = 0
stat("/usr/bin/bash", {st_mode=S_IFREG|0755, st_size=960384, ...}) = 0
geteuid()                               = 1000
getegid()                               = 1000
getuid()                                = 1000
getgid()                                = 1000
access("/usr/bin/bash", R_OK)           = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa545dd4a10) = 2967
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x43e500, [], SA_RESTORER, 0x7fa545405650}, {SIG_DFL, [], SA_RESTORER, 0x7fa545405650}, 8) = 0
wait4(-1, 0x7fff54edea00, 0, NULL)      = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGTTIN {si_signo=SIGTTIN, si_code=SI_USER, si_pid=2967, si_uid=1000} ---
--- stopped by SIGTTIN ---
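
One detail in the trace may help narrow this down: the SIGTTIN entry carries si_code=SI_USER and si_pid=2967, so the signal was sent with kill() by the second bash -ic child itself, not raised by the terminal driver. A plausible reading, offered only as a hypothesis, is that the second interactive shell found its process group was not the terminal's foreground process group. The sketch below prints both IDs so they can be compared; the function names pgrp_info and is_foreground are made up for illustration.

```shell
#!/bin/bash
# Print "PGID TPGID" for this shell: its own process group and the foreground
# process group of its controlling terminal, as reported by ps.
pgrp_info() {
    ps -o pgid= -o tpgid= -p "$$"
}

# Succeed when the two IDs are equal, i.e. the process group is in the foreground.
is_foreground() {
    [ "$1" -eq "$2" ]
}

read -r pgid tpgid <<< "$(pgrp_info)"
if is_foreground "$pgid" "$tpgid"; then
    echo "foreground: pgid=$pgid tpgid=$tpgid"
else
    echo "not foreground: pgid=$pgid tpgid=$tpgid"
fi
```

Running this check in b.sh before the first bash -ic and again between the two calls would show whether TPGID changes after the first interactive shell exits.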

What the leading hyphen in a shell's name means

In a script, I used the following to find its parent shell:

#!/bin/bash 
ps -ocomm= -p $(ps -oppid= $$)

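In some environments the parent shell shows up as "/usr/local/bin/zsh" or "bash", while in others it shows up as "-bash" or "-zsh". Where does the extra leading hyphen come from? It turns out that it marks a login shell.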
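
A script that only cares which shell its parent is can normalise these names. The following is a rough sketch with hypothetical helper names (is_login_name, shell_basename); it only inspects the string that ps prints.

```shell
#!/bin/bash
# Succeed if a process name as shown by ps starts with "-", the login-shell marker.
is_login_name() {
    case "$1" in
        -*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Strip the leading "-" and any directory part:
# "-bash" -> "bash", "/usr/local/bin/zsh" -> "zsh".
shell_basename() {
    local name="${1#-}"
    printf '%s\n' "${name##*/}"
}

# Try the names that appear in this post.
for name in "-bash" "-zsh" "bash" "/usr/local/bin/zsh" "-/bin/bash"; do
    if is_login_name "$name"; then kind="login shell"; else kind="not marked as login"; fi
    echo "$name -> $(shell_basename "$name") ($kind)"
done
```

In a real script the name would come from ps, for example shell_basename "$(ps -ocomm= -p "$PPID")".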

On Linux you can see a user's login shell in "/etc/passwd". On a Mac, use the following command to check the current user's login shell:

$ dscl . read /users/$USER UserShell
UserShell: /bin/bash
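
For the Linux side, the login shell is the seventh colon-separated field of the user's passwd entry. Here is a minimal sketch, assuming a Linux system where getent is available (it falls back to reading /etc/passwd directly); the function names passwd_shell and login_shell_of are made up.

```shell
#!/bin/bash
# Print the login shell (7th colon-separated field) of passwd-format lines on stdin.
passwd_shell() {
    cut -d: -f7
}

# Look up a user's passwd entry via getent, falling back to /etc/passwd.
login_shell_of() {
    local user="$1"
    { getent passwd "$user" 2>/dev/null || grep "^${user}:" /etc/passwd; } | passwd_shell
}

login_shell_of "${USER:-$(id -un)}"
```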

On a Mac, when you open the terminal program (Terminal.app), the terminal's shell is a child of a login process, no matter which command you configure:

$ pstree
 ...
 |-+= 01272 hongjiang /Applications/Utilities/Terminal.app/Contents/MacOS/Terminal
 | \-+= 01275 root login -pf hongjiang
 |   \-+= 01276 hongjiang -bash
 ...

$ ps -ef | grep login
0  1275  1272   0  2:05PM ttys000    0:00.05 login -pf hongjiang 

# The same holds after changing Terminal's startup command to zsh

 |-+= 01988 hongjiang /Applications/Utilities/Terminal.app/Contents/MacOS/Terminal
 | \-+= 01991 root login -pf hongjiang /bin/zsh
 |   \-+= 01992 hongjiang -zsh

In iTerm.app on the Mac, however, whether the configured command is "Login shell" or some other shell, the shell no longer sits under a login process after startup:

$ pstree
 |-+= 00574 hongjiang /Volumes/Data/program/iTerm.app/Contents/MacOS/iTerm
 | \-+= 02177 hongjiang /usr/local/bin/zsh  

Yet in iTerm, when "Login shell" is selected, the displayed name does have a leading hyphen, whereas with other shells it does not:

$ ps $$
  PID   TT  STAT      TIME COMMAND
 2220 s001  Ss     0:00.01 -/bin/bash

# After changing the startup shell to zsh

➜  ps $$
  PID   TT  STAT      TIME COMMAND
 2524 s000  Ss     0:00.16 /usr/local/bin/zsh

To make iTerm behave like Terminal and also start the shell through login, change the startup command in its settings to:

login -pf $username /usr/local/bin/zsh   

On Linux, a shell reached by logging in on a tty is likewise started by the login process:

$ ps -ef | grep bash
 hongjia+  2241   467  0 14:09 tty1     00:00:00 -bash

$ pstree 
systemd─┬─NetworkManager─┬─2*[dhclient]
    │                └─3*[{NetworkManager}]
    ├─...
    ├─login───bash
    ...   

$ ps aux | grep login
root       467  0.0  0.2  84584  2328 ?        Ss   14:08   0:00 login -- hongjiang    

When you switch to a user's shell with su, by default that shell is not a login shell, so it does not run /etc/profile or the related configuration files in the home directory:

$ sudo su hongjiang

$ ps -ef | grep $$
hongjia+  2796  2795  0 14:57 pts/0    00:00:00 bash

To start it as a login shell, pass su the "-" argument:

$ sudo su - hongjiang

$ ps -ef | grep $$
hongjia+  3188  3187  0 15:35 pts/0    00:00:00 -bash

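According to the bash documentation, a shell starts as a login shell either when the first character of its argument zero is a hyphen "-", or when bash is explicitly given the "--login" option.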
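
Both routes can be tried from a script without logging in again. The sketch below assumes bash is on PATH. It uses bash's exec -a to set argument zero to "-bash", and the shopt option login_shell to ask each child shell whether it considers itself a login shell. --noprofile and --norc are passed only to keep startup files from printing anything.

```shell
#!/bin/bash
# Command run inside each child bash: prints "login" or "non-login".
check='shopt -q login_shell && echo login || echo non-login'

# 1. Plain invocation: not a login shell.
plain=$(bash --noprofile --norc -c "$check")

# 2. Argument zero starts with "-" (exec runs inside the command substitution's
#    subshell, so the current shell is not replaced).
dashed=$(exec -a -bash bash --noprofile --norc -c "$check")

# 3. Explicit --login option.
flagged=$(bash --noprofile --norc --login -c "$check")

echo "plain:   $plain"
echo "dashed:  $dashed"
echo "--login: $flagged"
```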