https://www.jianshu.com/p/2cdf71f99209复制
Java进程Crash后通常会生成一个hs_error%pid.log
该文件的位置可通过JVM参数'-XX:ErrorFile'指定如
-XX:ErrorFile=/landon/business/battle/hs_error%p.log
一个典型的hs_error文件内容如下
第一部分文件头
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f838c76f1ae, pid=8257, tid=0x00007f838b5b6700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_221-b11) (build 1.8.0_221-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.221-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libluajit-5.1.so.2+0xc1ae]
#
# Core dump written. Default location: /landon/battle_server/online/core or core.8257
复制
第二部分是Crash的线程信息
--------------- T H R E A D ---------------
Current thread (0x00007f83a001e000): JavaThread "SOFA-SEV-BOLT-BIZ-12200-3-T11" [_thread_in_native, id=27748, stack(0x00007f838b4b6000,0x00007f838b5b7000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000004
Registers:
Top of Stack: (sp=0x00007f838b5b4f70)
Instructions
Register to memory mapping:
RAX=0x0000000000000000 is an unknown value
RBX=0x0000000040ffb9c0 is an unknown value
RCX=0x0000000000000000 is an unknown value
RDX=0x00000000489f80d0 is an unknown value
RSP=0x00007f838b5b4f70 is pointing into the stack for thread: 0x00007f83a001e000
RBP=0x00000000407c23b8 is an unknown value
RSI=0x0000000041cb0e88 is an unknown value
RDI=0x00000000407c23b8 is an unknown value
R8 =0x00000007bfc84540 is an oop
com.naef.jnlua.LuaState
- klass: 'com/naef/jnlua/LuaState'
R9 =0x00000007bfc84540 is an oop
com.naef.jnlua.LuaState
- klass: 'com/naef/jnlua/LuaState'
R10=0x00000000000006ba is an unknown value
R11=0x00007f843da2cf3c: <offset 0xfccf3c> in /landon/lib/jdk1.8.0_221/jre/lib/amd64/server/libjvm.so at 0x00007f843ca60000
R12=0x0000000000000000 is an unknown value
R13=0x00007f838b5b50d0 is pointing into the stack for thread: 0x00007f83a001e000
R14=0x00000000407c2fa8 is an unknown value
R15=0x00007f838c9e43e0: <offset 0xa3e0> in /landon/lib/libjnlua5.1.so at 0x00007f838c9da000
Stack: [0x00007f838b4b6000,0x00007f838b5b7000], sp=0x00007f838b5b4f70, free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libluajit-5.1.so.2+0xc1ae]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 4447 com.naef.jnlua.LuaState.lua_gc(II)I (0 bytes) @ 0x00007f84291dd278 [0x00007f84291dd240+0x38]
J 5983 C2 com.landon30.jlua.pool.JLua.call(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/Object; (206 bytes) @ 0x00007f842946e01c [0x00007f842946dbe0+0x43c]
J 6040 C1 com.landon30.jlua.LuaCallManager.callLua(Ljava/util/Set;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/Object; (474 bytes) @ 0x00007f842965fd1c [0x00007f842965f4e0+0x83c]
复制
第三,四部分主要是Crash时进程和系统信息
--------------- P R O C E S S --------------- Java Threads: ( => current thread ) =>0x00007f83a001e000 JavaThread "SOFA-SEV-BOLT-BIZ-12200-3-T11" [_thread_in_native, id=27748, stack(0x00007f838b4b6000,0x00007f838b5b7000)] VM state:not at safepoint (normal execution) VM Mutex/Monitor currently owned by a thread: None Heap: Card table byte_map: Marking Bits: Polling page: CodeCache: Compilation events: GC Heap History: Deoptimization events: Classes redefined: Internal exceptions: Events: Dynamic libraries: VM Arguments: Environment Variables: Signal Handlers: --------------- S Y S T E M ---------------
复制
hs_error总结
通过上面的分析,打开ulimit -c后就会生成Core dump
6.9G Nov 2 21:58 core.26972
分析Core dump的一个工具是gdb,所以必须安装gdb
$ gdb --version
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-92.el6)
复制
分析Java进程crash生成的coredump
- 用gdb打开
$ gdb /landon/lib/jdk/bin/java core.26972
输入bt或bt full命令查看backtrace
- 从这里就可以看到一些实际的C堆栈
- 如#15 0x00007f21149fcc3d in resizestack (L=0x2, n=0) at lj_state.c:71
(gdb) bt #0 0x00007f2169c704f5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007f2169c71cd5 in abort () at abort.c:92 #2 0x00007f2169560799 in os::abort(bool) () from /landon/lib/jdk1.8.0_221/jre/lib/amd64/server/libjvm.so #3 0x00007f2169725733 in VMError::report_and_die() () from /landon/lib/jdk1.8.0_221/jre/lib/amd64/server/libjvm.so #4 0x00007f216956aa45 in JVM_handle_linux_signal () from /landon/lib/jdk1.8.0_221/jre/lib/amd64/server/libjvm.so #5 0x00007f216955d8e8 in signalHandler(int, siginfo*, void*) () from /landon/lib/jdk1.8.0_221/jre/lib/amd64/server/libjvm.so #6 <signal handler called> #7 0x00007f21380cbb8d in ?? () #8 0x0000000000000001 in ?? () #9 0x000000005d5fdfa8 in ?? () #10 0x000000005d696b20 in ?? () #11 0x00007f21149edd03 in lj_cont_ra () from /landon/lib/libluajit-5.1.so.2 #12 0x00007f21149ee456 in lj_ff_coroutine_resume () from /landon/lib/libluajit-5.1.so.2 #13 0x00007f21149ee456 in lj_ff_coroutine_resume () from /landon/lib/libluajit-5.1.so.2 #14 0x00007f21149ee456 in lj_ff_coroutine_resume () from /landon/lib/libluajit-5.1.so.2 #15 0x00007f21149fcc3d in resizestack (L=0x2, n=0) at lj_state.c:71 #16 0x00007f2155595eaa in ?? () #17 0x0000000669220970 in ?? () #18 0x000000075f554730 in ?? () #19 0x0000000000000000 in ?? ()
复制btfull命令可以看到详细的信息包括调用参数等
- 此时结合源代码如 lj_state.c:71 + 参数分析原因
(gdb) bt full #0 0x00007f2169c704f5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 resultvar = 0 pid = <value optimized out> selftid = <value optimized out> #1 0x00007f2169c71cd5 in abort () at abort.c:92 save_stage = 2 act = {__sigaction_handler = {sa_handler = 0x7f20b70ede80, sa_sigaction = 0x7f20b70ede80}, sa_mask = {__val = { 139779442242560, 0, 139781486046377, 0, 139781485559377, 139781480345909, 5872491825372126914, 1925, 335544324, 139781480145399, 10, 139781480027928, 55, 1, 0, 0}}, sa_flags = 0, sa_restorer = 0x260} sigs = {__val = {32, 0 <repeats 15 times>}} #2 0x00007f2169560799 in os::abort(bool) () from /landon/lib/jdk1.8.0_221/jre/lib/amd64/server/libjvm.so No symbol table info available. #3 0x00007f2169725733 in VMError::report_and_die() () from /landon/lib/jdk1.8.0_221/jre/lib/amd64/server/libjvm.so No symbol table info available. #4 0x00007f216956aa45 in JVM_handle_linux_signal () from /landon/lib/jdk1.8.0_221/jre/lib/amd64/server/libjvm.so No symbol table info available. #5 0x00007f216955d8e8 in signalHandler(int, siginfo*, void*) () from /landon/lib/jdk1.8.0_221/jre/lib/amd64/server/libjvm.so No symbol table info available. #6 <signal handler called> No symbol table info available. #7 0x00007f21380cbb8d in ?? () No symbol table info available. #8 0x0000000000000001 in ?? () No symbol table info available. #9 0x000000005d5fdfa8 in ?? () No symbol table info available. #10 0x000000005d696b20 in ?? () No symbol table info available. #11 0x00007f21149edd03 in lj_cont_ra () from /landon/lib/libluajit-5.1.so.2 No symbol table info available. #12 0x00007f21149ee456 in lj_ff_coroutine_resume () from /landon/lib/libluajit-5.1.so.2 No symbol table info available. ---Type <return> to continue, or q <return> to quit--- #13 0x00007f21149ee456 in lj_ff_coroutine_resume () from /landon/lib/libluajit-5.1.so.2 No symbol table info available. #14 0x00007f21149ee456 in lj_ff_coroutine_resume () from /landon/lib/libluajit-5.1.so.2 No symbol table info available. #15 0x00007f21149fcc3d in resizestack (L=0x2, n=0) at lj_state.c:71 st = 0x7f2114c63f66 oldst = 0x7f20b70ef370 delta = 0 oldsize = 3071210352 realsize = 32544 up = 0x0 #16 0x00007f2155595eaa in ?? () No symbol table info available. #17 0x0000000669220970 in ?? () No symbol table info available. #18 0x000000075f554730 in ?? () No symbol table info available. #19 0x0000000000000000 in ?? () No symbol table info available.
复制按q退出
gdb还有许多其他命令,这里就不一一细说
使用jvm原生命令分析coredump
使用jstack分析
$ jstack /landon/lib/jdk/bin/java core.30298 >> core.30298.jstack.log
- 从这里可以看到所有的线程堆栈,而这个是hs_error没有的,hs_error只能看到当前线程堆栈和所有的线程
Deadlock Detection: No deadlocks found. Thread 30402: (state = IN_NATIVE) - com.naef.jnlua.LuaState.lua_pcall(int, int)
复制使用jmap分析
$ jmap /landon/lib/jdk/bin/java core.30298
... 0x00007f274a5eb000 55K /landon/lib/libjnlua5.1.so 0x00007f274a374000 474K /lanodn/lib/libluajit-5.1.so.2 ...
复制
$ jmap -dump:live,format=b,file=30298.hprof /landon/lib/jdk/bin/java core.30298
- 即将coredump文件转为hprof,这样就可以用如mat工具等分析内存相关了
- 此执行时间较长,线上慎用
注:其实jstack和jmap的help文档都有说明分析coredump
$ jstack --help jstack [-m] [-l] <executable> <core> (to connect to a core file) $ jmap --help jmap [option] <executable <core> (to connect to a core file)
复制
从gdb分析堆栈时,经常看到一些这样的信息 'No symbol table info available.'
没有可用符号表
其实简单来说就是调试信息,没有调试信息就无法看到详细的堆栈
这里延伸到Java,javac编译的时候
-g:none Generate no debugging info
-g:{lines,vars,source} Generate only some debugging info
复制
如何看一个so库是否有调试信息呢
使用nm命令:List symbols in [file(s)]
// 无符号表的一个so
$ nm -a libjnlua5.1.so_no_debug
nm: libjnlua5.1.so_no_debug: no symbols
$ nm -a libjnlua5.1.so
000000000020f818 b .bss
0000000000000000 n .comment
......
// 查看行号
$ nm -l libjnlua5.1.so | head
0000000000009a50 T JNI_OnLoad /laodon/lib/JNLuaJIT-master/src/main/c/Linux/../jnlua.c:1911
复制
直接使用gdb命令即可
$ gdb libjnlua5.1.so
// 如果有符号表,会直接读取成功,否则
Reading symbols from /landon/lib/libjnlua5.1.so...done.
// 没有则直接提示 no debugging symbols found
Reading symbols from /landon/lib/libjnlua5.1.so_no_debug...(no debugging symbols found)...done.
复制
还有一个很重要的是要查看文件是否被stripped,如果被stripped,则一定没有调试信息
// 这个显示stripped,则一定没有调试信息
$ file libjnlua5.1.so_no_debug
libjnlua5.1.so_no_debug: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, stripped
// 显示not stripped
$ file libjnlua5.1.so
libjnlua5.1.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
复制
如何生成调试信息:以三个例子说明
$ make CCDEBUG=-g CFLAGS=-O0
-g -Wall -pedantic -fno-inline
gdb提示Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.212.el6_10.3.x86_64 libgcc-4.4.7-23.el6.x86_64 ncurses-libs-5.7-4.20090207.el6.x86_64
按照提示直接安装debuginfo即可
安装后,gcc等底层库的调试信息也都有了
Reading symbols from /lib64/libgcc_s.so.1...Reading symbols from /usr/lib/debug/lib64/libgcc_s-4.4.7-20120601.so.1.debug...done.
done.
复制