这篇文章讲述 CPU 单核跑满 100% 的原因及定位办法。
在生产环境中,代码可能存在 Bug, 被触发之后导致 CPU 跑满 100%, 从而影响其它线程的执行。这篇文章模拟种场景,使用 top
命令来定位有问题的线程,并找到对应的代码块。
命令介绍 top
是查看系统负载的命令,在本文中使用了以下的参数:
1 2 3 4 5 6 7 8 9 10 11 12 -c: 查看进程执行的命令; -p: 查看指定的进程; -H: 查看线程 c: 切换显示进程执行的命令; M: 按照内存进行排序; N: 按照 pid 进行排序; P: 按照 CPU 负载进行排序; T: 按照 TIME+ 进行排序; e: 切换内存单位
分析该问题,关键是找到跑满 CPU 的线程,可以按照以下步骤进行:
编写模拟程序;
使用 Top
命令定位问题线程;
使用 jstack
找到对应的代码块。
模拟程序 可以编写一个死循环的程序来模拟这个场景,其代码如下:
1 2 3 4 5 6 7 public static void main(String[] args) { while (true ) { // 死循环 System.out.println("Current Timestamp:" + System.currentTimeMillis()); } }
打包并执行命令 java -jar fatjar-lab-jar-with-dependencies.jar
.
定位问题线程 使用 top -c
命令查看有问题的进程,如下所示:
1 2 3 4 5 6 7 8 9 10 11 $ top -c top - 18:03:33 up 149 days, 22:21, 2 users , load average: 0.33, 0.10, 0.07 Tasks: 141 total, 1 running, 140 sleeping, 0 stopped, 0 zombie %Cpu(s): 14.5 us, 12.6 sy, 0.0 ni, 72.7 id , 0.0 wa, 0.0 hi, 0.2 si, 0.0 st KiB Mem : 8009820 total, 2528620 free, 1099948 used, 4381252 buff/cache KiB Swap: 2097148 total, 2097148 free, 0 used. 6108472 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 22900 root 20 0 4477412 57688 11996 S 96.4 0.7 0:09.14 java -jar fatjar-lab-jar-with-dependencies.jar 13732 root 20 0 156756 5616 4220 D 8.3 0.1 0:01.52 sshd: root@pts/0,pts/1 22461 root 20 0 162012 2440 1660 R 0.7 0.0 0:00.08 top -c
可以看到,有问题的进程是 22900
, CPU 跑到了 96.4%
, 下面使用 top -Hp 22900
查看线程信息,并使用交互命令 P(大写)
进行排序。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 $ top -Hp 22900 top - 18:11:28 up 149 days, 22:29, 3 users , load average: 1.17, 0.35, 0.17 Threads: 15 total, 1 running, 14 sleeping, 0 stopped, 0 zombie %Cpu(s): 15.8 us, 10.3 sy, 0.0 ni, 73.6 id , 0.0 wa, 0.0 hi, 0.3 si, 0.0 st KiB Mem : 8009820 total, 2526596 free, 1101308 used, 4381916 buff/cache KiB Swap: 2097148 total, 2097148 free, 0 used. 6107012 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 22901 root 20 0 4477412 57876 12104 R 92.0 0.7 0:36.46 java -jar fatjar-lab-jar-with-dependencies.jar 22906 root 20 0 4477412 57876 12104 S 0.3 0.7 0:00.02 java -jar fatjar-lab-jar-with-dependencies.jar 22900 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.00 java -jar fatjar-lab-jar-with-dependencies.jar 22902 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.00 java -jar fatjar-lab-jar-with-dependencies.jar 22903 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.00 java -jar fatjar-lab-jar-with-dependencies.jar 22904 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.00 java -jar fatjar-lab-jar-with-dependencies.jar 22905 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.00 java -jar fatjar-lab-jar-with-dependencies.jar 22907 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.00 java -jar fatjar-lab-jar-with-dependencies.jar 22908 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.00 java -jar fatjar-lab-jar-with-dependencies.jar 22909 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.00 java -jar fatjar-lab-jar-with-dependencies.jar 22910 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.23 java -jar fatjar-lab-jar-with-dependencies.jar 22911 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.25 java -jar fatjar-lab-jar-with-dependencies.jar 22912 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.07 java -jar fatjar-lab-jar-with-dependencies.jar 22913 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.00 java -jar fatjar-lab-jar-with-dependencies.jar 22914 root 20 0 4477412 57876 12104 S 0.0 0.7 0:00.03 java -jar fatjar-lab-jar-with-dependencies.jar
从上面的内容可以看出,线程 22901
占用了 92%
的 CPU 时间,该线程便是出问题的线程。
由于 jstack
命令中使用的线程 id 是 16 进制的,所以需要将 22901
转化为 16 进制。
1 2 $ printf "%x\n" 22901 5975
定位代码块 通过命令 jstack 22900 | grep '0x5975' -A20 --color
查看线程栈的信息,可以发现出问题的线程是 main
线程,代码位置是 org.noahsark.fatjar.App.main(App.java:14)
. 从而最终找到问题的原因。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 $ jstack 22900 | grep '0x5975' -A20 --color "main" java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) - locked <0x0000000085c07420> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:482) - locked <0x0000000085c066a8> (a java.io.PrintStream) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) - locked <0x0000000085c07540> (a java.io.OutputStreamWriter) at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) at java.io.PrintStream.newLine(PrintStream.java:546) - eliminated <0x0000000085c066a8> (a java.io.PrintStream) at java.io.PrintStream.println(PrintStream.java:807) - locked <0x0000000085c066a8> (a java.io.PrintStream) at org.noahsark.fatjar.App.main(App.java:14)
参考:
1. linux线上CPU100%排查