来源:https://www.ardanlabs.com/blog/2019/05/garbage-collection-in-go-part2-gctraces.html

开关GC的对比

有一个从不同的新闻提供商下载RSS并搜索的应用程序

在关闭GC的情况下测试并发请求应用程序耗时情况

1
2
3
4
5
$ go build
$ GOGC=off ./project > /dev/null

# 10k requests using 100 connections
hey -m POST -c 100 -n 10000 "http://localhost:5000/search?term=topic&cnn=on&bbc=on&nyt=on"

Untitled

处理10k个请求需要4188ms,每秒处理约2387个请求

在开启GC的情况下呢,

1
2
3
4
5
6
$ GODEBUG=gctrace=1 ./project > /dev/null
gc 3 @3.182s 0%: 0.015+0.59+0.096 ms clock, 0.19+0.10/1.3/3.0+1.1 ms cpu, 4->4->2 MB, 5 MB goal, 12 P
.
.
.
gc 2553 @8.452s 14%: 0.004+0.33+0.051 ms clock, 0.056+0.12/0.56/0.94+0.61 ms cpu, 4->4->2 MB, 5 MB goal, 12 P

共需要2551次gc(省略前两次还没有开始正式测试的gc数据)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
gc 2553 @8.452s 14%: 0.004+0.33+0.051 ms clock, 0.056+0.12/0.56/0.94+0.61 ms cpu, 4->4->2 MB, 5 MB goal, 12 P

gc 2553 : The 2553 GC runs since the program started
@8.452s : Eight seconds since the program started
14% : Fourteen percent of the available CPU so far has been spent in GC

// wall-clock
0.004ms : STW : Write-Barrier - Wait for all Ps to reach a GC safe-point.
0.33ms : Concurrent : Marking
0.051ms : STW : Mark Term - Write Barrier off and clean up.

// CPU time
0.056ms : STW : Write-Barrier
0.12ms : Concurrent : Mark - Assist Time (GC performed in line with allocation)
0.56ms : Concurrent : Mark - Background GC time
0.94ms : Concurrent : Mark - Idle GC time
0.61ms : STW : Mark Term

4MB : Heap memory in-use before the Marking started
4MB : Heap memory in-use after the Marking finished
2MB : Heap memory marked as live after the Marking finished
5MB : Collection goal for heap memory in-use after Marking finished

// Threads
12P : Number of logical processors or threads used to run Goroutines.
1
2
3
4
5
6
7
8
9
10
Requests            : 10,000
------------------------------------------------------
Requests/sec : 1,882 r/s - Hey
Total Duration : 5,311ms - Hey
Percent Time in GC : 14% - GC Trace
Total Collections : 2,551 - GC Trace
------------------------------------------------------
Total GC Duration : 744.54ms - (5,311ms * .14)
Average Pace of GC : ~2.08ms - (5,311ms / 2,551)
Requests/Collection : ~3.98 r/gc - (10,000 / 2,511)

Untitled

Untitled

开启GC后耗费的时间增加了1.1s的延迟

减少分配

首先使用go的pprof工具查看内存使用情况

1
2
3
4
5
6
7
8
import _ "net/http/pprof"

go func() {
http.ListenAndServe("localhost:5000", http.DefaultServeMux)
}()

---
$ go tool pprof http://localhost:5000/debug/pprof/allocs

使用top命令查看前6个分配内存最多的函数

1
2
3
4
5
6
7
8
9
10
11
(pprof) top 6 -cum
Showing nodes accounting for 0.56GB, 5.84% of 9.56GB total
Dropped 80 nodes (cum <= 0.05GB)
Showing top 6 nodes out of 51
flat flat% sum% cum cum%
0 0% 0% 4.96GB 51.90% net/http.(*conn).serve
0.49GB 5.11% 5.11% 4.93GB 51.55% project/service.handler
0 0% 5.11% 4.93GB 51.55% net/http.(*ServeMux).ServeHTTP
0 0% 5.11% 4.93GB 51.55% net/http.HandlerFunc.ServeHTTP
0 0% 5.11% 4.93GB 51.55% net/http.serverHandler.ServeHTTP
0.07GB 0.73% 5.84% 4.55GB 47.63% project/search.rssSearch
1
2
3
4
5
6
7
8
9
10
11
12
13
14
(pprof) list rssSearch
Total: 9.56GB
ROUTINE ======================== project/search.rssSearch in project/search/rss.go
71.53MB 4.55GB (flat, cum) 47.63% of Total

. . 117: // Capture the data we need for our results if we find ...
. . 118: for _, item := range d.Channel.Items {
. 4.48GB 119: if strings.Contains(strings.ToLower(item.Description), strings.ToLower(term)) {
48.53MB 48.53MB 120: results = append(results, Result{
. . 121: Engine: engine,
. . 122: Title: item.Title,
. . 123: Link: item.Link,
. . 124: Content: item.Description,
. . 125: })

strings.ToLower会在堆上面创建新的字符串,循环调用是不需要的,可以放在循环外

改正之后重新测试一下

1
2
3
4
5
6
7
$ go build
$ GODEBUG=gctrace=1 ./project > /dev/null
gc 3 @6.156s 0%: 0.011+0.72+0.068 ms clock, 0.13+0.21/1.5/3.2+0.82 ms cpu, 4->4->2 MB, 5 MB goal, 12 P
.
.
.
gc 1404 @8.808s 7%: 0.005+0.54+0.059 ms clock, 0.060+0.47/0.79/0.25+0.71 ms cpu, 4->5->2 MB, 5 MB goal, 12 P
1
2
3
4
5
6
7
8
9
10
11
12
With Extra Allocations              Without Extra Allocations
======================================================================
Requests : 10,000 Requests : 10,000
----------------------------------------------------------------------
Requests/sec : 1,882 r/s Requests/sec : 3,631 r/s
Total Duration : 5,311ms Total Duration : 2,753 ms
Percent Time in GC : 14% Percent Time in GC : 7%
Total Collections : 2,551 Total Collections : 1,402
----------------------------------------------------------------------
Total GC Duration : 744.54ms Total GC Duration : 192.71 ms
Average Pace of GC : ~2.08ms Average Pace of GC : ~1.96ms
Requests/Collection : ~3.98 r/gc Requests/Collection : 7.13 r/gc

Untitled

相比上一次减少了1149次GC,GC时间占比从14%下降到7%,程序运行速度提高了48%,collection时间减少74%

Untitled