Go Benchmarking

Filed under: Go

2021-07-17

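The testing framework built into the Go standard library supports benchmarks, which make it easy to measure the performance of a piece of code. A benchmark mainly measures how efficiently the code uses CPU time and memory, so that its performance can be evaluated and alternative solutions compared.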

Benchmark Commands

Example Commands

$ go test -v -bench=.
$ go test -v -bench='Fib$' -benchtime=5s .
$ go test -v -bench='Fib$' -benchtime=1000x .
$ go test -v -bench='Fib$' -benchtime=1000x -count=5 .
$ go test -v -bench='Fib$' -cpu=2,4,8,16,32,64,128,256,512,1024,2048,4096 .
$ go test -v -benchmem -run=^$ -bench '^(BenchmarkFib)$' example.com/hello

Benchmark Functions

Benchmark Function Format

Warning

In BenchmarkXxx, Xxx can be any alphanumeric string, but its first character must not be a lowercase letter.

func BenchmarkXxx(b *testing.B)

// The following names are all valid
func Benchmark123(b *testing.B)
func Benchmark中国(b *testing.B)
func BenchmarkMac(b *testing.B)
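
The naming rule can be written down in a few lines of Go. The sketch below is an illustration only: isBenchmarkName is a hypothetical helper, written on the assumption that a name is accepted when the first rune after the Benchmark prefix is not a lowercase letter. It is not an API of the testing package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isBenchmarkName reports whether name looks like a valid benchmark
// function name: it must start with "Benchmark", and the first rune
// after the prefix (if any) must not be a lowercase letter.
// Hypothetical helper for illustration; not part of the testing package.
func isBenchmarkName(name string) bool {
	const prefix = "Benchmark"
	if !strings.HasPrefix(name, prefix) {
		return false
	}
	if len(name) == len(prefix) {
		return true // assumption: the bare prefix is accepted as well
	}
	r, _ := utf8.DecodeRuneInString(name[len(prefix):])
	return !unicode.IsLower(r)
}

func main() {
	names := []string{"Benchmark123", "Benchmark中国", "BenchmarkMac", "Benchmarkmac"}
	for _, name := range names {
		fmt.Printf("%s: %v\n", name, isBenchmarkName(name))
	}
}
```

Only the last name, Benchmarkmac, is rejected, because the character after the prefix is a lowercase letter.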

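Benchmark functions must be prefixed with Benchmark. The following rules also apply:

Files whose names end in _test.go do not take part in normal compilation and are not included in the executable;
Benchmarks are run with go test -bench=. and neither have nor need a main() function as an entry point. Every function whose name starts with Benchmark in a source file ending in _test.go, and whose name matches the -bench pattern, is run automatically;
A benchmark function must take the parameter b *testing.B, otherwise go test reports a signature error: wrong signature for BenchmarkXxx, must be: func BenchmarkXxx(b *testing.B);

To write a new benchmark, create a file whose name ends in _test.go and add a BenchmarkXxx function to it.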

Benchmark Examples

A Simple Benchmark

Code under test:

package hello

func Fib(n int) int {
  switch n {
  case 0:
    return 0
  case 1:
    return 1
  default:
    return Fib(n-1) + Fib(n-2)
  }
}

Benchmark code:

package hello

import (
  "testing"
)

func BenchmarkFib(b *testing.B) {
  for n := 0; n < b.N; n++ {
    Fib(15)
  }
}

$ go test -v -benchmem -run=^$ -bench '^(BenchmarkFib)$' example.com/hello

goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkFib
BenchmarkFib-4   	  238396	      4793 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	example.com/hello	2.190s

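A benchmark function runs the target code b.N times. While the benchmark executes, the framework adjusts b.N automatically until the benchmark function runs long enough to be timed reliably, so b.N differs from one benchmark to another. b.N starts at 1; if the run finishes in less than the target time (1s by default), b.N is increased and the function is run again. Older Go releases rounded b.N along a sequence such as 1, 2, 3, 5, 10, 20, 30, 50, 100, growing faster and faster. Current releases instead estimate the next b.N from the time per iteration measured so far, which is why the output above reports a non-round count such as 238396.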
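
The way b.N grows can be observed directly. The following is a minimal sketch, assuming the benchmark is driven by testing.Benchmark from an ordinary program rather than by go test. observeN is a hypothetical helper name, and the exact values it records depend on the machine and the Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// Fib is the function under test, equivalent to the example above.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// observeN runs a benchmark through testing.Benchmark and records the
// value of b.N on every invocation of the benchmark function.
func observeN() []int {
	var seen []int
	testing.Benchmark(func(b *testing.B) {
		seen = append(seen, b.N)
		for n := 0; n < b.N; n++ {
			Fib(15)
		}
	})
	return seen
}

func main() {
	// The first value is 1; later values grow until a run lasts long enough.
	fmt.Println(observeN())
}
```

The benchmark function is called several times, not once, so any work placed outside the b.N loop is repeated on every call.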

Improving Accuracy

The default benchmark time is 1s. A longer run usually gives a more stable result, so we can use -benchtime to raise it to 5s:

# Reuses the code from the first example
$ go test -v -benchtime=5s -benchmem -run=^$ -bench '^(BenchmarkFib)$' example.com/hello
goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkFib
BenchmarkFib-4           1290742              4529 ns/op               0 B/op          0 allocs/op
PASS
ok      example.com/hello       10.375s

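The whole run took 10.375s, longer than the 5s given to -benchtime. The final run alone took about 1290742 × 4529 ns ≈ 5.8s, and the earlier, shorter runs used to calibrate b.N, together with starting and tearing down the test binary, account for the rest.

The value of -benchtime can also be an exact iteration count instead of a duration, written as Nx: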

# Reuses the code from the first example
$ go test -v -benchtime=50x -benchmem -run=^$ -bench '^(BenchmarkFib)$' example.com/hello
goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkFib
BenchmarkFib-4                50              4556 ns/op               0 B/op          0 allocs/op
PASS
ok      example.com/hello       0.010s

The -count flag sets how many times each benchmark is run:

# Reuses the code from the first example
$ go test -v -count=3 -benchmem -run=^$ -bench '^(BenchmarkFib)$' example.com/hello
goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkFib
BenchmarkFib-4            226737              4483 ns/op               0 B/op          0 allocs/op
BenchmarkFib-4            226686              5645 ns/op               0 B/op          0 allocs/op
BenchmarkFib-4            226284              4485 ns/op               0 B/op          0 allocs/op
PASS
ok      example.com/hello       3.489s

Memory Allocation

package hello

import (
  "math/rand"
  "testing"
  "time"
)

func genWithCap(n int) []int {
  rand.Seed(time.Now().UnixNano())
  nums := make([]int, 0, n)
  for i := 0; i < n; i++ {
    nums = append(nums, rand.Int())
  }
  return nums
}

func genWithoutCap(n int) []int {
  rand.Seed(time.Now().UnixNano())
  nums := make([]int, 0)
  for i := 0; i < n; i++ {
    nums = append(nums, rand.Int())
  }
  return nums
}

func BenchmarkGenWithCap(b *testing.B) {
  for n := 0; n < b.N; n++ {
    genWithCap(1000000)
  }
}

func BenchmarkGenWithoutCap(b *testing.B) {
  for n := 0; n < b.N; n++ {
    genWithoutCap(1000000)
  }
}

$ go test -v -benchmem -run=^$ -bench '^BenchmarkGen' example.com/hello
goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkGenWithCap
BenchmarkGenWithCap-4                 40          28733419 ns/op         8003585 B/op          1 allocs/op
BenchmarkGenWithoutCap
BenchmarkGenWithoutCap-4              26          38608707 ns/op        45188404 B/op         40 allocs/op
PASS
ok      example.com/hello       3.167s

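genWithoutCap allocates 45188404/8003585 ≈ 5.6 times as much memory as genWithCap. When the slice capacity is set up front, memory is allocated only once; without it, the slice is reallocated as it grows, and memory is allocated 40 times.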
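
The same difference in allocation counts can be checked without a full benchmark by using testing.AllocsPerRun. This is a sketch under a few assumptions: fillWithCap, fillWithoutCap, countAllocs and sink are hypothetical names, the slices are filled with the loop index instead of random numbers, and the exact count for the version without capacity depends on the Go version's slice growth policy.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable; storing the result in it forces the
// slice onto the heap so the allocations are actually counted.
var sink []int

// fillWithCap appends n values to a slice whose capacity is set up front.
func fillWithCap(n int) []int {
	nums := make([]int, 0, n)
	for i := 0; i < n; i++ {
		nums = append(nums, i)
	}
	return nums
}

// fillWithoutCap appends n values to a slice that starts with no capacity,
// so append has to reallocate as the slice grows.
func fillWithoutCap(n int) []int {
	nums := make([]int, 0)
	for i := 0; i < n; i++ {
		nums = append(nums, i)
	}
	return nums
}

// countAllocs returns the average number of heap allocations per call
// for both variants, measured with testing.AllocsPerRun.
func countAllocs(n int) (withCap, withoutCap float64) {
	withCap = testing.AllocsPerRun(100, func() { sink = fillWithCap(n) })
	withoutCap = testing.AllocsPerRun(100, func() { sink = fillWithoutCap(n) })
	return withCap, withoutCap
}

func main() {
	withCap, withoutCap := countAllocs(10000)
	fmt.Printf("with cap: %.0f allocs, without cap: %.0f allocs\n", withCap, withoutCap)
}
```

testing.AllocsPerRun is convenient when only the allocation count matters, for example in a regular test that guards against accidental extra allocations.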

Testing Different Input Sizes

package hello

import (
  "math/rand"
  "testing"
  "time"
)

func gen(n int) []int {
  rand.Seed(time.Now().UnixNano())
  nums := make([]int, 0)
  for i := 0; i < n; i++ {
    nums = append(nums, rand.Int())
  }
  return nums
}

func helper(i int, b *testing.B) {
  for n := 0; n < b.N; n++ {
    gen(i)
  }
}

func BenchmarkGen10(b *testing.B)      { helper(10, b) }
func BenchmarkGen100(b *testing.B)     { helper(100, b) }
func BenchmarkGen1000(b *testing.B)    { helper(1000, b) }
func BenchmarkGen10000(b *testing.B)   { helper(10000, b) }
func BenchmarkGen100000(b *testing.B)  { helper(100000, b) }
func BenchmarkGen1000000(b *testing.B) { helper(1000000, b) }

$ go test -v -benchmem -run=^$ -bench '^(BenchmarkGen10|BenchmarkGen100|BenchmarkGen1000|BenchmarkGen10000|BenchmarkGen100000|BenchmarkGen1000000)$' example.com/hello

goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkGen10
BenchmarkGen10-4        	   85984	     12604 ns/op	     248 B/op	       5 allocs/op
BenchmarkGen100
BenchmarkGen100-4       	   89773	     14911 ns/op	    2040 B/op	       8 allocs/op
BenchmarkGen1000
BenchmarkGen1000-4      	   28988	     38968 ns/op	   16376 B/op	      11 allocs/op
BenchmarkGen10000
BenchmarkGen10000-4     	    3843	    326670 ns/op	  386296 B/op	      20 allocs/op
BenchmarkGen100000
BenchmarkGen100000-4    	     357	   3429826 ns/op	 4654346 B/op	      30 allocs/op
BenchmarkGen1000000
BenchmarkGen1000000-4   	      37	  35700196 ns/op	45188381 B/op	      40 allocs/op
PASS
ok  	example.com/hello	9.278s

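The results show that, for the larger inputs (from 1000 elements upward), making the input 10 times larger also makes each call take roughly 10 times longer, which indicates linear complexity. For the smallest inputs the time barely changes, because a fixed per-call overhead (most likely the rand.Seed call) dominates.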
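
The growth ratio between input sizes can also be computed in code. The sketch below drives testing.Benchmark from an ordinary program and prints ns/op for each size together with the ratio to the previous size. It is an illustration under stated assumptions: nsPerOp and sink are hypothetical names, and gen fills the slice with the loop index instead of random numbers so that the fixed seeding cost does not distort the small sizes.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the generated slice reachable so the work is not optimized away.
var sink []int

// gen builds a slice of n values without preallocating capacity.
func gen(n int) []int {
	nums := make([]int, 0)
	for i := 0; i < n; i++ {
		nums = append(nums, i)
	}
	return nums
}

// nsPerOp benchmarks gen(size) with testing.Benchmark and returns the
// average time per call in nanoseconds.
func nsPerOp(size int) int64 {
	res := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sink = gen(size)
		}
	})
	return res.NsPerOp()
}

func main() {
	var prev int64
	for _, size := range []int{1000, 10000, 100000} {
		cur := nsPerOp(size)
		if prev > 0 {
			// A ratio close to 10 for a 10x larger input suggests linear growth.
			fmt.Printf("n=%d: %d ns/op (%.1fx the previous size)\n", size, cur, float64(cur)/float64(prev))
		} else {
			fmt.Printf("n=%d: %d ns/op\n", size, cur)
		}
		prev = cur
	}
}
```

Each call to testing.Benchmark takes at least about a second, so this kind of program is better suited to occasional measurements than to a regular test suite.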

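The B Type

Reporting Methods

The argument passed to a benchmark function has type *testing.B. B manages the benchmark's timing and specifies how many iterations to run.

As with unit tests, a benchmark accumulates logs while it runs and dumps them when it finishes (to standard output in current Go releases). Unlike unit tests, benchmark logs are always printed, so that output whose existence may be affecting the benchmark results is not hidden.

The reporting methods of B are used in the same way as those of T. They are rarely needed in benchmarks, since the main goal is to measure performance.

Timer Methods

StartTimer: starts timing the benchmark. It is called automatically before a benchmark starts, and it can also be called to resume timing after StopTimer;
StopTimer: stops timing the benchmark. Use it to pause the timer while performing complex initialization that should not be measured;
ResetTimer: zeroes the elapsed benchmark time and the memory allocation counters. It does not affect whether the timer is running;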

package hello

import (
  "testing"
  "time"
)

func fib(n int) int {
  if n == 0 || n == 1 {
    return n
  }
  return fib(n-2) + fib(n-1)
}

func BenchmarkFib(b *testing.B) {
  time.Sleep(time.Second * 3)
  // b.ResetTimer()
  for n := 0; n < b.N; n++ {
    fib(30)
  }
}

# With b.ResetTimer() commented out
$ go test -v -benchtime=50x -benchmem -run=^$ -bench '^(BenchmarkFib)$' example.com/hello
goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkFib
BenchmarkFib-4   	      50	  66575387 ns/op	       1 B/op	       0 allocs/op
PASS
ok  	example.com/hello	6.358s

# With b.ResetTimer() enabled
$ go test -v -benchtime=50x -benchmem -run=^$ -bench '^(BenchmarkFib)$' example.com/hello
goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkFib
BenchmarkFib-4   	      50	   6419112 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	example.com/hello	6.344s

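With b.ResetTimer() commented out, each iteration is reported as 66575387 ns ≈ 66.6 ms; with it enabled, each iteration is reported as 6419112 ns ≈ 6.4 ms. The difference is the 3 s sleep spread over the 50 iterations (3 s / 50 = 60 ms per iteration). In other words, without b.ResetTimer() the reported time is inflated by a factor of 66575387/6419112 ≈ 10.4, because the setup time is counted as part of the measurement. The code itself is not faster; the total run time is about 6.3 s in both cases, since the benchmark function is called twice (a trial run with b.N = 1, then the run with b.N = 50) and sleeps for 3 s each time.

In some cases every call needs its own preparation and cleanup work. Use StopTimer to pause the timer and StartTimer to resume it: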

package hello

import (
  "math/rand"
  "testing"
  "time"
)

func genWithCap(n int) []int {
  rand.Seed(time.Now().UnixNano())
  nums := make([]int, 0, n)
  for i := 0; i < n; i++ {
    nums = append(nums, rand.Int())
  }
  return nums
}

func bubbleSort(nums []int) {
  for i := 0; i < len(nums); i++ {
    for j := 1; j < len(nums)-i; j++ {
      if nums[j] < nums[j-1] {
        nums[j], nums[j-1] = nums[j-1], nums[j]
      }
    }
  }
}

func BenchmarkBubbleSort(b *testing.B) {
  for n := 0; n < b.N; n++ {
    b.StopTimer()
    nums := genWithCap(10000)
    b.StartTimer()
    bubbleSort(nums)
  }
}

$ go test -v -benchtime=50x -benchmem -run=^$ -bench '^(BenchmarkBubbleSort)$' example.com/hello
goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkBubbleSort
BenchmarkBubbleSort-4   	      50	 124283402 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	example.com/hello	6.380s

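Parallel Execution

The RunParallel method runs a benchmark in parallel. It creates multiple goroutines and distributes the b.N iterations among them; the number of goroutines defaults to GOMAXPROCS. To increase parallelism for benchmarks that are not CPU-bound, call SetParallelism before RunParallel (for example, SetParallelism(2) results in 2*GOMAXPROCS goroutines). RunParallel is usually used together with the -cpu flag.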

package hello

import (
  "bytes"
  "testing"
  "text/template"
)

func BenchmarkTemplateParallel(b *testing.B) {
  tpl := template.Must(template.New("test").Parse("Hello, {{.}}!"))
  // RunParallel will create GOMAXPROCS goroutines and distribute work among them.
  b.RunParallel(func(pb *testing.PB) {
    // Each goroutine has its own bytes.Buffer.
    var buf bytes.Buffer
    for pb.Next() {
      // The loop body is executed b.N times total across all goroutines.
      buf.Reset()
      tpl.Execute(&buf, "World")
    }
  })
}

$ go test -v -benchmem -run=^$ -bench '^(BenchmarkTemplateParallel)$' example.com/hello

goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkTemplateParallel
BenchmarkTemplateParallel-4   	 7044910	       198.2 ns/op	      48 B/op	       1 allocs/op
PASS
ok  	example.com/hello	1.581s

package hello

import (
  "testing"
)

func BenchmarkSelectNonblock(b *testing.B) {
  ch1 := make(chan int)
  ch2 := make(chan int)
  ch3 := make(chan int, 1)
  ch4 := make(chan int, 1)
  b.RunParallel(func(pb *testing.PB) {
    for pb.Next() {
      select {
      case <-ch1:
      default:
      }
      select {
      case ch2 <- 0:
      default:
      }
      select {
      case <-ch3:
      default:
      }
      select {
      case ch4 <- 0:
      default:
      }
    }
  })
}

$ go test -v -benchmem -run=^$ -bench '^(BenchmarkSelectNonblock)$' example.com/hello

goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkSelectNonblock
BenchmarkSelectNonblock-4   	96872894	        13.02 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	example.com/hello	1.286s
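
The effect of SetParallelism described in this section can be checked with a small program. The sketch below is an illustration, not a recommended benchmark: parallelDemo is a hypothetical helper that counts how many goroutines RunParallel starts and how many iterations they execute in total, and the atomic counters it uses add overhead that a real benchmark would avoid.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"testing"
)

// parallelDemo runs a parallel benchmark with SetParallelism(p) and reports,
// for the final run, how many goroutines RunParallel started, how many
// iterations they executed in total, and the final b.N.
func parallelDemo(p int) (goroutines, iterations int64, n int) {
	res := testing.Benchmark(func(b *testing.B) {
		// The benchmark function is invoked several times with a growing
		// b.N; reset the counters so only the last run is reported.
		atomic.StoreInt64(&goroutines, 0)
		atomic.StoreInt64(&iterations, 0)
		b.SetParallelism(p)
		b.RunParallel(func(pb *testing.PB) {
			atomic.AddInt64(&goroutines, 1) // one call of this body per goroutine
			for pb.Next() {
				atomic.AddInt64(&iterations, 1)
			}
		})
	})
	return goroutines, iterations, res.N
}

func main() {
	g, it, n := parallelDemo(2)
	fmt.Printf("GOMAXPROCS=%d goroutines=%d iterations=%d b.N=%d\n",
		runtime.GOMAXPROCS(0), g, it, n)
}
```

With SetParallelism(2) the goroutine count should be twice GOMAXPROCS, and the total number of iterations should equal b.N regardless of how many goroutines share the work.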

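Memory Statistics

The ReportAllocs method turns on memory allocation statistics for the current benchmark. It has the same effect as passing the -benchmem flag to go test, but it only affects the benchmarks that call it.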

package hello

import (
  "bytes"
  "html/template"
  "testing"
)

func BenchmarkTmplExucte(b *testing.B) {
  b.ReportAllocs()
  tpl := template.Must(template.New("test").Parse("Hello, {{.}}!"))
  b.RunParallel(func(pb *testing.PB) {
    // Each goroutine has its own bytes.Buffer.
    var buf bytes.Buffer
    for pb.Next() {
      // The loop body is executed b.N times total across all goroutines.
      buf.Reset()
      tpl.Execute(&buf, "World")
    }
  })
}

$ go test -v -benchmem -run=^$ -bench '^(BenchmarkTmplExucte)$' example.com/hello

goos: darwin
goarch: amd64
pkg: example.com/hello
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
BenchmarkTmplExucte
BenchmarkTmplExucte-4   	 1525129	       875.6 ns/op	     240 B/op	       8 allocs/op
PASS
ok  	example.com/hello	2.152s

Benchmark Results

// The loop ran 238396 times, taking 4793 ns per iteration on average
BenchmarkFib-4   	  238396	      4793 ns/op	       0 B/op	       0 allocs/op

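BenchmarkFib-4: the -4 suffix is the value of GOMAXPROCS, which defaults to the number of CPU cores. It can be changed with the -cpu flag, which accepts a list of values (for example: -cpu=2,4,8,...);
238396: the total number of iterations, b.N;
4793 ns/op: the average time per iteration, in nanoseconds;
0 B/op: the average number of bytes allocated per iteration;
0 allocs/op: the average number of memory allocations per iteration;

The BenchmarkResult type in the testing package holds the result of a benchmark run. It is defined as follows: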

// BenchmarkResult contains the results of a benchmark run.
type BenchmarkResult struct {
  N         int           // The number of iterations.
  T         time.Duration // The total time taken.
  Bytes     int64         // Bytes processed in one iteration.
  MemAllocs uint64        // The total number of memory allocations.
  MemBytes  uint64        // The total number of bytes allocated.

  // Extra records additional metrics reported by ReportMetric.
  Extra map[string]float64
}

package main

import (
	"bytes"
	"fmt"
	"testing"
	"text/template"
)

func main() {
  res := testing.Benchmark(func(b *testing.B) {
    tpl := template.Must(template.New("test").Parse("Hello, {{.}}!"))
    b.RunParallel(func(pb *testing.PB) {
      var buf bytes.Buffer
      for pb.Next() {
        buf.Reset()
        tpl.Execute(&buf, "World")
      }
    })
  })
  fmt.Printf("%8d\t%10d ns/op\t%10d B/op\t%10d allocs/op\n",
    res.N,
    res.NsPerOp(),
    res.AllocedBytesPerOp(),
    res.AllocsPerOp(),
  )
}

$ go run main.go
 5220064	       196 ns/op	        48 B/op	         1 allocs/op
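
The per-operation figures printed above are derived from the totals stored in BenchmarkResult. The following sketch makes that relationship visible by filling in a BenchmarkResult by hand. The numbers are made up for illustration, and sampleResult is a hypothetical helper.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// sampleResult builds a BenchmarkResult by hand (with made-up numbers) so
// the per-operation accessors can be inspected without running a benchmark.
func sampleResult() testing.BenchmarkResult {
	return testing.BenchmarkResult{
		N:         1000,                 // number of iterations
		T:         5 * time.Millisecond, // total time taken
		MemAllocs: 3000,                 // total number of allocations
		MemBytes:  48000,                // total number of bytes allocated
	}
}

func main() {
	r := sampleResult()
	fmt.Println(r.NsPerOp())           // T / N         -> 5000
	fmt.Println(r.AllocsPerOp())       // MemAllocs / N -> 3
	fmt.Println(r.AllocedBytesPerOp()) // MemBytes / N  -> 48
}
```

All three accessors use integer division, so results below one unit per operation are reported as 0.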
