What is a benchmark?
How to write a benchmark.
How to read the results of a benchmark.
Benchmark
Solver
Memory allocations (dynamic and static)
A problem may have different solutions.
Let's take an example: you have lost your keys, and you want to open the door of your house. This problem has several solutions. You can:
Call somebody you know who has a spare key
Call a locksmith to open your door
Go back to where you were and look for your keys; if not found in 3 hours, use solution 1 or 2.
Break a window and enter your house
Those solutions will have the same result: your door will be open. But if you are reasonable, you can rank those solutions in terms of cost or time. Solutions 2 and 4 will cost you money. Solution 3 (look for your keys) will probably cost you more time. But what if you just forgot the keys in your car, parked 5 minutes away? Clearly, in that case, solution 3 will cost you less than expected.
By examining all the different possible solutions and testing them in your imagination, you are performing a benchmark.
\n\nA benchmark is a tool to compare systems and components
A solver is usually a method or an algorithm that solves the problem at hand.
To choose the best solver, a rule has to be defined. During the benchmark, execution statistics are gathered (the computation time, the number of memory allocations, the number of function calls...). With the help of those statistics, we can apply a decision rule.
There is no such thing as a general rule. Rules might differ depending on your needs; for instance, if you want to select the program with the lowest CPU usage, you have to focus only on that statistic. If you design a program that runs on devices with very little memory available, you might focus on the memory usage statistics to choose the best solver.
\n\nWe will compare two algorithms to concatenate strings. The first step is to create the two functions that will implement the two solutions :
// benchmark/basic/bench.go
package basic

import (
    "bytes"
    "strings"
)

func ConcatenateBuffer(first string, second string) string {
    var buffer bytes.Buffer
    buffer.WriteString(first)
    buffer.WriteString(second)
    return buffer.String()
}

func ConcatenateJoin(first string, second string) string {
    return strings.Join([]string{first, second}, "")
}
Both functions concatenate two strings, using two different methods. The first function, ConcatenateBuffer, uses a bytes.Buffer (from the bytes package). The second function is a wrapper around the Join function from the strings package. We want to know which approach is the best.
Benchmarks live next to the unit tests. A benchmark is a function located in a test file. Its name must begin with Benchmark. Benchmark functions have the following signature:
func BenchmarkXXX(b *testing.B) {
}
This function takes as a parameter a pointer to the struct type testing.B. This struct has only one exported field: N, which represents the number of iterations to run. The benchmark will not run the function just once but several times, to gather reliable data about the execution of the benchmarked function. That's why benchmark functions always encapsulate this kind of for loop:
for i := 0; i < b.N; i++ {
    // execute the function here
}
You can see that the loop starts at 0 and stops when b.N is reached. Do not replace b.N with a hard-coded value. The testing package will run the benchmark once and then decide if it should continue to run it. The value of N is adjusted to reach a desirable level of reliability (we will go deeper into that later in the chapter). Let's see our two benchmarks:
// benchmark/basic/bench_test.go

var result string

func BenchmarkConcatenateBuffer(b *testing.B) {
    var s string
    for i := 0; i < b.N; i++ {
        s = ConcatenateBuffer("test2", "test3")
    }
    result = s
}

func BenchmarkConcatenateJoin(b *testing.B) {
    var s string
    for i := 0; i < b.N; i++ {
        s = ConcatenateJoin("test2", "test3")
    }
    result = s
}
We first create a result variable. This variable is just here to avoid compiler optimization (a tip given by Dave Cheney in a blog post: https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go). We will save the results of our benchmarks in this variable.
Then we define our two benchmark functions, BenchmarkConcatenateBuffer and BenchmarkConcatenateJoin. Note that they have very similar constructions. The concatenation result is stored in a variable s. Then we define a for loop, and inside it, we execute the function we want to benchmark.
The arguments are fixed; we test each function under the same conditions.
\n\nTo run benchmarks, we use the same go test command :
\n$ go test -bench=.
\nThis command will output :
goos: darwin
goarch: amd64
pkg: go_book/benchmark
BenchmarkConcatenateBuffer-8    20000000    98.9 ns/op
BenchmarkConcatenateJoin-8      30000000    56.1 ns/op
PASS
ok  go_book/benchmark 3.833s
\nWe will see in the next section how to interpret the test results.
\nThe previous command will run all the benchmarks of the package.
\n\nTo run only the ConcatenateBuffer benchmark, you can use the following command :
\n$ go test -bench ConcatenateBuffer
\nThe previous command is a shorthand for :
\n$ go test -test.bench ConcatenateBuffer
\n\nThe testing package exposes public methods to run a benchmark. Let’s take an example :
// benchmark/without-cli/main.go
package main

import (
    "bytes"
    "fmt"
    "testing"
)

func main() {
    res := testing.Benchmark(BenchmarkConcatenateBuffer)
    fmt.Printf("Memory allocations : %d \n", res.MemAllocs)
    fmt.Printf("Number of bytes allocated: %d \n", res.MemBytes)
    fmt.Printf("Number of runs: %d \n", res.N)
    fmt.Printf("Time taken: %s \n", res.T)
}

// ..
func BenchmarkConcatenateBuffer(b *testing.B) {
    //..
}
The function testing.Benchmark expects a valid benchmark function, i.e., a variable of type func(b *testing.B). Remember that in Go, functions are first-class citizens and can be passed to other functions.
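As a quick illustration of this point (a standalone toy example, not part of the benchmark code), a function can be stored in a variable and handed to another function like any other value:

package main

import "fmt"

// apply is a hypothetical helper: it receives a function as its first
// argument and simply calls it. This only works because Go treats
// functions as values.
func apply(f func(string) string, input string) string {
    return f(input)
}

func shout(s string) string {
    return s + "!"
}

func main() {
    // shout is passed to apply exactly like any other argument.
    fmt.Println(apply(shout, "benchmark"))
}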
The Benchmark function returns a variable of type BenchmarkResult :
// standard library
// src/testing/benchmark.go (v1.11.4)
type BenchmarkResult struct {
    N         int           // The number of iterations.
    T         time.Duration // The total time taken.
    Bytes     int64         // Bytes processed in one iteration.
    MemAllocs uint64        // The total number of memory allocations.
    MemBytes  uint64        // The total number of bytes allocated.
}
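The struct also exposes convenience methods that compute per-operation values from those raw fields (NsPerOp, AllocedBytesPerOp, and AllocsPerOp are part of the standard testing package). A minimal sketch, reusing the main function of the previous example:

// Inside main, after calling testing.Benchmark:
res := testing.Benchmark(BenchmarkConcatenateBuffer)
// Average duration of one iteration, in nanoseconds.
fmt.Printf("ns/op: %d \n", res.NsPerOp())
// Average number of bytes allocated per iteration.
fmt.Printf("B/op: %d \n", res.AllocedBytesPerOp())
// Average number of allocations per iteration.
fmt.Printf("allocs/op: %d \n", res.AllocsPerOp())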
\n\nBenchmarks are executed by default with GOMAXPROCS processors. To have a reliable benchmark, I suggest you control this value; it should be equal to the number of processors of the targeted machine.
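If you want to observe how results change with the number of processors, go test also accepts the -cpu flag, which takes a comma-separated list of GOMAXPROCS values and runs the benchmarks once for each value. For instance (an illustrative command; the output depends on your machine):

$ go test -bench . -cpu 1,2,4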
The -bench flag expects a regular expression. go test will launch the benchmark functions whose names match this regular expression.
\nFor instance, the command :
\n$ go test -bench .
\nWill launch all the benchmarks.
\n$ go test -bench Join
Will launch all the benchmark functions whose names contain the string "Join". In our example, BenchmarkConcatenateJoin will be launched but not BenchmarkConcatenateBuffer.
The -benchtime flag allows you to control your benchmarks' execution time. You have to pass a duration string (ex: 3s). The system will parse the duration and execute the benchmarks for the specified amount of time. It means that you can increase or decrease the time that the benchmark will take.
Example : Let’s run the benchmark named BenchmarkConcatenateJoin
for 5 seconds :
$ go test -bench BenchmarkConcatenateJoin -benchtime 5s
goos: darwin
goarch: amd64
pkg: go_book/benchmark
BenchmarkConcatenateJoin-8    100000000    56.9 ns/op
PASS
ok  go_book/benchmark 5.760s
The -benchmem flag will display the memory allocation statistics in the results. This flag is boolean; it's set to false by default. Just add it to the command line to activate this feature.
Example: We can run benchmarks with memory statistics with the following command :
$ go test -bench . -benchmem
goos: darwin
goarch: amd64
pkg: go_book/benchmark
BenchmarkConcatenateBuffer-8    20000000    105 ns/op     128 B/op    2 allocs/op
BenchmarkConcatenateJoin-8      30000000    60.2 ns/op     16 B/op    1 allocs/op
PASS
ok  go_book/benchmark 4.093s
Note that two additional columns are printed in the benchmark results. In the next section, we will see how to read those stats.
I find that benchmark results can be difficult to read. We will go through each kind of statistic and, for each one, try to give actionable advice.
In figure 1, you can see the standard output of the following command:
\n$ go test -bench . -benchmem
Here we are running all the benchmarks of the current package with memory statistics. The benchmark result contains the following statistics:
The first elements printed in the benchmark result are the two Go environment variables GOOS and GOARCH. You know them already; they are useful when comparing benchmark results.
Duration: This is the total time taken to execute the benchmarks
The number of iterations (second column): Remember that inside every benchmark function there is a for loop. This number represents the number of times the for loop has run to obtain the statistics. You can increase the number of iterations by using the -benchtime flag to increase the benchmark duration. It is not the total number of iterations executed by the benchmark.
Nanoseconds per operation (third column): it gives you an idea of how fast, on average, your solver runs. In our example, the ConcatenateBuffer function takes on average 55.97 nanoseconds to run, whereas the ConcatenateJoin function takes on average 33.63 nanoseconds. The fastest function is ConcatenateJoin, in the context of our benchmark.
Number of bytes allocated per operation (fourth column): This column is present only if you add the flag -benchmem. It gives you an idea about the memory consumption of your solvers. If your focus is to improve memory usage, then you should focus on this statistic.
Number of allocations per operation (fifth column): the name of this stat speaks for itself. This is the average number of memory allocations per run. In the next section, we will see how to detect memory allocations to improve your code.
Go has a debug mode that allows you to print a wealth of highly valuable information about your program's performance. Memory allocation is an important variable for understanding how a program performs. There are roughly two types of memory allocations:
Static: memory is allocated when the program is started. In C, it happens when you create a global or a static variable. This memory is freed when the program stops. It's only allocated once.
Dynamic: in a program, not everything is known when the program is compiled or starts. The behavior of the program can vary depending on the user input, for instance. Imagine a program that computes highly complex mathematical operations; the memory needed by the program will depend on the input and can vary drastically (an addition does not require a lot of memory, whereas computing 10000! requires a lot more space). This is why programs need to allocate memory dynamically when they run.
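To make the distinction more concrete, here is a small, purely illustrative Go program (the names and sizes are arbitrary): the package-level variable exists for the whole lifetime of the program, while the slice is sized from user input and must therefore be allocated dynamically at run time:

package main

import (
    "fmt"
    "os"
    "strconv"
)

// greeting is a package-level variable: its storage is set up when the
// program starts and lives until the program stops.
var greeting = "hello"

func main() {
    // The buffer size is only known at run time (it comes from the
    // command line), so the backing array must be allocated dynamically.
    n := 1024
    if len(os.Args) > 1 {
        if v, err := strconv.Atoi(os.Args[1]); err == nil {
            n = v
        }
    }
    buf := make([]byte, n)
    fmt.Println(greeting, "- dynamically allocated", len(buf), "bytes")
}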
We will focus on dynamic memory allocation. We will use the variable GODEBUG
to output the memory allocations that are done by our two functions.
The first thing to do is to create a sample application that will call our two functions :
package main

// import of the package that contains our two functions
import "go_book/benchmark/basic"

func main() {
    basic.ConcatenateBuffer("first", "second")
    basic.ConcatenateJoin("first", "second")
}
This application will call our two functions (that are part of the package basic, with import path go_book/benchmark/basic). Then we compile our program:
$ go build -o allocDetect main.go
Note that the -o flag is used to give a specific name to our binary. Here we choose to name it allocDetect; you could have named it something else, of course.
Then we can run our binary with the GODEBUG
variable set :
$ GODEBUG=allocfreetrace=1 ./allocDetect &>> trace.log
GODEBUG is an environment variable that accepts a list of key-value pairs. Here we tell the Go runtime to generate a stack trace for each allocation and free. Then we add "&>> trace.log" to redirect both the standard output and the standard error to the file trace.log. It will create this file if it does not exist, and if it exists, logs will be appended to it.
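Note that GODEBUG can hold several comma-separated key=value pairs at once; for example, the following (purely illustrative) invocation also enables gctrace=1, which prints a summary line for each garbage collection:

$ GODEBUG=allocfreetrace=1,gctrace=1 ./allocDetect &>> trace.log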
Inside our trace.log, I have 1034 lines of text, consisting of stack traces. How can we exploit them? According to the documentation, each memory allocation made by the program generates a stack trace.
We can search by hand in this file to see where the allocations happen. But we can also use the two commands cat and grep:
$ cat trace.log | grep -n /path/to/the/package/basic/bench.go
Here we are first printing the content of the trace.log file with "cat trace.log"; then we are asking grep to search in this file for the string "/path/to/the/package/basic/bench.go" (this string needs to be replaced by the path of the package file that you want to analyze).
There is a pipe (|) between the two commands cat trace.log and grep -n /path/to/the/package/basic/bench.go. The pipe is used to chain commands: the output of the first command becomes the input of the second command, and the whole forms a pipeline.
Here is the output :
988: /path/to/the/package/basic/bench.go:9 +0x31 fp=0xc000044758 sp=0xc000044710 pc=0x1055c81
1005: /path/to/the/package/basic/bench.go:12 +0xca fp=0xc000044758 sp=0xc000044710 pc=0x1055d1a
1028: /path/to/the/package/basic/bench.go:16 +0x7e fp=0xc00008af58 sp=0xc00008aef0 pc=0x1055dde
The path has been found three times in the trace.log, on lines 988, 1005, and 1028 (the line numbers are returned by grep because we added the flag -n). Just next to the path string, you have the number of the line that caused an allocation in /path/to/the/package/basic/bench.go.
The next step is to analyze your code to see where memory allocation happens and how you can avoid it. In the ConcatenateBuffer function, two lines caused memory allocations. The creation of the buffer:
\nvar buffer bytes.Buffer
\nAnd the call to the String method :
\nbuffer.String()
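One possible way to reduce those allocations (this is a sketch of an alternative solver, not part of the book's benchmark) is to use a strings.Builder, available since Go 1.10, and to pre-allocate the exact capacity with Grow; Builder.String() does not copy the underlying bytes, so the whole concatenation needs a single allocation:

// ConcatenateBuilder is a hypothetical third solver for the basic package
// (which already imports "strings").
func ConcatenateBuilder(first string, second string) string {
    var builder strings.Builder
    // Reserve the final size up front so WriteString never reallocates.
    builder.Grow(len(first) + len(second))
    builder.WriteString(first)
    builder.WriteString(second)
    return builder.String()
}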
\nThe complete list of debug options is available here: https://golang.org/pkg/runtime/#hdr-Environment_Variables
\n\nIn the previous sections, we have written benchmarks where the input remains stable. This approach is sufficient for most use cases. But you might need to understand how your function behaves when its arguments change.
\nWe will use the method Run
defined in the testing
package. The receiver of this method is a pointer to a testing.B
variable.
If we want to go deeper in the analysis, we can test our two functions with variable-length strings. We will use lengths that are powers of two:
\n2
16
128
1024
8192
65536
524288
4194304
16777216
134217728
The first step is to put those integers into a slice named lengths.
\nlengths := []int{2,16,128,1024,8192,65536,524288,4194304,16777216,134217728}
\nWith a for range loop we iterate over those numbers. At each iteration, we create two random strings.
for _, l := range lengths {
    first := generateRandomString(l)
    second := generateRandomString(l)

}
Once those two strings are created, we can use them as input for our two benchmarked functions.
We will create two sub-benchmarks. Sub-benchmarks are defined with the help of the Run method. They must be defined inside a classical benchmark function. We will name this wrapping function BenchmarkConcatenation:
// benchmark/variable-input/bench_test.go

func BenchmarkConcatenation(b *testing.B) {
    var s string
    lengths := []int{2, 16, 128, 1024, 8192, 65536, 524288, 4194304, 16777216, 134217728}
    for _, l := range lengths {
        first := generateRandomString(l)
        second := generateRandomString(l)

    }
}
Inside the for loop, we will call the b.Run method twice (b.Run will create a sub-benchmark). First, we benchmark the ConcatenateJoin function:
b.Run(fmt.Sprintf("ConcatenateJoin-%d",l), func(b *testing.B) {\n for i := 0; i < b.N; i++ {\n s = ConcatenateJoin(first, second)\n }\n result = s\n})
\nAnd the second time with ConcatenateBuffer :
\nb.Run(fmt.Sprintf("ConcatenateBuffer-%d",l), func(b *testing.B) {\n for i := 0; i < b.N; i++ {\n s = ConcatenateBuffer(first, second)\n }\n result = s\n})
Note that the Run method takes two arguments:
A name, which will be displayed in the benchmark results
A function which represents the sub-benchmark. It must take as argument a pointer to a testing.B variable.
We customize the name of the benchmark. We append at the end of the name the value of l (which represents the number of characters of the two concatenated strings). This customization is necessary to improve the readability of the results.
The second argument is a very classical benchmark function: a for loop that iterates from 0 to b.N. Inside this for loop, you finally find the call to the benchmarked function. We save the result of the function to avoid compiler optimization.
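The helper generateRandomString is not shown in this section. Here is a minimal sketch of what such a helper could look like, using the math/rand package (the implementation used to produce the book's figures may differ):

// generateRandomString returns a pseudo-random string of length n composed
// of lowercase letters. Any implementation returning a string of the
// requested length would work for the benchmark.
func generateRandomString(n int) string {
    const letters = "abcdefghijklmnopqrstuvwxyz"
    b := make([]byte, n)
    for i := range b {
        b[i] = letters[rand.Intn(len(letters))]
    }
    return string(b)
}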
To run this benchmark, you can use the same command as before :
$ go test -bench BenchmarkConcatenation -benchmem
goos: darwin
goarch: amd64
pkg: go_book/benchmark/variableInput
BenchmarkConcatenation/ConcatenateJoin-2-8        30000000    51.2 ns/op      4 B/op    1 allocs/op
BenchmarkConcatenation/ConcatenateBuffer-2-8      20000000    93.0 ns/op    116 B/op    2 allocs/op
BenchmarkConcatenation/ConcatenateJoin-16-8       20000000    62.5 ns/op     32 B/op    1 allocs/op
BenchmarkConcatenation/ConcatenateBuffer-16-8     20000000    103 ns/op     144 B/op    2 allocs/op
//...
ok  go_book/benchmark/variableInput 33.975s
\nThis is a partial output. I did not copy all the standard output.
\nWe can generate a graph from this data to understand the results better. We will redirect the output to a file for further processing :
\n$ go test -bench BenchmarkConcatenation -benchmem &>> benchmarkConcatenation.log
\nThen we can parse the benchmarkConcatenation.log file to generate a table and draw a graph :
Figure 2 represents the data on a log-lin plot. A log-lin plot is a graph on which the vertical axis has a logarithmic scale and the horizontal axis has a linear scale. You might be unfamiliar with that approach (if you know it already, you can skip this section).
\nA logarithmic scale is used when the data range is big. In statistics, the range is the difference between the largest and the smallest value. For the dataset :
\n{2,16,128,1024,8192,65536,524288,4194304,16777216,134217728}
\nthe range is 134217728-2=134217726. This is big.
In that case, it's recommended to use a logarithmic scale and not a linear scale. Instead of having a scale where one millimeter always represents the same value, with a logarithmic scale the value of a mark is equal to the previous mark multiplied by a constant. This constant can vary, but it's usually 10. Figure 3 shows the difference between the log and the linear scale.
One axis can have a log scale and the other a linear scale. This type of graph is called a "log-lin plot". If both axes have a logarithmic scale, it's called a log-log plot. Compare figure 4 and figure 2. Which plot is better?
\nUnfortunately, Go has no internal tool to generate this kind of graph. I had to manually parse the standard benchmark output to get the data. Here is the script I used :
package main

import (
    "fmt"
    "io/ioutil"
    "regexp"
)

func main() {
    b, err := ioutil.ReadFile("/path/to/benchmarkConcatenation.log")
    if err != nil {
        panic(err)
    }
    benchmarkResult := string(b)
    regexBench := regexp.MustCompile(`([a-zA-Z]*)-(\d+)-.* (\d+\.?\d+?)[\t]ns.*[\t](\d+)[\t]B.* (\d+) allocs`)
    matches := regexBench.FindAllStringSubmatch(benchmarkResult, -1)
    fmt.Println("benchmarkedFunction,stringLen,nsPerOp,bytesPerOp,mallocs")
    for _, m := range matches {
        fmt.Printf("%s,%s,%s,%s,%s\n", m[1], m[2], m[3], m[4], m[5])
    }
}
\nI used the following regular expression with five capturing groups to retrieve the benchmark data :
\n`([a-zA-Z]*)-(\\d+)-.* (\\d+\\.?\\d+?)[\\t]ns.*[\\t](\\d+)[\\t]B.* (\\d+) allocs`
\nOn the figure 5 you can see the capturing groups highlighted :
The first group captures the name of the benchmarked function (which is stored in m[1])
The second group captures the length of the string (m[2])
The third group captures the nanoseconds per operation (m[3])
The fourth group captures the number of bytes allocated per operation (m[4])
The final group represents the number of allocations (m[5])
The variable matches is a two-dimensional slice of strings ([][]string). matches[0] represents the first benchmark, and matches[0][1] is the name of the benchmarked function.
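Since the script writes CSV to the standard output, you can redirect it to a file and load that file into your favorite spreadsheet or plotting tool. For example, assuming you saved the script as parse.go (the file name is just an assumption):

$ go run parse.go > benchmarkConcatenation.csv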
Take into consideration the time variable (ns/op) and the memory usage metrics.
Choose the variable (or the mix of variables) that is (are) coherent with your objectives.
Use the logarithmic scale on your graphs when appropriate (large range of data)
The time taken by the benchmarked function should not increase when the value of b.N increases. Your function's input should not depend on b.N. Otherwise, your benchmark results will not be significant.
\nLet’s take an example :
func BenchmarkConcatenateBuffer(b *testing.B) {
    var s string
    for i := 0; i < b.N; i++ {
        s = ConcatenateBuffer(generateRandomString(b.N), generateRandomString(b.N))
    }
    result = s
}
Here we have modified the input of ConcatenateBuffer. Instead of two fixed strings, we use a random string generator named generateRandomString. This function generates a pseudo-random string with the help of the math/rand package.
\nLet’s see the result of our benchmark :
\nBenchmarkConcatenateBuffer-8 30000 138583 ns/op 319600 B/op 8 allocs/op
The final number of operations is only 30,000, and it takes an average of 138,583 nanoseconds per operation.
Those results are very different from the ones we collected with two fixed strings: around 100 nanoseconds per operation.
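If you really need large random inputs, one way to keep the benchmark meaningful (a sketch, not taken from the book's code) is to generate the input once with a fixed size that does not depend on b.N, and to call b.ResetTimer so that the setup work is excluded from the measurement:

func BenchmarkConcatenateBufferLargeInput(b *testing.B) {
    // The input size is fixed and independent of b.N.
    first := generateRandomString(1024)
    second := generateRandomString(1024)
    // ResetTimer discards the time spent generating the inputs.
    b.ResetTimer()
    var s string
    for i := 0; i < b.N; i++ {
        s = ConcatenateBuffer(first, second)
    }
    result = s
}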
\n\nWhat is the standard header of a benchmark function (name and signature)?
Where are benchmark functions located in the source code?
What is the command to use to run a specific benchmark?
Which flag can you use to display memory allocation statistics?
True or False? The statistic ns/op is the function's total time to execute during the benchmark run.
How to create a benchmark with variable input?
What is the standard header of a benchmark function (name and signature)?
func BenchmarkXXX(b *testing.B)
Where are benchmark functions located in the source code?
In test files, next to the unit tests.
What is the command to use to run a specific benchmark?
If you have somewhere in your code the benchmark function func BenchmarkConcatenateBuffer(b *testing.B), you can run it with go test -bench ConcatenateBuffer. The string "ConcatenateBuffer" is interpreted as a regular expression.
Which flag can you use to display memory allocation statistics?
The -benchmem flag.
True or False? The statistic ns/op is the function's total time to execute during the benchmark run.
False.
This is the average time taken by the function per operation.
How to create a benchmark with variable input?
Use the b.Run method to create sub-benchmarks.
The objective of designing and running a benchmark is to find the best solving strategy (called solver).
The term “best” should be adapted for your needs.
\nDo you want the fastest solver?
Do you want the solver that has the lowest memory footprint?
A combination of both?
To create a benchmark, write a function with the following header : func BenchmarkNameHere(b *testing.B)
Benchmark functions are placed in unit test files. Here is an example benchmark :
func BenchmarkConcatenateJoin(b *testing.B) {
    var s string
    for i := 0; i < b.N; i++ {
        s = ConcatenateJoin("test2", "test3")
    }
    result = s
}
To run all benchmarks of a package, use the command: $ go test -bench=.
To run a specific benchmark, use this command: $ go test -bench ConcatenateBuffer
To display memory statistics, add the flag -benchmem: $ go test -bench . -benchmem
With memory statistics set to ON, you can get the number of bytes allocated per operation and the number of allocations per operation.
The env variable GODEBUG allows you to inspect the program's runtime behavior (listing memory allocations, for instance)
A benchmark function can have sub-benchmarks. This is practical to bench the function against different inputs :
func BenchmarkConcatenation(b *testing.B) {
    var s string
    lengths := []int{2, 16, 128, 1024, 8192, 65536, 524288, 4194304, 16777216, 134217728}
    for _, l := range lengths {
        first := generateRandomString(l)
        second := generateRandomString(l)
        b.Run(fmt.Sprintf("ConcatenateJoin-%d", l), func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                s = ConcatenateJoin(first, second)
            }
            result = s
        })
        b.Run(fmt.Sprintf("ConcatenateBuffer-%d", l), func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                s = ConcatenateBuffer(first, second)
            }
            result = s
        })
    }
}