package
interface
method
receiver
cyclomatic complexity
Halstead metrics
This chapter tries to answer the question “How do I design my Go code?”. Coming to Go from other languages, I asked myself this question when I started. When you write software for your own use only, the question matters little. When you work in a team, you have to follow conventions and best practices.
To write this chapter, I read popular blog posts written by Go developers. My objective is to compile their advice. Do not follow it religiously; keep in mind that each project is different.
Package names are exposed to the users of the package, so developers must choose them carefully. What does “exposed to users” mean? When somebody wants to use the function Bar of your package, they have to write:
\npkgName.Bar()
Here are some standard rules that we can follow:
Advice | \n
---|
Short: no more than one word | \n
No plural | \n
Lower case | \n
Informative about the service it gives | \n
No utility packages | \n
Examples
net: very short, no plural, all lowercase. We immediately know that this package contains networking functionality.
os
Counterexamples
models: signals a package that only exists to hold the structs defining a data model. The service it gives is not clear (except that it gathers data models). We could instead have a user package that holds the user model and the functionality related to users.
userManagement: it is not a single word. We would use it to manage the users of the application, but in my opinion, those functionalities should live in a “user” package (as methods with a pointer to the User type as receiver).
utils: this package generally holds functions used in other packages. It is therefore advised to move those functions directly into the packages where they are used.
About utility packages: having a utils package might seem legitimate to developers coming from other languages, but for gophers it does not make sense, because it inevitably mixes functions that could have lived directly where they are used. At the beginning of a project, it is a reflex to put functions that might be useful elsewhere into this kind of package. Utilities are not prohibited, though: the standard Go library exposes utility functions but tends to group them by the type they operate on, for instance in the strings or bytes packages.
Interfaces define behaviors. Based on this definition, you can provide multiple implementations of those behaviors (see figure 1). Users of your package are not interested in your implementation. They just care about the service you give them. An interface defines a contract for the use of your public API. The implementation might change: for instance, you can improve the performance of a method. Even if you drastically change how your package does what it does, the way to call it will remain stable.
\nWith Go, an interface is a special kind of type. You can use it as a function or method argument.
\nAdvice | \n
---|
Use interfaces as function/method arguments & as field types | \n
Small interfaces are better | \n
To better understand, let’s take an example. Imagine building a new fancy cryptographic algorithm to exchange messages with a friend secretly.
You will need to develop a function to decrypt messages (and to encrypt them). In the next listing, you can see your first attempt (Decrypt1):
func Decrypt1(b []byte) ([]byte, error) {
	//...
}
It takes a slice of bytes as argument and returns a slice of bytes and an error. There is nothing wrong with this function, except that we can only use it with a slice of bytes as input. What if, instead of a slice of bytes, you want to decrypt a whole file? You will need to read all the bytes from the file and pass them to the Decrypt1 function.
We have to find a type that will make our Decrypt1 function more generic. The interface type that serves this purpose is io.Reader. Many types in the standard library implement this interface:
*os.File
*net.TCPConn
*net.UDPConn
*net.UnixConn
If you accept an io.Reader as parameter, you can decrypt files, but also data transmitted through TCP or UDP. Here is the second version of the function:
func Decrypt2(r io.Reader) ([]byte, error) {
	//...
}
The io.Reader interface defines one behavior, Read. A type that implements the Read method as defined in the io.Reader interface is an io.Reader.
It means that our Decrypt2 function can take any type that implements the io.Reader interface.
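To make this more concrete, here is a minimal sketch of a function that accepts an io.Reader. The names decryptXOR and roundTrip, the key value, and the single-byte XOR are all assumptions made for illustration; XOR with one byte is not a real cipher, it only keeps the example short. The point is that the same function works with a strings.Reader and with a bytes.Buffer.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// decryptXOR is a hypothetical stand-in for Decrypt2: it reads everything
// from r and XORs each byte with key. This is NOT a secure cipher.
func decryptXOR(r io.Reader, key byte) ([]byte, error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return nil, fmt.Errorf("reading input: %w", err)
	}
	for i := range data {
		data[i] ^= key
	}
	return data, nil
}

// roundTrip "encrypts" msg read from a strings.Reader, then decrypts the
// result read from a bytes.Buffer: two different io.Reader implementations.
func roundTrip(msg string, key byte) (string, error) {
	secret, err := decryptXOR(strings.NewReader(msg), key)
	if err != nil {
		return "", err
	}
	plain, err := decryptXOR(bytes.NewBuffer(secret), key)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

func main() {
	plain, err := roundTrip("hello gopher", 0x2A)
	if err != nil {
		fmt.Println("decrypt failed:", err)
		return
	}
	fmt.Println(plain) // hello gopher
}
```

An *os.File or a network connection could be passed to decryptXOR in exactly the same way, without changing its signature.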
If we take, for example, the interfaces of the standard Go library, you can notice that they are often very small. The number of behaviors defines the size of an interface (in other words, the number of method signatures specified).
\nIn Go, you do not need to specify that a type implements an interface. Consequently, when your interface is composed of many behaviors, it’s hard to see which types implement the interface. That’s why small interfaces are easier to handle in the day-to-day programmer’s life.
Many interfaces in the Go standard library are composed of only one to three methods. Let’s take the example of the two famous interfaces io.Reader and io.Writer, which each declare a single method:
type Reader interface {
	Read(p []byte) (n int, err error)
}

type Writer interface {
	Write(p []byte) (n int, err error)
}
\nHere is a counter-example :
type Bad interface {
	Foo(string) error
	Bar(string) error
	Baz([]byte) error
	Bal(string, io.Closer) error
	Cux()
	Corge()
	Corege3()
}
The interface Bad is hard to implement. Someone who wants to implement it will need to develop seven methods! If you plan to build a widely used package, you make it difficult for newcomers to use your abstractions.
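As a possible alternative to one large interface, here is a sketch that declares small single-method interfaces and composes them only where needed, the same way io.ReadWriter embeds io.Reader and io.Writer. The names Encrypter, Decrypter, Cipher, reverser, and send are hypothetical, and the "encryption" is a simple byte reversal used only to keep the example runnable.

```go
package main

import "fmt"

// Encrypter and Decrypter are hypothetical one-method interfaces.
type Encrypter interface {
	Encrypt(plain []byte) ([]byte, error)
}

type Decrypter interface {
	Decrypt(secret []byte) ([]byte, error)
}

// Cipher composes the two small interfaces by embedding them.
type Cipher interface {
	Encrypter
	Decrypter
}

// reverser is a toy implementation: it "encrypts" by reversing the bytes.
type reverser struct{}

func (reverser) Encrypt(p []byte) ([]byte, error) { return reverse(p), nil }
func (reverser) Decrypt(s []byte) ([]byte, error) { return reverse(s), nil }

func reverse(b []byte) []byte {
	out := make([]byte, len(b))
	for i, v := range b {
		out[len(b)-1-i] = v
	}
	return out
}

// send only needs to encrypt, so it asks for the smallest interface.
func send(e Encrypter, msg string) (string, error) {
	out, err := e.Encrypt([]byte(msg))
	if err != nil {
		return "", fmt.Errorf("encrypting message: %w", err)
	}
	return string(out), nil
}

func main() {
	var c Cipher = reverser{} // reverser satisfies Cipher implicitly
	out, err := send(c, "gopher")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out) // rehpog
}
```

A newcomer who only wants to plug in an encryption step has a single method to write, instead of seven.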
A package can be composed of a single file. That’s perfectly legal, but if your file has more than 600 lines, it can become difficult to read. Here are some practical pieces of advice to improve the readability of your sources.
\nAdvice | \n
---|
You should name one file like the package | \n
No more than 600 lines per file | \n
One file = One responsibility | \n
One file should be named like the package: if you have multiple files in your package, it’s better to name one of them like the package. For instance, in figure 2 you can see that we have two packages: fruit and bike. In the fruit package, we have a fruit.go source file, and in the bike package, a bike.go source file. Those files can hold the shared types, interfaces, and constants common to all fruits (or bikes).
No more than 600 lines per file: this advice will improve the readability of your program or package. Files should be short (but not too short); it makes the life of maintainers easier (scrolling through long files is tedious). Note that this limit is arbitrary; you can, of course, adapt it to your standards.
One file = One responsibility: imagine that you are part of the Go development team and have been assigned to fix a nasty bug. In the GitHub issue, the user complains about the way the HTTP client handles cookies. You will have to find where the cookies are managed. Unsurprisingly, they are managed in the file net/http/cookie.go. This naming convention allows developers to locate source code responsibilities easily.
Errors and problems are part of the programming game. Your program has to handle all the errors that might happen. As a programmer, you must think about the worst. Ask yourself: what could go wrong on that line? What techniques could a malicious user employ to make your program fail?
\nAdvice | \n
---|
Always add context to errors | \n
Never ignore errors | \n
Use fatal errors carefully | \n
Create fault-tolerant programs | \n
package main

import (
	"fmt"
	"os"
)

func main() {
	err := foo("test")
	if err != nil {
		fmt.Println(err)
	}
}

func foo(bar string) error {
	err := baz()
	if err != nil {
		return err
	}
	return nil
}

func baz() error {
	return corge()
}

func corge() error {
	_, err := os.ReadFile("/my/imagination.go")
	if err != nil {
		return err
	}
	return nil
}

func looping() ([]byte, error) {
	return os.ReadFile("/my/imagination.go")
}
In this small example, we have created four functions: foo, baz, corge and looping. In the main function, we call foo. This function calls baz, baz calls corge, and corge finally tries to read a file (that doesn’t exist).
When we execute the program, we get the following output :
\nopen /my/imagination.go: no such file or directory
Where does the error come from? Does it come from the function corge? Does it come from the function looping? If you want to know, you will have to follow the execution path mentally (to finally discover that looping is never called, and thus that the error comes from corge).
This exercise is already tedious in this small example; it can become a nightmare for bigger applications or packages with hundreds of files.
The solution? Wrap errors (put an error inside another error) with the standard library:
// recommendation/errors/better/main.go
package main

import (
	"fmt"
	"os"
)

func main() {
	err := foo("test")
	if err != nil {
		fmt.Println(err)
	}
}

func foo(bar string) error {
	err := baz()
	if err != nil {
		return fmt.Errorf("error while calling baz: %w", err)
	}
	return nil
}

func baz() error {
	return corge()
}

func corge() error {
	_, err := os.ReadFile("/my/imagination.go")
	if err != nil {
		return fmt.Errorf("error while reading file: %w", err)
	}
	return nil
}

func looping() ([]byte, error) {
	return os.ReadFile("/my/imagination.go")
}
We simply use fmt.Errorf with the formatting verb %w, which wraps the error. With this simple addition, the output of our program is now:
error while calling baz: error while reading file: open /my/imagination.go: no such file or directory
You can see that the error is clearer and that the failure can be located immediately.
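Wrapping with %w does more than improve the message: the original error stays reachable through the chain. Here is a minimal sketch using errors.Is from the standard library. The sentinel errMissing and the helper isMissing are hypothetical; the sentinel stands in for the "no such file" failure so that the example does not depend on the file system.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissing is a hypothetical sentinel error standing in for the
// "no such file or directory" failure.
var errMissing = errors.New("file is missing")

func corge() error {
	return fmt.Errorf("error while reading file: %w", errMissing)
}

func foo() error {
	if err := corge(); err != nil {
		return fmt.Errorf("error while calling corge: %w", err)
	}
	return nil
}

// isMissing reports whether errMissing is somewhere in err's chain.
func isMissing(err error) bool {
	return errors.Is(err, errMissing)
}

func main() {
	err := foo()
	fmt.Println(err)            // full context, outermost message first
	fmt.Println(isMissing(err)) // true: the wrapped error is still detectable
}
```

The caller gets both the readable context and the ability to react to a specific cause.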
Never ignore errors. It may seem obvious, but many developers still make this mistake. Errors that arise should be either:
returned to the caller,
or treated (your code implements some kind of auto-correction mechanism).
Use fatal errors carefully. When you call log.Fatal[f], you force your program to exit abruptly (with an os.Exit(1)). “The program terminates immediately; deferred functions are not run.” (os/proc.go). Deferred functions are often used for cleanup logic (for instance, closing file descriptors), so skipping them means that this cleanup never happens.
// standard log package.
// Fatal is equivalent to Print() followed by a call to os.Exit(1).
func Fatal(v ...interface{}) {
	std.Output(2, fmt.Sprint(v...))
	os.Exit(1)
}
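One common way to keep deferred functions running is to move the program logic into a function that returns an error and to exit only from main. The sketch below assumes a hypothetical run function and a cleanedUp flag that exists only to make the effect observable; the simulateFailure parameter is also an assumption made for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// cleanedUp records whether the deferred cleanup ran (demonstration only).
var cleanedUp bool

// run holds the program logic. It reports failures by returning an error
// instead of calling log.Fatal, so its deferred functions always execute.
func run(simulateFailure bool) error {
	defer func() { cleanedUp = true }() // stands in for closing files, etc.

	if simulateFailure {
		return errors.New("webservice unreachable")
	}
	return nil
}

func main() {
	// Pass true to exercise the failure branch.
	if err := run(false); err != nil {
		// run has returned, so its deferred cleanup has already happened.
		fmt.Fprintln(os.Stderr, "fatal:", err)
		os.Exit(1)
	}
	fmt.Println("cleanup done:", cleanedUp)
}
```

With this layout, os.Exit is called in a single place, after every deferred function of run has finished.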
For instance, you are building a program that calls a webservice. During the execution of your program, the call fails. The source of the failure is the network (your server has been disconnected from the internet). This error is recoverable: the network will become available again at some point.
If the call to your webservice successfully passes through the network but returns an HTTP 301 status (“Moved Permanently”), the error is not recoverable. Either you made a mistake when you defined the URL of the webservice, or your webservice provider has changed something without warning you. Human intervention will be necessary.
A fallback option “is a contingency option to be taken if the preferred choice is unavailable” (Wikipedia). In our program, for instance, the network call is not possible or has returned an error. We should think about fallback options.
The options will not be the same depending on whether the error is recoverable or not.
If you experience a network failure, instead of directly returning the error, you could implement a retry mechanism: you retry contacting the webservice a configurable number of times.
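Such a retry mechanism could look like the following sketch. The function names retry and flakyCall are hypothetical, and the fixed delay between attempts is an assumption; a real program might prefer an increasing delay and should retry only errors it considers recoverable.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry calls op up to attempts times, sleeping delay between tries.
// It returns nil on the first success, or the last error with context.
func retry(attempts int, delay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// flakyCall simulates a webservice that fails a given number of times
// before succeeding.
func flakyCall(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errors.New("network unreachable")
		}
		return nil
	}
}

func main() {
	err := retry(3, 10*time.Millisecond, flakyCall(2))
	fmt.Println("result:", err) // result: <nil>
}
```

The number of attempts and the delay are parameters, so they can come from the program's configuration.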
Functions and methods are everywhere inside a program. A syntactically correct function (i.e., one that compiles) might not be stylistically good. Here are some recommendations related to function writing; in short, how to write functions with style.
\nAdvice | \n
---|
One function has one goal | \n
Simple names | \n
Limited length (100 lines maximum) | \n
Reduce cyclomatic complexity | \n
Reduce the number of nesting levels | \n
A function is a named procedure (or routine) that will perform a specific task. It can have input parameters and also output parameters. The important term here is “specific”. The function (or method) performs a single task, not multiple. It has a single goal.
A great function is a function that does one thing and does it perfectly well. In math, for instance, the exponential function computes the value of exp(x) for every real value x.
\nThis function should have only one goal, which is simple to understand. The function will not compute at the same time the exponential and the logarithmic value of x. Instead, we have two functions, the exponential function, and the logarithm function.
\nHere is a counter-example :
type User struct {
	//...
}

func (u *User) saveAndAuthorize() error {
	//...
	return nil
}
The method saveAndAuthorize performs two tasks:
save the user
and authorize it.
These two tasks require different abilities (writing to a database, reading from a database, checking an access token’s validity...). This program will compile, but it will be difficult to test. The returned error can come from a failure of the data layer, but also from the security layer of the app.
A solution could be to split the function into two different ones: create and authorize.
func (u *User) create() error {
	//...
	return nil
}

func (u *User) authorize() error {
	//...
	return nil
}
\n\nFor instance :
func (u *User) saveUser() error {

	return nil
}

func (u *User) authorizeUser() error {

	return nil
}
\nWe can rename those two functions:
func (u *User) save() error {

	return nil
}

func (u *User) authorize() error {

	return nil
}
We reduce the length of the function names by removing the type name User. Remember to always take the perspective of the caller of your package. Let’s compare these two snippets:
user := user.NewUser()
err := user.saveUser()
if err != nil {
	//..
}

user := user.New()
err := user.save()
if err != nil {
	//..
}
The second one is much simpler than the first one. The first one is composed of 65 characters, whereas the second one is composed of 57 characters (spaces included).
\n\nA function should pursue one goal (see the previous section), and it should be small. When you increase the number of lines in a function, you also increase the time and cognitive effort to read it and then understand it.
For instance, here is the function Pop from the package heap:
func Pop(h Interface) interface{} {
	n := h.Len() - 1
	h.Swap(0, n)
	down(h, 0, n)
	return h.Pop()
}
The body of this function is just 4 lines long, which makes it pretty easy to read and understand. Compare it with this function from the ascii85 package:
func (d *decoder) Read(p []byte) (n int, err error) {
	if len(p) == 0 {
		return 0, nil
	}
	if d.err != nil {
		return 0, d.err
	}

	for {
		// Copy leftover output from last decode.
		if len(d.out) > 0 {
			n = copy(p, d.out)
			d.out = d.out[n:]
			return
		}

		// Decode leftover input from last read.
		var nn, nsrc, ndst int
		if d.nbuf > 0 {
			ndst, nsrc, d.err = Decode(d.outbuf[0:], d.buf[0:d.nbuf], d.readErr != nil)
			if ndst > 0 {
				d.out = d.outbuf[0:ndst]
				d.nbuf = copy(d.buf[0:], d.buf[nsrc:d.nbuf])
				continue // copy out and return
			}
			if ndst == 0 && d.err == nil {
				// Special case: input buffer is mostly filled with non-data bytes.
				// Filter out such bytes to make room for more input.
				off := 0
				for i := 0; i < d.nbuf; i++ {
					if d.buf[i] > ' ' {
						d.buf[off] = d.buf[i]
						off++
					}
				}
				d.nbuf = off
			}
		}

		// Out of input, out of decoded output. Check errors.
		if d.err != nil {
			return 0, d.err
		}
		if d.readErr != nil {
			d.err = d.readErr
			return 0, d.err
		}

		// Read more data.
		nn, d.readErr = d.r.Read(d.buf[d.nbuf:])
		d.nbuf += nn
	}
}
\nThis function has 50 lines.
What is a good number of lines? In my opinion, a good function should not have more than 30 lines. You should be able to display a function in your IDE (code editor) window without needing to scroll down. In my IDE, I can read just 38 lines at once.
The number of lines inside a function is not sufficient to judge its simplicity. In 1976, Thomas J. McCabe developed an interesting notion called “cyclomatic complexity”.
A function can contain one or more conditional statements. For instance, it can have several if statements. Let’s take a look at the following example.
package main

import "fmt"

func main() {
	fmt.Println(foo(2, 3))
	fmt.Println(foo(11, 0))
	fmt.Println(foo(8, 12))
}

func foo(a, b int) int {
	if a > 10 {
		return a
	}
	if b > 10 {
		return b
	}
	return b - a
}
In the function foo, we have two input parameters, a and b. In the function body, we can observe two if statements. We have two conditions (we compare a and b to specific numbers).
When we run our function, we can imagine three logical “paths”:
The first condition is true. The second condition is not evaluated. The returned value is a.
The first condition is false, the second is true. The returned value is b.
The first condition is false, and the second is also false. The returned value is b-a.
We have three paths. The more paths you have, the more effort you need to understand the function.
The more paths you have, the more unit tests you have to develop to cover all the possible cases.
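To illustrate the link between paths and tests, here is a small sketch with one case per logical path of foo. The pathCases table and the checkPaths helper are hypothetical; in a real project, this table would typically live in a _test.go file and be run with the testing package.

```go
package main

import "fmt"

func foo(a, b int) int {
	if a > 10 {
		return a
	}
	if b > 10 {
		return b
	}
	return b - a
}

// pathCases lists one input per logical path through foo.
var pathCases = []struct {
	name string
	a, b int
	want int
}{
	{"first condition true", 11, 0, 11},
	{"first false, second true", 8, 12, 12},
	{"both conditions false", 2, 3, 1},
}

// checkPaths returns the names of the cases whose result differs from want.
func checkPaths() []string {
	var failed []string
	for _, c := range pathCases {
		if got := foo(c.a, c.b); got != c.want {
			failed = append(failed, c.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failed cases:", len(checkPaths())) // failed cases: 0
}
```

Three paths lead to three cases; a fourth path would require a fourth row in the table.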
\n\nThis section is not mandatory to understand the concept of cyclomatic complexity reduction. But, you might find it interesting to know the reasoning behind “cyclomatic complexity”.
First, every program can be seen as a graph. A graph is composed of nodes and edges; figure 3 shows an example. Each node represents a group of code. The edges represent the flow of control in the program.
\nLet’s take a simple example. We have the following function :
func bar(a int) {
	fmt.Println("start of function")
	if a > 2 {
		fmt.Println("a is greater than 2")
		return
	}
	fmt.Println("you got")
	fmt.Println("bad luck")
}
We have represented the program with a set of nodes and edges. Each block of code is represented by a node. This is very important: we do not add a node for every statement, but for every group of statements executed following a decision rule. Here, for example, two calls to fmt.Println are represented by a single node.
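Let’s count the nodes and the edges of that graph, counting the function exit as a node:
Four nodes (the entry block with the first fmt.Println and the if test, the block executed when a > 2, the block with the last two fmt.Println calls, and the exit)
Four edges (from the entry block to each of the two following blocks, and from each of those blocks to the exit)
The formula to get the cyclomatic complexity (for a single function) is:
$V(G) = E - N + 2$, where $E$ is the number of edges and $N$ the number of nodes.
The cyclomatic number is denoted $V(G)$. For our function:
$V(G) = 4 - 4 + 2 = 2$
The cyclomatic number is equal to 2, which means that our program defines two linearly independent paths. When this number grows, your function grows in complexity: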
\nMore paths mean more unit tests to develop to cover your code fully
More paths mean more brainpower needed for your colleagues to understand your code.
Some important facts about the cyclomatic number:
This number depends only on the “decision structure of the graph”.
It is not affected when you add a functional statement to your code.
If you insert a new edge in the graph, then you will increase the cyclomatic number by 1
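The formula is simple enough to be checked with a few lines of code. In this sketch, the function name cyclomatic is hypothetical, and the node and edge counts passed to it are assumptions: they correspond to a control-flow graph of bar drawn with an entry block, two branch blocks, and an exit node.

```go
package main

import "fmt"

// cyclomatic applies V(G) = edges - nodes + 2, which is valid for a single
// function (one connected control-flow graph).
func cyclomatic(edges, nodes int) int {
	return edges - nodes + 2
}

func main() {
	// Assumed graph of bar: 4 nodes (entry, "a > 2" block, final block,
	// exit) and 4 edges.
	fmt.Println(cyclomatic(4, 4)) // 2

	// Adding one edge without adding a node raises the number by 1.
	fmt.Println(cyclomatic(5, 4)) // 3

	// Adding a plain statement to an existing block changes neither count,
	// so the result stays the same.
	fmt.Println(cyclomatic(4, 4)) // 2
}
```

The result for bar matches the intuition that a function with a single if statement has two paths.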
I want to devote this section to the so-called “Halstead complexity metrics”. Maurice Howard Halstead was one of the pioneers of computer science. In 1977, he developed metrics that assess a program’s complexity from its source code. They measure:
\nThe program vocabulary
The program length
The effort required to write the program
The difficulty that will be needed to read and understand the program
Halstead metrics are based on two concepts: operators and operands. A program is composed of tokens: reserved words, variable names, brackets, curly brackets, etc. Those tokens can be classified into two main categories:
\nOperators :
\nAll reserved words. (func, const, var,...)
Pairs of brackets, pairs of curly brackets ({},())
All the comparison and logical operators ( >,<,&&,||,...)
Operands :
\nIdentifiers (a, myVariableName, myConstantName, myFunction,...)
Constant values (“this is a string”, 3, 22,...)
Type specification (int, bool,...)
From those two definitions, we can extract some base numbers (we will use that to compute Halstead metrics) :
$n_1$: the number of distinct operators
$n_2$: the number of distinct operands
$N_1$: the total number of operators
$N_2$: the total number of operands
Let’s take an example program to extract those four numbers :
func bar(a int) {
	fmt.Println("start of function")
	if a > 2 {
		fmt.Println("a is greater than 2")
		return
	}
	fmt.Println("you got")
	fmt.Println("bad luck")
}
$n_1$ is the number of distinct operators:
func, bar, (), {}, if, >, return
$n_2$ is the number of distinct operands:
int, a, fmt.Println, start of function, 2, a is greater than 2, you got, bad luck
$N_1$ is the total number of operators:
func, bar, (), (), (), (), (), {}, {}, if, >, return
$N_2$ is the total number of operands:
a, a, start of function, 2, a is greater than 2, you got, bad luck, fmt.Println, fmt.Println, fmt.Println, fmt.Println, int
Let’s compute the Halstead metrics for our program:
Vocabulary: $n = n_1 + n_2 = 8 + 9 = 17$
Length: $N = N_1 + N_2 = 12 + 12 = 24$
Difficulty: $D = \frac{n_1}{2} \times \frac{N_2}{n_2} = \frac{8}{2} \times \frac{12}{9} = 5.33$
Volume: $V = N \times \log_2(n) = 24 \times \log_2(17) = 98.10$
Effort: $E = D \times V = 523.20$
\nThose formulas are worth some explanations.
\nThe vocabulary of a program is just like the vocabulary of some essay. For an English essay, we can say that an author’s vocabulary is the total number of different words. For a program, this is the addition of the total number of distinct operators and operands. If the program use only reserved words and a very limited number of identifiers, its vocabulary will below. On the contrary, if your program uses many identifiers, the vocabulary will increase.
The length of a program is the total number of operators and operands used. Here we are not counting distinct tokens but the total number of tokens.
The difficulty is here to give an idea about the amount of time needed to write the program, and to read it. This metrics is equal to half the number of operands multiplied by the quotient between the total number of operators and the distinct number of operands. If your program uses a limited number of operands, the difficulty will be reduced. If the total number of operands increases, the difficulty will also increase (more comparisons, more identifiers, more types to handle and remember).
The effort metric can then be used to compute the time necessary to write the program.
Time to write the program : E/18 (in seconds) : in our example : 29 seconds (523,20 / 18).
\nHalstead also details an estimated number of bugs! B=\\frac{E^{2/3}}{3000}
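The formulas above can be put into a short program. This is only a sketch: the halstead type, its method names, and the near helper are hypothetical, and the four base counts are the ones used in the computation above (8, 9, 12 and 12), entered by hand rather than extracted from source code.

```go
package main

import (
	"fmt"
	"math"
)

// halstead holds the four base counts described in the text.
type halstead struct {
	n1, n2 float64 // distinct operators, distinct operands
	N1, N2 float64 // total operators, total operands
}

func (h halstead) vocabulary() float64 { return h.n1 + h.n2 }
func (h halstead) length() float64     { return h.N1 + h.N2 }
func (h halstead) difficulty() float64 { return (h.n1 / 2) * (h.N2 / h.n2) }
func (h halstead) volume() float64     { return h.length() * math.Log2(h.vocabulary()) }
func (h halstead) effort() float64     { return h.difficulty() * h.volume() }
func (h halstead) seconds() float64    { return h.effort() / 18 }
func (h halstead) bugs() float64       { return math.Pow(h.effort(), 2.0/3.0) / 3000 }

// near reports whether got is within tol of want.
func near(got, want, tol float64) bool {
	return math.Abs(got-want) <= tol
}

func main() {
	h := halstead{n1: 8, n2: 9, N1: 12, N2: 12}
	fmt.Printf("vocabulary=%.0f length=%.0f\n", h.vocabulary(), h.length())
	fmt.Printf("difficulty=%.2f volume=%.2f effort=%.2f\n",
		h.difficulty(), h.volume(), h.effort())
	fmt.Printf("time=%.0fs bugs=%.4f\n", h.seconds(), h.bugs())
}
```

Changing the four counts is enough to see how each metric reacts, for instance how the difficulty grows with the number of distinct operators.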
\n\nThose metrics are interesting, but we should take them cautiously.
However, they highlight that the more code we write, the more complex our program will become.
Simple, short, and even “stupid” code is better than an over-engineered solution.
This advice is linked to the previous section. When you write a program, you can introduce nested statements, i.e., statements executed in a specific branch. Let’s take an example. Here, we have a dummy function with a first condition that creates two branches. In the first branch, we have introduced another conditional statement (if b < 2) that also creates two branches.
// recommendation/nesting/main.go
//...

func nested(a, b int) {
	if a > 1 {
		if b < 2 { // nested condition
			fmt.Println("action 1")
		} else {
			fmt.Println("action 2")
		}
	} else {
		fmt.Println("action 3")
	}
	fmt.Println("action 4")
}
You can visualize the branches in the sequence diagram (figure 4).
\nWe can add another level of nesting :
// recommendation/nesting/main.go
//...

func nested2(a, b int) {
	if a > 1 {
		if b < 2 { // nested condition
			if a > 100 {
				fmt.Println("action 1")
			} else {
				fmt.Println("action 2")
			}
		} else {
			fmt.Println("action 3")
		}
	} else {
		fmt.Println("action 4")
	}
	fmt.Println("action 5")
}
In figure 5, you can see the impact of this new nesting level on the sequence diagram.
The more nesting levels you have, the more complicated your code becomes. One general piece of advice would be to limit nesting levels. If you find yourself in a situation where you cannot avoid them, you should instead create another function to support that complexity:
// recommendation/nesting/main.go
//...

func nested3(a, b int) {
	if a > 1 {
		subFct1(a, b)
	} else {
		fmt.Println("action 4")
	}
	fmt.Println("action 5")
}

func subFct1(a, b int) {
	if b < 2 { // nested condition
		if a > 100 {
			fmt.Println("action 1")
		} else {
			fmt.Println("action 2")
		}
	} else {
		fmt.Println("action 3")
	}
}
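When you extract a function like this, it is worth checking that the behavior did not change. The sketch below is an adaptation made for that purpose: the functions return the actions as a string instead of printing them (an assumption that makes the two versions comparable), and the sameBehaviour helper is hypothetical.

```go
package main

import "fmt"

// nested2 mirrors the three-level version and returns the actions it
// would print.
func nested2(a, b int) string {
	var act string
	if a > 1 {
		if b < 2 {
			if a > 100 {
				act = "action 1"
			} else {
				act = "action 2"
			}
		} else {
			act = "action 3"
		}
	} else {
		act = "action 4"
	}
	return act + ", action 5"
}

// nested3 delegates the inner decisions to subFct1.
func nested3(a, b int) string {
	var act string
	if a > 1 {
		act = subFct1(a, b)
	} else {
		act = "action 4"
	}
	return act + ", action 5"
}

func subFct1(a, b int) string {
	if b < 2 {
		if a > 100 {
			return "action 1"
		}
		return "action 2"
	}
	return "action 3"
}

// sameBehaviour compares both versions on inputs around each threshold.
func sameBehaviour() bool {
	for _, a := range []int{0, 1, 2, 100, 101} {
		for _, b := range []int{1, 2, 3} {
			if nested2(a, b) != nested3(a, b) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(nested3(101, 1)) // action 1, action 5
	fmt.Println(sameBehaviour()) // true
}
```

The extracted version has at most two nesting levels per function, and each function can now be tested on its own.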
\n\nPackage names
\nShort: no more than one word
No plural
Lower case
Informative about the service it gives
Avoid utilities/models packages
Interfaces
\nUse interfaces as function/method arguments & as field types
Small interfaces are better
Source files
\nOne file should be named like the package
No more than 600 lines per file
One file = One responsibility
Error Handling
\nAlways add context to errors
Never ignore errors
Use fatal errors carefully
Create fault-tolerant programs
Methods/functions
\nOne function has one goal
Simple names
Limited length (100 lines maximum)
Reduce cyclomatic complexity
Reduce the number of nesting levels