A few days ago I had a problem with high CPU usage in one of my Go-based microservices. The microservice has evolved into two distinct components: an HTTP web app and a batch processing service. At some point, we'll probably split these out. But in its current guise, we were seeing HTTP request latencies of more than ten seconds. It turns out that the cause was the Go scheduler not scheduling the goroutine that receives HTTP requests. Read on to find out why.
First, a quick recap. Go code doesn't use OS threads directly. The language has a concept built into its syntax called a "goroutine", which looks a lot like a thread. Goroutines are scheduled by the Go scheduler, which multiplexes them onto a pool of OS threads. Depending on how many cores your CPU has, the OS then spreads these threads across the CPU. (I'm intentionally ignoring hyperthreading.)
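As a rough illustration of the recap above, the sketch below starts far more goroutines than a typical machine has cores and waits for them all to finish. The helper name runMany and the count of 1000 are my own choices for illustration; they are not taken from the service discussed in this post.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runMany starts n goroutines, each of which increments a shared counter,
// and returns the counter once all of them have finished.
// (Hypothetical helper, for illustration only.)
func runMany(n int) int64 {
	var done int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1) // safe concurrent increment
		}()
	}
	wg.Wait() // block until every goroutine has called Done
	return atomic.LoadInt64(&done)
}

func main() {
	fmt.Println("CPU cores visible to Go:", runtime.NumCPU())
	fmt.Println("goroutines completed:", runMany(1000))
}
```

The point is only that the number of goroutines is not tied to the number of cores; the scheduler decides which goroutine runs on which thread.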
Let's run a simple experiment to test this out. The code below has two goroutines: one is an "intensive" task and the other prints a line. We also force the runtime to use only a single thread. You can play with this code in the Go Playground.
package main

import "fmt"
import "runtime"
import "time"

func cpuIntensive(p *int64) {
    for i := int64(1); i <= 10000000; i++ {
        *p = i
    }
    fmt.Println("Done intensive thing")
}

func printVar(p *int64) {
    fmt.Printf("print x = %d.\n", *p)
}

func main() {
    runtime.GOMAXPROCS(1)

    x := int64(0)
    go cpuIntensive(&x) // This should go into background
    go printVar(&x)     // This won't get scheduled until everything has finished.

    time.Sleep(1 * time.Nanosecond) // Wait for goroutines to finish (has Gosched in code)
}
If you run this code, you will see that it results in:
Done intensive thing
print x = 10000000.
Note how the "CPU intensive thing" ran and finished first. This is because the Go scheduler has to be invoked before it can perform a scheduling event. If we add the call runtime.Gosched() at line 10, inside the for loop just after the assignment to *p, and run the code again, we get the following output:
print x = 1.
Done intensive thing
This time, we have allowed the scheduler to reschedule the tasks (and give the print-line goroutine some CPU time).
Note that the sleep for 1 nanosecond is used purely for its internal Gosched call. The main goroutine does not finish before all the other code has had time to run (try it yourself in the playground).
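For reference, here is a sketch of the first experiment with the runtime.Gosched() call in place, and with a sync.WaitGroup instead of the 1 nanosecond sleep so that main explicitly waits for both goroutines. The function name observeWithYield, the atomic counter and the iteration count are assumptions of mine, chosen so that the example has no data race; it is a variation on the experiment, not the original code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// observeWithYield runs a busy loop that yields on every iteration and,
// concurrently, a goroutine that samples the counter once.
// It returns the sampled value and the final value of the counter.
// (Hypothetical helper, for illustration only.)
func observeWithYield(iterations int64) (seen, final int64) {
	var counter int64
	var wg sync.WaitGroup
	wg.Add(2)

	go func() { // the "intensive" task
		defer wg.Done()
		for i := int64(1); i <= iterations; i++ {
			atomic.StoreInt64(&counter, i)
			runtime.Gosched() // hand the thread back to the scheduler
		}
	}()

	go func() { // the "print-line" task
		defer wg.Done()
		seen = atomic.LoadInt64(&counter)
	}()

	wg.Wait() // blocks until both goroutines have called Done
	return seen, atomic.LoadInt64(&counter)
}

func main() {
	runtime.GOMAXPROCS(1) // same single-thread setup as the experiment
	seen, final := observeWithYield(100000)
	fmt.Printf("observer saw x = %d, loop finished at x = %d\n", seen, final)
}
```

Exactly which value the observer sees depends on the scheduler, so I have not listed an expected output here.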
Apparently, the Go codebase is littered with Gosched calls, so whenever you call something like time.Sleep or fmt.Printf, Gosched will be called for you. Let's try a more realistic example: decoding some JSON.
package main

import "fmt"
import "runtime"
import "time"
import "encoding/json"
import "strings"

const testBytes = `{ "Test": "value" }`

type Message struct {
    Test string
}

func cpuIntensive(p *Message) {
    for i := int64(1); i <= 1000; i++ {
        json.NewDecoder(strings.NewReader(testBytes)).Decode(p)
    }
    fmt.Println("Done intensive thing")
}

func printVar(p *Message) {
    fmt.Printf("print x = %v.\n", *p)
}

func main() {
    runtime.GOMAXPROCS(1)

    x := Message{}
    go cpuIntensive(&x)
    go printVar(&x)

    time.Sleep(1 * time.Nanosecond)
}
If you run this code, you will see the same issue: the decoding loop finishes before the print goroutine gets to run. This is because the JSON Decode method does NOT have a Gosched call inside its code.
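One way to give other goroutines a turn during a loop like this is to call runtime.Gosched() yourself every so often. The sketch below is a minimal version of that idea, not a recommendation for every workload; the helper name decodeWithYield and the yieldEvery parameter are hypothetical names of mine.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
	"strings"
)

const testBytes = `{ "Test": "value" }`

type Message struct {
	Test string
}

// decodeWithYield decodes testBytes n times and calls runtime.Gosched
// every yieldEvery iterations so that other goroutines get a turn.
// It returns the last decoded message and the number of yields performed.
// (Hypothetical helper, for illustration only.)
func decodeWithYield(n, yieldEvery int) (Message, int, error) {
	var m Message
	yields := 0
	for i := 1; i <= n; i++ {
		if err := json.NewDecoder(strings.NewReader(testBytes)).Decode(&m); err != nil {
			return m, yields, err
		}
		if yieldEvery > 0 && i%yieldEvery == 0 {
			runtime.Gosched() // explicit scheduling point
			yields++
		}
	}
	return m, yields, nil
}

func main() {
	m, yields, err := decodeWithYield(1000, 100)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("decoded %+v, yielded %d times\n", m, yields)
}
```

Yielding every N iterations rather than on every one keeps the overhead of the extra scheduler calls small; the right value of N is something you would have to measure for your own workload.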
Compared to other languages that use OS threads, this is surprising. I didn't expect to have to call the scheduler myself and assumed there was some internal process that would manage it for me. Not so.
The interesting thing is that even if you force the GOMAXPROCS to something more than 1, it still occurs. I think this is because the playground is only allowing one equivalent CPU. And the scheduling is on a per-CPU basis.
In summary, for each CPU, Go is always going to act like it's running on a single thread, unless you call Gosched yourself, call something that does, or the goroutine ends. It doesn't matter how many threads you specify or how many goroutines you start: it will still do one CPU-intensive thing per CPU at a time. (Go 1.14 and later can also preempt a running goroutine asynchronously on most platforms, so this description applies to earlier releases.)
Edit:
I received a few comments from readers with more information. Thank you to Cale Hoopes, Sam Whited, Dαve Cheney.
GOMAXPROCS is the environment variable that limits the number of OS threads that can execute Go code at the same time. This is independent of the discussion about calling Gosched, which in some high-performance situations you will need to insert into your code. Note that the vast majority of the time, the internal calls to Gosched will be adequate.
More information:
https://golang.org/pkg/runtime/
https://github.com/golang/go/blob/master/src/runtime/proc.go
https://golang.org/s/go11sched
https://github.com/golang/go/issues/11462