
Debug SQL Connections with Go pprof Profiling Tool

At adjoe, we operate more than 30 client-facing backend services that need to respond fast. Any performance degradation is immediately noticeable to users and quickly turns into a suboptimal experience.

To address performance issues, it’s essential to understand the root cause, and that is where profiling helps. In the Go ecosystem, pprof is a built-in profiling tool that gives visibility into runtime behavior with minimal overhead.

Why pprof?

Whether you’re dealing with CPU bottlenecks, memory leaks, or goroutine proliferation, pprof provides deep insights into the runtime behavior of Go applications. With just a few lines of code, we can generate detailed profiles, visualize performance hotspots, and make data-driven decisions to improve efficiency. 

This article explores how pprof gave us deep visibility into a real production performance issue. We explain how we diagnosed and fixed performance problems caused by the way Go’s database/sql package handles connection pooling, especially when used with GORM and a MySQL database.

By combining runtime analysis with profiling data, we were able to pinpoint database connection handling issues that aren’t immediately obvious.

This walkthrough aims to serve as a reference for engineers facing similar challenges in Go-based systems interacting with relational databases.

Backend:

MySQL as a management database, paired with the Go GORM library. 

Heap Allocation and Goroutine Spikes

It all started with our service performance dashboard, where we were trying to explain an increase in the service’s response time. What stood out was a sudden spike in heap allocations while the number of requests remained stable.

This spike was mirrored by a steady increase in the goroutine count.

Identifying the Root Cause: Go SQL Connection Issues

A quick review of the error logs indicated that a deeper analysis was needed. Fortunately, we had already instrumented our mission-critical service with pprof HTTP handlers to facilitate on-demand profiling in the production environment. 
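For context, exposing these handlers takes only a few lines. The sketch below is a simplified version, not our exact setup: it assumes a dedicated internal port (localhost:6060) and relies on the side effect of importing net/http/pprof.

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
    // Serve pprof on a separate, internal-only port so the profiling
    // endpoints are never reachable through the public API.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... start the actual service here ...
}

With this in place, heap, goroutine, and CPU profiles can be pulled on demand from /debug/pprof/heap, /debug/pprof/goroutine, and /debug/pprof/profile.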

Our initial step was to inspect the heap profile, focusing on inuse_space to identify any unexpected memory usage. To do so, we ran the following command on the downloaded profile:

 go tool pprof heap.out
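The heap.out file itself can be downloaded beforehand from the pprof endpoint exposed above; host and port are placeholders here:

 curl -o heap.out http://host:port/debug/pprof/heap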

The results were largely unremarkable, showing typical allocations associated with response compression and JSON encoding, with no signs of memory leaks or abnormal growth.

File: app
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing top 10 nodes out of 257
      flat  flat%   sum%        cum   cum%
  103.13MB 36.98% 36.98%   191.86MB 68.80%  compress/flate.NewWriter
   54.89MB 19.69% 56.67%    88.74MB 31.82%  compress/flate.(*compressor).init
   33.50MB 12.01% 68.68%    33.50MB 12.01%  encoding/json.(*decodeState).literalStore
   32.84MB 11.78% 80.46%    32.84MB 11.78%  compress/flate.newDeflateFast (inline)
   ...

Building on that, we considered the possibility that Go’s garbage collection might be cleaning up allocations too quickly for the profiler to capture them in the inuse_space snapshot. 

To test this hypothesis, we turned to the alloc_space profile, which tracks cumulative memory allocations over time, regardless of whether they were later freed. 
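The same heap profile already contains both sample types, so no new capture was needed; switching the view is a matter of selecting a different sample index, for example:

 go tool pprof -sample_index=alloc_space heap.out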

This profile revealed a much clearer picture: the dominant source of allocations was tied to the SQL layer, strongly indicating that the issue was rooted in our database interaction logic.

(pprof) alloc_space
Showing nodes accounting for 29.20GB, 63.51% of 45.98GB total
Dropped 1486 nodes (cum <= 0.23GB)
Showing top 10 nodes out of 143
      flat  flat%   sum%        cum   cum%
    5.49GB 11.93% 11.93%     7.10GB 15.44%  github.com/jinzhu/gorm.(*Scope).Fields
    4.70GB 10.22% 22.15%     6.30GB 13.71%  github.com/jinzhu/gorm.(*Scope).scan
    4.32GB  9.40% 31.55%     9.12GB 19.83%  github.com/jinzhu/gorm.(*Scope).New (inline)
    3.17GB  6.88% 38.43%     3.17GB  6.88%  github.com/jinzhu/gorm.(*search).clone (inline)
 ....

An inspection of the goroutine pprof profile revealed that all SQL-related goroutines were in a blocked or idle state, despite the service still actively receiving incoming requests. 

This suggested that the application was stalling at the database layer, potentially due to exhausted connections or contention within the connection pool.

File: app
Type: goroutine
Time: Feb 6, 2025 at 10:32am (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1256, 99.84% of 1258 total
Dropped 129 nodes (cum <= 6)
Showing top 10 nodes out of 120
      flat  flat%   sum%        cum   cum%
      1256 99.84% 99.84%       1256 99.84%  runtime.gopark
         0     0% 99.84%        115  9.14%  adjoe.io/core/src/db.(*GormRepository).CountWithContext
         0     0% 99.84%         56  4.45%  adjoe.io/core/src/db.(*GormRepository).Find
         0     0% 99.84%        329 26.15%  adjoe.io/core/src/db.(*GormRepository).FindWithContext
         0     0% 99.84%          6  0.48%  ...

While parked goroutines are expected during idle periods, that wasn’t the case here: the pprof graph indicated a steadily increasing number of goroutines over time. If the goroutines were asleep, why was their number growing?

To investigate further, we turned again to pprof, this time capturing the full stack traces of all active goroutines for clues about where and why they were blocked:
http://host:port/debug/pprof/goroutine?debug=2

goroutine x [IO wait]:
internal/poll.runtime_pollWait(0x7fb141439140, 0x72)
    /usr/local/go/src/net/http/server.go:677 +0xba
…..

goroutine x [select]:
database/sql.(*DB).conn…
database/sql.(*DB).query…
database/sql.(*DB).QueryContext…
database/sql.(*DB).retry…

Developers familiar with the internals of Go’s database/sql package may already recognize the underlying issue, but let’s walk through it for clarity, revisiting how the Go sql package handles connections.

func (db *DB) QueryContext(ctx context.Context, query string, args ...any) (*Rows, error) {
    var rows *Rows
    var err error

    err = db.retry(func(strategy connReuseStrategy) error {
        rows, err = db.query(ctx, query, args, strategy)
        return err
    })
    ...

func (db *DB) query(ctx context.Context, query string, args []any, strategy connReuseStrategy) (*Rows, error) {
    dc, err := db.conn(ctx, strategy)
    if err != nil {
        return nil, err
    }
    ...

func (db *DB) conn(ctx context.Context, strategy connReuseStrategy) (*driverConn, error) {
    ...
    req := make(chan connRequest, 1)
    delHandle := db.connRequests.Add(req)
    ...
    // The goroutine parks here until either the context is canceled or a
    // freed connection is handed over on req.
    select {
    case <-ctx.Done():
        ...
    case ret, ok := <-req:
        db.waitDuration.Add(int64(time.Since(waitStart)))
    }

From the code above, we can see that unless the context is explicitly canceled, the request blocks indefinitely while waiting to read from the <-req channel.

In other words, if all SQL connections are exhausted, the Go runtime keeps waiting and retrying until a connection becomes available, which can lead to a buildup of blocked goroutines.
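To make the failure mode concrete, here is a small self-contained sketch (not our production code; the DSN and query are placeholders) that reproduces it: with the pool capped at one connection and no context deadline, every additional query parks its goroutine exactly where the profile showed.

package main

import (
    "context"
    "database/sql"
    "log"
    "runtime"
    "sync"
    "time"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    // The DSN is a placeholder; the interesting part is the pool configuration.
    db, err := sql.Open("mysql", "user:password@/dbname")
    if err != nil {
        log.Fatal(err)
    }
    db.SetMaxOpenConns(1) // deliberately tiny pool to provoke contention

    var wg sync.WaitGroup
    for i := 0; i < 50; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // context.Background() never expires: if the single connection
            // is busy, this goroutine parks inside db.conn, waiting on the
            // connRequests channel shown above.
            rows, err := db.QueryContext(context.Background(), "SELECT SLEEP(10)")
            if err != nil {
                return
            }
            rows.Close()
        }()
    }

    time.Sleep(2 * time.Second)
    // Most of the 50 goroutines are now parked, mirroring the growth we saw.
    log.Println("goroutines:", runtime.NumGoroutine())
    wg.Wait()
}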

Solving Go SQL Connection Bottlenecks

The solution may involve implementing one or more of the following strategies, ideally in combination, though that’s not always feasible depending on the system constraints:

  1. Check whether enough connections are allocated.

...
db.DB().SetMaxOpenConns(maxOpenConns) // increase this value

The recommended number of connections depends on the application load, but it should not be higher than what the database can handle (a fuller configuration sketch follows after this list).

  2. Propagate context cancellation to SQL queries.

This is useful because a user may cancel the request, in which case there is no reason to keep waiting for the query result and holding a connection (see the sketch after this list).

  3. Introduce a timeout for the queries.

    Depending on the libraries used and other constraints, it is not always possible to cancel the context for queries. In this case, a timeout for queries can come in handy. For example, if you use MySQL, it’s possible to set a read timeout to guard against long-running queries. This ensures that such queries won’t get stuck in a running state.

// see https://github.com/go-sql-driver/mysql/blob/master/dsn.go#L354 for a reference

db, err := sql.Open("mysql", "user:password@/dbname?readTimeout=30s")
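For completeness, here is a minimal sketch of the first two strategies combined, assuming jinzhu/gorm (which exposes the underlying pool via db.DB()); the package name, limits, timeout, and query are illustrative, not our production values:

package dbpool // illustrative package name

import (
    "context"
    "database/sql"
    "net/http"
    "time"

    _ "github.com/go-sql-driver/mysql"
    "github.com/jinzhu/gorm"
)

var sqlDB *sql.DB

// setupPool configures the *sql.DB that GORM wraps. The limits are
// illustrative; tune them to your load and to what MySQL can actually handle.
func setupPool(db *gorm.DB) {
    sqlDB = db.DB()
    sqlDB.SetMaxOpenConns(100)
    sqlDB.SetMaxIdleConns(20)
    sqlDB.SetConnMaxLifetime(30 * time.Minute) // recycle long-lived connections
}

// handler derives the query context from the incoming request: if the client
// disconnects or the timeout fires, the <-ctx.Done() case in db.conn unblocks
// and the goroutine stops waiting for a connection.
func handler(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
    defer cancel()

    rows, err := sqlDB.QueryContext(ctx, "SELECT id FROM users LIMIT 10")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer rows.Close()
    // ... scan rows ...
}

The exact limits are a trade-off: set them too low and goroutines queue up as shown above; set them too high and the database itself becomes the bottleneck.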

As demonstrated, the solution itself is relatively simple. However, uncovering such issues often requires correlating data from multiple profiling sources (such as heap, goroutine, and CPU profiles) to gain a better understanding. 

When analyzing collected profiles, it’s especially useful to explore different profile views and options, like alloc_space and inuse_space, to capture both allocation patterns and live memory usage, providing a fuller picture of the problem.

End Note 

With the right profiling setup in place, we used pprof to trace a complex issue back to Go’s SQL connection handling. With pprof as your lens, even intricate connection pooling challenges become transparent and manageable, letting your services run leaner, faster, and more resiliently.

This approach helps improve backend efficiency while supporting reliable and smooth user experiences, even under heavy load. 

For more in-depth explorations into our engineering practices, check out our Engineers’ Blog, where the adjoe team shares our latest innovations and lessons learned.
