Rate limiting in Golang HTTP client

I’ve been doing some interesting work with the team at MFlow writing HTTP clients that consume financial data, and it’s been eye-opening to see how different API platforms choose to protect their resources. Best practices for client-side rate limiting seem to be scarce when compared to server-side, so here are my thoughts on the subject and some code samples.
TL;DR: wrap *http.Client and call limiter.Wait(ctx) before every request, where limiter is a *rate.Limiter from golang.org/x/time/rate. The token bucket honours bursts, blocks cleanly when you're out of tokens, and respects context cancellation.
Understanding server-side rate limiting
Most API endpoints implement resource-consumption quotas in the form of rate limits. This is generally done either to protect their servers from being abused by too many requests, or to monetize the endpoints by charging for more frequent updates. For example, Yahoo Finance (RIP) was reported to allow on the order of a couple of thousand requests per hour. Another platform, IndependentReserve, enforces a rate limit of 1 request per second. If you pay close attention, these two limits differ in how the quota is managed over the timeframe. With an hourly quota you could, in principle, consume the whole allocation in just a few minutes, so bursty, high-volume requests are allowed. With a per-second limit like IndependentReserve's, you can call the public APIs only once a second, which I'd assume is there to protect their servers from abuse.
A quick detour from client-side rate limiting: I came across a lot of articles about implementing rate limiting in the server application logic. I strongly discourage this, since you will potentially have to replicate the logic in many different applications, which becomes a code-maintenance nightmare. Traffic control and shaping should be the function of a perimeter device, so implement an API gateway instead and offload this function to it. I'd recommend a gateway such as Google Apigee or Kong.
Some API platforms are kind enough to advertise the remaining quota in response headers, often using names such as RateLimit-Limit and RateLimit-Remaining. This lets a well-behaved client play nicely with the resource server. In my experience, though, these headers are a courtesy rather than a standard that everyone follows, so in most cases you'll need a smart HTTP client. Keep reading. :)
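Here is a sample response using two of the older-style headers you'll see in the wild today (the third member of that triplet, RateLimit-Reset, reports when the current quota window resets):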
Request:
```http
GET /items/123
```
Response:
```http
HTTP/1.1 200 OK
Content-Type: application/json
RateLimit-Limit: 10
RateLimit-Remaining: 9

{"hello": "world"}
```
Heads-up: the IETF HTTPAPI working group is still iterating on this. The current draft-ietf-httpapi-ratelimit-headers-10 (Sept 2025) replaces the older triplet with two structured-field headers, RateLimit and RateLimit-Policy. It's still a draft (not yet an RFC), and adoption in the wild is mixed, so you'll see both shapes for years to come.
When you exceed your rate limit, the server will generally respond with HTTP 429 Too Many Requests. At that point you need to wait before firing off the next request, and this is where the complexity arises. How long do you wait when an endpoint allows bursty consumption of its resources over a long period?
What not to do
- Break long timeframes down into shorter ones: I've seen programmers take an hourly quota and divide it down into requests per minute or second. So a 2,000/hr quota turns into roughly 33 req/min. This works, but it throws away the bursty nature of the API in cases where it would be useful.
- Back off only on exhaustion: programmers use up the allocation without keeping track of the quota, and when they hit an HTTP 429 they fall back to an implicit sleep with exponential backoff. This is a poor primary strategy: debugging becomes more difficult, the resulting behaviour is non-deterministic, and the problem compounds when the HTTP client object is shared by multiple goroutines.
That said, defensive backoff is a useful complement to a token bucket — servers can rate-limit you even when your local accounting says you’re fine (clock drift, multiple clients sharing a quota, restarts, etc.). When you do back off, prefer the server-supplied Retry-After header over a fixed delay.
Repeat HTTP 429 offenders could be banned
Another problem with relying on backoff alone is that it assumes the API server will tolerate repeated requests after sending an HTTP 429. Some platforms, such as Binance, can temporarily or permanently ban a client when requests keep coming in after a 429. This can happen if the backoff algorithm doesn't back off for long enough.
Client-side rate limiting — the duty of every HTTP client developer
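Now that I've set some context, let's jump into the code. Go's standard library is extremely comprehensive, and the sibling golang.org/x repositories fill many of the remaining gaps. The answer to our problem lies in the package golang.org/x/time/rate (maintained by the Go team, though not part of the standard library itself); more precisely, in the Limiter type implemented in this package.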
“A Limiter controls how frequently events are allowed to happen. It implements a ‘token bucket’ of size b, initially full and refilled at rate r tokens per second. Informally, in any large enough time interval, the Limiter limits the rate to r tokens per second, with a maximum burst size of b events.” (golang.org/x/time/rate docs; see also Token bucket on Wikipedia.)
rate.Limiter exposes three idioms worth knowing:
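| Method | Behaviour |
|---|---|
| Allow() / AllowN(t, n) | Non-blocking. Reports whether a token (or n tokens at time t) is available and, if so, consumes it. |
| Wait(ctx) / WaitN(ctx, n) | Blocks until a token is available; returns an error if ctx is cancelled or its deadline would pass first. |
| Reserve() / ReserveN(t, n) | Returns a *Reservation that tells you exactly how long you'd have to wait (Delay / DelayFrom); it can be cancelled if you decide not to act. |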
For an HTTP client, Wait(ctx) is the most useful: it composes naturally with the request’s context.Context, so timeouts and cancellation are honoured for free. rate.Limiter is also goroutine-safe, so a single limiter can be shared across all calls from a client.
Here is a small Go HTTP client that demonstrates rate limiting. Let's make 300 calls to BTCMarkets to fetch the market ticker for Bitcoin (BTC-AUD):
```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"

	"golang.org/x/time/rate"
)

// RLHTTPClient is a rate-limited HTTP client.
type RLHTTPClient struct {
	client      *http.Client
	Ratelimiter *rate.Limiter
}

// Do dispatches the HTTP request to the network, blocking until a token is available
// or the request's context is cancelled.
func (c *RLHTTPClient) Do(req *http.Request) (*http.Response, error) {
	if err := c.Ratelimiter.Wait(req.Context()); err != nil {
		return nil, err
	}
	return c.client.Do(req)
}

// NewClient returns an HTTP client with a rate limiter attached.
func NewClient(rl *rate.Limiter) *RLHTTPClient {
	return &RLHTTPClient{
		// Unlike http.DefaultClient, which has no timeout, give up on hung requests.
		client:      &http.Client{Timeout: 30 * time.Second},
		Ratelimiter: rl,
	}
}

func main() {
	// BTCMarkets allows 50 req / 10 s on most public endpoints.
	// 50 requests per 10 seconds = 5 req/s, with bursts up to 50.
	// Because the bucket starts full, a single 10 s window can still see up to
	// 100 requests (50 burst + 50 refilled); lower the burst if the server
	// counts strictly per window.
	rl := rate.NewLimiter(rate.Every(10*time.Second/50), 50)
	c := NewClient(rl)

	reqURL := "https://api.btcmarkets.net/v3/markets/BTC-AUD/ticker"
	for i := range 300 { // ranging over an int requires Go 1.22+
		req, err := http.NewRequest(http.MethodGet, reqURL, nil)
		if err != nil {
			fmt.Println("build request:", err)
			return
		}
		resp, err := c.Do(req)
		if err != nil {
			fmt.Println("do request:", err)
			return
		}
		// Drain and close the body so the connection is returned to the pool.
		_, _ = io.Copy(io.Discard, resp.Body)
		resp.Body.Close()

		if resp.StatusCode == http.StatusTooManyRequests {
			fmt.Printf("rate limit reached after %d requests\n", i)
			return
		}
	}
	fmt.Println("all 300 requests completed within the rate limit")
}
```
Run it as-is and it should complete all 300 requests without hitting the server's rate limit. To see the difference, comment out the if block that calls c.Ratelimiter.Wait(...) in Do and re-run: the server should start returning HTTP 429 once your requests outpace its quota.
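⚠️ Common gotcha: rate.Every(10*time.Second) gives you a rate of one token every ten seconds, not “ten seconds’ worth of requests”. For “N requests per D duration”, use rate.Every(D/N) (or rate.Limit(float64(N)/D.Seconds())) and set the burst size to N. Bear in mind that any window of length D can then admit up to 2N requests (a full burst of N plus N refilled tokens); if the server counts strictly per fixed window, use a smaller burst.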

A token dispenser at a service counter — the same idea as a software token bucket.
In simple terms, a token bucket acts just like the token dispenser you might have seen at a bank or anywhere else where you queue with a numbered ticket to be served. If the dispenser is empty, you don’t get served. This is an effective method for crowd control, and the same approach is used in packet-switched computer networks.
Using the token-bucket implementation, we can make sure our code doesn’t send a request without a token, and we configure the bucket to refill at the rate the API server is happy to accept requests.

Token-bucket flow: each request waits for a token, takes one when available, then is dispatched.
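Wait(ctx) blocks until a token is available, so *http.Client.Do(req) is never called, and nothing is sent to the network, until the request holds a token. Because Do passes the request's context to Wait, callers can cancel a queued request just as they would any other long-running operation, provided they build it with http.NewRequestWithContext (a request from plain http.NewRequest carries context.Background(), which is never cancelled).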
A few practical extras
- Per-host limiters. If your client talks to multiple APIs (or multiple endpoints with different quotas), keep one *rate.Limiter per host/route in a map[string]*rate.Limiter guarded by a sync.RWMutex.
- As an http.RoundTripper. The wrapper above is the simplest form. The idiomatic Go pattern is to implement http.RoundTripper; you can then keep the standard *http.Client everywhere and stack rate limiting alongside retries, tracing, and auth as middleware.
- Honour Retry-After and any rate-limit headers. When the server does respond with a 429, parse Retry-After and (depending on the server's flavour) RateLimit-Reset or the structured RateLimit field, and adjust your local limiter (e.g. with limiter.SetLimit) instead of guessing.
Congratulations on making it to the end! You can now go be a good citizen of the internet who writes maintainable code that honours API rate limits. For a more comprehensive example, take a look at github.com/MflowAU/btcmarkets.