Finding the Optimal Concurrency [From High Performance MySQL]
Every web server has an optimal concurrency—that is, an optimal number of concurrent
connections that will result in requests being processed as quickly as possible,
without overloading your systems. A little trial and error can be required to find this
“magic number,” but it’s worth the effort.
It’s common for a high-traffic web site to handle thousands of connections to the
web server at the same time. However, only a few of these connections need to be
actively processing requests. The others may be reading requests, handling file
uploads, spoon-feeding content, or simply awaiting further requests from the client.
As concurrency increases, there’s a point at which the server reaches its peak
throughput. After that, the throughput levels off and often starts to decrease. More
importantly, the response time (latency) starts to increase.
To see why, consider what happens when you have a single CPU and the server
receives 100 requests simultaneously. One second of CPU time is required to process
each request. Assuming a perfect operating system scheduler with no overhead,
and no context switching overhead, the requests will need a total of 100 CPU seconds
to complete.
What’s the best way to serve the requests? You can queue them one after another, or
you can run them in parallel and switch between them, giving each request equal time
before switching to the next. In both cases, the throughput is one request per second.
However, the average latency is 50 seconds if they’re queued (concurrency = 1), and
100 seconds if they’re run in parallel (concurrency = 100). In practice, the average
latency would be even higher for parallel execution, because of the switching cost.
[Ideally]For a CPU-bound workload, the optimal concurrency is equal to the number of
CPUs (or CPU cores).
[Normally]However, processes are not always runnable, because they
make blocking calls such as I/O, database queries, and network requests. Therefore,
the optimal concurrency is usually higher than the number of CPUs. [That's why we are able to add more threads to handle concurrent requests]
You can estimate the optimal concurrency, but it requires accurate profiling.
[Conclusion:]It’s usually easier to experiment with different concurrency values and see what gives the
peak throughput without degrading response time.