Finding the Optimal Concurrency [From High Performance MySQL]
Every web server has an optimal concurrency—that is, an optimal number of concurrent
connections that will result in requests being processed as quickly as possible,
without overloading your systems. A little trial and error can be required to find this
“magic number,” but it’s worth the effort.
It’s common for a high-traffic web site to handle thousands of connections to the
web server at the same time. However, only a few of these connections need to be
actively processing requests. The others may be reading requests, handling file
uploads, spoon-feeding content, or simply awaiting further requests from the client.
As concurrency increases, there’s a point at which the server reaches its peak
throughput. After that, the throughput levels off and often starts to decrease. More
importantly, the response time (latency) starts to increase.
To see why, consider what happens when you have a single CPU and the server
receives 100 requests simultaneously. One second of CPU time is required to process
each request. Assuming a perfect operating system scheduler with no overhead,
and no context switching overhead, the requests will need a total of 100 CPU seconds
to complete.
What’s the best way to serve the requests? You can queue them one after another, or
you can run them in parallel and switch between them, giving each request equal time
before switching to the next. In both cases, the throughput is one request per second.
However, the average latency is 50 seconds if they’re queued (concurrency = 1), and
100 seconds if they’re run in parallel (concurrency = 100). In practice, the average
latency would be even higher for parallel execution, because of the switching cost.
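The arithmetic above can be checked with a small simulation. This is only a sketch under the excerpt's idealized assumptions (one CPU, no switching cost). The function names and the 10 ms time slice are illustrative choices, not something from the book.

```python
def fifo_latencies(n_requests, work_ms):
    """Completion time (ms) of each request when they run one after another."""
    return [work_ms * (i + 1) for i in range(n_requests)]


def round_robin_latencies(n_requests, work_ms, slice_ms):
    """Completion time (ms) of each request under ideal round-robin on one CPU.

    Every request gets `slice_ms` of CPU in turn; switching is assumed free.
    """
    remaining = [work_ms] * n_requests
    finished = [0] * n_requests
    clock = 0
    active = n_requests
    while active:
        for i in range(n_requests):
            if remaining[i] == 0:
                continue  # this request is already done
            run = min(slice_ms, remaining[i])
            clock += run
            remaining[i] -= run
            if remaining[i] == 0:
                finished[i] = clock
                active -= 1
    return finished


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    queued = fifo_latencies(100, 1000)
    parallel = round_robin_latencies(100, 1000, slice_ms=10)
    # Both strategies finish the last request at the same moment, so the
    # throughput is identical; only the average latency differs.
    print("last completion (s):", queued[-1] / 1000, parallel[-1] / 1000)
    print("mean latency queued (s):  ", mean(queued) / 1000)
    print("mean latency parallel (s):", mean(parallel) / 1000)
```

With these inputs the queued case averages about 50 seconds and the round-robin case just under 100 seconds, which matches the figures quoted in the text.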
[Ideally] For a CPU-bound workload, the optimal concurrency is equal to the number of
CPUs (or CPU cores).
[Normally] However, processes are not always runnable, because they
make blocking calls such as I/O, database queries, and network requests. Therefore,
the optimal concurrency is usually higher than the number of CPUs. [That's why we are able to add more threads to handle concurrent requests]
You can estimate the optimal concurrency, but it requires accurate profiling.
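One widely quoted rule of thumb, which does not come from the excerpt itself, sizes a worker pool as cores × (1 + wait time / CPU time). The sketch below assumes you have already profiled the average CPU time and blocked (waiting) time per request. `estimate_pool_size` and its parameters are hypothetical names used only for illustration.

```python
import math
import os


def estimate_pool_size(cpu_ms, wait_ms, cores=None):
    """Rough starting point for concurrency: cores * (1 + wait / cpu).

    cpu_ms  -- profiled CPU time per request, in milliseconds
    wait_ms -- profiled time per request spent blocked (I/O, database, network)
    cores   -- number of CPU cores; defaults to what the OS reports
    """
    if cpu_ms <= 0:
        raise ValueError("cpu_ms must be positive")
    if wait_ms < 0:
        raise ValueError("wait_ms must not be negative")
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, math.ceil(cores * (1 + wait_ms / cpu_ms)))


if __name__ == "__main__":
    # Purely CPU-bound: the estimate collapses to the number of cores.
    print(estimate_pool_size(cpu_ms=10, wait_ms=0, cores=4))
    # Mostly waiting on I/O: many more workers than cores can be kept busy.
    print(estimate_pool_size(cpu_ms=10, wait_ms=90, cores=4))
```

Treat the result as a first guess to refine by measurement, since the profiled averages rarely hold steady under real load.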
[Conclusion:] It’s usually easier to experiment with different concurrency values and see what gives the
peak throughput without degrading response time.
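The experiment described above could look something like the following sketch. It uses a hypothetical `simulated_request` that just sleeps to imitate an I/O-bound request. In a real test you would replace it with a call to your own server, and the concurrency levels shown are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_request(io_seconds=0.01):
    """Hypothetical stand-in for a request that mostly waits on I/O."""
    time.sleep(io_seconds)


def measure(concurrency, n_requests, task=simulated_request):
    """Push n_requests through a pool of `concurrency` worker threads.

    Returns (throughput in requests/second, mean latency in seconds).
    Latency is measured from the start of the batch to the completion of
    each request, so it includes time spent waiting in the queue.
    """
    start = time.perf_counter()

    def timed():
        task()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed) for _ in range(n_requests)]
        latencies = [f.result() for f in futures]
    elapsed = time.perf_counter() - start
    return n_requests / elapsed, sum(latencies) / n_requests


def sweep(levels, n_requests=40):
    """Measure each concurrency level and return {level: (throughput, latency)}."""
    return {c: measure(c, n_requests) for c in levels}


if __name__ == "__main__":
    for c, (tput, lat) in sweep([1, 2, 4, 8, 16]).items():
        print(f"concurrency={c:3d}  throughput={tput:8.1f} req/s  "
              f"mean latency={lat * 1000:7.1f} ms")
```

Because the stand-in request only sleeps, throughput here keeps rising with concurrency. Against a real server you would look for the level where throughput flattens and latency starts to climb.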
Cheng,
IO wait is a big problem in web apps, where we tend not to do lots of computing but rather talk to other services: web services, databases, etc.
You can see a lot of work being done (again, this is not new, just made easier for the masses) with non-blocking IO. In Java land that is NIO. You can see projects like Netty (http://www.jboss.org/netty) taking off in this space. In server-side JavaScript you can take a look at node.js (http://nodejs.org/). In these cases you don't have a lot of "threads" at all. You may find it interesting to read about these projects to get a different side of what you are writing about above.
You can see it in practice in many places, but the article by Twitter on their new search architecture talks about this as well: http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html --
Jason
Thanks Jason! :)
Netty is a pretty cool project, I love it! Hope we can try it someday!