Last week I published a benchmark of Circus and got some good feedback in the comments.
The consensus was that the app I was benchmarking was hammering the disk too much to provide accurate results.
Following AdamSkutt's suggestions, I've modified the benchmarked app so it does I/O-bound work without hitting the disk, by exchanging data with a thread through a pipe.
The new app does the following:
- a loop that computes 10 * 1000 * 1000, 10000 times (CPU)
- a time.sleep(N) where N is between 75 and 100 ms (just piling up)
- 100 bytes sent to the DB thread using the pipe (I/O)
- a small HTML page sent by the DB thread and redirected by the main thread to the client (I/O)
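Put together, one request's work can be sketched like this. This is a minimal stand-in, not the actual benchmark code: the `db_echo` thread, the pipe wiring, and the `HTML` payload are assumptions, but the CPU loop, sleep range, and byte counts follow the description above.

```python
import os
import random
import threading
import time

HTML = b"<html><body>hello</body></html>"  # the small page the DB thread sends back

def db_echo(read_fd, write_fd):
    # Stand-in DB thread: read each 100-byte request, answer with a small HTML page.
    while os.read(read_fd, 100):
        os.write(write_fd, HTML)

def handle_request(to_db_fd, from_db_fd):
    # CPU: compute 10 * 1000 * 1000, 10000 times.
    for _ in range(10000):
        x = 10 * 1000 * 1000
    # Pile up: sleep between 75 and 100 ms.
    time.sleep(random.uniform(0.075, 0.100))
    # I/O: send 100 bytes to the DB thread through the pipe...
    os.write(to_db_fd, b"x" * 100)
    # ...and relay its HTML answer back to the client.
    return os.read(from_db_fd, 1024)

# Wire the two pipes and run one request.
req_r, req_w = os.pipe()    # main thread -> DB thread
resp_r, resp_w = os.pipe()  # DB thread -> main thread
threading.Thread(target=db_echo, args=(req_r, resp_w), daemon=True).start()
body = handle_request(req_w, resp_r)
```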
A DB thread is a Thread that opens a pipe to get data from the main thread, and another pipe to send back data.
To make the simulation realistic, the app runs 10 of those DB threads in a queue, and each incoming request picks one and interacts with it.
That's basically what you would have when you use a pool of DB connectors in your application.
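Such a pool can be sketched with a `Queue` of connector threads. Again a hedged sketch rather than the real benchmark code: the `DBThread` and `query` names are hypothetical, but each thread owns the two pipes described above and requests block until a connector is free.

```python
import os
import queue
import threading

class DBThread(threading.Thread):
    """Stand-in DB connector: one pipe in from the main thread, one pipe back out."""
    def __init__(self):
        super().__init__(daemon=True)
        self.in_r, self.in_w = os.pipe()    # main thread writes requests here
        self.out_r, self.out_w = os.pipe()  # main thread reads answers here

    def run(self):
        # Answer each 100-byte request with a small HTML page.
        while os.read(self.in_r, 100):
            os.write(self.out_w, b"<html><body>hello</body></html>")

# Ten connectors sit in a queue, like a pool of DB connections.
pool = queue.Queue()
for _ in range(10):
    t = DBThread()
    t.start()
    pool.put(t)

def query(payload):
    db = pool.get()        # block until a connector is free
    try:
        os.write(db.in_w, payload)
        return os.read(db.out_r, 1024)
    finally:
        pool.put(db)       # hand the connector back to the pool
```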
An interesting thing to notice is that with this application, we are doing some socket I/O, thus the gevent and meinheld workers should be slightly faster than with the previous app.
But socket I/O accounts for only a small fraction of each call -- around 10%. I did not want to make it too unfair for the other workers, which don't automagically patch the socket module.
To summarize, the new application:
- is twice as fast -- around 100 ms per call
- does not hammer the disk anymore.
- still does some CPU and I/O work
Now to the results!
Gunicorn + gevent
The web server is ten times faster than in the previous run, which is a bigger gain than expected since the app itself is only twice as fast. Not hitting the hard disk helps here, of course.
The higher RPS also makes the RPS graph much more readable -- with a bigger scale, we can see it decreasing steadily starting at 100 CUs.
Overall, the duration graph does not look that different: it's similar to the previous run, apart from the scale.
Chaussette + waitress
I also tried the waitress backend this time, and was really surprised by how well it performed -- only slightly slower than Gunicorn + gevent. The two look very similar, which I did not expect.
I was also happy to see that the RPS graph had a similar trend, making me think this bench is more accurate.
Chaussette + meinheld
Circus + meinheld is slightly faster than Circus + waitress (as expected) and than Gunicorn + gevent. We see the same tendency under high load.
Chaussette + gevent
Circus + Gevent is in turn slightly faster.
Chaussette + fastgevent
Circus + fastgevent is the fastest one, unlike in the previous run, and the errors from that run are gone this time.
This new bench seems to be more accurate, and the results are a bit different for meinheld and fastgevent.
Fastest to slowest:
- Circus + fastgevent
- Circus + gevent
- Circus + meinheld
- Gunicorn + gevent
- Circus + waitress
Overall, my conclusion is similar to the previous one, which makes me confident we can switch to Circus in the future.
I still need to investigate why fastgevent failed in the previous run.
Thanks to AdamSkutt et al. for the feedback.