WSGI Web Servers Bench Part 2

Last week I published a bench on Circus and I had some good feedback in the comments.

We ended up saying that the app I was benching was hammering the disk too much to provide accurate results.

Following AdamSkutt's suggestions, I've modified the benched app so it does I/O-bound work without hitting the disk, by exchanging data with a thread through a pipe.

The new app does the following:

  1. a loop that computes 10 * 1000 * 1000, repeated 10000 times (CPU)
  2. a time.sleep(N) where N is between 75 and 100 ms (just piling up)
  3. 100 bytes sent to the DB thread through the pipe (I/O)
  4. a small HTML page sent back by the DB thread and relayed by the main thread to the client (I/O)
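The steps above can be sketched roughly like this -- a minimal, self-contained sketch using plain os.pipe file descriptors; the helper names, payload, and page contents are my own, not the original code:

```python
import os
import threading
import time

HTML = b"<html><body>hello</body></html>"  # stand-in for the small page

def cpu_work():
    # Step 1: repeat a cheap multiplication 10000 times (CPU).
    for _ in range(10000):
        10 * 1000 * 1000

def db_end(read_fd, write_fd):
    # Stand-in "DB" end: receive the 100-byte payload (step 3)
    # and answer with a small HTML page (step 4).
    os.read(read_fd, 100)
    os.write(write_fd, HTML)

def handle_request():
    cpu_work()                      # step 1 (CPU)
    time.sleep(0.075)               # step 2: 75-100 ms pause
    req_r, req_w = os.pipe()        # main thread -> DB thread
    resp_r, resp_w = os.pipe()      # DB thread -> main thread
    t = threading.Thread(target=db_end, args=(req_r, resp_w))
    t.start()
    os.write(req_w, b"x" * 100)     # step 3: send 100 bytes (I/O)
    page = os.read(resp_r, 1024)    # step 4: read the page back (I/O)
    t.join()
    for fd in (req_r, req_w, resp_r, resp_w):
        os.close(fd)
    return page
```

The thread is spawned per request here only to keep the sketch short; in the real app the DB threads are long-lived.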


A DB thread is a Thread that opens one pipe to receive data from the main thread, and another pipe to send data back.


To make the simulation realistic, the app runs 10 of those DB threads in a queue, and each incoming request picks one and interacts with it.

That's basically what you would have when you use a pool of DB connectors in your application.
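That pool pattern can be sketched with a thread-safe queue.Queue holding the workers: each request blocks until a worker is free, then returns it afterwards. The class and reply text below are illustrative stand-ins, not the original code:

```python
import queue

class DBWorker:
    """Stand-in for a DB thread holding its two pipes."""
    def __init__(self, n):
        self.n = n

    def query(self, payload):
        # In the real app this writes the payload to one pipe and
        # reads the HTML reply back from the other.
        return b"<html>reply from worker %d</html>" % self.n

# Ten workers, checked in and out through a thread-safe queue.
pool = queue.Queue()
for n in range(10):
    pool.put(DBWorker(n))

def handle_request(payload):
    worker = pool.get()          # blocks if all 10 workers are busy
    try:
        return worker.query(payload)
    finally:
        pool.put(worker)         # always return the worker to the pool
```

The try/finally guarantees a worker goes back into the pool even if the query raises, which is the important property of any connection-pool checkout.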

An interesting thing to note is that this application does some socket I/O, so the gevent and meinheld workers should be slightly faster than with the previous app.

But the percentage of socket I/O work involved in the call is quite small compared to the rest -- around 10%. I did not want to make it too unfair for other workers that don't automagically patch the socket module.

To summarize, the new application:

  • is twice as fast -- around 100 ms per call
  • no longer hammers the disk
  • still does some CPU and I/O work

Now to the results!

Gunicorn + gevent

The web server is ten times faster than in the previous run, which is much better since the app itself is only twice as fast. Not hitting the hard disk helps here, of course.

The higher RPS also makes the RPS graph much more readable -- with a bigger scale, we can see it steadily decreasing starting at 100 CUs (concurrent users).

Overall, the duration graph does not look that different: it's similar to the previous run, apart from the scale.

Chaussette + waitress

I tried the waitress backend as well this time, and was really surprised by how well it performed -- it's only slightly slower than Gunicorn + gevent, and the two curves look very similar.

I was also happy to see that the RPS graph follows a similar trend, which makes me think this bench is more accurate.

Chaussette + meinheld

Circus + meinheld is slightly faster than Circus + waitress (as expected) and than Gunicorn + gevent. We see the same tendency under high load.

Chaussette + gevent

Circus + gevent is, in turn, slightly faster.

Chaussette + fastgevent

Circus + fastgevent is the fastest this time, unlike in the previous run. We no longer get the errors we saw before.


This new bench seems to be more accurate, and the results are a bit different for meinheld and fastgevent.

Fastest to slowest:

  1. Circus + fastgevent
  2. Circus + gevent
  3. Circus + meinheld
  4. Gunicorn + gevent
  5. Circus + waitress

Overall, my conclusion is similar to the previous one, which makes me confident we can switch to Circus in the future.

I still need to investigate why fastgevent failed in the previous run.

Thanks to AdamSkutt et al. for the feedback.