Twisted vs Threads Benchmark
This benchmark simulates the use of a multiple document interface. A script of commands is input with a pause between each command; this pause is the user delay. The script is:
A. Load (Image A)
B. Heavy (Image A)
C. Load (Image B)
D. Medium (Image B)
E. Save (Image B)
F. Resize (Image B)
G. Save (Image B)
H. Resize (Image A)
I. Save (Image A)
The benchmark runs the script several times with both Threads and Twisted, varying the user delay for each run. It monitors two important timers:
TimerA - the time from accepting user input until immediately before the triggered command executes. Lower values indicate a more responsive, lower-latency UI.
TimerB - the time from the start to the completion of processing for each individual command. Lower values indicate faster, more efficient processing.
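Both timers can be captured with three timestamps per command: when input is accepted, immediately before execution, and on completion. The sketch below is a minimal illustration, not the benchmark's actual code (which is not reproduced here). It assumes each command runs on its own thread, uses time.sleep as a hypothetical stand-in for image work, and ignores any per-image ordering the real benchmark may enforce. The names run_threaded and work are hypothetical.

```python
import threading
import time

def run_threaded(script, user_delay):
    """Run each (label, command) pair on its own thread, issuing one every user_delay seconds.

    Returns {label: (timer_a, timer_b)} in seconds:
      timer_a = input accepted -> immediately before execution
      timer_b = start of processing -> completion
    """
    results = {}
    lock = threading.Lock()
    threads = []

    def worker(label, command, accepted):
        started = time.perf_counter()          # immediately before execution
        command()
        finished = time.perf_counter()         # processing complete
        with lock:
            results[label] = (started - accepted, finished - started)

    for label, command in script:
        accepted = time.perf_counter()         # user input accepted
        t = threading.Thread(target=worker, args=(label, command, accepted))
        t.start()
        threads.append(t)
        time.sleep(user_delay)                 # simulated user delay before the next command
    for t in threads:
        t.join()
    return results

def work(seconds):
    # Hypothetical stand-in for an image command; sleeping does not model CPU cost.
    return lambda: time.sleep(seconds)

# Labels follow the script above; durations are made up for illustration.
script = [("A", work(0.02)), ("B", work(0.05)), ("C", work(0.02))]
timers = run_threaded(script, user_delay=0.01)
for label, (a, b) in sorted(timers.items()):
    print(f"{label}: TimerA={a * 1000:.1f} ms  TimerB={b * 1000:.1f} ms")
```

Averaging the TimerA and TimerB values per run, for each user delay, gives the kind of per-method figures discussed below.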
The benchmark saves results in CSV files. I have copied these into a proper Excel spreadsheet for analysis.
For years Twisted developers have aggressively recommended Twisted as a replacement for threads, often without knowing the application involved. They may know better scientifically, but they cannot help recommending their own product. This harms many people, and it should not continue.
This benchmark was created in response to discussions with Twisted people. As they persistently requested, I have explained what this benchmark now proves. The results can be consistently reproduced with the provided code.
For each method, TimerB averages are similar across the different user delays. However, Twisted's processing times are longer than those of Threads: Threads execute commands in an average of 86.2 ms, while Twisted takes 103.4 ms, roughly 20% longer.
Remember, TimerB is a measurement of processing power, not latency.
The processing results for TimerB are interesting, but the focal point of my discussion was latency in TimerA.
As the user delay increases, latency should decrease; when commands arrive quickly, latency should suffer. Both Threads and Twisted show this pattern for the first few runs.
After the first few runs, the threaded version responds to commands with little latency, as expected. Twisted, on the other hand, begins to respond sporadically and more slowly.
This behaviour is visible on both graphs, which show two slices of the same data at different scales. Note the measurement at a user delay of 0.80 on each graph.
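The mechanism behind this latency difference can be illustrated with a small simulation. It assumes the Twisted version ran each command handler directly in the single reactor thread; that is an assumption, since the benchmark code is not reproduced here. quick_command_latency is a hypothetical helper: in serial mode one dispatcher thread runs commands in arrival order, so a quick command waits for a long blocking one to finish; in threaded mode each command gets its own thread and starts almost immediately. time.sleep stands in for blocking work.

```python
import queue
import threading
import time

def quick_command_latency(threaded, long_s=0.2, user_delay=0.02):
    """TimerA (seconds) of a quick command issued user_delay after a long blocking one.

    threaded=True:  every command gets its own thread.
    threaded=False: one dispatcher thread runs commands in arrival order,
                    like an event loop whose handlers block.
    """
    started = threading.Event()
    start_time = [0.0]

    def long_cmd():
        time.sleep(long_s)                     # stand-in for a blocking image operation

    def quick_cmd():
        start_time[0] = time.perf_counter()    # immediately before execution
        started.set()

    if threaded:
        def submit(cmd):
            threading.Thread(target=cmd).start()
    else:
        inbox = queue.Queue()

        def dispatcher():
            while True:
                cmd = inbox.get()
                if cmd is None:                # sentinel: stop the dispatcher
                    break
                cmd()

        threading.Thread(target=dispatcher, daemon=True).start()
        submit = inbox.put

    submit(long_cmd)
    time.sleep(user_delay)                     # simulated user delay
    accepted = time.perf_counter()             # quick command accepted
    submit(quick_cmd)
    started.wait()
    if not threaded:
        inbox.put(None)                        # let the dispatcher exit
    return start_time[0] - accepted

print(f"serial:   {quick_command_latency(threaded=False):.3f} s")
print(f"threaded: {quick_command_latency(threaded=True):.3f} s")
```

On a typical machine the serial figure should come out near long_s minus user_delay, while the threaded figure is a small fraction of that; exact values vary.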
My argument has always been about latency, achieved without an unreasonable sacrifice in processing efficiency.
Threads allow a more responsive and consistent UI.
Threads execute the commands faster.
Threads can switch just fine.
Threads are useful.
Twisted people assume threads switch poorly. Threads switch automatically, and they do it well. Twisted people have an unreasonable distrust of the OS's thread switching.
Twisted people believe the programmer should be responsible for writing explicit, more granular switch points into the code. This seems to me like a step backwards, especially given the thread performance results above.
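To make this disagreement concrete, here is a minimal sketch of programmer-controlled switching in plain Python, using generators rather than Twisted's own API. Each hypothetical command (resize_rows) must yield at points the programmer chooses, and a small round-robin scheduler (run_cooperatively, also hypothetical) interleaves the commands. With threads, no yield points are written; the runtime and OS decide when to switch.

```python
from collections import deque

def resize_rows(image, factor):
    """Hypothetical command split into chunks: yields after each row so others can run."""
    out = []
    for row in image:
        out.append([px * factor for px in row])   # stand-in for real per-row work
        yield                                      # explicit switch point chosen by the programmer
    return out

def run_cooperatively(tasks):
    """Round-robin scheduler: advances each generator one chunk at a time."""
    ready = deque(tasks.items())
    results, trace = {}, []
    while ready:
        name, gen = ready.popleft()
        try:
            next(gen)                  # run one chunk of this task
            trace.append(name)
            ready.append((name, gen))  # requeue until it finishes
        except StopIteration as done:
            results[name] = done.value # generator's return value
    return results, trace

img = [[1, 2], [3, 4], [5, 6]]
results, trace = run_cooperatively({"A": resize_rows(img, 2), "B": resize_rows(img[:1], 10)})
print(trace)  # -> ['A', 'B', 'A', 'A']
```

The granularity of switching is fixed by where the yield statements sit: a command with coarse chunks delays every other command until its current chunk completes.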
On most operating systems, threads work on multiprocessor and hyperthreading machines automatically. On similar hardware, Twisted will use only 1/n of the available processing power, where n is the number of virtual or physical processors.
Twisted people think you should spawn a separate process for intensive work. If you do, you still need to synchronize access to shared resources just as you would with threads, and that is probably the most frequently cited problem with threads.
You also need an effective, portable method of inter-process communication. This is not a major problem, but it is work you would not have to do with threads.
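The extra work is easiest to see in code. This is a minimal sketch assuming a hypothetical JSON-over-pipes protocol; the CHILD script and run_in_process are hypothetical, and summing pixel values stands in for real image work. The parent must serialize the data, start a second interpreter, and parse the reply, where a thread could simply read the image object already in memory.

```python
import json
import subprocess
import sys

# Hypothetical worker: runs in a child interpreter, reads one JSON job on stdin,
# and writes one JSON result on stdout. Real image work would replace the sum.
CHILD = r"""
import json, sys
job = json.load(sys.stdin)
json.dump({"id": job["id"], "total": sum(job["pixels"])}, sys.stdout)
"""

def run_in_process(job):
    """Send one job to a fresh Python process over its stdin/stdout pipes."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=json.dumps(job),     # serialize the job for the pipe
        capture_output=True,
        text=True,
        check=True,                # raise if the child fails
    )
    return json.loads(proc.stdout) # parse the child's reply

result = run_in_process({"id": "F", "pixels": [1, 2, 3, 4]})
print(result)  # -> {'id': 'F', 'total': 10}
```

Pipes with JSON are portable across operating systems, but every byte of image data must cross the process boundary twice, which is part of the cost this approach adds.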
This test uses small images, and its 'heavy' and 'medium' tasks barely deserve those names. Real-world use could involve larger images and more intensive processing. The predictable performer here (Threads) would likely perform well under a more intensive setup, as it has in many other applications. I think it is safe to say that the slower, unpredictable performer (Twisted) would remain slower and unpredictable.
I ran this on a high-end machine. On less powerful machines, I expect the static nature of Twisted's switching to cause even more inefficiency than documented here.
Thread switching is more scalable than Twisted on multiple levels.
When I approached Twisted people with questions about these results, I was told I was not worth listening to. Followers stated bluntly that they were smarter than me. They then banned me from further communication in #twisted and #python, channels which Twisted people have controlled with bias for years.
Some developers understand and agree with me privately. Some have even taken up the discussion where I left off, asking the Twisted people questions that they hurry to answer with assumptions this benchmark proves false.
I use Twisted for an above-average web application and a custom server. I am sincerely interested in using Twisted, possibly as a solution for concurrency.
However, Twisted people misrepresent their software, and this harms would-be Twisted customers, i.e., developers. Some developers give credence to the Twisted bias shown in #python, only to find Twisted unsuited to their work. Two developers in particular have recently expressed grief over time they consider wasted. One mentioned losing 4 days, only to be outdone by another who had wasted 3 months. This is outrageous.
This is not the first time Twisted people have so plainly let emotion and ego influence them, to the detriment of others.
Would-be Twisted customers should be aware of this benchmark.
Threads are not nearly as bad as Twisted people express.
"Sorry, Guido, but my trust for you goes about as far as I believe the features of Python I like were intentional."
- Twisted Developer Glyph to Python Creator Guido van Rossum, 2001
"Either you don't know what you're talking about on such a deep level
that you're unqualified to comment, or you are deliberately lying to
undermine the credibility of generous people who gave you a great deal
of their work and expertise for free.
You should post a retraction. While I imagine you will do some
temporary damage to Twisted if this keeps getting circulated, once the
dust has settled I think that you will be harming your own reputation a
- Mail from Glyph, Jan 27 2005
"It sounds like he tried to bring this up on IRC, and didn't get anywhere. Maybe he was rude or stubborn, I don't know -- the social dynamics of IRC seem to generally suck, IMHO, and misunderstandings abound in those forums.
Instead of attacking him (I mean geez, "politically motivated lies"?), have you tried making constructive critiques of the specific code, benchmarks, or criteria he is using? Do you have a measured rebuttal that doesn't resort to ad hominum attacks that you've asked him to link to in his document? Have you asked him to clarify some of his points, or provide a more measured response? Frankly his document comes off as the more reasonable one, expressing some frustration with the response he's received, but it doesn't come off as a wholesale condemnation of Twisted.
And are people not allowed to defend threading? Are people not allowed to critique Twisted? Are you only allowed to critique Java and Zope and threads? Because this response isn't a I-think-he's-wrong kind of response, it's a I-think-he's-wrong-for-even-saying-this kind of response, and there's a big difference in the civility of the debate."
- Ian Bicking reply to Glyph rant, Jan 28 2005
"Don't give up."
- John B Mudd, Jan 31 2005
"My reading of the previous benchmark was that he was testing in a CPU-bound situation. At least it seemed like he was doing image processing, which implies significant CPU use. With that kind of load I would expect the performance to be more on par. At least, by the time threads started showing scaling performance problems WRT connections, the application would be hopelessly bogged down anyway. Twisted might degrade better in that situation -- slowly but surely responding to requests -- but it'd be questionable whether you were testing a realistic situation at that point.
In CPU-bound situations it seems reasonable that threads would be faster (putting scaling aside). In that case in Twisted you'd either have to use threads, which can't beat a natively-threaded approach and may introduce additional overhead, or you'll have to refactor your algorithms to work in discrete chunks which is probably even slower. Properly implemented I imagine Twisted should be very close to a natively threaded system (and would probably use threads as well), and maybe that's where the original benchmark was inaccurate. But if you look at that benchmark as a defense of threads, not an attack on Twisted, I think the point is valid regardless of those implementation details.
So... the question is what scaling is more likely? It depends a lot on what you are doing. For an online chat server or a MMORPG the scaling you show here probably applicable. For a typical database-backed web application I think this kind of scaling doesn't mean much; that Apache 1.3 works as well as it does -- with its rather simplistic way of processing connections -- shows that serialized connections can work well. I don't know enough about GUI bottlenecks to say one way or another, though this kind of scaling doesn't really make sense to me in that context. GUIs are a weird middle ground, because async processing is the default, not threading, so you're likely to use threads in very selective ways. Anyway, no client should make that number of connections.
I'd say the most interesting part of this scaling, to me, is how performance degrades in extreme situations... though I don't have a good feel for how you should test that, and at what point you'd really be testing the underlying OS instead of the application."
- Ian Bicking reply to Glyph's graphs sans source, Feb 3 2005
Over one year later...
<foom> you can still get better response times for quick operations when also doing long running blocking operations when using threads
<foom> which is his whole point, I think
- June 6, 2006, ding ding ding ding ding!