pre-fetching and made this almost instant) and pulling it back in process.
With the resource starvation problem nicked, I prepared to run my triumphant
production test. 5-10 jobs/sec.
I added more logging, this time around every requests.get and around all
potentially time-intensive input processing (json via ujson, xml via lxml), and
noticed that the worker stopped during the non-cooperative CPU-bound steps.
This was, of course, entirely expected; but the durations of these stops were 100x
longer than I'd experienced outside of this context.
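The bracketing itself was nothing fancy; a minimal sketch of the idea, with the
wrapped names purely illustrative:

    import time
    from functools import wraps

    def timed(label):
        # bracket a callable with wall-clock logging
        def decorator(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                t0 = time.time()
                try:
                    return fn(*args, **kwargs)
                finally:
                    print("%s took %.3fs" % (label, time.time() - t0))
            return wrapper
        return decorator

    # illustrative usage around the suspect steps:
    # fetch = timed("requests.get")(requests.get)
    # parse_feed = timed("speedparser.parse")(speedparser.parse)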
Now convinced that gevent had a major fault with running C-calling code, I
attempted to produce a synthetic example that would confirm my findings in a
way I could easily share and debug. I wrote a little program which read 50 xml
feeds and ran speedparser on all of them; first, serially in a normal loop, and then
"asynchronously" via a gevent pool. I ran my test, expecting this gevent bug to
show itself, but the timings were nearly identical. "Maybe", I thought, "this only
happens with network traffic."
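A minimal sketch of that synthetic test, assuming the feeds are saved locally and
speedparser's feedparser-style parse() API:

    import glob
    import time

    from gevent.pool import Pool
    import speedparser  # assumed API: speedparser.parse(xml_text)

    # hypothetical fixtures: the 50 xml feeds, saved to disk
    feeds = [open(path).read() for path in glob.glob("feeds/*.xml")]

    def serial():
        for feed in feeds:
            speedparser.parse(feed)

    def pooled():
        # the "asynchronous" version: same work, fanned out over a gevent pool
        Pool(10).map(speedparser.parse, feeds)

    for name, fn in (("serial", serial), ("gevent", pooled)):
        t0 = time.time()
        fn()
        print("%s: %.2fs" % (name, time.time() - t0))

Since the parsing is pure CPU work, the pool has nothing to yield on, which is
consistent with the two timings coming out the same.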
So I wrote a gevent-based wsgi application which served these feeds from
memory, loaded the feeds off of that, and the timings still ended up nearly
identical. Exasperated, and with nowhere else to turn, I decided to write a cProfile
mode for my app and see what kinds of functions we were calling.
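A "cProfile mode" can be as simple as wrapping one batch of work with the stdlib
profiler and dumping the stats for later digging; a sketch, with run_worker_batch
standing in for the app's real entrypoint:

    import cProfile
    import pstats

    def run_worker_batch():
        pass  # stand-in for one pass of the real fetch/parse worker

    profiler = cProfile.Profile()
    profiler.enable()
    run_worker_batch()
    profiler.disable()
    profiler.dump_stats("worker.pstats")

    # exploration afterwards: where does the cumulative time actually go?
    stats = pstats.Stats("worker.pstats")
    stats.sort_stats("cumulative").print_stats(20)
    stats.print_callees("_parse")  # per-callee breakdown for matching functions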
The first thing I noticed was that, somewhere down the line, I had been using
response.text.encode("utf-8"), and text in turn was using chardet to
detect the character encoding. I recalled from my many hours of performance
testing writing speedparser that chardet can be an absolute dog, so, excitedly
assuming I'd had the problem nicked again, I replaced that with the un-encoded
response.content and gave it a run. 2-3 jobs/sec.
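For the unfamiliar, the distinction looks roughly like this (url hypothetical);
text must produce unicode, content is the raw bytes:

    import requests

    resp = requests.get("http://example.com/feed.xml")  # hypothetical url

    # resp.text is unicode: requests has to pick a codec, and with no
    # charset in the headers it falls back to detection via chardet,
    # which can be very slow on larger bodies; only to be re-encoded here
    slow = resp.text.encode("utf-8")

    # resp.content is the raw bytes: no detection, no decode/encode
    # round trip; lxml and ujson will take bytes directly
    fast = resp.content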
Deflated, embarrassed, and depressed, I went back to my profiler. json and xml
were still taking far too long... In my encoding speed tests, ujson was decoding
over 100 MB/sec, but my printed timing information showed it sometimes taking
up to a second for even small (50 kB) requests. Furthermore, the separation
between the log timestamps showed that nothing else was being done during this time.
When I went back into the pstats output, I noticed that cProfile disagreed; very
little time was spent in ujson. Since ujson is a bit of a black box to the profiler, I
turned my attention to speedparser, which uses considerably more python code,
and while examining the callees I noticed something very peculiar indeed:
    Function                  called...
                                  ncalls  tottime  cumtime
    .../feed.py:48(_parse) ->         14    0.001    0.323  .../feedlib.py:55(speedparse_s...
                                      14    0.005    0.024  .../feedlib.py:98(postprocess...
                                      14    0.000   11.976  .../requests/models.py:759...
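That last row appears to be the whole story: of _parse's roughly 12 seconds of
cumulative time, 11.976 are spent inside requests' models.py rather than in any
parsing. The response body was being read lazily, so the network transfer
happened at the first touch of response.content, deep inside the parse step. A
minimal sketch of the effect; today's requests makes this opt-in via
stream=True, where in the 0.x versions of 2012 this laziness was the default:

    import time
    import requests

    # stream=True defers the body read until something asks for it
    resp = requests.get("http://example.com/feed.xml", stream=True)  # hypothetical url

    # headers have arrived, but the body hasn't crossed the wire yet
    t0 = time.time()
    body = resp.content  # the first access performs the whole chunked read
    print("body read took %.3fs" % (time.time() - t0))

    # so whatever function merely touches .content first, say a feed
    # parser deep inside _parse, gets billed for the entire transfer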
So now that this is all in the past, what could I have done to discover this in less
than 50 hours of debugging and hair pulling? Despite it seeming like the
proverbial howitzer when a simple pistol would do, in the future I really should
reach for the profiler earlier. The profiling code and subsequent pstats exploration
actually only took about 2 hours; and indeed, my only other major breakthrough
(the amqp bug) was discovered using a python-based gevent profiler which
sadly slowed down my app a bit too much to be practical.
In addition, it's made me appreciate optimization and how it can really bite you
unknowingly. The lazy chunked-read behavior of response.content is superior
to reading the whole response all the time, provided the resource you're
concerned about is memory. Laziness is generally considered a virtue (certainly,
by perl and haskell hackers), as it enables you to occasionally skip work
altogether. Ironically, it was the combination of requests' vanity and gevent's
cleverness that led to this incredible performance nightmare.
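When memory really is the scarce resource, that same laziness can be used on
purpose; a sketch against requests' streaming API, with process() a stand-in:

    import requests

    def process(chunk):
        pass  # stand-in for real incremental handling

    resp = requests.get("http://example.com/feed.xml", stream=True)  # hypothetical url

    if resp.headers.get("content-type", "").startswith("text/xml"):
        # bounded memory: the body arrives one chunk at a time
        for chunk in resp.iter_content(chunk_size=64 * 1024):
            process(chunk)
    else:
        # skipping the work altogether: close without ever reading the body
        resp.close()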
Apr 23 2012