Saturday, April 28, 2012

Google App Engine urlfetch and twitter streaming

I spent about two days before finally realizing that Google App Engine doesn't work with the Twitter streaming API through urlfetch. I wasn't trying to build anything specific (yet); I just wanted to see the streaming API running on App Engine and then figure out what to do next.

Either way, it doesn't work. I even tried changing the implementation in Tweepy, but that wasn't the problem. Tweepy uses httplib, so I tried switching it to urllib2. Same problem: when you open the connection to the streaming endpoint on App Engine, the response object never comes back to be read. Run the same code from the command line and the stream flies by. You can see my attempt to change the Tweepy streaming implementation in my fork: streaming.py. You can also see the commit differences for streaming.py.
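The behavior above is consistent with urlfetch waiting for a complete response body, which a stream never delivers (an assumption about the cause, not something verified here). Outside App Engine, a streaming body can be consumed incrementally. The sketch below uses Python 3 rather than the Python 2 `urllib2` of the time, and reads newline-delimited JSON from any file-like object. It assumes one JSON object per line with blank lines as keep-alives, which is how Twitter's streaming API delivered data; the function name `iter_stream_messages` is hypothetical.

```python
import io
import json
from typing import IO, Iterator


def iter_stream_messages(fp: IO[bytes]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-blank line of a streaming body.

    Assumption: newline-delimited JSON, with blank lines sent as keep-alives.
    """
    for raw in fp:  # file-like objects iterate line by line
        line = raw.strip()
        if not line:
            # Blank keep-alive line: nothing to decode.
            continue
        yield json.loads(line.decode("utf-8"))


if __name__ == "__main__":
    # Simulated stream body: two messages separated by a keep-alive line.
    body = io.BytesIO(b'{"id": 1, "text": "hello"}\r\n\r\n{"id": 2, "text": "world"}\r\n')
    for msg in iter_stream_messages(body):
        print(msg["id"], msg["text"])
```

Against a live endpoint you would pass the response from `urllib.request.urlopen(req)`; the returned `http.client.HTTPResponse` is file-like, so iterating over it reads one line at a time instead of waiting for the whole body.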

Another thing I learned along the way is how to add basic auth to the Tweepy streaming implementation. The documentation and examples are confusing the first time through. I was trying to pass OAuth credentials to the streaming implementation, but it takes HTTP basic auth. See the gist I referenced: Twitter Streaming API sample using the filter stream. The only difference is that I used the "sample" stream and they used the "filter" stream.
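HTTP basic auth is just an `Authorization: Basic <base64(username:password)>` header. The sketch below builds that header by hand with Python 3's standard library, without relying on any particular Tweepy version's constructor signature. The URL is an assumption: it is the 2012-era v1 "sample" endpoint, which has since been retired. The "filter" stream took parameters such as `track`, while "sample" took none. The helper names `basic_auth_header` and `build_stream_request` are hypothetical, and the request is only built, never opened.

```python
import base64
import urllib.request

# Assumption: 2012-era v1 sample stream URL (this API version is retired).
SAMPLE_URL = "https://stream.twitter.com/1/statuses/sample.json"


def basic_auth_header(username: str, password: str) -> str:
    """Return the value for an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_stream_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not open) a GET request that carries basic auth."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(username, password))
    return req
```

Passing the result to `urllib.request.urlopen` would send the credentials with the request; the same header value works for any client that lets you set headers directly.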


I'm surprised Google built it to work this way. When I think about processing a Twitter feed at scale, wouldn't Google want people to see App Engine as the platform to build that on? The same goes for any streaming HTTP endpoint, whether it's Twitter or something else (a log file, for example). Hopefully they will improve this soon.

Some other resources:
