Saturday, April 28, 2012

Google App Engine urlfetch and Twitter streaming

I spent about two days before finally realizing that Google App Engine doesn't work with the Twitter streaming API via urlfetch. I didn't want to do anything specific (yet); I just wanted to see the streaming API run on the engine and then figure out what to do next.

Regardless, it doesn't work. I even tried to change the implementation in Tweepy, but that wasn't the problem. Tweepy uses httplib, so I thought I'd try switching it to the urllib2 implementation. Same problem: when you open the connection to the streaming URL endpoint on App Engine, the response object never comes back to read. Run it on the command line, and everything flies through the stream. You can see my forked version of my attempt to change the Tweepy streaming implementation here: streaming.py. Or you can see the commit differences for streaming.py as well.

Another trick I learned along the way was adding basic auth to the Tweepy streaming implementation. The documentation and examples are confusing the first time around. I was trying to pass OAuth to the streaming implementation, but it takes basic HTTP auth. See the gist I referenced: Twitter Streaming API sample using the filter stream. The only difference is that I used the "sample" stream and they used the "filter" stream from Twitter.
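Under the hood, basic auth is nothing more than an extra request header, which is all Tweepy's BasicAuthHandler arranges for you. A sketch of the header construction (function name mine):

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value that HTTP basic auth sends.

    The 2012 streaming endpoints accepted exactly this, which is why
    passing an OAuth handler to the stream got me nowhere.
    """
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")
```

With Tweepy of that era, the equivalent was roughly a `BasicAuthHandler(username, password)` passed to `Stream`, then `stream.sample()` instead of `stream.filter(...)` — check the version you have, since the constructor signature moved around between releases.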


I am surprised at Google that this implementation works the way it does. When I think "scale" for processing a Twitter feed, wouldn't Google want people to think of them as the platform to implement a solution on? Or for any streaming HTTP endpoint, for that matter, whether it's Twitter or something else (a log file, for example). Hopefully they will improve this soon.


Friday, April 20, 2012

The worst way to have access to logs

Do you have to access logs for your job? What's the worst scenario you can imagine for making it hard to get to those logs and work through them? Let's try this:
  • Logs are on a Windows box
  • To get to the log machine, you have to remote desktop to box A
  • But the logs aren't on box A; the logs are on shares via boxes B, C and D
  • You can't RDP to boxes B, C and D, so you have to map the drives on box A
  • Why do I need to map the drives? Because it's Windows and there are no Unix tools available
  • Cygwin is on box A but not the others, so once I map the drives I can access them via Cygwin
  • Just working off the mapped drives sucks for speed, because all the I/O goes over the mapped network connection, so this forces me to copy the logs from B, C and D to A
  • Once copied, I can finally grep the logs and open them in vi to search
Why is this bad? When an error occurs that we want to catch before the logs roll in live production, we need to get to the logs quickly and move them. The process above takes something like 15 minutes before I can even begin looking at the logs.

Why aren't the logs archived? They are, but I don't have access to that box either.

To top all this off, let's add a site-to-site VPN tunnel to slow the connection down even more.

How can this be better?

  • Give me access directly to the boxes with the logs
  • Install SSH on the boxes so I don't need a windowing system to get on (I don't need a full Unix host or variant, just some SSH action)
  • Aggregate the logs to a single location in real time (using Apache Flume, as an example)
  • Set up some log monitoring that automatically detects errors and fires off alerts with the relevant log details
  • Etcetera, etcetera, etcetera
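To gesture at the monitoring bullet above, here's the scale of thing I mean: a toy error detector over log lines. The severity words and everything else here are placeholders; a real setup would tail the aggregated log and page someone.

```python
import re

# Placeholder severities -- tune to whatever your logs actually emit.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|Exception)\b")


def scan_log_lines(lines):
    """Return (line_number, line) pairs that look like errors."""
    return [(n, line) for n, line in enumerate(lines, 1)
            if ERROR_PATTERN.search(line)]
```

Wire that to a tail of the aggregated log and an alert channel, and nobody has to map four drives at 2 a.m. to find out the database fell over.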

Friday, April 6, 2012

My MacBook doesn't have a touch screen

I was showing an elder some stuff on the web the other day; we were doing some research together. I don't think they had ever used a Mac before, so everything was new, but we were just using the browser.

I had to show them how to scroll on the touch pad with the two-finger swipe. They figured out clicking fine, even though there are no left- and right-click buttons.

In the midst of talking, the person touched the screen to click a link on a web site. I smiled and clicked it for them using the touch pad.

It's interesting what tablets and mobile devices are doing to people's expectations; however, I don't think this person has used one of those either. Maybe they just saw someone use one once and assumed my laptop worked the same.

See also: A Brief Rant on the Future of Interaction Design

Monday, April 2, 2012

How to break javascript compression using eval

I'm working on some javascript compression (or minification) for a site. Given this javascript:
function callingFunction(arrayVariable){
  var jsEval = 'globalFunction("String parameter " + arrayVariable["lookup"]);';
  return jsEval;
}

// non-global
var arrayVariable = [....];
var actEval = callingFunction(arrayVariable);
eval(actEval);
When the compression is run, the parameter in the function definition for "callingFunction" will be renamed because it is not global; call it "a" instead of "arrayVariable". Now when "callingFunction" executes, it generates "jsEval" as a string that still contains the uncompressed variable name. Executing the eval then results in an exception: "arrayVariable" is not defined.

This is how you break javascript compression, or in other words, write javascript badly for compression. The right way? Don't use eval, or:
var jsEval = 'globalFunction("String parameter ' + arrayVariable["lookup"] + '");';
You get the idea, right? 'arrayVariable["lookup"]' is evaluated while "jsEval" is being set, inside the function where the compressor renames everything consistently, versus during the outer "eval(actEval)" execution.

See the following via the google closure compiler tutorial:
"Compilation with SIMPLE_OPTIMIZATIONS always preserves the functionality of syntactically valid JavaScript, provided that the code does not access local variables using string names (with, for example, eval() statements)."
And again, from Broken References between Compiled and Uncompiled Code:
'Keep in mind that "uncompiled code" includes any code passed to the eval() function as a string. Closure Compiler never alters string literals in code, so Closure Compiler does not change strings passed to eval() statements.'
You can also try the YUI Compressor.

Getting revision history from cvs

I had the worst time a week ago trying to get the history from cvs at a top-level directory using Eclipse. However, there is no option to right-click a project directory, or any directory, and show the history for that item.

To get all my commits I came up with the following from reading around forums and reference sites:
cvs log -N -S -r -w > branch.log
To decode the options: "-N" suppresses the tag lists, "-S" skips files with no matching revisions, "-w" (with no argument) limits the output to my own commits, and "-r" (with no argument) restricts it to the default branch. An important point here: note that I am redirecting standard out to a file, because this command prints all kinds of warning information I don't care about to the standard error stream.

The file will show something like this as a result:
RCS file: %FULL PATH FILE NAME IN REPO%,v
Working file: %FULL PATH FILE NAME ON FILE SYSTEM%
head: 1.70
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 77; selected revisions: 1
description:
----------------------------
revision 1.67.4.1
date: 2012-03-21 10:18:43 -0400;  author: %USERNAME%;  state: Exp;  lines: +4 -2;  commitid: hhCEg6Vtn6C46LXv;
%COMMIT COMMENT HERE%
=============================================================================
The key to finding my changes per ticket is the "%COMMIT COMMENT HERE%" portion, where I prefix my comments with the ticket number as a standard.

This literally took me like 4 hours to put together due to my desperate attempt to find a solution working within Eclipse. Once I gave up on that, I started looking at a command line reference.
