I’ve been slowly migrating from R to Python for new data analysis tasks, and in particular learning pandas and ŷhat’s port of ggplot to Python.

I’m coming a little late to the party, but the Divvy Data Challenge recently wrapped up, awarding prizes for visually compelling analyses of anonymized data from Chicago’s new bike share service. Meanwhile, New York’s new bike share program has been in the news, with coverage highlighting ‘financial and operational challenges’ and implying that the problem stems from low sales of daily passes. The chart the Journal uses to support that claim is almost a textbook case of bending a chart to fit the story you want to tell rather than letting it reflect the data. It plots weekly pass purchases against week-by-week cumulative membership totals, an apples-to-oranges comparison of a weekly count with a running total, instead of comparing like with like.
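To make the apples-to-oranges point concrete: weekly pass sales are a flow (events per week), while cumulative membership is a stock (a running total). The minimal pandas sketch below uses made-up numbers; the column names and figures are hypothetical and do not come from the Journal’s chart or from any bike share data. It converts the running total into weekly new memberships with `Series.diff()` so that both columns count the same kind of thing.

```python
import pandas as pd

# Hypothetical, made-up weekly figures for illustration only.
weeks = pd.date_range("2013-06-03", periods=4, freq="W-MON")
df = pd.DataFrame(
    {
        # A flow: passes sold during each week.
        "pass_sales": [500, 450, 480, 400],
        # A stock: total members at the end of each week.
        "cumulative_members": [1000, 1600, 2100, 2500],
    },
    index=weeks,
)

# diff() turns the running total into new members per week.
# The first week has no prior total, so diff() yields NaN there;
# assuming membership started at zero, fill it with the first total.
df["new_members"] = df["cumulative_members"].diff().fillna(df["cumulative_members"])

# pass_sales and new_members are now both weekly counts and can be compared.
print(df[["pass_sales", "new_members"]])
```

Plotted against `cumulative_members`, pass sales look tiny and shrinking relative to an ever-growing total; plotted against `new_members`, the comparison is between two quantities measured on the same weekly basis.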
The story got me thinking about what real operational differences might exist between New York and Chicago. Digging into that question was interesting in its own right, and it was also a good way to learn how pandas compares with R.