Originally Posted by Ataturk
Why would you choose a way of representing the data that minimizes the appearance of the difference if you're offering the chart to support your thesis that the difference is significant?
I am no statistician, but I can't see any reason for analyzing this data in "logs." There doesn't seem to be a single order of magnitude difference between the highest and the lowest data points.
He isn't trying to show the difference; he is trying to show the relationship. The raw data isn't ideally suited to this kind of analysis, but this is where we get into the reasons why The Atlantic wouldn't pick up this guy's work if he controlled for everything and published a bunch of regression output.
There doesn't have to be an order of magnitude difference. We aren't making a bar chart that says "These guys are low, these guys are high," only to discover that the bars are so far apart that the chart is useless without a log scale. The problem here isn't with the data points themselves but with the residuals after you fit the model.
The residuals are going to be pretty badly right-skewed; taking the log here helps linearize the relationship and gives residuals that are closer to symmetric.
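A small simulation can make the skewness point concrete. It assumes, purely for illustration, a hypothetical dataset where y grows exponentially with x and the noise is multiplicative (log-normal); the chart's actual data isn't reproduced here. It fits a straight line by ordinary least squares to y and to log(y), then compares the sample skewness of the two sets of residuals:

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(xs, ys):
    """Observed minus fitted values from the OLS line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def skewness(values):
    """Sample skewness (third standardized moment); > 0 means a long right tail."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical data: exponential growth in x with multiplicative (log-normal) noise.
rng = random.Random(42)
xs = [rng.uniform(0.0, 1.0) for _ in range(2000)]
ys = [math.exp(3.0 + 1.0 * x + rng.gauss(0.0, 0.5)) for x in xs]
log_ys = [math.log(y) for y in ys]

skew_raw = skewness(residuals(xs, ys))
skew_log = skewness(residuals(xs, log_ys))
print(f"residual skewness, raw y: {skew_raw:.2f}")
print(f"residual skewness, log y: {skew_log:.2f}")
```

With multiplicative noise the raw-scale residuals come out clearly right-skewed, while the log-scale residuals sit near zero skew. If the noise in real data isn't multiplicative, the log won't necessarily help this much.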
Also, the data appears to be heteroskedastic; that is, the spread of the residuals varies systematically with the fitted values (the most conservative areas have very little spread, and the spread grows at higher values). The log transformation helps alleviate this by bringing the data closer to homoskedastic.
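The heteroskedasticity point can be sketched the same way, again on hypothetical simulated data with multiplicative noise (an assumption, not the real chart). The helper `spread_ratio` is a name made up for this example: it sorts the residuals by fitted value and divides the residual standard deviation in the upper half by the one in the lower half, so a ratio near 1 suggests roughly constant spread:

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def spread_ratio(xs, ys):
    """Residual SD in the upper half of fitted values over that in the lower half.

    Near 1: consistent with constant spread (homoskedastic).
    Well above 1: spread grows with the fitted values (heteroskedastic).
    """
    a, b = fit_line(xs, ys)
    # (fitted value, residual) pairs, ordered by fitted value
    pairs = sorted((a + b * x, y - (a + b * x)) for x, y in zip(xs, ys))
    resid = [r for _, r in pairs]
    half = len(resid) // 2
    return statistics.pstdev(resid[half:]) / statistics.pstdev(resid[:half])


# Hypothetical data: noise scales with the level of y (multiplicative).
rng = random.Random(7)
xs = [rng.uniform(0.0, 2.0) for _ in range(2000)]
ys = [math.exp(3.0 + x + rng.gauss(0.0, 0.5)) for x in xs]

ratio_raw = spread_ratio(xs, ys)
ratio_log = spread_ratio(xs, [math.log(y) for y in ys])
print(f"upper/lower residual SD, raw y: {ratio_raw:.2f}")
print(f"upper/lower residual SD, log y: {ratio_log:.2f}")
```

Because the simulated noise is exactly multiplicative, the log evens out the spread almost completely here; with real data it may only partly help.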
Eyeballing the data, I'd say it is still fairly heteroskedastic *after* the log transformation, but honestly I'm not sure what you'd do here. The Romney/Obama percentage gap is kind of a funny independent variable, so I'm not sure what additional corrections you might make (if any).
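If the spread is still uneven after the log, one common option (not something the original analysis is known to have done) is to keep the ordinary least squares fit but report heteroskedasticity-robust standard errors, such as White's HC0 estimator. A minimal sketch for a one-predictor regression, on hypothetical data where the noise on the log scale grows with x:

```python
import math
import random


def slope_standard_errors(xs, ys):
    """OLS slope for y = a + b*x with classical and HC0 (White) standard errors."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # Classical SE: assumes one common error variance for every observation.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # HC0 SE: each observation's squared residual stands in for its own variance.
    se_hc0 = math.sqrt(
        sum((x - mx) ** 2 * e ** 2 for x, e in zip(xs, resid)) / sxx ** 2
    )
    return slope, se_classical, se_hc0


# Hypothetical log-scale outcome whose noise SD still grows with x.
rng = random.Random(3)
xs = [rng.uniform(0.0, 2.0) for _ in range(2000)]
log_ys = [3.0 + x + rng.gauss(0.0, 0.05 + 0.5 * x) for x in xs]

slope, se_classical, se_hc0 = slope_standard_errors(xs, log_ys)
print(f"slope {slope:.3f}, classical SE {se_classical:.4f}, HC0 SE {se_hc0:.4f}")
```

Robust standard errors leave the fitted line unchanged; they only adjust how much uncertainty you report for the slope. Weighted least squares is the other usual route, but it requires a model for how the variance changes.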