Styleforum › Forums › General › Current Events, Power and Money › Daily CE Musings of Piob

Daily CE Musings of Piob - Page 92

post #1366 of 5110
Quote:
Originally Posted by Ataturk View Post

What strikes me about it is the weird graduation on the y-axis. It's not very effective for that reason if nothing else.

The log axis?

That's pretty standard...I would almost certainly run this analysis in logs rather than levels.
post #1367 of 5110
Quote:
Originally Posted by otc View Post

The log axis?

That's pretty standard...I would almost certainly run this analysis in logs rather than levels.

Why would you choose a way of representing the data that minimizes the appearance of the difference if you're offering the chart to support your thesis that the difference is significant?

I am no statistician, but I can't see any reason for analyzing this data in "logs." There doesn't seem to be a single order of magnitude difference between the highest and the lowest data points.
post #1368 of 5110
Quote:
Originally Posted by Piobaire View Post

Is that bottom axis labelled correctly?

Uh, no. It's household income.

Whoops.
post #1369 of 5110
Thread Starter 
Quote:
Originally Posted by Gibonius View Post

Uh, no. It's household income.

Whoops.

That's what I thought but you guys are way smarter than me so I thought I'd ask. ;)
post #1370 of 5110
Quote:
Originally Posted by Ataturk View Post

Why would you choose a way of representing the data that minimizes the appearance of the difference if you're offering the chart to support your thesis that the difference is significant?

I am no statistician, but I can't see any reason for analyzing this data in "logs." There doesn't seem to be a single order of magnitude difference between the highest and the lowest data points.

He isn't trying to show the difference, he is trying to show the relationship. The raw data is not ideally suited to this form of analysis...but this is where we get into the reasons why The Atlantic isn't going to pick up on this guy's work if he controlled for everything and published a bunch of regression output.

There doesn't have to be an order of magnitude difference. We aren't trying to make a bar chart that says "These guys are low, these guys are high"...only to discover that the bars are so far apart that the chart is useless. The problem here is not with the data points themselves, but with the residuals after you fit the model.
The residuals are going to be pretty badly right-skewed. Taking the log here helps linearize the relationship and gives residuals that are closer to symmetric.
Also, it appears that there is heteroskedasticity in this data. That is, the spread of the residuals varies systematically with the independent variable (the most conservative areas have very little spread, and you get more spread as you get higher values). The log transformation helps alleviate this problem by bringing the data closer to homoskedastic.
Eyeballing the data, it looks like it is still fairly heteroskedastic *after* the log transformation, but I'll be honest that I am not sure what you'd do here. The Romney/Obama percentage gap is kind of a funny independent variable, so I am not sure what kind of additional corrections you might do (if any).
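
For anyone who wants to see the skew and spread argument in numbers, here is a rough sketch on made-up data. It does not use the chart's actual data: the generating model (log(Y) linear in X with noise), the coefficients, the sample size, and the helper names are all assumptions picked for illustration. It fits a line in levels and in logs, then compares the residual skewness and a crude ratio of residual spread in the upper half of X to the lower half. This is an eyeball-style check, not a formal test.

```python
import math
import random

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(xs, ys):
    """Residuals of ys around the least-squares line on xs."""
    b0, b1 = ols(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sd(vals):
    """Population standard deviation."""
    m = sum(vals) / len(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))

def skewness(vals):
    """Moment-based skewness: positive means a longer right tail."""
    m, s = sum(vals) / len(vals), sd(vals)
    return sum(((v - m) / s) ** 3 for v in vals) / len(vals)

def spread_ratio(xs, res):
    """Residual sd in the upper half of x divided by residual sd in the lower half."""
    pairs = sorted(zip(xs, res))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return sd(high) / sd(low)

# Hypothetical data: X on roughly the chart's range, log(Y) linear in X plus noise.
rng = random.Random(0)
xs = [rng.uniform(-40, 60) for _ in range(2000)]
ys = [math.exp(5.0 + 0.015 * x + rng.gauss(0, 0.4)) for x in xs]

res_levels = residuals(xs, ys)
res_logs = residuals(xs, [math.log(y) for y in ys])

levels_ratio = spread_ratio(xs, res_levels)
logs_ratio = spread_ratio(xs, res_logs)
skew_levels = skewness(res_levels)
skew_logs = skewness(res_logs)

print(f"levels: skew={skew_levels:.2f}  spread ratio={levels_ratio:.2f}")
print(f"logs:   skew={skew_logs:.2f}  spread ratio={logs_ratio:.2f}")
```

On data built this way, the levels residuals should come out right-skewed with noticeably more spread at high X, while the log residuals should be close to symmetric with a spread ratio near 1. Real data will not be this tidy, which fits the point above that the chart still looks somewhat heteroskedastic after the transformation.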
post #1371 of 5110
Here's a better chart, from Facebook: *
post #1372 of 5110
Also, log dependent variables have a handy feature.

If you have a regression of the form Log(Y)=B0+B1X, you can interpret the parameters very easily. When you fit the model to Log(Y), you know that for every unit change of X, Y will see a proportional change of exp(B1)-1 (multiply by 100 for the percentage). For very small values of B1, you don't even have to take the anti-log...you can just use the value of B1 as an approximate proportional change in Y for every unit change of X (since, for example, exp(0.06)-1≈0.062).
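
A quick sketch of how good that shortcut is, for a few example values of B1. The function name and the chosen B1 values are just for illustration; the only thing taken from above is the exp(B1)-1 formula.

```python
import math

def pct_change_per_unit(b1):
    """Exact proportional change in Y per one-unit change in X when log(Y) = B0 + B1*X."""
    return math.exp(b1) - 1.0

# Compare the exact value with the "just read off B1" shortcut.
for b1 in (0.01, 0.06, 0.25, 0.70):
    exact = pct_change_per_unit(b1)
    print(f"B1={b1:.2f}  exact={exact:.4f}  shortcut={b1:.4f}  gap={exact - b1:.4f}")
```

The gap is tiny for small B1 and grows quickly once B1 gets large, so the shortcut is only safe for small coefficients.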
post #1373 of 5110
Quote:
Originally Posted by lawyerdad View Post

Here's a better chart, from Facebook: *

I wonder if that's true.

Could one also put a circle that reads: Divorce, or Death by Falling, or Small Pox, or DUI, or Got Fired, or Lost My House to Vegas?
post #1374 of 5110
Quote:
Originally Posted by Lighthouse View Post

I wonder if that's true.

Could one also put a circle that reads: Divorce, or Death by Falling, or Small Pox, or DUI, or Got Fired, or Lost My House to Vegas?

Well, we all have our perspectives, but for me one of those things is not like the others. So maybe it's like a Venn diagram or something?
post #1375 of 5110
Thread Starter 
Quote:
Originally Posted by otc View Post

Also, log dependent variables have a handy feature.

If you have a regression of the form Log(Y)=B0+B1X, you can interpret the parameters very easily. When you fit the model to Log(Y), you know that for every unit change of X, Y will see a proportional change of exp(B1)-1 (multiply by 100 for the percentage). For very small values of B1, you don't even have to take the anti-log...you can just use the value of B1 as an approximate proportional change in Y for every unit change of X (since, for example, exp(0.06)-1≈0.062).

So are you one of those guys? You know, the ones that really internalized math in university, and then went and got a job where you'd not forget it right after the final?
post #1376 of 5110
I'm totally going to work "heteroskedasticity" into a discussion the next time I have some wonky data.
post #1377 of 5110
Quote:
Originally Posted by Piobaire View Post

So are you one of those guys? You know, the ones that really internalized math in university, and then went and got a job where you'd not forget it right after the final?

Nah, I immediately forgot all of those things. I had to relearn them from Wikipedia and "The Cartoon Guide to Statistics" when they started coming back up at work.
post #1378 of 5110
Quote:
Originally Posted by otc View Post

He isn't trying to show the difference, he is trying to show the relationship. The raw data is not ideally suited to this form of analysis...but this is where we get into the reasons why The Atlantic isn't going to pick up on this guy's work if he controlled for everything and published a bunch of regression output.

There doesn't have to be an order of magnitude difference. We aren't trying to make a bar chart that says "These guys are low, these guys are high"...only to discover that the bars are so far apart that the chart is useless. The problem here is not with the data points themselves, but with the residuals after you fit the model.
The residuals are going to be pretty badly right-skewed. Taking the log here helps linearize the relationship and gives residuals that are closer to symmetric.
Also, it appears that there is heteroskedasticity in this data. That is, the spread of the residuals varies systematically with the independent variable (the most conservative areas have very little spread, and you get more spread as you get higher values). The log transformation helps alleviate this problem by bringing the data closer to homoskedastic.
Eyeballing the data, it looks like it is still fairly heteroskedastic *after* the log transformation, but I'll be honest that I am not sure what you'd do here. The Romney/Obama percentage gap is kind of a funny independent variable, so I am not sure what kind of additional corrections you might do (if any).

I think you are being deliberately obtuse.

And, frankly, I don't have to know what heteroskedastic means to see that you're missing the point. The author offered the chart to support his position. Using a logarithmic graduation on the y-axis doesn't do that as effectively as linear graduation would, especially for a lay audience. And the difference between the high and low values isn't so extreme (as I said, it's less than a single order of magnitude) that the points would be so far away from each other that you couldn't fit them in a chart as you seem to imply.
post #1379 of 5110
I'd disagree with that.
Here is his raw data plotted both in log and in actual values. The scales approximately match his, which look to be 60 to 700 on the Y and -40 to 60 on the X.

If anything, I think the log version looks more effective. If you look at the non-log version, it kind of looks like the high observations are weird outliers that should just be ignored, while the log version does a much better job of saying "HEY LOOK! There might be a trend here!". I literally want to draw the line myself, and I would draw it in approximately the right place. If someone was mentally drawing a trend line on the actual values, they would probably draw it too low, mentally discounting the impact of those outliers on the top.

Certainly it is a matter of opinion, but if I were doing further analysis on it already in logs, I wouldn't hesitate to visualize it in the same way.

The only thing I think he really did *wrong* with his scatter plot is that he didn't start at the origin. His range from "0" to 100 is half the size of the range from 100 to 200 when it should really be about double. People will quibble with that...housing price per square foot has a floor (unless someone is giving away free houses!), so plotting that whitespace might be unnecessary, but it can also be misleading to omit it.
post #1380 of 5110
Statistically, the difference here is marginal. r^2 is a little higher in logs.
Visually though, it looks like a better fit....

Try for yourself: mentally draw what you think is a best fit line on both of those charts and then look here:
Second chart may be a little different because Excel sucks.


Does the second one not look like a better fit? It doesn't lose the proximity to those points on the far left (where it is essentially predicting free houses despite being within the domain of the data). Most people would mentally picture the line in the first chart as having less slope and a higher intercept (the left side being higher up and the right end being a bit lower) because the human brain simply doesn't work the same way as statistics. The residuals are squared, but our mind doesn't like to think about exponential differences.
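
If you want to play with the levels-versus-logs comparison without Excel, here is a small sketch on made-up data. It is not the data from the charts: the generating model, coefficients, seed, and helper name are assumptions for illustration only. It fits a straight line to Y and to log(Y), reports r^2 for each, and shows what each fit predicts at the far left of the X range.

```python
import math
import random

def fit_line(xs, ys):
    """Return (intercept, slope, r_squared) for the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope, (sxy * sxy) / (sxx * syy)

# Hypothetical data: log(Y) is linear in X with noise, X on roughly the chart's range.
rng = random.Random(1)
xs = [rng.uniform(-40, 60) for _ in range(2000)]
ys = [math.exp(5.0 + 0.02 * x + rng.gauss(0, 0.4)) for x in xs]

a_lev, b_lev, r2_lev = fit_line(xs, ys)
a_log, b_log, r2_log = fit_line(xs, [math.log(y) for y in ys])

# Predictions at the left edge of the X range.
left = -40
pred_lev = a_lev + b_lev * left            # a straight line in levels can hit zero or go negative
pred_log = math.exp(a_log + b_log * left)  # back-transformed log fit is always positive

print(f"levels: r^2={r2_lev:.3f}  prediction at X={left}: {pred_lev:.1f}")
print(f"logs:   r^2={r2_log:.3f}  prediction at X={left}: {pred_log:.1f}")
```

Keep in mind the two r^2 values describe different dependent variables (Y versus log Y), so comparing them directly is only a rough guide. The left-edge predictions are the more telling part: the back-transformed log fit can never predict a free house.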