In my day-to-day research I spend much of my time reading urban economics papers. Most of them aim to identify some type of causal link between two variables. As a consequence, most of them have a strong empirical component: lots of tables, test statistics, and graphics. The empirical tool par excellence among economists is regression analysis, or econometrics as they like to call it. I am very skeptical of regression analysis, and I want to spell out some of my thoughts about it.
Let us imagine an economist asking the question "Which variable causes an object to fall when released from a certain height above, say, the floor?" Would the tools of econometrics help answer his question? Our economist has perhaps heard somewhere that mass is what makes material objects "fall" toward one another.
So he collects information about many objects in a spreadsheet. Each row is a different object, and every column has information about the object. First, he includes a "dummy" variable that has the value 1 if the object fell when released, and 0 if it didn't. Most of the objects in his database did fall (and have 1's as their dependent variable to be explained), but there are still some that did not, such as helium balloons. Then, he has other relevant information about the object: the height from which the object was released, the temperature of the environment, the temperature of the object, the time at which it was released, the initial acceleration, the final speed before hitting the ground, the latitude, the longitude, the volume, the color, the smell, the texture, the type of material, probably also the price, and obviously, its mass. His question is: does mass cause objects to fall? And if it does, what is the strength of the effect?
It is a perfectly sound and logical question to ask, and as we've mentioned, he has the expectation that mass does cause objects to fall. So he runs a first regression to see if he finds a correlation. Correlation is supposed to be a necessary, although not sufficient, condition for causality. However, despite all the data, when he runs his first exploratory regression (no instrumental variables yet), the mass variable ends up with zero explanatory power. Zero. Clearly, physicists got it wrong. A simple regression, controlling for all sorts of variables, shows unambiguously that mass has no effect on whether an object will fall. Worse, mass doesn't even correlate with any of the other control variables. Its role in the regression equation is the same as including the zodiac sign of the experimenter.
We know, of course, that there are two effects cancelling each other. Mass does cause objects to fall, but mass also causes objects to resist changes in movement. The net effect: all objects fall, and all fall at the same acceleration. I am not an expert in econometrics, I am not a professional statistician, and I don't have a PhD in Theoretical or Experimental Physics either. I may be wrong, but I think that there is no way to test for these two effects using regression analysis. If this type of problem arises when identifying causes in situations as simple as this one, I expect more pathological problems to arise in the social sciences.
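To make the cancellation concrete, here is a minimal sketch in Python of the thought experiment. Everything in it is an assumption made for illustration: the objects are simulated rather than measured, air resistance and buoyancy are ignored (so no helium balloons), the variable names and the small `ols` helper are hypothetical, and the regressions are plain least squares with an intercept. It only shows the net effect described above, and it does not settle whether a cleverer design could pull the two effects apart.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 1000                        # number of simulated objects (assumed)
g = 9.81                        # m/s^2, gravitational acceleration near the ground

mass = rng.uniform(0.1, 100.0, size=n)  # kg, hypothetical objects

# Effect 1: gravity pulls harder on heavier objects.
force = mass * g

# Effect 2: heavier objects resist acceleration in exact proportion (a = F / m).
true_acceleration = force / mass

# What the economist records: acceleration with a little measurement noise (assumed).
measured_acceleration = true_acceleration + rng.normal(0.0, 0.05, size=n)


def ols(y, x):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


_, slope_force = ols(force, mass)
intercept_acc, slope_acc = ols(measured_acceleration, mass)

print("slope of force on mass:       ", slope_force)
print("slope of acceleration on mass:", slope_acc)
print("intercept of acceleration fit:", intercept_acc)
```

Under these assumptions, the force regression recovers a slope close to `g`, while the acceleration regression gives a mass coefficient close to zero and an intercept close to `g`. Mass is causally involved in both steps, yet the outcome the economist actually measures carries no trace of it.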
My hope really is that we stop relying so much on regression analysis to learn about the world.