We are teaching our 16-year-old son the dark art of econometrics. I asked him the following question. Nutritionists claim that eating peanut butter makes you fat. A statistician (who is not trained in economics) has attended a "field experiment" conference in Cambridge and has received a big grant to randomly assign jars of peanut butter to people. To keep this simple, let's assume that people assigned to the treatment group always eat their peanut butter (no free disposal, no secondary market for reselling peanut butter). The statistician observes each person's weekly weight, knows whether the person was randomly assigned to the treatment group, and knows the date when the treatment starts.
Since treatment is randomly assigned, the following simple regression for person j in week t can be run without "fear" of violating the classic OLS assumptions:
Weight_jt = intercept_j + B*Treatment_jt + U_jt
Intercept_j represents the average weight for person j in the absence of the treatment.
Treatment_jt indicates whether person j is receiving free peanut butter in week t.
U_jt captures the unexplained determinants of weight for person j in week t.
B is the key coefficient of interest. The nutritionist's key hypothesis is that B > 0 and statistically significant.
Since treatment (i.e., receiving free peanut butter) is randomly assigned, there appears to be no "bias" from running OLS.
Now let's do some economics. The researcher runs this experiment and is shocked to find that he can't reject the hypothesis that B = 0. What inference do you draw?
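To see why the statistician feels safe, here is a minimal Python simulation of the regression above under the assumption that U_jt really is just noise unrelated to assignment. The sample size, the 75 kg average weight, and the true B of 0.5 are made-up values. The person intercepts are removed by subtracting each person's mean, which is one standard way to estimate this fixed-effects model; the names `within_slope` and `run_experiment` are hypothetical helpers, not from any library.

```python
import random
from collections import defaultdict


def within_slope(y, d, group):
    """OLS slope of y on d after subtracting each group's mean (group fixed effects)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum of y, sum of d, count]
    for yi, di, g in zip(y, d, group):
        acc = sums[g]
        acc[0] += yi
        acc[1] += di
        acc[2] += 1
    num = den = 0.0
    for yi, di, g in zip(y, d, group):
        sum_y, sum_d, n = sums[g]
        d_dev = di - sum_d / n
        num += d_dev * (yi - sum_y / n)
        den += d_dev * d_dev
    return num / den


def run_experiment(n_people=2000, n_weeks=20, start_week=10, true_b=0.5, seed=0):
    """Simulate Weight_jt = intercept_j + B*Treatment_jt + U_jt with U_jt pure noise."""
    rng = random.Random(seed)
    weight, treatment, person = [], [], []
    for j in range(n_people):
        intercept_j = rng.gauss(75.0, 10.0)  # person j's untreated weight (hypothetical kg)
        assigned = rng.random() < 0.5        # coin-flip assignment to the peanut butter group
        for t in range(n_weeks):
            treat_jt = 1.0 if (assigned and t >= start_week) else 0.0
            u_jt = rng.gauss(0.0, 1.0)       # U_jt: unrelated to assignment by construction
            weight.append(intercept_j + true_b * treat_jt + u_jt)
            treatment.append(treat_jt)
            person.append(j)
    return within_slope(weight, treatment, person)


b_hat = run_experiment()
print(f"estimated B = {b_hat:.3f}")  # should land near the assumed true B of 0.5
```

In this world nothing else moves when the peanut butter arrives, so OLS recovers the assumed B up to sampling noise.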
My starting point with my son is that the production function of weight presented above is vague. The U_jt term reflects hundreds of unobserved determinants of weight. Suppose that one of them is exercise. Suppose that people believe that peanut butter does make you fat. Suppose that people do not want to gain weight. Under these assumptions, those who are randomly assigned to eat peanut butter respond to the treatment by exercising more. This behavioral response generates the B = 0 finding.
If "all else (including exercise) had remained equal at the time of the treatment," then the statistician would have recovered the positive B that the nutritionist predicts. So the random assignment to treatment actually causes a change in U_jt. This point has not been discussed enough in the literature, and it is relevant in almost all Regression Discontinuity studies. When economic agents are aware that they have been assigned to treatment, they often change their behavior on other margins, and the statistician observes a "net effect." See my paper with Randy Walsh.
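The exercise story can be added to the same kind of simulation. This sketch assumes, purely for illustration, that each weekly hour of exercise lowers weight by 0.25 kg and that treated people add 2 hours a week, exactly offsetting a direct peanut butter effect of 0.5 kg; none of these numbers come from data, and the helper names are hypothetical. The regression on Treatment alone recovers the net effect (about 0), while a regression that also controls for exercise, which the statistician in the story does not observe, recovers the direct effect.

```python
import random
from collections import defaultdict


def demean_by_group(columns, group):
    """Subtract each group's mean from every column (removes person fixed effects)."""
    totals = defaultdict(lambda: [0.0] * (len(columns) + 1))  # last slot counts rows
    for row, g in enumerate(group):
        acc = totals[g]
        for k, col in enumerate(columns):
            acc[k] += col[row]
        acc[-1] += 1
    return [[col[row] - totals[g][k] / totals[g][-1] for row, g in enumerate(group)]
            for k, col in enumerate(columns)]


def slope_one(y, x):
    """OLS slope of demeaned y on one demeaned regressor."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def slopes_two(y, x1, x2):
    """OLS slopes of demeaned y on two demeaned regressors (2x2 normal equations)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def run_experiment(n_people=2000, n_weeks=20, start_week=10, direct_b=0.5,
                   kg_per_hour=0.25, extra_hours=2.0, seed=1):
    rng = random.Random(seed)
    weight, treatment, hours, person = [], [], [], []
    for j in range(n_people):
        intercept_j = rng.gauss(75.0, 10.0)  # untreated weight (hypothetical kg)
        usual_hours = rng.uniform(3.0, 6.0)  # person j's usual weekly exercise
        assigned = rng.random() < 0.5        # random assignment, as in the story
        for t in range(n_weeks):
            d = 1.0 if (assigned and t >= start_week) else 0.0
            # Behavioral response: treated people exercise extra_hours more per week.
            h = usual_hours + extra_hours * d + rng.gauss(0.0, 1.0)
            w = intercept_j + direct_b * d - kg_per_hour * h + rng.gauss(0.0, 1.0)
            weight.append(w)
            treatment.append(d)
            hours.append(h)
            person.append(j)
    y, d_dm, h_dm = demean_by_group([weight, treatment, hours], person)
    net_b = slope_one(y, d_dm)                         # what the statistician estimates
    direct, exercise_coef = slopes_two(y, d_dm, h_dm)  # needs exercise data he lacks
    return net_b, direct, exercise_coef


net_b, direct_b, exercise_coef = run_experiment()
print(f"net B = {net_b:.3f}, B holding exercise fixed = {direct_b:.3f}")
```

Under these assumptions the net estimate is roughly direct_b minus kg_per_hour times extra_hours, which is zero here; changing either behavioral parameter moves the "net effect" the experiment reports without changing the direct effect at all.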
An example: a prominent QJE paper uses school attendance boundaries to document that home prices are higher on the good school district side of the boundary, and concludes that the school quality differential across the boundary explains this price jump. The simple regression is:
home price = border fixed effect + B*(Home in Good School District) + U
What is U?
Suppose that richer people live on the good school district side, and suppose that people have a utility function in which good schools, ping pong tables, and Jacuzzis are complements. In this case, homeowners on the good school side of the boundary will install these features, which are unobserved by the statistician, and the researcher will overestimate B because U jumps up at the boundary on the good side. Economic relationships play a key role in driving the statistical patterns that we see. "Random assignment" to treatment does not solve this problem if economic agents can reoptimize once they know their "endowment" (i.e., whether they have been assigned the peanut butter or assigned to live in a good school district).
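The boundary example can be sketched the same way. This minimal simulation assumes, hypothetically, a true school premium of $20,000, that homes on the good side carry on average $10,000 more in unobserved amenities (ping pong tables, Jacuzzis), and that amenity value passes into price one-for-one; prices are in thousands of dollars and all numbers are invented. Border fixed effects are removed by demeaning within each boundary. The naive estimate picks up both jumps, while the "oracle" estimate, which subtracts amenity value the statistician cannot actually see, isolates the assumed school premium.

```python
import random
from collections import defaultdict


def within_slope(y, d, group):
    """OLS slope of y on d after subtracting each group's mean (border fixed effects)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum of y, sum of d, count]
    for yi, di, g in zip(y, d, group):
        acc = sums[g]
        acc[0] += yi
        acc[1] += di
        acc[2] += 1
    num = den = 0.0
    for yi, di, g in zip(y, d, group):
        sum_y, sum_d, n = sums[g]
        d_dev = di - sum_d / n
        num += d_dev * (yi - sum_y / n)
        den += d_dev * d_dev
    return num / den


def boundary_sample(n_borders=500, homes_per_side=20, school_premium=20.0,
                    amenity_jump=10.0, seed=2):
    """U = amenities + noise, and the amenity part jumps up on the good side."""
    rng = random.Random(seed)
    price, good, amenity, border = [], [], [], []
    for border_id in range(n_borders):
        border_fe = rng.gauss(300.0, 50.0)  # price level shared by both sides of this border
        for side in (0.0, 1.0):             # 1.0 = good school district side
            for _ in range(homes_per_side):
                # Richer owners on the good side install more unobserved amenities.
                a = rng.gauss(5.0 + amenity_jump * side, 3.0)
                price.append(border_fe + school_premium * side + a + rng.gauss(0.0, 10.0))
                good.append(side)
                amenity.append(a)
                border.append(border_id)
    naive_b = within_slope(price, good, border)
    net_of_amenities = [p - a for p, a in zip(price, amenity)]
    oracle_b = within_slope(net_of_amenities, good, border)
    return naive_b, oracle_b


naive_b, oracle_b = boundary_sample()
print(f"naive B = {naive_b:.1f}, B with amenities stripped out = {oracle_b:.1f}")
```

Because both regressions use the same regressor, the gap between the two estimates is exactly the estimated jump in the amenity part of U at the boundary, which is the upward bias described above.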