Data Analytics

**Designing Experiment Homework 5**

**Article & Data**

This homework is partially based on Ferraro and Price (2013) Review of Economics and Statistics, which you used for HW1-HW3, and partially based on Thornton (2008) American Economic Review, which you used for HW3-HW4.

**Individual Questions**

**Questions on Ferraro and Price**

In the Ferraro-Price data set, the variable percent_report reports the household’s percentile for water use the year before the experiment was conducted (2006). Let’s define “below median” as any household with percent_report<50 and “above median” as any household with percent_report>=50. We have thus defined two subgroups: (1) treated and control households who were below the median; and (2) treated and control households who were at or above the median. Within each subgroup, you can estimate an average treatment effect, known as a conditional average treatment effect (CATE) because it’s an ATE conditional on some characteristic of the experimental units (in this case, pre-treatment water use).
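The subgroup definition above can be sketched in a few lines of Python. This is only an illustration on made-up rows: percent_report and treat3 are the variable names from the assignment, while the values and the helper name `split_at_median` are hypothetical.

```python
# Made-up rows for illustration only; the real data set has many more
# households and columns. Only the column names come from the assignment.
households = [
    {"percent_report": 12, "treat3": 1},
    {"percent_report": 49, "treat3": 0},
    {"percent_report": 50, "treat3": 1},
    {"percent_report": 87, "treat3": 0},
]

def split_at_median(rows):
    """Split rows into below-median (<50) and at-or-above-median (>=50)."""
    below = [r for r in rows if r["percent_report"] < 50]
    above = [r for r in rows if r["percent_report"] >= 50]
    return below, above

below, above = split_at_median(households)
print(len(below), len(above))
```

Note that a household at exactly the 50th percentile falls in the “above median” subgroup, matching the >= in the definition.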

1a. Estimate the conditional average treatment effect (CATE) for below-median water users. You can estimate this CATE by either calculating a simple difference-in-means in summer 2007 water use (like in HW1) between treated and control groups or by regressing summer 2007 water use on treat3, water_2006, and apr_may07 (like in HW2). For calculating standard errors, you can either use formula 3.6 in your textbook (like in HW1) or the output that Excel reports from the regression (like in HW2).

1b. Estimate the standard error of the estimated CATE for below-median water users.

1c. What is the 95% confidence interval of the estimated CATE for below-median water users? [Use your answer from 1b and the simple formula that you used in HW1 to calculate confidence intervals; if you used regression to estimate the CATE, the program output will report the 95% CI.]

1d. In HW1 and HW2, you estimated the ATE of Treatment 3. In this homework (HW5), you estimated the ATE conditional on being a below-median water user (1a above). Which design has greater statistical power: the one for estimating the ATE or the one for estimating the CATE? How do you know? [Hint: You do not need to do any calculations. Use your knowledge of what factors affect statistical power.]
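For the difference-in-means route in questions 1a to 1c, the calculation within one subgroup might look like the sketch below. It rests on two assumptions that you should check against your own materials: that the textbook formula is the usual unequal-variance standard error, sqrt(s_T^2/n_T + s_C^2/n_C), and that the “simple formula” from HW1 is estimate ± 1.96 × SE. The water-use numbers and the function name `diff_in_means` are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def diff_in_means(treated, control, z=1.96):
    """Return (estimate, standard error, 95% CI) for a difference in means.

    Assumes the unequal-variance standard error and a normal
    approximation (z = 1.96) for the confidence interval.
    """
    est = mean(treated) - mean(control)
    # statistics.variance is the sample variance (divides by n - 1)
    se = sqrt(variance(treated) / len(treated)
              + variance(control) / len(control))
    return est, se, (est - z * se, est + z * se)

# Made-up outcome values for one subgroup, NOT from the real data
treated = [30.0, 28.0, 35.0, 31.0]
control = [33.0, 36.0, 32.0, 35.0]

est, se, ci = diff_in_means(treated, control)
print(est, se, ci)
```

The standard error shrinks as the group sizes grow, which is worth keeping in mind when comparing a full-sample estimate with a subgroup estimate.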

2a. Estimate the CATE for above-median water users.

2b. Estimate the standard error of the CATE for above-median water users.

2c. What is the 95% confidence interval of the estimated CATE for above-median water users? [Use your answer from 2b and the simple formula that you used in HW1 to calculate confidence intervals; if you used regression to estimate the CATE, the program output will report the 95% CI.]

2d. Based on your estimates in 1a and 2a, which subgroup seems to respond more to the Treatment 3 message? I’m just asking you to compare the magnitudes of the two numbers. No statistical test is necessary.

**Questions on Thornton (2008)**

For this question, you need to use the variable HIV_04, which equals zero if the test indicated the person was HIV negative (HIV-) and equals one if the test indicated the person was HIV positive (HIV+). Within each of these subgroups, you can estimate a CATE: the average treatment effect of receiving an incentive (any) on getting the test result (got), conditional on the person being HIV- or HIV+. You can estimate these two CATEs by calculating a simple difference-in-means in getting the test result (got = 1 vs got = 0).

3a. What is the CATE for HIV+ individuals (the effect of receiving any incentive on getting results for HIV+ individuals)?

3b. What is the CATE for HIV- individuals (the effect of receiving any incentive on getting results for HIV- individuals)?
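Because got is a 0/1 variable, the difference in means within each HIV_04 subgroup is a difference in the share of people who picked up their results. A minimal sketch, using made-up rows and a hypothetical helper `cate_by_status` (only HIV_04, any, and got are names from the assignment):

```python
from statistics import mean

# Made-up rows for illustration only, NOT from the Thornton data
people = [
    {"HIV_04": 0, "any": 1, "got": 1},
    {"HIV_04": 0, "any": 1, "got": 1},
    {"HIV_04": 0, "any": 0, "got": 0},
    {"HIV_04": 0, "any": 0, "got": 1},
    {"HIV_04": 1, "any": 1, "got": 1},
    {"HIV_04": 1, "any": 1, "got": 1},
    {"HIV_04": 1, "any": 0, "got": 0},
    {"HIV_04": 1, "any": 0, "got": 0},
]

def cate_by_status(rows, status):
    """Difference in the mean of got (any = 1 minus any = 0) for one HIV_04 value."""
    sub = [r for r in rows if r["HIV_04"] == status]
    treated = [r["got"] for r in sub if r["any"] == 1]
    control = [r["got"] for r in sub if r["any"] == 0]
    return mean(treated) - mean(control)

# Key 0 is the HIV- subgroup, key 1 is the HIV+ subgroup
cates = {status: cate_by_status(people, status) for status in (0, 1)}
print(cates)
```

The numbers this prints reflect only the invented rows above and say nothing about the actual study.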