Wednesday, July 24, 2013
Brian Tamanaha’s Straw Men (Part 1): Why we used SIPP data from 1996 to 2011
BT Claim: We could have used more historical data without introducing continuity and other methodological problems
BT quote: “Although SIPP was redesigned in 1996, there are surveys for 1993 and 1992, which allow continuity . . .”
Response: Using more historical data from SIPP would likely have introduced continuity and other methodological problems
SIPP does indeed go back farther than 1996. We chose that date because it marked the beginning of an updated and revitalized SIPP that continues to this day. SIPP was substantially redesigned in 1996 to increase sample size and improve data quality, and combining different versions of the survey could have introduced methodological problems. That does not mean it could never be done, but doing so might raise as many questions as it would answer.
Had we used earlier data, it would have been difficult to know to what extent changes in our earnings premium estimates reflected changes in the real world, and to what extent they were artifacts of changes to the SIPP methodology.
Because SIPP has developed and improved over time, the more recent data is more reliable than older historical data. All else being equal, a larger sample size and more years of data are preferable. However, data quality issues suggest focusing on more recent data.
If older data were included, it probably would have been appropriate to weight more recent and higher quality data more heavily than older and lower quality data. We would likely also have had to make adjustments for differences that might have been caused by changes in survey methodology. Such adjustments would inevitably have been controversial.
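One standard way to give more precise data more weight is inverse-variance weighting. The Python sketch below is our illustration of that general technique, not a method used in the study. The `quality` multipliers are a hypothetical extra down-weighting for older panels, and all of the numbers are made up.

```python
def inverse_variance_pool(estimates, std_errors, quality=None):
    """Pool period-level estimates, weighting each by quality / SE**2.

    `quality` is a hypothetical list of multipliers in (0, 1] that further
    down-weights older, lower-quality data; it is an illustrative
    assumption, not an established survey adjustment.
    """
    if quality is None:
        quality = [1.0] * len(estimates)
    raw = [q / se ** 2 for q, se in zip(quality, std_errors)]
    total = sum(raw)
    weights = [w / total for w in raw]  # normalized so the weights sum to 1
    pooled = sum(w * x for w, x in zip(weights, estimates))
    return pooled, weights


# Hypothetical inputs: a newer, more precise estimate and an older, noisier one.
pooled, weights = inverse_variance_pool([0.60, 0.50], [0.05, 0.10])
print(round(pooled, 3), [round(w, 2) for w in weights])  # 0.58 [0.8, 0.2]

# Same inputs, with the older estimate's weight additionally halved.
pooled_q, _ = inverse_variance_pool([0.60, 0.50], [0.05, 0.10], quality=[1.0, 0.5])
print(round(pooled_q, 3))  # 0.589
```

The older, noisier estimate already receives less weight because of its larger standard error. The `quality` multiplier is where a contestable judgment would enter, which is one reason such adjustments would likely be controversial.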
Because the sample size increased dramatically after 1996, adding a few years of pre-1996 data would not contribute as much new data, or have the potential to change our estimates by nearly as much, as Professor Tamanaha believes. There are also gaps in SIPP data from the 1980s because of insufficient funding.
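A rough way to see why, under simplifying assumptions that are ours rather than SIPP's: every observation has the same variance σ², and survey weights and design effects are ignored. The standard error shrinks with the square root of the sample size, and when older and newer observations are pooled, the estimate can move toward the older estimate only in proportion to the older data's share of observations.

```latex
% Standard error of an estimate based on n observations
\mathrm{SE}(n) = \frac{\sigma}{\sqrt{n}}, \qquad
\mathrm{SE}(2n) = \frac{\mathrm{SE}(n)}{\sqrt{2}} \approx 0.71\,\mathrm{SE}(n)

% Pooling n_old older observations with n_new newer ones
\hat{\theta}_{\text{pooled}}
  = \frac{n_{\text{old}}\,\hat{\theta}_{\text{old}} + n_{\text{new}}\,\hat{\theta}_{\text{new}}}{n_{\text{old}} + n_{\text{new}}}
\quad\Longrightarrow\quad
\hat{\theta}_{\text{pooled}} - \hat{\theta}_{\text{new}}
  = \frac{n_{\text{old}}}{n_{\text{old}} + n_{\text{new}}}\left(\hat{\theta}_{\text{old}} - \hat{\theta}_{\text{new}}\right)
```

For example, if older panels hypothetically supplied one-fifth of the pooled observations, the pooled estimate would move only one-fifth of the way toward the older estimate.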
These issues and the 1996 changes are explained at length in the Survey of Income and Program Participation User’s Guide.
Changes to the new 1996 version of SIPP include:
- Roughly doubling the sample size
  - This improves the precision of estimates and shrinks standard errors
- Lengthening the panels from 3 years to 4 years
  - This reduces the severity of the regression to the median problem
- Introducing computer-assisted interviewing to improve data collection and reduce errors or the need to impute missing data
- Introducing oversampling of low-income neighborhoods
  - This mitigates the response bias issues we previously discussed, which are most likely to affect the bottom of the distribution
- Instituting new income topcoding procedures with the 1996 Panel
  - This will affect both means and various points in the distribution
  - Topcoding is done on a monthly or quarterly basis, and can therefore undercount end-of-year bonuses, even for people who are not extremely high-income year-round
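To see how monthly topcoding can undercount a year-end bonus, here is a minimal Python sketch. The $20,000 monthly cap, the salary, and the bonus are hypothetical numbers chosen for illustration, not SIPP's actual thresholds, and the sketch simply caps values at the threshold, which may not match how SIPP assigns topcoded amounts.

```python
def topcode(values, cap):
    """Censor each value at `cap` (a simple cap, used here for illustration)."""
    return [min(v, cap) for v in values]


def annual_total(monthly, monthly_cap=None):
    """Sum twelve monthly amounts, optionally topcoding each month first."""
    if monthly_cap is not None:
        monthly = topcode(monthly, monthly_cap)
    return sum(monthly)


# Hypothetical worker: $8,000 a month plus a $30,000 bonus paid in December.
monthly_income = [8_000] * 11 + [8_000 + 30_000]
HYPOTHETICAL_MONTHLY_CAP = 20_000  # illustrative only, not SIPP's threshold

true_annual = annual_total(monthly_income)
reported_annual = annual_total(monthly_income, HYPOTHETICAL_MONTHLY_CAP)
print(true_annual, reported_annual, true_annual - reported_annual)  # 126000 108000 18000
```

The worker's annual income of $126,000 would not exceed a much higher annual threshold, yet the single bonus month is capped, so $18,000 goes unrecorded.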
Most government surveys topcode income data; that is, there is a maximum income that they will report. This is done to protect the privacy of high-income individuals who could more easily be identified from ostensibly confidential survey data if their incomes were revealed.
Because law graduates tend to have higher incomes than bachelor's degree holders, topcoding introduces a downward bias into earnings premium estimates. Midstream changes to topcoding procedures can change this bias and create problems of consistency and continuity.
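The direction of the bias can be illustrated with a small Python sketch. The earnings figures and both topcode thresholds are hypothetical, not SIPP data, and the premium is measured here as a simple difference in mean earnings, which is a simplification. Because more of the higher-earning group sits above the cap, censoring removes more of its earnings and shrinks the measured gap. Changing the cap changes how much is removed.

```python
from statistics import mean


def topcode(values, cap):
    """Censor each value at `cap` (simple cap; real surveys may substitute other values)."""
    return [min(v, cap) for v in values]


def earnings_premium(higher_group, comparison_group):
    """Earnings premium measured as a simple difference in mean earnings."""
    return mean(higher_group) - mean(comparison_group)


# Hypothetical annual earnings for five people in each group (not SIPP data).
law_grads = [60_000, 90_000, 150_000, 250_000, 400_000]
bachelors = [40_000, 55_000, 70_000, 95_000, 180_000]

uncensored = earnings_premium(law_grads, bachelors)
# Two hypothetical thresholds, standing in for a midstream change in procedure.
censored_low_cap = earnings_premium(topcode(law_grads, 150_000), topcode(bachelors, 150_000))
censored_high_cap = earnings_premium(topcode(law_grads, 250_000), topcode(bachelors, 250_000))

print(uncensored, censored_low_cap, censored_high_cap)
```

With these made-up numbers, the measured premium falls from 102,000 to 38,000 under the lower cap and to 72,000 under the higher one. The size of the bias depends on where the cap sits, which is why a change in topcoding between survey versions complicates comparisons over time.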
Without going into more detail, the topcoding procedure that began in 1996 appears to be an improvement over the earlier topcoding procedure.
These are only a subset of the problems that extending the SIPP data back past 1996 would have introduced. For us, the costs of backfilling the data appear to outweigh the benefits. If others wish to pursue that course, we will be interested in what they find, just as we hope others were interested in our findings.