On Treatment Effects estimation using LOngitudinal data
Evaluating the effects of public policies, often called "treatment effects", is a major goal in microeconometrics. However, it is difficult to achieve because of endogeneity: beneficiaries of a public policy such as subsidized jobs often differ from non-beneficiaries. Thus, a comparison of these two populations captures not only the effect of the policy, but also the intrinsic differences between the populations. As a result, such a comparison is not informative about the efficiency of policies. Longitudinal data, taken in a broad sense (panel data or repeated cross sections), have long been considered a means of solving this problem. Intuitively, they make it possible to control for unobserved heterogeneity that is stable over time. Although longitudinal data are prominent in the evaluation of public policies, they still raise methodological challenges. Some commonly used estimation methods may lead to erroneous conclusions, especially if treatment effects are heterogeneous. This research project aims to improve these practices, in particular when treatment effects are heterogeneous. It is divided into two sub-themes related to the nature of the models considered. The first corresponds to linear models, and consists of studying regressions inspired by the "difference-in-differences" method. The second sub-theme focuses on nonlinear models, used in particular when the dependent variable is limited (e.g., binary). For such models, we aim to clarify the conditions under which the parameters of interest are identified, and to develop tools to estimate them as well as possible.
The theoretical parts are based on econometric and statistical theory, and sometimes also on other subfields of mathematics such as functional analysis. The programs computing the estimators or statistical measures are developed in Stata.
A first version of the paper "Two-way fixed effects estimators with heterogeneous treatment effects" has been posted on arXiv.
The paper shows that two-way fixed effects regressions, which are used very often by applied economists, do not identify interpretable causal effects when treatment effects are heterogeneous. Instead, they identify weighted sums of different average treatment effects. We give the formulas for the corresponding weights. We also propose robustness measures for these regressions, as well as another method for identifying interpretable average treatment effects. Finally, we show, by revisiting two articles published in the American Economic Review, that our results may modify the conclusions one could draw from these regressions.
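The role of the weights can be illustrated on a small numerical example. The sketch below is not the authors' software. It is a minimal Python illustration under simplifying assumptions that are not in the summary: a balanced design with one observation per group-period cell, a binary treatment, parallel trends, and no noise. The function names `twfe_weights` and `twfe_coefficient` and all numerical values are hypothetical. In this special case, the weight on each treated cell is proportional to the residual from regressing the treatment on group and period fixed effects.

```python
import numpy as np


def twfe_weights(D):
    """Weights attached to each treated (group, period) cell.

    Assumption for this sketch: balanced design, one observation per cell,
    binary treatment. D has shape (groups, periods).
    """
    D = np.asarray(D, dtype=float)
    # Residual of D on group and period fixed effects (double demeaning).
    eps = D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()
    treated = D == 1
    # Untreated cells get weight zero; weights on treated cells sum to one.
    return np.where(treated, eps, 0.0) / eps[treated].sum()


def twfe_coefficient(Y, D):
    """OLS coefficient on D in a regression of Y on D, group and period dummies."""
    Y = np.asarray(Y, dtype=float)
    D = np.asarray(D, dtype=float)
    G, T = D.shape
    group_dummies = np.kron(np.eye(G), np.ones((T, 1)))            # (G*T, G)
    # Drop the first period dummy to avoid collinearity with the group dummies.
    period_dummies = np.kron(np.ones((G, 1)), np.eye(T))[:, 1:]    # (G*T, T-1)
    X = np.column_stack([D.ravel(), group_dummies, period_dummies])
    coef, *_ = np.linalg.lstsq(X, Y.ravel(), rcond=None)
    return coef[0]


# Hypothetical design: group 0 is treated in the last period only,
# group 1 is treated in the last two periods.
D = np.array([[0, 0, 1],
              [0, 1, 1]])
# Hypothetical cell-level treatment effects, all positive.
effects = np.array([[0.0, 0.0, 1.0],
                    [0.0, 1.0, 4.0]])
group_fe = np.array([1.0, 3.0])
period_fe = np.array([0.0, 0.5, 1.0])
# Noise-free outcomes with additive group and period effects (parallel trends).
Y = group_fe[:, None] + period_fe[None, :] + effects * D

weights = twfe_weights(D)
beta_fe = twfe_coefficient(Y, D)
att = effects[D == 1].mean()

print(weights)
print(beta_fe, (weights * effects).sum(), att)
```

In this example the weights on the three treated cells are 1/2, 1 and -1/2, so one cell enters negatively. The regression coefficient is about -0.5 even though every cell-level effect is positive and their simple average is 2, which is the kind of sign reversal that heterogeneous effects can produce.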
Beyond the paper, we have developed, with Antoine Dib (a PhD student at UC Santa Barbara), a Stata package computing the weights and associated robustness measures (the package is available on our websites). The alternative method that we suggest can also be computed through another Stata package, fuzzydid, which we developed with Yannick Guyonvarch (a PhD student at CREST). In order to spread this method as widely as possible, which was one of the key objectives of OTELO, we wrote the documentation of the fuzzydid package and published it in the Stata Journal.
Regarding nonlinear models, new theoretical results have been established. In particular, we have obtained a necessary condition for the existence of a root-n consistent estimator of marginal effects. Stéphane Bonhomme and Laurent Davezies are currently working on the fact that the estimation of some parameters of interest requires regularization, which implies that a smoothing parameter must be chosen. A data-driven heuristic leads to very satisfactory results on simulated data, but obtaining general theoretical results proves difficult. Regarding the identification of binary choice panel data models, we have exhibited moment conditions suggesting that these models may be identified outside the case of logistic errors. We are now studying these moment conditions in more detail to understand exactly under which condition(s) identification can be obtained. Finally, we have made some progress on the identification of average marginal effects in the fixed effect logit model. We established the theoretical bounds on these effects and developed a simple method for computing these bounds. We also developed corresponding estimators, whose theoretical properties we are currently studying.
The paper "Two-way fixed effects estimators with heterogeneous treatment effects" has received a revise-and-resubmit request from the American Economic Review.
The estimation of the causal effects of public policies, often called "treatment effects", is a major goal in microeconometrics. This aim is complicated by the endogeneity problem. The recipients of public policies, such as workers employed in subsidized jobs, are often, if not always, different from the non-recipients. Therefore, a direct comparison between the two populations does not isolate the effect of the policies, and cannot be used to assess their efficiency. In this respect, it has long been recognized that longitudinal data, defined in a broad sense (i.e. including both panel data and repeated cross sections), can be effective for identifying and estimating treatment effects. Intuitively, they make it possible to control for time-invariant unobserved heterogeneity. They can therefore solve the endogeneity problem, as long as it originates from such individual "fixed effects".
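This intuition can be written out in a minimal sketch, under assumptions that are not part of the summary: a linear model with two periods, an additive individual effect \alpha_i, a binary treatment D_{it}, a homogeneous effect \beta, and no period effects. The notation is purely illustrative.

```latex
\begin{align*}
Y_{it} &= \alpha_i + \beta D_{it} + u_{it}, \qquad t \in \{1, 2\}, \\
Y_{i2} - Y_{i1} &= \beta \, (D_{i2} - D_{i1}) + (u_{i2} - u_{i1}).
\end{align*}
```

Taking the difference over time removes \alpha_i, so any correlation between the individual effect and the treatment no longer matters. What remains to be assumed is that changes in the error term are unrelated to changes in treatment, and the homogeneity of \beta is exactly the restriction that becomes problematic when treatment effects are heterogeneous.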
While longitudinal data play an important role in the evaluation of public policies, they still raise methodological challenges. Some commonly used estimation practices may lead to erroneous conclusions, especially if treatment effects are heterogeneous. This research project aims at improving these practices. It is organized around two sub-themes, related to the nature of the models considered.
The first corresponds to linear models. In the usual difference-in-differences method, groups are either fully treated or fully untreated. But applied social scientists often face less clear-cut settings. In such cases, we conjecture, following de Chaisemartin and D'Haultfoeuille (2016), that the usual estimators could be severely biased, especially if treatment effects are heterogeneous. Our purpose will be first to clarify the conditions under which such estimators are valid. We will also propose other methods based on less restrictive assumptions. We will finally reanalyze the abundant empirical research based on the usual estimators through the lens of these new findings.
The second sub-theme corresponds to nonlinear models, which are especially useful when the outcome variable is binary or more generally limited. In such models, estimating the "primitive" parameters or treatment effects traditionally requires strategies that are model-specific. Moreover, point identification of these parameters may not hold, requiring a partial identification analysis. Here as well, our project will aim at making new methodological contributions. We will first extend Bonhomme (2012), who suggests a general analysis for these models. Particular attention will be given to the estimation of relevant treatment effect parameters, which are often more difficult to estimate than the primitive parameters of the model. When point identification of these parameters is impossible, we will consider the construction of optimal bounds, relying in particular on the tools developed in convex analysis for the so-called truncated moment problem.
These theoretical developments will be followed by the production of Stata programs, in order to disseminate these new methods as widely as possible among social scientists. The packages will need to be both user-friendly and robust to a wide variety of data configurations. This will involve careful and time-consuming programming, which will be carried out in part by research assistants.
Mr. Xavier D'Haultfoeuille (CREST UMR9194)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
CNRS CREST UMR9194
ANR funding: 97,200 euros
Beginning and duration of the scientific project: - 48 Months