Keith Smolkowski

Causal Inference

Oregon Research Institute

See also Statistical Methods for Missing Data


Alshurafa, M., Briel, M., Akl, E. A., Haines, T., Moayyedi, P., Gentles, S. J., Rios, L., Tran, C., Bhatnagar, N., Lamontagne, F., Walter, S. D., & Guyatt, G. H. (2012). Inconsistent definitions for intention-to-treat in relation to missing outcome data: Systematic review of the methods literature. PLoS ONE, 7(11), Article e49163. https://doi.org/​10.1371/​journal.pone.0049163

Altman, D. G. (1985). Comparability of randomised groups. Statistician, 34, 125-126.

Angrist, J. D. (2006). Instrumental variables methods in experimental criminological research: What, why and how. Journal of Experimental Criminology, 2(1), 23-44.

Angrist, J. D., & Imbens, G. W. (1995). Two-stage least squares estimation of average causal effects in models with variable treatment intensity. Journal of the American Statistical Association, 90, 431-442.

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444-455.

Excellent paper and recommended by Michael Sobel for reading on intermediate outcomes.

Angrist, J. D., & Krueger, A. (1991). Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics, 106, 979-1014.

Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making causal claims: A review and recommendations. The Leadership Quarterly, 21, 1086-1120. https://doi.org/​10.1016/​j.leaqua.2010.10.010

Atkins, D. C. (2009). Clinical trials methodology: Randomization, intent-to-treat, and random-effects regression. Depression and Anxiety, 26(8), 697-700. https://doi.org/​10.1002/​da.20594

Austin, P. C. (2008). A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Statistics in Medicine, 27, 2037-2049. https://doi.org/​10.1002/​sim.3150

Austin, P. C. (2009). Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statistics in Medicine, 28, 3083-3107.

Austin, P. C. (2011). A tutorial and case study in propensity score analysis: An application to estimating the effect of in-hospital smoking cessation counseling on mortality. Multivariate Behavioral Research, 46(1), 119-151. https://doi.org/​10.1080/​00273171.2011.540480

Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399-424. https://doi.org/​10.1080/​00273171.2011.568786

Bang, H., & Davis, C. (2007). On estimating treatment effects under non-compliance in randomized clinical trials: Are intent-to-treat or instrumental variables analyses perfect solutions? Statistics in Medicine, 26(5), 954-964.

Baron, J. (2000). Thinking and deciding (3rd ed.). New York: Cambridge University Press. ◊

See chapter 7 on hypothesis testing.

Bettinger, E. P. (2010). Instrumental variables. In P. Peterson, E. Baker, & B. McGaw (Eds.), International Encyclopedia of Education (3rd Ed., pp. 223-228). Oxford: Elsevier.

Bloom, H. S. (1984). Accounting for no-shows in experimental evaluation designs. Evaluation Review, 8, 225-246.

Brody, T. (2016). Intent-to-treat analysis versus per protocol analysis. In T. Brody, Clinical trials: Study design, endpoints and biomarkers, drug safety, and FDA and ICH guidelines (pp. 173-201). New York: Elsevier.

Cochran, W. G., & Rubin, D. B. (1973). Controlling bias in observational studies: A review. Sankhya: The Indian Journal of Statistics, Series A, 35(Part 4), 417-446.

Cohen, D. K., Raudenbush, S. W., & Ball, D. L. (2003). Resources, instruction, and research. Educational Evaluation and Policy Analysis, 25(2), 119-142.

Connell, A. M., Dishion, T. J., Yasui, M., & Kavanagh, K. (2007). An adaptive approach to family intervention: Linking engagement in family-centered intervention to reductions in adolescent problem behavior. Journal of Consulting and Clinical Psychology, 75(4), 568-579.

A practical example of the use of "Complier Average Causal Effect analysis (CACE; see G. Imbens & D. Rubin, 1997) to examine the impact of an adaptive approach to family intervention in the public schools on rates of substance use and antisocial behavior among students ages 11-17" (abstract). The study, however, may have included some important methodological flaws. This paper follows the same sample as Véronneau, Dishion, Connell, and Kavanagh (2016) and Stormshak, Connell, and Dishion (2009), but the papers have important differences. Stormshak et al. report that "when the students moved on to high school, FRC services were discontinued" (p. 225). Connell et al. (2007) reported something similar but, paradoxically, then asserted that "students . . . were offered services if they remained in the county" (p. 571). Véronneau et al. further declared that "FCUs were also offered in high school (in Grades 10-11) for those families remaining in the school district" (p. 6), noting that 44.7% of noncompliers in middle school participated in the FCU in high school. Because Stormshak et al. follows students through Grade 11, it is not clear how to reconcile the three reports. See notes for Véronneau et al. about problems with the CACE models, which likely apply to this study as well, on the Parenting Practices bibliography page. For example, CACE models make restrictive assumptions such as the pivotal exclusion restriction: the assumption that "never-takers and always-takers receive identical treatment regardless of which treatment condition they are assigned to" (Jo, 2002, p. 181).
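
For reference, under the standard CACE assumptions (randomized assignment, monotonicity, and the exclusion restriction quoted above), the complier average causal effect reduces to the familiar instrumental-variables ratio. A minimal statement in potential-outcomes notation (a restatement, not drawn from the papers above):

    \mathrm{CACE} \;=\; \frac{E[\,Y \mid Z = 1\,] - E[\,Y \mid Z = 0\,]}{E[\,D \mid Z = 1\,] - E[\,D \mid Z = 0\,]},

where Z denotes randomized assignment and D the treatment actually received; the numerator is the ITT effect and the denominator the compliance rate. Treatment receipt among nominal noncompliers, as described above, undermines the exclusion restriction and biases this ratio.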

Cornfield, J. (1971). The University Group Diabetes Program. A further statistical analysis of the mortality findings. JAMA, 217(12), 1676-1687.

Cox, D. R. (1958). Planning of experiments. New York: Wiley.

Cox (1958) also noted that "there is no 'interference' between different units if the observation on one unit [is] unaffected by the particular assignment of treatments to the other units" (Cox, 1958, p. 19). In terms of the potential outcomes framework (Rubin, 2005), this can be restated: "the [potential outcome for] one unit should be unaffected by the particular assignment of treatments to the other units" (Cox, 1958, p. 19), which is similar to Rubin's (1980) stable unit treatment value assumption.

Cox (1958), therefore, offers early support for the analysis of individuals nested within treatment groups or providers even in designs that randomly assign individuals to condition: "In general, an experimental unit can be defined as the smallest division of experimental material such that any two units may receive different treatments" (Cox, 1958, p. 2). Raths (1967) rephrased Cox to declare that "a unit of an experiment is defined as the smallest entity within an experiment that may receive different treatments" (p. 263).
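
In potential-outcomes notation (a later formalization, not Cox's own), the no-interference condition states that unit i's potential outcome depends only on its own assignment:

    Y_i(z_1, \ldots, z_N) \;=\; Y_i(z_i) \quad \text{for every assignment vector } (z_1, \ldots, z_N).

Combined with the requirement that there be no hidden versions of each treatment, this is Rubin's (1980) stable unit treatment value assumption.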

Cox, D. R. (1992). Planning of experiments. New York: Wiley.

As in his earlier (1958) book, Cox (1992) defined the experimental unit as "the smallest division of experimental material such that any two units may receive different treatments in the actual experiment" (p. 2).

Cox, D. R., & Reid, N. (2000). The theory of the design of experiments. Chapman & Hall/CRC.

D'Agostino, R. B., Jr. (1998). Propensity score methods for bias reduction for the comparison of a treatment to a non-randomized control group. Statistics in Medicine, 17(19), 2265-2281.

Recommended by Michael Sobel for reading on causal inference.

D'Agostino, R. B., Jr. (2007). Propensity scores in cardiovascular research. Circulation, 115(17), 2340-2343.

D'Agostino, R. B., Jr., & D'Agostino, R. B., Sr. (2007). Estimating treatment effects using observational data. Journal of the American Medical Association, 297(3), 314-316.

D'Agostino, R. B., Sr., & Kwan, H. (1995). Measuring effectiveness: What to expect without a randomized control group. Medical Care, 33(4 suppl), AS95-AS105.

Dunn, G., Maracy, M., Dowrick, C., Ayuso-Mateos, J. L., Dalgard, O. S., Page, H., Lehtinen, V., Casey, P., Wilkinson, C., Vázquez-Barquero, J. L., & Wilkinson, G. (2003). Estimating psychological treatment effects from a randomised controlled trial with both non-compliance and loss to follow-up. British Journal of Psychiatry, 183, 323-331.

Dunn, P. M. (1997). James Lind (1716-94) of Edinburgh and the treatment of scurvy. Archives of Disease in Childhood, 76, F64-F65.

Dunn (1997) describes James Lind's first known attempt to conduct a controlled clinical trial (nonrandom) to investigate the treatment of scurvy. See Lind (1753).

Fisher, L. D., Dixon, D. O., Herson, J., Frankowski, R. K., Hearron, M. S., & Peace, K. E. (1990). Intention-to-treat in clinical trials. In K. E. Peace (Ed.), Statistical issues in drug research and development. New York: Marcel Dekker.

Fisher et al. (1990) suggest that analysts should include all randomized patients in the groups to which they were randomly assigned, regardless of their adherence with the entry criteria, regardless of the treatment they actually received, and regardless of subsequent withdrawal from treatment or deviation from the protocol.
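
A minimal simulation sketch of the rationale (my illustration, not from Fisher et al.): when noncompliance depends on an unobserved prognostic variable, an as-treated contrast is confounded, while the ITT contrast, which compares groups exactly as randomized, remains a valid estimate of the effect of assignment.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    z = rng.integers(0, 2, n)                      # randomized assignment
    severity = rng.normal(size=n)                  # unobserved; drives noncompliance
    d = ((z == 1) & (severity < 0.5)).astype(int)  # one-sided noncompliance
    y = 0.5 * d + severity + rng.normal(size=n)    # true effect of receipt = 0.5

    # ITT: compare groups as randomized, regardless of treatment received.
    itt = y[z == 1].mean() - y[z == 0].mean()
    # As-treated: compare by treatment received; confounded by severity.
    as_treated = y[d == 1].mean() - y[d == 0].mean()
    print(f"ITT (effect of assignment): {itt:.3f}")        # approx 0.5 * Pr(comply)
    print(f"As-treated (confounded):    {as_treated:.3f}")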

Freedman, D. A. (1991). Statistical models and shoe leather. Sociological Methodology, 21, 291-313.

Freedman, D. A. (1997). From association to causation via regression. Advances in Applied Mathematics, 18, 59-110.

Funk, M., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M., & Davidian, M. (2011). Doubly robust estimation of causal effects. American Journal of Epidemiology, 173(7), 761-767.

Garrido, M. M., Kelley, A. S., Paris, J., Roza, K., Meier, D. E., Morrison, R. S., & Aldridge, M. D. (2014). Methods for constructing and assessing propensity scores. Health Services Research, 49(5), 1701-1720. https://doi.org/​10.1111/​1475-6773.12182

Gennetian, L. A., Morris, P. A., Bos, J. M., & Bloom, H. S. (2005). Constructing instrumental variables from experimental data to explore how treatments produce effects. In H. S. Bloom (Ed.), Learning more from social experiments (pp. 75-114). New York: Russell Sage.

Greenland, S. (1996). Basic methods for sensitivity analysis of biases. International Journal of Epidemiology, 25(6), 1107-1116.

Greenland, S., & Robins, J. M. (2009). Identifiability, exchangeability and confounding revisited. Epidemiologic Perspectives & Innovations, 6, Article 4. https://doi.org/​10.1186/​1742-5573-6-4 [Retrieved from https://www.biomedcentral.com/]

Greenland and Robins (2009) review a paper they wrote more than 20 years earlier, "Identifiability, Exchangeability and Epidemiological Confounding," and discuss challenges in the literature and subsequent advances. For example, many researchers treated "causal intermediates (causes of disease affected by exposure)" as confounders, an adjustment that "adjusts away part of the very effect under study and can induce selection bias" (p. 2; see Dishion, Kavanagh, Schneiger, Nelson, & Kaufman, 2002, for an example). Greenland and Robins cover additional topics associated with confounding and causal inference, such as assumptions of ignorability and how randomization does not guarantee ignorability or the absence of confounding.
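
The hazard of adjusting for causal intermediates is easy to demonstrate. A minimal simulation sketch (my illustration, not the authors' example), with the exposure operating entirely through an intermediate:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    z = rng.integers(0, 2, n)              # randomized exposure
    m = 1.0 * z + rng.normal(size=n)       # intermediate affected by exposure
    y = 1.0 * m + rng.normal(size=n)       # exposure affects outcome only via m

    def slope_on_z(y, columns):
        # OLS coefficient on z, entered as the first regressor after the intercept.
        X = np.column_stack([np.ones(len(y)), *columns])
        return np.linalg.lstsq(X, y, rcond=None)[0][1]

    print(f"Total effect of z:            {slope_on_z(y, [z]):.3f}")     # approx 1.0
    print(f"Effect after adjusting for m: {slope_on_z(y, [z, m]):.3f}")  # approx 0.0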

Grosz, M. P., Rohrer, J. M., & Thoemmes, F. (2020). The taboo against explicit causal inference in nonexperimental psychology. Perspectives on Psychological Science, 15(5), 1243-1255. https://doi.org/​10.1177/1745691620921521

Guo, H., Dawid, P., & Berzuini, G. (2016). Sufficient covariate, propensity variable and doubly robust estimation. In H. He, P. Wu, & D.-G. Chen (Eds.), Statistical causal inferences and their applications in public health research (pp. 49-89). Cham: Springer.

Hayduk, L., Cummings, G., Stratkotter, R., Nimmo, M., Grygoryev, K., Dosman, D., Gillespie, M., Pazderka-Robinson, H., & Boadu, K. (2003). Pearl's d-separation: One more step into causal thinking. Structural Equation Modeling, 10(2), 289-311.

He, H., Hu, J., & He, J. (2016). Overview of propensity score methods. In H. He, P. Wu, & D.-G. Chen (Eds.), Statistical causal inferences and their applications in public health research (pp. 29-48). Cham: Springer.

He, H., Wu, P., & Chen, D.-G. (Eds.) (2016). Statistical causal inferences and their applications in public health research. Cham: Springer.

The He, Wu, and Chen (2016) volume opens with a chapter on causal inference developed from the concept of potential outcomes. Additional chapters address propensity score methods, causal inference from randomized trials, and the use of structural equation models for mediation analysis. The latter chapters do not appear, however, to address the concerns of Maxwell and colleagues (Maxwell & Cole, 2007; Maxwell, Cole, & Mitchell, 2011; Mitchell & Maxwell, 2013; see also Imai, Jo, & Stuart, 2011).

Heckman, J. J. (2005). The scientific model of causality. Sociological Methodology, 35(1), 1-98.

Hill, J. (2008). Discussion of research using propensity-score matching: Comments on 'A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003' by Peter Austin, Statistics in Medicine. Statistics in Medicine, 27, 2055-2061. https://doi.org/​10.1002/sim.3245

Ho, D., Imai, K., King, G., & Stuart, E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15(3), 199-236.

Ho, Imai, King, and Stuart (2007) "propose a unified approach [to matching] that makes it possible for researchers to preprocess data with matching . . . and then to apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences" (abstract). This paper is used to justify the WWC standard that baseline differences in RCTs and QEDs fall below an effect size (Hedges' g) of 0.25 standard deviations, although it appears that Ho et al. are addressing a different issue.
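
A minimal sketch of the preprocessing idea on simulated data (my illustration; not Ho et al.'s code): estimate propensity scores, retain 1:1 nearest-neighbor matches, then fit the parametric model one would have used anyway.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(2)
    n = 2_000
    x = rng.normal(size=(n, 3))                              # observed confounders
    p_t = 1 / (1 + np.exp(-(x @ [0.8, -0.5, 0.3])))          # true treatment model
    t = rng.binomial(1, p_t)
    y = 1.0 * t + x @ [1.0, 1.0, 1.0] + rng.normal(size=n)   # true effect = 1.0

    # Step 1: preprocess with propensity-score matching (with replacement).
    ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[t == 0].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[t == 1].reshape(-1, 1))
    matched_controls = np.flatnonzero(t == 0)[idx.ravel()]
    keep = np.concatenate([np.flatnonzero(t == 1), matched_controls])

    # Step 2: run the parametric model you would have run anyway, on the matched sample.
    X = np.column_stack([np.ones(keep.size), t[keep], x[keep]])
    beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    print(f"Treatment effect after matching: {beta[1]:.3f}")

Ho et al. discuss many variations on the matching step; the point is only that the parametric estimate depends less on the model specification after preprocessing.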

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.

Holland, P. W. (1988). Causal inference, path analysis, and recursive structural equation models. In C. Clogg & G. Arminger (Eds.), Sociological Methodology, Volume 18 (pp. 449-484). Washington, DC: American Sociological Association.

Recommended by Michael Sobel for reading on intermediate outcomes.

Holland, P. W. (1993). Which comes first, cause or effect? In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 273-282). Hillsdale, NJ: Lawrence Erlbaum Associates.

Holland, P. W., & Rubin, D. B. (1982). On Lord's Paradox (Technical Report No 82-34). Princeton, NJ: Educational Testing Service. Retrieved from the Wiley Online Library: http://onlinelibrary.wiley.com/journal/​10.1002/(ISSN)2330-8516

Δ Holland and Rubin (1982) conclude that "the blind use of complicated statistical procedures, like analysis of covariance, is doomed to lead to absurd conclusions" (p. 30). That said, Holland and Rubin argue that analysis of covariance can provide valuable answers in certain situations but that causal statements must be made explicit, ideally through the use of mathematics, rather than in natural language, which can be "vague and potentially misleading" (p. 30).

Holland, P. W., & Rubin, D. B. (1983). On Lord's Paradox. In H. Wainer & S. Messick (Eds.), Principles of modern psychological measurement (pp. 3-35). Hillsdale, NJ: Lawrence Erlbaum.

Holland, P. W., & Rubin, D. B. (1988). Causal inference in retrospective studies. Evaluation Review, 12(3), 203-231. https://doi.org/​10.1177/​0193841X8801200301

Hollis, S., & Campbell, F. (1999). What is meant by intention to treat analysis? Survey of published randomised controlled trials. British Medical Journal, 319(7211), 670-674.

Imai, K., King, G., & Stuart, E. A. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society A, 171(2), 481-502. https://doi.org/​10.1111/​j.1467-985X.2007.00527.x

Imai, King, and Stuart (2008) discuss random sampling, random treatment assignment, blocking before assignment, and matching after data collection. The authors also discuss the absurdity of baseline testing for balance in randomized trials.

Imbens, G. W. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and Statistics, 86(1), 4-29.

Recommended by Michael Sobel for reading on causal inference.

Imbens, G. W. (2010). An economist’s perspective on Shadish (2010) and West and Thoemmes (2010). Psychological Methods, 15(1), 47-55. https://doi.org/​10.1037/​a0018538

Imbens, G. W., & Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62, 467-475.

Imbens, G. W., & Rubin, D. B. (1997). Estimating outcome distributions for compliers in instrumental variables models. Review of Economic Studies, 64(4), 555-574.

Imbens, G. W., & Rubin, D. B. (2015). Causal inference for statistics, social, and biomedical sciences: An introduction. New York: Cambridge University Press. ◊

Imbens and Rubin (2015) provide an exceptional introduction to the use of data and statistics to make causal inferences.

Jo, B. (2002). Estimation of intervention effects with noncompliance: Alternative model specifications (with discussion). Journal of Educational and Behavioral Statistics, 27, 385-420. https://doi.org/​10.3102/​10769986027004385

See also Jo's rejoinder to comments by Rubin and Mealli in JEBS.

Jo, B. (2002). Model misspecification sensitivity analysis in estimating causal effects of interventions with non-compliance. Statistics in Medicine, 21, 3161-3181.

Jo, B. (2002). Statistical power in randomized intervention studies with noncompliance. Psychological Methods, 7(2), 178-193.

Jo, B. (2008). Causal inference in randomized experiments with mediational processes. Psychological Methods, 13, 314-336. https://doi.org/​10.1037/​a0014207

Jo, B., Asparouhov, T., Muthén, B. O., Ialongo, N. S., & Brown, C. H. (2008). Cluster randomized trials with treatment noncompliance. Psychological Methods, 13(1), 1-18.

Jo, B., & Muthén, B. (2001). Modeling of intervention effects with noncompliance: A latent variable approach for randomized trials. In G. A. Marcoulides & R. E. Schumacker (Eds.), New developments and techniques in structural equation modeling (pp. 57-87). Mahwah, NJ: Lawrence Erlbaum Associates.

Joffe, M. M., Small, D., & Hsu, C.-S. (2007). Defining and estimating intervention effects for groups that will develop an auxiliary outcome. Statistical Science, 22(1), 74-97.

From the abstract: "It has recently become popular to define treatment effects for subsets of the target population characterized by variables not observable at the time a treatment decision is made. Characterizing and estimating such treatment effects is tricky; the most popular but naive approach inappropriately adjusts for variables affected by treatment and so is biased. We consider several appropriate ways to formalize the effects. . . ."

Kim, D., Pieper, C., Ahmed, A., & Colón-Emeric, C. (2016). Use and interpretation of propensity scores in aging research: A guide for clinical researchers. Journal of the American Geriatrics Society, 64(10), 2065-2073. https://doi.org/​10.1111/​jgs.14253

Kim, Pieper, Ahmed, and Colón-Emeric (2016) review four common methods that use propensity scores: matching, weighting, stratification, and covariate adjustment. For each, they explain the procedure and review best practices and caveats.
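
As one concrete instance of the weighting approach, a minimal inverse-probability-of-treatment-weighting sketch on simulated data (my illustration, not the authors' code):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    n = 5_000
    x = rng.normal(size=(n, 2))                              # observed confounders
    t = rng.binomial(1, 1 / (1 + np.exp(-x @ [1.0, -1.0])))
    y = 2.0 * t + x @ [1.5, 0.5] + rng.normal(size=n)        # true ATE = 2.0

    e = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]  # estimated propensity scores
    w = t / e + (1 - t) / (1 - e)                               # inverse-probability weights

    # Weighted means for each arm; weighting makes the arms comparable on measured covariates.
    ate = (np.average(y[t == 1], weights=w[t == 1])
           - np.average(y[t == 0], weights=w[t == 0]))
    print(f"IPTW estimate of the ATE: {ate:.3f}")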

Kim, Y., & Steiner, P. (2016). Quasi-experimental designs for causal inference. Educational Psychologist, 51(3-4), 395-405. https://doi.org/​10.1080/​00461520.2016.1207177

King, G., & Nielsen, R. (2019). Why propensity scores should not be used for matching. Political Analysis, 27(4), 435-454. https://doi.org/​10.1017/​pan.2019.11

King and Nielsen (2019) "show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal—thus increasing imbalance, inefficiency, model dependence, and bias" (abstract). See Luellen (2007) for similar findings.

La Caze, A., Djulbegovic, B., & Senn, S. (2012). What does randomisation achieve? Evidence-Based Medicine, 17(1), 1-2. https://doi.org/​10.1136/​ebm.2011.100061

Lachin, J. M. (2000). Statistical considerations in the intent-to-treat principle. Controlled Clinical Trials, 21(3), 167-189.

Lachin (2000) describes the bias that postrandomization exclusions can introduce: "especially in a large study, the inflation in type I error probability can be severe, 0.50 or higher, even when the null hypothesis is true" (abstract).

Lanza, S., Moore, J., & Butera, N. (2013). Drawing causal inferences using propensity scores: A practical guide for community psychologists. American Journal of Community Psychology, 52(3/4), 380-392. https://doi.org/​10.1007/​s10464-013-9604-4

Larsen, R. J., & Marx, M. L. (1986). An introduction to mathematical statistics and its applications (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Lee, D. S. (2009). Training, wages, and sample selection: Estimating sharp bounds on treatment effects. The Review of Economic Studies, 76(3), 1071-1102.

Lemons, C. J., Fuchs, D., Gilbert, J. K., & Fuchs, L. S. (2014). Evidence-based practices in a changing world: Reconsidering the counterfactual in education research. Educational Researcher, 43(5), 242-252. https://doi.org/​10.3102/​0013189X14539189

Lemons, Fuchs, Gilbert, and Fuchs (2014) make some very important points. They focus on improved student skill over time, but they could have easily pointed to comparisons between one treatment and another. For example, the WWC (2012) examined the effects of Reading Mastery (RM) only to conclude that it has no evidence of effects. The flawed report, however, relied on Cooke et al. (2004), who compared RM to its sister program, Horizons. Notably, the two programs overlapped on 17 of 22 (77%) fundamental program characteristics, from scope to design to methods of field-testing (Engelmann, 2000). The WWC conclusion represents an "error in reasoning" (Lilienfeld, Ritschel, Lynn, Cautin, & Latzman, 2014, p. 355).

Lilienfeld, S. O., Ritschel, L. A., Lynn, S. J., Cautin, R. L., & Latzman, R. D. (2014). Why ineffective psychotherapies appear to work: A taxonomy of causes of spurious therapeutic effectiveness. Perspectives on Psychological Science, 9(4), 355-387. https://doi.org/​10.1177/​1745691614535216

Lilienfeld, Ritschel, Lynn, Cautin, and Latzman (2014) outline the potential causes of spurious treatment effects for psychological interventions that explain why interventions may appear to work when they, in fact, do not. They discuss the causes in terms of the perceptions of interventionists, their treatment recipients, and potentially their associates (e.g., family and friends). The authors locate each cause of spurious effects within four broad cognitive barriers: naïve realism, confirmation bias, illusory causation, and illusion of control. Many of the 26 potential causes of spurious effects have parallels for educational, social-behavioral, or other interventions, curricula, policies, and prevention programs.

Lind, J. (1753). A treatise on the scurvy: In three parts. Edinburgh: Sands, Murray, and Cochran for A. Kincaid & A. Donaldson. Retrieved from the James Lind Library, http://www.jameslindlibrary.org/

Lind (1753) reported, "On the 20th of May 1747, I selected twelve patients in the scurvy, on board the Salisbury at sea. Their cases were as similar as I could have them" (p. 191). See also Dunn (1997).

Little, R. J., & Yau, L. H. Y. (1998). Statistical techniques for analyzing data from prevention trials: Treatment of no-shows using Rubin's causal model. Psychological Methods, 3, 147-159.

Luellen, J. K. (2007). A comparison of propensity score estimation and adjustment methods on simulated data. Dissertation Abstracts International: Section B: The Sciences and Engineering, 68(5-B), 3433.

From the abstract: "This study used simulated data to examine the relative performance of five methods of estimating propensity scores (logistic regression, classification trees, bootstrap aggregation, boosted regression, and random forests) crossed with four types of adjustments that utilize propensity scores (matching, stratification, covariance adjustment, and weighting) at two levels of sample sizes (N = 200 and N = 1,000). . . . All combinations of propensity score methods led to at least some average reduction in selection bias, and for most combinations of methods these reductions were statistically significant. However, this seemingly promising finding is tempered by the fact that bias was actually introduced in many replicates, especially when the level of sample size was 200 [emphasis added]. The traditional approach to estimating propensity scores, logistic regression, worked well at reducing selection bias, on average, at both sample sizes and tended to result in more precise estimates of the treatment effect with less potential for introducing bias.  . . . Matching, stratification, and covariance adjustment were fairly competitive and a clear favorite was not discerned." See also King and Nielsen (2019).

Luellen, J. K., Shadish, W. R., & Clark, M. H. (2005). Propensity scores. Evaluation Review, 29(6), 530-558. https://doi.org/​10.1177/​0193841X05275596

Martin, W. (2014). Making valid causal inferences from observational data. Preventive Veterinary Medicine, 113(3), 281-297.

Mauro, R. (1990). Understanding L.O.V.E. (left out variables error): A method for estimating the effects of omitted variables. Psychological Bulletin, 108, 314-329. https://doi.org/​10.1037/​0033-2909.108.2.314

Maxwell, S. E. (2010). Introduction to the special section on Campbell’s and Rubin’s conceptualizations of causality. Psychological Methods, 15(1), 1-2. https://doi.org/​10.1037/​a0018825

Maxwell (2010) introduces a special section on two perspectives on causal inference, those developed by Donald Campbell and Donald Rubin. Commentaries were provided by Shadish (2010) and West and Thoemmes (2010). See also Rubin's (2010) and Imbens's (2010) discussions of Shadish and of West and Thoemmes.

Mayne, S., Lee, B., Auchincloss, A., & Adams, M. (2015). Evaluating propensity score methods in a quasi-experimental study of the impact of menu-labeling. PLoS ONE, 10(12), Article e0144962.

Mealli, F., & Rubin, D. B. (2002). Discussion of 'Estimation of intervention effects with noncompliance: Alternative model specifications' By Booil Jo. Journal of Educational and Behavioral Statistics, 27(4), 411-415. https://doi.org/​10.3102/​10769986027004411

Mealli and Rubin (2002) offer a commentary on Jo's (2002) paper in the same journal.

Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. New York: Cambridge University Press. ◊

Neyman, J. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9 (D. M. Dabrowska & T. P. Speed, Trans.). Statistical Science, 5, 465-480. (Original work published 1923 by Roczniki Nauk Rolniczych Tom X [Annals of Agricultural Sciences], 1-51)

Cite as Neyman (1923/1990). "The potential-outcomes model of causation, also known as the response-schedule or 'counterfactual' model, was first formalized by Neyman in 1923" (Greenland & Robins, 2009, p. 2).
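
In the now-conventional notation (a restatement, not Neyman's original), each unit i has two potential outcomes, Y_i(1) under treatment and Y_i(0) under control, and:

    \tau_i \;=\; Y_i(1) - Y_i(0), \qquad \mathrm{ATE} \;=\; E[\,Y(1) - Y(0)\,].

Only one potential outcome is ever observed for any unit, so \tau_i itself is unobservable; under randomization, E[Y \mid Z = 1] - E[Y \mid Z = 0] identifies the average treatment effect.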

Pan, W., & Bai, H. (2018). Propensity score methods for causal inference: An overview. Behaviormetrika, 45(2), 317-334. https://doi.org/​10.1007/​s41237-018-0058-8

Pearl, J. (2000). Causality: Models, reasoning, and inference. New York: Cambridge University Press.

Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). New York: Cambridge University Press.

Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal inference in statistics: A primer. New York: John Wiley & Sons. ♦¹

Posavac, E. J. (2002). Using p values to estimate the probability of a statistically significant replication. Understanding Statistics, 1(2), 101-112.

Raudenbush, S. W. (2001). Comparing personal trajectories and drawing causal inferences from longitudinal data. Annual Review of Psychology, 52, 501-525.

Raudenbush, S. W. (2005). How do we study 'what happens next'? Annals of the American Academy of Political and Social Science, 602(1), 131-144. https://doi.org/​10.1177/​0002716205280900

Raudenbush, S. W. (2008). Advancing policy by advancing research on instruction. American Educational Research Journal, 45(1), 206-230.

This theoretical yet accessible paper presents several challenges for educational research that tests "instructional regimes" at the classroom or school level. These include the application of randomization and the stable unit treatment value assumption, both critical requirements for causal inference, within the framework of clustered trials. The paper also argues for measurement of the intervention activities, in this case measurement of the experienced, as opposed to intended, instructional regimes. "Intended regimes are well measured and accessible to randomized trials, whereas experienced instruction is measured with error and not amenable to randomization" (abstract). Raudenbush also raises challenges associated with multiyear sequences of instruction.

Raudenbush, S. W., Reardon, S. F., & Nomi, T. (2012). Statistical analysis for multisite trials using instrumental variables with random coefficients. Journal of Research on Educational Effectiveness, 5(3), 303-332.

Robins, J. M. (2000). Marginal structural models versus structural nested models as tools for causal inference. In M. E. Halloran & D. Berry (Eds.), Statistical models in epidemiology, the environment, and clinical trials (pp. 95-133). New York: Springer-Verlag.

Robins, J. M. (2003). Semantics of causal DAG models and the identification of direct and indirect effects. In P. J. Green, N. L. Hjort, & S. Richardson (Eds.), Highly structured stochastic systems (pp. 70-81). New York: Oxford.

Robins, J. M., & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3(2), 143-155.

Rogosa, D. (1987). Causal models do not support scientific conclusions: A comment in support of Freedman. Journal of Educational Statistics, 12, 185-195.

Rogosa, D. (1988). Myths about longitudinal research. In K. W. Schaie, R. T. Campbell, W. Meredith, & S. C. Rawlings (Eds.), Methodological Issues in Aging Research (pp. 171-209). New York: Springer.

This chapter is concerned with methods for the analysis of longitudinal data and seeks to convey "right thinking" about longitudinal research. Its heroes are statistical models for collections of individual growth (learning) curves; its myths catalog beliefs that have impeded good longitudinal research.

Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27-42. https://doi.org/​10.1177/​2515245917745629

Rosen, L., Manor, O., Engelhard, D., & Zucker, D. (2006). In defense of the randomized controlled trial for health promotion research. American Journal of Public Health, 96(7), 1181-1186. https://doi.org/​10.2105/​AJPH.2004.061713

Rosenbaum, P. R. (2002). Covariance adjustment in randomized experiments and observational studies. Statistical Science, 17(3), 286-327.

Includes comments by Angrist and Imbens, Hill, and Robins, with a rejoinder by Rosenbaum.

Rosenbaum, P. R. (2002). Observational studies (2nd ed.). New York: Springer.

Rosenbaum, P. R. (2007). Interference between units in randomized experiments. Journal of the American Statistical Association, 102(477), 191-200. https://doi.org/​10.1198/​016214506000001112

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.

Rosenbaum, P. R., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79(387), 516-524.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688-701.

Rubin (1974) defines and defends the randomized experiment. He provides a clear explanation of the importance of experimental control, which can be created by randomization for most social science experiments. Rubin also compares the relative value of observational studies to experiments. This excellent and interesting paper should be read periodically, along with Jacob Cohen's (1990) "Things I have learned (so far)."

With respect to the addition of covariates, Rubin makes clear the need to include only carefully considered variables that causally affect the outcome of interest. "When trying to estimate the typical causal effect in the 2N trial experiment, handling additional variables may not be trivial without a well-developed causal model that will properly adjust for those prior variables that causally affect Y and ignore other variables that do not causally affect Y even if they are highly correlated with the observed values of Y. Without such a model, the investigator must be prepared to ignore some variables he feels cannot causally affect Y and use a somewhat arbitrary model to adjust for those variables he feels are important" (Rubin, 1974, p. 697). In a footnote, he points to Cornfield's (1971) discussion of a study on the utility of oral-diabetic drugs.

Rubin, D. B. (1977). Assignment to treatment group on the basis of a covariate. Journal of Educational Statistics, 2, 1-26.

Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6, 34-58.

Rubin, D. B. (1980). Discussion of "Randomization analysis of experimental data in the Fisher randomization test" by Basu. Journal of the American Statistical Association, 75, 591-593.

Rubin coins the phrase stable unit treatment value assumption (SUTVA) in his discussion of an article by D. Basu. For more on SUTVA, see Rubin (1986).

Rubin, D. B. (1981). Estimation in parallel randomized experiments. Journal of Educational Statistics, 6, 377-400.

Rubin, D. B. (1986). Which ifs have causal answers? Discussion of "Statistics and causal inference" by Holland. Journal of the American Statistical Association, 81(396), 961-962.

Rubin, D. B. (1990). Formal modes of statistical inference for causal effects. Journal of Statistical Planning and Inference, 25, 279-292.

The first six sections provide an interesting overview of causal effects and their defining characteristics. The following eight sections describe several modes of inference.

Rubin, D. B. (1991). Practical implications of models of statistical inference for causal effects and the critical role of random assignment. Biometrics, 47, 1213-1234.

Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469), 322-331.

Rubin, D. B. (2007). The design versus the analysis of observational studies for causal effects: Parallels with the design of randomized studies. Statistics in Medicine, 26(1), 20-36.

See comments on this article: Ian Shrier's (2008) letter with Rubin's reply, Ian Shrier's (2009) letter on propensity scores, and Judea Pearl's (2009) letter with Rubin's reply.

Rubin, D. B. (2010). Causal inference. In P. Peterson, E. Baker, & B. McGaw (Eds.), International Encyclopedia of Education (3rd Ed., pp. 66-71). Oxford: Elsevier.

Rubin, D. B. (2010). Reflections stimulated by the comments of Shadish (2010) and West and Thoemmes (2010). Psychological Methods, 15(1), 38-46. https://doi.org/​10.1037/​a0018537

Rubin, D. B., & Thomas, N. (1996). Matching using estimated propensity scores: Relating theory to practice. Biometrics, 52, 249-264.

Schmidt, F. (2010). Detecting and correcting the lies that data tell. Perspectives on Psychological Science, 5(3), 233-242.

Schochet, P. Z., & Burghardt, J. (2007). Using propensity scoring to estimate program-related subgroup impacts in experimental program evaluations. Evaluation Review, 31, 95-120.

Schochet and Burghardt (2007) explain how to address variability in program impacts based on specific program features, which may include implementation fidelity.

Schulz, K. F., & Grimes, D. A. (2002). Sample size slippages in randomized trials: Exclusions and the lost and wayward. Lancet, 359, 781-785.

Schweig, J. D., & Pane, J. F. (2016). Intention-to-treat analysis in partially nested randomized controlled trials with real-world complexity. International Journal of Research & Method in Education, 39(3), 268-286. https://doi.org/​10.1080/​1743727x.2016.1170800

Senn, S. (2013). Seven myths of randomisation in clinical trials. Statistics in Medicine, 32(9), 1439-1450. https://doi.org/​10.1002/​sim.5713

Shadish, W. R. (2010). Campbell and Rubin: A primer and comparison of their approaches to causal inference in field settings. Psychological Methods, 15(1), 3-17. https://doi.org/​10.1037/​a0015916

Shadish, W. R., & Ragsdale, K. (1996). Random versus nonrandom assignment in controlled experiments: Do you get the same answer? Journal of Consulting and Clinical Psychology, 64(6), 1290-1305.

"It is concluded that studies using nonrandom assignment may produce acceptable approximations to results from randomized experiments under some circumstances but that reliance on results from randomized experiments as the gold standard is still well founded" (abstract). Nonetheless, "a slightly degraded randomized experiment may still produce better effect estimates than many quasi-experiments (Shadish & Ragsdale, 1996)" (Shadish, Cook, & Campbell, 2002, p. 229).

Sheiner, L. B. (2002). Is intent-to-treat analysis always (ever) enough? British Journal of Clinical Pharmacology, 54(2), 203-211. https://doi.org/​10.1046/​j.1365-2125.2002.01628.x

Sheiner, L. B., & Rubin, D. B. (1995). Intention-to-treat analysis and the goals of clinical trials. Clinical Pharmacology and Therapeutics, 57, 6-15.

Smith, G. C. S., & Pell, J. P. (2003). Parachute use to prevent death and major trauma related to gravitational challenge: Systematic review of randomised controlled trials. British Medical Journal, 327, 1459-1461. https://doi.org/​10.1136/​bmj.327.7429.1459

Sobel, M. E. (1995). Causal inference in the social and behavioral sciences. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences. New York: Plenum.

Sobel, M. E. (1996). An introduction to causal inference. Sociological Methods and Research, 24(3), 353-379.

Recommended by Michael Sobel for reading on causal inference.

Sobel, M. E. (2008). Identification of causal parameters in randomized studies with mediating variables. Journal of Educational and Behavioral Statistics, 33(2), 230-251.

Recommended by Michael Sobel for reading on intermediate outcomes.

Sobel, M. E. (2009). Causal inference in randomized and non-randomized studies: The definition, identification, and estimation of causal parameters. In R. E. Millsap & A. Maydeu-Olivares (Eds.), The Sage handbook of quantitative methods in psychology (pp. 3-22). Thousand Oaks, CA: Sage.

Stuart, E. A., Perry, D. F., Le, H.-N., & Ialongo, N. S. (2008). Estimating intervention effects of prevention programs: Accounting for noncompliance. Prevention Science, 9, 288-298. https://doi.org/​10.1007/​s11121-008-0104-y.

Ten Have, T. R., Elliott, M. R., Joffe, M., Zanutto, E., & Datto, C. (2004). Causal models for randomized physician encouragement trials in treating primary care depression. Journal of the American Statistical Association, 99, 16-25.

Ten Have, T. R., Joffe, M., Lynch, K., Brown, G., & Maisto, S. (2005). Causal mediation analyses with structural mean models (Biostatistics working paper). University of Pennsylvania.

van den Berg, G. J. (2007). An economic analysis of exclusion restrictions for instrumental variable estimation (IZA Discussion Paper No. 2585). Bonn, Germany: Institute for the Study of Labor. Retrieved from the Institute for the Study of Labor (IZA), http://legacy.iza.org/en/webcontent/​publications/papers

West, S. G., & Thoemmes, F. (2010). Campbell’s and Rubin’s perspectives on causal inference. Psychological Methods, 15(1), 18-37. https://doi.org/​10.1037/​a0015917

Wilcox, A., & Wacholder, S. (2008). Observational data and clinical trials: Narrowing the gap? Epidemiology, 19(6), 765.

Introduction to a debate, captured in issue 19(6) of Epidemiology, about the use of observational data to measure clinical outcomes in the context of postmenopausal hormone therapy and coronary heart disease, studied through the Nurses' Health Study and the Women's Health Initiative. See references to all relevant papers on the General Public Health bibliography page.

Winship, C., & Morgan, S. L. (1999). The estimation of causal effects from observational data. Annual Review of Sociology, 25, 659-706.

Wu, M., & Cheng, P. W. (1999). Why causation need not follow from statistical association: Boundary conditions for the evaluation of generative and preventive causal powers. Psychological Science, 10(2), 92-97.


