Separation in Logistic Regression: Causes, Consequences, and Control



Mansournia MA (1); Geroldinger A (2); Greenland S (3, 4); Heinze G (2)
Author affiliations
  1. Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
  2. Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
  3. Department of Epidemiology, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA, United States
  4. Department of Statistics, University of California Los Angeles, Los Angeles, CA, United States

Source: American Journal of Epidemiology. Published: 2018


Abstract

Separation is encountered in regression models with a discrete outcome (such as logistic regression) where the covariates perfectly predict the outcome. It is most frequent under the same conditions that lead to small-sample and sparse-data bias such as presence of a rare outcome, rare exposures, highly correlated covariates, or covariates with strong effects. In theory, separation will produce infinite estimates for some coefficients. In practice, however, separation may be unnoticed or mishandled because of software limits in recognizing and handling the problem and in notifying the user. We discuss causes of separation in logistic regression and describe how common software packages deal with it. We then describe methods that remove separation, focusing on the same penalized-likelihood techniques used to address more general sparse-data problems. These methods improve accuracy, avoid software problems, and allow interpretation as Bayesian analyses with weakly informative priors. We discuss likelihood penalties, including some that can be implemented easily with any software package, and their relative advantages and disadvantages. We provide an illustration of ideas and methods using data from a case-control study of contraceptive practices and urinary tract infection. © The Author(s) 2018. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved.
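
As a rough illustration of the abstract's central point, the sketch below fits a one-coefficient logistic model to a toy data set in which the covariate perfectly predicts the outcome. Everything in it is an assumption made for illustration: the data, the function names (`sigmoid`, `fit_slope`), the no-intercept model, the plain gradient-ascent fitting, and the quadratic (ridge-type) penalty, which is a generic penalty chosen for simplicity and not necessarily one of the penalties the authors discuss. Without a penalty the estimate keeps growing as the fitting runs longer, which is how an infinite maximum likelihood estimate shows up in practice; with the penalty it settles at a finite value.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_slope(x, y, penalty=0.0, steps=200, lr=0.1):
    """Gradient ascent for the no-intercept model logit P(y=1) = beta * x.

    Maximizes log-likelihood - (penalty / 2) * beta**2.
    penalty=0.0 gives ordinary maximum likelihood.
    """
    beta = 0.0
    for _ in range(steps):
        # Score (gradient of the log-likelihood) for a single coefficient.
        grad = sum(xi * (yi - sigmoid(beta * xi)) for xi, yi in zip(x, y))
        # Gradient of the quadratic penalty pulls beta toward zero.
        grad -= penalty * beta
        beta += lr * grad
    return beta


# Hypothetical toy data with complete separation: y = 1 exactly when x > 0.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [0, 0, 0, 1, 1, 1]

# Unpenalized fit: the estimate depends on how long the optimizer runs.
beta_ml_short = fit_slope(x, y, penalty=0.0, steps=200)
beta_ml_long = fit_slope(x, y, penalty=0.0, steps=20000)

# Penalized fit: the estimate stabilizes at a finite value.
beta_pen_short = fit_slope(x, y, penalty=1.0, steps=200)
beta_pen_long = fit_slope(x, y, penalty=1.0, steps=20000)

print("unpenalized:", round(beta_ml_short, 3), "->", round(beta_ml_long, 3))
print("penalized:  ", round(beta_pen_short, 3), "->", round(beta_pen_long, 3))
```

The unpenalized result depends on the number of iterations, which is one way separation can go unnoticed: software that stops at a fixed iteration limit or a loose convergence tolerance may report a large but finite coefficient with no warning.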