The clinician’s guide to interpreting a regression analysis (2024)

Introduction

Clinical studies that investigate factors associated with disease, or treatments intended to improve patient care and clinical practice, often require statistical evaluation of the data. Regression analysis is an important statistical method that is commonly used to determine the relationship between several factors and disease outcomes or to identify relevant prognostic factors for diseases [1].

This editorial will acquaint readers with the basic principles of, and an approach to interpreting results from, two types of regression analysis widely used in ophthalmology: linear and logistic regression.

Linear regression analysis

Linear regression is used to quantify a linear relationship or association between a continuous response (outcome) variable and at least one independent (explanatory) variable by fitting a linear equation to observed data [1]. The variable that the equation solves for, which is the outcome or response of interest, is called the dependent variable [1]. The variable that is used to explain the value of the dependent variable is called the predictor, explanatory, or independent variable [1].

In a linear regression model, the dependent variable must be continuous (e.g. intraocular pressure or visual acuity), whereas the independent variable may be continuous (e.g. age), binary (e.g. sex), or categorical (e.g. age-related macular degeneration stage or diabetic retinopathy severity scale score), and a model may include a combination of these [1].

An analysis of the effect of a single independent variable on, or its association with, a continuous dependent variable is called a simple linear regression [2]. In many circumstances, though, a single independent variable may not be enough to adequately explain the dependent variable. Often it is necessary to control for confounders, and in these situations one can perform a multivariable linear regression to study the effect of, or association with, multiple independent variables on the dependent variable [1, 2]. When several independent variables are included, the regression model estimates the effect or contribution of each independent variable while holding the values of all other independent variables constant [3].

When interpreting the results of a linear regression, there are a few key outputs for each independent variable included in the model:

  1. Estimated regression coefficient: The estimated regression coefficient indicates the direction and strength of the relationship or association between the independent and dependent variables [4]. For a continuous independent variable, the coefficient describes the average change in the dependent variable for each one-unit increase in the independent variable [4]. For instance, if examining the relationship between a continuous predictor variable and intra-ocular pressure (dependent variable), a regression coefficient of 2 means that for every one-unit increase in the predictor, intra-ocular pressure increases by 2 mm Hg on average. If the independent variable is binary or categorical, the one-unit change represents switching from the reference category to the category being compared [4]. For instance, if examining the relationship between a binary predictor variable, such as sex, where ‘female’ is set as the reference category, and intra-ocular pressure (dependent variable), a regression coefficient of 2 means that, on average, males have an intra-ocular pressure that is 2 mm Hg higher than females.

  2. Confidence interval (CI): The CI, typically set at 95%, is a measure of the precision of the coefficient estimate for the independent variable [4]. A wide CI indicates a low level of precision, whereas a narrow CI indicates higher precision [5].

  3. P value: The p value for the regression coefficient indicates whether the relationship between the independent and dependent variables is statistically significant [6]. A p value below the chosen significance level (commonly alpha = 0.05) is taken as evidence against the null hypothesis of no association.
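The three outputs listed above can be sketched for a simple linear regression using only the Python standard library. This is a minimal illustration, not the method of any cited study: the ages and intra-ocular pressures are made up, the function name `simple_linear_regression` is hypothetical, and the CI and p value use a normal approximation rather than the t distribution that statistical packages use, so with this few observations a real analysis would report a wider CI and a larger p value.

```python
import math
from statistics import NormalDist, mean


def simple_linear_regression(x, y, level=0.95):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance with n - 2 degrees of freedom, then the slope's standard error.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_variance = sum(r ** 2 for r in residuals) / (n - 2)
    se = math.sqrt(residual_variance / sxx)

    # ASSUMPTION: normal approximation instead of the t distribution.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(slope / se)))
    return {
        "coefficient": slope,
        "intercept": intercept,
        "ci": (slope - z_crit * se, slope + z_crit * se),
        "p_value": p_value,
    }


# Hypothetical data: age in years and intra-ocular pressure in mm Hg.
age = [40, 50, 60, 70, 80]
iop = [14.0, 15.5, 15.0, 17.0, 17.5]
fit = simple_linear_regression(age, iop)
print(fit)
```

Here the coefficient is the estimated change in intra-ocular pressure, in mm Hg, for each additional year of age.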

Logistic regression analysis

As with linear regression, logistic regression is used to estimate the association between one or more independent variables and a dependent variable [7]. However, the distinguishing feature of logistic regression is that the dependent variable (outcome) must be binary (or dichotomous), meaning that the variable can only take two different values or levels, such as ‘1 versus 0’ or ‘yes versus no’ [2, 7]. The effect size of predictor variables on the dependent variable is best explained using an odds ratio (OR) [2]. ORs are used to compare the relative odds of the occurrence of the outcome of interest, given exposure to the variable of interest [5]. An OR equal to 1 means that the odds of the event in one group are the same as the odds of the event in another group; there is no difference [8]. An OR > 1 implies that one group has higher odds of having the event compared with the reference group, whereas an OR < 1 means that one group has lower odds of having the event compared with the reference group [8]. When interpreting the results of a logistic regression, the key outputs include the OR, CI, and p value for each independent variable included in the model.
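To make the OR concrete, the sketch below computes an unadjusted OR and a Wald-type 95% CI from a 2x2 table of counts. For a logistic regression with a single binary predictor, the exponentiated coefficient equals this OR; a multivariable model additionally adjusts for the other variables, so its ORs will generally differ. The counts and the function name `odds_ratio` are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio(exposed_events, exposed_no_events,
               reference_events, reference_no_events, level=0.95):
    """Odds ratio and Wald confidence interval from a 2x2 table."""
    odds_exposed = exposed_events / exposed_no_events
    odds_reference = reference_events / reference_no_events
    or_estimate = odds_exposed / odds_reference

    # Standard error of log(OR); the interval is built on the log scale.
    se_log_or = math.sqrt(
        1 / exposed_events + 1 / exposed_no_events
        + 1 / reference_events + 1 / reference_no_events
    )
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(or_estimate) - z_crit * se_log_or)
    upper = math.exp(math.log(or_estimate) + z_crit * se_log_or)
    return or_estimate, lower, upper


# Hypothetical counts: 20 of 100 exposed and 10 of 100 reference patients had the event.
or_estimate, lower, upper = odds_ratio(20, 80, 10, 90)
print(or_estimate, lower, upper)
```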

Clinical example

Sen et al. investigated the association between several factors (independent variables) and visual acuity outcomes (dependent variable) in patients receiving anti-vascular endothelial growth factor therapy for macular oedema secondary to central retinal vein occlusion by means of both linear and logistic regression [9]. Multivariable linear regression demonstrated that age (Estimate −0.33, 95% CI −0.48 to −0.19, p < 0.001) was significantly associated with best-corrected visual acuity (BCVA) at 100 weeks at the alpha = 0.05 significance level [9]. The regression coefficient of −0.33 means that, on average, the BCVA at 100 weeks decreases by 0.33 with each additional year of age, with the other variables in the model held constant.
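Because the model is linear, a per-year coefficient can be rescaled to any age difference by multiplication. The sketch below applies this to the age estimate and CI quoted above; the helper name `scale_coefficient` is hypothetical, and the calculation assumes the linear relationship holds across the age range of interest.

```python
def scale_coefficient(estimate, ci_lower, ci_upper, units):
    """Scale a linear regression coefficient and its CI to a change of `units`."""
    scaled_estimate = estimate * units
    bounds = (ci_lower * units, ci_upper * units)
    # A negative multiplier would flip the interval, so sort the bounds.
    return scaled_estimate, min(bounds), max(bounds)


# Age coefficient reported by Sen et al. [9]: -0.33 (95% CI -0.48 to -0.19) per year.
per_decade = scale_coefficient(-0.33, -0.48, -0.19, units=10)
print(per_decade)
```

Under these assumptions, a patient 10 years older would be expected to have a BCVA at 100 weeks about 3.3 lower on average.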

Multivariable logistic regression also demonstrated that age and ellipsoid zone status were significantly associated with achieving a BCVA letter score >70 letters at 100 weeks at the alpha = 0.05 significance level [9]. Patients ≥75 years of age had decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those <50 years of age, since the OR is less than 1 (OR 0.96, 95% CI 0.94 to 0.98, p = 0.001) [9]. Similarly, patients between the ages of 50–74 years also had decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those <50 years of age, since the OR is less than 1 (OR 0.15, 95% CI 0.04 to 0.48, p = 0.001) [9]. Those with a non-intact ellipsoid zone likewise had decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those with an intact ellipsoid zone (OR 0.20, 95% CI 0.07 to 0.56; p = 0.002) [9]. On the other hand, patients with an ungradable/questionable ellipsoid zone had increased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those with an intact ellipsoid zone, since the OR is greater than 1 (OR 2.26, 95% CI 1.14 to 4.48; p = 0.02) [9].

The narrower the CI, the more precise the estimate is; and the smaller the p value (relative to alpha = 0.05), the greater the evidence against the null hypothesis of no effect or association.
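One way to read a reported estimate is to check its direction, the width of its CI, and whether the CI excludes the null value (0 for a linear regression coefficient, 1 for an OR). A 95% CI that excludes the null value typically corresponds to p < 0.05 for the matching test. The sketch below applies this to values quoted from Sen et al. [9]; the helper name `summarize_estimate` is hypothetical.

```python
def summarize_estimate(estimate, ci_lower, ci_upper, null_value):
    """Describe direction, CI width, and whether the CI excludes the null value."""
    if estimate > null_value:
        direction = "higher"
    elif estimate < null_value:
        direction = "lower"
    else:
        direction = "no difference"
    return {
        "direction": direction,
        # For ORs, widths are more naturally compared on the log scale.
        "ci_width": ci_upper - ci_lower,
        "excludes_null": not (ci_lower <= null_value <= ci_upper),
    }


# Values reported by Sen et al. [9].
age_linear = summarize_estimate(-0.33, -0.48, -0.19, null_value=0)   # linear coefficient
ez_not_intact = summarize_estimate(0.20, 0.07, 0.56, null_value=1)   # odds ratio
ez_ungradable = summarize_estimate(2.26, 1.14, 4.48, null_value=1)   # odds ratio
print(age_linear, ez_not_intact, ez_ungradable)
```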

Conclusion

Simply put, linear and logistic regression are useful tools for appreciating the relationship between predictor/explanatory and outcome variables for continuous and dichotomous outcomes, respectively, that can be applied in clinical practice, such as to gain an understanding of risk factors associated with a disease of interest.

References

  1. Schneider A, Hommel G, Blettner M. Linear regression analysis. Dtsch Ärztebl Int. 2010;107:776–82.

  2. Bender R. Introduction to the use of regression models in epidemiology. In: Verma M, editor. Cancer epidemiology. Methods in molecular biology. Humana Press; 2009:179–95.

  3. Schober P, Vetter TR. Confounding in observational research. Anesth Analg. 2020;130:635.

  4. Schober P, Vetter TR. Linear regression in medical research. Anesth Analg. 2021;132:108–9.

  5. Szumilas M. Explaining odds ratios. J Can Acad Child Adolesc Psychiatry. 2010;19:227–9.

  6. Thiese MS, Ronna B, Ott U. P value interpretations and considerations. J Thorac Dis. 2016;8:E928–31.

  7. Schober P, Vetter TR. Logistic regression in medical research. Anesth Analg. 2021;132:365–6.

  8. Zabor EC, Reddy CA, Tendulkar RD, Patil S. Logistic regression in clinical studies. Int J Radiat Oncol Biol Phys. 2022;112:271–7.

  9. Sen P, Gurudas S, Ramu J, Patrao N, Chandra S, Rasheed R, et al. Predictors of visual acuity outcomes after anti-vascular endothelial growth factor treatment for macular edema secondary to central retinal vein occlusion. Ophthalmol Retin. 2021;5:1115–24.

R.E.T.I.N.A. study group

Varun Chaudhary1,2, Mohit Bhandari1,2, Charles C. Wykoff5,6, Sobha Sivaprasad8, Lehana Thabane2,7, Peter Kaiser9, David Sarraf10, Sophie J. Bakri11, Sunir J. Garg12, Rishi P. Singh13,14, Frank G. Holz15, Tien Y. Wong16,17, and Robyn H. Guymer3,4

Author information

Authors and Affiliations

  1. Department of Surgery, McMaster University, Hamilton, ON, Canada

    Sofia Bzovsky, Mohit Bhandari & Varun Chaudhary

  2. Department of Health Research Methods, Evidence & Impact, McMaster University, Hamilton, ON, Canada

    Mark R. Phillips, Lehana Thabane, Mohit Bhandari & Varun Chaudhary

  3. Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia

    Robyn H. Guymer

  4. Department of Surgery, (Ophthalmology), The University of Melbourne, Melbourne, VIC, Australia

    Robyn H. Guymer

  5. Retina Consultants of Texas (Retina Consultants of America), Houston, TX, USA

    Charles C. Wykoff

  6. Blanton Eye Institute, Houston Methodist Hospital, Houston, TX, USA

    Charles C. Wykoff

  7. Biostatistics Unit, St. Joseph’s Healthcare Hamilton, Hamilton, ON, Canada

    Lehana Thabane

  8. NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK

    Sobha Sivaprasad

  9. Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

    Peter Kaiser

  10. Retinal Disorders and Ophthalmic Genetics, Stein Eye Institute, University of California, Los Angeles, CA, USA

    David Sarraf

  11. Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA

    Sophie J. Bakri

  12. The Retina Service at Wills Eye Hospital, Philadelphia, PA, USA

    Sunir J. Garg

  13. Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

    Rishi P. Singh

  14. Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA

    Rishi P. Singh

  15. Department of Ophthalmology, University of Bonn, Bonn, Germany

    Frank G. Holz

  16. Singapore Eye Research Institute, Singapore, Singapore

    Tien Y. Wong

  17. Singapore National Eye Centre, Duke-NUS Medical School, Singapore, Singapore

    Tien Y. Wong

Authors

  1. Sofia Bzovsky

  2. Mark R. Phillips

  3. Robyn H. Guymer

  4. Charles C. Wykoff

  5. Lehana Thabane

  6. Mohit Bhandari

  7. Varun Chaudhary

Consortia

on behalf of the R.E.T.I.N.A. study group

  • Varun Chaudhary
  • Mohit Bhandari
  • Charles C. Wykoff
  • Sobha Sivaprasad
  • Lehana Thabane
  • Peter Kaiser
  • David Sarraf
  • Sophie J. Bakri
  • Sunir J. Garg
  • Rishi P. Singh
  • Frank G. Holz
  • Tien Y. Wong
  • Robyn H. Guymer

Contributions

SB was responsible for writing, critical review and feedback on manuscript. MRP was responsible for conception of idea, critical review and feedback on manuscript. RHG was responsible for critical review and feedback on manuscript. CCW was responsible for critical review and feedback on manuscript. LT was responsible for critical review and feedback on manuscript. MB was responsible for conception of idea, critical review and feedback on manuscript. VC was responsible for conception of idea, critical review and feedback on manuscript.

Corresponding author

Correspondence to Varun Chaudhary.

Ethics declarations

Competing interests

SB: Nothing to disclose. MRP: Nothing to disclose. RHG: Advisory boards: Bayer, Novartis, Apellis, Roche, Genentech Inc. (unrelated to this study). CCW: Consultant: Acucela, Adverum Biotechnologies, Inc, Aerpio, Alimera Sciences, Allegro Ophthalmics, LLC, Allergan, Apellis Pharmaceuticals, Bayer AG, Chengdu Kanghong Pharmaceuticals Group Co, Ltd, Clearside Biomedical, DORC (Dutch Ophthalmic Research Center), EyePoint Pharmaceuticals, Genentech/Roche, GyroscopeTx, IVERIC bio, Kodiak Sciences Inc, Novartis AG, ONL Therapeutics, Oxurion NV, PolyPhotonix, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Santen Pharmaceutical Co, Ltd, and Takeda Pharmaceutical Company Limited; Research funds: Adverum Biotechnologies, Inc, Aerie Pharmaceuticals, Inc, Aerpio, Alimera Sciences, Allergan, Apellis Pharmaceuticals, Chengdu Kanghong Pharmaceutical Group Co, Ltd, Clearside Biomedical, Gemini Therapeutics, Genentech/Roche, Graybug Vision, Inc, GyroscopeTx, Ionis Pharmaceuticals, IVERIC bio, Kodiak Sciences Inc, Neurotech LLC, Novartis AG, Opthea, Outlook Therapeutics, Inc, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Samsung Pharm Co, Ltd, Santen Pharmaceutical Co, Ltd, and Xbrane Biopharma AB (unrelated to this study). LT: Nothing to disclose. MB: Research funds: Pendopharm, Bioventus, Acumed (unrelated to this study). VC: Advisory Board Member: Alcon, Roche, Bayer, Novartis; Grants: Bayer, Novartis (unrelated to this study).

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bzovsky, S., Phillips, M.R., Guymer, R.H. et al. The clinician’s guide to interpreting a regression analysis. Eye 36, 1715–1717 (2022). https://doi.org/10.1038/s41433-022-01949-z

FAQs

How do you interpret the value of a regression analysis?

Interpreting Linear Regression Coefficients

A positive coefficient indicates that as the value of the independent variable increases, the mean of the dependent variable also tends to increase. A negative coefficient suggests that as the independent variable increases, the dependent variable tends to decrease.

How can you determine if a regression model is good enough?

A good way to assess a regression model is to plot the predicted values against the observed values in a holdout set. Ideally, the points lie on the 45-degree line passing through the origin (y = x). The nearer the points are to this line, the better the regression.
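The same comparison can be summarized numerically instead of with a plot. The sketch below, using made-up holdout values and a hypothetical helper name, reports the mean error (systematic over- or under-prediction) and the root mean squared error (the typical vertical distance of the points from the y = x line).

```python
import math


def holdout_agreement(observed, predicted):
    """Summarize how far predicted values fall from the y = x line."""
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,                       # systematic over/under-prediction
        "rmse": math.sqrt(sum(e ** 2 for e in errors) / n),  # typical vertical distance
    }


# Hypothetical holdout set.
observed = [14.0, 16.0, 15.0, 18.0]
predicted = [14.5, 15.5, 15.5, 17.5]
agreement = holdout_agreement(observed, predicted)
print(agreement)
```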

What question does regression analysis answer?

Multiple linear regression analysis helps answer three key types of questions: (1) identifying factors associated with an outcome, (2) predicting effects, and (3) forecasting trends. For the first of these, it estimates the relationship between one continuous dependent variable and two or more independent variables; on its own, an association found this way does not establish cause and effect.

What is a good R2 value for regression?

What qualifies as a “good” R-squared value will depend on the context. In some fields, such as the social sciences, even a relatively low R-squared value, such as 0.5, could be considered relatively strong. In other fields, the standards for a good R-squared reading can be much higher, such as 0.9 or above.

How to interpret regression test results?

The first step in interpreting regression analysis results is to check how well the model fits the data. This means evaluating how closely the predicted values match the observed values, and how much of the variation in the dependent variable is explained by the independent variables.

How do you analyze regression analysis?

Linear regression analysis consists of more than just fitting a straight line through a cloud of data points. It consists of three stages: (1) analyzing the correlation and directionality of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and usefulness of the model.

How can you determine if a regression model is good enough?

Regression lines will be very misleading if your data isn't approximately linear. The best way to check this condition is to make a scatter plot of your data. If the data looks like it can roughly fit a line, you can perform regression.

What is an acceptable regression value?

There is no universal threshold. As one example, an ordinary least squares fit of a multivariable regression model might yield an R-squared of 0.106. In some fields, a model with an R-squared between 0.10 and 0.50 is considered acceptable, provided that some or most of the explanatory variables are statistically significant.

What is a good regression result?

Statisticians say that a regression model fits the data well if the differences between the observations and the predicted values are small and unbiased. Unbiased in this context means that the fitted values are not systematically too high or too low anywhere in the observation space.

What are the two main points of regression analysis?

Typically, a regression analysis is done for one of two purposes: In order to predict the value of the dependent variable for individuals for whom some information concerning the explanatory variables is available, or in order to estimate the effect of some explanatory variable on the dependent variable.

How to know if linear regression is appropriate?

If a linear model is appropriate, the histogram of the residuals should look approximately normal and the scatterplot of the residuals should show random scatter. If we see a curved relationship in the residual plot, the linear model is not appropriate. Another type of residual plot shows the residuals versus the explanatory variable.
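A small numerical sketch of this residual check, assuming made-up data with a deliberately curved relationship and a hypothetical helper `line_residuals`: a straight line fitted to curved data leaves residuals that are positive at both ends and negative in the middle, rather than randomly scattered.

```python
from statistics import mean


def line_residuals(x, y):
    """Residuals (observed minus fitted) from a straight-line least squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5]
curved_y = [xi ** 2 for xi in x]   # hypothetical data: y grows with the square of x
res = line_residuals(x, curved_y)
signs = ["+" if r > 0 else "-" for r in res]
print(res, signs)
```

A systematic run of signs like this is the numerical counterpart of the curved pattern one would see in a residual plot.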

What does regression analysis tell us?

Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables. It can be utilized to assess the strength of the relationship between variables and for modeling the future relationship between them.

What is a good p value in regression?

Hypothesis Testing and P-value

The p value is used to decide whether to reject the null hypothesis of no relationship. A common threshold for the p value is 0.05. Note: using a threshold of 0.05 means that, when the null hypothesis is true, we will falsely reject it 5% of the time. In other words, we accept a 5% risk of falsely concluding that there is a relationship.

What does R-squared 0.2 mean?

An R-squared of 0.2 means that 20% of the variability in the dependent variable is explained by the model. In fields where outcomes are influenced by many factors, this can be considered a very good result. It depends on the complexity of the topic and how many variables are believed to be in play.

How do you interpret the R2 value?

The lowest R-squared is 0, which means that none of the variation in the points is explained by the regression, whereas the highest R-squared is 1, which means that all of the variation is explained by the regression line. For example, an R-squared of 0.85 means that the regression explains 85% of the variation in the y-variable.
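As a minimal sketch of the calculation behind these two extremes, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The data and the function name `r_squared` are illustrative only.

```python
from statistics import mean


def r_squared(observed, fitted):
    """Proportion of variation in the observed values explained by the fitted values."""
    y_bar = mean(observed)
    ss_residual = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_total = sum((o - y_bar) ** 2 for o in observed)
    return 1 - ss_residual / ss_total


observed = [2.0, 4.0, 6.0, 8.0]
perfect_fit = r_squared(observed, observed)             # fitted values equal the data
mean_only = r_squared(observed, [5.0, 5.0, 5.0, 5.0])   # model predicts the mean everywhere
print(perfect_fit, mean_only)
```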

How do you interpret the meaning of the regression coefficients?

Interpreting the Regression Coefficients

The regression coefficients are interpreted as the effect of each variable on the dependent variable, if all of the other explanatory variables are held constant. This is often called “adjusting for” or “controlling for” the other explanatory variables.

What do the values in the regression equation mean?

The simple linear regression line, $\hat{y} = a + bx$, can be interpreted as follows: $\hat{y}$ is the predicted value of $y$, $a$ is the intercept (the value at which the regression line crosses the $y$-axis), and $b$ is the predicted change in $y$ for every one-unit change in $x$.

How do you interpret the T value in a regression model?

The t value is the estimated coefficient divided by its standard error. The greater the magnitude of t, the greater the evidence against the null hypothesis that the coefficient is zero. The closer t is to 0, the weaker the evidence of an association.
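When a paper reports only an estimate and its CI, an approximate standard error and test statistic can be back-calculated. The sketch below does this for the age coefficient quoted earlier from Sen et al. [9]. It assumes a symmetric CI and uses a normal approximation, so the results are rough and are not the authors' own figures; the function name `wald_statistic_from_ci` is hypothetical.

```python
from statistics import NormalDist


def wald_statistic_from_ci(estimate, ci_lower, ci_upper, level=0.95):
    """Back-calculate the standard error and test statistic from a reported CI."""
    # ASSUMPTION: the CI is symmetric and based on a normal approximation.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (ci_upper - ci_lower) / (2 * z_crit)
    statistic = estimate / se
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return se, statistic, p_value


# Age coefficient reported by Sen et al. [9]: -0.33 (95% CI -0.48 to -0.19).
se, statistic, p_value = wald_statistic_from_ci(-0.33, -0.48, -0.19)
print(se, statistic, p_value)
```

The large magnitude of the statistic is consistent with the reported p < 0.001.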

How to interpret a regression line?

The slope of the regression line quantifies the change in the response variable for a one-unit change in the predictor variable. A positive slope indicates a positive relationship between the variables, meaning that as the predictor variable increases, the response variable also tends to increase.
