The clinician’s guide to interpreting a regression analysis (2024)

Introduction

Clinical studies that investigate factors associated with disease, or treatments intended to improve patient care and clinical practice, often require statistical evaluation of the data. Regression analysis is an important statistical method commonly used to determine the relationship between several factors and disease outcomes, or to identify relevant prognostic factors for diseases [1].

This editorial will acquaint readers with the basic principles of, and an approach to interpreting results from, two types of regression analysis widely used in ophthalmology: linear and logistic regression.

Linear regression analysis

Linear regression quantifies a linear relationship or association between a continuous outcome (response) variable and at least one independent (explanatory) variable by fitting a linear equation to observed data [1]. The variable that the equation solves for, which is the outcome or response of interest, is called the dependent variable [1]. The variable used to explain the value of the dependent variable is called the predictor, explanatory, or independent variable [1].

In a linear regression model, the dependent variable must be continuous (e.g. intraocular pressure or visual acuity), whereas the independent variable may be continuous (e.g. age), binary (e.g. sex), categorical (e.g. age-related macular degeneration stage or diabetic retinopathy severity scale score), or a combination of these [1].

An analysis that investigates the effect of, or association with, a single independent variable on a continuous dependent variable is called a simple linear regression [2]. In many circumstances, though, a single independent variable may not be enough to adequately explain the dependent variable. It is often necessary to control for confounders, and in these situations one can perform a multivariable linear regression to study the effect of, or association with, multiple independent variables on the dependent variable [1, 2]. When numerous independent variables are included, the regression model estimates the effect or contribution of each independent variable while holding the values of all other independent variables constant [3].
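The multivariable model described above can be written compactly. The notation below (y for the dependent variable, x_1 to x_k for the independent variables, beta for the coefficients, and epsilon for the error term) is assumed here for illustration and does not appear in the original text:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon
```

In this form, each coefficient \beta_j is the expected change in y for a one-unit increase in x_j when the other independent variables are held constant.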

When interpreting the results of a linear regression, there are a few key outputs for each independent variable included in the model:

1. Estimated regression coefficient: The estimated regression coefficient indicates the direction and strength of the relationship or association between the independent and dependent variables [4]. Specifically, for a continuous independent variable, the regression coefficient describes the change in the dependent variable for each one-unit change in the independent variable [4]. For instance, if examining the relationship between a continuous predictor variable and intra-ocular pressure (dependent variable), a regression coefficient of 2 means that for every one-unit increase in the predictor, there is, on average, a two-unit increase in intra-ocular pressure. If the independent variable is binary or categorical, then the one-unit change represents switching from the reference category to another category [4]. For instance, if examining the relationship between a binary predictor variable, such as sex, where ‘female’ is set as the reference category, and intra-ocular pressure (dependent variable), a regression coefficient of 2 means that, on average, males have an intra-ocular pressure that is 2 mm Hg higher than females.
2. Confidence interval (CI): The CI, typically set at 95%, is a measure of the precision of the coefficient estimate of the independent variable [4]. A wide CI indicates a low level of precision, whereas a narrow CI indicates higher precision [5].
3. P value: The p value for the regression coefficient indicates whether the relationship between the independent and dependent variables is statistically significant [6].
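The three outputs listed above can be illustrated with a minimal sketch in Python. The data are hypothetical, the function name `simple_linear_regression` is invented for this example, and the CI and p value use a normal approximation; statistical software would normally use the t distribution, which gives slightly wider intervals in small samples.

```python
import math
from statistics import NormalDist, mean


def simple_linear_regression(x, y, level=0.95):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the slope (regression coefficient), its standard error,
    an approximate CI, and an approximate two-sided p value.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)

    # Normal approximation (an assumption of this sketch, see lead-in).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (slope - z * se, slope + z * se)
    p = 2 * (1 - NormalDist().cdf(abs(slope / se)))
    return {"slope": slope, "intercept": intercept, "se": se, "ci": ci, "p": p}


# Hypothetical data: age in years and intra-ocular pressure in mm Hg.
age = [40, 45, 50, 55, 60, 65, 70, 75]
iop = [14.1, 14.8, 15.2, 15.9, 16.1, 17.0, 17.2, 18.1]

result = simple_linear_regression(age, iop)
print(f"coefficient per year of age: {result['slope']:.3f}")
print(f"approximate 95% CI: {result['ci'][0]:.3f} to {result['ci'][1]:.3f}")
print(f"approximate p value: {result['p']:.4f}")
```

In this made-up data set the coefficient is positive, so each additional year of age is associated with a higher average intra-ocular pressure, and the CI and p value describe how precisely that slope is estimated.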

Logistic regression analysis

As with linear regression, logistic regression is used to estimate the association between one or more independent variables and a dependent variable [7]. However, the distinguishing feature of logistic regression is that the dependent variable (outcome) must be binary (or dichotomous), meaning that the variable can take only two different values or levels, such as ‘1 versus 0’ or ‘yes versus no’ [2, 7]. The effect size of predictor variables on the dependent variable is best explained using an odds ratio (OR) [2]. ORs are used to compare the relative odds of the occurrence of the outcome of interest, given exposure to the variable of interest [5]. An OR equal to 1 means that the odds of the event in one group are the same as the odds of the event in another group; there is no difference [8]. An OR > 1 implies that one group has higher odds of having the event compared with the reference group, whereas an OR < 1 means that one group has lower odds of having the event compared with the reference group [8]. When interpreting the results of a logistic regression, the key outputs include the OR, CI, and p value for each independent variable included in the model.
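To make the odds ratio concrete, the sketch below computes an OR and an approximate 95% CI from a 2x2 table of counts, and shows that a logistic regression coefficient converts to an OR by exponentiation. The counts are hypothetical, the function names are invented for this example, and the CI uses the standard log-scale (Wald) approximation rather than output from a fitted model.

```python
import math
from statistics import NormalDist


def odds_ratio_2x2(a, b, c, d, level=0.95):
    """Odds ratio and log-scale (Wald) CI for a 2x2 table.

    a: exposed with event      b: exposed without event
    c: reference with event    d: reference without event
    """
    odds_ratio = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, (lower, upper)


def coefficient_to_or(beta):
    """A logistic regression coefficient is a log odds ratio."""
    return math.exp(beta)


# Hypothetical counts: 20 of 100 exposed patients and 10 of 100
# reference patients experience the event.
odds_ratio, ci = odds_ratio_2x2(20, 80, 10, 90)
print(f"OR = {odds_ratio:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")

# A coefficient of 0 corresponds to an OR of 1 (no difference in odds).
print(coefficient_to_or(0.0))
```

An OR above 1 here means the exposed group has higher odds of the event than the reference group; whether that difference is statistically significant depends on whether the CI excludes 1.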

Clinical example

Sen et al. investigated the association between several factors (independent variables) and visual acuity outcomes (dependent variable) in patients receiving anti-vascular endothelial growth factor therapy for macular oedema secondary to central retinal vein occlusion by means of both linear and logistic regression [9]. Multivariable linear regression demonstrated that age (Estimate −0.33, 95% CI −0.48 to −0.19, p < 0.001) was significantly associated with best-corrected visual acuity (BCVA) at 100 weeks at the alpha = 0.05 significance level [9]. The regression coefficient of −0.33 means that, on average, the BCVA at 100 weeks decreases by 0.33 with each additional year of age.
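As a worked illustration of the reported coefficient, consider two otherwise similar patients whose ages differ by 10 years. This comparison is hypothetical and is not one reported by Sen et al.; it simply scales the per-year coefficient:

```latex
\Delta \mathrm{BCVA} = -0.33 \times 10 = -3.3
```

Under the fitted linear model, the older patient would be expected to have a BCVA at 100 weeks about 3.3 lower on average, assuming the relationship is linear across that age range and the other variables in the model are held constant.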

Multivariable logistic regression also demonstrated that age and ellipsoid zone status were significantly associated with achieving a BCVA letter score >70 letters at 100 weeks at the alpha = 0.05 significance level. Patients ≥75 years of age had decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared with those <50 years of age, since the OR is less than 1 (OR 0.96, 95% CI 0.94 to 0.98, p = 0.001) [9]. Similarly, patients between the ages of 50 and 74 years also had decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared with those <50 years of age, since the OR is less than 1 (OR 0.15, 95% CI 0.04 to 0.48, p = 0.001) [9]. In addition, those with a non-intact ellipsoid zone had decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared with those with an intact ellipsoid zone (OR 0.20, 95% CI 0.07 to 0.56; p = 0.002). On the other hand, patients with an ungradable/questionable ellipsoid zone had increased odds of achieving a BCVA letter score >70 letters at 100 weeks compared with those with an intact ellipsoid zone, since the OR is greater than 1 (OR 2.26, 95% CI 1.14 to 4.48; p = 0.02) [9].
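The reading of each OR above follows a simple rule that can be sketched in Python. The OR and CI values are those quoted in the text from Sen et al. [9]; the function name `interpret_or` and the rule that a 95% CI excluding 1 corresponds to significance at the 0.05 level are assumptions of this illustration.

```python
def interpret_or(odds_ratio, ci_low, ci_high):
    """Describe an OR relative to the reference group.

    Assumption: the association is treated as statistically significant
    at the 0.05 level when the 95% CI excludes 1.
    """
    if odds_ratio > 1:
        direction = "higher odds"
    elif odds_ratio < 1:
        direction = "lower odds"
    else:
        direction = "no difference in odds"
    significant = not (ci_low <= 1 <= ci_high)
    return direction, significant


# OR, lower CI bound, upper CI bound as quoted in the text.
reported = {
    "age >=75 vs <50 years": (0.96, 0.94, 0.98),
    "age 50-74 vs <50 years": (0.15, 0.04, 0.48),
    "non-intact vs intact ellipsoid zone": (0.20, 0.07, 0.56),
    "ungradable/questionable vs intact ellipsoid zone": (2.26, 1.14, 4.48),
}

for label, (odds_ratio, low, high) in reported.items():
    direction, significant = interpret_or(odds_ratio, low, high)
    print(f"{label}: {direction}, CI excludes 1: {significant}")
```

Each comparison is always relative to its reference group, so the direction of the OR only has meaning once the reference category is known.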

The narrower the CI, the more precise the estimate is; and the smaller the p value (relative to alpha = 0.05), the greater the evidence against the null hypothesis of no effect or association.

Conclusion

Simply put, linear and logistic regression are useful tools for understanding the relationship between predictor (explanatory) variables and continuous or dichotomous outcomes, respectively. Both can be applied in clinical practice, for example to gain an understanding of risk factors associated with a disease of interest.

References

Schneider A, Hommel G, Blettner M. Linear Regression. Anal Dtsch Ärztebl Int. 2010;107:776–82.
Bender R. Introduction to the use of regression models in epidemiology. In: Verma M, editor. Cancer epidemiology. Methods in molecular biology. Humana Press; 2009:179–95.
Schober P, Vetter TR. Confounding in observational research. Anesth Analg. 2020;130:635.
Schober P, Vetter TR. Linear regression in medical research. Anesth Analg. 2021;132:108–9.
Szumilas M. Explaining odds ratios. J Can Acad Child Adolesc Psychiatry. 2010;19:227–9.
Thiese MS, Ronna B, Ott U. P value interpretations and considerations. J Thorac Dis. 2016;8:E928–31.
Schober P, Vetter TR. Logistic regression in medical research. Anesth Analg. 2021;132:365–6.
Zabor EC, Reddy CA, Tendulkar RD, Patil S. Logistic regression in clinical studies. Int J Radiat Oncol Biol Phys. 2022;112:271–7.
Sen P, Gurudas S, Ramu J, Patrao N, Chandra S, Rasheed R, et al. Predictors of visual acuity outcomes after anti-vascular endothelial growth factor treatment for macular edema secondary to central retinal vein occlusion. Ophthalmol Retin. 2021;5:1115–24.

R.E.T.I.N.A. study group

Varun Chaudhary^1,2, Mohit Bhandari^1,2, Charles C. Wykoff^5,6, Sobha Sivaprasad^8, Lehana Thabane^2,7, Peter Kaiser^9, David Sarraf^10, Sophie J. Bakri^11, Sunir J. Garg^12, Rishi P. Singh^13,14, Frank G. Holz^15, Tien Y. Wong^16,17, and Robyn H. Guymer^3,4

Author information

Authors and Affiliations

Department of Surgery, McMaster University, Hamilton, ON, Canada
Sofia Bzovsky, Mohit Bhandari & Varun Chaudhary
Department of Health Research Methods, Evidence & Impact, McMaster University, Hamilton, ON, Canada
Mark R. Phillips, Lehana Thabane, Mohit Bhandari & Varun Chaudhary
Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia
Robyn H. Guymer
Department of Surgery, (Ophthalmology), The University of Melbourne, Melbourne, VIC, Australia
Robyn H. Guymer
Retina Consultants of Texas (Retina Consultants of America), Houston, TX, USA
Charles C. Wykoff
Blanton Eye Institute, Houston Methodist Hospital, Houston, TX, USA
Charles C. Wykoff
Biostatistics Unit, St. Joseph’s Healthcare Hamilton, Hamilton, ON, Canada
Lehana Thabane
NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK
Sobha Sivaprasad
Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA
Peter Kaiser
Retinal Disorders and Ophthalmic Genetics, Stein Eye Institute, University of California, Los Angeles, CA, USA
David Sarraf
Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA
Sophie J. Bakri
The Retina Service at Wills Eye Hospital, Philadelphia, PA, USA
Sunir J. Garg
Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA
Rishi P. Singh
Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA
Rishi P. Singh
Department of Ophthalmology, University of Bonn, Bonn, Germany
Frank G. Holz
Singapore Eye Research Institute, Singapore, Singapore
Tien Y. Wong
Singapore National Eye Centre, Duke-NUS Medical School, Singapore, Singapore
Tien Y. Wong

Authors

Sofia Bzovsky
Mark R. Phillips
Robyn H. Guymer
Charles C. Wykoff
Lehana Thabane
Mohit Bhandari
Varun Chaudhary

Consortia

on behalf of the R.E.T.I.N.A. study group

Varun Chaudhary, Mohit Bhandari, Charles C. Wykoff, Sobha Sivaprasad, Lehana Thabane, Peter Kaiser, David Sarraf, Sophie J. Bakri, Sunir J. Garg, Rishi P. Singh, Frank G. Holz, Tien Y. Wong & Robyn H. Guymer

Contributions

SB was responsible for writing, critical review and feedback on manuscript. MRP was responsible for conception of idea, critical review and feedback on manuscript. RHG was responsible for critical review and feedback on manuscript. CCW was responsible for critical review and feedback on manuscript. LT was responsible for critical review and feedback on manuscript. MB was responsible for conception of idea, critical review and feedback on manuscript. VC was responsible for conception of idea, critical review and feedback on manuscript.

Corresponding author

Correspondence to Varun Chaudhary.

Ethics declarations

Competing interests

SB: Nothing to disclose. MRP: Nothing to disclose. RHG: Advisory boards: Bayer, Novartis, Apellis, Roche, Genentech Inc. (unrelated to this study). CCW: Consultant: Acucela, Adverum Biotechnologies, Inc, Aerpio, Alimera Sciences, Allegro Ophthalmics, LLC, Allergan, Apellis Pharmaceuticals, Bayer AG, Chengdu Kanghong Pharmaceutical Group Co, Ltd, Clearside Biomedical, DORC (Dutch Ophthalmic Research Center), EyePoint Pharmaceuticals, Genentech/Roche, GyroscopeTx, IVERIC bio, Kodiak Sciences Inc, Novartis AG, ONL Therapeutics, Oxurion NV, PolyPhotonix, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Santen Pharmaceutical Co, Ltd, and Takeda Pharmaceutical Company Limited; Research funds: Adverum Biotechnologies, Inc, Aerie Pharmaceuticals, Inc, Aerpio, Alimera Sciences, Allergan, Apellis Pharmaceuticals, Chengdu Kanghong Pharmaceutical Group Co, Ltd, Clearside Biomedical, Gemini Therapeutics, Genentech/Roche, Graybug Vision, Inc, GyroscopeTx, Ionis Pharmaceuticals, IVERIC bio, Kodiak Sciences Inc, Neurotech LLC, Novartis AG, Opthea, Outlook Therapeutics, Inc, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Samsung Pharm Co, Ltd, Santen Pharmaceutical Co, Ltd, and Xbrane Biopharma AB (unrelated to this study). LT: Nothing to disclose. MB: Research funds: Pendopharm, Bioventus, Acumed (unrelated to this study). VC: Advisory Board Member: Alcon, Roche, Bayer, Novartis; Grants: Bayer, Novartis (unrelated to this study).

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bzovsky, S., Phillips, M.R., Guymer, R.H. et al. The clinician’s guide to interpreting a regression analysis. Eye 36, 1715–1717 (2022). https://doi.org/10.1038/s41433-022-01949-z

Received: 08 January 2022
Revised: 17 January 2022
Accepted: 18 January 2022
Published: 31 January 2022
Issue Date: September 2022
DOI: https://doi.org/10.1038/s41433-022-01949-z

FAQs

How do you interpret the value of a regression analysis?

Interpreting Linear Regression Coefficients

A positive coefficient indicates that as the value of the independent variable increases, the mean of the dependent variable also tends to increase. A negative coefficient suggests that as the independent variable increases, the dependent variable tends to decrease.

How can you determine if a regression model is good enough?

A useful way to assess a regression model is to plot the predicted values against the observed values in a holdout set. Ideally, the points lie on the 45-degree line passing through the origin (y = x). The nearer the points are to this line, the better the regression.

What question does regression analysis answer?

Multiple linear regression analysis helps answer three key types of questions: (1) identifying potential causes, (2) predicting effects, and (3) forecasting trends. Identifying causes: it estimates the association between one continuous dependent variable and two or more independent variables. Whether such an association reflects cause and effect depends on the study design and on how well confounding is controlled, not on the regression itself.

What is a good R2 value for regression?

What qualifies as a “good” R-squared value will depend on the context. In some fields, such as the social sciences, even a relatively low R-squared value, such as 0.5, could be considered relatively strong. In other fields, the standards for a good R-squared reading can be much higher, such as 0.9 or above.

How to interpret regression test results?

The first step in interpreting regression analysis results is to check how well the model fits the data. This means evaluating how closely the predicted values match the observed values, and how much of the variation in the dependent variable is explained by the independent variables.

How do you analyze regression analysis?

Linear regression analysis consists of more than just fitting a straight line through a cloud of data points. It consists of three stages: (1) analyzing the correlation and directionality of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and usefulness of the model.

How can you check whether linear regression suits your data?

Regression lines will be very misleading if your data isn't approximately linear. The best way to check this condition is to make a scatter plot of your data. If the data looks like it can roughly fit a line, you can perform regression.

What is an acceptable regression value?

As one example, estimating a multivariable regression model by ordinary least squares might yield an R-squared of 0.106. A model with an R-squared between 0.10 and 0.50 can be acceptable provided that some or most of the explanatory variables are statistically significant.

What is a good regression result?

Statisticians say that a regression model fits the data well if the differences between the observations and the predicted values are small and unbiased. Unbiased in this context means that the fitted values are not systematically too high or too low anywhere in the observation space.

What are the two main points of regression analysis?

Typically, a regression analysis is done for one of two purposes: In order to predict the value of the dependent variable for individuals for whom some information concerning the explanatory variables is available, or in order to estimate the effect of some explanatory variable on the dependent variable.

How to know if linear regression is appropriate?

If a linear model is appropriate, the histogram of the residuals should look approximately normal and the scatterplot of residuals should show random scatter. If we see a curved relationship in the residual plot, the linear model is not appropriate. Another type of residual plot shows the residuals versus the explanatory variable.

What does regression analysis tell us?

Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables. It can be utilized to assess the strength of the relationship between variables and for modeling the future relationship between them.

What is a good p value in regression?

Hypothesis Testing and P-value

The P-value is used to decide whether to reject the null hypothesis of no relationship. A common threshold for the P-value is 0.05. Note: using a threshold of 0.05 means that, when the null hypothesis is actually true, we will falsely reject it about 5% of the time. In other words, we accept a 5% risk of concluding that a relationship exists when there is none.

What does R-squared 0.2 mean?

In some fields, an R-squared of 0.2, or 20% of the variability explained by the model, would be fantastic. It depends on the complexity of the topic and how many variables are believed to be in play.

How do you interpret the R2 value?

The lowest R-squared is 0, meaning that none of the variation in the points is explained by the regression, whereas the highest R-squared is 1, meaning that all the points lie on the regression line. For example, an R-squared of 0.85 means that the regression explains 85% of the variation in our y-variable.
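The definition above can be sketched in a few lines of Python. The function name `r_squared` and the example values are hypothetical. Note that the 0 to 1 range holds for a least-squares line with an intercept evaluated on the data it was fitted to; computed on separate holdout data, this formula can give negative values.

```python
def r_squared(observed, predicted):
    """Proportion of variation in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    # Unexplained variation: squared differences between observed and predicted.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation: squared differences between observed values and their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical values for illustration only.
observed = [1.0, 2.0, 3.0]
print(r_squared(observed, [1.0, 2.0, 3.0]))  # perfect predictions
print(r_squared(observed, [2.0, 2.0, 2.0]))  # predicting the mean every time
```

Perfect predictions leave no unexplained variation, while always predicting the mean explains none of it, which correspond to the two ends of the scale described above.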

How do you interpret the meaning of the regression coefficients?

Interpreting the Regression Coefficients

The regression coefficients are interpreted as the effect of each variable on the dependent variable, if all of the other explanatory variables are held constant. This is often called “adjusting for” or “controlling for” the other explanatory variables.

What do the values in the regression equation mean?

The simple linear regression line, $\hat{y} = a + bx$, can be interpreted as follows: $\hat{y}$ is the predicted value of $y$, $a$ is the intercept and predicts where the regression line will cross the $y$-axis, and $b$ predicts the change in $y$ for every unit change in $x$.