



RESEARCH SERIES NO. 25 

Year: 2022, Volume: 23, Issue: 1, Pages: 24-28

Reorientation and simple understanding of regression analysis for student nurses: When and why to use
Anindita Mandal^{1}, Suresh K Sharma^{2}
^{1} PhD Scholar, College of Nursing, All India Institute of Medical Sciences, Rishikesh, Uttarakhand, India ^{2} Professor & Principal, College of Nursing, All India Institute of Medical Sciences, Jodhpur, Rajasthan, India
Date of Submission: 18-Jul-2021
Date of Decision: 23-Mar-2022
Date of Acceptance: 24-Mar-2022
Date of Web Publication: 28-Apr-2022
Correspondence Address: Ms. Anindita Mandal, College of Nursing, All India Institute of Medical Sciences, Rishikesh - 249 203, Uttarakhand, India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/ijcn.ijcn_65_21
With advances in science and technology and the growing demand for emergency medical care, estimating unknown factors of interest has become vital in biomedical science and healthcare. Predictions based on a known variable can be used to make decisions, correct errors, prevent risks or achieve goals, for example, estimating the items and devices needed, the recovery time of a patient, or the profit or loss of an organization. One statistical method suited to this purpose is regression analysis, which can identify the influence of one variable on another and predict an unknown value from a known one. This narrative review of regression analysis is based on e-literature, textbooks, unpublished information and the practical experience of the authors. It should help researchers update their practical knowledge of regression analysis and apply it where needed.
Keywords: Linear regression, logistic regression, multiple regression, regression analysis
How to cite this article: Mandal A, Sharma SK. Reorientation and simple understanding of regression analysis for student nurses: When and why to use. Indian J Cont Nsg Edn 2022;23:24-8
How to cite this URL: Mandal A, Sharma SK. Reorientation and simple understanding of regression analysis for student nurses: When and why to use. Indian J Cont Nsg Edn [serial online] 2022 [cited 2022 Dec 7];23:24-8. Available from: https://www.ijcne.org/text.asp?2022/23/1/24/344275
Introduction   
Regression analysis is a powerful statistical method that examines the relationship between two or more variables of interest^{[1]} and, in particular, determines the influence of one or more independent variables on a dependent variable; for example, the influence of daily physical exercise and a fat-free diet on the serum cholesterol of a diabetic patient. It estimates or predicts the unknown value of one variable from the known value of a related variable.^{[2]} For instance, the level of nursing care in an intensive care unit (ICU) can be predicted from the number of patients with bedsores, the incidence of falls, missed medication doses, the number of patients developing thrombophlebitis, etc. Regression can also identify relevant risk factors and calculate a risk score for each prediction. It was long a popular statistical method in finance, marketing, investment and other disciplines, and it is now increasingly applied in health and biosciences.
Purpose of The Review   
This review gives the reader basic knowledge about regression and, through practical examples, makes the reader aware of common errors of interpretation. Both the opportunities for applying regression analysis and its limitations are presented. The review should help the reader judge whether the method has been used correctly and the results interpreted appropriately in a given study.
Methods/Search Strategy   
The authors searched PubMed, MEDLINE, Embase, UpToDate, ClinicalKey and textbooks on biostatistics (last search: 15th February 2021) using MeSH terms and keywords such as regression, linear regression, logistic regression and multiple regression, in order to give a brief introduction to regression models, the broad types of regression with illustrative examples, typical reasons for performing regression analysis, and how the results should be interpreted. This narrative review is based on selected textbooks of biostatistics, a selective review of the e-literature (case studies, original articles and review articles), and the reviewers' own experience.
How to Define Regression with Its Basic Terms?   
The term 'regression' was coined by Francis Galton in the 19th century to describe a biological phenomenon.^{[1]} Ya-Lun Chou defined it as follows: 'Analysis attempts to establish the nature of the relationship between the variables i.e., to study the functional relationship between the variables and thereby provide a mechanism for prediction, or forecasting'.
To understand regression analysis fully, it is essential to understand the two types of variables:
• Dependent variable: the main factor that one is trying to understand or predict
• Independent variables: one or more factors that are known, or hypothesized, to have an impact on the dependent variable.^{[2]}
Example 1: The College of Nursing, AIIMS Rishikesh, held a workshop on Statistical Analysis for undergraduate students in March 2019. In preparation for the next workshop, the AIIMS administration now wants to measure the level of satisfaction among the attendees and identify the variables that influenced it. Any of the following variables could affect an attendee's level of satisfaction:
• The topics covered in the individual sessions of the event
• The total duration of the sessions
• Provision of food or catering services at the event
• Registration fees for the event
• Presence of motivating personalities
• Provision of current information, new technologies and/or hands-on practice sessions.
By applying regression analysis to this survey data, the AIIMS administration can determine whether these factors had an impact on attendees' satisfaction and, if so, to what extent. The analysis also shows which elements of the sessions were well received and where future effort is needed, so that attendees are more satisfied at the next event. Regression helps to determine which variables matter most, which can be ignored, and how strongly each one influences satisfaction.
In the above example,
• The dependent variable is the attendees' satisfaction with the event
• The independent variables are the topics covered, the duration of the sessions, the provision of food, the registration fees, the presence of motivating personalities and the provision of current information, new technologies and hands-on practice.
Building a Regression Model: How Does it Work?   
Example 2: The nurse manager of a gastroenterology operating room wants to estimate the number of sutures to order for the next month from the usual number of operations performed there. To conduct a regression analysis, the nurse manager would collect past data on the variables in question. If there are 80 surgeries per month, how many sutures would be needed? What if there are 90?
The y-axis shows the quantity of sutures, the item in which the nurse manager is interested. The x-axis shows the known factor, the number of surgeries per month. Each black dot represents 1 month of data: the number of operations performed in that month and the quantity of sutures used in the same month [Figure 1].
Steps to Make a Diagram   
• Step 1: On graph paper or plain paper, draw two lines: a horizontal line for the x-axis and a vertical line for the y-axis^{[2]}
• Step 2: Along the horizontal x-axis (from left to right), place the independent variable (operations per month)
• Step 3: Along the vertical y-axis (from bottom to top), place the dependent variable (number of sutures used per month).
A statistics program such as Microsoft Excel, SPSS or any other statistical package can then draw a blue line that runs roughly through the middle of all the data points. This is the line of best fit^{[3]} and is called the regression line. It helps to answer, with a degree of confidence, how many sutures are typically needed for a given number of operations. In addition to the line, the statistical program also outputs a mathematical formula:
Y = 50 + 4X + error/residual
A line of best fit always comes with an error term because, in real life, independent variables are almost never perfect predictors of the dependent variable; regression is not perfectly precise. The error term reflects the uncertainty of the formula: the smaller the error, the more accurately the known variable predicts the response variable. Setting the error term aside, the researcher then focuses on the model itself.^{[2]}
Y = 50 + 4X, which has the general form Y = a + bX
Here a = 50 is the intercept: if no operations were planned for the next month (X = 0), the predicted stock would be 50 sutures. b = 4 is the slope: according to past data, each operation needs an average of four sutures, so for every one-unit increase in X, the value of Y goes up by four.
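To make the arithmetic concrete, here is a minimal Python sketch of how a statistics package arrives at a and b for a single predictor, using the ordinary least-squares formulas b = S_xy / S_xx and a = mean(y) - b × mean(x). The five monthly data points are invented for illustration (they are not the data behind Figure 1) and were chosen so that the fit reproduces Y = 50 + 4X exactly; the function name `fit_simple_ols` is hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b = s_xy / s_xx          # slope: change in Y per one-unit change in X
    a = y_mean - b * x_mean  # intercept: where the line crosses the y-axis
    return a, b

# Hypothetical past data (one pair per month), not the data behind Figure 1
operations = [70, 75, 80, 85, 90]
sutures = [332, 346, 374, 386, 412]

a, b = fit_simple_ols(operations, sutures)
print(f"Y = {a:.0f} + {b:.0f}X")  # Y = 50 + 4X

for planned in (80, 90):
    # Point estimate from the fitted line; the error term is ignored here
    print(planned, "operations ->", a + b * planned, "sutures")
```

With these made-up numbers the sketch prints point estimates of 370 sutures for 80 operations and 410 for 90, answering the two questions in Example 2. Because the error term is ignored, these are central estimates rather than guaranteed quantities.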
The example above uses only one known variable to predict the factor of interest, here the number of sutures to be ordered. Typically, several independent variables influence the quantity being estimated, and their combined effect can also be studied with regression analysis.^{[2]} The area of the surgical wound, the expertise of the surgeon and the type of incision are a few other independent variables that could predict the number of sutures needed.
Y = a + b_1 X_1 + b_2 X_2 + … + b_k X_k
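The same least-squares idea extends to several predictors. The sketch below assumes NumPy is available and uses `numpy.linalg.lstsq` on a design matrix whose first column of ones carries the intercept a. The data (operations and a hypothetical wound-area measure) are invented and noise-free, purely so that the recovered coefficients can be checked against the values used to generate them; `fit_multiple_ols` is a hypothetical name.

```python
import numpy as np

def fit_multiple_ols(X, y):
    """Least-squares estimates [a, b_1, ..., b_k] for Y = a + b_1 X_1 + ... + b_k X_k."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column of 1s carries the intercept a
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical months: [number of operations, mean wound area per operation]
X = [[70, 10], [75, 14], [80, 9], [85, 15], [90, 12], [95, 8]]
y = [20 + 3 * ops + 0.5 * area for ops, area in X]  # noise-free, for illustration only

coef = fit_multiple_ols(X, y)
print(np.round(coef, 3))  # recovers a = 20, b_1 = 3, b_2 = 0.5
```

With real data the outcome would include an error term, and the estimates would only approximate the underlying coefficients.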
Graphical Representation of the Regression Model   
In a regression model, the difference between an observed value and the corresponding predicted value on the line of best fit is called the residual.^{[4]} It is the vertical distance between the line of best fit and each observation, shown by the black lines. Here, 'a' is a numerical constant, the intercept, i.e., the point at which the line of best fit crosses the y-axis, and 'b' (red line) is the slope of the line of best fit, called the regression coefficient [Figure 2].
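Residuals are simply observed minus predicted values. The short Python sketch below computes them for a hypothetical set of monthly data against the line Y = 50 + 4X. Least squares chooses a and b so that the sum of the squared residuals is as small as possible, and when the model includes an intercept the residuals add up to zero.

```python
def residuals(x, y, a, b):
    """Residual e_i = observed y_i minus predicted a + b*x_i."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical monthly data: operations (x) and sutures used (y)
operations = [70, 75, 80, 85, 90]
sutures = [332, 346, 374, 386, 412]

e = residuals(operations, sutures, a=50, b=4)
print(e)                      # [2, -4, 4, -4, 2]
print(sum(e))                 # 0
print(sum(v * v for v in e))  # 56, the sum of squared residuals
```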
Similarities and Differences between Regression and Correlation
• Correlation measures the strength and direction of the relationship between two quantities, whereas regression describes how a known value can be used to estimate an unknown one numerically^{[3]}
• Correlation (Pearson's r) represents only the linear relationship between two variables. In contrast, regression fits the best line to the data, and that line can be curvilinear^{[3]}
• In correlation, there is no distinction between dependent and independent variables, i.e., the correlation between x and y is the same as that between y and x. However, the regression of 'y' on 'x' differs from that of 'x' on 'y', as it reflects the impact of a unit change in the independent variable on the dependent variable^{[2]}
• Since the coefficient of correlation (r) cannot exceed 1 in absolute value and r² equals the product of the two regression coefficients (b_yx × b_xy), the two regression coefficients cannot both be greater than 1 in absolute value: if one is greater than 1, the other must be less than 1
• The coefficient of correlation has the same sign as the regression coefficients: all are either positive or negative.^{[2]}
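The relationships listed above can be checked numerically. This Python sketch (standard library only, with made-up data) computes r and the two regression coefficients, b_yx (y on x) and b_xy (x on y), then confirms that r² = b_yx × b_xy and that all three share the same sign.

```python
import math

def centered_sums(x, y):
    """Return S_xy, S_xx and S_yy (sums of cross-products and squares about the means)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    s_xx = sum((xi - mx) ** 2 for xi in x)
    s_yy = sum((yi - my) ** 2 for yi in y)
    return s_xy, s_xx, s_yy

x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]

s_xy, s_xx, s_yy = centered_sums(x, y)
r = s_xy / math.sqrt(s_xx * s_yy)  # symmetric: swapping x and y gives the same r
b_yx = s_xy / s_xx                 # slope of the regression of y on x
b_xy = s_xy / s_yy                 # slope of the regression of x on y
print(round(r, 4), b_yx, b_xy)     # 0.7746 0.6 1.0
print(math.isclose(r ** 2, b_yx * b_xy))  # True
```

Here the two regression lines differ (b_yx = 0.6, b_xy = 1.0), while r is the same whichever variable is listed first.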
How is Regression Analysis Typically Useful in Healthcare?   
In a healthcare setting, regression analysis helps to improve profit, prevent risk, make proper decisions and judgements, and make the right choice based on mathematical prediction.^{[3],[4],[5],[6]} Three common purposes of regression are (1) explanation, (2) adjustment and (3) prediction, which are described with examples in [Table 1].
What are the Most Commonly Used Types of Regression Analysis in Biomedical and Social Sciences?
Simple and multiple
Simple regression analysis involves the study of only two variables at a time (one dependent and one independent), whereas multiple regression analysis involves more than two variables (one dependent and two or more independent variables).^{[2]}
Example: (1) Simple regression: Marks of students in a performance test and their level of intelligence
(2) Multiple regression: Marks of students in a performance test, their level of intelligence and the hours they have spent studying.
Multivariate
Multivariate regression models multiple dependent variables (instead of one) using multiple independent variables.^{[5],[7]} It uses more than one independent variable (x_1, x_2, …, x_m) to predict several dependent variables (y_1, y_2, …, y_n) simultaneously, with a separate set of coefficients for each dependent variable.
Example: A school health nurse researcher has collected data on three psychological variables, four academic variables and different types of educational programmes for high school students.
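As a sketch of the multivariate case, assuming NumPy: `numpy.linalg.lstsq` accepts a matrix of outcomes and returns one column of coefficients per dependent variable. The students, predictors and coefficients below are hypothetical and noise-free, and `fit_multivariate_ols` is an illustrative name.

```python
import numpy as np

def fit_multivariate_ols(X, Y):
    """Coefficient matrix B (one column per dependent variable) for Y = [1, X] @ B."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    B, *_ = np.linalg.lstsq(design, Y, rcond=None)  # solves all outcome columns at once
    return B

# Hypothetical students: [study hours per week, counselling sessions attended]
X = np.array([[5, 1], [8, 0], [10, 3], [12, 2], [15, 4], [6, 2]], dtype=float)
true_B = np.array([[40.0, 30.0],   # intercepts for (exam score, anxiety score)
                   [2.0, -0.5],    # effect of study hours on each outcome
                   [1.5, -3.0]])   # effect of counselling sessions on each outcome
Y = np.column_stack([np.ones(len(X)), X]) @ true_B  # noise-free, illustration only

B = fit_multivariate_ols(X, Y)
print(np.round(B, 3))  # one column of coefficients per dependent variable
```

The point estimates equal those from fitting each outcome separately; what a dedicated multivariate analysis adds is joint inference, such as tests that account for the correlation between the outcomes.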
Linear and non-linear
In linear regression, the regression line is straight,^{[5]} whereas in non-linear regression the line of best fit is a curve that fits the data points. The relationship between two variables is said to be non-linear if a unit change in one variable does not change the other variable at the same constant rate, as in linear regression, but at a rate that varies.^{[2]} It is also called curvilinear regression; polynomial regression is a common example [Figure 3].
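A minimal sketch, assuming NumPy, contrasting a straight-line fit with a second-degree polynomial fit via `numpy.polyfit` on invented curved data. Strictly speaking, a polynomial model is curved in x but still linear in its coefficients, which is why least squares can fit it directly; models that are non-linear in their parameters need iterative fitting.

```python
import numpy as np

# Hypothetical curved pattern (noise-free): y rises, levels off, then falls
x = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
y = 2 + 3 * x - 0.4 * x ** 2

linear = np.polyfit(x, y, deg=1)     # straight line: [slope, intercept]
quadratic = np.polyfit(x, y, deg=2)  # curve: [c2, c1, c0], highest power first

def sse(coefs):
    """Sum of squared residuals of a fitted polynomial on the data above."""
    return float(np.sum((y - np.polyval(coefs, x)) ** 2))

print(np.round(quadratic, 3))        # recovers c2 = -0.4, c1 = 3, c0 = 2
print(sse(linear) > sse(quadratic))  # True: the curve fits this pattern far better
```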
Logistic regression
Logistic regression is the second most popular form of regression analysis after linear regression and is widely used in medicine and the social sciences. Broadly, logistic regression can be divided into three categories:
Binary logistic regression
It is used to model the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables. Here, the response variable is categorical and dichotomous^{[5],[8],[9]} (it takes one of two values, much like a Boolean), for example:
• Winner or loser
• Pass or fail
• True or false
• Yes or no
• Big or small.
Example: Are body mass index, addiction, lipid profile and age related to the probability of having coronary artery disease (yes/no)?^{[2]}
Similarly, political or social scientists may estimate the probability of the incumbent health minister winning re-election based on the development of old hospitals, the opening of new hospitals, more ambulance and trauma care services, increases in the salaries of medical professionals, etc.
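Logistic regression has no closed-form solution; software finds the coefficients by maximising the likelihood iteratively. The Python sketch below (assuming NumPy) uses Newton-Raphson on a single hypothetical predictor, BMI, with invented yes/no outcomes; exponentiating the slope gives the odds ratio for a one-unit increase in BMI. It is illustrative only: it omits standard errors, and it breaks down if the two outcome groups are perfectly separated.

```python
import numpy as np

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Binary logistic regression fitted by maximum likelihood (Newton-Raphson).
    Returns [b0, b1, ...] on the log-odds scale."""
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))           # predicted P(outcome = 1)
        gradient = X.T @ (y - p)                        # score vector
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

bmi = [22, 24, 25, 27, 28, 30, 31, 33, 35, 36]  # hypothetical patients
cad = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]            # 1 = coronary artery disease present
b0, b1 = fit_logistic(bmi, cad)
print(f"odds ratio per 1-unit increase in BMI: {np.exp(b1):.2f}")
```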
Ordinal logistic regression
It predicts an ordinal dependent variable (an outcome whose categories have a natural order) from one or more independent variables.^{[9],[10],[11]}
Example: A marketing analyst wants to examine the variables that influence the decision to buy a large, medium or small bottle of medicated lotion for the basic care of ICU patients (dependent variable) at the hospital pharmacy. Possible influencing factors are price, quality and quantity, which are the independent variables.
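In the common proportional-odds (cumulative logit) form of ordinal logistic regression, one set of slopes is shared by all categories and ordered cut-points θ_j split the scale, with P(Y ≤ j) = logistic(θ_j - η). The sketch below turns a hypothetical linear predictor η and made-up cut-points into probabilities for small, medium and large bottles. Sign conventions for η differ between software packages, so treat this as one possible parameterisation.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(eta, thresholds):
    """Category probabilities under a cumulative-logit (proportional odds) model.
    P(Y <= j) = logistic(theta_j - eta); thresholds must be increasing."""
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

# Hypothetical fitted cut-points separating small | medium | large
thresholds = [-1.0, 1.5]
for eta in (-1.0, 0.0, 2.0):  # eta = b1*price + b2*quality + b3*quantity (hypothetical)
    small, medium, large = ordinal_probs(eta, thresholds)
    print(f"eta={eta}: small={small:.2f}, medium={medium:.2f}, large={large:.2f}")
```

Under this parameterisation a larger η shifts probability towards the 'large' end of the ordered scale.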
Nominal logistic regression
Also called multinomial logistic regression, it models the relationship between a nominal dependent variable and a set of independent variables. The nominal dependent variable must have at least three groups or categories that have no natural order, e.g., the defects of a product: dent, scratch and tear.^{[9],[12]}
Example: A product quality analyst investigates the factors behind the defects of a cannula: blunt tip, blockage or poor plastic grip (dependent variable). The independent variables could be old-technology machines, lack of expertise, staff shortage and poor raw material.
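Nominal (multinomial) logistic regression is usually fitted as a set of baseline-category logits: each non-reference category gets its own intercept and slopes, and the probabilities come from a softmax over the linear predictors. The coefficients, the predictors and the choice of 'blunt tip' as the reference category below are all hypothetical; the sketch only shows how fitted coefficients turn into predicted probabilities.

```python
import math

def multinomial_probs(x, coefs, reference):
    """Baseline-category (multinomial) logit probabilities.
    coefs maps each non-reference category to (intercept, [slopes]);
    the reference category's linear predictor is fixed at 0."""
    scores = {reference: 0.0}
    for category, (b0, slopes) in coefs.items():
        scores[category] = b0 + sum(b * xi for b, xi in zip(slopes, x))
    total = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / total for c, s in scores.items()}

# Hypothetical coefficients; x = [machine age in years, staff shortage (0/1)]
coefs = {
    "blockage": (-0.5, [0.2, 0.8]),
    "poor plastic grip": (0.3, [-0.1, 1.2]),
}
probs = multinomial_probs([5, 1], coefs, reference="blunt tip")
for defect, p in probs.items():
    print(f"{defect}: {p:.2f}")
```

Each exponentiated slope is an odds ratio for that category relative to the reference category, not relative to all other categories.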
Cox regression
Also called proportional hazards regression, it investigates the influence of several variables on the time until an event or incident occurs.^{[5],[13]} It relates the explanatory variables to the hazard, i.e., the instantaneous risk that the event occurs at a given time.
Example: Variables that have an effect on time duration between cancer detection and death.
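For reference, the Cox model is usually written as follows, where h_0(t) is an unspecified baseline hazard and e^{b_j} is the hazard ratio for a one-unit increase in x_j with the other variables held constant. 'Proportional' refers to this ratio not changing over time t.

```latex
h(t \mid x_1, \dots, x_k) = h_0(t)\,\exp(b_1 x_1 + b_2 x_2 + \cdots + b_k x_k),
\qquad \mathrm{HR}_j = e^{b_j}
```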
Poisson regression
Poisson regression is used when the dependent variable is a count of something over a constant amount of time, area or other consistent unit of observation. It can estimate the rate of occurrence.^{[5],[14]}
Example: 1. A biostatistician can analyse annual deaths caused by septic shock in India from 1999 to 2019
2. Homicides per month
3. Number of calls received daily by the customer service department of a private hospital.
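A minimal Poisson regression sketch, assuming NumPy: it fits a log-link model by Newton-Raphson and lets an optional exposure (for example, days or person-years observed) enter as a log offset, so that the exponentiated coefficients are rates and rate ratios. The daily call counts are invented, and with a single 0/1 predictor the fitted rates equal the two group means, which makes the result easy to check. `fit_poisson` is a hypothetical name.

```python
import numpy as np

def fit_poisson(x, counts, exposure=None, max_iter=50, tol=1e-10):
    """Poisson regression (log link) fitted by Newton-Raphson.
    An optional exposure enters as a log offset, so exp(coefficients)
    describe rates and rate ratios."""
    X = np.column_stack([np.ones(len(counts)), np.asarray(x, dtype=float)])
    y = np.asarray(counts, dtype=float)
    if exposure is None:
        offset = np.zeros(len(y))
    else:
        offset = np.log(np.asarray(exposure, dtype=float))
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.sum() / np.exp(offset).sum())  # start at the overall rate
    for _ in range(max_iter):
        mu = np.exp(offset + X @ beta)  # expected count for each observation
        gradient = X.T @ (y - mu)
        hessian = X.T @ (X * mu[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical daily call counts at a hospital help desk; weekend = 1 on Sat/Sun
calls = [12, 15, 9, 14, 10, 6, 8]
weekend = [0, 0, 0, 0, 0, 1, 1]
b0, b1 = fit_poisson(weekend, calls)
print(f"weekday rate: {np.exp(b0):.1f} calls/day")  # 12.0
print(f"weekend rate ratio: {np.exp(b1):.3f}")      # 0.583 (= 7/12)
```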
Negative binomial regression
Negative binomial regression is helpful when count data are overdispersed, i.e., their variance is much higher than their mean, which violates the Poisson assumption that the mean and variance are equal.^{[15]}
Example: A health researcher is studying the relationship between the number of hospital visits by senior citizens of a community in the past 12 months and the characteristics of the individuals and the types of health plans under which each one is covered. In this example, the data of the dependent variable (number of hospital visits) would likely be highly dispersed.
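A quick, informal screen for overdispersion is to compare the variance of the counts with their mean. The Python sketch below uses the standard `statistics` module on invented visit counts; a ratio well above 1 hints that negative binomial regression may fit better than Poisson. The raw ratio ignores the independent variables, so a formal check would examine the dispersion of residuals from a fitted Poisson model.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance / mean: about 1 for Poisson-like counts, well above 1 if overdispersed."""
    return variance(counts) / mean(counts)

# Hypothetical hospital visits in the past 12 months, one value per senior citizen
visits = [0, 0, 1, 1, 2, 2, 3, 5, 8, 12, 0, 1, 15, 2, 0]
print(round(dispersion_ratio(visits), 1))  # 6.2
```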
Zero-inflated model
When count data have too many zeros to follow the Poisson distribution, a zero-inflated model can be used.^{[16]} For example, the department of radiodiagnosis has suggested that every patient attending the outpatient unit fill in a magnetic resonance imaging (MRI) feedback form. A zero-inflated model may be suitable for this scenario because zeros can arise from two processes:
• Some patients of the radiodiagnosis outpatient unit do not undergo an MRI
• Some patients undergo an MRI but are not interested in submitting a feedback form.
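A rough check for zero inflation compares the observed number of zeros with the number a Poisson distribution with the same mean would predict, n × e^(-mean). The counts below are hypothetical. Excess zeros suggest, but do not prove, zero inflation, since overdispersion alone can also produce extra zeros; formal model comparison would be needed.

```python
import math

def zero_check(counts):
    """Observed zeros vs. zeros expected under a Poisson with the same mean."""
    n = len(counts)
    mean_count = sum(counts) / n
    observed = sum(1 for c in counts if c == 0)
    expected = n * math.exp(-mean_count)  # Poisson P(0) = e^(-mean)
    return observed, expected

# Hypothetical counts with many zeros
counts = [0, 0, 0, 0, 0, 0, 2, 3, 0, 4, 0, 0, 3, 2, 0, 5]
observed, expected = zero_check(counts)
print(observed, round(expected, 1))  # 10 4.9
```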
Others
Other types of regression include ecological regression, used in history and political science, and elastic net, lasso and ridge regression, used in machine learning.
Conclusion   
Regression analysis is an important statistical method in biosciences, health and medicine. It is applicable to prediction, error correction, estimation and analysis in healthcare management, clinical audit, studies of the placebo effect and new treatments, and public health. It models outcomes or effects in response to time, cost, activity, causative factors, etc. Linear, multiple linear and logistic regression are the most frequently used types of regression analysis. Therefore, health data analysts should have sound knowledge and relevant practice of regression analysis to answer appropriate questions in the clinical field.
Acknowledgement
We would like to express our gratitude to Mr. Sandeep Singh, Senior Librarian, AIIMS Rishikesh, for his support in making resources available for writing this paper.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References   
1. Kumari K, Yadav S. Linear regression analysis study. J Pract Cardiovasc Sci 2018;4:33-6.
2. Sharma SK. Nursing Research and Statistics. 3rd ed. India: Elsevier; 2018. p. 488-92.
3. Burns N, Grove SK. Understanding Nursing Research: Building an Evidence-Based Practice. 5th ed. Philadelphia: Elsevier; 2014. p. 3979.
4. Cox DR, Snell EJ. A general definition of residuals. J R Stat Soc Series B 1968;30:248-75.
5. Schneider A, Hommel G, Blettner M. Linear regression analysis: Part 14 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010;107:776-82.
6. Morton V, Torgerson DJ. Effect of regression to the mean on decision making in health care. BMJ 2003;326:1083-4.
7. Rencher AC, Christensen WF. Chapter 10, multivariate regression - Section 10.1, introduction. In: Methods of Multivariate Analysis, Wiley Series in Probability and Statistics, 709. 3rd ed. United States of America: John Wiley & Sons; 2012. p. 19.
8. Cabrera AF. Logistic regression analysis in higher education: An applied perspective. In: Higher Education: Handbook of Theory and Research. Vol. 10. New York: Agathon Press; 1994. p. 225-56.
9. Abedin T. Application of binary logistic regression in clinical research. JNHFB 2016;5:8-11.
10. Das S, Rahman RM. Application of ordinal logistic regression analysis in determining risk factors of child malnutrition in Bangladesh. Nutr J 2011;10:124.
11. Bender R, Grouven U. Ordinal logistic regression in medical research. J R Coll Physicians Lond 1997;31:546-51.
12. El-Habil AM. An application on multinomial logistic regression model. Pak J Stat Oper Res 2012;8:271-91. doi: 10.18187/pjsor.v8i2.234.
13. Rusmadi G, Saefuddin A, Sartono B. Applied ridge and LASSO methods in Cox proportional hazard modelling. Int J Sci Eng 2017;8:759-61.
14. Hayat MJ, Higgins M. Understanding Poisson regression. J Nurs Educ 2014;53:207-15.
15. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature 2005;438:355-9.
16. Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992;34:1-14.
