
Year : 2022  |  Volume : 23  |  Issue : 1  |  Page : 24-28

Re-orientation and simple understanding of regression analysis for student nurses: When and why to use

1 PhD Scholar, College of Nursing, All India Institute of Medical Sciences, Rishikesh, Uttarakhand, India
2 Professor & Principal, College of Nursing, All India Institute of Medical Sciences, Jodhpur, Rajasthan, India

Date of Submission: 18-Jul-2021
Date of Decision: 23-Mar-2022
Date of Acceptance: 24-Mar-2022
Date of Web Publication: 28-Apr-2022

Correspondence Address:
Ms. Anindita Mandal
College of Nursing, All India Institute of Medical Sciences, Rishikesh - 249 203, Uttarakhand

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/ijcn.ijcn_65_21


With advances in science and technology and the growing demand for emergency medical care, estimating unknown factors of interest has become vital in biomedical science and healthcare. Predictions based on a known variable can be used to make decisions, correct errors, prevent risks or achieve goals, for example, estimating the items and devices needed, a patient's recovery time, or the profit or loss of an organization. One of the statistical methods required for this purpose is regression analysis, which can identify the influence of one variable on another and predict an unknown value from a known one. This narrative review of regression analysis was structured by reviewing e-literature, textbooks, unpublished information and the authors' practical experience. The article should help researchers update their practical knowledge of regression analysis and its appropriate application.

Keywords: Linear regression, logistic regression, multiple regression, regression analysis

How to cite this article:
Mandal A, Sharma SK. Re-orientation and simple understanding of regression analysis for student nurses: When and why to use. Indian J Cont Nsg Edn 2022;23:24-8

How to cite this URL:
Mandal A, Sharma SK. Re-orientation and simple understanding of regression analysis for student nurses: When and why to use. Indian J Cont Nsg Edn [serial online] 2022 [cited 2022 Dec 7];23:24-8. Available from: https://www.ijcne.org/text.asp?2022/23/1/24/344275

Introduction

Regression analysis is a powerful quantitative method in statistics that examines the relationship between two or more variables of interest[1] and, in particular, determines the influence of one or more independent variables on a dependent variable; for example, daily physical exercise and a fat-free diet have a strong influence on the serum cholesterol of a diabetic patient. The technique estimates or predicts the unknown value of one variable from the known value of a related variable.[2] For instance, the level of nursing care in an intensive care unit (ICU) can be predicted from counts of patients with bedsores, incidents of falls, missed medications, cases of thrombophlebitis, etc. Regression can also identify relevant risk factors and calculate risk scores for each prediction. It was earlier popular mainly in finance, marketing, investment and other disciplines, but its application in health and the biosciences is now growing.

Purpose of the Review

This review gives the reader basic knowledge about regression and, through practical examples, makes the reader aware of common errors of interpretation. Both the opportunities for applying regression analysis and its limitations are presented. The review should help the reader judge whether, in any piece of research, the method has been used correctly and the results have been interpreted appropriately.

Methods/Search Strategy

The authors searched PubMed, Medline, Embase, UpToDate, ClinicalKey and textbooks on biostatistics (last search on 15 February 2021) using MeSH terms and keywords such as regression, linear regression, logistic regression and multiple regression. The aim was to give a brief introduction to regression models and the broad types of regression, with illustrative examples, the typical reasons for performing regression analysis and how the results should be interpreted. This narrative review is based on selected textbooks of biostatistics, a selective review of the e-literature (case studies, original articles and review articles) and the authors' own experience.

How to Define Regression with Its Basic Terms?

The term 'regression' was first coined by Francis Galton in the 19th century to describe a biological phenomenon.[1] Ya-Lun Chou defined it as 'Analysis attempts to establish the nature of the relationship between the variables i.e., to study the functional relationship between the variables and thereby provide a mechanism for prediction, or forecasting'.

To understand regression analysis fully, it is essential to understand the two types of variables:

  • Dependent variable: the main factor that one tries to understand or predict
  • Independent variables: one or more factors that are known, or hypothesized, to have an impact on the dependent variable.[2]

Example 1: The College of Nursing, AIIMS Rishikesh, held a workshop on Statistical Analysis for undergraduate students in March 2019. The AIIMS administration now wants to measure the level of satisfaction among the workshop attendees and to identify the variables that influence it, in preparation for the next workshop. Any of the following variables could affect an attendee's level of satisfaction:

  • The topics covered in the individual sessions of the event
  • The total duration of the sessions
  • Provision of food or catering services at the event
  • The registration fee for the event
  • The presence of motivating personalities
  • Provision of current information, new technologies and/or hands-on practice sessions.

By applying regression analysis to this survey data, the AIIMS administration can determine whether these factors had an impact on attendees' satisfaction and, if so, to what extent. The analysis also shows which elements of the sessions were well received and where future focus is needed, so that attendees are more satisfied at the next event. Regression helps determine which variables matter most, which can be ignored and how they relate to one another.

In the above example,

  • The dependent variable is the attendees' satisfaction with the event
  • The independent variables are factors such as the topics covered, the duration of the sessions, the provision of food and the registration fee for the event.

Building a Regression Model: How Does It Work?

Example 2: A nurse manager in the operating room may want to estimate the number of sutures to order for the next month from the usual number of operations performed in the gastroenterology operating room. To conduct a regression analysis, the nurse manager would collect past data on the variables in question. If there are 80 surgeries per month, how many sutures would be needed? What if there are 90?

The y-axis shows the quantity of sutures, the item in which the nurse manager is interested. The x-axis shows the known factor, the number of surgeries per month. Each black dot represents one month's data: the number of operations held in that month and the quantity of sutures used in that same month [Figure 1].{Figure 1}

Steps to Make a Diagram

  • Step 1: On graph paper or plain paper, draw two lines: a horizontal line as the x-axis and a vertical line as the y-axis[2]
  • Step 2: On the horizontal x-axis (from left to right), place the independent variable (operations per month)
  • Step 3: On the vertical y-axis (from bottom to top), place the dependent variable (number of sutures used per month).

A blue line running roughly through the middle of all the data points can then be drawn by a statistics program such as Microsoft Excel, SPSS or any other statistical package. It is the line of best fit[3] and is called the regression line. This line helps to answer, with a degree of confidence, how many sutures are typically needed for a given number of operations. In addition to the line, the statistical program also outputs a mathematical formula:

$$Y = 50 + 4X + \text{error (residual)}$$

An error term always accompanies a line of best fit because, in real life, independent variables are never perfect predictors of the dependent variable; regression is not perfectly precise. The error term reflects how reliable the formula is: the smaller the error, the more accurately the known variable predicts the response variable. Setting the error term aside, the researcher then focuses on the fitted model.[2]

$$Y = 50 + 4X, \qquad \text{general form: } Y = a + bX$$

If no operations are planned for the next month (X = 0), the model gives a baseline stock of 50 sutures, and according to the past data each operation requires on average four more sutures. Hence Y = 50 when X = 0, and for every one-unit increase in X, Y increases by four.
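The fitting step that a statistics package performs can be sketched with ordinary least squares, the usual criterion for the line of best fit. The monthly figures below are hypothetical (not the authors' data) and were constructed so that their least-squares line is exactly Y = 50 + 4X; the sketch then answers the questions posed in Example 2 for 80 and 90 operations.

```python
# Ordinary least squares for simple linear regression, Y = a + bX.
# The monthly data are HYPOTHETICAL, chosen only for illustration.

def fit_simple_ols(x, y):
    """Return (a, b): intercept and slope of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b

operations = [60, 70, 80, 90, 100]    # X: operations per month (hypothetical)
sutures = [291, 328, 372, 408, 451]   # Y: sutures used that month (hypothetical)

a, b = fit_simple_ols(operations, sutures)
print(f"Y = {a:.1f} + {b:.1f}X")      # prints "Y = 50.0 + 4.0X"

for planned in (80, 90):
    # Predicted sutures: 80 -> 370, 90 -> 410
    print(planned, "operations ->", round(a + b * planned), "sutures")
```

Reading the output against the model: 50 + 4 × 80 = 370 sutures and 50 + 4 × 90 = 410 sutures. With real records the coefficients would not come out as round numbers, and predictions far outside the observed range of operations deserve caution.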

The above example uses only one known variable to predict the factor of interest, here the number of sutures to be ordered. Typically, several independent variables affect the quantity being estimated, and their combined influence can also be examined with regression analysis.[2] The area of the surgical wound, the expertise of the surgeon and the type of incision are other independent variables that could predict the number of sutures needed.

$$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$
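With more than one independent variable, the coefficients a, b1, …, bk are usually obtained by least squares on a design matrix. The sketch below assumes two hypothetical predictors (operations per month and total wound area) and generates the outcome without noise from Y = 20 + 3X1 + 0.5X2, purely so the recovered coefficients can be checked; it uses NumPy's `numpy.linalg.lstsq`.

```python
import numpy as np

# HYPOTHETICAL monthly data with two predictors
x1 = np.array([80, 85, 90, 95, 100, 105], dtype=float)  # X1: operations per month
x2 = np.array([10, 14, 9, 20, 16, 12], dtype=float)     # X2: total wound area (hypothetical units)
# Outcome generated WITHOUT noise from Y = 20 + 3*X1 + 0.5*X2 so the fit can be checked
y = 20 + 3 * x1 + 0.5 * x2

# Design matrix: a column of ones for the intercept a, then one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution for (a, b1, b2)
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
print(f"Y = {a:.2f} + {b1:.2f}X1 + {b2:.2f}X2")
```

Each b is interpreted as the change in Y for a one-unit increase in that predictor while the other predictors are held constant.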

Graphical Representation of the Regression Model

In a regression model, the difference between an observed value and the corresponding value predicted by the line of best fit is called the residual.[4] It is the vertical distance between the line of best fit and each observation, shown by the black lines. Here 'a' is the numerical constant (intercept), the point at which the line of best fit crosses the y-axis, and 'b' (red line) is the slope of the line of best fit, called the regression coefficient [Figure 2].{Figure 2}
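Residuals can be computed directly from the fitted line. This sketch reuses the equation Y = 50 + 4X from the suture example with hypothetical observations that were chosen so this line is their least-squares fit; each residual is the observed value minus the predicted value.

```python
# Residual = observed value - value predicted by the line of best fit.
# Uses the line Y = 50 + 4X from the text with HYPOTHETICAL observations.

a, b = 50, 4                          # intercept and slope (regression coefficient)
operations = [60, 70, 80, 90, 100]    # X (hypothetical)
sutures = [291, 328, 372, 408, 451]   # observed Y (hypothetical)

predicted = [a + b * x for x in operations]
residuals = [obs - pred for obs, pred in zip(sutures, predicted)]

print(predicted)       # [290, 330, 370, 410, 450]
print(residuals)       # [1, -2, 2, -2, 1]
# For a least-squares line with an intercept, the residuals sum to zero
print(sum(residuals))  # 0
```

Positive residuals are points above the line and negative residuals are points below it; least squares chooses a and b so that the sum of the squared residuals is as small as possible.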

Similarities and Differences between Regression and Correlation

  • Correlation measures the relationship between two quantities, whereas regression describes how a known value numerically estimates an unknown one[3]
  • Pearson's correlation coefficient represents only the linear relationship between two variables. Regression, by contrast, fits the best line to the data, and that line can be curvilinear[3]
  • In correlation, there is no distinction between dependent and independent variables, i.e., the correlation between x and y is the same as that between y and x. However, the regression of 'y' on 'x' differs from the regression of 'x' on 'y', because it reflects the impact of a unit change in the independent variable on the dependent variable[2]
  • Since the square of the correlation coefficient (r²) equals the product of the two regression coefficients (y on x and x on y) and r cannot exceed one in absolute value, the two regression coefficients cannot both exceed one in absolute value; if one is greater than one, the other must be less than one
  • The correlation coefficient has the same sign as the regression coefficients; all are either positive or negative.[2]
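The relationships listed above can be checked numerically. The sketch below uses hypothetical paired data to compute Pearson's r (the same whichever variable comes first) and the two regression slopes, b_yx for y on x and b_xy for x on y, and confirms that r² equals their product and that all three share the same sign.

```python
# HYPOTHETICAL paired observations
x = [2, 4, 5, 7, 9, 11]
y = [3, 6, 6, 9, 12, 13]

mx = sum(x) / len(x)
my = sum(y) / len(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

r = sxy / (sxx * syy) ** 0.5   # correlation: symmetric in x and y
b_yx = sxy / sxx               # slope when y is regressed on x
b_xy = sxy / syy               # slope when x is regressed on y (generally different)

print(f"r = {r:.3f}, b_yx = {b_yx:.3f}, b_xy = {b_xy:.3f}")
print(f"r^2 = {r ** 2:.3f}, b_yx * b_xy = {b_yx * b_xy:.3f}")
```

In these data b_yx is slightly above one and b_xy is below one, which illustrates why the two coefficients cannot both exceed one when |r| ≤ 1.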

How is Regression Analysis Typically Useful in Healthcare?

In health care settings, regression analysis helps improve financial performance, prevent risk, take proper decisions, make appropriate judgements and make the right choices based on mathematical prediction.[3],[4],[5],[6] Three common purposes of regression, (1) explanation, (2) adjustment and (3) prediction, are described with examples in [Table 1].{Table 1}

What are the Most Commonly Used Types of Regression Analysis in Biomedical and Social Sciences?

Simple and multiple

Simple regression analysis studies only two variables at a time (one dependent and one independent), whereas multiple regression analysis studies more than two variables (one dependent and two or more independent).[2]

Example: (1) Simple regression: students' marks in a performance test and their level of intelligence

(2) Multiple regression: students' marks in a performance test, their level of intelligence and the number of hours they have spent studying.


Multivariate regression is used when there are several dependent variables (instead of one), with multiple independent variables as inputs.[5],[7] It uses more than one independent variable (x_1, x_2 … x_m) to predict several dependent variables (y_1, y_2, y_3 … y_n) simultaneously.

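Example: A school health nurse researcher has collected data on three psychological variables, four academic variables and the type of educational programme for high school students, and wants to study how the set of psychological variables (dependent variables) is related to the academic variables and the type of programme (independent variables).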
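A multivariate regression places the dependent variables side by side as columns of an outcome matrix and estimates one column of coefficients per outcome. The sketch below is a scaled-down, hypothetical version (two predictors, two outcomes, six students, noise-free so the estimates can be checked) and uses `numpy.linalg.lstsq`, which accepts a two-dimensional right-hand side.

```python
import numpy as np

# HYPOTHETICAL data for 6 students
x1 = np.array([50, 55, 60, 65, 70, 75], dtype=float)  # e.g. an academic test score
x2 = np.array([1, 0, 1, 1, 0, 0], dtype=float)        # e.g. programme indicator (1 = programme A)
X = np.column_stack([np.ones_like(x1), x1, x2])       # intercept + independent variables

# True coefficients: one column per dependent variable (y1, y2)
B_true = np.array([[10.0, 5.0],    # intercepts
                   [0.5, 0.2],     # effect of x1 on y1 and y2
                   [3.0, -1.0]])   # effect of x2 on y1 and y2
Y = X @ B_true                      # shape (6, 2), generated without noise

# Column j of B_hat holds the estimated coefficients for outcome j
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(B_hat.round(3))
```

The point estimates are the same as fitting each outcome separately; what a full multivariate analysis adds is joint inference across the outcomes (for example, multivariate test statistics), which this sketch does not attempt.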

Linear and non-linear

In linear regression, the regression line is straight,[5] whereas in non-linear regression the line of best fit is not straight but a curve fitted to the data points. The relationship between two variables is non-linear if a unit change in one variable does not change the other at a constant rate, in contrast to linear regression.[2] Such a relationship is called curvilinear and is often fitted with polynomial regression [Figure 3].{Figure 3}
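A curved relationship can be fitted, for example, with polynomial regression, which adds powers of X to the model (such as a + b1X + b2X²) while remaining linear in its coefficients. The sketch below uses hypothetical, noise-free quadratic data and `numpy.polyfit` to compare a straight line with a second-degree curve by their sums of squared residuals (SSE).

```python
import numpy as np

# HYPOTHETICAL curved relationship: y changes at a non-constant rate as x increases
x = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
y = 2 + 3 * x - 0.5 * x ** 2        # noise-free quadratic, for illustration only

# Straight line (degree 1) vs curve (degree 2); polyfit returns the highest power first
line = np.polyfit(x, y, 1)
curve = np.polyfit(x, y, 2)

sse_line = np.sum((y - np.polyval(line, x)) ** 2)
sse_curve = np.sum((y - np.polyval(curve, x)) ** 2)
print("quadratic coefficients:", curve.round(3))  # approx. [-0.5, 3.0, 2.0]
print(f"SSE straight line = {sse_line:.2f}, SSE curve = {sse_curve:.2e}")
```

The straight line leaves systematic residuals, while the curve fits these data almost exactly; with real data, the degree of the polynomial should be chosen cautiously to avoid overfitting.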

Logistic regression

Logistic regression is among the most popular forms of regression analysis after linear regression and is widely used in medicine and the social sciences. Broadly, it can be divided into the following three categories:

Binary logistic regression

It is used to model the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables. Here the response variable is dichotomous,[5],[8],[9] taking one of only two values, much like a Boolean value, for example:

  • Winner or loser
  • Pass or fail
  • True or false
  • Yes or no
  • Big or small.

Example: Are body mass index, addiction, lipid profile and age related to the probability of having coronary artery disease (yes/no)?[2]

Similarly, political or social scientists can assess the probability of an incumbent health minister winning re-election based on the development of old hospitals, the opening of new hospitals, more ambulances and trauma care, increases in the salaries of medical professionals, etc.
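Binary logistic regression models the log-odds of the outcome as a + bX and is usually fitted by maximum likelihood. The sketch below assumes a single hypothetical predictor (BMI) and a yes/no coronary artery disease outcome, and uses Newton-Raphson (equivalently, iteratively reweighted least squares), one standard way of computing the estimates; statistical packages may use other algorithms and also report standard errors, which this sketch omits.

```python
import numpy as np

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Binary logistic regression with one predictor, fitted by Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones(len(x)), x])    # intercept + predictor
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # predicted probability of y = 1
        w = p * (1.0 - p)                        # Bernoulli variance used as weights
        gradient = X.T @ (y - p)                 # score vector
        hessian = X.T @ (X * w[:, None])         # information matrix
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# HYPOTHETICAL data: BMI and coronary artery disease (1 = yes, 0 = no)
bmi = np.array([22, 24, 25, 27, 28, 30, 31, 33, 35, 37], dtype=float)
cad = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 1], dtype=float)

b0, b1 = fit_logistic(bmi, cad)
print(f"log-odds of CAD = {b0:.3f} + {b1:.3f} * BMI")
print(f"odds ratio per 1-unit increase in BMI = {np.exp(b1):.3f}")
p30 = 1.0 / (1.0 + np.exp(-(b0 + b1 * 30)))
print(f"predicted probability of CAD at BMI 30 = {p30:.3f}")
```

The exponentiated slope is the odds ratio, the form in which logistic regression results are usually reported; with several predictors, each odds ratio is adjusted for the others.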

Ordinal logistic regression

It predicts an ordinal dependent variable (a factor of interest whose categories have a natural order) from one or more independent variables.[9],[10],[11]

Example: A marketing analyst wants to examine the variables that influence the decision to buy a large, medium or small bottle of medicated lotion for basic care of ICU patients (dependent variable) at the hospital pharmacy. Influencing factors can include price, quality and quantity, which are the independent variables.
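A common form of ordinal logistic regression is the proportional-odds model, which uses one threshold per cut between ordered categories and a single slope for each predictor. The sketch below only shows how category probabilities follow from such a model once it is fitted; it assumes the convention logit P(Y ≤ j) = θ_j − βx (software packages differ in sign conventions), and the thresholds, slope and "quality score" are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(x, thresholds, beta):
    """Category probabilities under a proportional-odds model,
    assuming the convention logit P(Y <= j) = theta_j - beta * x."""
    cumulative = [logistic(t - beta * x) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

# HYPOTHETICAL fitted values; categories ordered small < medium < large
thresholds = [-1.0, 1.0]   # theta_1 < theta_2 (must be increasing)
beta = 0.8                 # hypothetical effect of a quality score

for quality in (0, 2, 4):
    small, medium, large = ordinal_probs(quality, thresholds, beta)
    print(f"quality={quality}: small={small:.2f}, medium={medium:.2f}, large={large:.2f}")
```

Because one slope is shared across all cut-points, a higher quality score shifts the whole distribution towards larger bottles; this "proportional odds" assumption should be checked when the model is fitted to real data.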

Nominal logistic regression

Nominal (also called multinomial) logistic regression models the relationship between a nominal dependent variable and a set of independent variables. The nominal dependent variable must have at least three categories with no natural order, e.g., product defects: dent, scratch and tear.[9],[12]

Example: A product quality analyst examines the factors behind defects in a cannula: blunt tip, blockage or poor plastic grip (dependent variable). The independent variables can include old-technology machines, lack of expertise, staff shortage and poor raw material.
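In a nominal (multinomial) logistic model, one category serves as the reference and every other category gets its own intercept and slopes; the probabilities come from exponentiating each linear score and dividing by their sum. The sketch below shows this calculation for an already-fitted model; the coefficients and the predictor "machine age in years" are hypothetical, and "blunt tip" is chosen arbitrarily as the reference category.

```python
import math

def multinomial_probs(x, coefficients, reference):
    """Category probabilities for a multinomial (nominal) logistic model.
    coefficients maps each non-reference category to (intercept, slope);
    the reference category has a linear score of 0."""
    scores = {reference: 0.0}
    for category, (a, b) in coefficients.items():
        scores[category] = a + b * x
    denominator = sum(math.exp(s) for s in scores.values())
    return {category: math.exp(s) / denominator for category, s in scores.items()}

# HYPOTHETICAL coefficients; x = age of the manufacturing machine in years
coefficients = {"blockage": (-0.5, 0.10), "poor grip": (-1.0, 0.25)}

for machine_age in (1, 10):
    probs = multinomial_probs(machine_age, coefficients, reference="blunt tip")
    print(machine_age, {k: round(v, 2) for k, v in probs.items()})
```

Each exponentiated slope is interpreted relative to the reference category, for example, how the odds of "poor grip" rather than "blunt tip" change per additional year of machine age.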

Cox regression

Cox regression, also called proportional hazards regression, investigates the influence of several variables on the time until an event or incident occurs.[5],[13] It relates these variables to the hazard, that is, the instantaneous risk of the event occurring at a given time.

Example: Identifying variables that affect the time between cancer detection and death.
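The proportional hazards model behind Cox regression is usually written as below. Here h(t) is the hazard of the event (e.g. death) at time t, h_0(t) is a baseline hazard that the method leaves unspecified, and b_1, …, b_k are regression coefficients, in the same spirit as the b's of the linear model; the notation is ours rather than the authors'.

$$h(t \mid x_1, \dots, x_k) = h_0(t)\,\exp(b_1 x_1 + b_2 x_2 + \dots + b_k x_k)$$

$$\text{HR}_j = \frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = e^{b_j}$$

For instance, if a hypothetical binary variable (1 = metastasis present at detection) had b = 0.693, the hazard ratio would be e^{0.693} ≈ 2: at any time after detection, the hazard of death would be about twice as high for patients with that characteristic, whatever the shape of h_0(t).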

Poisson regression

A Poisson variable is a count of events over a fixed amount of time, area or other consistent unit of observation. Poisson regression models such counts and can estimate the rate of occurrence.[5],[14]

Examples: 1. A biostatistician analysing annual deaths caused by septic shock in India from 1999 to 2019

2. Homicides per month

3. The number of calls received daily by the customer service department of a private hospital.
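Poisson regression models the logarithm of the expected count as a + bX, so exponentiated coefficients act as rate ratios. The sketch below assumes one hypothetical week of daily calls and a single weekend indicator, and fits the model by Newton-Raphson, one standard way of computing maximum-likelihood estimates. With a single binary predictor, the fitted expected counts equal the two group means (12 on weekdays, 5 at the weekend), which makes the result easy to check.

```python
import numpy as np

def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Poisson regression with a log link, fitted by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # start the intercept at the overall log rate
    for _ in range(max_iter):
        mu = np.exp(X @ beta)             # expected count for each observation
        gradient = X.T @ (y - mu)         # score vector
        hessian = X.T @ (X * mu[:, None]) # information matrix
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# HYPOTHETICAL daily calls to a customer service desk over one week
calls = np.array([12, 15, 9, 14, 10, 6, 4], dtype=float)
weekend = np.array([0, 0, 0, 0, 0, 1, 1], dtype=float)   # 1 = Saturday/Sunday
X = np.column_stack([np.ones_like(weekend), weekend])

b0, b1 = fit_poisson(X, calls)
print(f"expected weekday calls = {np.exp(b0):.1f}")        # 12.0
print(f"expected weekend calls = {np.exp(b0 + b1):.1f}")   # 5.0
print(f"rate ratio, weekend vs weekday = {np.exp(b1):.3f}")
```

When the counts come from observation periods of different lengths (e.g. deaths per year in populations of different sizes), the logarithm of the exposure is normally added to the model as an offset so that rates, rather than raw counts, are compared.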

Negative binomial regression

Negative binomial regression is helpful when count data show a variance much higher than the mean, i.e., when the data are overdispersed.[15]

Example: A health researcher is studying the relationship between the number of hospital visits made by senior citizens of a community in the past 12 months and their individual characteristics, including the type of health plan under which each person is covered. In this example, the counts of hospital visits (the dependent variable) are likely to be highly dispersed.
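A quick way to screen count data for overdispersion is to compare the sample variance with the mean, since a Poisson model assumes they are roughly equal. The sketch below uses hypothetical visit counts for ten senior citizens and Python's standard `statistics` module; it is a rough descriptive check, not a formal test.

```python
import statistics

# HYPOTHETICAL hospital visits in the past 12 months for 10 senior citizens
visits = [0, 1, 0, 2, 8, 0, 1, 12, 0, 3]

mean = statistics.mean(visits)
variance = statistics.variance(visits)   # sample variance (n - 1 denominator)
ratio = variance / mean
print(f"mean = {mean:.2f}, variance = {variance:.2f}, variance/mean = {ratio:.2f}")
# A Poisson model assumes variance close to the mean; a ratio well above 1
# suggests overdispersion, a situation where negative binomial regression may fit better.
```

A ratio of about 6, as here, would point towards a negative binomial model; in practice, overdispersion is also judged after fitting, from the residuals of a Poisson model.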

Zero-inflated model

When count data have too many zeros to follow the Poisson distribution, a zero-inflated model can be used.[16] For example, suppose the department of radio-diagnosis asks every patient attending the outpatient unit to fill in a magnetic resonance imaging (MRI) feedback form. A zero-inflated model may be suitable for this scenario because zeros can arise from two processes:

  • Some patients of the radio-diagnosis outpatient unit would not undergo MRI
  • Some patients would undergo MRI but would not be interested in submitting a feedback form.
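The two sources of zeros correspond to the two parts of a zero-inflated Poisson (ZIP) model: with probability π an observation is a "structural" zero (for example, a patient who never had an MRI), and otherwise the count follows a Poisson distribution with mean λ, which can itself produce zeros. The sketch below shows the resulting probabilities with hypothetical values of π and λ; fitting a ZIP regression, where π and λ depend on covariates, would normally be left to statistical software.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) count (which can also be zero)."""
    if k == 0:
        return pi + (1 - pi) * poisson_pmf(0, lam)
    return (1 - pi) * poisson_pmf(k, lam)

pi, lam = 0.4, 1.5     # HYPOTHETICAL values
print(f"P(0), Poisson only  = {poisson_pmf(0, lam):.3f}")
print(f"P(0), zero-inflated = {zip_pmf(0, pi, lam):.3f}")
for k in range(1, 4):
    print(k, round(zip_pmf(k, pi, lam), 3))
```

The zero-inflated model assigns far more probability to zero than a plain Poisson model with the same λ, which is exactly the pattern of "too many zeros" described above.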


Other forms of regression include ecological regression, used in history and political science, and ridge, lasso and elastic net regression, which are widely used in machine learning.

Conclusion

Regression analysis is an important statistical method in the biosciences, health and medicine. It is applicable to prediction, correction of errors, estimation and analysis in health care management, clinical audit, studies of placebo effects and new treatments, and public health. It models outputs or effects in response to time, cost, activity, causative factors, etc. Linear, multiple linear and logistic regression are the most frequently used forms of regression analysis. Health data analysts should therefore have sound knowledge and relevant practice of regression analysis to answer appropriate questions in the clinical field.


We would like to express our gratitude to Mr. Sandeep Singh, Senior Librarian, AIIMS Rishikesh, for his support in making resources available for writing this paper.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

References

1. Kumari K, Yadav S. Linear regression analysis study. J Pract Cardiovasc Sci 2018;4:33-6.
2. Sharma SK. Nursing Research and Statistics. 3rd ed. India: Elsevier; 2018. p. 488-92.
3. Burns N, Grove SK. Understanding Nursing Research: Building an Evidence-Based Practice. 5th ed. Philadelphia: Elsevier; 2014. p. 397-9.
4. Cox DR, Snell EJ. A general definition of residuals. J R Stat Soc Series B 1968;30:248-75.
5. Schneider A, Hommel G, Blettner M. Linear regression analysis: Part 14 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010;107:776-82.
6. Morton V, Torgerson DJ. Effect of regression to the mean on decision making in health care. BMJ 2003;326:1083-4.
7. Rencher AC, Christensen WF. Chapter 10, Multivariate regression – Section 10.1, Introduction. In: Methods of Multivariate Analysis. Wiley Series in Probability and Statistics, 709. 3rd ed. United States of America: John Wiley & Sons; 2012. p. 19.
8. Cabrera AF. Logistic regression analysis in higher education: An applied perspective. In: Higher Education: Handbook of Theory and Research. Vol. 10. New York: Agathon Press; 1994. p. 225-56.
9. Abedin T. Application of binary logistic regression in clinical research. JNHFB 2016;5:8-11.
10. Das S, Rahman RM. Application of ordinal logistic regression analysis in determining risk factors of child malnutrition in Bangladesh. Nutr J 2011;10:124.
11. Bender R, Grouven U. Ordinal logistic regression in medical research. J R Coll Physicians Lond 1997;31:546-51.
12. El-Habil AM. An application on multinomial logistic regression model. Pak J Stat Oper Res 2012;8:271-91. doi: 10.18187/pjsor.v8i2.234.
13. Rusmadi G, Saefuddin A, Sartono B. Applied ridge and LASSO methods in Cox proportional hazard modelling. Int J Sci Eng 2017;8:759-61.
14. Hayat MJ, Higgins M. Understanding Poisson regression. J Nurs Educ 2014;53:207-15.
15. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature 2005;438:355-9.
16. Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992;34:1-14.

