Benadette Wambui

Regression in Data Science


Regression is one of the simplest, most popular and most useful statistical methods, used in many disciplines for prediction. It attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and one or more other variables (known as independent variables).


For example, in a study of the relationship between temperature and plant growth (say, height), the dependent variable would be height and the independent variable would be temperature. In that case we can plot an x-y graph that indicates:


  1. The relationship between temperature and height

  2. To what extent temperature affects height (remember that height might also be affected by other variables).


The two basic types of regression are simple linear regression and multiple linear regression, although there are non-linear regression methods for more complicated data and analysis. Simple linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y, while multiple linear regression uses two or more independent variables to predict the outcome.
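In standard textbook notation, the two models can be written as follows. The symbols are assumptions introduced here rather than taken from the text: $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the coefficients of the $p$ independent variables, and $\varepsilon$ is a random error term capturing everything the model does not explain.

```latex
\begin{aligned}
\text{Simple linear regression:}   \quad & Y = \beta_0 + \beta_1 X + \varepsilon \\
\text{Multiple linear regression:} \quad & Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon
\end{aligned}
```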

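Regression takes a group of random variables thought to predict Y and tries to find a mathematical relationship between them and Y. This relationship is typically in the form of a straight line (linear regression), or a plane or hyperplane when there are several independent variables, that best approximates all the individual data points. In ordinary least squares, the most common fitting method, the "best" line is the one that minimizes the sum of the squared vertical distances (residuals) between the data points and the line.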
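To make "best approximates" concrete, here is a minimal from-scratch sketch of ordinary least squares for simple linear regression. The function names `fit_line` and `sum_squared_residuals` and the toy data are hypothetical, chosen for illustration only. The slope is the ratio of the co-variation of x and y to the variation of x, and the intercept makes the line pass through the point of means; nudging the slope away from the fitted value increases the sum of squared residuals.

```python
# Hypothetical sketch of ordinary least squares (OLS) for one independent variable.

def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The OLS line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    b0, b1 = fit_line(xs, ys)
    best = sum_squared_residuals(xs, ys, b0, b1)
    print(f"intercept={b0:.3f} slope={b1:.3f} SSR={best:.4f}")
    # Any other slope (with the same intercept) gives a larger SSR.
    for delta in (-0.1, 0.1):
        worse = sum_squared_residuals(xs, ys, b0, b1 + delta)
        print(f"slope {b1 + delta:.3f}: SSR={worse:.4f}, larger={worse > best}")
```

Libraries such as R's `lm()` compute the same least-squares estimates, generalized to any number of independent variables.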


The R programming language makes it easy to create linear models, for example with its built-in lm() function, and to find relationships between variables.


Learn more about practical, day-to-day applications of linear models at the