A Step by Step Guide to Multiple Linear Regression in R

Perform Linear Regression in R by identifying the most important predictors

Renu Khandelwal
Dec 14, 2021


Here you will learn what linear regression is and how to perform multiple linear regression on a dataset. The steps covered are:

  • Reading the data from a CSV file
  • Exploring and transforming the data: converting categorical variables to factors, removing missing values, and removing outliers
  • Checking the data against the linear regression assumptions
  • Identifying the best predictors for the linear regression model
  • Splitting the data into training and test sets
  • Running multiple linear regression with the best predictors
  • Evaluating the linear regression results

What is Linear Regression?

Linear regression is one of the simplest and most widely used techniques for predictive analysis. Its goal is to learn the relationship between an outcome variable and a set of explanatory variables. Once that relationship has been estimated, the outcome can be predicted from the explanatory variables.

  • The outcome variable Y is also called the dependent or response variable and is always numeric.
  • An explanatory (predictor) variable x is also called an independent variable or covariate. Independent variables can be numerical or categorical.

Figure: Multiple Linear Regression
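
For reference, the standard textbook form of a multiple linear regression model with p predictors can be written as follows. The symbols are the conventional notation and are an assumption here, not taken from the original figure.

```latex
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```

Here \beta_0 is the intercept, each \beta_j is the change in Y for a one-unit change in x_j with the other predictors held fixed, and \varepsilon is the error term.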

Linear Regression is useful for

  • Determining the strength of predictors: the fitted relationship between the outcome and the predictor variables shows which predictors have the most significant impact on the response variable.
  • Forecasting the outcome variable from the explanatory variables.
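
Both uses can be illustrated with a short sketch in base R. It again assumes the built-in mtcars data rather than the housing dataset, and the new_car values are hypothetical inputs chosen only for the example.

```r
# Fit a small model on built-in mtcars (assumed stand-in data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Strength of predictors: the coefficient table holds the estimate,
# standard error, t value, and p-value for each term.
coef_table <- summary(fit)$coefficients
print(coef_table)

# Standardising all variables first puts the coefficients on one scale,
# so their magnitudes can be compared directly.
scaled <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
std_fit <- lm(mpg ~ wt + hp, data = scaled)
print(coef(std_fit))

# Forecasting: predict mpg for a hypothetical new car, together with a
# 95% prediction interval.
new_car <- data.frame(wt = 3.0, hp = 120)
forecast <- predict(fit, newdata = new_car,
                    interval = "prediction", level = 0.95)
print(forecast)
```

The p-values indicate whether a predictor's effect is distinguishable from zero, while the standardised coefficients give a rough sense of relative impact; the two answer different questions and are best read together.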

The following libraries are used in the linear regression code.


The dataset used in this post is the House prediction dataset from Kaggle.


