
Polynomial Regression: Machine Learning in Python

In my previous blog, we discussed the multiple linear regression technique. Today we are going to learn about polynomial regression in Python. Polynomial regression is another form of linear regression, in which the outcome is modelled using powers of a single predictor, fitted by the method of linear least squares. So what is it? Let's look at a simple linear regression graph below,


If you look at the linear regression graph above, the regression line passes through the middle of the data points, so the line is a good fit for the data. Most of our linear regression problems were like this. Now let's look at the graph below.

Think about drawing a best-fitting line for this problem. Our first intuition would be to use a linear regressor to solve it. But the result will look like the graph below,

Observing the graph, it is clear that several data points lie far from the fitted line. If you are familiar with graph terminology, you know that the greater the distance between an actual point and the corresponding point on the fitted line, the greater the error. The graph above is a fine example of an error-filled model. If you look at the plotted data points, they trace out a polynomial curve when joined. So think about drawing a fitting curve shaped like a polynomial. It will look like this.

Now in the above graph, most of our data points lie much closer to the curve, so this curve will predict far better for this model than our earlier straight line. This type of modelling is called polynomial regression, where the points are fitted by a polynomial curve. The mathematical equations for the three regressions so far are given below.

Simple linear regression: y = b0 + b1x
Multiple linear regression: y = b0 + b1x1 + b2x2 + ... + bnxn
Polynomial regression: y = b0 + b1x + b2x^2 + ... + bnx^n

Here y is the predicted outcome and b0...bn are the regression coefficients; x is the value of our independent variable. For polynomial regression, the x variable is raised to successive powers, each with its own coefficient. You can skip past these equations if you are not interested in the mathematics behind it. Now, let's look at the Python approach for solving polynomial regression.

Let's look at the code. First, import the necessary modules:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures

Now let us process our dataset. You can download a sample dataset here.

# Importing the dataset
dataset = pd.read_csv('dataset.csv')
X = dataset.iloc[:, 1:2].values   # independent variable, kept 2-D for sklearn
y = dataset.iloc[:, 2].values     # dependent variable
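If you don't have the sample dataset at hand, here is a small hypothetical stand-in I made up with the same column layout (a position name, a level, and a salary), so the iloc slices above select the same columns:

```python
# Hypothetical stand-in for dataset.csv: Position, Level, Salary columns.
import pandas as pd

dataset = pd.DataFrame({
    'Position': ['Analyst', 'Consultant', 'Manager', 'Partner', 'CEO'],
    'Level': [1, 2, 3, 4, 5],
    'Salary': [45000, 50000, 60000, 80000, 110000],
})
X = dataset.iloc[:, 1:2].values   # Level column, as a 2-D array
y = dataset.iloc[:, 2].values     # Salary column, as a 1-D array
print(X.shape, y.shape)
# (5, 1) (5,)
```

Note that X is sliced with 1:2 rather than plain 1, so it stays a two-dimensional array, which is the shape sklearn expects for features.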

Here comes the core part of the code. In our previous regressions we used LinearRegression from sklearn.linear_model. For a polynomial-shaped dataset, a plain linear regression produces large errors, as we saw above; you can test the model's accuracy later. For better accuracy, we first transform the independent variable into polynomial features, and then fit a linear regression on those features. The resulting regression curve follows the polynomial shape of the data.

# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 2)
X_poly = poly_reg.fit_transform(X)  # adds a bias column and squared terms
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)

The degree parameter is the degree of the polynomial features built from our dataset. After the polynomial feature matrix is formed from the independent variable, it is fitted with a LinearRegression object. Now our model is ready for prediction; you can make predictions using the lin_reg.predict() method, remembering to transform new inputs with poly_reg.transform() first.
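To see what the transform actually does, here is a quick illustration of my own: with degree=2, PolynomialFeatures turns each input value x into the row [1, x, x^2], which is exactly the set of terms in the polynomial equation above.

```python
# What PolynomialFeatures(degree=2) produces for two sample inputs.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2)
out = poly.fit_transform(np.array([[2.0], [3.0]]))
print(out)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

The leading column of ones is the bias term b0; the linear regression then learns one coefficient per column.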

Data visualization

You might have noticed that I imported a library called matplotlib. Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. It can be used in Python scripts, the Python and IPython shells, Jupyter notebooks, web application servers, and graphical user interface toolkits. Now let us make use of it for visualizing our model.

# Visualising the Polynomial Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg.predict(X_poly), color = 'blue')
plt.title('Polynomial Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

The scatter function draws the data points as dots on the X and Y axes (a scatter plot). To plot the regression curve, we pass the predictions from lin_reg.predict(X_poly). You can also add a title and x- and y-axis labels to the graph. The final graph will appear in a new window, and the regression curve will be a polynomial, as we expected.
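One refinement worth knowing: plotting predictions only at the training points draws straight segments between them. Predicting on a dense grid of x values gives a visibly smoother curve. Here is a self-contained sketch of that idea, using synthetic data of my own (the dataset, column meanings, and the poly_fit.png filename are assumptions, not from the original post):

```python
# Sketch: plot a smooth polynomial regression curve via a dense grid.
import matplotlib
matplotlib.use('Agg')  # render off-screen so the script runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(1, 11).reshape(-1, 1)   # synthetic position levels 1..10
y = 1000.0 * X.ravel() ** 2           # synthetic quadratic salaries

poly_reg = PolynomialFeatures(degree=2)
lin_reg = LinearRegression().fit(poly_reg.fit_transform(X), y)

# Dense grid: 0.1-step positions give a smooth curve instead of
# straight segments between the ten training points.
X_grid = np.arange(X.min(), X.max() + 0.1, 0.1).reshape(-1, 1)

plt.scatter(X, y, color='red')
plt.plot(X_grid, lin_reg.predict(poly_reg.transform(X_grid)), color='blue')
plt.title('Polynomial Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.savefig('poly_fit.png')
```

Because the synthetic data is exactly quadratic, the degree-2 model recovers it almost perfectly here; real data will of course show residual error.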

Finally, we have covered the polynomial regression technique in machine learning using Python. As I mentioned in the previous post, you can split the dataset into training and testing sets, so that you can evaluate how well the fitted regression curve generalises. See you in my next post on machine learning.
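The split mentioned above can be sketched like this, again with synthetic data of my own rather than the post's dataset:

```python
# Sketch: hold out a quarter of the data for testing.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1, 21).reshape(-1, 1)   # 20 synthetic samples
y = X.ravel() ** 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(len(X_train), len(X_test))
# 15 5
```

You would then fit PolynomialFeatures and LinearRegression on the training portion only, and measure the error on the held-out test portion.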

Happy Coding!
