As discussed in our introduction post, regression is a modelling technique used to make predictions based on the dataset provided. Linear regression predicts a target value based on the linear relationship between the dependent and independent variables. Now let us try to figure out what these variables mean. Consider the dataset below.
This dataset contains 10 observations of the amount spent by a firm and the resulting profit. If you look closely, you can see that the profit has a roughly linear relationship with the amount spent. So we can say that profit is a dependent variable: it depends on the spent amount. The spent amount, in turn, is the independent variable. Makes sense, right?
As I mentioned earlier, linear regression tries to predict the value of the dependent variable when a new value of the independent variable is introduced. So let's start making some predictions on this dataset. We want to know what the profit will be for a spent amount of 10.
Now let's name the independent variable X and the dependent variable Y. If we plot the X and Y values on a 2-dimensional graph, it will look like this.
To solve this problem, we can use a very simple and basic approach to machine learning called Simple Linear Regression. In theory, Simple Linear Regression can be described as follows.
By mathematical convention, the two factors involved in a simple linear regression analysis are designated x and y. The equation that describes how y is related to x is known as the regression model. The linear regression model also contains an error term, represented by ε, the Greek letter epsilon. The error term accounts for the variability in y that cannot be explained by the linear relationship between x and y. There are also parameters that represent the population being studied: the intercept and the slope of the line.
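Written out in the usual textbook notation, the model described above looks like the equation below. The symbols β0 (intercept) and β1 (slope) are the conventional names for the two parameters and are an assumption here; the post itself does not name them.

```latex
y = \beta_0 + \beta_1 x + \varepsilon
```

Here β0 is the value of y when x is 0, β1 is how much y changes for each unit increase in x, and ε is the error term.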
Now if you look at the graph above, you can see that a straight diagonal line can be drawn through the data points so that they are scattered roughly evenly on either side of it, as in the image below.
This line is called the regression line. Using the regression line, we can predict the value of Y for any new X value. Simple, right? Just draw a vertical line from the X value you want up to the regression line, then draw a horizontal line from that point across to the Y axis to read off the prediction. This is how a simple linear regression model works. Using the data points, the model creates a regression line, and the regression line is then used to predict Y for any new data points.
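To make the idea of "reading a prediction off the line" concrete, here is a minimal sketch that computes a best-fit line by hand with the standard least squares formulas and then evaluates it at a new X value. The data values are the ones used later in this post; the names `slope`, `intercept` and `predict` are made up for this illustration, and this is not how the rest of the post builds the model (that uses scikit-learn).

```python
import numpy as np

# Spent amount (x) and profit (y), same values as the dataset used in this post
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 3, 4, 2, 6, 3, 8, 5, 10, 8], dtype=float)

# Standard least squares estimates for a straight line
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean


def predict(x_new):
    """Read the Y value off the fitted line for a given X value."""
    return intercept + slope * x_new


print("slope:", slope)
print("intercept:", intercept)
print("prediction for x = 10:", predict(10))
```

For this data the slope comes out at about 0.8 and the intercept at about 1.4, so the line passes through roughly 9.4 at X = 10.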
If you are interested in the mathematical derivation behind the Simple Linear Regression model, note that I have not covered it here. A quick Google search will turn up plenty of useful resources.
If you already know the basics of simple linear regression, you can dive straight into this section. Python has a broad set of libraries for solving this kind of machine learning problem. For the above dataset, we can create a regression model in Python.
1. Importing the required modules
Create a blank text file. If you are a Python programmer, you probably know what a module is. For those who don't, a module is a file of Python code (functions, classes and variables) that you can load into your own program with the import statement. For our purpose, we have to import two modules. The code will look like this:
import numpy as np
from sklearn.linear_model import LinearRegression
NumPy is a package that provides arrays and functions for fundamental numerical and mathematical computing. scikit-learn (imported as sklearn) is a library of simple and efficient tools for data mining and data analysis.
2. Collecting or creating the dataset
A dataset can be created using the numpy library, or it can be imported from other formats such as a spreadsheet, a text file or a JSON representation. Here our dataset is very simple and short, so we can create the X and Y arrays directly with numpy. The code is given below.
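If the data lived in a text file instead, loading it might look roughly like the sketch below. The column names and the CSV content are hypothetical and only there for illustration; an in-memory string stands in for a real file so the snippet runs on its own. With a real file you would pass its path to np.loadtxt instead.

```python
import io

import numpy as np

# Hypothetical CSV content standing in for a file on disk
csv_text = """spent,profit
0,1
1,3
2,4
3,2
4,6
5,3
6,8
7,5
8,10
9,8
"""

# skiprows=1 skips the header line; each remaining row becomes one observation
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# First column is the independent variable, second column is the dependent one
X = data[:, 0].reshape(-1, 1)
y = data[:, 1].reshape(-1, 1)

print(X.shape, y.shape)
```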
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
X = X.reshape(-1, 1)
y = np.array([1, 3, 4, 2, 6, 3, 8, 5, 10, 8])
y = y.reshape(-1, 1)
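If you look at the above code snippet, you can see that we have used numpy's reshape() function. It gives a new shape to the array without changing its data, so each 1-dimensional array is converted to a 2D array with a single column. This matters because scikit-learn expects X to be a 2D array with one row per observation and one column per feature.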
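A small sketch of what reshape(-1, 1) does to an array's shape. The array `a` is a throwaway example, not part of the dataset; the -1 tells numpy to work out the number of rows by itself.

```python
import numpy as np

a = np.array([0, 1, 2, 3])
print(a.shape)  # (4,)   one dimension with 4 elements

# -1 means "infer this dimension"; 1 means "exactly one column"
b = a.reshape(-1, 1)
print(b.shape)  # (4, 1) four rows, one column
```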
3. Fitting the dataset
Now here comes the most important part: fitting the regressor to our dataset. In this part, we create an object of the LinearRegression class. This object has a method called fit(), which accepts the X and Y variables and uses them to create the regression line. Once that is done, you can start predicting new values.
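The fitting step described above can be sketched as follows. The variable name `regressor` matches the one used in the prediction step of this post; the imports and dataset are repeated so the snippet runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same dataset as in step 2
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([1, 3, 4, 2, 6, 3, 8, 5, 10, 8]).reshape(-1, 1)

# Create the model object and fit the regression line to the data
regressor = LinearRegression()
regressor.fit(X, y)
```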
Try running your script. If it doesn't throw any errors, your model is ready for predictions.
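Once the regressor object has been fitted to your dataset and the script runs without errors, you can print the prediction for a value that is not in the dataset. Note that predict(), like fit(), expects a 2D array, so the single value 10 is passed as [[10]]. Simply call: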
# Predicting profit for spent amount 10
y_pred = regressor.predict([[10]])
print(y_pred)
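This will print the predicted output, which is approximately 9.4. The least squares line for this data is about y = 1.4 + 0.8x, and because y was reshaped to 2D the result is printed as a nested array.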
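If you want to see the line the model actually found, the fitted LinearRegression object exposes it through its coef_ and intercept_ attributes. Below is a minimal sketch of inspecting them; the names `slope` and `intercept` are just local variables chosen for this illustration, and np.ravel is used only to pull plain numbers out of the arrays.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same dataset and model as in the steps above
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([1, 3, 4, 2, 6, 3, 8, 5, 10, 8]).reshape(-1, 1)
regressor = LinearRegression().fit(X, y)

# coef_ holds the slope(s), intercept_ holds the intercept(s)
slope = float(np.ravel(regressor.coef_)[0])
intercept = float(np.ravel(regressor.intercept_)[0])
print("slope:", slope)
print("intercept:", intercept)

# The prediction for 10 is the intercept plus the slope times 10
manual = intercept + slope * 10
model = float(regressor.predict([[10]])[0][0])
print("manual:", manual, "model:", model)
```

The manual calculation and the value returned by predict() should agree, which is a quick way to convince yourself that the model really is just a straight line.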
That's it! Congrats, you have now built your first machine learning model! Solving a real-world problem is trickier, though, and a perfect model is hard to come by. So in upcoming chapters, we will be talking about data preprocessing and more complex regression techniques. Before that, you can practise with commonly available datasets (like the Iris and Boston houses datasets).