
Predicting the sales from advertising budget using Linear Regression model


Hi, in this project I built a prediction model for sales analysis: we feed in an advertising budget and the model predicts the likely sales. The machine learning method I used is linear regression, and I wrote the code in a Jupyter notebook.

To train and test the model, I used a dataset from an advertising agency. It contains the budgets for TV, radio and newspaper advertising, along with the corresponding sales records.

The process for the model building will be completed in the following steps:

1. Importing all the required modules for the model

2. Extracting all the data from the dataset

3. Data cleaning and wrangling

4. Preparing training data for model

5. Preparing testing data for model

6. Creating, training and testing the model

7. Checking the accuracy of the model

8. Plotting the model graph to analyse


https://somenplus.blogspot.com/2020/09/predicting-sales-from-advertising.html



1. Importing all required modules.    

The Python modules I used here are pandas, seaborn, sklearn (scikit-learn) and matplotlib.


code

import pandas as pd

import seaborn as sns

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error              

from matplotlib import pyplot as plt


2. Extracting data from the dataset  

The dataset was a CSV file, so I used the read_csv function from the pandas module. Displaying read_data in the notebook shows that it consists of four columns: TV advertising budget, Radio advertising budget, Newspaper advertising budget and Sales records.


code

read_data=pd.read_csv("tvradioadvertising.csv")

read_data
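Before moving on, it can help to check the shape and column types of the loaded table. Here is a minimal sketch of that check. Since the original CSV file is not included here, it reads a tiny made-up table from a string; the column names and all the numbers are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for tvradioadvertising.csv: the column names and
# every value below are made up for illustration.
csv_text = """TV,Radio,Newspaper,Sales
100.0,20.0,30.0,12.0
50.0,10.0,40.0,8.0
200.0,35.0,15.0,18.0
80.0,25.0,60.0,11.0
"""

sample_data = pd.read_csv(io.StringIO(csv_text))

# Quick structural checks before any modelling.
print(sample_data.shape)   # (rows, columns)
print(sample_data.dtypes)  # every column should be numeric
print(sample_data.head())  # first few records
```

With the real file, the same three checks can be run on read_data directly.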


3. Data cleaning and Wrangling 

After extracting the data, it's time to check how clean it is. Look for missing (null) values hidden in the dataset and remove them, because scikit-learn's LinearRegression cannot be fitted on data that contains missing values.

To check the dataset for missing values I use the heatmap function from the seaborn module on read_data.isnull(). The heatmap shows which columns contain missing values.


code

sns.heatmap(read_data.isnull(),yticklabels=False,cmap='viridis')  
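As a numeric complement to the heatmap, the missing values can also be counted per column. This is only a sketch on a small made-up table (the names demo, missing_per_column and rows_with_missing are my own, not from the original notebook).

```python
import numpy as np
import pandas as pd

# Made-up frame with two deliberately missing values (illustrative only).
demo = pd.DataFrame({
    "TV": [100.0, 50.0, np.nan, 80.0],
    "Radio": [20.0, 10.0, 30.0, 25.0],
    "Sales": [12.0, np.nan, 9.0, 11.0],
})

# Number of missing values per column; all zeros means a clean dataset.
missing_per_column = demo.isnull().sum()
print(missing_per_column)

# Number of rows that contain at least one missing value.
rows_with_missing = int(demo.isnull().any(axis=1).sum())
print(rows_with_missing)
```

On the real data, the same calls on read_data give exact counts, which are easier to read than colours when only a handful of cells are missing.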


Luckily my dataset is already clean, so no missing values show up. But let me show you what to do if there are any.



In the heatmap of another dataset, the rightmost column contains missing values, which are shown in yellow.
To remove these null values you just need to write the following code:

read_data.dropna(inplace=True)

This removes every row that contains a null value from the dataset.

If you plot the heatmap again afterwards, the yellow marks will be gone.
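To see the effect of dropna without a heatmap, here is a small sketch on a made-up table. The frame and the variable names are assumptions for illustration; it uses the non-inplace form so that the original frame stays available for comparison.

```python
import numpy as np
import pandas as pd

# Made-up frame: rows 1 and 2 each contain one missing value.
demo = pd.DataFrame({
    "TV": [100.0, 50.0, np.nan, 80.0],
    "Sales": [12.0, np.nan, 9.0, 11.0],
})

# dropna() returns a new frame without the incomplete rows;
# reset_index(drop=True) makes the row labels contiguous again.
cleaned = demo.dropna().reset_index(drop=True)

print(len(demo), "->", len(cleaned))
print(cleaned)
```

One thing to keep in mind: after rows are dropped the table has fewer rows, so fixed row ranges used later for splitting may need to be adjusted.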


4. Preparing data from the dataset for the training of the model  

OK, now it's time to prepare the training data for the machine learning model. The dataset consists of 200 rows. I take the first 100 rows of the TV column and store them in the tvadvertising_train variable.

The same is done with the radio, newspaper and sales columns.


code

tvadvertising_train=read_data.iloc[0:100,0:1]

radioadvertising_train=read_data.iloc[0:100,1:2]

newsadvertising_train=read_data.iloc[0:100,2:3]

sales_train=read_data.iloc[0:100,3:4]    


5. Creating test data for the model 

After creating the training data, it's time to prepare the testing data for the model. For this I take the remaining 100 rows as test data; the model will predict sales values from them.

The same is done with the radio and newspaper columns.

 

code

tvadvertising_test=read_data.iloc[100:200,0:1] 

radioadvertising_test=read_data.iloc[100:200,1:2]

newsadvertising_test=read_data.iloc[100:200,2:3]   
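Taking the first and last 100 rows works when the rows are in no particular order. If the file might be sorted, a shuffled split is a common alternative. Below is a sketch using train_test_split from sklearn on synthetic data; the data, the column names and the variable names are all assumptions, and this is not the split used for the results in this post. It also keeps the sales values of the test rows, which are useful for scoring later.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (all values are made up).
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, size=200)
data = pd.DataFrame({
    "TV": tv,
    "Sales": 7.0 + 0.05 * tv + rng.normal(0, 1.5, size=200),
})

# Shuffle the rows, then hold out half of them for testing.
# random_state makes the split repeatable.
tv_train, tv_test, sales_train, sales_test = train_test_split(
    data[["TV"]], data[["Sales"]], test_size=0.5, random_state=42
)

print(tv_train.shape, tv_test.shape, sales_test.shape)
```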


6. Creating, training, and testing the model 

Now it's time to create the machine learning model. For that we use the LinearRegression class that was imported from the sklearn module in step 1. We fit the model on the TV advertising training data and the sales training data; this trains the model.

After training, it's time to test the model and get the predicted values. For that we feed in the TV advertising test data and collect the predictions.

The predicted values are stored in the prediction_tv variable.


code

model=LinearRegression()

model.fit(tvadvertising_train,sales_train)

prediction_tv=model.predict(tvadvertising_test)
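A fitted linear regression is just a straight line, so it can be useful to look at its slope and intercept. The sketch below fits a model on noise-free made-up data where the true line is known (sales = 7 + 0.05 * budget), so you can see that the model recovers it. The data and the names demo_model, slope and intercept are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data: sales = 7 + 0.05 * budget (made-up numbers).
budget = np.arange(0.0, 100.0, 10.0).reshape(-1, 1)  # shape (10, 1)
sales = 7.0 + 0.05 * budget                          # shape (10, 1)

demo_model = LinearRegression()
demo_model.fit(budget, sales)

# With a 2-D target, coef_ has shape (1, 1) and intercept_ has shape (1,).
slope = demo_model.coef_[0][0]
intercept = demo_model.intercept_[0]
print(f"sales = {intercept:.2f} + {slope:.3f} * budget")

# Predict sales for a budget that was not in the training data.
new_budget = np.array([[150.0]])
predicted = demo_model.predict(new_budget)[0][0]
print(predicted)
```

On the model from this post, model.coef_ and model.intercept_ give the same information for the TV data.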



7. Checking the accuracy of the model 

So, our model is created; now let's check its accuracy. For that we will use the R square score, which the score method of the sklearn model returns. Note that the code below computes the score on the training data.


code 

r2_score = model.score(tvadvertising_train,sales_train)

f"your R square score is {r2_score*100} %" 

 

We can see that the R square score of our TV advertising model is 81.950 % on the training data, and I think that's pretty good.
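A score on the training data tells us how well the line fits the rows it has already seen. To judge predictions, the score is usually computed on held-out rows as well. The following sketch does that on synthetic data and also computes R square by hand (1 minus the residual sum of squares divided by the total sum of squares) to show what the number means. The data and variable names are assumptions, and the numbers it prints are not the results of this post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in data (made up): a noisy straight line.
rng = np.random.default_rng(1)
x = rng.uniform(0, 300, size=(200, 1))
y = 7.0 + 0.05 * x[:, 0] + rng.normal(0, 1.5, size=200)

# First 100 rows for training, remaining 100 rows for testing.
x_train, x_test = x[:100], x[100:]
y_train, y_test = y[:100], y[100:]

demo_model = LinearRegression().fit(x_train, y_train)
y_pred = demo_model.predict(x_test)

# R square on the held-out rows, from sklearn and by hand.
test_r2 = r2_score(y_test, y_pred)
ss_res = np.sum((y_test - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # total sum of squares
manual_r2 = 1.0 - ss_res / ss_tot

# Mean squared error is another common measure (lower is better).
test_mse = mean_squared_error(y_test, y_pred)

print(f"test R square: {test_r2 * 100:.3f} %")
print(f"test MSE: {test_mse:.3f}")
```

To do the same with the data from this post, the sales values of rows 100 to 200 would also need to be stored in a test variable.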


8. Plotting the model. 

Let's plot our model to see how it looks. For plotting the graph I will use the matplotlib module.


code

plt.scatter(tvadvertising_train,sales_train)

plt.plot(tvadvertising_test,prediction_tv,color='red')

plt.xlabel("TV advertising budget")

plt.ylabel("sales record")

plt.show()  

Here is what the graph of the linear regression model looks like.
In this graph, the scattered dots are the training data, and the red line is the model, drawn from the test data and the predicted output. The dots cluster closely around the line, so the model has found a good line of best fit.


Predicting sales for Radio advertising 

The same process is repeated with the radio and sales columns to create the model, get the predicted values and plot them.


code

model.fit(radioadvertising_train,sales_train)

prediction_radio=model.predict(radioadvertising_test)  



r2_score = model.score(radioadvertising_train,sales_train)

f"your R square score is {r2_score*100} %"  

 

plt.scatter(radioadvertising_train,sales_train)

plt.plot(radioadvertising_test,prediction_radio,color='red')

plt.xlabel("radio advertising budget")

plt.ylabel("sales record")

plt.show()


Predicting sales for News advertising  

Similarly, the same process is done with the newspaper and sales columns.


code

model.fit(newsadvertising_train,sales_train)
prediction_news=model.predict(newsadvertising_test)  



r2_score = model.score(newsadvertising_train,sales_train)
f"your R square score is {r2_score*100} %"   


plt.scatter(newsadvertising_train,sales_train)
plt.plot(newsadvertising_test,prediction_news,color='red')
plt.xlabel("news advertising budget")
plt.ylabel("sales record")
plt.show()


From the three models we observed:

accuracy (R square score) for TV advertising model is 81.950 %
accuracy (R square score) for Radio advertising model is 15.583 %
accuracy (R square score) for News advertising model is 1.286 %

Hence the model for TV advertising is the best for sales prediction.
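Since the three models follow exactly the same steps, the comparison can also be written as one loop. This is a sketch on synthetic data in which sales depend strongly on TV, weakly on Radio and not at all on Newspaper; the data, the column names and the names scores and best_channel are assumptions, and the scores it prints are not the ones reported above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (all values are made up).
rng = np.random.default_rng(2)
n = 200
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
# Made-up relationship: strong TV effect, weak Radio effect, no Newspaper effect.
data["Sales"] = 5.0 + 0.05 * data["TV"] + 0.05 * data["Radio"] + rng.normal(0, 1.0, n)

# Same idea as in the post: first 100 rows to train, the rest to test.
train, test = data.iloc[:100], data.iloc[100:]

scores = {}
for channel in ["TV", "Radio", "Newspaper"]:
    # A fresh model per channel, fitted on that single column.
    channel_model = LinearRegression()
    channel_model.fit(train[[channel]], train["Sales"])
    # R square on the held-out rows.
    scores[channel] = channel_model.score(test[[channel]], test["Sales"])

best_channel = max(scores, key=scores.get)

for channel, score in scores.items():
    print(f"{channel}: R square = {score * 100:.3f} %")
print("best single channel:", best_channel)
```

Creating a new model inside the loop keeps each result separate, instead of refitting one shared model object three times.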



Follow me on: 

Linkedin - https://www.linkedin.com/in/somen-das-6a933115a/  

Instagram - https://www.instagram.com/somen912/?hl=en

And don't forget to subscribe to the blog.

so...

Thanks for your time and stay creative...