Predicting sales from an advertising budget using a linear regression model
Hi, in this project I built a prediction model for sales analysis. We feed the model an advertising budget, and it predicts the likely sales. The machine learning method I used is linear regression, and I wrote the code in a Jupyter notebook.
To train and test the model, I used a dataset from an advertising agency. It contains records of the budgets for TV, radio and newspaper advertising, along with the corresponding sales records.
The process for the model building will be completed in the following steps:
1. Importing all the required modules for the model
2. Extracting all the data from the dataset
3. Data cleaning and wrangling
4. Preparing training data for model
5. Preparing testing data for model
6. Creating, training and testing the model
7. Checking the accuracy of the model
8. Plotting the model graph to analyse
1. Importing all required modules.
The Python modules I used are pandas, seaborn, sklearn and matplotlib.
code
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot as plt
2. Extracting data from the dataset
The dataset is a CSV file, so I used the read_csv function from the pandas module. A picture of the dataset is given below; you can see that it consists of four columns: TV advertising budget, radio advertising budget, newspaper advertising budget and sales records.
code
read_data=pd.read_csv("tvradioadvertising.csv")
read_data
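Before going further, it can help to confirm that the file was read the way we expect. The snippet below is a minimal sketch, not the original notebook code: it reads a tiny inline CSV that stands in for tvradioadvertising.csv, and the column names and values in it are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for tvradioadvertising.csv; column names and values are assumed.
sample_csv = io.StringIO(
    "TV,Radio,Newspaper,Sales\n"
    "230.1,37.8,69.2,22.1\n"
    "44.5,39.3,45.1,10.4\n"
    "17.2,45.9,69.3,9.3\n"
)
sample_data = pd.read_csv(sample_csv)

# Quick checks before any modelling
n_rows, n_cols = sample_data.shape        # number of rows and columns
column_names = list(sample_data.columns)  # column order matters for iloc-based slicing
print(sample_data.head())                 # first few rows
print(sample_data.dtypes)                 # budgets and sales should all be numeric
```

The column order is worth checking because the later steps select columns by position with iloc, so a different order in the file would silently pick the wrong column.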
3. Data cleaning and Wrangling
After extracting the data, it is time to check how clean it is. We look for missing or null values hidden in the dataset and remove them, which makes the model more reliable.
To check for them, I pass the result of isnull() to the heatmap function from the seaborn module. The heatmap shows which columns contain missing values.
code
sns.heatmap(read_data.isnull(),yticklabels=False,cmap='viridis')
Luckily, my dataset is already clean, so the heatmap shows no missing values. If there were any, the affected rows would have to be dropped or filled in before training.
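For a dataset that does contain missing values, the clean-up could look roughly like this. It is only a sketch on a small made-up DataFrame (the names and numbers are assumptions, not the agency's data), and it shows two common pandas options rather than a recommendation for one of them.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing values; names and numbers are assumed.
dirty = pd.DataFrame({
    "TV": [230.1, 44.5, np.nan, 151.5],
    "Radio": [37.8, 39.3, 45.9, 41.3],
    "Newspaper": [69.2, 45.1, 69.3, 58.5],
    "Sales": [22.1, 10.4, 9.3, np.nan],
})

missing_per_column = dirty.isnull().sum()  # count of missing values in each column

dropped = dirty.dropna()                   # option 1: drop rows with any missing value
filled = dirty.fillna(dirty.mean())        # option 2: fill gaps with the column mean
```

Dropping is the simpler choice when only a few rows are affected. Filling keeps every row, but a filled-in sales value is a guess, so rows with a missing target are often better dropped.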
4. Preparing data from the dataset for the training of the model
OK, now it is time to prepare the training data for the machine learning model. The dataset consists of 200 rows. I take the first 100 rows of the TV column and store them in the tvadvertising_train variable.
The same is done for the radio, newspaper and sales columns.
code
tvadvertising_train=read_data.iloc[0:100,0:1]
radioadvertising_train=read_data.iloc[0:100,1:2]
newsadvertising_train=read_data.iloc[0:100,2:3]
sales_train=read_data.iloc[0:100,3:4]
5. Creating test data for the model
After creating the training data, it is time to prepare the testing data for the model. For this, I take the remaining 100 rows of the TV column as test data; the model will predict sales values for these rows.
The same is done for the radio and newspaper columns.
code
tvadvertising_test=read_data.iloc[100:200,0:1]
radioadvertising_test=read_data.iloc[100:200,1:2]
newsadvertising_test=read_data.iloc[100:200,2:3]
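Taking the first 100 rows for training and the last 100 for testing works as long as the rows are not sorted in any meaningful way. A common alternative, which is not what this post uses, is the train_test_split function from sklearn, which shuffles the rows before splitting. The sketch below runs on synthetic data, and all variable names in it are illustrative. It also keeps the sales values of the test rows, which is useful for checking predictions later.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with 200 rows (values are synthetic)
rng = np.random.default_rng(0)
data = pd.DataFrame({"TV": rng.uniform(0, 300, 200)})
data["Sales"] = 7.0 + 0.05 * data["TV"] + rng.normal(0, 1.5, 200)

# Shuffled 50/50 split; random_state makes the split repeatable
tv_train, tv_test, sales_train, sales_test = train_test_split(
    data[["TV"]], data[["Sales"]], test_size=0.5, random_state=42
)
```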
6. Creating, training, and testing the model
Now it is time to create the machine learning model. For that, we use the LinearRegression class that was imported from the sklearn module earlier. We fit the TV advertising training data and the sales training data to the model; this is what trains it.
After training, it is time to test the model and get the predicted values. For that, we feed it the TV advertising test data and collect the predictions.
A picture of the predicted values is given below.
code
model=LinearRegression()
model.fit(tvadvertising_train,sales_train)
prediction_tv=model.predict(tvadvertising_test)
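To see what the trained model has actually learned, we can look at its slope and intercept. The sketch below uses a tiny synthetic dataset that lies exactly on the line sales = 5 + 0.1 * TV (an assumption chosen so the result is easy to check), and shows that predict simply evaluates intercept + slope * budget.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data on an exact line: sales = 5 + 0.1 * TV (made up for illustration)
tv = pd.DataFrame({"TV": [10.0, 50.0, 100.0, 200.0]})
sales = pd.DataFrame({"Sales": 5.0 + 0.1 * tv["TV"]})

demo_model = LinearRegression()
demo_model.fit(tv, sales)

# coef_ is 2-D and intercept_ is 1-D here because the target was passed as a one-column DataFrame
slope = demo_model.coef_[0][0]
intercept = demo_model.intercept_[0]

# Predicting for a new budget gives the same number as the line equation
new_budget = pd.DataFrame({"TV": [150.0]})
predicted = demo_model.predict(new_budget)[0][0]
by_hand = intercept + slope * 150.0
```

On real data the points do not sit exactly on a line, so the fitted slope and intercept are the values that minimise the squared prediction error on the training rows.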
7. Checking the accuracy of the model
So, our model is created; now let's check its accuracy. For that we use the R square measure (the coefficient of determination). The score method of the sklearn model calculates it; here it is computed on the training data.
code
r2_score = model.score(tvadvertising_train,sales_train)
f"your R square score is {r2_score*100} %"
We can see that the R square score of the TV advertising model on the training data is 81.950 %, and I think that's pretty good.
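A score on the training rows tells us how well the line fits data the model has already seen. To judge the predictions themselves, the same measure can be computed on the held-out rows, provided their sales values are kept. The sketch below is one way to do that on synthetic data; the names sales_test, eval_model and the data itself are assumptions, not part of the original notebook. It also uses mean_squared_error and computes R square by hand as 1 - SS_res / SS_tot to show what the score means.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical stand-in data: 200 rows, TV in column 0 and Sales in column 1
rng = np.random.default_rng(1)
data = pd.DataFrame({"TV": rng.uniform(0, 300, 200)})
data["Sales"] = 7.0 + 0.05 * data["TV"] + rng.normal(0, 1.5, 200)

# Same first-100 / last-100 split as in the post, but keeping the test sales too
tv_train, sales_train = data.iloc[0:100, 0:1], data.iloc[0:100, 1:2]
tv_test, sales_test = data.iloc[100:200, 0:1], data.iloc[100:200, 1:2]

eval_model = LinearRegression().fit(tv_train, sales_train)
test_predictions = eval_model.predict(tv_test)

train_r2 = eval_model.score(tv_train, sales_train)           # fit on seen data
test_r2 = r2_score(sales_test, test_predictions)             # quality on unseen data
test_mse = mean_squared_error(sales_test, test_predictions)  # average squared error

# R square by hand: 1 - (residual sum of squares / total sum of squares)
actual = sales_test.to_numpy()
ss_res = float(((actual - test_predictions) ** 2).sum())
ss_tot = float(((actual - actual.mean()) ** 2).sum())
manual_r2 = 1.0 - ss_res / ss_tot
```

If this is added to the same notebook, note that a variable named r2_score would shadow the imported r2_score function, so the score is better stored under a different name.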
8. Plotting the model graph
Let's plot the model to see how it looks. For plotting the graph I use the matplotlib module.
code
plt.scatter(tvadvertising_train,sales_train)
plt.plot(tvadvertising_test,prediction_tv,color='red')
plt.xlabel("TV advertising budget")
plt.ylabel("sales record")
plt.show()
This is what the graph for the linear regression model looks like.
Predicting sales for Radio advertising
The same process is applied to the radio and sales columns to create the model, get the predicted values and plot them.
code
model.fit(radioadvertising_train,sales_train)
prediction_radio=model.predict(radioadvertising_test)
r2_score = model.score(radioadvertising_train,sales_train)
f"your R square score is {r2_score*100} %"
plt.scatter(radioadvertising_train,sales_train)
plt.plot(radioadvertising_test,prediction_radio,color='red')
plt.xlabel("radio advertising budget")
plt.ylabel("sales record")
plt.show()
Predicting sales for News advertising
Similarly, the same process is applied to the newspaper column and the sales column.
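Following the TV and radio sections, the newspaper version could be sketched as below. This is a reconstruction by analogy, not the original code. To keep it runnable on its own, read_data is replaced by a synthetic DataFrame with the same column order, so the score it prints says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for read_data: same column order, synthetic values
rng = np.random.default_rng(2)
read_data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Newspaper": rng.uniform(0, 100, 200),
})
read_data["Sales"] = (
    5.0 + 0.05 * read_data["TV"] + 0.1 * read_data["Radio"] + rng.normal(0, 1.0, 200)
)

# Same slices as in steps 4 and 5: newspaper is column 2, sales is column 3
newsadvertising_train = read_data.iloc[0:100, 2:3]
newsadvertising_test = read_data.iloc[100:200, 2:3]
sales_train = read_data.iloc[0:100, 3:4]

# Train on the newspaper budget, predict for the test rows, score on the training rows
model = LinearRegression()
model.fit(newsadvertising_train, sales_train)
prediction_news = model.predict(newsadvertising_test)
news_r2 = model.score(newsadvertising_train, sales_train)
print(f"your R square score is {news_r2 * 100} %")
```

The plot would follow the same pattern as for TV and radio, with newsadvertising_train, newsadvertising_test and prediction_news in place of the earlier variables.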
Follow me on:
LinkedIn - https://www.linkedin.com/in/somen-das-6a933115a/
Instagram - https://www.instagram.com/somen912/?hl=en
Thanks for your time and stay creative...