Startertutorials Blog
Tutorials and articles related to programming, computer science, technology and others.
Subscribe to Startertutorials.com's YouTube channel for different tutorial and lecture videos.

Categories: Machine Learning. No Comments on Gradient Descent and Cost Function in Python
5
(1)

In this article we are going to look at gradient descent and cost function in Python programming language.

 

Mean Squared Error (MSE)

In our school days we used to solve linear equations. For example consider the linear equation y=2x+3. For given x values, [1,2,3,4,5] we can calculate y values as [5,7,9,11,13].

 

However, in case of machine learning we have the values of x and y already available with us and using them we have to derive a linear equation. The equation is also known as prediction function. We can use the prediction function to calculate the future value of y.

 

In our earlier simple linear regression tutorial, we have used the following data to predict the house prices:


Subscribe to our monthly newsletter. Get notified about latest articles, offers and contests.


 

AreaPrice
2600550000
3000565000
3200610000
3600680000
4000725000

 

The linear equation we got while implementing linear regression in Python is:

135.78 * area + 180616.43

 

So, our goal today is to determine how to get the above equation. This equation is nothing but the line which best fits the given data as shown below.

 

Linear Regression Single Variable in Python

 

Coming up with the best fit line and the linear equation is to calculate the sum of squared errors as already discussed in our simple linear regression tutorial. We divide the sum of squared errors with the number of data points n. The result we get is called Mean Squared Error (MSE). The formula for MSE is:

 

mean squared error

 

The mean squared error is also called a cost function. There are other cost functions as well but MSE is the popular one. In the above equation we will expand ypredicted as shown below:

 

mean squared error expanded

 

Now we can toss various values for m and b and see which values will give least MSE value. Such a brute force method is inefficient. We need another approach which will give the optimum values for m and in a few steps.

 

Gradient Descent Algorithm

Gradient descent is an algorithm which finds the best fit line for the given dataset.

 

If we plot a 3D graph for some value for m (slope), (intercept), and cost function (MSE), it will be as shown in the below figure.

 

Gradient Descent and Cost Function in Python

Source: miro.medium.com

 

In the above figure, intercept is bslope is m and cost is MSE. If we plot the cost for various values of m and b, we will get the graph as shown above. Generally people start with the initial values of m=0 and b=0. The corresponding cost will be the starting point of the red line (at the top) as shown above.

 

Then we reduce the values of m and b (a step) and again calculate the cost. We repeat this process of steps until we get the minimum cost (the red dot in the above figure). Then we take the corresponding values of m and b for getting the linear equation.

 

When we look at the above 3D graph with x-axis and m as x-axis, the graphs will be as shown below.

 

gradient descent m and b

Source: youtube.com

 

Let’s consider the values of b. We generally start at some random initial value. Let’s say we are decreasing the value of b (steps) by a constant value. In such case, we might miss the optimum value (also called global minima) of b (red dot) as shown in below figure.

 

gradient descent b miss

Source: youtube.com

 

If we decrease the value of b so that it follows the curvature in the graph, we can reach the optimum value of b as shown below.

 

gradient descent b success

Source: youtube.com

 

So, how do we follow the curvature? At each step we have to calculate the slope of the tangent (dashed-line in below figure) and use something called learning rate to approach at the next value of b.

 

gradient descent b slope

Source: youtube.com

 

To calculate the slope of a line at a particular point we need the concepts of derivatives and partial derivatives (calculus). Those concepts will not be covered here. So, we are directly going to jump into the equations as shown below.

 

gradient descent partial derivative equations

 

Now as we got the slope (partial derivatives given above), we need to calculate the step (next value) which uses learning rate. So, for taking the next step, the equations are as given below.

 

gradient descent updated values

 

For example, updating the value of b will look as shown below.

 

gradient descent update b value

Source: youtube.com

 

The gradient descent algorithm in action looks something like as shown in the below animation.

 

Gradient Descent Animation

 

Gradient Descent and Cost Function in Python

Now, let’s try to implement gradient descent using Python programming language. First we import the NumPy library for arrays purpose as they are easy when compared to Python lists. Then we define a function for implementing gradient descent as shown below.

 

import numpy as np

def gradient_descent(x, y):
    #initial value of m and b
    m_curr = b_curr = 0
    #initialize number of steps
    iterations = 1000
    #Number of data points n
    n = len(x)
    #Initialize learning rate
    learning_rate = 0.001
    
    for i in range(iterations):
        y_pred = m_curr * x + b_curr
        cost = (1/n) * sum([val**2 for val in (y-y_pred)])
        md = -(2/n)*sum(x*(y-y_pred))
        bd = -(2/n)*sum(y-y_pred)
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        print("m {}, b {}, cost {}, iteration {}".format(m_curr, b_curr, cost, i))

 

In the above code we are just trying out some values for m_curr, b_curr, iterations and learning_rate. Now, we execute our function with the following code.

 

x = np.array([1,2,3,4,5])
y = np.array([5,7,9,11,13])

gradient_descent(x, y)

 

The output (modified) for the above code is given below.

 

m 0.062, b 0.018000000000000002, cost 89.0, iteration 0
m 0.122528, b 0.035592000000000006, cost 84.881304, iteration 1
m 0.181618832, b 0.052785648000000004, cost 80.955185108544, iteration 2
m 0.239306503808, b 0.069590363712, cost 77.21263768455901, iteration 3
m 0.29562421854195203, b 0.086015343961728, cost 73.64507722605434, iteration 4
m 0.35060439367025875, b 0.10206956796255283, cost 70.2443206760065, iteration 5
m 0.40427867960173774, b 0.11776180246460617, cost 67.00256764921804, iteration 6
m 0.4566779778357119, b 0.13310060678206653, cost 63.912382537082294, iteration 7
m 0.5078324586826338, b 0.14809433770148814, cost 60.966677449199324, iteration 8
m 0.5577715785654069, b 0.16275115427398937, cost 58.15869595270883, iteration 9
m 0.606524096911324, b 0.17707902249404894, cost 55.481997572035766, iteration 10
....
m 2.4499152007487943, b 1.3756633674836414, cost 0.4805725141922505, iteration 991
m 2.449763086127419, b 1.3762125495441813, cost 0.4802476096343826, iteration 992
m 2.4496110229353505, b 1.3767615459283284, cost 0.4799229247373742, iteration 993
m 2.4494590111552026, b 1.3773103566988596, cost 0.47959845935271794, iteration 994
m 2.449307050769595, b 1.3778589819185307, cost 0.47927421333200615, iteration 995
m 2.449155141761153, b 1.3784074216500761, cost 0.47895018652693155, iteration 996
m 2.449003284112507, b 1.3789556759562092, cost 0.4786263787892881, iteration 997
m 2.448851477806295, b 1.3795037448996217, cost 0.4783027899709678, iteration 998
m 2.448699722825159, b 1.3800516285429847, cost 0.47797941992396514, iteration 999

 

From the above output we can see that the cost in the last iteration is still reducing. So, we can increase the number of iterations or change the learning rate to get the least cost.

 

Generally we will start with less number of iterations and start with a higher learning rate, say, 0.1 and see the output values. From here onwards we will just change the learning rate and number of iterations (trial and error) to get the minimum cost.

 

When you run the above function with learning_rate as 0.08 and iterations as 10000, you can see that we will get the m value as 2 (approx.) and value as 3 (approx.) which are the expected values. So, the linear equation is          y = 2x + 3.

 

Now its time for an exercise.

 

Exercise on Gradient Descent and Cost Function

We are going to work on a dataset which contains student test scores of mathematics (math) and computer science (cs) subjects. We have to find the correlation (some kind of dependency) between the math and cs subjects. Download the test scores dataset. The data is shown below.

 

namemathcs
david9298
laura5668
sanjay8881
wei7080
jeff8083
aamir4952
venkat6566
virat3530
arthur6668
paul6773

 

So, for this exercise we will consider the math values as input  or x and cs  values as the output or y. The code for gradient descent will be as shown below.

 

import numpy as np
import pandas as pd
import math

df = pd.read_csv("test_scores.csv")

def gradient_descent2(x, y):
    #initial value of m and b
    m_curr = b_curr = 0
    #initialize number of steps
    iterations = 1000000
    #Number of data points n
    n = len(x)
    #Initialize learning rate
    learning_rate = 0.0002
    
    cost_previous = 0
    
    for i in range(iterations):
        y_pred = m_curr * x + b_curr
        cost = (1/n) * sum([val**2 for val in (y-y_pred)])
        md = -(2/n)*sum(x*(y-y_pred))
        bd = -(2/n)*sum(y-y_pred)
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        if math.isclose(cost, cost_previous, rel_tol=1e-20):
            break
        cost_previous = cost
        print("m {}, b {}, cost {}, iteration {}".format(m_curr, b_curr, cost, i))

x = np.array(df.math)
y = np.array(df.cs)

gradient_descent2(x, y)

 

In the above program we are using math.isclose method to stop the execution of our gradient descent function when the previous cost is pretty close to the current cost.

 

If we execute the above program we will get as 1.0177381667793246, b as 1.9150826134339467 and cost as 31.604511334602297 at iteration number 415532.

 

If you want to verify whether the values m and correct or not, we can use our linear regression model. The code is given below.

 

from sklearn import linear_model

reg = linear_model.LinearRegression()
reg.fit(df[['math']],df.cs)

print(reg.coef_, reg.intercept_)

 

When we execute the above code the values we will get are 1.01773624 and 1.9152193111569176. These values are pretty close to what we got earlier.

 

That’s it folks! I hope you enjoyed this tutorial. If you have any questions or suggestions please comment below.

 

For more information visit the following links:

How useful was this post?

Click on a star to rate it!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?

Suryateja Pericherla

Suryateja Pericherla, at present is a Research Scholar (full-time Ph.D.) in the Dept. of Computer Science & Systems Engineering at Andhra University, Visakhapatnam. Previously worked as an Associate Professor in the Dept. of CSE at Vishnu Institute of Technology, India.

He has 11+ years of teaching experience and is an individual researcher whose research interests are Cloud Computing, Internet of Things, Computer Security, Network Security and Blockchain.

He is a member of professional societies like IEEE, ACM, CSI and ISCA. He published several research papers which are indexed by SCIE, WoS, Scopus, Springer and others.

Leave a Reply

Your email address will not be published. Required fields are marked *