Predicting Student Admissions with Neural Networks

In this notebook, we predict student admissions to graduate school at UCLA based on three pieces of data:

  • GRE Scores (Test)
  • GPA Scores (Grades)
  • Class rank (1-4)

The dataset originally came from here: http://www.ats.ucla.edu/

Loading the data

To load the data and format it nicely, we will use two very useful packages called Pandas and NumPy. You can read the documentation here:

  • https://pandas.pydata.org/pandas-docs/stable/
  • https://docs.scipy.org/
In [1]:
# Importing pandas and numpy
import pandas as pd
import numpy as np

# Reading the csv file into a pandas DataFrame
data = pd.read_csv('student_data.csv')

# Printing out the first 10 rows of our data
data[:10]
Out[1]:
   admit  gre   gpa  rank
0      0  380  3.61     3
1      1  660  3.67     3
2      1  800  4.00     1
3      1  640  3.19     4
4      0  520  2.93     4
5      1  760  3.00     2
6      1  560  2.98     1
7      0  400  3.08     2
8      1  540  3.39     3
9      0  700  3.92     2

Plotting the data

First, let's make a plot of our data to see how it looks. In order to have a 2D plot, let's ignore the rank.

In [2]:
# Importing matplotlib
import matplotlib.pyplot as plt

# Function to help us plot
def plot_points(data):
    X = np.array(data[["gre","gpa"]])
    y = np.array(data["admit"])
    admitted = X[np.argwhere(y==1)]
    rejected = X[np.argwhere(y==0)]
    plt.scatter([s[0][0] for s in rejected], [s[0][1] for s in rejected], s = 25, color = 'red', edgecolor = 'k')
    plt.scatter([s[0][0] for s in admitted], [s[0][1] for s in admitted], s = 25, color = 'cyan', edgecolor = 'k')
    plt.xlabel('Test (GRE)')
    plt.ylabel('Grades (GPA)')
    
# Plotting the points
plot_points(data)
plt.show()
[Scatter plot of Test (GRE) vs. Grades (GPA); admitted students in cyan, rejected in red]

Roughly, it looks like the students with high grades and test scores were admitted, while the ones with low scores weren't, but the data is not as nicely separable as we had hoped. Maybe it would help to take the rank into account? Let's make four plots, one for each rank.

In [3]:
# Separating the ranks
data_rank1 = data[data["rank"]==1]
data_rank2 = data[data["rank"]==2]
data_rank3 = data[data["rank"]==3]
data_rank4 = data[data["rank"]==4]

# Plotting the graphs
plot_points(data_rank1)
plt.title("Rank 1")
plt.show()
plot_points(data_rank2)
plt.title("Rank 2")
plt.show()
plot_points(data_rank3)
plt.title("Rank 3")
plt.show()
plot_points(data_rank4)
plt.title("Rank 4")
plt.show()
[Four scatter plots of Test (GRE) vs. Grades (GPA), one per class rank: Rank 1, Rank 2, Rank 3, Rank 4]

This looks more promising, as it seems that the lower the rank, the higher the acceptance rate. Let's use the rank as one of our inputs. In order to do this, we should one-hot encode it.
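
Before encoding, you can check that impression numerically. A minimal sketch (assuming data is still the DataFrame loaded above):

# Acceptance rate per class rank -- a quick check of the claim above
print(data.groupby('rank')['admit'].mean())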

TODO: One-hot encoding the rank

Use the get_dummies function in Pandas in order to one-hot encode the data.

In [4]:
# TODO:  Make dummy variables for rank
one_hot_data = pd.concat([data, pd.get_dummies(data['rank'], prefix='rank')], axis=1)

# TODO: Drop the previous rank column
one_hot_data = one_hot_data.drop('rank', axis=1)

# Print the first 10 rows of our data
one_hot_data[:10]
Out[4]:
   admit  gre   gpa  rank_1  rank_2  rank_3  rank_4
0      0  380  3.61       0       0       1       0
1      1  660  3.67       0       0       1       0
2      1  800  4.00       1       0       0       0
3      1  640  3.19       0       0       0       1
4      0  520  2.93       0       0       0       1
5      1  760  3.00       0       1       0       0
6      1  560  2.98       1       0       0       0
7      0  400  3.08       0       1       0       0
8      1  540  3.39       0       0       1       0
9      0  700  3.92       0       1       0       0
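
As a side note, pandas can do the concat-and-drop in a single step; an equivalent sketch (not how the cell above does it):

# columns=['rank'] tells get_dummies to replace that column in place,
# using the column name as the prefix (rank_1 ... rank_4)
one_hot_data = pd.get_dummies(data, columns=['rank'])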

TODO: Scaling the data

The next step is to scale the data. Notice that the range for grades is 1.0-4.0, whereas the range for test scores is roughly 200-800, which is much larger. Features on such different scales are hard for a neural network to handle, since the larger one dominates the weight updates. Let's fit our two features into the range 0-1 by dividing the grades by 4.0 and the test scores by 800.

In [5]:
# Making a copy of our data (.copy() avoids mutating one_hot_data)
processed_data = one_hot_data.copy()

# TODO: Scale the columns
processed_data['gre'] = processed_data['gre'] / 800
processed_data['gpa'] = processed_data['gpa'] / 4.0

# Printing the first 10 rows of our processed data
processed_data[:10]
Out[5]:
   admit    gre     gpa  rank_1  rank_2  rank_3  rank_4
0      0  0.475  0.9025       0       0       1       0
1      1  0.825  0.9175       0       0       1       0
2      1  1.000  1.0000       1       0       0       0
3      1  0.800  0.7975       0       0       0       1
4      0  0.650  0.7325       0       0       0       1
5      1  0.950  0.7500       0       1       0       0
6      1  0.700  0.7450       1       0       0       0
7      0  0.500  0.7700       0       1       0       0
8      1  0.675  0.8475       0       0       1       0
9      0  0.875  0.9800       0       1       0       0

Splitting the data into Training and Testing

In order to test our algorithm, we'll split the data into a Training and a Testing set. The size of the testing set will be 10% of the total data.

In [6]:
# Randomly choose 90% of the index labels, without replacement
sample = np.random.choice(processed_data.index, size=int(len(processed_data)*0.9), replace=False)
# Select the sampled labels for training; drop them (by label) for testing
train_data, test_data = processed_data.loc[sample], processed_data.drop(sample)

print("Number of training samples is", len(train_data))
print("Number of testing samples is", len(test_data))
print(train_data[:10])
print(test_data[:10])
Number of training samples is 360
Number of testing samples is 40
     admit    gre     gpa  rank_1  rank_2  rank_3  rank_4
106      1  0.875  0.8900       1       0       0       0
270      1  0.800  0.9875       0       1       0       0
8        1  0.675  0.8475       0       0       1       0
288      0  1.000  0.7875       0       0       0       1
52       0  0.925  0.8425       0       0       0       1
140      0  0.800  0.9825       0       1       0       0
367      0  0.775  0.9075       0       0       1       0
296      0  0.700  0.7900       1       0       0       0
228      0  0.600  0.8575       0       1       0       0
396      0  0.700  0.7600       0       0       1       0
    admit    gre     gpa  rank_1  rank_2  rank_3  rank_4
9       0  0.875  0.9800       0       1       0       0
23      0  0.850  0.7975       0       0       0       1
55      1  0.925  1.0000       0       0       1       0
63      1  0.850  0.9625       0       0       1       0
67      0  0.775  0.8250       1       0       0       0
79      1  0.775  1.0000       1       0       0       0
84      1  0.625  0.9000       0       0       1       0
89      1  0.825  1.0000       0       1       0       0
93      0  0.725  0.7325       0       1       0       0
97      0  0.600  0.8925       0       1       0       0
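
For reference, scikit-learn offers the same kind of split as a one-liner; an alternative sketch (assuming scikit-learn is installed, and not what this notebook uses):

from sklearn.model_selection import train_test_split

# 90/10 split; random_state pins the shuffle for reproducibility
train_data, test_data = train_test_split(processed_data, test_size=0.1, random_state=42)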

Splitting the data into features and targets (labels)

Now, as a final step before the training, we'll split the data into features (X) and targets (y).

In [7]:
features = train_data.drop('admit', axis=1)
targets = train_data['admit']
features_test = test_data.drop('admit', axis=1)
targets_test = test_data['admit']

print(features[:10])
print(targets[:10])
       gre     gpa  rank_1  rank_2  rank_3  rank_4
106  0.875  0.8900       1       0       0       0
270  0.800  0.9875       0       1       0       0
8    0.675  0.8475       0       0       1       0
288  1.000  0.7875       0       0       0       1
52   0.925  0.8425       0       0       0       1
140  0.800  0.9825       0       1       0       0
367  0.775  0.9075       0       0       1       0
296  0.700  0.7900       1       0       0       0
228  0.600  0.8575       0       1       0       0
396  0.700  0.7600       0       0       1       0
106    1
270    1
8      1
288    0
52     0
140    0
367    0
296    0
228    0
396    0
Name: admit, dtype: int64

Training the 2-layer Neural Network

The following function trains the 2-layer neural network. First, we'll write some helper functions.

In [8]:
# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Binary cross-entropy (log-loss) error
def error_formula(y, output):
    return -y * np.log(output) - (1 - y) * np.log(1 - output)
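
As a quick sanity check (not part of the original notebook), you can evaluate the helpers at a few hand-picked points:

print(sigmoid(0))              # 0.5, the sigmoid's midpoint
print(sigmoid_prime(0))        # 0.25 = 0.5 * (1 - 0.5), its maximum slope
print(error_formula(1, 0.9))   # ~0.105, a small loss for a confident correct prediction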

TODO: Backpropagate the error

Now it's your turn to shine. Write the error term. Remember that this is given by the equation $$ (y-\hat{y}) \sigma'(x) $$

In [9]:
# TODO: Write the error term formula
def error_term_formula(x, y, output):
    return (y - output)*sigmoid_prime(x)
In [10]:
# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

# Training function
def train_nn(features, targets, epochs, learnrate):
    
    # Use the same seed to make debugging easier
    np.random.seed(42)

    n_records, n_features = features.shape
    last_loss = None

    # Initialize weights
    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

    for e in range(epochs):
        del_w = np.zeros(weights.shape)
        for x, y in zip(features.values, targets):
            # Loop through all records, x is the input, y is the target

            # Activation of the output unit
            #   Notice we multiply the inputs and the weights here 
            #   rather than storing h as a separate variable 
            output = sigmoid(np.dot(x, weights))

            # The log-loss error for this record (computed for reference;
            # the weight update below uses the error term instead)
            error = error_formula(y, output)

            # The error term
            error_term = error_term_formula(x, y, output)

            # Accumulate the gradient step: the error term times the inputs
            del_w += error_term * x

        # Update the weights here. The learning rate times the 
        # change in weights, divided by the number of records to average
        weights += learnrate * del_w / n_records

        # Printing out the mean square error on the training set
        if e % (epochs // 10) == 0:
            out = sigmoid(np.dot(features, weights))
            loss = np.mean((out - targets) ** 2)
            print("Epoch:", e)
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, "  WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss
            print("=========")
    print("Finished training!")
    return weights
    
weights = train_nn(features, targets, epochs, learnrate)
Epoch: 0
Train loss:  0.273178303031
=========
Epoch: 100
Train loss:  0.206943485712
=========
Epoch: 200
Train loss:  0.205207717134
=========
Epoch: 300
Train loss:  0.204257055158
=========
Epoch: 400
Train loss:  0.20370743212
=========
Epoch: 500
Train loss:  0.203358613263
=========
Epoch: 600
Train loss:  0.203111016968
=========
Epoch: 700
Train loss:  0.20291564917
=========
Epoch: 800
Train loss:  0.202748296552
=========
Epoch: 900
Train loss:  0.202596879151
=========
Finished training!
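
The per-record inner loop above is easy to read but slow. The same epoch update can be written in vectorized NumPy; a sketch that is equivalent up to floating-point ordering, assuming the variables defined above:

# One epoch of the same update, vectorized over all records
X = features.values                                          # (n_records, n_features)
y_all = targets.values                                       # (n_records,)
output = sigmoid(X @ weights)                                # prediction per record
error_terms = (y_all - output)[:, None] * sigmoid_prime(X)   # per-record error terms
weights += learnrate * (error_terms * X).sum(axis=0) / len(X)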

Calculating the Accuracy on the Test Data

In [11]:
# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
Prediction accuracy: 0.625
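
Accuracy alone can hide how the errors are distributed between the two classes. A short sketch (using the variables above) that compares against an always-reject baseline and breaks the predictions into confusion counts:

# Baseline: accuracy of always predicting "not admitted"
print("Always-reject baseline: {:.3f}".format(np.mean(targets_test == 0)))

# Simple confusion counts (predictions is a boolean array)
actual = targets_test.values.astype(bool)
print("True positives: ", np.sum(predictions & actual))
print("False positives:", np.sum(predictions & ~actual))
print("True negatives: ", np.sum(~predictions & ~actual))
print("False negatives:", np.sum(~predictions & actual))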