In this lecture we will build a neural net and train it to predict student performance at university. We will:
Build a simple, sequential neural net
Load, preprocess and use training data
Train a neural net
Save the trained model for future use
Use the neural net
from a Python program
by publishing it with FastAPI
We will use the PyCharm editor for this project.
Download PyCharm from https://www.jetbrains.com/pycharm/ and install it.
Create a PyCharm project named StudentPerformance. Leave the default settings and click Create.
On a Windows computer a folder with the name of the project, "StudentPerformance", will be created under C:\Users\<username>\PycharmProjects\StudentPerformance.
Within the project create 3 Python files named:
student_performance.py
In this file we will create the model, load the training data, train the model and save it for future use.
use_student_performance.py
In this file we will write a sample Python program that loads the saved model and uses it.
student_performance_api.py
In this file we will publish the model through FastAPI, so that it can be accessed over the network (IP:Port).
Explore the created folder
C:\Users\<username>\PycharmProjects\StudentPerformance
You should see this content.
Folder .venv stores the Python virtual environment, including the Python interpreter and the libraries used in the project.
Folder .idea is used by PyCharm to store project settings.
In this example we will use historical data of students enrolled in the university. For each student we recorded City, High School Grade, Gender and Performance.
We will use Gender, HighSchoolGrade and City as the input data of the neural net, and the student Performance as its output data.
Based on this historical data, we will try to predict a student's performance at the university from their gender, high school grade and city.
Download the training data by clicking Training Data Download.
After downloading the CSV file, create a folder named data within your PyCharm project and put the training data in this folder.
The content of the project folder should look something like:
In this project we will use TensorFlow, scikit-learn and pandas.
TensorFlow (incl. Keras)
Is a deep learning framework. It provides the functions to build, train and save neural network models.
It provides the Sequential model and Dense layer classes that we will use in our example.
It performs all the matrix math, gradient calculation and optimization during training.
It lets you compile the model (model.compile()), train the model (model.fit()) and evaluate it (model.evaluate()).
It saves the trained model (model.save() / .keras files) so you can load it later.
scikit-learn
It is a utility toolkit for classical ML preprocessing.
In our example we will use it to pre-process the training data: to encode labels into numbers using LabelEncoder(), and to split the data into training and test sets using train_test_split().
We will also save and load the label encoders using joblib.
pandas
Is a library that handles data loading, cleaning and tabular manipulation. We will use it to:
Load the CSV training data, e.g. read_csv("student_data.csv")
Manipulate columns
numpy
Is the numerical backbone of Python ML. TensorFlow and pandas rely on it.
numpy is installed implicitly by the other packages.
To install the packages, write in the PyCharm terminal:
> pip install tensorflow
> pip install scikit-learn
> pip install pandas
All packages are installed inside your project's virtual environment folder, .venv.
Below is the complete listing of the Python file that creates, trains, tests and saves a sequential neural net model to predict the performance of students at the university based on their gender, the city they come from and their high school grade.
In the sections below, parts of the code are explained along with the theoretical concepts.
Copy and paste the listing into PyCharm to get a working solution, but read the explanation of the code in the following sections to better understand it.
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow import keras
from tensorflow.keras.initializers import Zeros
# 1. Load dataset from CSV (make sure it's in your project folder or data/ subfolder)
df = pd.read_csv("data/student_training_data.csv")
#print(df.head())
# 2. Encode categorical variables
le_gender = LabelEncoder()
le_city = LabelEncoder()
le_perf = LabelEncoder()
#print (le_gender,le_city,le_perf)
df['Gender'] = le_gender.fit_transform(df['Gender'])
df['City'] = le_city.fit_transform(df['City'])
df['Performance'] = le_perf.fit_transform(df['Performance'])
#print(df.head())
X=df[["Gender","HighSchoolGrade","City"]].values
Y=df["Performance"].values
#print(X[0])
#print(Y[0])
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
#print(X_train.shape)
#print(X_test.shape)
#print(Y_train.shape)
#print(Y_test.shape)
#print(X_train[0])
#print(Y_train[0])
model = tf.keras.models.Sequential()
model.add(Dense(8, input_shape=(3,), activation="relu", bias_initializer=Zeros(), name="input_hidden"))
model.add(Dense(6, activation="relu", bias_initializer=Zeros(),name="hidden_layer_2"))
model.add(Dense(4, activation="relu", bias_initializer=Zeros(), name="hidden_layer_3"))
model.add(Dense(3, activation="softmax", bias_initializer=Zeros(), name="output_layer"))
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
print("\n Model Summary: \n")
model.summary()
# Print initial weights (before training)
for layer in model.layers:
    weights = layer.get_weights()
    print(f"\nLayer {layer.name} initial weights:")
    for w in weights:
        print(w.shape)
        print(w)
history = model.fit(X_train, Y_train, epochs=100, batch_size=16)
# Print weights (after training)
for layer in model.layers:
    weights = layer.get_weights()
    print(f"\nLayer {layer.name} weights after training:")
    for w in weights:
        print(w.shape)
        print(w)
loss, acc = model.evaluate(X_test, Y_test,verbose=0)
print("\nTest loss:", loss)
print("Test accuracy:", acc)
#example prediction
sample = pd.DataFrame([{
    "Gender": "Female",
    "HighSchoolGrade": 9,
    "City": "Tirana"
}])
sample["Gender"] = le_gender.transform(sample["Gender"])
sample["City"] = le_city.transform(sample["City"])
prediction = model.predict(sample.values)
pred_class = np.argmax(prediction)
print("Predicted Performance:", le_perf.inverse_transform([pred_class])[0])
# Save the trained model
model.save("student_model.keras")
# Save the encoders separately using joblib
import joblib
joblib.dump(le_gender, "le_gender.pkl")
joblib.dump(le_city, "le_city.pkl")
joblib.dump(le_perf, "le_perf.pkl")
print(" Model and encoders saved in project folder.")
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow import keras
from tensorflow.keras.initializers import Zeros
Training data should be stored in the data/student_training_data.csv file within the project folder.
df = pd.read_csv("data/student_training_data.csv")
print(df.describe())
The read_csv function loads the data from the CSV file into the df variable (a pandas DataFrame).
We can verify the loaded data using
print(df.head())
which would print the first 5 rows in the df variable, similar to the result below:
Gender HighSchoolGrade City Performance
0 Male 9 Shkodra excellent
1 Female 9 Tirana excellent
2 Male 5 Durres bad
3 Male 7 Durres good
4 Male 6 Durres bad
Or, we can use the describe() function to summarize the data. The default behavior of describe() is to summarize only numerical columns, so here it describes only the HighSchoolGrade column, which contains numbers (see the snippet after the output below for how to include the text columns).
HighSchoolGrade
count 500.000000
mean 7.462000
std 1.715354
min 5.000000
25% 6.000000
50% 8.000000
75% 9.000000
max 10.000000
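As an optional check (a small sketch using the same df), pandas also lets you include the text columns in the summary:
print(df.describe(include='all'))
This additionally reports the count, the number of unique values and the most frequent value for Gender, City and Performance.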
In order to be used in the model, the labels need to be encoded with numerical values.
Columns Gender, City and Performance contain text values, for example Gender: {Male, Female}, Performance: {bad, good, excellent}, City: {Tirana, Durres, ...}.
For each of these columns the distinct values have to be identified, and each unique text value has to be assigned a number.
To do that we will use a LabelEncoder() object, part of the scikit-learn library. First we need to create a LabelEncoder object for each column:
le_gender=LabelEncoder()
le_city=LabelEncoder()
le_perf=LabelEncoder()
The LabelEncoder object provides three methods that we can use to perform the encoding task.
fit()
The encoder learns all unique values and assigns a unique number to each value.
The fit() method returns the encoder object itself; it does not return the data.
example
le_city.fit(df['City'])
print(le_city.classes_)
would print the unique city values in our training data:
['Durres' 'Shkodra' 'Tirana' 'Vlora']
or
print("Mapping of labels to encoded values:")
for index, label in enumerate(le_city.classes_):
    print(f"{label} → {index}")
would print the labels and assigned number for each value
Mapping of labels to encoded values:
Durres → 0
Shkodra → 1
Tirana → 2
Vlora → 3
transform()
The LabelEncoder.transform() method encodes the labels to their assigned numbers.
It does not change the data it receives as input; instead it returns the array of assigned numbers.
encoded_cities=le_city.transform(df['City'])
print(encoded_cities)
prints
[1 2 0 0 0 2 1 1 2 1 2 1 3 3 0 3 3 3 2 3 3 0 3 3 1 0 0 0 0 0 0 2 3 0 3 3 2
0 1 2 1 3 3 2 2 1 0 1 2 1 2 1 2 .......]
fit_transform()
LabelEncoder's fit_transform() combines both methods: it identifies the unique values, assigns a number to each label and then returns the transformed data. It does not modify the data it receives as input; instead it returns the encoded data as an array.
for example
encoded_cities=le_city.fit_transform(df['City'])
print(encoded_cities)
would print
[1 2 0 0 0 2 1 1 2 1 2 1 3 3 0 3 3 3 2 3 3 0 3 3 1 0 0 0 0 0 0 2 3 0 3 3 2
0 1 2 1 3 3 2 2 1 0 1 2 1 2 1 2 .......]
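A fourth method worth knowing is inverse_transform(), which maps the numbers back to the original labels; the complete listing uses it to turn the predicted class back into a performance label. A minimal example, assuming le_city has already been fitted as above:
print(le_city.inverse_transform([0, 2]))
would print
['Durres' 'Tirana']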
The complete code to encode our training data:
print('Original Data: \n')
print(df.head())
# 2. Encode categorical variables
# 2.a create label encoder objects for each of the text columns
le_gender = LabelEncoder()
le_city = LabelEncoder()
le_perf = LabelEncoder()
#2.b encode values
df['Gender'] = le_gender.fit_transform(df['Gender'])
df['City'] = le_city.fit_transform(df['City'])
df['Performance'] = le_perf.fit_transform(df['Performance'])
print('Encoded Data: \n')
print(df.head())
The print commands show the original data and the encoded data:
Original Data:
Gender HighSchoolGrade City Performance
0 Male 9 Shkodra excellent
1 Female 9 Tirana excellent
2 Male 5 Durres bad
3 Male 7 Durres good
4 Male 6 Durres bad
Encoded Data:
Gender HighSchoolGrade City Performance
0 1 9 1 1
1 0 9 2 1
2 1 5 0 0
3 1 7 0 2
4 1 6 0 0
When you build a model, your goal is not to make it perfect on the data you already have — it’s to make it perform well on new, unseen data.
That’s why we split our dataset into two (or sometimes three) parts:
Training data → used to teach the model.
Test data → used to check how well the model learned general patterns.
What is training data
Training data is the portion of your dataset that the model uses to learn relationships between input and output
In our case:
Inputs: Gender, HighSchoolGrade, City
Outputs: Performance (bad, good, excellent)
During training, TensorFlow goes through these examples many times (epochs) and gradually adjusts the model's internal weights to minimize the error - a process called backpropagation.
Now the encoded data is stored in the 4-column DataFrame df.
In order to feed the data to the model for training, we need to divide it into input values X [Gender, HighSchoolGrade, City] and output values Y [Performance].
Given a student record as input X [Gender, HighSchoolGrade, City], the performance at the university is predicted as the output value Y [Performance].
Additionally,
the input and output data have to be split into training and test sets, for example taking 80% of the data for training and 20% for testing the model.
Test data is unseen by the model; when an unseen record is input, the model makes its best guess to predict the output value.
In code we use the train_test_split function from the scikit-learn library:
# divide data into input and output
X=df[["Gender","HighSchoolGrade","City"]].values
Y=df["Performance"].values
#split in train and test subsets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
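Since the training file contains 500 records (see the describe() output above), the 80/20 split should leave 400 records for training and 100 for testing. You can confirm this with the commented print statements from the listing:
print(X_train.shape)   # expected (400, 3)
print(X_test.shape)    # expected (100, 3)
print(Y_train.shape)   # expected (400,)
print(Y_test.shape)    # expected (100,)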
For this example we will create a sequential model, meaning that the output of the first layer is passed as input to the second layer of neurons, and so on until the output layer.
Each layer will be dense, meaning that each neuron of a layer is connected to every neuron of the next layer.
In our example we will create a sequential model and add 4 layers to it:
Input layer with 8 neurons
First hidden layer with 6 neurons
Second hidden layer with 4 neurons
Output layer with 3 neurons, one for each output class, Performance: {bad, good, excellent}
Code that creates the above model
model = tf.keras.models.Sequential()
model.add(Dense(8, input_shape=(3,), activation="relu", bias_initializer=Zeros(), name="input_hidden"))
model.add(Dense(6, activation="relu", bias_initializer=Zeros(),name="hidden_layer_2"))
model.add(Dense(4, activation="relu", bias_initializer=Zeros(), name="hidden_layer_3"))
model.add(Dense(3, activation="softmax", bias_initializer=Zeros(), name="output_layer"))
here
model = tf.keras.models.Sequential()
creates an empty sequential model.
Layers are added to the model sequentially using model.add(layer).
In this example we use Dense layers, meaning each layer is fully connected to the next one.
That is, each neuron in a layer receives input from every neuron in the previous layer.
Mathematically, a dense layer performs:
y=f(Wx+b)
where
W = weight matrix (learned multipliers)
x = input vector (your features)
b = bias vector
f = activation function (e.g., ReLU)
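To make the formula concrete, here is a small numpy sketch of what one dense layer computes (illustration only, with random weights, not the weights of our trained model):
import numpy as np

x = np.array([1.0, 9.0, 2.0])      # one encoded input record: Gender, HighSchoolGrade, City
W = np.random.rand(3, 8)           # weight matrix connecting 3 inputs to 8 neurons
b = np.zeros(8)                    # bias vector, initialized to zeros as in our model
y = np.maximum(0, x @ W + b)       # ReLU(Wx + b): the outputs of the 8 neurons
print(y.shape)                     # (8,)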
Dense() method
The Dense() method creates a layer of neurons.
It takes as input parameters:
number of neurons (units), e.g. the first layer has 8 neurons
input_shape=(x,), e.g. input_shape=(3,) means the layer expects inputs of 3 variables (x1, x2, x3), in our example (Gender, HighSchoolGrade, City). We fix only this dimension; the layer can receive a variable number of records of 3 variables.
activation function, e.g. activation="relu" or activation="softmax". The activation function controls the values returned by the neurons of the layer.
relu: Rectified Linear Unit, is the function f(x) = max(0, x), so it returns the computed value if it is positive, or 0 if it is negative. ReLU helps networks learn non-linear patterns while avoiding vanishing gradients.
softmax: one of the most important functions in machine learning, especially for classification problems like our student performance prediction (bad / good / excellent).
We use it in the output layer, which has 3 neurons, one for each class: bad, good, excellent.
kernel_initializer
kernel_initializer defines how the weight matrix of the layer will be initialized.
The weights can be initialized to:
Zeros(),
glorot_uniform (default),
random_normal,
uniform,
he_normal
bias_initializer
It defines the initial values for the bias vector b. The default is usually Zeros().
name, an optional name given to the layer.
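Putting these parameters together, the first layer of our model could equally be written with every parameter spelled out (a sketch equivalent to the line used in the listing; glorot_uniform is simply the default kernel initializer made explicit):
model.add(Dense(
    units=8,                              # number of neurons in the layer
    input_shape=(3,),                     # 3 input features: Gender, HighSchoolGrade, City
    activation="relu",                    # activation function
    kernel_initializer="glorot_uniform",  # default weight initialization
    bias_initializer=Zeros(),             # biases start at 0
    name="input_hidden"                   # optional layer name
))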
Let's explain the softmax activation function further.
Softmax takes a vector of numbers and converts them into probabilities, such that they:
are positive and
all sum to 1
For a vector z = [z1, z2, ..., zn], softmax computes
softmax(zi) = exp(zi) / (exp(z1) + exp(z2) + ... + exp(zn))
which guarantees that the values softmax(zi) sum to 1.
For example, suppose in our case the softmax function receives three input numbers
[2.0, 1.0, 0.1]
It will compute
softmax(2.0)=0.66
softmax(1.0)=0.24
softmax(0.1)=0.10
If we sum the values of the softmax function, the result is 1.
These numbers represent the probabilities that the model predicts for each output class. Remembering that the LabelEncoder assigned bad → 0, excellent → 1 and good → 2, this means:
66% chance → class 0 ("bad")
24% chance → class 1 ("excellent")
10% chance → class 2 ("good")
Using the formula above, the softmax activation takes as input the raw output numbers (called logits) of the neurons in the output layer and returns a vector of the same dimension, where the logits are replaced with probabilities.
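The numbers in the example above can be verified with a few lines of numpy (a small standalone sketch, independent of the model):
import numpy as np

logits = np.array([2.0, 1.0, 0.1])
probs = np.exp(logits) / np.sum(np.exp(logits))   # the softmax formula
print(probs)         # approximately [0.66 0.24 0.10]
print(probs.sum())   # 1.0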
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
This command defines how our model learns, how errors are measured, and how success is reported.
What does compile() do
compile() configures the model for training.
It tells TensorFlow three things:
How to update weights (the optimizer)
How to measure errors (the loss function)
How to report performance (metrics)
Let's briefly explain each of the items used in our model:
optimizer="adam"
The optimizer controls how the model weights are updated after each training step.
Adam, short for Adaptive Moment Estimation, is one of the most popular optimizers.
It combines the strengths of two classical methods:
Momentum - remembers previous gradients
RMSProp - scales the learning rate based on recent gradient magnitudes
It automatically adjusts the learning rate for each parameter, providing faster, more stable convergence.
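If you want to control Adam's learning rate explicitly instead of using the "adam" string shortcut, you can pass an optimizer object (optional; 0.001 is Adam's default learning rate):
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)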
The loss function tells the model how to measure how wrong it is after each prediction. The optimizer tries to minimize this loss value. Since we are doing multi-class classification (3 output neurons, one correct class), we use cross-entropy, which measures the distance between the true label and the predicted probability distribution.
We can use categorical_crossentropy or sparse_categorical_crossentropy.
The main difference is the expected label format:
categorical_crossentropy expects a one-hot encoded vector for each label, such as [0,1,0] or [1,0,0]
sparse_categorical_crossentropy expects integer-encoded labels, such as 0, 1 or 2
Since we encoded each label as an integer with the LabelEncoder, we will use sparse_categorical_crossentropy.
Metrics are for monitoring the model; they do not influence training.
During training Keras will display both:
loss - the value being optimized
accuracy - % of predictions where the highest probability class equals the true class
Now we have created the model, we have compiled it and we have prepared the training data, so we are ready to train the model.
We train the model using the model.fit() method:
history = model.fit(
    X_train,               # your input features
    Y_train,               # output values
    epochs=50,             # how many times to go through the full dataset
    batch_size=16,         # how many samples per training step
    validation_split=0.2,  # optional: keep 20% for validation
    verbose=1              # show progress bar
)
Let us explain the parameters:
X_train: contains the input data; in our example it is the array containing [Gender, Grade, City].
Y_train: contains the student performance for each of the input records. This value is known in the training data. The loss is calculated from the difference between this true value and the predicted value, and the optimizer tries to minimize it.
batch_size: the number of records that the model processes before updating the weights. In our case it predicts 16 cases (the batch size) before it adjusts the weights. Usually the loss is calculated as the mean of the losses of the individual cases.
epochs: an epoch is one full pass over the whole dataset; epochs is the number of times to go over the whole dataset.
verbose: show the progress bar. 1 = show, 0 = do not show.
validation_split: the fraction of the data used for validation. In our case 20%, validation_split=0.2.
model.fit() returns the history of the training metrics for each epoch.
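The returned history object stores one value per epoch for each reported quantity; with the compile and fit settings above you can inspect it like this:
print(history.history.keys())            # contains 'loss', 'accuracy', 'val_loss', 'val_accuracy'
print(history.history['accuracy'][-1])   # training accuracy of the last epoch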
What is validation data?
It's a holdout set taken from your training data; the model doesn't use it to adjust weights, but it is still used to measure performance after each epoch.
So during the training:
the model learns (adjusts weights) using 80% of the data
after each epoch it tests itself on validation data (20%)
it reports both at the end of each epoch:
loss:...., accuracy:..., val_loss:...., val_accuracy:....
The validation set helps detect:
Overfitting, when training accuracy keeps improving but validation accuracy worsens
Underfitting, when both are low
Why we need validation
Because training accuracy alone can be misleading.
A model can learn to perfectly memorize the training data, but fail on new unseen data.
Validation data simulates unseen data - it tells you whether the model generalizes well.
Understanding the training results is important.
The accuracy and loss values at the end of training represent the accuracy of the model on cases from the training data.
The val_accuracy and val_loss values represent the accuracy of the model on unseen data, the validation set - in other words, the ability of the model to generalize.
Epoch 50/50
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9062 - loss: 0.3974 - val_accuracy: 0.8750 - val_loss: 0.4139
You can visualize the result of the training using matplotlib.
First you need to install matplotlib in the Python terminal within PyCharm:
>pip install matplotlib
then import it at the beginning of the Python file
import matplotlib.pyplot as plt
Then, after the fit command that stores the training result in the history object,
history = model.fit(
    X_train,               # your input features
    Y_train,               # output values
    epochs=50,             # how many times to go through the full dataset
    batch_size=16,         # how many samples per training step
    validation_split=0.2,  # optional: keep 20% for validation
    verbose=1              # show progress bar
)
use matplotlib to visualize it:
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
In our example we see that the accuracy started at about 40% due to the random initial weights and, after a first dip around the 10th epoch, started to increase, ending the 50 training epochs with more than 90% accuracy on the training data.
The validation accuracy is slightly below 90%.
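In the same way you can plot the loss curves, reusing the same history object:
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()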
Now that we have a trained model, we need to test it with the test data that we set apart at the beginning, in X_test and Y_test.
To test the model we use model.evaluate() method.
loss, acc = model.evaluate(X_test, Y_test,verbose=0)
print("\nTest loss:", loss)
print("Test accuracy:", acc)
model.evaluate() measures the model’s performance on a dataset —
it computes the loss and all the metrics defined in model.compile().
When you call evaluate(), Keras performs a forward pass through our network.
It runs prediction on the entire dataset passed to it (here the test set), but does not perform backpropagation, so no training occurs.
The internal flow within the evaluate function:
Split the test data into batches (default batch size=32)
For each batch
Compute the model's output
Calculate the loss for that batch using the loss function we defined, in our example sparse_categorical_crossentropy
Calculate the metrics, e.g. accuracy
Average the results across batches,
giving one final scalar for loss and one for accuracy
Return them (by default as a list of scalars, e.g. [loss, accuracy]; as a dictionary if return_dict=True)
The result of testing our trained model:
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8300 - loss: 0.7141
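If you prefer named results instead of unpacking loss and accuracy by position, evaluate() also accepts return_dict=True (optional):
results = model.evaluate(X_test, Y_test, verbose=0, return_dict=True)
print(results)   # e.g. {'loss': 0.71, 'accuracy': 0.83}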
Saving the model for future use requires saving both the model itself and the label encoders.
There are a number of formats to save a trained model in, but we recommend the .keras format.
The .keras format stores:
model architecture
weights
optimizer configuration and state
training history and metadata
When saving the model, only the model itself is stored; NO label encoders are stored automatically.
In our example we also need to store the label encoders for gender, city and performance.
To save the label encoders we will use joblib.
joblib is a small Python library used for efficiently serializing (saving) and deserializing (loading) Python objects, such as:
scikit-learn encoders (LabelEncoder, OneHotEncoder etc)
preprocessors (StandardScaler)
trained models
any other python object ( dictionaries, numpy arrays)
We first need to install it.
From the Python terminal in PyCharm:
> pip install joblib
then import it
import joblib
The complete code for saving the model and the label encoders:
model.save("student_model.keras")
joblib.dump(le_city, "le_city.pkl")
joblib.dump(le_gender, "le_gender.pkl")
joblib.dump(le_perf, "le_perf.pkl")
The 4 new files will be created in the project folder:
.idea
.venv
data
le_city.pkl
le_gender.pkl
le_perf.pkl
student_model.keras
student_performance.py
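As a preview of what use_student_performance.py will do, the saved files can be loaded back like this (a minimal sketch, assuming the files above are in the project folder):
import joblib
import numpy as np
from tensorflow import keras

model = keras.models.load_model("student_model.keras")
le_gender = joblib.load("le_gender.pkl")
le_city = joblib.load("le_city.pkl")
le_perf = joblib.load("le_perf.pkl")

# encode a new student record and predict the performance class
x = np.array([[le_gender.transform(["Female"])[0], 9, le_city.transform(["Tirana"])[0]]])
pred = model.predict(x)
print(le_perf.inverse_transform([np.argmax(pred)])[0])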