In this lecture, we will build a neural network and train it to classify images from the Fashion-MNIST dataset.
Traditional programming works well when we can define clear rules.
For example:
if age > 18:
    allow_access()
The programmer explicitly writes the logic.
Suppose we want a computer to recognize:
a shoe
a t-shirt
trousers
How would we write rules?
A shoe can appear:
rotated
darker or brighter
partially hidden
in different shapes
Writing manual rules for every possibility becomes almost impossible.
Neural networks do not rely on explicit rules.
Instead, they:
observe many examples
learn patterns automatically
adjust internal weights during training
The model gradually learns:
edges
shapes
textures
combinations of patterns
This makes image recognition an excellent exercise for neural networks.
Computers do not “see” images like humans.
An image is stored as numbers.
A grayscale image stores one value per pixel.
Each value represents brightness:
0 → black
255 → white
values in between → shades of gray
Example:
A 48×48 image is stored as:
48 × 48 = 2304
So the computer stores 2304 numeric values.
Color images use 3 channels:
Red
Green
Blue
Each pixel contains 3 values.
Example:
Pixel = [255, 0, 0]
means pure red.
An RGB image of size 48×48 contains:
48×48×3=6912 values (bytes)
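The two storage layouts above can be checked with a short NumPy sketch. The arrays are hypothetical blank images created only for illustration; they are not part of the dataset.

```python
import numpy as np

# A hypothetical all-black 48x48 grayscale image: one uint8 value per pixel.
gray = np.zeros((48, 48), dtype=np.uint8)
gray[10, 20] = 255  # set the pixel at row 10, column 20 to white

# A hypothetical 48x48 RGB image: three uint8 values per pixel.
rgb = np.zeros((48, 48, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]  # the top-left pixel becomes pure red

print(gray.shape, gray.size)  # (48, 48) 2304
print(rgb.shape, rgb.size)    # (48, 48, 3) 6912
print(rgb.nbytes)             # 6912, because each uint8 value takes one byte
```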
A dataset commonly used in introductory neural network exercises is the Fashion-MNIST dataset.
It contains grayscale images of clothing items.
Examples:
T-shirt
Shoe
Bag
Trouser
Dress
Each image:
is grayscale
size = 28×28 pixels
contains one clothing item
belongs to one class
There are 10 classes total.
Example classes
The table below lists the classes in the Fashion-MNIST dataset.
The dataset is divided into two parts: a training set and a test set.
The training set is used to teach the neural network.
The model:
receives images
predicts a class
compares prediction with the correct answer
adjusts weights
Fashion-MNIST contains 60,000 training images.
The test set is used after training to evaluate performance.
The model has never seen these images before.
Fashion-MNIST contains 10,000 test images.
If we evaluated the model only on training images, it could score well simply by memorizing them.
The test set measures:
generalization
ability to recognize new images
Original pixel values are:
0→255
In many neural network examples we normalize them:
x_normalized = x / 255
So values become:
0→1
Neural networks train better when numbers are small and consistent.
Benefits:
faster training
more stable gradients
easier optimization
better convergence
Without normalization, training can become unstable.
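The normalization step can be sketched in NumPy as follows. The `images` array is a hypothetical random stand-in for a small batch of dataset images, so the printed minimum and maximum depend on the random values.

```python
import numpy as np

# Hypothetical stand-in for a batch of two 28x28 grayscale images (values 0..255).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

# Convert to float before dividing, then scale every pixel into the range 0..1.
normalized = images.astype("float32") / 255.0

print(images.dtype, images.min(), images.max())
print(normalized.dtype, normalized.min(), normalized.max())
```

The conversion to `float32` makes the intended data type explicit; the shape of the array does not change.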
model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
The image below illustrates the architecture of the neural network created in the example.
The Flatten layer transforms the 28×28 image into a vector of 28 × 28 = 784 numbers.
A Dense layer expects a vector as input, not a matrix.
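Flattening can be reproduced with a plain NumPy reshape. This is only a sketch of the idea, not the Keras implementation, and the `image` array is a hypothetical placeholder filled with the numbers 0 to 783.

```python
import numpy as np

# Hypothetical 28x28 image whose pixel values are simply 0, 1, 2, ..., 783.
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)

# Flatten: the rows are placed one after another in a single vector.
vector = image.reshape(-1)

print(image.shape)                 # (28, 28)
print(vector.shape)                # (784,)
print(vector[28] == image[1, 0])   # True: row 1 starts at index 28
```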
Dense(128, activation='relu')
This layer contains 128 neurons.
Each neuron learns patterns from the image.
Some neurons may specialize in:
vertical edges
curves
shoe-like shapes
texture patterns
Dense(10, activation='softmax')
There are 10 neurons because Fashion-MNIST has 10 classes.
Example output:
[0.01, 0.02, 0.90, 0.01, ...]
The largest probability becomes the predicted class.
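Picking the predicted class from the output vector can be sketched with `np.argmax`. The probability vector below is a hypothetical completion of the example output (the elided values are filled in only so that the ten numbers sum to 1), and the class names follow the label order used by the code listings in this lecture.

```python
import numpy as np

class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
]

# Hypothetical softmax output for one image: ten probabilities, one per class.
probs = np.array([0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01])

# The index of the largest probability is the predicted class label.
predicted = int(np.argmax(probs))

print(predicted, class_names[predicted])  # 2 Pullover
```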
Connections between neurons carry values called weights.
Weights determine:
how important an input is
how strongly neurons influence each other
During training, the network:
predicts output
computes error
adjusts weights
improves gradually
The weight adjustments are computed by an algorithm called backpropagation.
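The loop of predicting, measuring the error, and adjusting weights can be illustrated with a deliberately tiny model that has a single weight. This is a toy sketch with a hand-derived gradient, under the assumption of a mean squared error loss; in a real network, backpropagation computes such gradients for every weight in every layer.

```python
# Toy data: the hidden rule is y = 2 * x, which the model has to discover.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # the single trainable weight, starting from a poor guess
learning_rate = 0.01  # how large each adjustment is

for step in range(200):
    grad = 0.0
    for x, y in zip(xs, ys):
        pred = w * x                 # 1. predict the output
        error = pred - y             # 2. compute the error
        grad += 2 * error * x        # gradient of the squared error w.r.t. w
    grad /= len(xs)                  # average over the examples
    w -= learning_rate * grad        # 3. adjust the weight against the gradient

print(round(w, 3))  # very close to 2.0
```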
Suppose:
input size = 784
hidden neurons = 128
The weight matrix dimensions are 784×128.
This means 784 × 128 = 100,352 trainable weights, plus 128 biases (one per hidden neuron).
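The parameter count can be verified with a small NumPy sketch of a dense layer. The random weights and the random input vector are hypothetical stand-ins; only the shapes matter here.

```python
import numpy as np

input_size, hidden_neurons = 784, 128
rng = np.random.default_rng(0)

# One weight per (input, neuron) pair and one bias per neuron.
W = rng.normal(scale=0.01, size=(input_size, hidden_neurons))
b = np.zeros(hidden_neurons)

x = rng.random(input_size)  # hypothetical flattened, normalized image
z = x @ W + b               # weighted sum computed by each of the 128 neurons

print(W.shape)              # (784, 128)
print(W.size)               # 100352
print(W.size + b.size)      # 100480, weights plus biases
print(z.shape)              # (128,)
```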
Most examples use the Rectified Linear Unit (ReLU) activation function, defined as:
f(x) = max(0, x)
Why ReLU?
simple
fast
works well in deep learning
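A minimal NumPy sketch of ReLU, applied to a few hypothetical input values, shows how negative values are replaced by zero while positive values pass through unchanged.

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(values).tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```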
The output layer often uses the softmax function, which converts the raw outputs into probabilities that sum to 1.
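Softmax can be sketched in a few lines of NumPy. The `scores` vector is a hypothetical example with three raw outputs; subtracting the maximum before exponentiating is a common numerical safeguard and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # avoids overflow in exp for large scores
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([1.0, 2.0, 5.0])  # hypothetical raw outputs of three neurons
probs = softmax(scores)

print(probs.round(3))         # the largest score receives the largest probability
print(int(np.argmax(probs)))  # 2
```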
Training usually follows these steps:
Load dataset
Normalize images
Build neural network
Train model
Evaluate on test set
import gzip  # used to decompress the .gz files
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers
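# Load images from a gzip-compressed IDX file
def load_images(path):
    with gzip.open(path, 'rb') as f:
        # Skip the 16-byte header (magic number, image count, rows, columns)
        data = np.frombuffer(f.read(), np.uint8, offset=16)
    return data.reshape(-1, 28, 28)

# Load labels from a gzip-compressed IDX file
def load_labels(path):
    with gzip.open(path, 'rb') as f:
        # Skip the 8-byte header (magic number, label count)
        data = np.frombuffer(f.read(), np.uint8, offset=8)
    return data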
x_train = load_images('data/train-images-idx3-ubyte.gz')
y_train = load_labels('data/train-labels-idx1-ubyte.gz')
x_test = load_images('data/t10k-images-idx3-ubyte.gz')
y_test = load_labels('data/t10k-labels-idx1-ubyte.gz')
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
# Preview the data (disabled by the triple quotes; remove them to show the grid)
'''
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
]
# Plot a 5×5 grid of training images
plt.figure(figsize=(8, 8))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([]); plt.yticks([])  # no axis ticks
    plt.grid(False)
    plt.imshow(x_train[i], cmap='gray')  # grayscale
    plt.xlabel(class_names[y_train[i]])
plt.suptitle("Fashion-MNIST training samples", fontsize=14)
plt.show()
'''
# -------------------------
# 3. Build the model
# -------------------------
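model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    # Scale pixels from 0-255 to 0-1 inside the model,
    # so the saved model also normalizes its inputs at prediction time
    layers.Rescaling(1.0 / 255),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),  # randomly drops 20% of activations during training
    layers.Dense(10, activation='softmax')
])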
# -------------------------
# 4. Compile the model
# -------------------------
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Print initial weights (before training)
print("=== Weights BEFORE training ===")
for layer in model.layers:
    weights = layer.get_weights()
    print(f"Layer: {layer.name}")
    for w in weights:
        print(w)
    print("-" * 40)
# -------------------------
# 5. Train the model
# -------------------------
history = model.fit(
    x_train, y_train,
    epochs=10,
    batch_size=32,
    validation_split=0.1,
    verbose=1
)
# -------------------------
# 6. Evaluate
# -------------------------
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f"\n✅ Test accuracy: {test_acc:.3f}")
# Print weights after training
print("=== Weights AFTER training ===")
for layer in model.layers:
    weights = layer.get_weights()
    print(f"Layer: {layer.name}")
    for w in weights:
        print(w)
    print("-" * 40)
# -------------------------
# 7. Save model
# -------------------------
model.save("fashion_mnist_model.keras")
print('accuracy\n')
print(history.history['accuracy'])
print('val accuracy\n')
print(history.history['val_accuracy'])
# -------------------------
# 8. Plot training progress
# -------------------------
plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
We can train a model for a number of epochs and save it, then later load the saved model and continue training it for more epochs. The listing below illustrates how this can be done.
# --- First training session ---
# X_train, y_train, X_val and y_val are assumed to be prepared beforehand
history1 = model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val))
model.save("my_model.keras")
# --- Later training continuation ---
model2 = keras.models.load_model("my_model.keras")
history2 = model2.fit(X_train, y_train,
                      epochs=100,        # total target epochs
                      initial_epoch=50,  # continue from where we left off
                      validation_data=(X_val, y_val))
The dataset consists of four files (two for training, two for testing):
train-images-idx3-ubyte
train-labels-idx1-ubyte
t10k-images-idx3-ubyte
t10k-labels-idx1-ubyte
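The loader functions in this lecture skip a fixed number of header bytes (16 for images, 8 for labels). The sketch below shows where those numbers come from by parsing the header of an image file in the IDX format. To stay self-contained it builds a tiny fake file in memory (two blank images) instead of reading the real dataset, so the file contents are hypothetical.

```python
import gzip
import io
import struct

# Build a tiny fake IDX image file: a 16-byte header followed by the pixels.
# The header holds four big-endian 32-bit integers:
# magic number (2051 for image files), image count, rows, columns.
header = struct.pack(">IIII", 2051, 2, 28, 28)
pixels = bytes(2 * 28 * 28)  # two blank 28x28 images, one byte per pixel
compressed = gzip.compress(header + pixels)

# Read it back the same way the loader does, but keep the header this time.
with gzip.open(io.BytesIO(compressed), "rb") as f:
    raw = f.read()

magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
print(magic, count, rows, cols)              # 2051 2 28 28
print(len(raw) - 16 == count * rows * cols)  # True: the rest is pixel data
```

Label files follow the same idea with a shorter header: a magic number and a label count, 8 bytes in total.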
import gzip
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
# -----------------------------
# Helper functions to load data
# -----------------------------
def load_images(path):
    with gzip.open(path, 'rb') as f:
        data = np.frombuffer(f.read(), np.uint8, offset=16)
    return data.reshape(-1, 28, 28)

def load_labels(path):
    with gzip.open(path, 'rb') as f:
        return np.frombuffer(f.read(), np.uint8, offset=8)
# -----------------------------
# 1️⃣ Load the offline test data
# -----------------------------
x_test = load_images('data/t10k-images-idx3-ubyte.gz')
y_test = load_labels('data/t10k-labels-idx1-ubyte.gz')
# -----------------------------
# 2️⃣ Load the trained model
# -----------------------------
model = keras.models.load_model("fashion_mnist_model.keras")
# -----------------------------
# 3️⃣ Define class names
# -----------------------------
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
]
# -----------------------------
# 4️⃣ Predict the test images
# -----------------------------
predictions = model.predict(x_test)
# -----------------------------
# 5️⃣ Plot the first 20 test images
# -----------------------------
plt.figure(figsize=(12, 10))
for i in range(20):
    plt.subplot(4, 5, i + 1)
    plt.imshow(x_test[i], cmap="gray")
    pred_label = np.argmax(predictions[i])
    true_label = y_test[i]
    color = "green" if pred_label == true_label else "red"
    plt.title(
        f"P: {class_names[pred_label]}\nT: {class_names[true_label]}",
        color=color, fontsize=8
    )
    plt.axis("off")
plt.suptitle("First 20 Test Images — Predicted (P) vs True (T)", fontsize=14)
plt.tight_layout()
plt.show()
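Beyond inspecting individual images, the predictions can be summarized as a single accuracy value. The sketch below uses small hypothetical arrays (four samples, three classes) in place of the real `predictions` and `y_test`, so that it runs without the model or the dataset.

```python
import numpy as np

# Hypothetical stand-ins: one row of class probabilities per sample.
predictions = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
])
y_true = np.array([1, 0, 2, 0])  # hypothetical correct labels

# Predicted label for every sample at once, then the fraction that match.
pred_labels = np.argmax(predictions, axis=1)
accuracy = np.mean(pred_labels == y_true)

print(pred_labels.tolist())  # [1, 0, 2, 1]
print(accuracy)              # 0.75
```

With the real data, the same two lines applied to `predictions` and `y_test` should give a value close to the test accuracy reported by `model.evaluate`.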