Advanced Optimizers using Fashion-MNIST¶
Author: Tianxiang (Adam) Gao
Course: CSC 383/483: Applied Deep Learning
Description: In this assignment, you will investigate how different optimization algorithms affect the training dynamics and final performance of a neural network on the Fashion-MNIST dataset using the Keras deep learning library. The Fashion-MNIST dataset contains 70,000 grayscale images of clothing items (such as shirts, shoes, and bags), each 28×28 pixels in size, spread across 10 categories.
You will train a simple multilayer perceptron (MLP) model using several optimizers—SGD, SGD with Momentum, RMSprop, and Adam—while keeping the network architecture, data, and initialization fixed. By comparing their learning curves and test accuracies, you will gain intuition about how optimizer choice influences convergence speed, stability, and generalization.
Setup¶
We will first import some useful libraries:
- numpy for numerical operations (e.g., arrays, random sampling).
- keras for loading the Fashion-MNIST dataset and building deep learning models.
- keras.layers for the building blocks (dense layers, convolutional layers, activation functions, etc.) used to design neural networks.
- matplotlib for visualizing images and plotting graphs.
import numpy as np
import keras
from keras import layers
import matplotlib.pyplot as plt
Prepare the Data [5/5]¶
- Use keras.datasets.fashion_mnist.load_data() to load the Fashion-MNIST training and testing sets.
- Normalize all pixel values from integers in the range [0, 255] to floating-point numbers between 0 and 1.
Note:
In contrast to Assignment 1, we do not use np.expand_dims() to add a channel dimension because MLPs operate on flattened feature vectors rather than image tensors. Similarly, we do not convert labels to one-hot encoding, since we will use the loss function "sparse_categorical_crossentropy", which expects integer labels directly.
(x_train, y_train), (x_test, y_test) =
x_train =
x_test =
print("x_train shape:", x_train.shape)
num_classes, input_shape = 10, x_train.shape[1:]
print("num_classes:", num_classes)
print("input_shape:", input_shape)
x_train shape: (60000, 28, 28) num_classes: 10 input_shape: (28, 28)
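For reference, a minimal sketch of one way to complete the cell above, assuming the standard Keras loader and plain division by 255 for normalization (other normalization choices are possible):

# One possible completion of the data-preparation cell
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0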
Visualize the Data [5/5]¶
- Randomly pick 9 images from the training set x_train. Display them in a 3×3 grid using Matplotlib (plt.subplot). For each image, show its corresponding class label (from y_train) as the subplot title.
indices = np.random.choice(len(x_train), 9, replace=False)
plt.figure(figsize=(6, 6))
for i, idx in enumerate(indices):
plt.subplot(3, 3, i + 1)
    # display the 28×28 grayscale image (no channel dimension to remove here)
plt.imshow()
# show the integer label, not one-hot
plt.title()
plt.axis("off")
plt.tight_layout()
plt.show()
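For reference, the display loop above might be completed as follows (a sketch; the colormap and title formatting are choices, not requirements):

# One way to fill in the two blank calls in the loop above
indices = np.random.choice(len(x_train), 9, replace=False)
plt.figure(figsize=(6, 6))
for i, idx in enumerate(indices):
    plt.subplot(3, 3, i + 1)
    plt.imshow(x_train[idx], cmap="gray")   # 28×28 grayscale image
    plt.title(int(y_train[idx]))            # integer class label (0-9)
    plt.axis("off")
plt.tight_layout()
plt.show()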
Build the Model [30/30]¶
Implement a helper function make_model(num_classes, input_shape) that returns a simple two-layer MLP built with keras.Sequential using the following layers:

- Input layer: accepts images of shape input_shape.
- Flatten layer: converts each 2D image into a 1D vector.
- Dense layer: fully connected layer with 128 hidden units and a "sigmoid" activation function.
- Output layer: fully connected layer with num_classes units (one for each of the 10 clothing classes) and "softmax" activation.

Create a base_model using your helper function and inspect it by calling base_model.summary() to display the network architecture, output shapes, and number of parameters in each layer.

Save the initial_weights of the base_model for reuse in the optimizer comparisons.
def make_model(num_classes, input_shape):
model = keras.Sequential([
])
return model
base_model =
base_model.summary()
initial_weights =
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ flatten (Flatten) │ (None, 784) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense (Dense) │ (None, 128) │ 100,480 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_1 (Dense) │ (None, 10) │ 1,290 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 101,770 (397.54 KB)
Trainable params: 101,770 (397.54 KB)
Non-trainable params: 0 (0.00 B)
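A sketch of one make_model implementation consistent with the summary above; note that the 100,480 parameters in the first Dense layer are the 784 × 128 weights plus 128 biases:

# One possible completion of the model-building cell
def make_model(num_classes, input_shape):
    model = keras.Sequential([
        keras.Input(shape=input_shape),                    # (28, 28) images
        layers.Flatten(),                                  # -> 784-dimensional vectors
        layers.Dense(128, activation="sigmoid"),           # hidden layer
        layers.Dense(num_classes, activation="softmax"),   # class probabilities
    ])
    return model

base_model = make_model(num_classes, input_shape)
base_model.summary()
initial_weights = base_model.get_weights()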
Define Optimizer Function [10/10]¶
- To easily switch between different optimization algorithms, we will define a helper function
get_optimizer(name, lr=1e-3)that returns the corresponding Keras optimizer object based on its name. Implement the function with the following behavior:"sgd": Standard stochastic gradient descent (SGD)."momentum": SGD with momentum (momentum = 0.9)."rmsprop": RMSprop optimizer (adaptive learning rate)."adam": Adam optimizer (adaptive learning rate + momentum).- Raise a
ValueErrorif the name is unknown.
def get_optimizer(name, lr=1e-3):
if name == "sgd":
return keras.optimizers.SGD(learning_rate=lr)
raise ValueError(f"Unknown optimizer: {name}")
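Completed as described above, get_optimizer could look like this, using the standard Keras optimizer classes:

# One possible completion of get_optimizer
def get_optimizer(name, lr=1e-3):
    if name == "sgd":
        return keras.optimizers.SGD(learning_rate=lr)
    if name == "momentum":
        return keras.optimizers.SGD(learning_rate=lr, momentum=0.9)  # SGD with momentum
    if name == "rmsprop":
        return keras.optimizers.RMSprop(learning_rate=lr)
    if name == "adam":
        return keras.optimizers.Adam(learning_rate=lr)
    raise ValueError(f"Unknown optimizer: {name}")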
Train and Compare Optimizers [30/30]¶
We will write a helper function train(name, batch_size=128, epochs=20) to train the model with a specified optimizer, then loop over several optimizers to collect their training histories for plotting:

- Recreate a fresh model via make_model(...).
- Reset to the same initial_weights to ensure a fair start.
- Build the optimizer with get_optimizer(name).
- compile with loss="sparse_categorical_crossentropy" and metrics=["accuracy"].
- fit on (x_train, y_train) with validation_data=(x_test, y_test).
- Return the History object.

Compare optimizers:

- Create a list optimizers = ["sgd", "momentum", "rmsprop", "adam"].
- For each name, call train(name, ...) and store the returned history in a dictionary histories[name].
def train(name, batch_size=128, epochs=20):
model =
model.set_weights()
opt =
model.compile()
print(f"\n===Training with {name}===")
hist = model.fit()
return hist
optimizers = ["sgd", "momentum", "rmsprop", "adam"]
histories = {}
for opt in optimizers:
histories[opt] = train(opt)
===Training with sgd=== Epoch 1/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 6s 12ms/step - accuracy: 0.1539 - loss: 2.3461 - val_accuracy: 0.2847 - val_loss: 2.1709 Epoch 2/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.3314 - loss: 2.1411 - val_accuracy: 0.4456 - val_loss: 2.0608 Epoch 3/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.4632 - loss: 2.0359 - val_accuracy: 0.5240 - val_loss: 1.9636 Epoch 4/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.5373 - loss: 1.9416 - val_accuracy: 0.5794 - val_loss: 1.8748 Epoch 5/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.5881 - loss: 1.8514 - val_accuracy: 0.6145 - val_loss: 1.7934 Epoch 6/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6172 - loss: 1.7729 - val_accuracy: 0.6328 - val_loss: 1.7188 Epoch 7/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6344 - loss: 1.6998 - val_accuracy: 0.6466 - val_loss: 1.6505 Epoch 8/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6484 - loss: 1.6329 - val_accuracy: 0.6533 - val_loss: 1.5879 Epoch 9/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6604 - loss: 1.5687 - val_accuracy: 0.6645 - val_loss: 1.5305 Epoch 10/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6699 - loss: 1.5167 - val_accuracy: 0.6729 - val_loss: 1.4779 Epoch 11/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6774 - loss: 1.4613 - val_accuracy: 0.6828 - val_loss: 1.4296 Epoch 12/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6846 - loss: 1.4133 - val_accuracy: 0.6881 - val_loss: 1.3852 Epoch 13/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6940 - loss: 1.3689 - val_accuracy: 0.6951 - val_loss: 1.3444 Epoch 14/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7001 - loss: 1.3287 - val_accuracy: 0.6993 - val_loss: 1.3068 Epoch 15/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7026 - loss: 1.2929 - val_accuracy: 0.7024 - val_loss: 1.2721 Epoch 16/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7040 - loss: 1.2593 - val_accuracy: 0.7065 - val_loss: 1.2400 Epoch 17/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7093 - loss: 1.2253 - val_accuracy: 0.7082 - val_loss: 1.2103 Epoch 18/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7103 - loss: 1.1981 - val_accuracy: 0.7114 - val_loss: 1.1827 Epoch 19/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7152 - loss: 1.1698 - val_accuracy: 0.7124 - val_loss: 1.1571 Epoch 20/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7157 - loss: 1.1456 - val_accuracy: 0.7149 - val_loss: 1.1333 ===Training with momentum=== Epoch 1/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.3855 - loss: 2.0748 - val_accuracy: 0.6636 - val_loss: 1.4875 Epoch 2/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6830 - loss: 1.3795 - val_accuracy: 0.7153 - val_loss: 1.1383 Epoch 3/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7229 - loss: 1.0771 - val_accuracy: 0.7262 - val_loss: 0.9662 Epoch 4/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7335 - loss: 0.9297 - val_accuracy: 0.7381 - val_loss: 0.8695 Epoch 5/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7447 - loss: 0.8415 - val_accuracy: 0.7402 - val_loss: 0.8070 Epoch 6/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7544 - loss: 0.7796 - val_accuracy: 0.7474 - val_loss: 0.7629 Epoch 7/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7566 - loss: 0.7448 - val_accuracy: 0.7530 - val_loss: 0.7304 Epoch 8/20 469/469 
━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7632 - loss: 0.7082 - val_accuracy: 0.7584 - val_loss: 0.7042 Epoch 9/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7690 - loss: 0.6838 - val_accuracy: 0.7642 - val_loss: 0.6830 Epoch 10/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7740 - loss: 0.6656 - val_accuracy: 0.7700 - val_loss: 0.6652 Epoch 11/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7808 - loss: 0.6455 - val_accuracy: 0.7739 - val_loss: 0.6500 Epoch 12/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7855 - loss: 0.6280 - val_accuracy: 0.7768 - val_loss: 0.6366 Epoch 13/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7909 - loss: 0.6121 - val_accuracy: 0.7817 - val_loss: 0.6241 Epoch 14/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7904 - loss: 0.6078 - val_accuracy: 0.7861 - val_loss: 0.6140 Epoch 15/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7951 - loss: 0.5959 - val_accuracy: 0.7890 - val_loss: 0.6040 Epoch 16/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7993 - loss: 0.5824 - val_accuracy: 0.7926 - val_loss: 0.5955 Epoch 17/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8024 - loss: 0.5783 - val_accuracy: 0.7946 - val_loss: 0.5873 Epoch 18/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8049 - loss: 0.5701 - val_accuracy: 0.7970 - val_loss: 0.5798 Epoch 19/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.8108 - loss: 0.5587 - val_accuracy: 0.7993 - val_loss: 0.5734 Epoch 20/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8112 - loss: 0.5489 - val_accuracy: 0.8024 - val_loss: 0.5675 ===Training with rmsprop=== Epoch 1/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7185 - loss: 0.9097 - val_accuracy: 0.8258 - val_loss: 0.4924 Epoch 2/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8372 - loss: 0.4627 - val_accuracy: 0.8223 - val_loss: 0.4771 Epoch 3/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8540 - loss: 0.4088 - val_accuracy: 0.8367 - val_loss: 0.4460 Epoch 4/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8591 - loss: 0.3899 - val_accuracy: 0.8567 - val_loss: 0.4049 Epoch 5/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8664 - loss: 0.3658 - val_accuracy: 0.8553 - val_loss: 0.4013 Epoch 6/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8754 - loss: 0.3451 - val_accuracy: 0.8635 - val_loss: 0.3869 Epoch 7/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8764 - loss: 0.3427 - val_accuracy: 0.8630 - val_loss: 0.3777 Epoch 8/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8824 - loss: 0.3253 - val_accuracy: 0.8667 - val_loss: 0.3744 Epoch 9/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8861 - loss: 0.3176 - val_accuracy: 0.8706 - val_loss: 0.3613 Epoch 10/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8876 - loss: 0.3117 - val_accuracy: 0.8724 - val_loss: 0.3600 Epoch 11/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 9ms/step - accuracy: 0.8895 - loss: 0.3051 - val_accuracy: 0.8693 - val_loss: 0.3588 Epoch 12/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8926 - loss: 0.2976 - val_accuracy: 0.8687 - val_loss: 0.3675 Epoch 13/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8962 - loss: 0.2846 - val_accuracy: 0.8762 - val_loss: 0.3479 Epoch 14/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8959 - loss: 0.2861 - val_accuracy: 0.8763 - val_loss: 0.3435 Epoch 15/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9008 - 
loss: 0.2762 - val_accuracy: 0.8729 - val_loss: 0.3561 Epoch 16/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8987 - loss: 0.2749 - val_accuracy: 0.8743 - val_loss: 0.3456 Epoch 17/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9025 - loss: 0.2645 - val_accuracy: 0.8791 - val_loss: 0.3336 Epoch 18/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9031 - loss: 0.2631 - val_accuracy: 0.8805 - val_loss: 0.3382 Epoch 19/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9058 - loss: 0.2562 - val_accuracy: 0.8793 - val_loss: 0.3353 Epoch 20/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9109 - loss: 0.2469 - val_accuracy: 0.8805 - val_loss: 0.3296 ===Training with adam=== Epoch 1/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.6987 - loss: 0.9718 - val_accuracy: 0.8263 - val_loss: 0.5024 Epoch 2/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8406 - loss: 0.4565 - val_accuracy: 0.8438 - val_loss: 0.4457 Epoch 3/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8578 - loss: 0.4026 - val_accuracy: 0.8504 - val_loss: 0.4228 Epoch 4/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8658 - loss: 0.3767 - val_accuracy: 0.8559 - val_loss: 0.4023 Epoch 5/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8738 - loss: 0.3564 - val_accuracy: 0.8618 - val_loss: 0.3897 Epoch 6/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8781 - loss: 0.3435 - val_accuracy: 0.8641 - val_loss: 0.3800 Epoch 7/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8825 - loss: 0.3291 - val_accuracy: 0.8673 - val_loss: 0.3723 Epoch 8/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8855 - loss: 0.3189 - val_accuracy: 0.8695 - val_loss: 0.3647 Epoch 9/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8882 - loss: 0.3116 - val_accuracy: 0.8713 - val_loss: 0.3553 Epoch 10/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8910 - loss: 0.2991 - val_accuracy: 0.8750 - val_loss: 0.3521 Epoch 11/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8967 - loss: 0.2872 - val_accuracy: 0.8768 - val_loss: 0.3442 Epoch 12/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9000 - loss: 0.2782 - val_accuracy: 0.8723 - val_loss: 0.3554 Epoch 13/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9005 - loss: 0.2713 - val_accuracy: 0.8776 - val_loss: 0.3417 Epoch 14/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9015 - loss: 0.2689 - val_accuracy: 0.8811 - val_loss: 0.3348 Epoch 15/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9054 - loss: 0.2594 - val_accuracy: 0.8828 - val_loss: 0.3310 Epoch 16/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9092 - loss: 0.2517 - val_accuracy: 0.8840 - val_loss: 0.3295 Epoch 17/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9091 - loss: 0.2508 - val_accuracy: 0.8806 - val_loss: 0.3308 Epoch 18/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9123 - loss: 0.2433 - val_accuracy: 0.8842 - val_loss: 0.3303 Epoch 19/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9145 - loss: 0.2363 - val_accuracy: 0.8852 - val_loss: 0.3264 Epoch 20/20 469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9184 - loss: 0.2284 - val_accuracy: 0.8856 - val_loss: 0.3205
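For reference, a sketch of a train implementation consistent with the steps listed above (not necessarily the exact code that produced these logs):

# One possible completion of the training helper
def train(name, batch_size=128, epochs=20):
    model = make_model(num_classes, input_shape)
    model.set_weights(initial_weights)   # identical starting point for every optimizer
    opt = get_optimizer(name)
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    print(f"\n===Training with {name}===")
    hist = model.fit(x_train, y_train,
                     batch_size=batch_size,
                     epochs=epochs,
                     validation_data=(x_test, y_test))
    return hist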
Plot Training and Test Loss [10/10]¶
- We will now visualize the training and test loss curves for all optimizers on a single plot. Write a function plot_histories(histories, optimizers, log_scale=False) that:
  - Creates a new figure using plt.figure(figsize=(8,5)).
  - Iterates through each optimizer in the list.
  - Plots both training loss (solid line) and test loss (dashed line) in the same color.
  - Adds axis labels, title, legend, and grid lines.
  - Optionally uses a logarithmic y-axis if log_scale=True (use plt.yscale("log")).
- Call the function twice:
  - Once with the default linear scale.
  - Once with log_scale=True to observe convergence differences in early epochs.
def plot_histories(histories, optimizers, log_scale=False):
plt.figure(figsize=(8,5))
colors = plt.cm.tab10.colors
for i, opt in enumerate(optimizers):
color = colors[i % len(colors)]
plt.plot()
plt.plot()
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training vs Validation Loss (Different Optimizers)")
plt.legend()
plt.grid(True, which="both", ls=":")
if log_scale:
plt.yscale("log")
plt.ylabel("Loss (log scale)")
plt.show()
# Example calls
plot_histories() # linear scale
plot_histories() # log scale
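One possible way to fill in the plotting cell above (a sketch: the line styles and legend label text are assumptions, and the losses are read from each History object's history dictionary):

# One possible completion of plot_histories and its two calls
def plot_histories(histories, optimizers, log_scale=False):
    plt.figure(figsize=(8, 5))
    colors = plt.cm.tab10.colors
    for i, opt in enumerate(optimizers):
        color = colors[i % len(colors)]
        h = histories[opt].history
        plt.plot(h["loss"], color=color, linestyle="-", label=f"{opt} (train)")
        plt.plot(h["val_loss"], color=color, linestyle="--", label=f"{opt} (test)")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title("Training vs Validation Loss (Different Optimizers)")
    plt.legend()
    plt.grid(True, which="both", ls=":")
    if log_scale:
        plt.yscale("log")
        plt.ylabel("Loss (log scale)")
    plt.show()

plot_histories(histories, optimizers)                  # linear scale
plot_histories(histories, optimizers, log_scale=True)  # log scale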
Interpretation and Discussion [10/10]¶
Write a short reflection (3–5 sentences) interpreting your results.
Note:
Your explanation should focus on what you observe from the training and validation curves rather than theoretical definitions.
Be concise but specific — mention patterns you can clearly see in your plots and test results.