Advanced Optimizers using Fashion-MNIST

Author: Tianxiang (Adam) Gao
Course: CSC 383/483: Applied Deep Learning
Description: In this assignment, you will investigate how different optimization algorithms affect the training dynamics and final performance of a neural network on the Fashion-MNIST dataset using the Keras deep learning library. The Fashion-MNIST dataset contains 70,000 grayscale images of clothing items (such as shirts, shoes, and bags) spanning 10 categories, each image 28×28 pixels.

You will train a simple multilayer perceptron (MLP) model using several optimizers—SGD, SGD with Momentum, RMSprop, and Adam—while keeping the network architecture, data, and initialization fixed. By comparing their learning curves and test accuracies, you will gain intuition about how optimizer choice influences convergence speed, stability, and generalization.

Setup

We will first import some useful libraries:

  • numpy for numerical operations (e.g., arrays, random sampling).
  • keras for loading the Fashion-MNIST dataset and building deep learning models.
  • keras.layers provides the building blocks (dense layers, convolutional layers, activation functions, etc.) to design neural networks.
  • matplotlib for visualizing images and plotting graphs.
In [ ]:
import numpy as np
import keras
from keras import layers
import matplotlib.pyplot as plt

Prepare the Data [5/5]

  1. Use keras.datasets.fashion_mnist.load_data() to load the Fashion-MNIST training and testing sets.
  2. Normalize all pixel values from integers in the range [0, 255] to floating-point numbers between 0 and 1.

Note:
In contrast to Assignment 1, we do not use np.expand_dims() to add a channel dimension because MLPs operate on flattened feature vectors rather than image tensors. Similarly, we do not convert labels to one-hot encoding, since we will use the loss function "sparse_categorical_crossentropy", which expects integer labels directly.

In [ ]:
(x_train, y_train), (x_test, y_test) =
x_train =
x_test =
print("x_train shape:", x_train.shape)
num_classes, input_shape = 10, x_train.shape[1:]
print("num_classes:", num_classes)
print("input_shape:", input_shape)
x_train shape: (60000, 28, 28)
num_classes: 10
input_shape: (28, 28)
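For reference, one possible completion of the cell above (a sketch; any equivalent normalization is fine), using only the calls named in the instructions:

# Load Fashion-MNIST and scale pixel values from [0, 255] to [0, 1]
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0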

Visualize the Data [5/5]

  1. Randomly pick 9 images from the training set x_train. Display them in a 3×3 grid using Matplotlib (plt.subplot). For each image, show its corresponding class label (from y_train) as the subplot title.
In [ ]:
indices = np.random.choice(len(x_train), 9, replace=False)

plt.figure(figsize=(6, 6))
for i, idx in enumerate(indices):
    plt.subplot(3, 3, i + 1)
    # the images are already 2D (28×28), so they can be displayed directly
    plt.imshow()
    # show the integer label, not one-hot
    plt.title()
    plt.axis("off")

plt.tight_layout()
plt.show()
[Figure: a 3×3 grid of randomly selected training images, each titled with its integer class label]
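The two blank calls inside the loop could be filled in as follows (a sketch; the gray colormap is a display choice, not a requirement):

plt.imshow(x_train[idx], cmap="gray")  # each image is a 2D 28x28 array, no channel axis
plt.title(int(y_train[idx]))           # integer class label (0-9)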

Build the Model [30/30]

  1. Implement a helper function make_model(num_classes, input_shape) that returns a simple two-layer MLP built with keras.Sequential, using the following layers:

    • Input layer: accepts images of shape input_shape.
    • Flatten layer: converts each 2D image into a 1D vector.
    • Dense layer: fully connected layer with 128 hidden units and a "sigmoid" activation function.
    • Output layer: fully connected layer with num_classes units (one for each of the 10 clothing classes) and "softmax" activation.
  2. Create a base_model using your helper function and inspect it by calling base_model.summary() to display the network architecture, output shapes, and number of parameters in each layer.

  3. Save the initial_weights of the base_model for reuse in optimizer comparisons.

In [ ]:
def make_model(num_classes, input_shape):
    model = keras.Sequential([

    ])
    return model

base_model =
base_model.summary()

initial_weights =
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten (Flatten)               │ (None, 784)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 128)            │       100,480 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 10)             │         1,290 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 101,770 (397.54 KB)
 Trainable params: 101,770 (397.54 KB)
 Non-trainable params: 0 (0.00 B)
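One way to fill in the cell so that it reproduces the summary above (a sketch; other equivalent spellings of the same architecture exist):

def make_model(num_classes, input_shape):
    model = keras.Sequential([
        keras.Input(shape=input_shape),                   # (28, 28) grayscale images
        layers.Flatten(),                                 # -> 784-dimensional vectors
        layers.Dense(128, activation="sigmoid"),          # hidden layer
        layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])
    return model

base_model = make_model(num_classes, input_shape)
base_model.summary()
initial_weights = base_model.get_weights()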

Define Optimizer Function [10/10]

  1. To easily switch between different optimization algorithms, we will define a helper function get_optimizer(name, lr=1e-3) that returns the corresponding Keras optimizer object based on its name. Implement the function with the following behavior:
    • "sgd": Standard stochastic gradient descent (SGD).
    • "momentum": SGD with momentum (momentum = 0.9).
    • "rmsprop": RMSprop optimizer (adaptive learning rate).
    • "adam": Adam optimizer (adaptive learning rate + momentum).
    • Raise a ValueError if the name is unknown.
In [ ]:
def get_optimizer(name, lr=1e-3):
    if name == "sgd":
        return keras.optimizers.SGD(learning_rate=lr)

    raise ValueError(f"Unknown optimizer: {name}")
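The remaining branches can follow the same pattern as the "sgd" case already provided, for example:

def get_optimizer(name, lr=1e-3):
    if name == "sgd":
        return keras.optimizers.SGD(learning_rate=lr)
    if name == "momentum":
        return keras.optimizers.SGD(learning_rate=lr, momentum=0.9)
    if name == "rmsprop":
        return keras.optimizers.RMSprop(learning_rate=lr)
    if name == "adam":
        return keras.optimizers.Adam(learning_rate=lr)
    raise ValueError(f"Unknown optimizer: {name}")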

Train and Compare Optimizers [30/30]

  1. We will write a helper function train(name, batch_size=128, epochs=20) to train the model with a specified optimizer, then loop over several optimizers to collect their training histories for plotting:

    • Recreate a fresh model via make_model(...).
    • Reset to the same initial_weights to ensure a fair start.
    • Build the optimizer with get_optimizer(name).
    • compile the model with the chosen optimizer, loss="sparse_categorical_crossentropy", and metrics=["accuracy"].
    • fit on (x_train, y_train) with validation_data=(x_test, y_test).
    • Return the History object.
  2. Compare optimizers:

    • Create a list optimizers = ["sgd", "momentum", "rmsprop", "adam"].
    • For each name, call train(name, ...) and store the returned history in a dictionary histories[name].
In [ ]:
def train(name, batch_size=128, epochs=20):
  model =
  model.set_weights()
  opt =
  model.compile()
  print(f"\n===Training with {name}===")
  hist = model.fit()
  return hist
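A sketch of one possible completion of train(), following the bullet list above (fit defaults such as verbosity are left unchanged):

def train(name, batch_size=128, epochs=20):
    model = make_model(num_classes, input_shape)
    model.set_weights(initial_weights)  # identical starting point for every optimizer
    opt = get_optimizer(name)
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    print(f"\n===Training with {name}===")
    hist = model.fit(x_train, y_train,
                     batch_size=batch_size,
                     epochs=epochs,
                     validation_data=(x_test, y_test))
    return hist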
In [ ]:
optimizers = ["sgd", "momentum", "rmsprop", "adam"]
histories = {}
for opt in optimizers:
  histories[opt] = train(opt)
===Training with sgd===
Epoch 1/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 6s 12ms/step - accuracy: 0.1539 - loss: 2.3461 - val_accuracy: 0.2847 - val_loss: 2.1709
Epoch 2/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.3314 - loss: 2.1411 - val_accuracy: 0.4456 - val_loss: 2.0608
Epoch 3/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.4632 - loss: 2.0359 - val_accuracy: 0.5240 - val_loss: 1.9636
Epoch 4/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.5373 - loss: 1.9416 - val_accuracy: 0.5794 - val_loss: 1.8748
Epoch 5/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.5881 - loss: 1.8514 - val_accuracy: 0.6145 - val_loss: 1.7934
Epoch 6/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6172 - loss: 1.7729 - val_accuracy: 0.6328 - val_loss: 1.7188
Epoch 7/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6344 - loss: 1.6998 - val_accuracy: 0.6466 - val_loss: 1.6505
Epoch 8/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6484 - loss: 1.6329 - val_accuracy: 0.6533 - val_loss: 1.5879
Epoch 9/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6604 - loss: 1.5687 - val_accuracy: 0.6645 - val_loss: 1.5305
Epoch 10/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6699 - loss: 1.5167 - val_accuracy: 0.6729 - val_loss: 1.4779
Epoch 11/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6774 - loss: 1.4613 - val_accuracy: 0.6828 - val_loss: 1.4296
Epoch 12/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6846 - loss: 1.4133 - val_accuracy: 0.6881 - val_loss: 1.3852
Epoch 13/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6940 - loss: 1.3689 - val_accuracy: 0.6951 - val_loss: 1.3444
Epoch 14/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7001 - loss: 1.3287 - val_accuracy: 0.6993 - val_loss: 1.3068
Epoch 15/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7026 - loss: 1.2929 - val_accuracy: 0.7024 - val_loss: 1.2721
Epoch 16/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7040 - loss: 1.2593 - val_accuracy: 0.7065 - val_loss: 1.2400
Epoch 17/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7093 - loss: 1.2253 - val_accuracy: 0.7082 - val_loss: 1.2103
Epoch 18/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7103 - loss: 1.1981 - val_accuracy: 0.7114 - val_loss: 1.1827
Epoch 19/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7152 - loss: 1.1698 - val_accuracy: 0.7124 - val_loss: 1.1571
Epoch 20/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7157 - loss: 1.1456 - val_accuracy: 0.7149 - val_loss: 1.1333

===Training with momentum===
Epoch 1/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.3855 - loss: 2.0748 - val_accuracy: 0.6636 - val_loss: 1.4875
Epoch 2/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6830 - loss: 1.3795 - val_accuracy: 0.7153 - val_loss: 1.1383
Epoch 3/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7229 - loss: 1.0771 - val_accuracy: 0.7262 - val_loss: 0.9662
Epoch 4/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7335 - loss: 0.9297 - val_accuracy: 0.7381 - val_loss: 0.8695
Epoch 5/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7447 - loss: 0.8415 - val_accuracy: 0.7402 - val_loss: 0.8070
Epoch 6/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7544 - loss: 0.7796 - val_accuracy: 0.7474 - val_loss: 0.7629
Epoch 7/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7566 - loss: 0.7448 - val_accuracy: 0.7530 - val_loss: 0.7304
Epoch 8/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7632 - loss: 0.7082 - val_accuracy: 0.7584 - val_loss: 0.7042
Epoch 9/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7690 - loss: 0.6838 - val_accuracy: 0.7642 - val_loss: 0.6830
Epoch 10/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7740 - loss: 0.6656 - val_accuracy: 0.7700 - val_loss: 0.6652
Epoch 11/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7808 - loss: 0.6455 - val_accuracy: 0.7739 - val_loss: 0.6500
Epoch 12/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7855 - loss: 0.6280 - val_accuracy: 0.7768 - val_loss: 0.6366
Epoch 13/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7909 - loss: 0.6121 - val_accuracy: 0.7817 - val_loss: 0.6241
Epoch 14/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7904 - loss: 0.6078 - val_accuracy: 0.7861 - val_loss: 0.6140
Epoch 15/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7951 - loss: 0.5959 - val_accuracy: 0.7890 - val_loss: 0.6040
Epoch 16/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7993 - loss: 0.5824 - val_accuracy: 0.7926 - val_loss: 0.5955
Epoch 17/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8024 - loss: 0.5783 - val_accuracy: 0.7946 - val_loss: 0.5873
Epoch 18/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8049 - loss: 0.5701 - val_accuracy: 0.7970 - val_loss: 0.5798
Epoch 19/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.8108 - loss: 0.5587 - val_accuracy: 0.7993 - val_loss: 0.5734
Epoch 20/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8112 - loss: 0.5489 - val_accuracy: 0.8024 - val_loss: 0.5675

===Training with rmsprop===
Epoch 1/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7185 - loss: 0.9097 - val_accuracy: 0.8258 - val_loss: 0.4924
Epoch 2/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8372 - loss: 0.4627 - val_accuracy: 0.8223 - val_loss: 0.4771
Epoch 3/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8540 - loss: 0.4088 - val_accuracy: 0.8367 - val_loss: 0.4460
Epoch 4/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8591 - loss: 0.3899 - val_accuracy: 0.8567 - val_loss: 0.4049
Epoch 5/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8664 - loss: 0.3658 - val_accuracy: 0.8553 - val_loss: 0.4013
Epoch 6/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8754 - loss: 0.3451 - val_accuracy: 0.8635 - val_loss: 0.3869
Epoch 7/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8764 - loss: 0.3427 - val_accuracy: 0.8630 - val_loss: 0.3777
Epoch 8/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8824 - loss: 0.3253 - val_accuracy: 0.8667 - val_loss: 0.3744
Epoch 9/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8861 - loss: 0.3176 - val_accuracy: 0.8706 - val_loss: 0.3613
Epoch 10/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8876 - loss: 0.3117 - val_accuracy: 0.8724 - val_loss: 0.3600
Epoch 11/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 9ms/step - accuracy: 0.8895 - loss: 0.3051 - val_accuracy: 0.8693 - val_loss: 0.3588
Epoch 12/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8926 - loss: 0.2976 - val_accuracy: 0.8687 - val_loss: 0.3675
Epoch 13/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8962 - loss: 0.2846 - val_accuracy: 0.8762 - val_loss: 0.3479
Epoch 14/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8959 - loss: 0.2861 - val_accuracy: 0.8763 - val_loss: 0.3435
Epoch 15/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9008 - loss: 0.2762 - val_accuracy: 0.8729 - val_loss: 0.3561
Epoch 16/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8987 - loss: 0.2749 - val_accuracy: 0.8743 - val_loss: 0.3456
Epoch 17/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9025 - loss: 0.2645 - val_accuracy: 0.8791 - val_loss: 0.3336
Epoch 18/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9031 - loss: 0.2631 - val_accuracy: 0.8805 - val_loss: 0.3382
Epoch 19/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9058 - loss: 0.2562 - val_accuracy: 0.8793 - val_loss: 0.3353
Epoch 20/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9109 - loss: 0.2469 - val_accuracy: 0.8805 - val_loss: 0.3296

===Training with adam===
Epoch 1/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.6987 - loss: 0.9718 - val_accuracy: 0.8263 - val_loss: 0.5024
Epoch 2/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8406 - loss: 0.4565 - val_accuracy: 0.8438 - val_loss: 0.4457
Epoch 3/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8578 - loss: 0.4026 - val_accuracy: 0.8504 - val_loss: 0.4228
Epoch 4/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8658 - loss: 0.3767 - val_accuracy: 0.8559 - val_loss: 0.4023
Epoch 5/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8738 - loss: 0.3564 - val_accuracy: 0.8618 - val_loss: 0.3897
Epoch 6/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8781 - loss: 0.3435 - val_accuracy: 0.8641 - val_loss: 0.3800
Epoch 7/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8825 - loss: 0.3291 - val_accuracy: 0.8673 - val_loss: 0.3723
Epoch 8/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8855 - loss: 0.3189 - val_accuracy: 0.8695 - val_loss: 0.3647
Epoch 9/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8882 - loss: 0.3116 - val_accuracy: 0.8713 - val_loss: 0.3553
Epoch 10/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8910 - loss: 0.2991 - val_accuracy: 0.8750 - val_loss: 0.3521
Epoch 11/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8967 - loss: 0.2872 - val_accuracy: 0.8768 - val_loss: 0.3442
Epoch 12/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9000 - loss: 0.2782 - val_accuracy: 0.8723 - val_loss: 0.3554
Epoch 13/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9005 - loss: 0.2713 - val_accuracy: 0.8776 - val_loss: 0.3417
Epoch 14/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9015 - loss: 0.2689 - val_accuracy: 0.8811 - val_loss: 0.3348
Epoch 15/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9054 - loss: 0.2594 - val_accuracy: 0.8828 - val_loss: 0.3310
Epoch 16/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9092 - loss: 0.2517 - val_accuracy: 0.8840 - val_loss: 0.3295
Epoch 17/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9091 - loss: 0.2508 - val_accuracy: 0.8806 - val_loss: 0.3308
Epoch 18/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9123 - loss: 0.2433 - val_accuracy: 0.8842 - val_loss: 0.3303
Epoch 19/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9145 - loss: 0.2363 - val_accuracy: 0.8852 - val_loss: 0.3264
Epoch 20/20
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9184 - loss: 0.2284 - val_accuracy: 0.8856 - val_loss: 0.3205

Plot Training and Test Loss [10/10]

  1. We will now visualize the training and test loss curves for all optimizers on a single plot. Write a function plot_histories(histories, optimizers, log_scale=False) that:
    • Creates a new figure using plt.figure(figsize=(8,5)).
    • Iterates through each optimizer in the list.
    • Plots both the training loss (solid line) and the test loss (dashed line) in the same color for each optimizer.
    • Adds axis labels, a title, a legend, and grid lines.
    • Optionally uses a logarithmic y-axis if log_scale=True (use plt.yscale("log")).
  2. Call the function twice:
    • Once with the default linear scale.
    • Once with log_scale=True to observe convergence differences in early epochs.
In [ ]:
def plot_histories(histories, optimizers, log_scale=False):
    plt.figure(figsize=(8,5))
    colors = plt.cm.tab10.colors

    for i, opt in enumerate(optimizers):
        color = colors[i % len(colors)]
        plt.plot()
        plt.plot()

    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title("Training vs Validation Loss (Different Optimizers)")
    plt.legend()
    plt.grid(True, which="both", ls=":")
    if log_scale:
        plt.yscale("log")
        plt.ylabel("Loss (log scale)")
    plt.show()

# Example calls
plot_histories()              # linear scale
plot_histories()  # log scale
[Figure: training vs. validation loss for all four optimizers, linear scale]
[Figure: training vs. validation loss for all four optimizers, log scale]
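The blank calls in the cell above could be completed roughly as follows (a sketch; the label text and the dashed style for validation loss are illustrative choices):

# inside the loop: training loss (solid) and test loss (dashed) in the same color
plt.plot(histories[opt].history["loss"], color=color, label=f"{opt} (train)")
plt.plot(histories[opt].history["val_loss"], color=color, linestyle="--", label=f"{opt} (test)")

# example calls
plot_histories(histories, optimizers)                  # linear scale
plot_histories(histories, optimizers, log_scale=True)  # log scale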

Interpretation and Discussion [10/10]

Write a short reflection (3–5 sentences) interpreting your results.

Note:
Your explanation should focus on what you observe from the training and validation curves rather than theoretical definitions.
Be concise but specific — mention patterns you can clearly see in your plots and test results.