CSC483: Introduction to Deep Learning
Personnel
- Instructor: Tianxiang Gao
- Meeting time: Thursdays 5:45PM - 9:00PM
- Location: CDM Center 224 at Loop Campus
- Office Hours: Mondays 10:00AM-11:00AM | Zoom
- Overview: Syllabus | Slides
- Discussion: Discord
Course Description
This course covers the foundations of deep learning, including fundamental neural network architectures (e.g., multilayer perceptrons) and training methodologies, with widely used optimization techniques (e.g., momentum, RMSprop, Adam). It also addresses generalization and regularization strategies (e.g., overparameterization, the double descent phenomenon, and weight decay). We will explore cutting-edge neural network architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers with attention mechanisms (e.g., GPT and BERT), and graph neural networks (GNNs). Students will gain hands-on experience by implementing these models and applying them to real-world problems in computer vision (CV), natural language processing (NLP), computational biology, and graph machine learning.
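As a concrete anchor for the first topic, the sketch below defines a small multilayer perceptron. It assumes PyTorch, which the syllabus does not mandate (the Colab notebooks below suggest a Python workflow); the layer sizes are illustrative.

```python
# A minimal multilayer perceptron (MLP) of the kind Week 1 introduces,
# assuming PyTorch; the syllabus itself does not prescribe a framework.
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Flatten(),            # e.g., a 28x28 image -> a 784-vector
    nn.Linear(784, 256),     # hidden layer
    nn.ReLU(),               # elementwise nonlinearity
    nn.Linear(256, 10),      # output layer: one logit per class
)

x = torch.randn(32, 1, 28, 28)   # a dummy batch of 32 grayscale images
logits = mlp(x)                  # shape: (32, 10)
print(logits.shape)
```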
Prerequisites
- CSC 412 provides basic knowledge of linear algebra, multivariate calculus, and probability.
- CSC 480 introduces the fundamental concepts of artificial intelligence and machine learning.
Textbook
No textbook is required. Materials will be drawn from classical books and recent papers. Recommended readings:
- Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola.
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
A list of key papers in deep learning will also be provided.
Grading
- Quizzes: 25%
- Programming Assignments: 30%
- Student-Designed Assignment: 25%
- Peer-Reviewed Evaluation: 20%
- Bonus Points (Optional): 5%
Schedule
- Week 1: Introduction to Neural Networks. Slides
- Deep Learning, Nature 2015
- Week 2: Training Neural Networks. Slides
- Back-propagation, Nature 1986
- Understanding the difficulty of training DNNs, AISTATS 2010
- On the difficulty of training RNNs, ICML 2013
- He initialization, ICCV 2015 (a minimal initialization sketch follows this schedule)
- Week 3: Advanced Optimizers. Slides (see the update-rule sketch after this schedule)
- On the Importance of Initialization and Momentum in Deep Learning, ICML 2013
- Adam: A Method for Stochastic Optimization, ICLR 2015
- RMSProp: Divide the Gradient by a Running Average of Its Recent Magnitude, lecture slides by Geoff Hinton, 2012
- Stochastic Gradient Descent, lecture slides by Ryan Tibshirani at CMU, Fall 2019
- Week 4: Generalization and Regularization. Slides (see the regularization sketch after this schedule)
- Understanding Deep Learning Requires Rethinking Generalization, ICLR 2017
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014
- Averaging Weights Leads to Wider Optima and Better Generalization, UAI 2018
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR 2017
- SGDR: Stochastic Gradient Descent with Warm Restarts, ICLR 2017
- Reconciling modern machine learning practice and the bias-variance trade-off, PNAS 2019
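For the Week 2 reading on He initialization, the idea reduces to a particular choice of weight variance. The following is a minimal sketch, assuming PyTorch (the syllabus does not fix a framework):

```python
# He (Kaiming) initialization for a ReLU layer: weights drawn from
# N(0, 2 / fan_in), which keeps activation variance roughly constant
# across ReLU layers (He et al., ICCV 2015).
import torch.nn as nn

layer = nn.Linear(784, 256)
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
nn.init.zeros_(layer.bias)
```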
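For the Week 3 optimizers, each update rule fits in a few lines. Below is a toy NumPy comparison on the quadratic loss f(w) = ||w||^2 / 2; all hyperparameter values are illustrative defaults, not settings taken from the course materials.

```python
# Momentum, RMSProp, and Adam updates written out by hand on a toy problem.
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = 0.5 * ||w||^2.
    return w

w_m = w_r = w_a = np.array([1.0, -2.0])
v = np.zeros(2)          # momentum velocity buffer
s = np.zeros(2)          # RMSProp running average of squared gradients
m1 = m2 = np.zeros(2)    # Adam first- and second-moment estimates
lr, beta, rho, b1, b2, eps = 0.1, 0.9, 0.9, 0.9, 0.999, 1e-8

for t in range(1, 101):
    # SGD with momentum: accumulate a velocity, then step along it.
    v = beta * v + grad(w_m)
    w_m = w_m - lr * v

    # RMSProp: scale the step by a running RMS of recent gradients.
    g = grad(w_r)
    s = rho * s + (1 - rho) * g**2
    w_r = w_r - lr * g / (np.sqrt(s) + eps)

    # Adam: momentum on the gradient plus RMS scaling, with bias correction.
    g = grad(w_a)
    m1 = b1 * m1 + (1 - b1) * g
    m2 = b2 * m2 + (1 - b2) * g**2
    m1_hat = m1 / (1 - b1**t)
    m2_hat = m2 / (1 - b2**t)
    w_a = w_a - lr * m1_hat / (np.sqrt(m2_hat) + eps)

print(w_m, w_r, w_a)  # all three should end near the optimum at 0
```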
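For Week 4, several of the regularizers in the readings map onto one-line settings in modern frameworks. Here is a minimal sketch, again assuming PyTorch; the dropout probability, weight decay, and restart period are placeholders rather than recommended values.

```python
# Dropout, weight decay (L2), and SGDR-style warm restarts in PyTorch.
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Dropout(p=0.5),                 # randomly zeroes activations in training mode
    nn.Linear(256, 10),
)
# weight_decay applies an L2 penalty inside the SGD update.
opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# Cosine-annealed learning rate with warm restarts (Loshchilov & Hutter, ICLR 2017).
sched = optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=10)
```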
Additional Reading and Resources
- Review of Linear Algebra, by Zico Kolter and Chuong Do from Stanford
- Review of Probability Theory, by Arian Maleki and Tom Do from Stanford
- 11-785 Introduction to Deep Learning, by Bhiksha Raj and Rita Singh at CMU, Fall 2024
- Deep Learning Specialization, by Andrew Ng at Coursera and DeepLearning.AI, Fall 2021
- Foundations of Machine Learning, Textbook by Mehryar Mohri, 2018
- Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, ICLR 2019
- Deep Networks with Stochastic Depth, ECCV 2016
- Deep Double Descent: Where Bigger Models and More Data Hurt, ICLR 2020
- Improving Language Understanding by Generative Pre-Training, OpenAI Blog 2018 (Introduced GPT-1)
- Language Models are Unsupervised Multitask Learners, OpenAI Blog 2019 (Introduced GPT-2)
- Training Compute-Optimal Large Language Models, NeurIPS 2022
- The Annotated Transformer, Harvard NLP Blog 2018
Assignments
- Assignment 1: Image Classification using MNIST. Download Notebook | Open in Colab (see the starter sketch after this list)
- Assignment 2: Advanced Optimizers using Fashion-MNIST. Download Notebook | Open in Colab
- Assignment 3: Regularization and Hyperparameter Tuning using Fashion-MNIST. Download Notebook | Open in Colab
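For orientation before opening the Assignment 1 notebook, here is a minimal MNIST training loop of the kind the assignment asks for. It assumes PyTorch and torchvision; the actual notebook may structure things differently, and the hyperparameters are illustrative.

```python
# A bare-bones MNIST classifier: download the data, train a small MLP
# with Adam, and report the running loss.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_ds = datasets.MNIST("data", train=True, download=True,
                          transform=transforms.ToTensor())
loader = DataLoader(train_ds, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128),
                      nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2):
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)   # cross-entropy on the class logits
        loss.backward()               # back-propagate gradients
        opt.step()                    # Adam parameter update
    print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")
```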