CSC483: Introduction to Deep Learning
Personnel
- Instructor: Tianxiang Gao
- Meeting time: Thursdays 5:45PM - 9:00PM
- Location: CDM Center 224 at Loop Campus
- Office Hours: Mondays 10:00AM-11:00AM | Zoom
- Overview: Syllabus | Slides
- Discussion: Discord
Course Description
This course covers the foundations of deep learning, including fundamental neural network architectures (e.g., multilayer perceptrons) and training methodologies, with widely used optimization techniques (e.g., momentum, RMSprop, Adam). It also addresses generalization and regularization strategies (e.g., overparameterization, the double descent phenomenon, and weight decay). We will explore cutting-edge neural network architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers with attention mechanisms (e.g., GPT and BERT), and graph neural networks (GNNs). Students will gain hands-on experience by implementing these models and applying them to real-world problems in computer vision (CV), natural language processing (NLP), computational biology, and graph machine learning.
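As a concrete anchor for the first topic, the sketch below defines a small multilayer perceptron. It assumes PyTorch, which the syllabus does not mandate (the Colab notebooks below suggest a Python workflow); the layer sizes are illustrative.

```python
# A minimal multilayer perceptron (MLP) of the kind Week 1 introduces,
# assuming PyTorch; the syllabus itself does not prescribe a framework.
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Flatten(),            # e.g., a 28x28 image -> a 784-vector
    nn.Linear(784, 256),     # hidden layer
    nn.ReLU(),               # elementwise nonlinearity
    nn.Linear(256, 10),      # output layer: one logit per class
)

x = torch.randn(32, 1, 28, 28)   # a dummy batch of 32 grayscale images
logits = mlp(x)                  # shape: (32, 10)
print(logits.shape)
```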
Prerequisites
- CSC 412 provides basic knowledge of linear algebra, multivariate calculus, and probability.
- CSC 480 introduces the fundamental concepts of artificial intelligence and machine learning.
Textbook
No textbook is required. Materials will be drawn from classical books and recent papers. Recommended readings:
- Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola.
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
A list of key papers in deep learning will also be provided.
Grading
- Quizzes: 25%
- Programming Assignments: 30%
- Student-Designed Assignment: 25%
- Peer-Reviewed Evaluation: 20%
- Bonus Points (Optional): 5%
Schedule
- Week 1: Introduction to Neural Networks. Slides
- Deep Learning, Nature 2015
- Week 2: Training Neural Networks. Slides
- Back-propagation, Nature 1986
- Understanding the difficulty of training DNNs, AISTATS 2010
- On the difficulty of training RNNs, ICML 2013
- He initialization, ICCV 2015 (a minimal initialization sketch follows this schedule)
- Week 3: Advanced Optimizers. Slides (see the update-rule sketch after this schedule)
- On the Importance of Initialization and Momentum in Deep Learning, ICML 2013
- Adam: A Method for Stochastic Optimization, ICLR 2015
- RMSProp: Divide the Gradient by a Running Average of Its Recent Magnitude, lecture slides by Geoff Hinton, 2012
- Stochastic Gradient Descent, lecture slides by Ryan Tibshirani at CMU, Fall 2019
- Week 4: Generalization and Regularization. Slides (see the regularization sketch after this schedule)
- Understanding Deep Learning Requires Rethinking Generalization, ICLR 2017
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014
- Averaging Weights Leads to Wider Optima and Better Generalization, UAI 2018
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR 2017
- SGDR: Stochastic Gradient Descent with Warm Restarts, ICLR 2017
- Reconciling modern machine learning practice and the bias-variance trade-off, PNAS 2019
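For the Week 2 reading on He initialization, the idea reduces to a particular choice of weight variance. The following is a minimal sketch, assuming PyTorch (the syllabus does not fix a framework):

```python
# He (Kaiming) initialization for a ReLU layer: weights drawn from
# N(0, 2 / fan_in), which keeps activation variance roughly constant
# across ReLU layers (He et al., ICCV 2015).
import torch.nn as nn

layer = nn.Linear(784, 256)
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
nn.init.zeros_(layer.bias)
```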
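For the Week 3 optimizers, each update rule fits in a few lines. Below is a toy NumPy comparison on the quadratic loss f(w) = ||w||^2 / 2; all hyperparameter values are illustrative defaults, not settings taken from the course materials.

```python
# Momentum, RMSProp, and Adam updates written out by hand on a toy problem.
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = 0.5 * ||w||^2.
    return w

w_m = w_r = w_a = np.array([1.0, -2.0])
v = np.zeros(2)          # momentum velocity buffer
s = np.zeros(2)          # RMSProp running average of squared gradients
m1 = m2 = np.zeros(2)    # Adam first- and second-moment estimates
lr, beta, rho, b1, b2, eps = 0.1, 0.9, 0.9, 0.9, 0.999, 1e-8

for t in range(1, 101):
    # SGD with momentum: accumulate a velocity, then step along it.
    v = beta * v + grad(w_m)
    w_m = w_m - lr * v

    # RMSProp: scale the step by a running RMS of recent gradients.
    g = grad(w_r)
    s = rho * s + (1 - rho) * g**2
    w_r = w_r - lr * g / (np.sqrt(s) + eps)

    # Adam: momentum on the gradient plus RMS scaling, with bias correction.
    g = grad(w_a)
    m1 = b1 * m1 + (1 - b1) * g
    m2 = b2 * m2 + (1 - b2) * g**2
    m1_hat = m1 / (1 - b1**t)
    m2_hat = m2 / (1 - b2**t)
    w_a = w_a - lr * m1_hat / (np.sqrt(m2_hat) + eps)

print(w_m, w_r, w_a)  # all three should end near the optimum at 0
```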
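For Week 4, several of the regularizers in the readings map onto one-line settings in modern frameworks. Here is a minimal sketch, again assuming PyTorch; the dropout probability, weight decay, and restart period are placeholders rather than recommended values.

```python
# Dropout, weight decay (L2), and SGDR-style warm restarts in PyTorch.
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Dropout(p=0.5),                 # randomly zeroes activations in training mode
    nn.Linear(256, 10),
)
# weight_decay applies an L2 penalty inside the SGD update.
opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# Cosine-annealed learning rate with warm restarts (Loshchilov & Hutter, ICLR 2017).
sched = optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=10)
```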
Additional Reading and Resources
- Review of Linear Algebra, by Zico Kolter and Chuong Do from Stanford
- Review of Probability Theory, by Arian Maleki and Tom Do from Stanford
- 11-785 Introduction to Deep Learning, by Bhiksha Raj and Rita Singh at CMU, Fall 2024
- Deep Learning Specialization, by Andrew Ng at Coursera and DeepLearning.AI, Fall 2021
- Foundations of Machine Learning, Textbook by Mehryar Mohri, 2018
- Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, ICLR 2019
- Deep Networks with Stochastic Depth, ECCV 2016
- Deep Double Descent: Where Bigger Models and More Data Hurt, ICLR 2020
- Improving Language Understanding by Generative Pre-Training, OpenAI Blog 2018 (Introduced GPT-1)
- Language Models are Unsupervised Multitask Learners, OpenAI Blog 2019 (Introduced GPT-2)
- Training Compute-Optimal Large Language Models, NeurIPS 2022
- The Annotated Transformer, Harvard NLP Blog 2018
Assignments
- Assignment 1: Image Classification using MNIST. Download Notebook | Open in Colab (see the starter sketch after this list)
- Assignment 2: Advanced Optimizers using Fashion-MNIST. Download Notebook | Open in Colab
- Assignment 3: Regularization and Hyperparameter Tuning using Fashion-MNIST. Download Notebook | Open in Colab
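For orientation before opening the Assignment 1 notebook, here is a minimal MNIST training loop of the kind the assignment asks for. It assumes PyTorch and torchvision; the actual notebook may structure things differently, and the hyperparameters are illustrative.

```python
# A bare-bones MNIST classifier: download the data, train a small MLP
# with Adam, and report the running loss.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_ds = datasets.MNIST("data", train=True, download=True,
                          transform=transforms.ToTensor())
loader = DataLoader(train_ds, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128),
                      nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2):
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)   # cross-entropy on the class logits
        loss.backward()               # back-propagate gradients
        opt.step()                    # Adam parameter update
    print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")
```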