CSC578: Neural Networks and Deep Learning

Personnel

Instructor: Tianxiang Gao
Meeting time: Thursdays 5:45PM - 9:00PM
Location: CDM Center 224 at Loop Campus
Office Hours: Mondays 9:00AM-11:00AM | Zoom
Overview: Syllabus | Slides
Discussion: Discord

Course Description

This course covers the foundations of deep learning: fundamental neural network architectures (e.g., multilayer perceptrons), training methodologies, and advanced optimization techniques (e.g., momentum, RMSprop, Adam). It also addresses generalization and regularization (e.g., overparameterization, the double descent phenomenon, and weight decay). We will explore modern neural network architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers (e.g., GPT and BERT), and graph neural networks (GNNs). Students will gain hands-on experience by implementing these models and applying them to real-world problems in computer vision, natural language processing, and graph machine learning.

Prerequisites

CSC 412 provides basic knowledge in linear algebra, multivariate calculus, and probability.
DSC 478 or CSC 480 introduces the fundamental concepts of artificial intelligence and machine learning.
You will implement and train deep neural networks using PyTorch, so basic Python proficiency is required.

Textbook

No textbook is required. Materials will be drawn from classical books and recent papers. Recommended readings:

Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola.
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

A list of key papers in deep learning will also be provided.

Grading

Quizzes: 25%
Programming Assignments: 35%
Midterm: 20%
Final Project: 20% (Proposal: 8%, Final Report: 12%)

Only the best 5 out of 10 quizzes and assignments will count toward the final grade.

Schedule

Week 1: Introduction to Neural Networks. Slides, Video
- Deep Learning, Nature 2015
Week 2: Training Neural Networks. Slides, Video
- Back-propagation, Nature 1986
- Understanding the difficulty of training DNNs, AISTATS 2010
- On the difficulty of training RNNs, ICML 2013
- He initialization, ICCV 2015
Week 3: Advanced Optimizers. Slides, Video
- On the Importance of Initialization and Momentum in Deep Learning, ICML 2013
- Adam: A Method for Stochastic Optimization, ICLR 2015
- RMSProp: Divide the Gradient by a Running Average of Its Recent Magnitude, Lecture Slides by Geoff Hinton, 2012
- Stochastic Gradient Descent, Lecture Slides by Ryan Tibshirani at CMU, Fall 2019
Week 4: Generalization and Regularization. Slides, Video
- Understanding Deep Learning Requires Rethinking Generalization, ICLR 2017
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014
- Averaging Weights Leads to Wider Optima and Better Generalization, UAI 2018
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR 2017
- SGDR: Stochastic Gradient Descent with Warm Restarts, ICLR 2017
- Reconciling modern machine learning practice and the bias-variance trade-off, PNAS 2019
Week 5: CNNs. Slides, Video
- ImageNet Classification with Deep Convolutional Neural Networks, NeurIPS 2012
- U-Net: Convolutional Networks for Biomedical Image Segmentation, MICCAI 2015
- Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, ICML 2009
- Very deep convolutional networks for large-scale image recognition, ICLR 2015
- Deep residual learning for image recognition, CVPR 2016
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, ICML 2015
Week 6: Learning with CNNs. Slides, Video
- Network in Network, ICLR 2014
- Going Deeper with Convolutions, CVPR 2015
- MobileNetV2: Inverted Residuals and Linear Bottlenecks, CVPR 2018
- EfficientNet: Rethinking Model Scaling for CNNs, ICML 2019
- You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
- DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR 2014
- FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR 2015
- Visualizing and Understanding Convolutional Networks, ECCV 2014
Week 7: RNNs. Slides, Video
- Learning to Forget: Continual Prediction with LSTM, Neural Computation 2000
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP 2014
- Bidirectional Recurrent Neural Networks, IEEE TSP 1997
- Linguistic Regularities in Continuous Space Word Representations, NAACL 2013
- A Neural Probabilistic Language Model, JMLR 2003
- Efficient Estimation of Word Representations in Vector Space, ICLR 2013
- GloVe: Global Vectors for Word Representation, EMNLP 2014
Week 8: Seq2Seq Models and Transformers. Slides, Video
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP 2014
- Sequence to Sequence Learning with Neural Networks, NeurIPS 2014
- BLEU: a Method for Automatic Evaluation of Machine Translation, ACL 2002
- Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015
- Attention Is All You Need, NeurIPS 2017
Week 9: LLMs and Efficient Transformers. Slides, Video
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019
- Language Models are Few-Shot Learners, NeurIPS 2020 (Introduced GPT-3)
- Scaling Laws for Neural Language Models, OpenAI Blog 2020
- Training Language Models to Follow Instructions with Human Feedback, NeurIPS 2022
Week 10: GNNs. Slides, Video
- Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017
- Neural Message Passing for Quantum Chemistry, ICML 2017
- Graph Attention Networks, ICLR 2018
- How Powerful are Graph Neural Networks?, ICLR 2019
- Recipe for a General, Powerful, Scalable Graph Transformer, NeurIPS 2022

Additional Reading and Resources

Review of Linear Algebra, by Zico Kolter and Chuong Do from Stanford
Review of Probability Theory, by Arian Maleki and Tom Do from Stanford
10-725 Convex Optimization Course Notes, by Ryan Tibshirani at CMU, Fall 2019
11-785 Introduction to Deep Learning, by Bhiksha Raj and Rita Singh at CMU, Fall 2024
Deep Learning Specialization, by Andrew Ng at Coursera and DeepLearning.AI, Fall 2021
Foundations of Machine Learning, Textbook by Mehryar Mohri, 2018
Lectures on Convex Optimization, Textbook by Yurii Nesterov, 2018
Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, ICLR 2019
Deep Networks with Stochastic Depth, ECCV 2016
Deep Double Descent: Where Bigger Models and More Data Hurt, ICLR 2020
Speech recognition with deep recurrent neural networks, ICASSP 2013
Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, NeurIPS 2016
Show and Tell: A Neural Image Caption Generator, CVPR 2015
Improving Language Understanding by Generative Pre-Training, OpenAI Blog 2018 (Introduced GPT-1)
Language Models are Unsupervised Multitask Learners, OpenAI Blog 2019 (Introduced GPT-2)
Emergent Abilities of Large Language Models, TMLR 2022
Training Compute-Optimal Large Language Models, NeurIPS 2022
Deep Reinforcement Learning from Human Preferences, NeurIPS 2017
The Annotated Transformer, Harvard NLP Blog 2018
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, JMLR 2022
Predict Then Propagate: Graph Neural Networks Meet Personalized PageRank, ICLR 2019

Assignments

Assignment 1: Python Basics and Multilayer Perceptron
Download Notebook | Open in Colab
Assignment 2: Neural Network Training: MLP, Backpropagation, Gradient Descent
Download Notebook | Open in Colab
Assignment 3: Advanced Optimizers: Accelerated GD, RMSProp, Adam
Download Notebook | Open in Colab
Assignment 4: Generalization and Regularization: PyTorch, Autograd, Hyperparameter Tuning, Overparameterization
Download Notebook | Open in Colab
Assignment 5: Introduction to CNNs: Implementation of CNNs, Semantic Segmentation with U-Net
Download Notebook | Open in Colab
Assignment 6: Computer Vision with CNNs: Neural Style Transfer using Pre-Trained VGG
Download Notebook | Open in Colab
Assignment 7: Recurrent Neural Networks: Implementation and Training
Download Notebook | Open in Colab
Assignment 8: Seq2seq: Neural Machine Translation
Download Notebook | Open in Colab
Assignment 9: Transformer: GPT and ChatBot
Download Notebook | Open in Colab