CSC578: Neural Networks and Deep Learning
Personnel
- Instructor: Tianxiang Gao
- Meeting time: Thursdays 5:45PM - 9:00PM
- Location: CDM Center 224 at Loop Campus
- Office Hours: Mondays 9:00AM-11:00AM | Zoom
- Overview: Syllabus | Slides
- Discussion: Discord
Course Description
This course covers the foundations of deep learning: fundamental neural network architectures (e.g., multilayer perceptrons) and training methods, including advanced optimization techniques (e.g., momentum, RMSprop, Adam). It also addresses generalization (e.g., overparameterization and the double descent phenomenon) and regularization strategies (e.g., weight decay). We will then explore cutting-edge neural network architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers (e.g., GPT and BERT), and graph neural networks (GNNs). Students will gain hands-on experience implementing these models and applying them to real-world problems in computer vision, natural language processing, and graph machine learning.
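Momentum, RMSprop, and Adam differ mainly in how each one rescales the raw gradient before taking a step. The following is a minimal sketch in plain Python on a one-dimensional toy loss. It is not the course's reference code and not PyTorch's `torch.optim` implementation. The function names, the toy loss f(w) = (w - 3)^2, and the hyperparameter values are illustrative assumptions.

```python
import math

def momentum_step(w, g, v, lr=0.1, beta=0.9):
    """Heavy-ball momentum: accumulate a velocity, then move along it."""
    v = beta * v + g
    return w - lr * v, v

def rmsprop_step(w, g, s, lr=0.01, beta=0.9, eps=1e-8):
    """RMSprop: divide the gradient by the root of a running average of g**2."""
    s = beta * s + (1 - beta) * g * g
    return w - lr * g / (math.sqrt(s) + eps), s

def adam_step(w, g, m, s, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: a momentum-like first moment plus an RMSprop-like second moment,
    both bias-corrected because they start at zero (t counts from 1)."""
    m = beta1 * m + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(s_hat) + eps), m, s

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)**2, which is minimized at w = 3.
    return 2.0 * (w - 3.0)

def run(optimizer, steps=1000):
    """Start at w = 0 and apply the chosen update rule for a fixed number of steps."""
    w, v, m, s = 0.0, 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        if optimizer == "momentum":
            w, v = momentum_step(w, g, v)
        elif optimizer == "rmsprop":
            w, s = rmsprop_step(w, g, s)
        elif optimizer == "adam":
            w, m, s = adam_step(w, g, m, s, t)
        else:
            raise ValueError(f"unknown optimizer: {optimizer}")
    return w

for name in ("momentum", "rmsprop", "adam"):
    print(name, round(run(name), 4))
```

All three runs should finish near w = 3. With a constant learning rate, RMSprop tends to hover in a small band around the minimum rather than landing on it exactly.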
Prerequisites
- CSC 412 provides basic knowledge in linear algebra, multivariate calculus, and probability.
- DSC 478 or CSC 480 introduces the fundamental concepts of artificial intelligence and machine learning.
- You will implement and train deep neural networks using PyTorch, so basic Python proficiency is required.
Textbook
No textbook is required. Materials will be drawn from classical books and recent papers. Recommended readings:
- Neural Networks and Deep Learning by Michael Nielsen
- Deep Learning book by Goodfellow, Bengio, and Courville
A list of key papers in deep learning will also be provided.
Grading
- Quizzes: 25%
- Programming Assignments: 35%
- Midterm: 20%
- Final Project: 20% (Proposal: 8%, Final Report: 12%)
Only the best 5 out of 10 quizzes and assignments will count toward the final grade.
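The component weights above sum to 100%. The following sketch computes a final percentage under one assumed reading of the drop rule: the best 5 quiz scores are averaged, and the best 5 assignment scores are averaged separately. The official policy is whatever the instructor specifies. The function names and sample scores are hypothetical.

```python
import math

# Component weights taken from the grading breakdown above.
WEIGHTS = {
    "quizzes": 0.25,
    "assignments": 0.35,
    "midterm": 0.20,
    "proposal": 0.08,
    "final_report": 0.12,
}
assert math.isclose(sum(WEIGHTS.values()), 1.0)

def best_k_average(scores, k=5):
    """Average of the k highest scores (ASSUMPTION: how the drop rule is applied)."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)

def final_grade(quizzes, assignments, midterm, proposal, final_report):
    """Weighted final percentage; every input is a score on a 0-100 scale."""
    components = {
        "quizzes": best_k_average(quizzes),
        "assignments": best_k_average(assignments),
        "midterm": midterm,
        "proposal": proposal,
        "final_report": final_report,
    }
    return sum(WEIGHTS[name] * score for name, score in components.items())

# Hypothetical scores, for illustration only.
quiz_scores = [100, 100, 100, 100, 100, 0, 0, 0, 0, 0]
assignment_scores = [80, 90, 70, 60, 100, 50, 40, 30, 20]
print(round(final_grade(quiz_scores, assignment_scores, 70, 90, 85), 2))  # prints 84.4
```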
Schedule
- Week 1: Introduction to Neural Networks. Slides, Video
- Deep Learning, Nature 2015
- Week 2: Training Neural Networks. Slides, Video
- Back-propagation, Nature 1986
- Understanding the Difficulty of Training Deep Feedforward Neural Networks, AISTATS 2010
- On the difficulty of training RNNs, ICML 2013
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (He initialization), ICCV 2015
- Week 3: Advanced Optimizers. Slides, Video
- On the Importance of Initialization and Momentum in Deep Learning, ICML 2013
- Adam: A Method for Stochastic Optimization, ICLR 2015
- RMSProp: Divide the Gradient by a Running Average of Its Recent Magnitude, Lecture Slides by Geoff Hinton, 2012
- Stochastic Gradient Descent, Lecture Slides by Ryan Tibshirani at CMU, Fall 2019
- Week 4: Generalization and Regularization. Slides, Video
- Understanding Deep Learning Requires Rethinking Generalization, ICLR 2017
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014
- Averaging Weights Leads to Wider Optima and Better Generalization, UAI 2018
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR 2017
- SGDR: Stochastic Gradient Descent with Warm Restarts, ICLR 2017
- Reconciling modern machine learning practice and the bias-variance trade-off, PNAS 2019
- Week 5: CNNs. Slides, Video
- ImageNet Classification with Deep Convolutional Neural Networks, NeurIPS 2012
- U-Net: Convolutional Networks for Biomedical Image Segmentation, MICCAI 2015
- Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, ICML 2009
- Very deep convolutional networks for large-scale image recognition, ICLR 2015
- Deep residual learning for image recognition, CVPR 2016
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, ICML 2015
- Week 6: Learning with CNNs. Slides, Video
- Network in Network, ICLR 2014
- Going Deeper with Convolutions, CVPR 2015
- MobileNetV2: Inverted Residuals and Linear Bottlenecks, CVPR 2018
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML 2019
- You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
- DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR 2014
- FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR 2015
- Visualizing and Understanding Convolutional Networks, ECCV 2014
- Week 7: RNNs. Slides, Video
- Learning to Forget: Continual Prediction with LSTM, Neural Computation 2000
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, EMNLP 2014
- Bidirectional Recurrent Neural Networks, IEEE TSP 1997
- Linguistic Regularities in Continuous Space Word Representations, NAACL 2013
- A Neural Probabilistic Language Model, JMLR 2003
- Efficient Estimation of Word Representations in Vector Space, ICLR 2013
- GloVe: Global Vectors for Word Representation, EMNLP 2014
- Week 8: Seq2Seq Models and Transformers. Slides, Video
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP 2014
- Sequence to Sequence Learning with Neural Networks, NeurIPS 2014
- BLEU: a Method for Automatic Evaluation of Machine Translation, ACL 2002
- Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015
- Attention Is All You Need, NeurIPS 2017
- Week 9: LLMs and Efficient Transformers. Slides, Video
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019
- Language Models are Few-Shot Learners, NeurIPS 2020 (Introduced GPT-3)
- Scaling Laws for Neural Language Models, arXiv 2020
- Training Language Models to Follow Instructions with Human Feedback, NeurIPS 2022
- Week 10: GNNs. Slides, Video
Additional Reading and Resources
- Review of Linear Algebra, by Zico Kolter and Chuong Do from Stanford
- Review of Probability Theory, by Arian Maleki and Tom Do from Stanford
- 10-725 Convex Optimization Course Notes, by Ryan Tibshirani at CMU, Fall 2019
- 11-785 Introduction to Deep Learning, by Bhiksha Raj and Rita Singh at CMU, Fall 2024
- Deep Learning Specialization, by Andrew Ng at Coursera and DeepLearning.AI, Fall 2021
- Foundations of Machine Learning, Textbook by Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar, 2018
- Lectures on Convex Optimization, Textbook by Yurii Nesterov, 2018
- Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, ICLR 2019
- Deep Networks with Stochastic Depth, ECCV 2016
- Deep Double Descent: Where Bigger Models and More Data Hurt, ICLR 2020
- Speech recognition with deep recurrent neural networks, ICASSP 2013
- Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, NeurIPS 2016
- Show and Tell: A Neural Image Caption Generator, CVPR 2015
- Improving Language Understanding by Generative Pre-Training, OpenAI Blog 2018 (Introduced GPT-1)
- Language Models are Unsupervised Multitask Learners, OpenAI Blog 2019 (Introduced GPT-2)
- Emergent Abilities of Large Language Models, TMLR 2022
- Training Compute-Optimal Large Language Models, NeurIPS 2022
- Deep Reinforcement Learning from Human Preferences, NeurIPS 2017
- The Annotated Transformer, Harvard NLP Blog 2018
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, JMLR 2022
Assignments
- Assignment 1: Python Basics and Multilayer Perceptron. Download Notebook | Open in Colab
- Assignment 2: Neural Network Training: MLP, Backpropagation, Gradient Descent. Download Notebook | Open in Colab
- Assignment 3: Advanced Optimizers: Accelerated GD, RMSProp, Adam. Download Notebook | Open in Colab
- Assignment 4: Generalization and Regularization: PyTorch, Autograd, Hyperparameter Tuning, Overparameterization. Download Notebook | Open in Colab
- Assignment 5: Introduction to CNNs: Implementation of CNNs, Semantic Segmentation with U-Net. Download Notebook | Open in Colab
- Assignment 6: Computer Vision with CNNs: Neural Style Transfer Using a Pre-Trained VGG. Download Notebook | Open in Colab
- Assignment 7: Recurrent Neural Networks: Implementation and Training. Download Notebook | Open in Colab
- Assignment 8: Seq2Seq: Neural Machine Translation. Download Notebook | Open in Colab
- Assignment 9: Transformers: GPT and Chatbot. Download Notebook | Open in Colab