This is a collection of papers I’ve read and enjoyed, and that I think offer the most ROI for understanding the current DL / LLM landscape.
Foundational Concepts and Architectures
- [1986] Learning representations by back-propagating errors
- [1989] Handwritten Digit Recognition with a Back-Propagation Network
- [2012] ImageNet Classification with Deep Convolutional Neural Networks
- [2013] Efficient Estimation of Word Representations in Vector Space
- [2014] Sequence to Sequence Learning with Neural Networks
- [2014] Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation
- [2017] Attention Is All You Need (see the attention sketch after this list)
- [2018] Generating Wikipedia by Summarizing Long Sequences
- [2018] Improving Language Understanding by Generative Pre-Training
- [2019] Language Models are Unsupervised Multitask Learners
- [2022] Training Language Models to Follow Instructions with Human Feedback
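
To anchor the Transformer line of work above, here is a minimal NumPy sketch of single-head scaled dot-product attention, the core operation in "Attention Is All You Need". The shapes and toy inputs are my own illustrative choices; real implementations add learned projections, multiple heads, and masking.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention (Vaswani et al., 2017).

    Q, K: (seq_len, d_k) arrays; V: (seq_len, d_v) array.
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors
    return weights @ V

# Toy example (illustrative shapes): 4 tokens, 8-dimensional Q/K/V
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```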
Network Stability / Regularization Techniques
- [2014] Dropout: A Simple Way to Prevent Neural Networks from Overfitting (dropout and batch norm are both sketched after this list)
- [2015] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
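
A rough sketch of the two techniques above, assuming a (batch, features) NumPy array. It uses the common "inverted" dropout variant (rescaling at train time, whereas the original paper rescales at test time) and omits the running statistics that batch norm keeps for inference.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p and rescale
    the survivors so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)       # rescale so inference needs no change

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization over the batch dimension for a (batch, features)
    input; running statistics for inference are omitted in this sketch."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # per-feature zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift
```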
Fine-Tuning & PEFT
- [2021] LoRA: Low-Rank Adaptation of Large Language Models (see the sketch after this list)
- [2023] Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
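
A toy sketch of the LoRA idea from the paper above: the pretrained weight stays frozen and only a low-rank update B·A is trained. The rank, scaling, and initialization here are illustrative assumptions, not a drop-in reproduction of the paper's setup.

```python
import numpy as np

class LoRALinear:
    """Toy LoRA layer: frozen pretrained weight W plus a trainable
    low-rank update B @ A, scaled by alpha / r (Hu et al., 2021)."""

    def __init__(self, W, r=4, alpha=8, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                        # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, d_in))   # trainable, small random init
        self.B = np.zeros((d_out, r))                     # trainable, zero init => no change at start
        self.scale = alpha / r

    def __call__(self, x):
        # y = x W^T + (alpha / r) * x A^T B^T, i.e. W is augmented by B @ A
        return x @ self.W.T + (x @ self.A.T) @ self.B.T * self.scale

# Usage (illustrative shapes): a frozen 16x32 weight adapted with rank-4 matrices
layer = LoRALinear(np.random.default_rng(1).normal(size=(16, 32)))
print(layer(np.ones((2, 32))).shape)  # (2, 16)
```

Only A and B (r·(d_in + d_out) parameters) would be updated during fine-tuning, which is the parameter-efficiency point the PEFT survey above expands on.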