Stephen Carmody

A place to write about AI topics and ML in production

Interesting Papers

This is a collection of papers I’ve read and enjoyed, and that I feel offer the most ROI for understanding the current DL / LLM landscape.


Foundational Concepts and Architectures

  • [1986] Learning representations by back-propagating errors
  • [1989] Handwritten Digit Recognition with a Back-Propagation Network
  • [2012] ImageNet Classification with Deep Convolutional Neural Networks
  • [2013] Efficient Estimation of Word Representations in Vector Space
  • [2014] Sequence to Sequence Learning with Neural Networks
  • [2014] Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation
  • [2017] Attention Is All You Need
  • [2018] Generating Wikipedia by Summarizing Long Sequences
  • [2018] Improving Language Understanding by Generative Pre-Training
  • [2019] Language Models are Unsupervised Multitask Learners
  • [2022] Training Language Models to Follow Instructions with Human Feedback


Network Stability / Regularization Techniques

  • [2014] Dropout: A Simple Way to Prevent Neural Networks from Overfitting
  • [2015] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift


Fine-Tuning & PEFT

  • [2021] LoRA: Low-Rank Adaptation of Large Language Models
  • [2023] Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning


Production and Deployment

  • [2023] Efficient Memory Management for Large Language Model Serving with PagedAttention