Curated Resources

The best books, courses, research papers, tools, and communities for learning deep learning — hand-picked and verified.

📚 Textbooks

These books range from beginner-friendly introductions to graduate-level references. All are available free online or have free editions.

Deep Learning

I. Goodfellow, Y. Bengio, A. Courville — MIT Press, 2016

The definitive graduate-level textbook. Covers mathematical foundations, modern deep networks, and research topics including autoencoders, representation learning, and generative models.

Dive into Deep Learning

A. Zhang, Z. C. Lipton, M. Li, A. J. Smola — 2023

Interactive, code-first textbook adopted by 500+ universities. Every concept comes with runnable PyTorch, TensorFlow, and JAX code. Covers CNNs, RNNs, attention, transformers, GANs, and optimization.

Neural Networks and Deep Learning

M. A. Nielsen — Determination Press, 2015

Beginner-focused free online book. Explains core concepts — backpropagation, gradient descent, regularization — with clear writing and interactive visualizations. Great first book.

Understanding Deep Learning

S. J. D. Prince — MIT Press, 2023

Modern, comprehensive textbook covering everything from linear regression to transformers and diffusion models. Includes 400+ figures and exercises. Freely available as PDF.

The Little Book of Deep Learning

F. Fleuret — University of Geneva, 2023

A concise 170-page overview of deep learning designed to fit in a shirt pocket. Covers architectures, training, and applications with remarkable clarity and density.

🎓 University Courses

Full lecture series from Stanford, MIT, and other top universities — all freely available online. These are the same courses taken by researchers at leading AI labs.

CS231n: Deep Learning for Computer Vision

Stanford University — Fei-Fei Li, Andrej Karpathy, Justin Johnson

The gold standard for computer vision. Covers image classification, CNNs, object detection, segmentation, generative models, and visualizing learned features. Assignments in Python / NumPy / PyTorch.

CS224N: NLP with Deep Learning

Stanford University — Christopher Manning

Comprehensive NLP course covering word vectors, dependency parsing, RNNs, attention, transformers, pretraining (BERT, GPT), question answering, and text generation.

MIT 6.S191: Introduction to Deep Learning

Massachusetts Institute of Technology — Alexander Amini, Ava Amini

MIT's official introductory deep learning course. Covers foundations, CNNs, RNNs, generative models, reinforcement learning, and AI for science. Updated annually with labs in TensorFlow.

Practical Deep Learning for Coders

fast.ai — Jeremy Howard, Sylvain Gugger

Top-down, code-first approach to deep learning. Teaches how to train state-of-the-art models in computer vision and NLP within the first lesson. Uses PyTorch and the fastai library.

Neural Networks: Zero to Hero

Andrej Karpathy — 2023

Build neural networks from scratch in Python. Starts with micrograd (a tiny autograd engine), builds makemore (a character-level language model), and culminates in a GPT built from scratch. Nearly everything is derived from first principles, with minimal dependencies.
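
To give a flavor of the from-scratch style this course teaches, here is a compressed sketch of a scalar reverse-mode autograd engine in pure Python. It is illustrative only, in the spirit of micrograd, not Karpathy's actual code; the `Value` class name and methods are assumptions for this sketch.

```python
class Value:
    """A scalar carrying its gradient, with reverse-mode autodiff."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# z = x*y + x, so dz/dx = y + 1 = 4 and dz/dy = x = 2
x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```

The key idea the course drills: each operation records a closure that knows its local derivative, and `backward()` replays those closures in reverse topological order.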

Deep Learning Specialization

DeepLearning.AI — Andrew Ng (Coursera)

Five-course specialization covering neural networks, hyperparameter tuning, structuring ML projects, CNNs, and sequence models. Assignments in Python / TensorFlow. Audit for free.

📄 Landmark Research Papers

Ten of the most influential papers in deep learning history. Reading these gives a solid sense of where the field came from and where it's going.

2017

Attention Is All You Need

Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin

Introduced the Transformer architecture — self-attention replacing recurrence entirely. Foundation of GPT, BERT, T5, and virtually all modern language models.
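
The paper's core operation, scaled dot-product attention, is compact enough to sketch without any framework. This is a hedged, dependency-free illustration (real implementations are batched, multi-headed, and masked); `Q`, `K`, `V` are lists of row vectors.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

Because the query matches the first key more strongly, the output leans toward the first value row; a zero query would weight both values equally.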

2015

Deep Residual Learning for Image Recognition

He, Zhang, Ren, Sun (Microsoft Research)

Introduced skip connections (ResNet), enabling training of 152+ layer networks. Won ILSVRC 2015 and remains the backbone of most vision systems.
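
The trick is simple to state: a residual block computes y = F(x) + x, so even a block that learns nothing still passes its input through unchanged. A toy sketch (the paper's actual blocks are stacks of convolutions, not this one-liner):

```python
def residual_block(x, f):
    """y = f(x) + x: the '+ x' skip path lets gradients flow through unchanged."""
    fx = f(x)
    return [a + b for a, b in zip(fx, x)]

# If f collapses to zero, the block degenerates to the identity mapping,
# which is why very deep stacks of these blocks remain trainable.
identity_out = residual_block([1.0, 2.0], lambda v: [0.0] * len(v))
print(identity_out)  # [1.0, 2.0]
```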

2014

Generative Adversarial Networks

Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio

Introduced the GAN framework — two networks (generator and discriminator) competing in a minimax game. Pioneered neural image generation.
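
The minimax game between generator G and discriminator D is stated in the paper as a single value function:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to assign high probability to real samples x and low probability to generated samples G(z); G is trained to fool D.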

2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin, Chang, Lee, Toutanova (Google AI)

Showed that pretraining a bidirectional transformer on masked language modeling then fine-tuning produced SOTA on 11 NLP tasks. Sparked the pretraining revolution.
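
The masked-language-modeling objective is easy to illustrate: hide a fraction of the input tokens and train the model to recover them. A deliberately simplified sketch (real BERT masks ~15% of positions, and of those replaces 80% with `[MASK]`, 10% with a random token, and leaves 10% unchanged; this sketch only does the `[MASK]` case):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace ~mask_rate of tokens with [MASK]; targets hold the originals."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets.append(tok)       # the model must predict this token
        else:
            masked.append(tok)
            targets.append(None)      # no loss on unmasked positions
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split(), mask_rate=0.3)
print(masked)
```

Because the corrupted positions can be anywhere in the sequence, the model must use context from both left and right, which is what "bidirectional" refers to.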

1998

Gradient-Based Learning Applied to Document Recognition

LeCun, Bottou, Bengio, Haffner

Introduced LeNet-5 and demonstrated end-to-end CNN training for handwritten digit recognition. Established the convolutional neural network architecture used today.
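
The core operation LeNet established is the 2D convolution: slide a small learned kernel over the image and take dot products. A minimal "valid" version in pure Python (strictly speaking cross-correlation, as deep learning libraries compute it; real layers add channels, strides, and padding):

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of a single-channel image with a kernel."""
    H, W = len(image), len(image[0])
    kH, kW = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kH + 1):
        row = []
        for j in range(W - kW + 1):
            # Dot product of the kernel with the image patch at (i, j).
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kH) for dj in range(kW)))
        out.append(row)
    return out

# A vertical-edge detector on a tiny 4x4 image: dark left half, bright right half.
img = [[0, 0, 1, 1]] * 4
k = [[-1, 1], [-1, 1]]
print(conv2d(img, k))  # responds only where the intensity jumps
```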

1997

Long Short-Term Memory

Hochreiter, Schmidhuber

Solved the vanishing gradient problem in RNNs with gated memory cells. LSTM became the dominant architecture for sequence modeling for two decades until transformers.
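
The gating mechanism, in its now-standard form (σ is the logistic sigmoid, ⊙ is elementwise product):

```latex
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{forget gate}\\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{input gate}\\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{candidate cell state}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{additive cell update}\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{output gate}\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update to the cell state c_t is the key: gradients can flow through it across many timesteps without being repeatedly squashed, which is what defeats the vanishing gradient.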

2012

ImageNet Classification with Deep Convolutional Neural Networks

Krizhevsky, Sutskever, Hinton

AlexNet won ILSVRC 2012 by a huge margin using GPU-trained deep CNNs with ReLU and dropout. The result that reignited the deep learning revolution.


2014

Neural Machine Translation by Jointly Learning to Align and Translate

Bahdanau, Cho, Bengio

Introduced the attention mechanism for seq2seq models, allowing the decoder to focus on relevant source words. The crucial precursor to the Transformer.

2020

Denoising Diffusion Probabilistic Models

Ho, Jain, Abbeel (UC Berkeley)

Showed that iterative denoising can generate high-quality images rivaling GANs. Foundation of Stable Diffusion, DALL·E 2, and Midjourney.
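
The forward process gradually corrupts data with Gaussian noise according to a variance schedule β_t, and the model is trained to undo one step of that corruption by predicting the noise:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\big),
\qquad
L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}
\Big[ \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t\big) \big\|^2 \Big]
```

Here ᾱ_t is the cumulative product of (1 − β_s); sampling then runs the learned denoiser in reverse, from pure noise back to an image.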

2018

Improving Language Understanding by Generative Pre-Training

Radford, Narasimhan, Salimans, Sutskever (OpenAI)

The original GPT paper. Showed that unsupervised pretraining of a transformer decoder followed by supervised fine-tuning achieves strong NLP performance. Led directly to GPT-2, GPT-3, and GPT-4.

🛠️ Tools & Frameworks

The essential software for building, training, and deploying deep learning models. All free and open-source.

🔥

PyTorch

Dynamic computation graphs, Pythonic API. The most popular framework in research and increasingly in production.

pytorch.org →
🟠

TensorFlow

Google's production-ready ML platform. TensorFlow 2.x with Keras API for rapid prototyping and TF Serving for deployment.

tensorflow.org →
🤗

Hugging Face

500K+ pretrained models, datasets, and Spaces. The hub for NLP, vision, and audio models with the Transformers library.

huggingface.co →
📓

Google Colab

Free GPU/TPU Jupyter notebooks in the cloud. Zero setup — import data, train models, and share results instantly.

colab.google →
📊

Weights & Biases

Experiment tracking, hyperparameter sweeps, and model versioning. Free for personal and academic use.

wandb.ai →
📑

Papers With Code

Browse state-of-the-art results, find code implementations for any paper, and compare model performance on benchmarks.

paperswithcode.com →

🤝 Communities

Where to ask questions, share progress, read discussions, and find collaborators.

🟠

r/MachineLearning

3M+ members. Research paper discussions, industry news, and AMA sessions with top researchers.

Visit →
📖

r/learnmachinelearning

Beginner-friendly subreddit. Study groups, resource recommendations, project feedback, and career advice.

Visit →
🤗

Hugging Face Forums

Official community for Transformers, Datasets, and model hosting. Great for NLP and vision model questions.

Visit →
🔥

PyTorch Forums

Official PyTorch discussion forum. Best place for framework-specific questions, debugging, and feature requests.

Visit →