Deep Learning Playground

Learn by doing — run interactive demos right in your browser. No installs, no setup, instant feedback.

Gradient Descent Visualizer

Watch how optimization algorithms navigate a loss landscape to find the minimum. Adjust the learning rate and number of steps to see how they affect convergence.

Controls

Drag the learning rate slider to change step size. Higher values learn faster but may overshoot; lower values are stable but slow.


The Theory Behind the Demo

Understanding why gradient descent works is as important as seeing it in action. Here are the key concepts at play.

🎯 Loss Functions

A loss function measures how wrong a model's predictions are. The goal of training is to find parameter values that minimize this function. Common choices include mean squared error for regression and cross-entropy for classification.

L(θ) = (1/n) Σ ℓ(f(xᵢ; θ), yᵢ)
Goodfellow et al., Deep Learning, Ch. 6.2
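As a concrete sketch of the formula above, here is mean squared error for a toy one-parameter linear model f(x; θ) = θ·x. The model and data are made up for illustration, not part of the demo itself.

```python
import numpy as np

def mse_loss(theta, x, y):
    """Mean squared error: (1/n) * sum of squared prediction errors."""
    predictions = theta * x           # f(x; theta) = theta * x
    return np.mean((predictions - y) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])         # data generated with theta = 2

print(mse_loss(2.0, x, y))            # 0.0 at the true parameter
print(mse_loss(1.0, x, y))            # positive everywhere else
```

Training amounts to searching for the θ that drives this number as low as possible.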

📐 Gradient Descent

The gradient ∇L(θ) points in the direction of steepest increase. We move in the opposite direction, scaled by a learning rate η. This simple rule, repeated thousands of times, is how neural networks learn from data.

θ ← θ − η · ∇L(θ)
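The update rule is a one-liner in code. This sketch (a hypothetical example, not the demo's implementation) minimizes L(θ) = (θ − 3)², whose gradient is 2(θ − 3):

```python
def grad(theta):
    """Gradient of L(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)

theta = 0.0   # starting point
eta = 0.1     # learning rate
for _ in range(100):
    theta -= eta * grad(theta)   # theta <- theta - eta * grad(L)

print(round(theta, 4))  # converges to 3.0, the minimizer
```

Each step multiplies the distance to the minimum by (1 − 2η) = 0.8, so the error shrinks geometrically.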

⚡ Learning Rate

Too large → the optimizer overshoots the minimum and diverges. Too small → convergence is very slow, and the optimizer may stall in flat regions or shallow local minima. Finding a good learning rate is one of the most important hyperparameter decisions in deep learning.

η ∈ {3e-4, 1e-3, 3e-3} (typical values for Adam)
Kingma, D. & Ba, J., Adam: A Method for Stochastic Optimization, ICLR 2015
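You can see all three regimes on the simple quadratic L(θ) = θ², where the gradient is 2θ and each step multiplies θ by (1 − 2η). A minimal sketch (toy values, not tuned for any real model):

```python
def run(eta, steps=50, theta=1.0):
    """Run gradient descent on L(theta) = theta^2 and return the final theta."""
    for _ in range(steps):
        theta -= eta * 2.0 * theta   # theta <- theta * (1 - 2*eta)
    return theta

print(abs(run(0.4)))    # fast: |theta| shrinks by 0.2x per step
print(abs(run(0.01)))   # slow: |theta| shrinks by only 0.98x per step
print(abs(run(1.1)))    # diverges: |theta| grows by 1.2x per step
```

The divergence threshold here is η = 1: beyond it, each step flips the sign of θ and makes it larger in magnitude, which is exactly the overshooting behavior the slider lets you provoke in the demo.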

🏔️ Loss Landscapes

Real neural network loss surfaces are not smooth bowls — they contain saddle points, narrow valleys, and flat regions. The demo shows a simplified 1D landscape; real models optimize over millions of dimensions.

f(x) = 0.15·sin(3x) + 0.1·sin(7x+1) + 0.15·(x−1.5)²
Li, H. et al., Visualizing the Loss Landscape of Neural Nets, NeurIPS 2018
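The demo's landscape is simple enough to optimize directly. This sketch runs plain gradient descent on it using a finite-difference gradient (the step count, starting point, and learning rate are illustrative choices, not the demo's settings):

```python
import math

def f(x):
    """The demo's 1D loss landscape."""
    return (0.15 * math.sin(3 * x)
            + 0.1 * math.sin(7 * x + 1)
            + 0.15 * (x - 1.5) ** 2)

def grad(x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.0
eta = 0.05
for _ in range(200):
    x -= eta * grad(x)

print(round(x, 3), round(f(x), 3))  # settles where the gradient is ~0
```

Because of the sine terms, where you end up depends on where you start: different initial x values land in different local minima, which is the point the demo makes visually.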

📄 Further Reading

Ruder (2016). An overview of gradient descent optimization algorithms — comprehensive survey of SGD, Momentum, RMSProp, Adam, and other optimizers. arXiv:1609.04747

Kingma & Ba (2015). Adam: A Method for Stochastic Optimization — the most widely used optimizer in deep learning today. arXiv:1412.6980

Smith (2018). A disciplined approach to neural network hyper-parameters — practical guide to setting learning rates, batch sizes, and more. arXiv:1803.09820

More Demos Coming

We're building more interactive tools to help you build intuition for how deep learning works under the hood.

🧠

Neural Network Builder

Design a network architecture layer by layer. Watch forward passes flow through neurons and backpropagation update weights in real time.

Coming Soon
🔍

CNN Filter Visualizer

Upload an image and see what each convolutional layer detects — edges, textures, patterns, and high-level features.

Coming Soon
🎯

Attention Heatmap

Type a sentence and visualize how transformer attention heads focus on different words when making predictions.

Coming Soon
📉

Optimizer Comparison

Watch SGD, Momentum, RMSProp, and Adam race to minimize the same loss function side by side. See their strengths and failure modes.

Coming Soon