Neural Architecture Hub
My notes on deep learning — written to make the math click. Every page pairs derivations with an interactive visualisation so you build intuition before you formalise it.
Deep Neural Networks
7 articles
The building blocks — convolutional layers, optimisation landscapes, and positional representations that underpin vision and language models.
- Understanding Convolutional Neural Networks. A deep dive into the architecture and mathematics of CNNs — the backbone of modern computer vision.
- Dropout: Regularization by Noise. How randomly silencing neurons during training prevents overfitting — with interactive visualizations of masks, rate effects, and the inverted-dropout trick (sketched in code after this list).
- Tokens to Embeddings — Giving Numbers Meaning. How integer token IDs become dense learned vectors that carry semantic meaning — the embedding layer explained from lookup table to positional encoding.
- Understanding Optimizers in Deep Learning. A comprehensive guide to gradient descent variants and optimization algorithms used in training deep neural networks.
- Positional Embeddings in Transformers. A deep dive into absolute, relative, sine/cosine, and rotary positional embeddings — with interactive playgrounds.
- RNN, LSTM & GRU: Sequence Modeling. How recurrent networks learn from sequences — the hidden state, vanishing gradients, LSTM's memory cell, and GRU's streamlined gating — with interactive visualizations of each architecture.
- Tokenization — How Language Models Read Text. A deep dive into subword tokenization, Byte-Pair Encoding, and vocabulary lookup — the first stage of every language model.
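The inverted-dropout trick from the dropout article is compact enough to sketch here: drop units during training and rescale the survivors, so inference needs no correction. A minimal NumPy sketch (the function name and shapes are mine, for illustration only):

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    # Training: zero each unit with probability `rate`, then scale the
    # survivors by 1/(1 - rate) so the expected activation is unchanged.
    # Inference: the layer is the identity -- no test-time rescaling.
    if not training or rate == 0.0:
        return x
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep  # Bernoulli keep-mask
    return x * mask / keep

rng = np.random.default_rng(0)
h = rng.standard_normal((2, 6))        # a small batch of activations
print(inverted_dropout(h, rate=0.5, rng=rng).round(2))
```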
Generative Models
5 articles
The math behind models that create — from next-token prediction to diffusion processes that synthesise images from noise.
- Autoregressive Models. How modern language models generate sequences one token at a time — from the chain rule of probability to sampling strategies and causal attention (see the sampling sketch after this list).
- Diffusion Models. How gradually adding and then reversing noise teaches a neural network to generate data — from the forward Markov chain to classifier-free guidance and latent diffusion.
- Generative Adversarial Networks. How two neural networks locked in competition learn to generate data indistinguishable from the real thing — from the minimax game to Wasserstein distance and StyleGAN.
- Normalizing Flows. How invertible neural networks learn exact probability distributions — from the change-of-variables formula to modern coupling layers and continuous flows.
- Variational Autoencoders. How VAEs learn structured latent spaces by combining neural networks with Bayesian inference — from the ELBO to the reparameterisation trick and beyond.
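The chain-rule factorisation behind the autoregressive article, p(x) = ∏ₜ p(xₜ | x₍₍ₜ₎₎), reduces generation to a loop: predict a distribution over the next token, sample, append, repeat. A minimal sketch with a toy bigram table standing in for a real network (all names and the toy model are illustrative, not any article's actual code):

```python
import numpy as np

def sample_sequence(logits_fn, vocab_size, length, rng, temperature=1.0):
    # Sample one token at a time from p(x) = prod_t p(x_t | x_<t).
    # `logits_fn` maps a prefix (list of token ids) to next-token logits.
    seq = []
    for _ in range(length):
        logits = logits_fn(seq) / temperature
        probs = np.exp(logits - logits.max())   # numerically stable softmax
        probs /= probs.sum()
        seq.append(int(rng.choice(vocab_size, p=probs)))
    return seq

# Toy "model": random bigram weights, conditioning only on the last token.
rng = np.random.default_rng(0)
V = 5
W = rng.standard_normal((V, V))
logits_fn = lambda prefix: W[prefix[-1]] if prefix else np.zeros(V)
print(sample_sequence(logits_fn, V, length=10, rng=rng))
```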
Work in progress — more topics will be added as I work through them.