Deep Learning Foundations
Understand neural networks and train models with PyTorch, from CNNs to transformers and deployment.
Building Models in PyTorch — hands-on, with fewer tears
You already met PyTorch tensors (memory, shapes, device moves) and got a feel for backprop intuition in the previous lessons. Now we stitch those ideas together and go from data + tensors → a trained model you can save, load, and brag about at parties.
"Think of PyTorch like LEGO for neural nets: tensors are bricks, autograd is the instruction manual, and your training loop is the ritual dance that assembles everything."
What this lesson is and why it matters
We're focusing on building, training, evaluating, and saving models in PyTorch — the practical stuff. If you've used scikit-learn, you know the comfy fit/predict API and pipelines. PyTorch is lower-level: you get more control and more power (and, yes, slightly more responsibility). This control is what you need for deep learning and production-ready neural nets.
Use cases: image classification, tabular regression, transfer learning, research experiments, rapid prototyping for model-based features.
Quick roadmap (so we don't get lost)
- Define a model with torch.nn.Module
- Prepare data as tensors and DataLoader (recall tensors!)
- Training loop: forward, loss, backward, step (recall backprop)
- Evaluation and saving/loading
- Reproducibility and tips (the scikit-learn way, but PyTorch style)
1) Define a model: the Module ritual
Micro explanation: nn.Module is a container for layers + parameters and tells PyTorch how to run the forward pass.
Example: a tiny feedforward net for tabular data.
```python
import torch
import torch.nn as nn

class SmallMLP(nn.Module):
    def __init__(self, input_dim, hidden=64, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim)
        )

    def forward(self, x):
        return self.net(x)
```
Why this matters: encapsulation. You get clean forward/backward behavior and parameters that are easy to inspect (model.parameters()).
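A quick sanity check on that encapsulation: the parameter count reported by model.parameters() should match what the layer sizes predict, since a Linear(in, out) layer holds in*out weights plus out biases. A minimal sketch using the same architecture:

```python
import torch.nn as nn

# Same layer sizes as SmallMLP(input_dim=10, hidden=64, out_dim=1)
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

# Linear(in, out): in*out weights + out biases
expected = (10 * 64 + 64) + (64 * 1 + 1)
actual = sum(p.numel() for p in model.parameters())
print(actual, expected)  # 769 769
```

If these two numbers ever disagree with your mental model of the architecture, that's usually the first sign a layer is wired up with the wrong dimensions.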
2) Data: tensors → Dataset → DataLoader
If you liked scikit-learn's fit(X, y), this is similar but with explicit batching and shuffling. Convert arrays to tensors and wrap them.
```python
from torch.utils.data import TensorDataset, DataLoader

# X_np, y_np: your NumPy feature/target arrays
X = torch.tensor(X_np, dtype=torch.float32)
Y = torch.tensor(y_np, dtype=torch.float32).unsqueeze(1)  # (n,) -> (n, 1)
dataset = TensorDataset(X, Y)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
```
Link to previous content: remember how tensors live on CPU or GPU and how shape matters — those lessons are what keep your model from throwing tantrums.
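To see the batching in action end to end, here's a self-contained sketch with synthetic data (the array names and sizes are placeholders, not from the lesson):

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-ins: 100 samples, 10 features
X_np = np.random.randn(100, 10).astype(np.float32)
y_np = np.random.randn(100).astype(np.float32)

X = torch.tensor(X_np)
Y = torch.tensor(y_np).unsqueeze(1)  # (100,) -> (100, 1) to match the model's output shape

loader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=True)

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([32, 10]) torch.Size([32, 1])
```

Printing the batch shapes before training is a cheap habit that catches most shape mismatches early.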
3) The canonical training loop (the place where backprop happens)
Here's the meat. This is where autograd (covered in Backprop Intuition) records the computation graph during the forward pass and then computes gradients when you call backward().
```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SmallMLP(input_dim=10).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):
    model.train()
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        pred = model(xb)
        loss = loss_fn(pred, yb)
        opt.zero_grad()
        loss.backward()  # autograd computes gradients
        opt.step()       # optimizer updates weights
    print(f'epoch {epoch} loss {loss.item():.4f}')
```
Key mechanics to remember:
- model.train() vs model.eval() — toggles dropout, batchnorm behavior
- opt.zero_grad() — avoids gradient accumulation unless you want it
- torch.no_grad() for validation (saves memory + time)
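The zero_grad() point above is easy to verify for yourself: gradients accumulate across backward() calls until you clear them. A tiny illustration with a single scalar parameter:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w * w).backward()          # d(w^2)/dw = 2w = 6
g1 = w.grad.item()

(w * w).backward()          # no zeroing: gradients add up
g2 = w.grad.item()

w.grad = None               # what opt.zero_grad() does for every parameter
(w * w).backward()
g3 = w.grad.item()

print(g1, g2, g3)  # 6.0 12.0 6.0
```

Forgetting zero_grad() in a training loop silently sums gradients across batches, which usually shows up as a loss that refuses to converge.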
4) Evaluate and save
Evaluation example:
```python
model.eval()
val_loss = 0.0
with torch.no_grad():
    for xb, yb in val_loader:
        xb, yb = xb.to(device), yb.to(device)
        val_loss += loss_fn(model(xb), yb).item() * xb.size(0)
val_loss /= len(val_dataset)
print('Val loss:', val_loss)
```
Saving and loading:
```python
torch.save(model.state_dict(), 'model.pth')

# To load:
model = SmallMLP(input_dim=10)
model.load_state_dict(torch.load('model.pth'))
model.to(device)
model.eval()
```
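A quick way to convince yourself the save/load roundtrip preserved everything: run the same input through the original and the restored model and compare outputs. A self-contained sketch (the file path and tiny architecture here are arbitrary choices for the demo):

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

torch.manual_seed(0)
model = make_model()
x = torch.randn(4, 10)

with torch.no_grad():
    before = model(x)

path = os.path.join(tempfile.mkdtemp(), 'model.pth')
torch.save(model.state_dict(), path)

restored = make_model()                      # fresh, randomly initialized weights
restored.load_state_dict(torch.load(path))   # overwritten by the checkpoint
restored.eval()

with torch.no_grad():
    after = restored(x)

print(torch.allclose(before, after))  # True
```

Note that you must rebuild the architecture yourself before calling load_state_dict: a state_dict stores only tensors, not the model class.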
5) Reproducibility & scikit-learn vibes
scikit-learn pipelines give deterministic steps (if random_state is set). In PyTorch you must do more explicit work:
```python
import random
import numpy as np

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```
For hyperparameter tuning, you can still reach for familiar tools (Optuna, Ray Tune, or scikit-learn's GridSearchCV through a compatibility wrapper such as skorch); either way, your training loop becomes the objective function.
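The "training loop as objective" pattern is worth seeing once in miniature. This sketch hand-rolls a tiny grid search over learning rates on a synthetic regression task (the model, data, and candidate values are all illustrative assumptions, not from the lesson):

```python
import torch
import torch.nn as nn

def objective(lr, steps=50):
    """Train a tiny model briefly and return the final loss."""
    torch.manual_seed(0)                   # same data/init across runs -> fair comparison
    X = torch.randn(64, 10)
    y = X.sum(dim=1, keepdim=True)         # easy synthetic target
    model = nn.Linear(10, 1)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

candidates = [1e-1, 1e-2, 1e-3]
results = {lr: objective(lr) for lr in candidates}
best_lr = min(results, key=results.get)
print(best_lr, results[best_lr])
```

Tools like Optuna do exactly this, but with smarter sampling, pruning of bad trials, and logging; the objective function you write looks much the same.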
Tips, gotchas, and shortcuts
- Use GPU when possible: move both model and data to the same device.
- Save the model.state_dict() and also save the optimizer state if you want to resume training.
- Use model.train() before training and model.eval() for validation/testing.
- For classification, use nn.CrossEntropyLoss (it expects raw logits, not softmax).
- Watch out for gradient explosion: consider clipping (torch.nn.utils.clip_grad_norm_).
- For reproducible experiments, log seeds and configs — just like scikit-learn pipelines log params.
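The clipping tip from the list above in miniature: clip_grad_norm_ rescales all gradients in place so their combined norm does not exceed max_norm, and returns the norm they had before clipping. A small demo that deliberately manufactures huge gradients:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)

# Deliberately extreme inputs so the gradients blow up
x = torch.randn(8, 10) * 1000
loss = model(x).pow(2).mean()
loss.backward()

total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print('norm before clipping:', float(total_norm))

# After clipping, the global gradient norm is at most max_norm
clipped = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
print('norm after clipping:', float(clipped))  # <= 1.0
```

Call it after loss.backward() and before opt.step(), so the optimizer only ever sees the clipped gradients.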
Table cheat-sheet:
| Step | scikit-learn | PyTorch equivalent |
|---|---|---|
| Fit | model.fit(X, y) | training loop (forward, backward, step) |
| Predict | model.predict(X) | model.eval(); with torch.no_grad(): model(X) |
| Save | joblib.dump | torch.save(state_dict) |
| Seed | random_state | torch.manual_seed + numpy + cudnn flags |
Why do people keep misunderstanding this?
Because PyTorch asks you to handle details (batches, device moves, training/eval modes) that scikit-learn hides. That extra explicitness is powerful: you can do custom losses, per-sample gradients, custom training regimes — but you're the one juggling the balls.
Quick checklist before you run your first real training
- Data as float tensors, shapes correct
- Model on correct device
- Loss function suitable for task
- Optimizer with sensible lr
- Training loop with zero_grad, backward, step
- model.eval() + torch.no_grad() for validation
- Save model.state_dict() and config
Takeaways (memorable insights)
- PyTorch gives you the wrench and the blueprint; scikit-learn gives you the prebuilt widget. Use PyTorch when you need flexibility and speed with GPUs.
- Tensors + autograd + Module + optimizer + training loop is the minimal recipe for deep learning in PyTorch.
- Reuse prior lessons: the tensor rules you learned earlier and your backprop intuition are the backbone of every training run.
Go build something: a tiny classifier, a transfer learning experiment, or a model wrapped in a neat training script. If the model doesn't learn at first, don't cry — debug shapes, data, and learning rate in that order.
Good luck. Remember: models are like plants — give them the right data, gradients, and patience.