Deep Learning Foundations
Understand neural networks and train models with PyTorch, from CNNs to transformers and deployment.
PyTorch Tensors: The Building Blocks of Every Neural Net (But Cooler)
"This is the moment where the concept finally clicks."
You're coming off learning about activation functions and the intuition behind backpropagation — nice. Now meet the actual data structure that makes both of those things happen in code: PyTorch tensors. If activations are the neurons and backpropagation is the brain's gossip network, tensors are the neurons' furniture: they hold the numbers, move them around, and occasionally go to the GPU gym.
Why tensors matter (and how this builds on what you already know)
- From our scikit-learn work you know models expect arrays (usually NumPy). In deep learning, models expect tensors. Think: NumPy + GPU + autodiff.
- Activation functions operate element-wise on tensors.
- Backpropagation uses tensors with requires_grad=True so autograd can compute gradients for updates.
In short: if you want to train neural networks, you must be fluent in tensors.
Quick tour: What is a tensor? (Short, lovable definition)
- Tensor = N-dimensional array (like NumPy) + metadata (dtype, device) + autograd features.
- dtype: float32, float64, int64, etc. For speed on GPUs use float32.
- device: CPU or GPU ('cpu' or 'cuda:0'). Move tensors between devices with .to(device).
- requires_grad: if True, PyTorch will track operations for backpropagation.
Micro explanation
- A 2D tensor is like a matrix. A 4D tensor often means (batch, channel, height, width) for images.
Create tensors — basic recipes (code you will copy forever)
import torch
# From lists
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
# From NumPy (common when moving from scikit-learn)
import numpy as np
arr = np.random.randn(10, 3)
t = torch.from_numpy(arr).float()
# Quick factories
zeros = torch.zeros(2, 3)
ones = torch.ones(4)
rand = torch.randn(5, 5)
# Put on GPU if available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
rand = rand.to(device)
# For autodiff
x = torch.randn(3, requires_grad=True)
Tip: if you're coming from scikit-learn pipelines, remember to convert NumPy float64 arrays to float32 before putting them on GPU: float64 is slower and may not be supported on all devices.
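You can verify the dtype issue directly (a small sketch, assuming an array handed over from a scikit-learn pipeline):

```python
import numpy as np
import torch

arr = np.random.randn(8, 3)   # NumPy defaults to float64
t = torch.from_numpy(arr)     # tensor inherits float64
t32 = t.float()               # cast to float32 before training / moving to GPU
print(t.dtype, t32.dtype)     # torch.float64 torch.float32
```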
Shapes, reshape, and the little functions you use 100x/day
- .shape — like NumPy's .shape.
- .view() or .reshape() — change tensor shape (no copy when possible).
- .unsqueeze(dim) / .squeeze(dim) — add/remove dimensions (useful for batch dims).
- .transpose() / .permute() — reorder axes (permute for >2D).
Example: convert an (H, W) image to (1, 1, H, W) for a conv input: img.unsqueeze(0).unsqueeze(0) or img.view(1, 1, H, W).
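The shape helpers above can be sketched in a few lines (the 28x28 size here is just for illustration):

```python
import torch

img = torch.randn(28, 28)          # (H, W)
x = img.unsqueeze(0).unsqueeze(0)  # add batch and channel dims -> (1, 1, 28, 28)
print(x.shape)                     # torch.Size([1, 1, 28, 28])

flat = x.reshape(1, -1)            # flatten to (1, 784); -1 infers the size
print(flat.shape)                  # torch.Size([1, 784])

nhwc = x.permute(0, 2, 3, 1)       # reorder axes to (batch, H, W, channel)
print(nhwc.shape)                  # torch.Size([1, 28, 28, 1])
```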
Math, broadcasting, and matrix ops
- Elementwise: +, -, *, /
- Matrix multiply: @ or torch.matmul(a, b)
- Reduce: sum(), mean(), max()
- Einstein sum: torch.einsum() for fancy index algebra
Broadcasting rules are like NumPy's — handy, occasionally glorious, sometimes surprising.
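A quick sketch of these ops on small tensors (the values are illustrative):

```python
import torch

a = torch.ones(2, 3)
b = torch.tensor([10.0, 20.0, 30.0])  # shape (3,) broadcasts across the rows
print(a + b)                          # each row becomes [11., 21., 31.]

w = torch.randn(3, 4)
print((a @ w).shape)                  # (2, 3) @ (3, 4) -> torch.Size([2, 4])

print(a.sum(), a.mean())              # reductions: tensor(6.) tensor(1.)

# einsum: the same matrix multiply, spelled with explicit indices
print(torch.einsum('ij,jk->ik', a, w).shape)  # torch.Size([2, 4])
```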
Autograd in practice — how tensors power backprop
You learned backprop intuition earlier. Here's how those ideas map to tensors.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2 # elementwise op tracked by autograd
z = y.pow(2).sum() # scalar loss
z.backward() # compute gradients
print(x.grad) # dz/dx = 2 * (2*x) * 2 = 8*x -> [8., 16., 24.]
- requires_grad=True tells PyTorch to record operations on x.
- backward() computes gradients through the dynamic computation graph.
- .grad stores gradients (note: it accumulates across backward calls, so you often .zero_() them when doing manual updates).
Important: many operations are in-place (end with _, e.g., x.add_(1)) — avoid in-place ops on tensors that require grad unless you know what you're doing; they can break the computation graph.
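The accumulation behavior of .grad is easy to see in a tiny sketch:

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

for _ in range(2):
    loss = (2 * w).sum()  # d(loss)/dw = 2 on every pass
    loss.backward()       # each call *adds* into w.grad

g_after_two = w.grad.clone()
print(g_after_two)        # tensor([4.]): 2 + 2, not 2

w.grad.zero_()            # reset before the next manual update
print(w.grad)             # tensor([0.])
```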
Training-time primitives: detach, no_grad, and .item()
- with torch.no_grad(): — temporarily disable gradient tracking (used during evaluation and when converting model outputs back to NumPy).
- tensor.detach() — get a new tensor that shares storage but is detached from the graph.
- tensor.item() — get a Python scalar from a single-element tensor.
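A minimal sketch of all three primitives together:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()

val = y.item()     # plain Python float, outside the graph
d = x.detach()     # shares storage with x, but no grad tracking
print(type(val), d.requires_grad)  # <class 'float'> False

with torch.no_grad():
    z = x * 2      # this op is not recorded by autograd
print(z.requires_grad)             # False
```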
Common pattern when evaluating model predictions and logging metrics:
model.eval()
with torch.no_grad():
    outputs = model(inputs)
    preds = outputs.argmax(dim=1)
numpy_preds = preds.cpu().numpy()
Device and dtype pitfalls (learn these the hard way so others don't)
- GPU and CPU tensors cannot be mixed in ops. Move all operands to the same device.
- Prefer torch.float32 for training. scikit-learn often yields float64 — cast with .astype(np.float32) or .float().
- If you see mysterious errors in backward, check for in-place ops or tensors that were accidentally .detach()ed.
From scikit-learn to PyTorch: a tiny workflow
- Use scikit-learn for preprocessing pipelines (StandardScaler, PCA, feature engineering).
- Convert final dataset to NumPy arrays.
- Cast to float32 and convert to tensors:
X = X.astype(np.float32)
X_tensor = torch.from_numpy(X)
Y_tensor = torch.from_numpy(y).long() # for classification
- Wrap in a Dataset + DataLoader, move batches to device, and feed tensors to models.
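The steps above can be sketched end to end. This uses synthetic data and a bare torch.nn.Linear as a stand-in model, purely for illustration:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for a preprocessed scikit-learn dataset
X = np.random.randn(100, 4).astype(np.float32)  # cast to float32
y = np.random.randint(0, 3, size=100)           # 3-class labels

X_tensor = torch.from_numpy(X)
y_tensor = torch.from_numpy(y).long()           # class indices must be int64

dataset = TensorDataset(X_tensor, y_tensor)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.nn.Linear(4, 3).to(device)

for xb, yb in loader:
    xb, yb = xb.to(device), yb.to(device)  # move each batch to the device
    logits = model(xb)
    break  # one batch is enough for the sketch

print(logits.shape)  # torch.Size([16, 3])
```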
This gives you reproducible preprocessing with scikit-learn and the training power of PyTorch — best of both worlds.
Quick checklist (aka survival kit)
- Use float32 unless you have a good reason.
- Set requires_grad=True only for tensors you need gradients for (usually model parameters; intermediate activations are tracked automatically if computed from them).
- Use with torch.no_grad() for evaluation/prediction to save memory and time.
- Call .zero_() or optimizer.zero_grad() before loss.backward() if you accumulate gradients manually.
- Move tensors to the right device: tensor.to(device).
Final takeaways — short and punchy
- Tensors are NumPy on steroids: same vibe, but with GPU and automatic differentiation.
- They connect your preprocessing (scikit-learn) to your model forward pass and the backpropagation machinery you learned earlier.
- Mastering shape ops, device management, and autograd basics will make training models feel like driving — not like being behind the wheel of a runaway blender.
If you've ever wondered where gradients live and how activations turn into updates, now you know: tensors carry it all. Start playing: create tensors, toggle requires_grad, run simple backward passes, and watch the math happen.
Tags: beginner, practical, hands-on, pytorch