Correlation and Regression — The Sexy Side of Numbers (Actually Useful for CFA L1)
"Correlation is the gossip of data; regression is the confession booth." — Probably me, right now.
You just finished Probability Concepts and Statistical Inference — so you know about distributions, sampling variability, and hypothesis testing. Now we move to the duo that lets you quantify relationships between variables: correlation (how tight the gossip circle is) and regression (who influences whom — or at least who looks like they do). This is crucial for finance: think factor models, forecasting returns, or just making your Excel look impressively academic. But remember Ethics 101: correlation ≠ causation — misuse here is a fast route to misleading clients (and failing ethics questions).
1) Correlation: The Short Summary
What it is: A standardized measure of linear association between two variables. The most common is the Pearson correlation coefficient (r).
- Range: -1 to +1.
- r = +1 perfect positive linear relationship
- r = -1 perfect negative linear relationship
- r ≈ 0 little-to-no linear relationship
- Formula (conceptual):
r = cov(X, Y) / (σ_X * σ_Y)
- Interpretation: If r = 0.8, X and Y move together strongly in a linear sense. If r = 0.2, weak linear association — but there might still be a non-linear relationship.
Quick heuristics (context matters!):
- |r| < 0.3 — weak
- 0.3 ≤ |r| < 0.6 — moderate
- |r| ≥ 0.6 — strong
Ask yourself: Is the correlation economically meaningful, or just statistically significant because my sample is huge? Large N can make tiny r significant. That's where your Statistical Inference lessons kick in.
Nonparametric alternative
- Spearman rank correlation: measures monotonic relationships (good when data aren't linear or are ordinal).
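Both measures are quick to compute. A minimal sketch with made-up data (y = x³, a relationship that is monotonic but not linear), using NumPy and SciPy, shows why the two can disagree:

```python
import numpy as np
from scipy import stats

# Made-up data: monotonic but non-linear relationship (y = x^3)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

# Pearson r: cov(X, Y) / (sigma_X * sigma_Y) -- measures LINEAR association
pearson_r = np.corrcoef(x, y)[0, 1]

# Spearman rho: correlation of the ranks -- measures MONOTONIC association
spearman_rho, _ = stats.spearmanr(x, y)

print(f"Pearson r:    {pearson_r:.3f}")   # strong, but below 1: not linear
print(f"Spearman rho: {spearman_rho:.3f}")  # exactly 1: perfectly monotonic
```

Pearson is dragged below 1 by the curvature; Spearman, which only cares about ordering, is a perfect +1.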
2) Simple Linear Regression: The Basics
Model:
Y = β0 + β1 X + ε
- β1 (slope): expected change in Y for a one-unit change in X (ceteris paribus).
- β0 (intercept): predicted value of Y when X = 0 (may be meaningless if X = 0 is outside data range).
Estimation (OLS): choose β-hats to minimize sum of squared residuals.
Formulas (for simple regression):
β1_hat = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)^2
β0_hat = ȳ - β1_hat * x̄
Good to know: under the Gauss–Markov assumptions, OLS is BLUE — the best (minimum-variance) linear unbiased estimator (we’ll summarize those assumptions next).
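These estimator formulas are worth verifying by hand once. A sketch with hypothetical return data (the numbers are illustrative, not real), cross-checked against NumPy's built-in least-squares fit:

```python
import numpy as np

# Hypothetical data: X = market excess return (%), Y = stock excess return (%)
x = np.array([2.0, -1.0, 1.5, 0.0, 3.0, -2.0])
y = np.array([3.0, -1.5, 1.0, 0.2, 4.1, -2.8])

x_bar, y_bar = x.mean(), y.mean()

# beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# beta0_hat = y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

# Cross-check against NumPy's degree-1 polynomial (least-squares) fit
slope, intercept = np.polyfit(x, y, deg=1)
print(beta1_hat, beta0_hat)  # should match (slope, intercept)
```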
Partitioning variance: SST = SSR + SSE
- SST (total) = Σ(y_i - ȳ)^2
- SSR (explained by model) = Σ(ŷ_i - ȳ)^2
- SSE (residual) = Σ(y_i - ŷ_i)^2
R-squared: SSR / SST — proportion of variance in Y explained by X.
Note: A high R² isn't an automatic green light. Check residuals, think economics/logic, and watch for overfitting.
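The variance partition is easy to verify numerically. A small sketch (same hypothetical return data as above) that fits the line, splits the variation, and confirms SST = SSR + SSE:

```python
import numpy as np

# Hypothetical data: X = market excess return (%), Y = stock excess return (%)
x = np.array([2.0, -1.0, 1.5, 0.0, 3.0, -2.0])
y = np.array([3.0, -1.5, 1.0, 0.2, 4.1, -2.8])

# Fit the simple OLS line
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained by the model
sse = np.sum((y - y_hat) ** 2)         # residual (unexplained)

r_squared = ssr / sst
# Identity check: SST = SSR + SSE (up to floating-point error)
print(sst, ssr + sse, r_squared)
```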
3) Hypothesis testing in regression
Test slope = 0 (no linear relationship):
t = β1_hat / SE(β1_hat)
Compare t to t-critical or compute a p-value. This ties directly to your Statistical Inference knowledge: sampling distributions, t-statistics, and confidence intervals.
Confidence interval for β1: β1_hat ± t_(α/2, n-2) * SE(β1_hat).
Prediction vs. Estimation:
- Confidence interval: for the mean E[Y|X=x0]
- Prediction interval: for an individual Y at X = x0 (wider, because it also includes the residual variance)
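Putting the inference pieces together: a sketch (hypothetical data again) that computes the slope t-statistic, its 95% confidence interval, and the two standard errors at a point x0 — confirming the prediction interval is the wider one:

```python
import numpy as np
from scipy import stats

# Hypothetical data: X = market excess return (%), Y = stock excess return (%)
x = np.array([2.0, -1.0, 1.5, 0.0, 3.0, -2.0])
y = np.array([3.0, -1.5, 1.0, 0.2, 4.1, -2.8])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

s2 = np.sum(resid ** 2) / (n - 2)  # residual variance estimate
se_b1 = np.sqrt(s2 / sxx)          # standard error of the slope

t_stat = b1 / se_b1                               # H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p-value
t_crit = stats.t.ppf(0.975, df=n - 2)             # 95% two-sided critical t
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

# At x0: SE for the MEAN response (confidence) vs an INDIVIDUAL Y (prediction)
x0 = 1.0
se_mean = np.sqrt(s2 * (1 / n + (x0 - x.mean()) ** 2 / sxx))
se_pred = np.sqrt(s2 * (1 + 1 / n + (x0 - x.mean()) ** 2 / sxx))
# se_pred > se_mean always: the extra "1" is the residual variance term
print(t_stat, p_value, ci_b1)
```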
4) Assumptions (the LINE checklist) and what breaks
- Linearity: relationship is linear in parameters
- Independence of errors: no autocorrelation
- Normality of errors (for small-sample inference)
- Equal variance (homoskedasticity)
If assumptions are violated: biased or inefficient estimates, wrong SEs, and misleading inference.
Common problems and quick remedies:
- Heteroskedasticity → use robust (White) standard errors
- Autocorrelation (time series) → use Durbin–Watson test; consider AR models or Newey–West SEs
- Multicollinearity (in multiple regression) → large SEs, unstable β-hats; check VIFs (>10 is suspicious)
- Omitted variable bias → estimate may be biased; think carefully about causal structure
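The heteroskedasticity remedy can be shown in a few lines. A sketch with simulated data whose noise variance grows with |x|, comparing the classical slope SE to a White (HC0) robust SE computed by hand (no particular stats library assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated heteroskedastic data: noise spread grows with |x|
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n) * (0.5 + np.abs(x))

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Classical SE: assumes constant error variance
se_classic = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)

# White (HC0) robust SE: weights each squared x-deviation by its own
# squared residual, so high-variance observations count for more
se_white = np.sqrt(np.sum(((x - x.mean()) ** 2) * resid ** 2) / sxx ** 2)

print(se_classic, se_white)  # robust SE is larger under this DGP
```

Here the classical SE understates the uncertainty; reporting it would make the slope look more precisely estimated than it really is.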
Omitted variable bias formula (simple intuition):
Bias(β1_hat) = β2 * [Cov(X1, X2) / Var(X1)]
Meaning: if an omitted variable affects Y and correlates with X, your β1 is biased.
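You can watch the bias formula work in a simulation. A sketch with a made-up true model Y = 1·X1 + 2·X2 + noise, where X2 is built so that Cov(X1, X2)/Var(X1) ≈ 0.5 — so omitting X2 should shift the slope by roughly β2 × 0.5 = 1.0:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# True model: Y = 1.0*X1 + 2.0*X2 + noise, with X2 correlated with X1
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # Cov(X1, X2)/Var(X1) ~ 0.5
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

# Short (mis-specified) regression: omit X2
b1_short = np.sum((x1 - x1.mean()) * (y - y.mean())) / np.sum((x1 - x1.mean()) ** 2)

# Predicted bias: beta2 * Cov(X1, X2)/Var(X1) = 2.0 * 0.5 = 1.0,
# so b1_short should land near 1.0 + 1.0 = 2.0, not the true 1.0
print(b1_short)
```

The estimate absorbs X2's effect through its correlation with X1 — exactly what the bias formula predicts.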
5) Practical finance example (mini)
Imagine regressing a stock's excess return (Y) on market excess return (X) — the CAPM spirit.
- β1_hat is the stock's beta (systematic risk).
- Test H0: β1 = 1 (is the stock as risky as market?) with t-test — this is a hypothesis test you've seen in Statistical Inference.
- A low R² doesn't mean beta is useless — beta may still be a key parameter for risk.
Table (toy data):
| Month | Market Excess (%) | Stock Excess (%) |
|---|---|---|
| 1 | 2.0 | 3.0 |
| 2 | -1.0 | -1.5 |
| 3 | 1.5 | 1.0 |
| 4 | 0.0 | 0.2 |
(You'd compute β1_hat using the formulas above — practice this in Excel or your calculator.)
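For concreteness, here's that computation on the four table rows (a sketch of what you'd do in Excel or on your calculator):

```python
import numpy as np

# The four months from the toy table above
market = np.array([2.0, -1.0, 1.5, 0.0])  # X: market excess return (%)
stock = np.array([3.0, -1.5, 1.0, 0.2])   # Y: stock excess return (%)

sxx = np.sum((market - market.mean()) ** 2)
beta = np.sum((market - market.mean()) * (stock - stock.mean())) / sxx
alpha = stock.mean() - beta * market.mean()

print(f"beta  = {beta:.3f}")   # 1.286: the stock amplifies market moves
print(f"alpha = {alpha:.3f}")  # -0.129
```

A beta above 1 says this stock moved more than one-for-one with the market over these (very few!) months — four observations is far too small a sample for real inference, which is the point of the t-tests above.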
6) Ethics: Don’t be that analyst who lies with statistics
- Never imply causation from correlation without a defensible causal model.
- Don’t cherry-pick variables or time periods to produce a headline-grabbing R².
- Disclose model limitations: sample period, data snooping, and assumption checks.
If your regression magically predicts everything with R² = 0.99, either you’ve discovered a financial miracle or you accidentally leaked future information into your predictor. Probable guilty party: look-ahead bias or data leakage.
7) Quick checklist before you report regression results
- Plot data and residuals (visualize before you worship a number).
- Check linearity and influential points (Cook’s distance).
- Test for heteroskedasticity and autocorrelation if time series.
- Consider multicollinearity in multivariate models (VIFs).
- Report β-hats, SEs, t-stats, p-values, R² (and adj. R²), and prediction vs confidence intervals.
- Be upfront about potential omitted variables and causality limits.
Closing: TL;DR (with Flair)
- Correlation tells you about co-movement, not cause.
- Regression estimates marginal effects and lets you test hypotheses (bring your t-tests!).
- Assumptions matter — violate them and your inference is a house of cards.
- Ethics matters — statistical glamour without transparency = investor harm and exam failure.
Final pep talk: Run your regressions, but don’t worship coefficients. Combine math with economic sense, check assumptions, and always ask: Does this story make sense outside the sample? If not, don’t publish it; fix it.
Version note: This builds on the probability and inference foundations you’ve already learned — now you get to apply those tests to relationships between variables and ask the ethical questions that separate decent analysts from dinner-table anecdote-sellers.