Foundations of AI and Data Science
Core concepts, roles, workflows, and ethics that frame end‑to‑end AI projects.
Project Lifecycle: CRISP-DM
CRISP-DM: The Project GPS Your Future Self Will Thank You For
"Strategy without process is just vibes. Process without strategy is just chores. CRISP-DM is the peace treaty." — every successful data team, eventually
You already met the characters in our data drama (roles and workflows), and you’ve seen the map (AI vs. Data Science landscape). Now we’re doing the road trip playlist: the CRISP-DM lifecycle, a.k.a. how to get from “we have data??” to “we shipped value, on purpose.”
CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It’s old (1990s), but like a classic leather jacket, it still fits — just layer with MLOps and some GenAI accessories. The magic: it’s iterative, human-centered, and actually practical.
The Six Phases (a.k.a. Your Project’s Plot Points)
1) Business Understanding
- Guiding question: What problem are we solving, and why does it matter now?
- Deliverables: problem statement, success metrics, scope, constraints, ROI hypothesis.
- Roles (from our earlier cast): Product Manager/Owner, Domain Expert, DS Lead, Stakeholders.
- Watch-outs: vague objectives, a success metric that is just “accuracy” (no).
- Flavor text: If you can’t explain the win condition in one tweet-length sentence, you’re not done.
2) Data Understanding
- Guiding question: What data exists, how trustworthy is it, and what can it actually say?
- Deliverables: data inventory, data lineage, exploratory analysis, data quality report.
- Roles: Data Engineer, Analyst, Data Scientist.
- Watch-outs: sampling bias, ghost columns, leakage, temporal confusion.
- Tip: The data will show you its drama. Believe it.
3) Data Preparation (a.k.a. Feature Kitchen)
- Guiding question: How do we transform chaos into features our models can digest?
- Deliverables: cleaned datasets, feature definitions, splits, pipelines, documentation.
- Roles: Data Scientist, Data Engineer.
- Watch-outs: overfitting via peeking at test sets, undocumented feature hacks, reproducibility sins.
- Tip: If you can’t rebuild it from raw in one command, it’s art, not engineering.
4) Modeling
- Guiding question: Which modeling approach best balances performance, cost, and interpretability?
- Deliverables: candidate models, hyperparams, training logs, experiment registry.
- Roles: Data Scientist, ML Engineer.
- Watch-outs: leaderboard obsession, ignoring baseline, ignoring latency/throughput constraints.
- Tip: A dumb baseline that works beats a brilliant model that never ships.
5) Evaluation
- Guiding question: Does this work in the real world, for real users, under real constraints?
- Deliverables: evaluation report, fairness checks, robustness tests, cost projections, go/no-go.
- Roles: DS Lead, PM, Risk/Compliance, QA.
- Watch-outs: metric theater, cherry-picking, ignoring drift risk.
- Tip: Use decision thresholds that align to business costs, not your model’s feelings.
6) Deployment
- Guiding question: How do we ship safely, monitor intelligently, and learn continuously?
- Deliverables: API/service, pipelines, monitoring dashboards, rollback plan, runbooks.
- Roles: ML Engineer, MLOps/Platform, SRE, PM.
- Watch-outs: no monitoring, no rollback, no owner after launch (the desert of adoption).
- Tip: Deployment is a beginning, not an epilogue. Welcome to maintenance mode.
CRISP-DM isn’t a straight line — it’s a spiral staircase. You climb by looping with intention.
Real-World Walkthrough: Churn Prediction + LLM Assist
Imagine a subscription app with churn issues. Also, your CEO wants an LLM to “talk users out of canceling.” Cute.
Business Understanding
- Goal: Reduce churn by 10% in Q3 in the US segment. Success: save $2M ARR.
- Constraints: PII rules, on-call limits, inference budget $0.02/user/month.
Data Understanding
- Data sources: transactions, in-app events, support tickets, cancellation reasons.
- Findings: ticket text is gold; events have timezone chaos; 8% of labels are ambiguous.
Data Preparation
- Create features: recency/frequency/monetary (RFM), session streaks, last support interaction sentiment (from text), plan type.
- Split by time to prevent leakage. Define feature store entries with owners.
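To make the leakage point concrete, here’s a minimal pandas sketch of a time-based split with RFM features. The file name and columns (`user_id`, `amount`, `ts`) are hypothetical stand-ins; the point is that features are computed strictly before the cutoff and labels strictly after it.

```python
import pandas as pd

# Hypothetical raw extract: one row per transaction (file and columns assumed).
tx = pd.read_parquet("transactions.parquet")  # user_id, amount, ts

cutoff = pd.Timestamp("2024-04-01")  # features before, labels after

# Features: RFM computed only from history before the cutoff.
hist = tx[tx["ts"] < cutoff]
rfm = hist.groupby("user_id").agg(
    recency_days=("ts", lambda s: (cutoff - s.max()).days),
    frequency=("ts", "count"),
    monetary=("amount", "sum"),
)

# Labels: churned = no activity at all in the window after the cutoff.
active_later = set(tx.loc[tx["ts"] >= cutoff, "user_id"])
rfm["churned"] = ~rfm.index.isin(active_later)
```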
Modeling
- Baselines: logistic regression and XGBoost. Calibrate predicted probabilities so decision thresholds mean something (sketched below).
- For the LLM: retrieval of support macros; prompt template with guardrails; latency budget 500ms.
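A minimal sketch of the baseline-plus-calibration step, using synthetic data in place of the real feature table. All hyperparameters here are illustrative, not prescriptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier

# Stand-in data; in practice this is the RFM table from the prep step.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, shuffle=False)

# Baseline first: if XGBoost can't beat this, stop and ask why.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Calibrate the stronger model so its probabilities support thresholding.
xgb = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
model = CalibratedClassifierCV(xgb, method="isotonic", cv=3).fit(X_train, y_train)
churn_prob = model.predict_proba(X_valid)[:, 1]
```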
Evaluation
- Churn model: optimize for recall at a fixed precision tied to outreach cost (threshold sketch after this list). Validate on recent months. Fairness: ensure no plan type is systematically under-serviced.
- LLM: A/B offline eval with rubric: helpfulness, policy compliance, tone; plus cost-per-session.
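Tying the threshold to outreach economics, a sketch. The $5 outreach cost, $120 save value, and 20% conversion rate are made-up numbers for illustration:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# churn_prob and y_valid come from the modeling sketch above.
precision, recall, thresholds = precision_recall_curve(y_valid, churn_prob)

# Assumed economics: $5 per outreach, $120 per save, ~20% of reached
# true churners are saved. Break-even precision = 5 / (120 * 0.20).
floor = 5 / (120 * 0.20)  # ≈ 0.21

# Maximize recall subject to the business precision floor.
ok = precision[:-1] >= floor
threshold = thresholds[ok][np.argmax(recall[:-1][ok])]
print(f"decision threshold: {threshold:.3f}")
```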
Deployment
- Batch scores daily → CRM triggers. LLM in controlled rollout for retention chat.
- Monitoring: churn lift, cost per save, LLM safety incidents, drift in event distributions.
- Runbook: auto-rollback if precision drops below 0.6 for 24h.
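The rollback rule is just code, too. A toy version of the check, assuming a monitoring feed of timestamped precision readings:

```python
from datetime import datetime, timedelta, timezone

PRECISION_FLOOR = 0.60
WINDOW = timedelta(hours=24)

def should_rollback(readings):
    """readings: list of (utc_timestamp, precision) from monitoring (format assumed).

    Roll back only when every reading in the last 24h sits below the
    floor, so a single noisy sample doesn't trigger a revert.
    """
    now = datetime.now(timezone.utc)
    recent = [p for ts, p in readings if now - ts <= WINDOW]
    return bool(recent) and all(p < PRECISION_FLOOR for p in recent)
```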
The Handy Table You’ll Screenshot
| Phase | Key Questions | Artifacts | Primary Roles | Common Traps |
|---|---|---|---|---|
| Business Understanding | What’s the economic win? Who’s affected? | Problem brief, KPIs, assumptions | PM, DS Lead, Stakeholders | Vague goals, success = “accuracy” |
| Data Understanding | What exists? Can we trust it? | Data inventory, EDA notebook, data quality report | DE, DS, Analyst | Leakage, survivorship bias |
| Data Preparation | How do we build features repeatably? | Pipelines, feature store, splits | DE, DS | One-off scripts, non-repro transforms |
| Modeling | What works within constraints? | Experiments, baselines, model cards | DS, MLE | Overtuning, ignoring latency/cost |
| Evaluation | Does it generalize and behave? | Eval report, risk review, cost curve | DS Lead, PM, Risk | Metric theater, cherry-picking |
| Deployment | How do we run and learn? | Services, CI/CD, monitoring, runbooks | MLE, MLOps, SRE | No monitoring, no owner |
CRISP-DM Meets MLOps (a.k.a. The Glow-Up)
- Versioning everywhere: data, features, models, prompts.
- CI/CD for data and models: tests for schema, drift, and performance regressions (example after this list).
- Observability: input distributions, latency SLOs, business KPIs wired to alerts.
- Governance: model cards, data lineage, audit logs, privacy controls.
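What “CI/CD for data” can look like in miniature: two tests that fail the pipeline before bad data reaches training. The schema follows the earlier RFM sketch, and the two-sample KS test is just one cheap drift detector among many:

```python
from scipy.stats import ks_2samp

EXPECTED_SCHEMA = {"recency_days": "int64", "frequency": "int64", "monetary": "float64"}

def test_schema(df):
    # Fail fast if a column vanishes or silently changes dtype upstream.
    for col, dtype in EXPECTED_SCHEMA.items():
        assert col in df.columns, f"missing column: {col}"
        assert str(df[col].dtype) == dtype, f"{col} drifted to {df[col].dtype}"

def test_no_drift(train_values, live_values, alpha=0.01):
    # Two-sample Kolmogorov-Smirnov: crude but cheap drift alarm per feature.
    stat, p = ks_2samp(train_values, live_values)
    assert p > alpha, f"drift detected (KS={stat:.3f}, p={p:.4f})"
```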
For GenAI/LLM projects, sprinkle in:
- Prompt/version registries and evaluation harnesses (hallucination, safety, grounding).
- Retrieval pipelines with freshness SLAs.
- Cost-aware routing (cheap model unless quality dips; sketch below).
- Red-teaming and abuse detection.
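Cost-aware routing can be embarrassingly simple. A sketch with hypothetical callables: `cheap_llm`, `strong_llm`, and `judge` are stand-ins for whatever clients and evaluator you actually use:

```python
def route(query, cheap_llm, strong_llm, judge, quality_floor=0.8):
    """Try the cheap model first; escalate only when quality falls short.

    cheap_llm / strong_llm: prompt -> answer (hypothetical clients)
    judge: (query, answer) -> score in [0, 1] (hypothetical evaluator)
    """
    answer = cheap_llm(query)
    if judge(query, answer) >= quality_floor:
        return answer, "cheap"
    return strong_llm(query), "strong"
```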
Mini Checklists (Put These On Your Wall)
```yaml
crisp:
  business_understanding:
    success_metric: "reduce churn 10% in Q3 (US)"
    constraints: ["PII rules", "cost <= $0.02/user/mo"]
    owners: [pm, ds_lead]
  data_understanding:
    sources: [events, billing, support_tickets]
    risks: [leakage, timezone, missing_labels]
  data_preparation:
    pipelines: [ingest, clean, feature_store]
    tests: [schema, nulls, temporal_leakage]
  modeling:
    baselines: [log_reg, xgboost]
    tracking: mlflow_experiments
  evaluation:
    metrics: [precision_at_target_recall, cost_curve]
    fairness_checks: [segment_parity]
  deployment:
    rollout: canary_10_percent
    monitoring: [drift, latency, kpi_lift]
    rollback: threshold_precision < 0.60
```
Common Misunderstandings (and Why They Keep Wrecking Projects)
- “CRISP is waterfall.”
- No. It’s iterative. You loop when you learn. The key is to loop on purpose.
- “Data prep is just cleaning.”
- It’s architecture. Feature definitions are product interfaces for models.
- “Evaluation ends at AUC.”
- Business value rides on thresholds, costs, fairness, and robustness.
- “Deployment is throwing a pickle file at DevOps.”
- It’s services, contracts, monitoring, and ownership.
If you can’t name the owner of a model after launch, the owner is you. Forever.
Stage Gates That Save Time (and Careers)
- Gate 1: Business Readiness
- Exit criteria: problem brief approved, KPI defined, data access cleared.
- Gate 2: Data Readiness
- Exit criteria: quality report, leakage check, lineage documented.
- Gate 3: Modeling Readiness
- Exit criteria: baseline beats heuristic, constraints documented, model card drafted.
- Gate 4: Launch Readiness
- Exit criteria: monitoring dashboards live, runbook tested, rollback verified, owners named.
Each gate is a chance to stop gracefully instead of digging a prettier hole.
Quick Contrast: Classical ML vs. LLM inside CRISP
- Data Understanding: structured tables vs. unstructured corpora + retrieval sources.
- Prep: feature engineering vs. prompt engineering + chunking + embeddings.
- Modeling: algorithm selection vs. orchestration (RAG, tools, routing).
- Evaluation: metrics like F1 vs. rubric-based evals, human-in-the-loop, cost-quality tradeoffs.
- Deployment: model endpoint vs. pipeline of retriever + LLM + policy layer.
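To see what “pipeline of retriever + LLM + policy layer” means in code, a skeletal one-turn sketch. Every callable here is a placeholder for your actual components:

```python
def answer_with_policy(question, retriever, llm, policy):
    """One turn of a minimal RAG pipeline with a policy layer.

    retriever: question -> list[str] of context chunks (placeholder)
    llm: prompt -> draft answer (placeholder)
    policy: draft -> (allowed: bool, final_text: str) (placeholder)
    """
    context = "\n---\n".join(retriever(question))
    prompt = (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    allowed, final = policy(llm(prompt))  # e.g. PII scrub, refusal rules
    return final if allowed else "Sorry, I can't help with that request."
```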
Same staircase, different shoes.
Your Project, As a One-Pager
- Problem: Who is helped? How do we know it worked?
- Data: Where from, how fresh, how biased?
- Features/Prompts: What’s computed, by whom, and how reproducible?
- Model(s): What wins against baseline under constraints?
- Evaluation: What’s the cost curve, fairness, and risk profile?
- Launch: What’s monitored, who’s on call, when do we roll back?
Stick this in your repo’s README. Future you will send you a fruit basket.
TL;DR (but keep the DR anyway)
- CRISP-DM is your process backbone: business → data → prep → model → eval → deploy → loop.
- Map roles explicitly to phases to kill handoff chaos.
- Measure what matters; accuracy is cute, impact pays rent.
- For GenAI, add prompt/retrieval evals, safety, and cost controls — same lifecycle, new artifacts.
- Treat deployment as continuous learning, not graduation.
Powerful insight: The fastest way to ship is to design your loops. CRISP-DM gives you the loops. Now, go spiral upward.