Foundations of AI and Data Science
Core concepts, roles, workflows, and ethics that frame end‑to‑end AI projects.
Problem Framing: Turning Vibes Into Value (Before You Touch a Line of Code)
If CRISP-DM is the map and roles are the party members, problem framing is the quest description. Without it, you are just speed-running chaos.
Remember how we walked through CRISP-DM and everyone got assigned a job like it was a heist movie? Great. Now we are doing the thing that prevents you from building a world-class model that solves the wrong problem. Problem framing is where business reality meets data reality and they hopefully agree to hang out.
What is problem framing?
Problem framing is the disciplined process of translating a messy objective (reduce churn, detect fraud, build a chatbot that is not feral) into a precise, testable, and actionable data or AI problem statement. It connects:
- the decision to be made
- the action that follows
- the prediction or generation the model provides
- the metrics that prove we did not waste three sprints and a GPU budget
Fancy models do not rescue fuzzy questions.
The decision-first mindset
Before talking models, ask: what decision is getting better because of this?
- Decision: what will a human or system do differently with model output?
- Unit of decision: who or what gets acted on? a customer, a transaction, a page view, a product?
- Timing: when is the decision made? real time, hourly, weekly?
- Levers: what actions are possible? offer discount, block transaction, reorder inventory, escalate ticket?
- Constraints: legal, ethical, latency, cost, fairness, privacy.
Now tie that to the model:
- Unit of prediction must match unit of decision.
- Features must be available at decision time (no time travel; leakage is illegal in several dimensions).
- Output must be actionable: class label, ranking, score with threshold, text generation with guardrails.
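One way to make "available at decision time" concrete is to record, for every feature, when its value becomes known relative to the event. A minimal sketch, where the catalog entries and feature names are purely illustrative:

```python
# Hypothetical feature catalog for a ticket-triage model: each feature is
# tagged with when its value is known relative to the routing decision.
feature_catalog = {
    "ticket_text_at_creation": "at_decision",
    "customer_tier":           "at_decision",
    "first_agent_reply":       "after_decision",  # leakage if used
    "resolution_time":         "after_decision",  # leakage if used
}

def decision_time_features(catalog):
    """Keep only features whose values exist when the decision is made."""
    return [name for name, avail in catalog.items() if avail == "at_decision"]

safe = decision_time_features(feature_catalog)
# safe == ["ticket_text_at_creation", "customer_tier"]
```

Filtering on a catalog like this, rather than on memory, is what keeps post-decision columns out of your training set.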
From business goal to ML or LLM task
Here is a cheat sheet your future self will thank you for.
| Business goal | Unit of decision | Likely task | Typical output | Primary metric | Common traps |
|---|---|---|---|---|---|
| Reduce churn | customer-month | classification or time-to-event | churn probability | uplift in retention rate; cost-adjusted profit | using post-churn features; ignoring incentives and fatigue |
| Cut fraud losses | transaction | anomaly detection or classification | risk score, block/allow | expected cost saved = TP benefit - FP cost | overblocking good users; adversarial drift |
| Improve search | query-session | ranking | ordered list | NDCG, CTR uplift, revenue per session | optimizing offline metric that does not move revenue |
| Forecast demand | product-store-day | time series | numeric forecast with intervals | MAPE, service level | ignoring promos and stockouts; feedback loops |
| Triage tickets | ticket | classification + routing | priority, team | SLA compliance, latency | fairness across issue types |
| Helpful support bot | conversation turn | LLM with retrieval | grounded answer | resolution rate, deflection, quality rubric | hallucinations, privacy leakage |
Choose the task that naturally produces the signal your decision needs. If you want to rank, do not pretend a classifier will magically learn ranking.
Make the objective legible: metrics that actually map to value
Business metrics and model metrics are different species. You need both.
- Business metrics: revenue, cost saved, conversion rate, time-to-resolution, customer satisfaction. These are your North Star and guardrails.
- Model metrics: AUC, F1, MAPE, BLEU, ROUGE, NDCG. These are your odometer, not the destination.
Connect them with expected value:
- For a binary decision with score s and threshold t:
- profit = TP * benefit - FP * cost - FN * missed benefit - operational cost
- Choose t that maximizes expected profit under constraints like fairness and latency.
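The profit expression above turns directly into a threshold sweep. A minimal numpy sketch, with entirely illustrative cost assumptions and synthetic labels and scores:

```python
import numpy as np

def expected_profit(y_true, scores, threshold,
                    benefit=40.0, fp_cost=5.0, fn_cost=40.0, op_cost=0.5):
    """Profit of acting on every score >= threshold.

    benefit / fp_cost / fn_cost / op_cost are illustrative assumptions --
    in practice these numbers come from the business, not the model team.
    """
    act = scores >= threshold
    tp = np.sum(act & (y_true == 1))
    fp = np.sum(act & (y_true == 0))
    fn = np.sum(~act & (y_true == 1))
    return tp * benefit - fp * fp_cost - fn * fn_cost - np.sum(act) * op_cost

# Synthetic data: 10% positive base rate, noisy but informative scores.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.1, size=5000)
scores = np.clip(0.6 * y + rng.normal(0.2, 0.2, size=5000), 0, 1)

# Sweep a threshold grid and keep the most profitable one.
thresholds = np.linspace(0.05, 0.95, 19)
best_t = max(thresholds, key=lambda t: expected_profit(y, scores, t))
```

Adding fairness or latency constraints just means filtering the grid before taking the max.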
Sneaky truth: the right metric often demands the right label. If your goal is uplift in retention, your label is response to intervention, not whether someone churned. Enter uplift modeling.
Scoping: assumptions, constraints, and the no-drama checklist
- Data feasibility: Do we have labels? If not, can we pseudo-label, run a pilot, or proxy? How expensive is labeling?
- Prediction horizon: how far ahead must we predict to enable action?
- Latency budget: batch vs near real time vs on device.
- Interpretability: does compliance demand explainability? choose models accordingly.
- Risk appetite: probability of catastrophic failure, harm modeling.
- Time and team: 6 weeks with 2 people is not the same as 6 months with a squad.
- Privacy and security: PII handling, data minimization, governance.
Shipping an 85 percent solution now that drives value beats a 99 percent solution that arrives after the fiscal year ends and the CFO has moved on.
The anatomy of a sharp problem statement
Use this template and bully your future self into filling it out.
Title: Reduce support resolution time via triage assist
Decision: Assign each incoming ticket to the best team and priority level at intake.
Unit of decision/prediction: ticket at creation time.
Action space: assign to team A/B/C and priority P1-P3; escalate if high-risk keywords.
Users and workflow: support agents in Zendesk at intake; prediction appears as suggestion with confidence and rationale.
Objective: decrease median time-to-resolution by 15 percent without increasing misroutes more than 2 percent.
Offline metrics: top-1 team accuracy, top-3 recall, calibration; fairness by issue type.
Online metrics: time-to-first-touch, median resolution time, misroute rate; guardrails: SLA breach rate, CSAT.
Data: past 12 months of tickets with team labels, resolution time, text, metadata; features include text embeddings, product, customer tier; text only from ticket body at creation (no replies to avoid leakage).
Constraints: PII redaction; under 200 ms latency; explanations required for audit.
Risks: bias by customer tier; privacy of free text; hallucinations if LLM generates rationale.
Baselines: rules based routing; simple keyword TF-IDF classifier.
Success criteria: 15 percent faster resolution in A/B test for 4 weeks with no SLA regression.
If you cannot complete this, you are not ready to model. You are in Business Understanding and Data Understanding land, and that is fine.
Labeling and leakage: the time machine problem
Two questions keep projects alive:
- Will the features be known at prediction time? If a feature arrives after the decision, it is leakage.
- Is the label aligned with the decision? If you label churn at month end but act at day 1, your signal will argue with your workflow.
Also sanity-check base rates. If only 0.2 percent of events are fraud, your 99.9 percent accuracy baseline is a clown metric.
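The clown metric is easy to demonstrate. With synthetic counts at a 0.2 percent base rate, the do-nothing baseline looks excellent on accuracy while catching zero fraud:

```python
# With a 0.2% fraud base rate, "predict nothing is fraud" scores 99.8%
# accuracy while catching nothing. Counts below are synthetic.
n = 100_000
n_fraud = 200                       # 0.2% base rate

# Always-negative baseline:
tp, fp, fn, tn = 0, 0, n_fraud, n - n_fraud
accuracy = (tp + tn) / n            # 0.998 -- looks great
recall = tp / (tp + fn)             # 0.0   -- catches zero fraud
```

This is why rare-event problems are framed around recall, precision, or expected cost, never raw accuracy.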
Multi-objective reality: trade-offs are not bugs
- Accuracy vs latency vs cost
- Performance vs fairness vs privacy
- Immediate gains vs long-term behavior (customer fatigue, policy feedback)
You are optimizing on a Pareto front. Make the trade-offs explicit, and add guardrail metrics (for example: do not increase the decline rate for good users by more than 0.5 percent).
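One simple way to operationalize guardrails is constrained selection: maximize the primary metric only among candidates that satisfy every guardrail. A sketch, with made-up candidate models and numbers:

```python
# Candidate models with a primary metric and a guardrail metric.
# All names and numbers are illustrative.
candidates = [
    {"name": "A", "profit_uplift": 0.12, "good_user_decline_delta": 0.004},
    {"name": "B", "profit_uplift": 0.15, "good_user_decline_delta": 0.009},
    {"name": "C", "profit_uplift": 0.10, "good_user_decline_delta": 0.001},
]

def pick_model(candidates, max_decline_delta=0.005):
    """Maximize the primary metric among models passing the guardrail."""
    eligible = [c for c in candidates
                if c["good_user_decline_delta"] <= max_decline_delta]
    return max(eligible, key=lambda c: c["profit_uplift"])

winner = pick_model(candidates)
# "B" has the best uplift but breaches the guardrail, so "A" wins.
```

Note that the guardrail is a hard constraint here, not another term in a weighted sum; that is usually easier to explain and to defend.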
LLM-specific framing notes (because 2025 happened)
- Source of truth: retrieval over generation when facts matter.
- Evaluation: mix task-specific rubrics, human ratings, and behavioral metrics (deflection, resolution). Hallucination rate is a metric, not a vibe.
- Grounding and safety: cite sources, enforce policies, redact PII, use constrained generation when stakes are high.
- Latency and cost: context length, retrieval fan-out, caching.
- Failure plan: graceful fallbacks to search, forms, or humans.
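The retrieval-first, graceful-fallback pattern can be sketched in a few lines. `retrieve` here is a toy stand-in for a real index, and the relevance threshold is an assumption you would tune:

```python
def retrieve(query, index):
    """Toy retriever: return (doc_id, score) pairs, best first.

    A real system would use embeddings or a search engine; this
    dict-backed version just illustrates the control flow.
    """
    return sorted(index.get(query, []), key=lambda p: p[1], reverse=True)

def answer(query, index, min_relevance=0.7):
    hits = [doc for doc, score in retrieve(query, index) if score >= min_relevance]
    if not hits:
        # Failure plan: fall back instead of generating ungrounded text.
        return {"type": "fallback", "action": "route_to_human"}
    # A real system would now call the LLM with the retrieved context
    # and require citations; here we just return the grounding.
    return {"type": "grounded", "sources": hits}

index = {"reset password": [("kb-101", 0.92), ("kb-207", 0.40)]}
res = answer("reset password", index)   # grounded, sources == ["kb-101"]
```

The key framing point is that the fallback branch is designed up front, not bolted on after the first hallucination incident.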
Anti-patterns and smell tests
- Metric salad: five offline metrics with no connection to revenue or risk.
- Actionless prediction: great model, no decision changes. Value equals zero.
- Stakeholder invisibility: no operator buy-in, no product owner, no legal sign-off.
- Scope creep masquerading as innovation: started with churn, now building a knowledge graph of the universe.
- Gold-plating baseline: spending three sprints on perfecting rules and then declaring ML unnecessary because the baseline is the only thing you built.
Smell tests:
- Can you simulate expected profit with plausible costs and base rates?
- Can you explain in one sentence what changes in the workflow?
- If the model got meaningfully better tomorrow, what would change in the business? If the answer is nothing, stop.
Mini walk-through: reduce return rate in e-commerce
- Decision: at checkout, decide whether to offer size guidance or a fit quiz.
- Unit: customer-item-session at checkout.
- Action: show fit tool A, fit tool B, or do nothing.
- Objective: reduce size-related returns by 10 percent without hurting conversion by more than 1 percent.
- Task: uplift modeling or policy learning; prediction is probability of return reduction under intervention.
- Metrics: online A/B test on return rate and conversion; offline proxy via historical try-on interactions.
- Data: product attributes, customer size history, prior returns; must exclude post-purchase data.
- Constraints: under 150 ms; privacy of size data; fairness across genders and sizes.
- Baselines: simple rules by product category and prior size; randomize between A and B to collect causal data.
- Risks: cannibalizing conversion; cold-start items; seasonality.
Notice how the task, label, and evaluation all changed once we admitted this is an intervention decision, not pure prediction. That pivot is problem framing doing its job.
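Because treatment was randomized, even a crude difference-in-means uplift estimate per segment is unbiased. A minimal numpy sketch on synthetic data (segment labels, effect sizes, and the action threshold are all illustrative):

```python
import numpy as np

# Synthetic randomized experiment: the fit tool is shown at random,
# outcome is "item returned" (1) or not (0).
rng = np.random.default_rng(7)
n = 20_000
segment = rng.integers(0, 3, n)                    # e.g. product categories
treated = rng.binomial(1, 0.5, n)                  # randomized exposure
base_return = np.array([0.30, 0.20, 0.10])[segment]
effect = np.array([-0.08, -0.02, 0.0])[segment]    # tool helps segment 0 most
returned = rng.binomial(1, base_return + treated * effect)

def uplift_by_segment(segment, treated, outcome):
    """Difference in mean outcome, treated minus control, per segment."""
    out = {}
    for s in np.unique(segment):
        mask = segment == s
        t = outcome[mask & (treated == 1)].mean()
        c = outcome[mask & (treated == 0)].mean()
        out[int(s)] = t - c        # negative = tool reduces returns
    return out

uplift = uplift_by_segment(segment, treated, returned)
# Act (show the fit tool) only where estimated uplift is clearly negative.
act_segments = [s for s, u in uplift.items() if u < -0.03]
```

Real uplift models replace the per-segment means with learned models, but the decision logic is the same: target the intervention where it changes the outcome.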
Roles and the handoffs (aka who does what, when)
- Product or domain lead: defines decision, constraints, value hypothesis.
- Data scientist: formalizes task, labels, metrics, experiment design.
- Data engineer: secures data sources, builds reliable pipelines, ensures features exist at prediction time.
- ML engineer: builds serving infra, latency budgets, monitoring.
- Legal, risk, and ethics: reviews harms, privacy, fairness.
- Operators and UX: integrates into workflow, feedback loops, and adoption.
Hand off artifacts: the problem statement doc, a cost matrix, data schema with time stamps, baseline results, and an experiment plan. CRISP-DM fans, this is the bridge from Business Understanding to Modeling and Evaluation.
Quick-hit checklist
- Decision, unit, timing nailed down
- Actions available and worth taking
- Labels aligned to objective and timing
- Features available at decision time (no leakage)
- Metrics connected to value with cost assumptions
- Constraints and risks explicit; guardrails defined
- Baseline and experiment plan ready
- Stakeholders and workflow mapped
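One way to make the checklist enforceable rather than aspirational is a small structured artifact that refuses to be half-filled. A sketch with illustrative field names, loosely mirroring the triage example above:

```python
from dataclasses import dataclass, fields

@dataclass
class ProblemStatement:
    """Minimal framing doc; every field must be filled before modeling."""
    decision: str
    unit: str
    timing: str
    actions: str
    label_definition: str
    decision_time_features: str
    value_metric: str
    guardrails: str
    baseline: str
    experiment_plan: str

    def ready_to_model(self):
        """True only if no field is left blank."""
        return all(getattr(self, f.name).strip() for f in fields(self))

draft = ProblemStatement(
    decision="route ticket to team at intake",
    unit="ticket at creation time",
    timing="real time, under 200 ms",
    actions="assign team A/B/C, priority P1-P3",
    label_definition="historical resolving team",
    decision_time_features="ticket body, product, customer tier",
    value_metric="median time-to-resolution -15%",
    guardrails="misroute rate +2% max; SLA breach rate",
    baseline="keyword TF-IDF router",
    experiment_plan="",            # still missing -> not ready
)
```

An empty `experiment_plan` keeps `ready_to_model()` returning `False`, which is exactly the "you are still in Business Understanding land" signal from earlier.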
You do not have to be perfect, you have to be legible. Legibility lets you iterate fast and not reinvent your past mistakes every quarter.
Wrap-up
Problem framing is the boring-sounding superpower that turns expensive tinkering into repeatable impact. It ties business outcomes to model choices, protects you from data time travel, and makes success unambiguous. Start with the decision, pick a task that produces the needed signal, define metrics that map to value, and expose your assumptions so they can be tested. Do this, and the rest of the lifecycle becomes a victory lap instead of a mystery novel.