Foundations of AI and Data Science
Core concepts, roles, workflows, and ethics that frame end‑to‑end AI projects.
Problem Framing: Turning Vibes Into Value (Before You Touch a Line of Code)
If CRISP-DM is the map and roles are the party members, problem framing is the quest description. Without it, you are just speed-running chaos.
Remember how we walked through CRISP-DM and everyone got assigned a job like it was a heist movie? Great. Now we are doing the thing that prevents you from building a world-class model that solves the wrong problem. Problem framing is where business reality meets data reality and they hopefully agree to hang out.
What is problem framing?
Problem framing is the disciplined process of translating a messy objective (reduce churn, detect fraud, build a chatbot that is not feral) into a precise, testable, and actionable data or AI problem statement. It connects:
- the decision to be made
- the action that follows
- the prediction or generation the model provides
- the metrics that prove we did not waste three sprints and a GPU budget
Fancy models do not rescue fuzzy questions.
The decision-first mindset
Before talking models, ask: what decision is getting better because of this?
- Decision: what will a human or system do differently with model output?
- Unit of decision: who or what gets acted on? a customer, a transaction, a page view, a product?
- Timing: when is the decision made? real time, hourly, weekly?
- Levers: what actions are possible? offer discount, block transaction, reorder inventory, escalate ticket?
- Constraints: legal, ethical, latency, cost, fairness, privacy.
Now tie that to the model:
- Unit of prediction must match unit of decision.
- Features must be available at decision time (no time travel; leakage is illegal in several dimensions).
- Output must be actionable: class label, ranking, score with threshold, text generation with guardrails.
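One way to make "available at decision time" concrete is to record, for every feature, when its value becomes known relative to the event. A minimal sketch, where the catalog entries and feature names are purely illustrative:

```python
# Hypothetical feature catalog for a ticket-triage model: each feature is
# tagged with when its value is known relative to the routing decision.
feature_catalog = {
    "ticket_text_at_creation": "at_decision",
    "customer_tier":           "at_decision",
    "first_agent_reply":       "after_decision",  # leakage if used
    "resolution_time":         "after_decision",  # leakage if used
}

def decision_time_features(catalog):
    """Keep only features whose values exist when the decision is made."""
    return [name for name, avail in catalog.items() if avail == "at_decision"]

safe = decision_time_features(feature_catalog)
# safe == ["ticket_text_at_creation", "customer_tier"]
```

Filtering on a catalog like this, rather than on memory, is what keeps post-decision columns out of your training set.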
From business goal to ML or LLM task
Here is a cheat sheet your future self will thank you for.
| Business goal | Unit of decision | Likely task | Typical output | Primary metric | Common traps |
|---|---|---|---|---|---|
| Reduce churn | customer-month | classification or time-to-event | churn probability | uplift in retention rate; cost-adjusted profit | using post-churn features; ignoring incentives and fatigue |
| Cut fraud losses | transaction | anomaly detection or classification | risk score, block/allow | expected cost saved = TP benefit - FP cost | overblocking good users; adversarial drift |
| Improve search | query-session | ranking | ordered list | NDCG, CTR uplift, revenue per session | optimizing offline metric that does not move revenue |
| Forecast demand | product-store-day | time series | numeric forecast with intervals | MAPE, service level | ignoring promos and stockouts; feedback loops |
| Triage tickets | ticket | classification + routing | priority, team | SLA compliance, latency | fairness across issue types |
| Helpful support bot | conversation turn | LLM with retrieval | grounded answer | resolution rate, deflection, quality rubric | hallucinations, privacy leakage |
Choose the task that naturally produces the signal your decision needs. If you want to rank, do not pretend a classifier will magically learn ranking.
Make the objective legible: metrics that actually map to value
Business metrics and model metrics are different species. You need both.
- Business metrics: revenue, cost saved, conversion rate, time-to-resolution, customer satisfaction. These are your North Star and guardrails.
- Model metrics: AUC, F1, MAPE, BLEU, ROUGE, NDCG. These are your odometer, not the destination.
Connect them with expected value:
- For a binary decision with score s and threshold t:
- profit = TP * benefit - FP * cost - FN * missed benefit - operational cost
- Choose t that maximizes expected profit under constraints like fairness and latency.
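The profit expression above turns directly into a threshold sweep. A minimal numpy sketch, with entirely illustrative cost assumptions and synthetic labels and scores:

```python
import numpy as np

def expected_profit(y_true, scores, threshold,
                    benefit=40.0, fp_cost=5.0, fn_cost=40.0, op_cost=0.5):
    """Profit of acting on every score >= threshold.

    benefit / fp_cost / fn_cost / op_cost are illustrative assumptions --
    in practice these numbers come from the business, not the model team.
    """
    act = scores >= threshold
    tp = np.sum(act & (y_true == 1))
    fp = np.sum(act & (y_true == 0))
    fn = np.sum(~act & (y_true == 1))
    return tp * benefit - fp * fp_cost - fn * fn_cost - np.sum(act) * op_cost

# Synthetic data: 10% positive base rate, noisy but informative scores.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.1, size=5000)
scores = np.clip(0.6 * y + rng.normal(0.2, 0.2, size=5000), 0, 1)

# Sweep a threshold grid and keep the most profitable one.
thresholds = np.linspace(0.05, 0.95, 19)
best_t = max(thresholds, key=lambda t: expected_profit(y, scores, t))
```

Adding fairness or latency constraints just means filtering the grid before taking the max.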
Sneaky truth: the right metric often demands the right label. If your goal is uplift in retention, your label is response to intervention, not whether someone churned. Enter uplift modeling.
Scoping: assumptions, constraints, and the no-drama checklist
- Data feasibility: Do we have labels? If not, can we pseudo-label, run a pilot, or proxy? How expensive is labeling?
- Prediction horizon: how far ahead must we predict to enable action?
- Latency budget: batch vs near real time vs on device.
- Interpretability: does compliance demand explainability? choose models accordingly.
- Risk appetite: probability of catastrophic failure, harm modeling.
- Time and team: 6 weeks with 2 people is not the same as 6 months with a squad.
- Privacy and security: PII handling, data minimization, governance.
Shipping an 85 percent solution now that drives value beats a 99 percent solution that arrives after the fiscal year ends and the CFO has moved on.
The anatomy of a sharp problem statement
Use this template and bully your future self into filling it out.
Title: Reduce support resolution time via triage assist
Decision: Assign each incoming ticket to the best team and priority level at intake.
Unit of decision/prediction: ticket at creation time.
Action space: assign to team A/B/C and priority P1-P3; escalate if high-risk keywords.
Users and workflow: support agents in Zendesk at intake; prediction appears as suggestion with confidence and rationale.
Objective: decrease median time-to-resolution by 15 percent without increasing misroutes more than 2 percent.
Offline metrics: top-1 team accuracy, top-3 recall, calibration; fairness by issue type.
Online metrics: time-to-first-touch, median resolution time, misroute rate; guardrails: SLA breach rate, CSAT.
Data: past 12 months of tickets with team labels, resolution time, text, metadata; features include text embeddings, product, customer tier; text only from ticket body at creation (no replies to avoid leakage).
Constraints: PII redaction; under 200 ms latency; explanations required for audit.
Risks: bias by customer tier; privacy of free text; hallucinations if LLM generates rationale.
Baselines: rules based routing; simple keyword TF-IDF classifier.
Success criteria: 15 percent faster resolution in A/B test for 4 weeks with no SLA regression.
If you cannot complete this, you are not ready to model. You are in Business Understanding and Data Understanding land, and that is fine.
Labeling and leakage: the time machine problem
Two questions keep projects alive:
- Will the features be known at prediction time? If a feature arrives after the decision, it is leakage.
- Is the label aligned with the decision? If you label churn at month end but act at day 1, your signal will argue with your workflow.
Also sanity-check base rates. If only 0.2 percent of events are fraud, your 99.9 percent accuracy baseline is a clown metric.
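The clown metric is easy to demonstrate. With synthetic counts at a 0.2 percent base rate, the do-nothing baseline looks excellent on accuracy while catching zero fraud:

```python
# With a 0.2% fraud base rate, "predict nothing is fraud" scores 99.8%
# accuracy while catching nothing. Counts below are synthetic.
n = 100_000
n_fraud = 200                       # 0.2% base rate

# Always-negative baseline:
tp, fp, fn, tn = 0, 0, n_fraud, n - n_fraud
accuracy = (tp + tn) / n            # 0.998 -- looks great
recall = tp / (tp + fn)             # 0.0   -- catches zero fraud
```

This is why rare-event problems are framed around recall, precision, or expected cost, never raw accuracy.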
Multi-objective reality: trade-offs are not bugs
- Accuracy vs latency vs cost
- Performance vs fairness vs privacy
- Immediate gains vs long-term behavior (customer fatigue, policy feedback)
You are optimizing on a Pareto front. Make the trade-offs explicit, and add guardrail metrics (for example: do not increase the decline rate for good users by more than 0.5 percent).
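One simple way to operationalize guardrails is constrained selection: maximize the primary metric only among candidates that satisfy every guardrail. A sketch, with made-up candidate models and numbers:

```python
# Candidate models with a primary metric and a guardrail metric.
# All names and numbers are illustrative.
candidates = [
    {"name": "A", "profit_uplift": 0.12, "good_user_decline_delta": 0.004},
    {"name": "B", "profit_uplift": 0.15, "good_user_decline_delta": 0.009},
    {"name": "C", "profit_uplift": 0.10, "good_user_decline_delta": 0.001},
]

def pick_model(candidates, max_decline_delta=0.005):
    """Maximize the primary metric among models passing the guardrail."""
    eligible = [c for c in candidates
                if c["good_user_decline_delta"] <= max_decline_delta]
    return max(eligible, key=lambda c: c["profit_uplift"])

winner = pick_model(candidates)
# "B" has the best uplift but breaches the guardrail, so "A" wins.
```

Note that the guardrail is a hard constraint here, not another term in a weighted sum; that is usually easier to explain and to defend.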
LLM-specific framing notes (because 2025 happened)
- Source of truth: retrieval over generation when facts matter.
- Evaluation: mix task-specific rubrics, human ratings, and behavioral metrics (deflection, resolution). Hallucination rate is a metric, not a vibe.
- Grounding and safety: cite sources, enforce policies, redact PII, use constrained generation when stakes are high.
- Latency and cost: context length, retrieval fan-out, caching.
- Failure plan: graceful fallbacks to search, forms, or humans.
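The retrieval-first, graceful-fallback pattern can be sketched in a few lines. `retrieve` here is a toy stand-in for a real index, and the relevance threshold is an assumption you would tune:

```python
def retrieve(query, index):
    """Toy retriever: return (doc_id, score) pairs, best first.

    A real system would use embeddings or a search engine; this
    dict-backed version just illustrates the control flow.
    """
    return sorted(index.get(query, []), key=lambda p: p[1], reverse=True)

def answer(query, index, min_relevance=0.7):
    hits = [doc for doc, score in retrieve(query, index) if score >= min_relevance]
    if not hits:
        # Failure plan: fall back instead of generating ungrounded text.
        return {"type": "fallback", "action": "route_to_human"}
    # A real system would now call the LLM with the retrieved context
    # and require citations; here we just return the grounding.
    return {"type": "grounded", "sources": hits}

index = {"reset password": [("kb-101", 0.92), ("kb-207", 0.40)]}
res = answer("reset password", index)   # grounded, sources == ["kb-101"]
```

The key framing point is that the fallback branch is designed up front, not bolted on after the first hallucination incident.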
Anti-patterns and smell tests
- Metric salad: five offline metrics with no connection to revenue or risk.
- Actionless prediction: great model, no decision changes. Value equals zero.
- Stakeholder invisibility: no operator buy-in, no product owner, no legal sign-off.
- Scope creep masquerading as innovation: started with churn, now building a knowledge graph of the universe.
- Gold-plating baseline: spending three sprints on perfecting rules and then declaring ML unnecessary because the baseline is the only thing you built.
Smell tests:
- Can you simulate expected profit with plausible costs and base rates?
- Can you explain in one sentence what changes in the workflow?
- If the model got meaningfully better tomorrow, what would change in the business? If the answer is nothing, stop.
Mini walk-through: reduce return rate in e-commerce
- Decision: at checkout, decide whether to offer size guidance or a fit quiz.
- Unit: customer-item-session at checkout.
- Action: show fit tool A, fit tool B, or do nothing.
- Objective: reduce size-related returns by 10 percent without hurting conversion by more than 1 percent.
- Task: uplift modeling or policy learning; prediction is probability of return reduction under intervention.
- Metrics: online A/B test on return rate and conversion; offline proxy via historical try-on interactions.
- Data: product attributes, customer size history, prior returns; must exclude post-purchase data.
- Constraints: under 150 ms; privacy of size data; fairness across genders and sizes.
- Baselines: simple rules by product category and prior size; randomize between A and B to collect causal data.
- Risks: cannibalizing conversion; cold-start items; seasonality.
Notice how the task, label, and evaluation all changed once we admitted this is an intervention decision, not pure prediction. That pivot is problem framing doing its job.
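Because treatment was randomized, even a crude difference-in-means uplift estimate per segment is unbiased. A minimal numpy sketch on synthetic data (segment labels, effect sizes, and the action threshold are all illustrative):

```python
import numpy as np

# Synthetic randomized experiment: the fit tool is shown at random,
# outcome is "item returned" (1) or not (0).
rng = np.random.default_rng(7)
n = 20_000
segment = rng.integers(0, 3, n)                    # e.g. product categories
treated = rng.binomial(1, 0.5, n)                  # randomized exposure
base_return = np.array([0.30, 0.20, 0.10])[segment]
effect = np.array([-0.08, -0.02, 0.0])[segment]    # tool helps segment 0 most
returned = rng.binomial(1, base_return + treated * effect)

def uplift_by_segment(segment, treated, outcome):
    """Difference in mean outcome, treated minus control, per segment."""
    out = {}
    for s in np.unique(segment):
        mask = segment == s
        t = outcome[mask & (treated == 1)].mean()
        c = outcome[mask & (treated == 0)].mean()
        out[int(s)] = t - c        # negative = tool reduces returns
    return out

uplift = uplift_by_segment(segment, treated, returned)
# Act (show the fit tool) only where estimated uplift is clearly negative.
act_segments = [s for s, u in uplift.items() if u < -0.03]
```

Real uplift models replace the per-segment means with learned models, but the decision logic is the same: target the intervention where it changes the outcome.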
Roles and the handoffs (aka who does what, when)
- Product or domain lead: defines decision, constraints, value hypothesis.
- Data scientist: formalizes task, labels, metrics, experiment design.
- Data engineer: secures data sources, builds reliable pipelines, ensures features exist at prediction time.
- ML engineer: builds serving infra, latency budgets, monitoring.
- Legal, risk, and ethics: reviews harms, privacy, fairness.
- Operators and UX: integrates into workflow, feedback loops, and adoption.
Hand off artifacts: the problem statement doc, a cost matrix, data schema with time stamps, baseline results, and an experiment plan. CRISP-DM fans, this is the bridge from Business Understanding to Modeling and Evaluation.
Quick-hit checklist
- Decision, unit, timing nailed down
- Actions available and worth taking
- Labels aligned to objective and timing
- Features available at decision time (no leakage)
- Metrics connected to value with cost assumptions
- Constraints and risks explicit; guardrails defined
- Baseline and experiment plan ready
- Stakeholders and workflow mapped
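One way to make the checklist enforceable rather than aspirational is a small structured artifact that refuses to be half-filled. A sketch with illustrative field names, loosely mirroring the triage example above:

```python
from dataclasses import dataclass, fields

@dataclass
class ProblemStatement:
    """Minimal framing doc; every field must be filled before modeling."""
    decision: str
    unit: str
    timing: str
    actions: str
    label_definition: str
    decision_time_features: str
    value_metric: str
    guardrails: str
    baseline: str
    experiment_plan: str

    def ready_to_model(self):
        """True only if no field is left blank."""
        return all(getattr(self, f.name).strip() for f in fields(self))

draft = ProblemStatement(
    decision="route ticket to team at intake",
    unit="ticket at creation time",
    timing="real time, under 200 ms",
    actions="assign team A/B/C, priority P1-P3",
    label_definition="historical resolving team",
    decision_time_features="ticket body, product, customer tier",
    value_metric="median time-to-resolution -15%",
    guardrails="misroute rate +2% max; SLA breach rate",
    baseline="keyword TF-IDF router",
    experiment_plan="",            # still missing -> not ready
)
```

An empty `experiment_plan` keeps `ready_to_model()` returning `False`, which is exactly the "you are still in Business Understanding land" signal from earlier.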
You do not have to be perfect, you have to be legible. Legibility lets you iterate fast and not reinvent your past mistakes every quarter.
Wrap-up
Problem framing is the boring-sounding superpower that turns expensive tinkering into repeatable impact. It ties business outcomes to model choices, protects you from data time travel, and makes success unambiguous. Start with the decision, pick a task that produces the needed signal, define metrics that map to value, and expose your assumptions so they can be tested. Do this, and the rest of the lifecycle becomes a victory lap instead of a mystery novel.