Data Sources, Engineering, and Deployment
Acquire data from files, web, and databases; then test, package, version, and deploy reliable services.
REST APIs and requests — fetch data like a pro (Python)
You already learned web scraping and parsing JSON/XML — now imagine skipping the scraping drama and getting structured data straight from the source.
Why REST APIs matter for data science and engineering
APIs are the clean, official way to get data. While web scraping is a toolbox for when the gatekeeper refuses to cooperate, REST APIs are the golden door: documented, versioned, and usually predictable. In modern data pipelines and ML deployments, you'll use REST APIs to:
- Ingest datasets (e.g., financial ticks, weather, social media)
- Talk to microservices (feature stores, preprocessing, inference endpoints)
- Push results (logging, dashboards, third-party services)
If you enjoyed parsing JSON/XML before, think of REST as “JSON-on-demand” most of the time. Your parsing skills from earlier lessons are going to be used constantly.
Quick REST & HTTP refresher (in plain English)
- Endpoint: a URL that represents a resource or action, e.g. https://api.example.com/v1/users
- HTTP methods: GET (read, idempotent), POST (create/submit), PUT/PATCH (update), DELETE (remove)
- Status codes: 200 OK, 201 Created, 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 429 Too Many Requests, 500 Server Error
- Content-Type: tells how data is encoded (application/json, multipart/form-data, image/png)
- Headers, params, body: headers carry metadata (auth, content-type), params are query-string filters, the body holds JSON/form data for POST/PUT
Remember: GET = ask nicely for data. POST = hand over data or ask the server to run something.
The requests library — your everyday fetch tool
Install: pip install requests
Basic GET
import requests
resp = requests.get('https://api.example.com/data', params={'q': 'nyc', 'limit': 50})
resp.raise_for_status() # raises HTTPError on bad status
data = resp.json() # parsed JSON (tie-in: you learned JSON parsing earlier)
POST with JSON
payload = {'text': 'analyze this', 'lang': 'en'}
resp = requests.post('https://api.example.com/analyze', json=payload)
print(resp.status_code, resp.json())
Use a session for connection pooling and default headers
s = requests.Session()
s.headers.update({'Authorization': 'Bearer YOUR_TOKEN', 'Accept': 'application/json'})
resp = s.get('https://api.example.com/me')
Authentication patterns
- API keys: Authorization: ApiKey xxxxxxxxx or ?api_key=... (some APIs accept query params, but prefer headers)
- Bearer tokens / OAuth: Authorization: Bearer <token> (common for user-scoped APIs)
- Basic auth: username/password in headers (rare for modern public APIs)
Tip: Never hardcode keys. Use environment variables or secret managers (especially in deployments).
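Following that tip, here is a minimal sketch of reading a key from an environment variable and turning it into a header. The variable name EXAMPLE_API_KEY and the Bearer scheme are illustrative; check your provider's docs for the exact header format.

```python
import os

def auth_headers(env_var: str = "EXAMPLE_API_KEY") -> dict:
    """Build an Authorization header from an environment variable,
    so the secret never appears in source code."""
    token = os.environ.get(env_var)
    if token is None:
        raise RuntimeError(f"Set {env_var} before running")
    return {"Authorization": f"Bearer {token}"}
```

Pass the result to s.headers.update(...) once per session rather than on every call.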
Pagination, rate limits, and polite scraping
APIs often split results into pages.
- Cursor-based pagination: the API returns a next cursor URL; follow it until next is null.
- Page/limit: you pass page and limit params.
Example cursor loop:
items = []
url = 'https://api.example.com/v1/items'
params = {'limit': 100}
while url:
    resp = s.get(url, params=params)
    resp.raise_for_status()
    body = resp.json()
    items.extend(body['data'])
    url = body.get('next')  # next is a full URL or None
    params = None           # the next URL already encodes the cursor and limit
Rate limiting: servers might return 429 or headers like X-RateLimit-Remaining. Implement exponential backoff and respect Retry-After.
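Before reaching for the adapter below, it helps to see the logic spelled out by hand. This is a minimal sketch of a GET with exponential backoff that honors Retry-After on 429 responses; the max_attempts default and the (3, 30) timeout are illustrative choices.

```python
import time

def get_with_backoff(session, url, max_attempts=5, **kwargs):
    """GET a URL, retrying on 429 with exponential backoff.

    Honors the server's Retry-After header when present; otherwise
    waits 1, 2, 4, ... seconds between attempts."""
    for attempt in range(max_attempts):
        resp = session.get(url, timeout=(3, 30), **kwargs)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        # Prefer the server's hint; fall back to exponential backoff.
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

In production you would usually let the Retry adapter shown next do this for you, but the manual version makes the policy explicit and easy to customize.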
Example: retry with urllib3 Retry adapter
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
retry_strategy = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    # Caution: retrying POST can duplicate writes; drop it unless
    # the endpoint is idempotent.
    allowed_methods=["HEAD", "GET", "OPTIONS", "POST"]
)
adapter = HTTPAdapter(max_retries=retry_strategy)
s.mount("https://", adapter)
s.mount("http://", adapter)
Error handling, timeouts, and robustness
- Always set a timeout: requests.get(url, timeout=(3, 30)) means (connect, read) seconds
- Use resp.raise_for_status() or handle status codes explicitly
- For intermittent errors, prefer retries with exponential backoff
- Log responses and response bodies for debugging (avoid logging secrets)
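Putting those points together, here is a minimal sketch of explicit error handling around a JSON fetch. The helper name and the print-based logging are illustrative; in a real service you would use a logger.

```python
import requests

def fetch_json(url, timeout=(3, 30)):
    """Fetch JSON with explicit error handling; returns None on failure."""
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
        return resp.json()
    except requests.exceptions.Timeout:
        print(f"timed out: {url}")
    except requests.exceptions.HTTPError as err:
        # Log the status and a short slice of the body -- never secrets.
        print(f"HTTP {err.response.status_code}: {err.response.text[:200]}")
    except requests.exceptions.RequestException as err:
        print(f"request failed: {err}")
    return None
```

Returning None (or raising a domain-specific exception) keeps the caller's control flow simple: check once, proceed or skip.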
Streaming large responses & downloading files
For large binary responses (images, datasets), stream to avoid memory bloat.
with s.get('https://cdn.example.com/large.csv', stream=True) as r:
    r.raise_for_status()
    with open('large.csv', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)
Calling ML inference endpoints — tie-in with Deep Learning Foundations
You trained a PyTorch model and deployed it as a REST endpoint (e.g., FastAPI, Flask, TorchServe). Here's how to call it from Python:
Example: image classification endpoint that returns JSON probs
with open('dog.jpg', 'rb') as img:  # context manager closes the file handle
    files = {'image': img}
    resp = s.post('https://inference.example.com/predict', files=files)
resp.raise_for_status()
result = resp.json()  # e.g. {'predictions': [{'label': 'dog', 'score': 0.98}, ...]}
print(result['predictions'][0])
Alternative: send base64-encoded image in JSON (useful for text-only APIs):
import base64

with open('dog.jpg', 'rb') as img:
    b64 = base64.b64encode(img.read()).decode('ascii')
resp = s.post('https://inference.example.com/predict', json={'image_b64': b64})
Design note: when deploying models, expose a small JSON contract (input schema, output schema). Your client code becomes a tiny, robust consumer of that contract.
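On the client side of that contract, here is a minimal sketch of parsing the prediction response defensively, assuming the {'predictions': [{'label': ..., 'score': ...}]} shape from the example above. The helper name is hypothetical.

```python
def top_prediction(body: dict) -> tuple:
    """Safely extract the best (label, score) pair from an inference
    response, tolerating missing or empty fields."""
    preds = body.get("predictions") or []
    if not preds:
        raise ValueError("response contained no predictions")
    best = max(preds, key=lambda p: p.get("score", 0.0))
    return best.get("label", "unknown"), best.get("score", 0.0)
```

Using .get() with defaults instead of direct indexing means a malformed response raises one clear error rather than a KeyError deep in your pipeline.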
Async requests for high concurrency
If you need massive parallelism in data ingestion, use aiohttp or httpx (async). Requests is synchronous, which is fine for many ETL jobs, but async scales better for many small API calls.
Best practices checklist (engineering-ready)
- Use sessions for connection reuse
- Set timeouts and retries (with exponential backoff)
- Respect rate limits and Retry-After headers
- Keep API keys out of source code (env vars or a secrets manager)
- Validate API responses and handle missing fields gracefully
- Document the contract (input/output) used with ML endpoints
- Prefer streaming for large downloads
- Monitor latency/error rates and add observability when deployed
Quick summary — what to remember
- REST + requests = predictable, structured data ingestion. Use it before you scrape HTML.
- Handle pagination, timeouts, retries, and rate limits like a responsible engineer.
- For ML workflows, calling inference endpoints is just another API call — be explicit about formats, and parse JSON safely.
"APIs are like the polite neighbors of the web — they’ll give you what you need if you follow their rules. Treat rate limits like shared parking spots: nobody wins by hogging them."
Key takeaways:
- Master requests basics (GET/POST/headers/params)
- Implement robust retries + backoff
- Stream large responses and handle authentication securely
- Tie calls into your ML pipeline: fetch features, call inference, log results
Go build a tiny client that hits a public API, stores JSON results, and feeds them into a model you trained in the Deep Learning Foundations lesson. That's where these lessons start to sing.
Tags: beginner, intermediate, humorous, computer-science, data-engineering