Foundations of AI and Data Science
Core concepts, roles, workflows, and ethics that frame end‑to‑end AI projects.
Command Line Essentials: The Power Tools Your GUI Was Hiding From You
"The command line is like the gym for your brain — minimal decor, no distractions, wildly effective. Also a little scary until you learn where the weights go."
We just wrangled environments and dependencies, and had a civil-yet-spicy debate about notebooks vs scripts. Now it’s time to learn the thing that stitches those worlds together: the command line. The CLI is how you glue workflows, automate the boring parts, and yeet friction out of your data life. If you’ve ever thought, "There must be a faster way," the CLI politely says, "There is."
What Even Is a Shell (And Why Should AI People Care)?
- A shell is your text-based interface to the computer. Common ones:
- bash/zsh (macOS/Linux)
- PowerShell (Windows)
- You type commands; it does your bidding (usually). This is where you:
- Spin up/activate environments
- Run scripts and notebooks
- Inspect data files quickly
- Fetch datasets and wire up pipelines
Expert take: If your workflow can’t be expressed on the command line, it’ll be hard to automate, version, and scale. GUI clicks don’t commit to Git.
Navigating Like a Pro (aka: Stop Getting Lost)
You live in a filesystem. Know the neighborhood.
- `pwd` — print working directory (where am I?)
- `ls -lah` — list files (show me everything, including hidden dotfiles)
- `cd path/to/place` — go somewhere
- `cd ..` — go up one level; `cd ~` — go home
- `mkdir -p data/raw` — make directories, parents included
- `touch notes.txt` — create an empty file
- `cp src.py backup/src.py` — copy; `mv a b` — move/rename
- `rm file`; `rm -r folder` — remove (careful)
Paths & globs you will meet:
- `.` = current dir, `..` = parent, `~` = home
- `*.csv` matches all CSVs; `data/{raw,processed}` expands to both paths (handy with `mkdir -p`)
- Quote paths with spaces: `cd 'My Data'`
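Brace expansion in action, as a quick sketch (note: brace expansion is a bash/zsh feature, not plain POSIX sh):

```shell
# One command, two sibling directories via brace expansion
mkdir -p data/{raw,processed}

# Confirm both landed
ls data
```

This is why project-skeleton one-liners like `mkdir -p ds-project/{data/raw,src,notebooks}` work: the shell expands the braces before `mkdir` ever runs.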
Quick peek at files:
- `head -n 5 big.csv` — first 5 lines
- `tail -n 5` — last 5 lines
- `wc -l big.csv` — how many rows
- `du -sh data/` — folder size
Pipes, Redirection, and The Art of Doing 5 Things At Once
- `>` redirect output to a file; `>>` append
- `|` pipe output of one command into the next
Examples you’ll use on day one:
# Count unique values in a column (CSV, comma-separated)
cut -d, -f3 data.csv | sort | uniq -c | sort -nr | head
# Save the first 1000 rows of a huge file
head -n 1000 big.csv > sample.csv
# Log output while still seeing it in the terminal
python train.py | tee logs/train.out
Working with compressed files:
zcat big.csv.gz | head
zgrep -i 'error' logs.gz
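No giant `.gz` lying around? Fabricate one and try it — a small sketch assuming GNU gzip tools (on macOS, `gunzip -c` or `gzcat` stands in for `zcat`):

```shell
# Fabricate a tiny compressed log file (the contents are made up)
printf 'INFO: started\nERROR: disk full\nINFO: done\n' | gzip > demo-logs.gz

# Peek at the first line without decompressing to disk
zcat demo-logs.gz | head -n 1

# Case-insensitive search inside the archive
zgrep -i 'error' demo-logs.gz
```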
Your pipeline is a conveyor belt. Each command adds a transformation. Lego, but for text.
Find Stuff Fast: grep, find, jq (Your New Besties)
- `grep -R 'pattern' .` — search recursively for text in files
- `grep -R --line-number --ignore-case 'todo' src/`
- `find . -maxdepth 2 -name '*.ipynb'` — find notebooks nearby (`-maxdepth` goes before the match tests)
For JSON (APIs, logs), meet jq:
# Pretty-print JSON
echo '{"acc":0.91,"loss":0.23}' | jq .
# Extract a field from a JSONL dataset
jq -r '.label' data.jsonl | sort | uniq -c
Lightweight text surgery:
# Replace tabs with commas in a TSV
sed 's/\t/,/g' data.tsv > data.csv
# Sum the 2nd column (numbers only)
awk -F, '{sum += $2} END {print sum}' data.csv
Why do people keep misunderstanding this? Because grep/awk/sed look like line noise. But they’re fast, composable, and perfect for quick checks without spinning up Python.
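Here's the frequency-count idiom from earlier, end to end on fabricated data so you can watch each stage of the conveyor belt:

```shell
# Fabricate a three-column CSV (made-up rows)
printf 'id,city,amount\n1,Oslo,10\n2,Bergen,5\n3,Oslo,7\n' > demo.csv

# Skip the header, grab column 2, count occurrences, most frequent first
tail -n +2 demo.csv | cut -d, -f2 | sort | uniq -c | sort -nr
# → "2 Oslo" then "1 Bergen" (uniq -c pads counts with spaces)
```

`tail -n +2` is the small-but-crucial step: without it the header row gets counted as a category.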
Environments & Dependencies — But Make It CLI
Remember our environment saga? Here’s the command-line muscle behind it.
Conda:
conda create -n ds-env python=3.11
conda activate ds-env
conda install numpy pandas scikit-learn
conda env export > environment.yml
venv + pip:
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
pip freeze > requirements.txt
Path sanity checks:
which python # macOS/Linux
where python # Windows
python -c 'import sys; print(sys.executable)'
Environment variables (for API keys, secrets):
export OPENAI_API_KEY=sk-...
export WANDB_PROJECT=my-experiment
python train.py
Pro-tip: use a .env file with a loader (e.g., python-dotenv) or direnv so you don’t accidentally leak secrets in bash history.
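One plain-shell way to load a `.env` file, as a sketch (python-dotenv or direnv handle this more robustly; the key names below are made up):

```shell
# A throwaway .env file
printf 'API_KEY=abc123\nWANDB_PROJECT=my-experiment\n' > .env

# set -a marks every variable assigned while sourcing for export,
# so child processes (like python train.py) inherit them
set -a
. ./.env
set +a

echo "$WANDB_PROJECT"   # → my-experiment
```

The `set -a` / `set +a` bracket is the whole trick: anything assigned between them is auto-exported, no `export` per line needed.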
Notebooks vs Scripts: Command-Line Edition
- Start a notebook server:
jupyter lab # or: jupyter notebook
- Run a notebook headlessly (great for CI):
jupyter nbconvert --to notebook --execute notebook.ipynb --output executed.ipynb
- Run a script with arguments:
python train.py --epochs 10 --lr 3e-4 --data data/processed
- Make a script directly executable:
# In train.py, make the very first line this shebang (no space after the #):
#!/usr/bin/env python
chmod +x train.py
./train.py --help
Notebooks are for exploration; scripts are for repeatability. The CLI is how you move from vibes to verified.
Git, Quickly (Because Future You Deserves Nice Things)
git init
git status
git add src/ notebook.ipynb requirements.txt
git commit -m 'Add baseline model'
- Use `.gitignore` to avoid committing gigantic datasets and environment folders:
# .gitignore
*.pyc
.venv/
__pycache__/
.env
/data/
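To confirm a rule actually matches, `git check-ignore -v` reports which `.gitignore` line fired — a quick sketch in a throwaway repo (paths are hypothetical):

```shell
# Fresh demo repo with a couple of the rules above
mkdir -p ignore-demo
git -C ignore-demo init -q
printf '.venv/\n/data/\n' > ignore-demo/.gitignore

# -v prints the file, line number, and pattern responsible for the match
git -C ignore-demo check-ignore -v data/big.csv
```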
Bonus: git lfs for large artifacts, or use dataset registries and keep repos lean.
Fetch Data Like a Hacker (Legally)
curl -L -o data/raw/housing.csv https://example.com/housing.csv
wget -P data/raw https://example.com/housing.csv
# Test an API and parse JSON
curl -s 'https://api.example.com/items?limit=5' | jq '.items[] | {id, name}'
Remote machines:
ssh user@server
scp model.pkl user@server:/home/user/models/
Permissions, Sudo, and Other Spicy Buttons
- Who am I? `whoami`
- What’s executable? `ls -l`
- Make it executable: `chmod u+x script.sh`
- Ownership: `chown user:group file`
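Here’s the permission dance end to end (the script name is made up):

```shell
# Write a minimal script
printf '#!/bin/sh\necho hello from script\n' > hello.sh

# Without the execute bit, ./hello.sh would be refused with "Permission denied"
chmod u+x hello.sh

# Now it runs directly; ls -l shows the x in the user column
./hello.sh   # → hello from script
ls -l hello.sh
```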
Use sudo sparingly. If you need it to install Python packages, consider fixing your environment instead.
A good rule: if a command makes you sweat, try a dry-run or read the `--help` output first.
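One cheap dry-run trick: prefix the scary command with `echo` so you see what *would* run without running it — a sketch with made-up file names:

```shell
# A couple of throwaway files to rename
touch report_a.csv report_b.csv

# Dry run: prints the mv commands instead of executing them
for f in report_*.csv; do
  echo mv "$f" "archive/$f"
done

# Happy with the preview? Delete the echo and run it for real.
```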
Customize Your Shell (Treat Yo’Self)
- Add aliases and functions in `~/.bashrc` or `~/.zshrc`:
alias gs='git status'
alias ll='ls -lah'
function mkcd() { mkdir -p "$1" && cd "$1"; }
- Persistent environment setup:
export PYTHONBREAKPOINT=ipdb.set_trace
export PIP_INDEX_URL=https://pypi.org/simple
Reload with source ~/.zshrc (or open a new terminal).
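Once sourced, `mkcd` collapses the make-then-enter two-step into one command:

```shell
# Same helper as above: make a directory (parents included) and enter it
mkcd() { mkdir -p "$1" && cd "$1"; }

mkcd scratch/deep/nest
pwd   # now ends in scratch/deep/nest
```

Note it has to be a function, not a script: a script’s `cd` happens in a child process and vanishes when the script exits.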
Cross-Platform Notes (So You Don’t Cry Later)
- Windows: PowerShell is not bash. Install WSL for a Linux-like environment.
- Paths: Windows uses backslashes; bash uses slashes. Many tools expect `/`.
- Quoting rules differ; when scripts must run everywhere, prefer Python entrypoints.
Cheat Sheet: Commands You’ll Actually Use
| Command | What it does | Why a data person cares |
|---|---|---|
| `ls -lah` | List files with sizes | Spot giant CSVs before RAM screams |
| `head`/`tail` | Peek at files | Sanity-check data quickly |
| `wc -l` | Count lines | Instant row count |
| `cut`/`sort`/`uniq` | Column ops + dedupe | Explore categories and frequency |
| `grep -R` | Search text recursively | Find code, configs, log patterns |
| `find` | Locate files by name/type | Hunt notebooks or models |
| `jq` | JSON query | APIs, logs, configs at speed |
| `conda`/`venv` | Manage environments | Reproducible science |
| `python script.py` | Run scripts | Batch jobs, automation |
| `jupyter nbconvert` | Execute notebooks | CI and reproducibility |
| `curl`/`wget` | Download data | Pipeline inputs |
| `git` | Version control | Collaborate without chaos |
Small Frictions That Cause Big Headaches (and Fixes)
- Spaces in filenames? Use quotes: `cd 'My Data'`
- Accidentally nuked a folder with `rm -r`? Consider `trash-cli` to send to system trash instead.
- Mysterious 'command not found'? Check `echo $PATH`. If a tool isn’t on PATH, either reinstall or export its path.
- Python mismatch? `which python`, then `python -V`. Activate the right environment.
- Slow notebook? Check running processes with `top` or `htop` (install it first), and watch that memory.
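To read `$PATH` without squinting at one long colon-separated line, split it one directory per line:

```shell
# PATH entries are separated by colons; tr swaps them for newlines
echo "$PATH" | tr ':' '\n'
```

If the directory holding your tool isn’t in that list, that’s your 'command not found' right there.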
Try This Mini-Workflow
# 1) Create project skeleton
mkdir -p ds-project/{data/raw,data/processed,src,notebooks}
cd ds-project
# 2) Environment
python -m venv .venv && source .venv/bin/activate
pip install pandas scikit-learn jupyter
# 3) Get data
curl -L -o data/raw/titanic.csv https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
# 4) Quick checks
wc -l data/raw/titanic.csv
head data/raw/titanic.csv | cut -d, -f3 | sort | uniq -c
# 5) Start notebook for exploration
jupyter lab
If it feels smooth, you’ve tasted CLI power. If it feels chaotic, that’s normal — you just leveled up from tourist to apprentice.
Wrap-Up: The CLI Is Your Exoskeleton
- The command line gives you speed, automation, and reproducibility.
- Environments, notebooks, and scripts all become more useful when you can glue them with pipes, redirection, and a few trusty utilities.
- Your future self (and your teammates) will thank you for commands that can be documented, versioned, and rerun.
Final insight: Tools change; text interfaces endure. Learn the CLI once, and every new stack bows a little faster.
Now go open a terminal and make your computer do tricks.