From your first line of Python to shipping a real AI-driven automation —
a hands-on curriculum (self-paced or instructor-led) across Python fluency, business data science,
machine learning, AI engineering, and production.
92 runnable notebooks · 14 modules · 300+ end-of-lesson exercises · 269 in-lesson checkpoints · 100% offline
🚀 Quick start · 📚 Curriculum · 📓 How it works
- Why this course
- What's new
- Quick start
- Choose your path
- Curriculum
- Repository layout
- Datasets
- How each notebook works
- LLM providers
- Open any notebook in Colab — one-click links to all 92 notebooks
- About
- Contributing & licence
- End to end. From `print("hello")` to a deployed, scheduled AI automation — no gaps assumed, no steps skipped.
- Runs anywhere. One click into Google Colab, or `pip install` locally. Every notebook runs 100% offline — no API key, no paid service required.
- Learn by doing. 300+ exercises — including a deliberate 🐞 debug-me in each lesson — every one shipping with a worked solution and the reasoning behind it.
- Built for live teaching. Every lesson is punctuated with short ✋ Quick exercise checkpoints (~2 min each) at natural section breaks, so a class can alternate ~20 minutes of instruction with a quick hands-on pause. 269 across the course, each with a scaffolded starter and a collapsible solution — and every solution has been executed in a fresh kernel to confirm it runs.
- Modern, minimal code. Charts in 1–3 lines (pandas `.plot()`, seaborn, sklearn's built-in plot helpers), pipelines over boilerplate — you learn the way practitioners actually write Python today.
- Visual where it counts. Key ideas — train/test splits, k-fold cross-validation, grid search, RAG pipelines, MCP topology — come with clean diagrams embedded right in the notebooks.
- Real business problems. Churn & CLV, fraud detection, demand forecasting, customer segmentation, RAG assistants, and AI governance — not toy datasets.
- ✋ Interactive in-lesson checkpoints. Every lesson now embeds short ~2-minute Quick exercise checkpoints at natural section breaks — 269 across the course — so you pause and do every ~20 minutes instead of reading straight through. Each ships a scaffolded starter and a collapsible solution, and every solution has been executed in a fresh kernel to confirm it runs. → How each notebook works
- 🧑🏫 Built for live teaching. The checkpoint rhythm — lecture ~20 min → ~2-min try → reveal — turns any lesson into an interactive class with zero prep.
- 🔌 100% offline, end to end. Every notebook — including the LLM, RAG, and agent lessons — runs with no API key via a built-in `MockLLM` and offline stand-ins for the heavy libraries.
A taste of the style you'll be writing by Module 4 — a full, leakage-free model in a handful of lines (pipelines over boilerplate, just like real practitioners):
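```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X, y: a feature matrix and churn labels you've already loaded (see Module 4).
# Stand-in so this snippet also runs on its own: synthetic data in place of the churn table.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# The scaler is refit inside every CV fold, so no test-fold statistics leak into training.
churn_model = make_pipeline(StandardScaler(), LogisticRegression())
auc = cross_val_score(churn_model, X, y, cv=5, scoring="roc_auc").mean()
print(f"5-fold ROC-AUC: {auc:.3f}")
```

No setup yet — just open a notebook and press Run: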
Click a badge to open a notebook in your browser; nothing to install.
🔑 You need a free Google account to run the notebooks. Colab gives each signed-in user a free cloud runtime, so the first time you press Run it will ask you to sign in (any Gmail account works). Without signing in you can read a notebook but not execute its cells. No Google account? Use the Local Jupyter option below instead.
- Start the full course — `00_master_onboarding.ipynb`
- Start the fast track — `00_fast_track_onboarding.ipynb`
- See it work first (5 min) — `00c_see_it_work.ipynb`
Every notebook is listed with its own Colab link in Open any notebook in Colab.
```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
jupyter lab
```

Tested with Python 3.10+. Module 0 includes an environment-check cell. The 13 appendices demo heavier libraries (PyTorch, Prophet, FAISS, LangChain, …), which are kept commented out at the bottom of `requirements.txt`. Each appendix still runs offline via a built-in stand-in, so install them only to see the real library at work.
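The Module 0 environment-check cell is not reproduced here. As a rough idea of what such a check can do, the sketch below verifies the Python version and looks for a few packages without importing them. The package list and the `check_environment` helper are hypothetical; note that scikit-learn's import name is `sklearn`.

```python
import importlib.util
import sys

# Hypothetical package list for illustration; the real list lives in requirements.txt.
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib"]


def check_environment(packages, min_version=(3, 10)):
    """Return a list of human-readable problems (an empty list means all good)."""
    problems = []
    # sys.version_info compares element-wise against a shorter tuple like (3, 10).
    if sys.version_info < min_version:
        found = sys.version.split()[0]
        problems.append(f"Python {min_version[0]}.{min_version[1]}+ required, found {found}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


issues = check_environment(REQUIRED)
print("Environment OK" if not issues else "\n".join(issues))
```

Using `find_spec` instead of `import` keeps the check fast and avoids side effects from loading large libraries.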
| | 🎓 Complete course | 🏎️ Fast track |
|---|---|---|
| Scope | All 14 modules + 13 optional appendices | The essentials, condensed |
| Notebooks | 46 lessons (+ 13 appendices) | 15 (onboarding + 14 lessons) |
| Time | ~115 hours | ~15 hours |
| Best for | Depth — every exercise, stretch problem & capstone | A credible end-to-end pass in a few evenings |
| Start here | `00_master_onboarding.ipynb` | `fast_track/` |
New here? 00c_see_it_work.ipynb is a 5-minute offline demo of what you'll build, and 00b_course_overview.ipynb has the full module map and an interactive time estimator.
| Module | Lessons | Focus |
|---|---|---|
| 0 · Onboarding | — | Setup, orientation, 5-minute demo |
| 1 · Foundations | 1–6 | Variables, control flow, lists, dicts, functions, classes |
| 2 · Data Science | 7–11 | pandas, NumPy, seaborn/matplotlib, statistics, time series |
| 3 · Real-world I/O | 12–13 | HTTP/APIs, SQL, data validation |
| 4 · Machine Learning | 14–16 | scikit-learn, cross-validation & hyperparameter tuning, model evaluation, feature engineering |
| 5 · Industry Applications | 17–20 | Churn/CLV, fraud, segmentation + recommenders, demand & maintenance |
| 6 · AI Engineering | 21–26 | LLM fundamentals, prompts, RAG, agents, document processing, eval & observability |
| 7 · Building AI POCs | 27–30 | Copilot setup, three POCs, RAG deep dive, vector DBs + agentic AI |
| 8 · Agents, Tools & MCP | 31–34 | Agent architectures, robust tools, the Model Context Protocol, multi-agent systems |
| 9 · NLP (Text Analytics) | 35–37 | Topic modeling (BERTopic, STREAM) & sentiment analysis (VADER → classical → transformers) |
| 10 · Optional: DeepTab | 38 | Deep learning for tabular data — Mamba/FT-Transformer/SAINT… behind a scikit-learn API |
| 11 · Production | 39–40 | Packaging notebooks into projects, scheduling |
| 12 · CI/CD & Deployment | 3 labs | Docker, Compose, GitHub Actions, registries, DNS, reverse proxies, HTTPS, Ubuntu deployment — a self-contained mini-book + 3 hands-on lab notebooks + runnable example app |
| 13 · Capstones | 41–42 | Two end-to-end projects (analytics + AI assistant) |
| 14 · Business AI | 43–46 | Digital transformation, architecture, AI-assisted dev, governance |
| 15 · Optional: Django | 2 labs | Wrap a model in a real web app — ORM, admin, forms, a JSON API — a self-contained mini-book + 2 hands-on lab notebooks + runnable example app |
The 13 optional appendices (classical → deep-learning → foundation-model forecasting, PyTorch, vector stores, RAG/agent frameworks) live beside their modules. Modules 9–10 are optional, reference-style tracks (text analytics + deep tabular), and Module 15 (Django) is an optional mini-book. All of them run fully offline.
| Path | Contents |
|---|---|
| `00_onboarding/` … `14_business_ai/` | The complete course — 46 lessons + 13 appendices |
| `09_nlp/`, `10_deeptab/` | Optional reference tracks — text analytics (NB 35–37) & deep tabular learning (NB 38) |
| `12_cicd/` | CI/CD, Docker & deployment mini-book — 14 chapters + 3 hands-on lab notebooks + a runnable example app |
| `15_django/` | Optional: Django for AI web apps — mini-book (7 chapters) + 2 hands-on lab notebooks + a runnable example app, ChurnScope |
| `fast_track/` | The fast track — 14 trimmed notebooks (~15 h) |
| `quizzes/` | 11 short multiple-choice quizzes (Modules 1–9, 11 & 14) |
| `data/` | Sample CSVs (support_ops, api_log, customer_feedback), which are disk copies of inline data for `read_csv` practice, plus two small real datasets (Palmer Penguins, UCI Bike Sharing); see Datasets |
| `slides/` | Course-overview deck + lecture decks (PDF + LaTeX source) |
| `scripts/` | Helpers — validate/execute every checkpoint (`test_checkpoints.py`), run every notebook end-to-end, regenerate the hero banner, check NB-number references |
| `docs/` | Course-design notes (pedagogical review, module-descriptor coverage) |
| `llm_providers.py` | Unified interface to OpenAI / Anthropic / Google / Ollama (+ offline `MockLLM`) |
| `previous_versions/` | The legacy flat 19-notebook layout, archived |
The course is offline-first and reproducible: almost every dataset is synthetic and generated inline from a fixed random seed, so each run produces identical data with zero downloads. One fictional world ties them together, a SaaS company running an AI customer-support operation, and its tables (customers, support tickets, API-cost logs, feedback, payments) recur from module to module, so you meet familiar data again as the techniques get harder. Three of those synthetic tables are also saved to `data/*.csv` so you can practise `pd.read_csv` against real files. Two small real datasets (Palmer Penguins, UCI Bike Sharing) are bundled there too, so the optional 📊 try it on real data sections also run offline. A few lessons deliberately reach for live public APIs where that is the point.
Output convention. The quantitative teaching notebooks ship with their figures rendered, so every chart is viewable directly on GitHub and in Colab before you run a single cell. Deterministic seeds mean re-running a notebook reproduces the same figures; if you re-run and commit, keep the rendered outputs in place.
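A minimal sketch of the fixed-seed idea, assuming NumPy's `default_rng`. The `make_customers` helper and its columns are hypothetical and far smaller than the course's real generators; the point is only that the same seed yields the same table on every run.

```python
import numpy as np
import pandas as pd


def make_customers(n=5, seed=42):
    """Hypothetical mini version of an inline synthetic table (not the course's generator)."""
    rng = np.random.default_rng(seed)  # fixed seed -> identical draws on every run
    return pd.DataFrame({
        "customer_id": np.arange(1, n + 1),
        "tenure_months": rng.integers(1, 60, size=n),
        "monthly_charges": rng.normal(70, 15, size=n).round(2),
        "churn": rng.random(size=n) < 0.25,
    })


first = make_customers()
second = make_customers()
print(first.equals(second))  # same seed, same data
print(make_customers(seed=7).equals(first))  # a different seed almost surely changes the data
```

Passing the seed in as a parameter keeps the default reproducible while still letting an exercise ask for a fresh sample.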
These files are small enough to fit on one screen and travel with the repo (see `data/README.md`). The first three are disk copies of data the notebooks build inline, `forecast.csv` is saved by NB 12 from the Open-Meteo API, and the last two are small real datasets bundled for the optional try it on real data sections.
| File | Rows | What it is | Used by |
|---|---|---|---|
| `data/api_log.csv` | 50 | LLM API request log — model, segment, quarter, tokens_in/out, latency_ms | NB 7 — Pandas fundamentals |
| `data/support_ops.csv` | 60 | Support-ops metrics by channel & month — tickets, automation rate, latency, satisfaction, cost | disk copy of the data NB 41 (Capstone A) builds inline |
| `data/customer_feedback.csv` | 15 | Labelled feedback — text, sentiment, topic | sample mirroring the inline data in NB 14 & 22 |
| `forecast.csv` | 28 | 7-day weather forecast for 4 cities, saved from the Open-Meteo API (lives at the repo root) | written by NB 12 — APIs & HTTP |
| `data/penguins.csv` | 344 | Real — Palmer Penguins: 3 species' bill/flipper/mass measurements, with real missing values · CC0 | NB 9 — Visualization |
| `data/bike_sharing_daily.csv` | 731 | Real — UCI Bike Sharing: daily rentals 2011–12 with weather & calendar features · CC BY 4.0 | NB 20 — Demand forecasting |
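As a sketch of the "disk copy of inline data" round trip, the snippet below writes a tiny frame to CSV and reads it back with `pd.read_csv`. The column names follow the `api_log.csv` row above (splitting tokens_in/out into two columns is an assumption); the values are invented, and an in-memory `io.StringIO` buffer stands in for a real file so nothing touches disk.

```python
import io

import pandas as pd

# A few invented rows in the shape of data/api_log.csv (illustration only).
inline = pd.DataFrame({
    "model": ["model-a", "model-b", "model-a"],
    "segment": ["SMB", "Enterprise", "SMB"],
    "quarter": ["Q1", "Q1", "Q2"],
    "tokens_in": [120, 340, 95],
    "tokens_out": [60, 210, 40],
    "latency_ms": [410, 980, 350],
})

buffer = io.StringIO()              # stands in for a file on disk
inline.to_csv(buffer, index=False)  # index=False: don't write the row index as an extra column
buffer.seek(0)                      # rewind before reading back
from_disk = pd.read_csv(buffer)

# Once loaded, the CSV behaves like any other DataFrame.
print(from_disk.groupby("model")["latency_ms"].mean())
```

With a real file you would pass a path such as `data/api_log.csv` to `pd.read_csv` instead of the buffer.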
The synthetic datasets are all generated inline (no downloads) and are grouped here by the business problem they illustrate:
| Theme | What's in it | Notebooks |
|---|---|---|
| SaaS customer churn | The course backbone — tenure, charges, support tickets, usage, contract, region, churn label (+ a revenue target) | NB 14–17, 38 |
| LLM cost & latency logs | Support calls tagged by model & channel with tokens, cost, latency, satisfaction | NB 7–9 |
| Support operations | Tickets across five channels (Email/Chat/Phone/Web/Social), queried in an in-memory SQLite DB | NB 13, 24, 41 |
| Fraud / payments | One row per transaction, with planted fraud patterns (night spend, new-device takeover) | NB 18 |
| Customer segmentation | Customers drawn from hidden archetypes for clustering + recommendations | NB 19 |
| Demand & maintenance | Short demand series (lag & rolling features) plus predictive-maintenance signals | NB 20 |
| Time-series forecasting | Daily product-search series with trend & seasonality (classical → Prophet → DL → foundation models) | NB 11, DS A1–A4 |
| Customer feedback & reviews | Piles of short product reviews, support tickets and survey notes for topic models + sentiment | NB 35–37, 42 |
| RAG document corpora | Small knowledge bases / product catalogues chunked, embedded and retrieved | NB 23, 29, 30 |
| Invoices & documents | Synthetic messy invoices for an extraction pipeline | NB 25 |
| Golden eval sets | Tiny labelled sets for evaluating and triaging an AI feature | NB 26 |
| Agent / copilot data | A support copilot's lookup numbers + docs, exposed as tools / MCP resources | NB 31–34 |
| POC & app demo data | ~500 synthetic customer rows, a product catalogue, and random embedding vectors that seed the POC apps | NB 27, 28, 30 |
| Vision & sequences (PyTorch) | Toy 8×8 "digit" images, a synthetic sequence task, and a tiny text-intent set | ML A2, A3 |
| Business case studies | Meridian, a fictional 400-person B2B SaaS, for transformation & governance scenarios | NB 43–46 |
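The lag and rolling features listed for the demand series can be sketched in a few lines of pandas. The series below is invented for illustration; the design choice worth noticing is shifting before rolling, so a day's own value never appears in its own features.

```python
import pandas as pd

# Invented daily demand series, for illustration only.
demand = pd.DataFrame(
    {"units": [12, 15, 14, 18, 20, 17, 22, 25]},
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
)

# Lag feature: yesterday's demand, aligned to today's row.
demand["lag_1"] = demand["units"].shift(1)

# Rolling feature: mean of the previous 3 days. shift(1) comes first,
# so today's value is excluded from its own feature (no target leakage).
demand["roll_mean_3"] = demand["units"].shift(1).rolling(window=3).mean()

# The first rows lack enough history to fill every feature, so drop them.
features = demand.dropna()
print(features)
```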
- scikit-learn toy sets — Iris & Wine (NB 14) and Breast Cancer (NB 15), used briefly to anchor the classic ML examples before switching to the synthetic SaaS data.
- Live public APIs (no key required) — Open-Meteo (weather — the running example) and JSONPlaceholder (a fake REST API) in NB 12; Firecrawl web scraping into a RAG-ready dataset in the I/O appendix.
- Pretrained models (models, not datasets — fetched on first use when online, each with an offline fallback) — sentence-transformers embeddings and small Hugging Face transformers in the embeddings/NLP lessons.
- `MockLLM` — not a dataset, but the deterministic offline model that produces the text/JSON for 16 of the AI notebooks; swap one line in `llm_providers.py` to call OpenAI / Anthropic / Google / Ollama instead.
If you want to take a lesson onto real data, these fit the course's themes and mostly load in one line (cached after the first fetch):
| Dataset | Load / source | Pairs with | Licence |
|---|---|---|---|
| California Housing | `sklearn.datasets.fetch_california_housing()` | M4 regression (14–15) | public |
| 20 Newsgroups | `sklearn.datasets.fetch_20newsgroups()` | M9 topic modeling (35–36) | public |
| statsmodels series (CO₂, sunspots, Nile) | `statsmodels.datasets.co2.load_pandas()` | M2 stats & forecasting (10–11, A1–A4) | public |
| Telco Customer Churn | Kaggle `blastchar/telco-customer-churn` · 7,043 rows | M4–M5 churn (14–17), M10 (38) | IBM sample |
| RAG Mini-Wikipedia | `load_dataset("rag-datasets/rag-mini-wikipedia")` · corpus + Q/A | M6–M7 RAG (23, 29, 30) | CC BY 3.0 |
| Twitter Financial News Sentiment | `load_dataset("zeroshot/twitter-financial-news-sentiment")` | M9 sentiment (37) | MIT |
| Adult / Census Income | `sklearn.datasets.fetch_openml("adult", version=2)` | M14 governance & fairness (46), M10 (38) | public |
| Online Retail II | UCI #502 · ~1M rows | M2 pandas-at-scale, M3 SQL (13), M5 RFM/CLV (19) | CC BY 4.0 |
| Credit Card Fraud (ULB) | Kaggle `mlg-ulb/creditcardfraud` · 0.17% fraud | M5 fraud (18) | DbCL v1.0 |
Key-free public APIs for Module 3 (NB 12), beyond Open-Meteo: REST Countries, Frankfurter (FX rates), USGS earthquakes (GeoJSON), and Hacker News — each returns a different JSON shape to practise on.
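Each of these APIs returns a different JSON shape. As an offline sketch, the snippet flattens a hand-written payload in the GeoJSON `FeatureCollection` layout used by the USGS earthquake feeds with `pd.json_normalize`. The two events are invented; a live call would fetch the same structure over HTTP instead.

```python
import pandas as pd

# Hand-written sample in GeoJSON FeatureCollection shape; the events are invented.
payload = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"mag": 4.6, "place": "Example Ridge"},
         "geometry": {"type": "Point", "coordinates": [-120.5, 35.1, 8.0]}},
        {"type": "Feature",
         "properties": {"mag": 2.3, "place": "Sample Valley"},
         "geometry": {"type": "Point", "coordinates": [-117.9, 33.7, 12.4]}},
    ],
}

# json_normalize flattens nested dicts into dotted column names like "properties.mag".
quakes = pd.json_normalize(payload["features"])

# GeoJSON orders coordinates as [longitude, latitude, ...]; USGS adds depth (km) third.
coords = pd.DataFrame(
    quakes["geometry.coordinates"].tolist(), columns=["lon", "lat", "depth_km"]
)
table = pd.concat([quakes[["properties.mag", "properties.place"]], coords], axis=1)
print(table)
```

Lists (like the coordinate triple) are not expanded by `json_normalize`, which is why they get their own small DataFrame before the final concat.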
A consistent six-part template:
🎯 Objectives + ✅ prerequisites → numbered concept sections (prose + runnable code), interleaved with ✋ quick-exercise checkpoints → 🧪 practice exercises (incl. a 🐞 debug-me) → 🧠 stretch exercises A–D → 🎁 bonus mini-project → ✅ self-assessment + 🚀 next step
Every exercise — 300+ across the course — ships with a worked solution and the reasoning behind it.
Beyond the end-of-lesson exercise bank, each lesson embeds short ✋ Quick exercise checkpoints at natural section breaks, roughly one per 20 minutes of material. They turn a lecture into a rhythm: teach ~20 min → pause for a ~2-minute hands-on exercise → teach again.
Each checkpoint is a self-contained three-cell block:
- Prompt — a focused, business/AI-flavoured task, solvable with only what the lesson has covered up to that point.
- `# ✍️ Your turn` — a scaffolded starter cell to fill in.
- ✅ Solution — a collapsible answer with a one-line explanation.
There are 269 of these across the course (3–4 per core lesson, 3 per fast-track lesson, 4 per CI/CD & Django lab). In the conceptual Business-AI lessons (43–46) the middle cell is a short written reflection/decision task instead of code.

Every code solution has been executed in a fresh Jupyter kernel to verify it actually runs; the few file-content solutions (`__init__.py`, pytest, `pyproject.toml`) are validated by inspection. A CI workflow re-checks the structure and syntax of all 269 on every push; run the full kernel test yourself with `python scripts/test_checkpoints.py --exec`. They run 100% offline like everything else, with no API key or network needed.

Self-paced learners solve each one as they reach it; instructors use them as the built-in "pause and try" beats of a class.
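The repository's `scripts/test_checkpoints.py` is not shown here. As a rough sketch of the two levels of checking described above, the code below first compiles a solution (syntax only) and then runs it in a separate Python process, which plays the role of a fresh kernel. The sample solutions and both helper names are hypothetical, and extracting cells from real `.ipynb` files is left out.

```python
import subprocess
import sys

# Hypothetical checkpoint solutions as plain strings (the real script reads notebooks).
solutions = {
    "nb01_cp1": "total = 19.99 + 5.01\nassert round(total, 2) == 25.0",
    "nb01_cp2": "print(undefined_name)",  # valid syntax, but fails when executed
}


def syntax_ok(source: str) -> bool:
    """Cheap structural check: does the code at least parse?"""
    try:
        compile(source, "<checkpoint>", "exec")
        return True
    except SyntaxError:
        return False


def runs_in_fresh_process(source: str, timeout: int = 30) -> bool:
    """Execute the code in a brand-new interpreter so no state leaks between checks."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0


for name, code in solutions.items():
    print(name, "syntax:", syntax_ok(code), "runs:", runs_in_fresh_process(code))
```

The second sample shows why a syntax-only check is not enough: it parses cleanly but raises a `NameError` once executed.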
🧑🏫 Teaching live? Lecture for ~20 minutes, then jump to the next ✋ checkpoint and give the room ~2 minutes to try it before you reveal the solution. With 3–4 per lesson, a 90-minute class gets several natural interactive breaks — no prep required.
Two more conventions you'll see throughout:
- Charts take a few lines, not a page. Plots are drawn with pandas `.plot()`, seaborn one-liners (`hue=` instead of loops, `sns.heatmap(cm, annot=True)` for matrices), and scikit-learn's `*Display` helpers — raw matplotlib appears only for final tweaks and multi-panel layouts.
- Diagrams travel with the notebook. Explanatory figures (the train/test split, 5-fold cross-validation, the GridSearchCV flow, RAG pipelines, MCP host/client/server topology, walk-forward backtesting, …) are embedded as attachments inside the `.ipynb` files, so they render on GitHub, in Colab, and locally with no extra image files.
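How can a diagram live inside a notebook file? A minimal sketch, assuming the nbformat 4 JSON layout: a markdown cell stores the image under `attachments` as base64 and refers to it with `attachment:<name>`. The placeholder bytes below are not a real PNG, and the cell id and filename are made up for illustration.

```python
import base64
import json

# Placeholder bytes standing in for a real diagram PNG (not a valid image).
fake_png = b"\x89PNG placeholder bytes"

# A markdown cell that carries its image inline as a base64 mime bundle.
cell = {
    "cell_type": "markdown",
    "id": "kfold-diagram",  # nbformat 4.5+ cells carry an id
    "metadata": {},
    "attachments": {
        "kfold.png": {"image/png": base64.b64encode(fake_png).decode("ascii")}
    },
    # The markdown points at the attachment by name, not at a file path.
    "source": ["### 5-fold cross-validation\n", "![k-fold diagram](attachment:kfold.png)"],
}

notebook = {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": [cell]}
text = json.dumps(notebook, indent=1)  # this JSON is what the .ipynb file contains

# Round trip: the image bytes travel inside the notebook JSON itself.
restored = json.loads(text)["cells"][0]["attachments"]["kfold.png"]["image/png"]
print(base64.b64decode(restored) == fake_png)
```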
Notebooks 21–26 and 42 run entirely offline with the built-in `MockLLM`. To call a real model instead, swap one line; the unified interface in `llm_providers.py` supports four providers:
| Provider | Class | When to use |
|---|---|---|
| 🟢 OpenAI | `OpenAILLM(model="gpt-5.4-mini")` | Reliable default |
| 🟠 Anthropic | `AnthropicLLM(model="claude-haiku-4-5")` | Long context, careful tone |
| Google | `GoogleLLM(model="gemini-2.5-flash")` | Cheap at scale |
| 🟣 Ollama | `OllamaLLM(model="llama3.2:3b")` | Local — no internet, key, or cost |
Set the matching `*_API_KEY` environment variable for hosted providers instead of pasting keys into code, and never commit API keys. See `06_ai_engineering/A1_llm_providers_guide.ipynb` for setup and cost notes.
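The "swap one line" design can be sketched as a small interface that business code depends on. Everything below (`TextModel`, `TinyMockLLM`, `require_key`, `triage`) is hypothetical and is not the API of `llm_providers.py`; it only illustrates the pattern of a deterministic offline default plus keys read from the environment.

```python
import hashlib
import os
from typing import Protocol


class TextModel(Protocol):
    """Anything with a complete(prompt) -> str method can be swapped in."""

    def complete(self, prompt: str) -> str: ...


class TinyMockLLM:
    """Hypothetical offline stand-in: the same prompt always gets the same reply."""

    REPLIES = ["billing", "technical", "account"]

    def complete(self, prompt: str) -> str:
        # hashlib gives a stable digest across runs; the built-in hash() of a str
        # is randomized per process, so it would not be deterministic.
        digest = hashlib.sha256(prompt.encode("utf-8")).digest()
        return self.REPLIES[digest[0] % len(self.REPLIES)]


def require_key(env_var: str) -> str:
    """Read a hosted provider's key from the environment, never from source code."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} in your environment first.")
    return key


def triage(ticket: str, llm: TextModel) -> str:
    # Business logic depends only on the interface, not on any vendor SDK.
    return llm.complete(f"Classify this support ticket: {ticket}")


llm: TextModel = TinyMockLLM()  # the one line you would change to use a real provider
print(triage("I was charged twice this month", llm))
```

Because `triage` only sees `TextModel`, replacing the mock with a hosted client that reads its key via something like `require_key` leaves the rest of the notebook untouched.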
Every notebook below runs in Google Colab with one click — no install, no download. Click a badge to open it. Sign in with a free Google account the first time you run a cell — Colab needs it to give you a cloud runtime.
| Notebook | Open |
|---|---|
| `00_master_onboarding.ipynb` | |
| `00b_course_overview.ipynb` | |
| `00c_see_it_work.ipynb` | |

| Notebook | Open |
|---|---|
| `01_python_basics.ipynb` | |
| `02_control_structures.ipynb` | |
| `03_lists_data_structures.ipynb` | |
| `04_dictionaries_advanced.ipynb` | |
| `05_functions_modules.ipynb` | |
| `06_classes_and_oop.ipynb` | |

| Notebook | Open |
|---|---|
| `12_apis_and_http.ipynb` | |
| `13_sql_fundamentals.ipynb` | |
| `A1_web_scraping_firecrawl.ipynb` | |

| Notebook | Open |
|---|---|
| `17_churn_clv_retention.ipynb` | |
| `18_fraud_anomaly_detection.ipynb` | |
| `19_segmentation_recommenders.ipynb` | |
| `20_demand_maintenance.ipynb` | |

| Notebook | Open |
|---|---|
| `27_from_setup_to_first_poc.ipynb` | |
| `28_three_pocs_growing_complexity.ipynb` | |
| `29_rag_pipeline_deep_dive.ipynb` | |
| `30_vector_db_and_agentic_ai.ipynb` | |

| Notebook | Open |
|---|---|
| `31_agent_architectures.ipynb` | |
| `32_designing_robust_tools.ipynb` | |
| `33_model_context_protocol.ipynb` | |
| `34_multi_agent_systems.ipynb` | |

| Notebook | Open |
|---|---|
| `35_topic_modeling_bertopic.ipynb` | |
| `36_topic_modeling_stream.ipynb` | |
| `37_sentiment_analysis.ipynb` | |

| Notebook | Open |
|---|---|
| `38_deeptab_tabular_deep_learning.ipynb` | |

| Notebook | Open |
|---|---|
| `39_from_notebook_to_project.ipynb` | |
| `40_scheduling_orchestration.ipynb` | |

| Notebook | Open |
|---|---|
| `lab01_docker_and_compose.ipynb` | |
| `lab02_ci_pipeline_github_actions.ipynb` | |
| `lab03_deploy_dns_https_monitoring.ipynb` | |

| Notebook | Open |
|---|---|
| `41_capstone_analytics.ipynb` | |
| `42_capstone_ai_assistant.ipynb` | |

| Notebook | Open |
|---|---|
| `43_digital_transformation.ipynb` | |
| `44_architecture_patterns.ipynb` | |
| `45_ai_assisted_software_development.ipynb` | |
| `46_bpm_governance_poc_mvp.ipynb` | |

| Notebook | Open |
|---|---|
| `lab01_django_in_a_notebook.ipynb` | |
| `lab02_serving_a_model_with_auth.ipynb` | |
I am Prof. Dr. Christoph Weisser, Professor of Mathematics, specializing in Business Data Science at Hochschule Bielefeld (HSBI), and former Technical Lead Analytics & Artificial Intelligence at BASF. My work focuses on Artificial Intelligence, Generative AI, Business Data Science, and agentic AI systems that bridge research with real-world industrial applications.
Before joining academia, I led international AI initiatives at BASF from strategy through production deployment. Today, I combine research, teaching, open-source software development, and selected industry collaborations to advance the practical application of AI.
I completed the PhD Program in Applied Statistics & Empirical Methods (summa cum laude) at Georg-August-Universität Göttingen and studied at the University of Oxford and the University of St Andrews as a scholar of the Studienstiftung des deutschen Volkes. I regularly publish research in leading journals and at international conferences, and I contribute to open-source AI software.
Spotted a bug or an unclear explanation? Open an issue or PR — contributions are welcome.
Licensed under the MIT License (see LICENSE) — use freely for learning, teaching, or anything else.
Happy coding 🚀
