
Commit f3138b5

travisjneuman and claude committed
feat: add flashcard decks for levels 6-10 and key expansion modules
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2fd05d4 commit f3138b5

10 files changed

Lines changed: 1606 additions & 0 deletions

practice/flashcards/README.md

Lines changed: 16 additions & 0 deletions
@@ -33,6 +33,8 @@ The runner uses the **Leitner box system**:

## Card Decks

### Core Levels

| File | Level | Cards | Topics |
|------|-------|-------|--------|
| `level-00-cards.json` | Absolute Beginner | 25 | Variables, print, input, basic types |
@@ -42,6 +44,20 @@ The runner uses the **Leitner box system**:
| `level-3-cards.json` | File Automation | 25 | pathlib, os, shutil, glob, CSV |
| `level-4-cards.json` | JSON & Data | 25 | json module, nested data, schemas, validation |
| `level-5-cards.json` | Exceptions | 25 | try/except, custom exceptions, logging, context managers |
| `level-6-cards.json` | SQL & ETL | 25 | SQL, staging areas, ETL patterns, idempotent operations, data integrity |
| `level-7-cards.json` | API Integration | 25 | API adapters, caching, polling, observability, rate limiting, contracts |
| `level-8-cards.json` | Dashboards & Resilience | 25 | Concurrency, thread safety, fault injection, graceful degradation, SLAs |
| `level-9-cards.json` | Architecture & Governance | 25 | Architecture patterns, SLOs, capacity planning, security, design principles |
| `level-10-cards.json` | Enterprise Excellence | 25 | Enterprise patterns, compliance, production readiness, operational excellence |

### Expansion Modules

| File | Module | Cards | Topics |
|------|--------|-------|--------|
| `module-web-scraping-cards.json` | Web Scraping | 15 | requests, BeautifulSoup, CSS selectors, pagination, robots.txt, CSV |
| `module-fastapi-cards.json` | FastAPI Web Apps | 17 | FastAPI, Pydantic, path/query params, dependency injection, JWT, uvicorn |
| `module-databases-cards.json` | Databases & ORM | 17 | sqlite3, SQLAlchemy Core/ORM, sessions, Alembic migrations, query optimization |
| `module-django-cards.json` | Django Full-Stack | 18 | Django models, views, templates, URL routing, DRF serializers, admin |

## Card Format

practice/flashcards/level-6-cards.json

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
{
  "deck": "Level 6 — SQL & ETL Patterns",
  "description": "SQL connection patterns, staging areas, ETL design, idempotent operations, data integrity and lineage",
  "cards": [
    {
      "id": "6-01",
      "front": "What is a staging table and why do ETL pipelines use one?",
      "back": "A temporary holding area where raw data lands before being validated and merged into production tables.\n\nBenefits:\n- Isolates dirty data from production\n- Allows validation before insert\n- Makes reprocessing easy if something fails\n- Decouples extraction from loading",
      "concept_ref": "projects/level-6/02-staging-table-loader/README.md",
      "difficulty": 2,
      "tags": ["etl", "staging", "data-integrity"]
    },
    {
      "id": "6-02",
      "front": "What does 'idempotent' mean in the context of data operations?",
      "back": "An operation is idempotent if running it multiple times produces the same result as running it once.\n\nExample: An UPSERT (INSERT or UPDATE) is idempotent — re-running it with the same data does not create duplicates.\n\nA plain INSERT is NOT idempotent — running it twice creates duplicate rows.",
      "concept_ref": "projects/level-6/03-idempotency-key-builder/README.md",
      "difficulty": 2,
      "tags": ["idempotency", "data-integrity"]
    },
    {
      "id": "6-03",
      "front": "What is an UPSERT and how do you express it in SQL?",
      "back": "UPSERT = INSERT if the row does not exist, UPDATE if it does.\n\nSQLite syntax:\nINSERT INTO products (id, name, price)\nVALUES (1, 'Widget', 9.99)\nON CONFLICT(id) DO UPDATE SET\n  name = excluded.name,\n  price = excluded.price;\n\n'excluded' refers to the values that were attempted to be inserted.",
      "concept_ref": "projects/level-6/04-upsert-strategy-lab/README.md",
      "difficulty": 2,
      "tags": ["sql", "upsert", "idempotency"]
    },
    {
      "id": "6-04",
      "front": "What is a database transaction and why does it matter for ETL?",
      "back": "A transaction groups multiple SQL statements into a single atomic unit.\n\nEither ALL statements succeed (COMMIT) or ALL are undone (ROLLBACK).\n\nIn Python:\nconn = sqlite3.connect('db.sqlite')\ntry:\n    conn.execute('INSERT ...')\n    conn.execute('UPDATE ...')\n    conn.commit()\nexcept Exception:\n    conn.rollback()\n    raise\n\nPrevents partial writes that leave data in an inconsistent state.",
      "concept_ref": "projects/level-6/05-transaction-rollback-drill/README.md",
      "difficulty": 2,
      "tags": ["sql", "transactions", "rollback"]
    },
    {
      "id": "6-05",
      "front": "What is ETL and what does each letter stand for?",
      "back": "E — Extract: pull data from a source (API, file, database)\nT — Transform: clean, validate, reshape the data\nL — Load: write the data into the target database\n\nETL pipelines run these three steps in sequence, often on a schedule (hourly, daily).",
      "concept_ref": "projects/level-6/README.md",
      "difficulty": 1,
      "tags": ["etl", "fundamentals"]
    },
    {
      "id": "6-06",
      "front": "What is data lineage and why should you track it?",
      "back": "Data lineage records WHERE data came from, WHAT transformations were applied, and WHEN it arrived.\n\nTracking lineage lets you:\n- Debug data quality issues back to their source\n- Prove compliance for audits\n- Understand impact when a source changes\n- Reproduce any dataset from its origin",
      "concept_ref": "projects/level-6/08-data-lineage-capture/README.md",
      "difficulty": 2,
      "tags": ["lineage", "data-integrity", "observability"]
    },
    {
      "id": "6-07",
      "front": "What is the difference between a full load and an incremental load?",
      "back": "Full load: drop and reload ALL data every time. Simple but slow for large datasets.\n\nIncremental load: only process NEW or CHANGED records since the last run. Uses a watermark (timestamp or ID) to track progress.\n\nIncremental is faster but more complex — you must handle deletes and track the high-water mark between runs.",
      "concept_ref": "projects/level-6/09-incremental-load-simulator/README.md",
      "difficulty": 2,
      "tags": ["etl", "incremental", "full-load"]
    },
    {
      "id": "6-08",
      "front": "What is a dead letter queue (or dead letter table) in data pipelines?",
      "back": "A place where rows that failed validation or processing are stored instead of being silently dropped.\n\nEach dead letter record includes:\n- The original data\n- The error message\n- A timestamp\n- The pipeline stage where it failed\n\nThis lets you investigate and replay failed records later.",
      "concept_ref": "projects/level-6/11-dead-letter-row-handler/README.md",
      "difficulty": 2,
      "tags": ["etl", "error-handling", "dead-letter"]
    },
    {
      "id": "6-09",
      "front": "How do you use Python's sqlite3 module to connect and query a database?",
      "back": "import sqlite3\n\nconn = sqlite3.connect('my.db')\ncursor = conn.cursor()\n\n# Always use parameterized queries (never f-strings!)\ncursor.execute('SELECT * FROM users WHERE age > ?', (18,))\nrows = cursor.fetchall()\n\nconn.close()\n\nUse conn as a context manager for auto-commit:\nwith sqlite3.connect('my.db') as conn:\n    conn.execute('INSERT INTO ...')",
      "concept_ref": "projects/level-6/01-mssql-connection-simulator/README.md",
      "difficulty": 1,
      "tags": ["sqlite3", "sql", "python"]
    },
    {
      "id": "6-10",
      "front": "Why should you NEVER use f-strings or string concatenation in SQL queries?",
      "back": "SQL injection. If user input is inserted directly into SQL, an attacker can manipulate the query.\n\n# DANGEROUS\ncursor.execute(f\"SELECT * FROM users WHERE name = '{name}'\")\n# If name = \"'; DROP TABLE users; --\" your table is gone\n\n# SAFE — parameterized query\ncursor.execute('SELECT * FROM users WHERE name = ?', (name,))\n\nThe database driver escapes parameters automatically.",
      "concept_ref": "projects/level-6/01-mssql-connection-simulator/README.md",
      "difficulty": 1,
      "tags": ["sql", "security", "injection"]
    },
    {
      "id": "6-11",
      "front": "What is table drift and how do you detect it?",
      "back": "Table drift is when a table's actual schema diverges from its expected schema — columns added, removed, or type-changed without updating the pipeline.\n\nDetection: compare the live schema (PRAGMA table_info in SQLite) against a stored expected schema.\n\nDrift causes silent data corruption when pipelines assume a structure that no longer matches reality.",
      "concept_ref": "projects/level-6/10-table-drift-detector/README.md",
      "difficulty": 3,
      "tags": ["schema", "drift", "data-integrity"]
    },
    {
      "id": "6-12",
      "front": "What is a batch window and why do ETL jobs use them?",
      "back": "A batch window is a scheduled time period when ETL jobs run, typically during low-traffic hours.\n\nPurpose:\n- Avoid competing with user queries for database resources\n- Ensure data is consistent at known points in time\n- Allow dependent jobs to chain in sequence\n\nExample: nightly batch window from 2am-5am processes the previous day's data.",
      "concept_ref": "projects/level-6/13-batch-window-controller/README.md",
      "difficulty": 2,
      "tags": ["etl", "scheduling", "batch"]
    },
    {
      "id": "6-13",
      "front": "What is a runbook and what should it contain?",
      "back": "A runbook is a step-by-step guide for operating, troubleshooting, or recovering a system.\n\nA good runbook includes:\n- What the system does and its dependencies\n- How to start, stop, and restart it\n- Common failure modes and their fixes\n- Escalation contacts\n- Verification steps to confirm recovery\n\nRunbooks turn tribal knowledge into repeatable procedures.",
      "concept_ref": "projects/level-6/14-sql-runbook-generator/README.md",
      "difficulty": 2,
      "tags": ["operations", "runbook", "documentation"]
    },
    {
      "id": "6-14",
      "front": "What does EXPLAIN do in SQL and why is it useful?",
      "back": "EXPLAIN shows the query execution plan — how the database will process your query.\n\nSQLite: EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 5;\n\nIt reveals:\n- Whether indexes are being used\n- Table scan vs index scan\n- Join order and strategy\n\nUse it to find slow queries that need indexes or restructuring.",
      "concept_ref": "projects/level-6/06-query-performance-checker/README.md",
      "difficulty": 3,
      "tags": ["sql", "performance", "explain"]
    },
    {
      "id": "6-15",
      "front": "What is an index in a database and when should you create one?",
      "back": "An index is a data structure that speeds up lookups on a column, like a book's index.\n\nCREATE INDEX idx_orders_customer ON orders(customer_id);\n\nCreate indexes on columns you:\n- Filter with WHERE\n- Join on (foreign keys)\n- Sort with ORDER BY\n\nTrade-off: indexes speed reads but slow writes (the index must be updated on every INSERT/UPDATE).",
      "concept_ref": "projects/level-6/06-query-performance-checker/README.md",
      "difficulty": 2,
      "tags": ["sql", "index", "performance"]
    },
    {
      "id": "6-16",
      "front": "What is the difference between DELETE, TRUNCATE, and DROP?",
      "back": "DELETE FROM table WHERE ...; — removes matching rows, can be rolled back, fires triggers.\n\nTRUNCATE TABLE table; — removes ALL rows instantly, cannot be rolled back in most databases, resets auto-increment.\n\nDROP TABLE table; — removes the entire table structure and all data permanently.\n\nIn ETL: use DELETE for selective cleanup, TRUNCATE for full reloads, DROP only when removing a table entirely.",
      "concept_ref": "projects/level-6/05-transaction-rollback-drill/README.md",
      "difficulty": 2,
      "tags": ["sql", "delete", "truncate", "drop"]
    },
    {
      "id": "6-17",
      "front": "What is a foreign key constraint and why does it matter?",
      "back": "A foreign key links a column in one table to the primary key of another, enforcing referential integrity.\n\nCREATE TABLE orders (\n  id INTEGER PRIMARY KEY,\n  customer_id INTEGER REFERENCES customers(id)\n);\n\nThe database will reject an INSERT with a customer_id that does not exist in the customers table. This prevents orphaned records.",
      "concept_ref": "projects/level-6/02-staging-table-loader/README.md",
      "difficulty": 2,
      "tags": ["sql", "foreign-key", "integrity"]
    },
    {
      "id": "6-18",
      "front": "What is a high-water mark in incremental loading?",
      "back": "A stored value (usually a timestamp or auto-increment ID) that marks the last successfully processed record.\n\nOn the next run, the pipeline queries:\nSELECT * FROM source WHERE updated_at > :last_watermark\n\nAfter successful processing, update the watermark.\n\nStore it reliably (database, file) so it survives crashes and restarts.",
      "concept_ref": "projects/level-6/09-incremental-load-simulator/README.md",
      "difficulty": 3,
      "tags": ["etl", "incremental", "watermark"]
    },
    {
      "id": "6-19",
      "front": "What does ACID stand for in databases?",
      "back": "A — Atomicity: all or nothing (transactions)\nC — Consistency: data always valid (constraints enforced)\nI — Isolation: concurrent transactions don't interfere\nD — Durability: committed data survives crashes\n\nACID guarantees are what make relational databases reliable for business data.",
      "concept_ref": "projects/level-6/05-transaction-rollback-drill/README.md",
      "difficulty": 2,
      "tags": ["sql", "acid", "fundamentals"]
    },
    {
      "id": "6-20",
      "front": "What is an ETL health dashboard and what metrics should it show?",
      "back": "A dashboard that shows the operational status of your data pipelines.\n\nKey metrics:\n- Row counts (expected vs actual)\n- Run duration and trends\n- Error/dead letter counts\n- Last successful run time\n- Data freshness (how old is the latest record?)\n\nThese metrics let you detect problems before users notice stale or wrong data.",
      "concept_ref": "projects/level-6/12-etl-health-dashboard-feed/README.md",
      "difficulty": 2,
      "tags": ["etl", "monitoring", "dashboard"]
    },
    {
      "id": "6-21",
      "front": "What is the difference between cursor.fetchone(), fetchall(), and fetchmany()?",
      "back": "fetchone() — returns the next single row, or None if no more rows.\n\nfetchall() — returns ALL remaining rows as a list. Careful with large result sets (loads everything into memory).\n\nfetchmany(n) — returns up to n rows. Good for processing in batches.\n\nFor large datasets, iterate the cursor directly:\nfor row in cursor:\n    process(row)",
      "concept_ref": "projects/level-6/01-mssql-connection-simulator/README.md",
      "difficulty": 1,
      "tags": ["sqlite3", "cursor", "python"]
    },
    {
      "id": "6-22",
      "front": "What is a connection pool and why would you use one?",
      "back": "A collection of pre-opened database connections that are shared and reused instead of opening a new connection for every query.\n\nBenefits:\n- Opening connections is slow; reusing is fast\n- Limits the max connections to avoid overwhelming the database\n- Handles connection lifecycle (health checks, timeouts)\n\nIn SQLAlchemy: engine = create_engine(url, pool_size=5, max_overflow=10)",
      "concept_ref": "projects/level-6/01-mssql-connection-simulator/README.md",
      "difficulty": 3,
      "tags": ["database", "connection-pool", "performance"]
    },
    {
      "id": "6-23",
      "front": "What is a SQL summary publisher and why automate it?",
      "back": "A process that runs aggregate queries and publishes the results (to a file, dashboard, or notification channel).\n\nExamples:\n- Daily sales totals by region\n- Row counts per table for data quality\n- Top N records by some metric\n\nAutomation ensures reports are consistent, timely, and do not depend on someone remembering to run them manually.",
      "concept_ref": "projects/level-6/07-sql-summary-publisher/README.md",
      "difficulty": 1,
      "tags": ["sql", "reporting", "automation"]
    },
    {
      "id": "6-24",
      "front": "What is ON CONFLICT in SQLite and when do you use it?",
      "back": "ON CONFLICT specifies what to do when an INSERT violates a uniqueness constraint.\n\nStrategies:\n- ABORT (default): cancel the statement\n- IGNORE: skip the conflicting row silently\n- REPLACE: delete the old row, insert the new one\n- DO UPDATE SET ...: update specific columns (UPSERT)\n\nINSERT OR IGNORE INTO logs (...) VALUES (...);\nINSERT INTO items (...) VALUES (...)\n  ON CONFLICT(id) DO UPDATE SET name = excluded.name;",
      "concept_ref": "projects/level-6/04-upsert-strategy-lab/README.md",
      "difficulty": 3,
      "tags": ["sql", "conflict", "upsert"]
    },
    {
      "id": "6-25",
      "front": "What makes a good idempotency key?",
      "back": "An idempotency key uniquely identifies an operation so it can be safely retried.\n\nGood keys are:\n- Deterministic: same input always produces the same key\n- Unique: no two different operations share a key\n- Stable: does not change between retries\n\nCommon patterns:\n- Hash of the input data: hashlib.sha256(payload).hexdigest()\n- Natural keys: (source_system, record_id, date)\n- UUIDs generated by the caller (not the server)",
      "concept_ref": "projects/level-6/03-idempotency-key-builder/README.md",
      "difficulty": 3,
      "tags": ["idempotency", "keys", "design"]
    }
  ]
}
