Commit 3644739

travisjneuman and claude committed
feat: flesh out all 15 level-5 SOLUTION.md with complete annotated solutions
Each SOLUTION.md now contains the complete working code from project.py with WHY comments explaining design reasoning, a Design Decisions table, Alternative Approaches section with code snippets, and Common Pitfalls warnings. Projects covered: schedule-ready script, alert threshold monitor, multi-file ETL runner, config layer priority, plugin-style transformer, metrics summary engine, resilient JSON loader, cross-file joiner, template report renderer, API polling simulator, retry backoff runner, fail-safe exporter, operational run logger, change detection tool, and level-5 mini capstone. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 7b8d0ce commit 3644739

30 files changed: 6,350 additions & 1,227 deletions

Showing one file: 208 additions & 39 deletions

# Schema Validator Engine — Annotated Solution

> **STOP!** Try solving this yourself first. Use the [project README](./README.md) and [walkthrough](./WALKTHROUGH.md) before reading the solution.

---

## Complete Solution

```python
"""Level 4 / Project 01 — Schema Validator Engine.

Validates data records against a JSON schema definition.
Demonstrates: schema loading, type checking, required-field enforcement,
and structured error collection.
"""

from __future__ import annotations

import argparse
import json
import logging
from pathlib import Path

# ---------- logging setup ----------


def configure_logging() -> None:
    """Set up structured logging so every validation event is traceable."""
    # WHY: The pipe-delimited layout (timestamp | level | message) makes logs
    # easy to parse with CLI tools like awk and grep, which matters when
    # debugging validation failures across thousands of records.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s | %(levelname)s | %(message)s",
    )

# ---------- schema helpers ----------

# WHY: We translate JSON schema type names ("string", "integer") into Python
# builtins so isinstance() can check values directly. This avoids scattered
# if/elif chains and makes adding new types a one-line change.
TYPE_MAP: dict[str, type | tuple[type, ...]] = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": bool,
    "number": (int, float),
}


def load_schema(path: Path) -> dict:
    """Load a JSON schema file that describes expected fields."""
    # WHY: Fail early with a clear message if the schema file is missing,
    # rather than letting json.loads raise a confusing error later.
    if not path.exists():
        raise FileNotFoundError(f"Schema not found: {path}")
    return json.loads(path.read_text(encoding="utf-8"))


def load_records(path: Path) -> list[dict]:
    """Load a JSON array of data records to validate."""
    if not path.exists():
        raise FileNotFoundError(f"Records file not found: {path}")
    data = json.loads(path.read_text(encoding="utf-8"))
    # WHY: Validate the top-level structure up front. A JSON object instead
    # of an array would cause cryptic errors during iteration.
    if not isinstance(data, list):
        raise ValueError("Records file must contain a JSON array")
    return data

# ---------- validation logic ----------


def validate_record(record: dict, schema: dict) -> list[str]:
    """Validate one record against the schema, returning a list of errors.

    Checks performed:
    1. Required fields must be present and non-null.
    2. Field values must match the declared type.
    3. Numeric fields must fall within min/max bounds (if specified).
    """
    # WHY: Returning a list instead of raising exceptions lets the caller
    # decide how to handle invalid records (log, quarantine, fail, etc.).
    errors: list[str] = []
    fields_spec = schema.get("fields", {})

    for field_name, rules in fields_spec.items():
        value = record.get(field_name)

        # WHY: Check both "value is None" and "field_name not in record"
        # because a field could exist with a None value or be entirely
        # absent — both count as missing.
        if rules.get("required", False) and (value is None or field_name not in record):
            errors.append(f"missing required field '{field_name}'")
            continue  # no point checking type/range on a missing field

        if field_name not in record:
            continue  # optional and absent — that is fine

        # WHY: bool is a subclass of int, so isinstance(True, int) is True.
        # Reject bools explicitly for the numeric types, or a boolean value
        # would silently pass an integer check (see Common Pitfalls below).
        if isinstance(value, bool) and rules.get("type") in ("integer", "float", "number"):
            errors.append(f"field '{field_name}' expected {rules['type']}, got bool")
            continue

        # WHY: Look up the Python type from TYPE_MAP so we can use isinstance()
        # for a clean, extensible type check.
        expected = TYPE_MAP.get(rules.get("type", ""))
        if expected and not isinstance(value, expected):
            errors.append(
                f"field '{field_name}' expected {rules['type']}, "
                f"got {type(value).__name__}"
            )
            continue  # skip range check if type is wrong

        # WHY: Range checks only make sense for numeric values, so guard
        # with isinstance before comparing.
        if isinstance(value, (int, float)):
            if "min" in rules and value < rules["min"]:
                errors.append(
                    f"field '{field_name}' value {value} < min {rules['min']}"
                )
            if "max" in rules and value > rules["max"]:
                errors.append(
                    f"field '{field_name}' value {value} > max {rules['max']}"
                )

    # WHY: Flag extra fields because in data pipelines, unexpected columns
    # often signal upstream schema drift. Surfacing them early prevents
    # silent data loss or misinterpretation downstream.
    for key in record:
        if key not in fields_spec:
            errors.append(f"unexpected field '{key}'")

    return errors


def validate_all(records: list[dict], schema: dict) -> dict:
    """Validate every record and return a structured report."""
    report: dict = {"total": len(records), "valid": 0, "invalid": 0, "errors": []}

    for idx, record in enumerate(records):
        issues = validate_record(record, schema)
        if issues:
            report["invalid"] += 1
            report["errors"].append({"record_index": idx, "issues": issues})
            logging.warning("record %d invalid: %s", idx, issues)
        else:
            report["valid"] += 1

    return report

# ---------- CLI ----------


def run(schema_path: Path, records_path: Path, output_path: Path) -> dict:
    """Full validation run: load schema + records, validate, write report."""
    schema = load_schema(schema_path)
    records = load_records(records_path)
    report = validate_all(records, schema)

    # WHY: Create parent directories automatically so the user does not need
    # to manually mkdir before running the tool.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    logging.info("Validation complete — %d valid, %d invalid", report["valid"], report["invalid"])
    return report


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Validate records against a JSON schema")
    parser.add_argument("--schema", default="data/schema.json", help="Path to schema file")
    parser.add_argument("--input", default="data/records.json", help="Path to records file")
    parser.add_argument("--output", default="data/validation_report.json", help="Output report path")
    return parser.parse_args()


def main() -> None:
    configure_logging()
    args = parse_args()
    report = run(Path(args.schema), Path(args.input), Path(args.output))
    print(json.dumps(report, indent=2))


if __name__ == "__main__":
    main()
```
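
To exercise the validator end to end, a minimal input file works with the CLI defaults above. The schema shape — a top-level `"fields"` object whose rules use `type`, `required`, `min`, and `max` — is exactly what `validate_record` reads; the field names here are illustrative:

```json
{
  "fields": {
    "name":  {"type": "string",  "required": true},
    "age":   {"type": "integer", "required": true, "min": 0, "max": 150},
    "email": {"type": "string"}
  }
}
```

Save this as `data/schema.json`, put a JSON array of record objects in `data/records.json`, and run `python project.py` (script name per the commit message; adjust to your layout). The report is printed and also written to `data/validation_report.json`.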

## Design Decisions

| Decision | Why |
|----------|-----|
| `TYPE_MAP` as a module-level constant | Keeps the mapping in one place. Adding a new type (e.g., `"date"`) is a single-line change instead of editing validation logic. |
| Collect all errors per record instead of stopping at the first | Batch reporting is more useful for data pipelines — fixing one error at a time and re-running is slow when you have thousands of records. |
| Flag unexpected fields in the record | Catches upstream schema drift early. In production, a new column appearing silently can cause downstream bugs that are hard to trace. |
| Separate `load_schema` / `load_records` / `validate_record` functions | Each function has one job. You can test validation without touching the filesystem, or swap the loader for a database reader. |

## Alternative Approaches

### Using a validation library (e.g., `jsonschema` or `pydantic`)

```python
# pydantic v1 style; in pydantic v2 the decorator is @field_validator.
from pydantic import BaseModel, validator

class PersonRecord(BaseModel):
    name: str
    age: int
    email: str | None = None

    @validator("age")
    def age_in_range(cls, v):
        if not 0 <= v <= 150:
            raise ValueError("age out of range")
        return v
```

**Trade-off:** Libraries like `pydantic` handle nested objects, custom validators, and type coercion out of the box, but they add a dependency and hide the validation mechanics. Writing your own validator teaches you exactly how schema checking works, which matters when you need to customize behavior or debug failures.
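
For the `jsonschema` route specifically, `Draft7Validator.iter_errors` reproduces the error-list behaviour of the main solution. A sketch — note that `jsonschema` uses standard JSON Schema keywords (`minimum`, `required`, `additionalProperties`) rather than this project's custom `fields` format:

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
    },
    "required": ["name"],
    "additionalProperties": False,
}

# iter_errors yields every violation instead of raising on the first,
# which mirrors the list-of-errors design in the main solution.
record = {"age": -5, "extra": 1}
errors = [err.message for err in Draft7Validator(schema).iter_errors(record)]
# three problems: missing "name", age below minimum, unexpected "extra"
```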

### Using `try/except` per record instead of error lists

```python
def validate_or_raise(record, schema):
    for field, rules in schema["fields"].items():
        # .get() tolerates rules that omit the "required" key
        if rules.get("required") and field not in record:
            raise ValueError(f"Missing {field}")
```

**Trade-off:** Raising exceptions is simpler to write but only reports the first error per record. The list-based approach in the main solution is better for batch data work where you want to see all problems at once.

## Common Pitfalls

1. **Forgetting that `bool` is a subclass of `int` in Python** — `isinstance(True, int)` returns `True`. If your schema has both `"boolean"` and `"integer"` types, check for `bool` first or a boolean value will pass an integer check.
2. **Checking `value is None` but not `field_name not in record`** — A field can be present with value `None` (explicit null in JSON), or entirely absent from the dict. Both are "missing" but require different checks.
3. **Mutating the input records during validation** — If you add or modify fields on the original dicts, subsequent validation passes or downstream code will see corrupted data. Always work on copies if you need to transform.
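
The first pitfall is easy to demonstrate with plain Python (no project code involved); `is_strict_int` is just an illustrative helper name:

```python
# bool is a subclass of int, so a naive isinstance check accepts booleans.
print(isinstance(True, int))  # True — this is the trap

def is_strict_int(value) -> bool:
    """Accept real integers but reject True/False."""
    return isinstance(value, int) and not isinstance(value, bool)

print(is_strict_int(7))     # True
print(is_strict_int(True))  # False
```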
