Add CV screening example with curated resume test set #4607
A walkthrough demo for classifying CVs against a job spec with Agenta:
- Curated test set of 30 real Markdown CVs (from the public opensporks/resumes dataset on Hugging Face, a mirror of the Kaggle Resume Dataset), hand-labeled against an IT Manager job spec
- prepare_testset.py rebuilds the CSV reproducibly and can upload it to Agenta via the SDK
- create_app.py creates the completion app with the screening prompt and structured-output JSON schema, and deploys it to production
- Streamlit demo UI: PDF upload -> Markdown (markitdown) -> prompt fetched from the Agenta registry -> structured score dashboard
- Sample CV PDFs (one per classification) generated from the test set

https://claude.ai/code/session_01YMbf4sUb2VBFQHGNKv6yh3
The Streamlit app now shows a thumbs up/down form with an optional comment after each screening. Submitting it attaches the feedback to the screening's trace in Agenta as an annotation (evaluator slug 'user-feedback'), following the capture-user-feedback cookbook: the invocation link is captured inside the instrumented classify_cv call and the annotation is POSTed to /api/simple/traces/. Screening results now persist in session state so the result and feedback form survive Streamlit reruns. Entry scripts load .env via python-dotenv, matching the documented setup flow. https://claude.ai/code/session_01YMbf4sUb2VBFQHGNKv6yh3
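The annotation request described above can be sketched as a pure payload builder. This is a minimal sketch, assuming the body shape quoted in the review of this PR (`trace.data.outputs`, `trace.references.evaluator.slug`, `trace.links.invocation`). The function name `build_feedback_payload`, the output keys `score` and `comment`, and the example `trace_id`/`span_id` values are hypothetical and not taken from the PR.

```python
from typing import Any, Dict

FEEDBACK_EVALUATOR_SLUG = "user-feedback"  # evaluator slug named in the commit message


def build_feedback_payload(
    invocation: Dict[str, str], thumbs_up: bool, comment: str = ""
) -> Dict[str, Any]:
    """Build the JSON body for a feedback annotation attached to one trace."""
    # Hypothetical output keys: the PR does not show how `outputs` is built.
    outputs: Dict[str, Any] = {"score": thumbs_up}
    if comment.strip():
        outputs["comment"] = comment.strip()
    return {
        "trace": {
            "data": {"outputs": outputs},
            "references": {"evaluator": {"slug": FEEDBACK_EVALUATOR_SLUG}},
            "links": {"invocation": invocation},
        }
    }


# Placeholder invocation link; the real one is captured during the screening call.
payload = build_feedback_payload(
    {"trace_id": "abc", "span_id": "def"}, thumbs_up=False, comment=" too junior "
)
print(payload["trace"]["data"]["outputs"])
```

Keeping the payload construction separate from the HTTP call makes it possible to unit-test the feedback shape without a running Agenta instance.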
📝 Walkthrough
This PR introduces a complete, production-ready CV screening example for the Python SDK. It includes shared configuration with a structured JSON schema for classification results, scripts to prepare a curated test set from external resume data, an Agenta deployment script, and an interactive Streamlit demo that fetches prompts from Agenta, runs LLM screening, and collects user feedback as trace annotations.
Changes: CV Screening Example
Sequence Diagram(s)
sequenceDiagram
participant User
participant StreamlitApp as Streamlit App
participant Agenta
participant OpenAI
User->>StreamlitApp: Upload CV PDF
StreamlitApp->>StreamlitApp: Convert PDF to Markdown
StreamlitApp->>Agenta: Fetch production prompt config
Agenta-->>StreamlitApp: Return prompt + LLM config
User->>StreamlitApp: Click "Screen CV" button
StreamlitApp->>OpenAI: Call chat completion<br/>with prompt + schema
OpenAI-->>StreamlitApp: Return structured JSON<br/>(scores, requirements, classification)
StreamlitApp->>StreamlitApp: Render classification banner<br/>+ score metrics + requirements
StreamlitApp->>Agenta: Capture trace invocation ID
User->>StreamlitApp: Submit feedback<br/>(thumbs up/down + comment)
StreamlitApp->>Agenta: POST feedback as trace annotation
Agenta-->>StreamlitApp: Success response (200/202)
StreamlitApp-->>User: Show feedback confirmation
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/python/cv-screening/requirements.txt (1)
1-19: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
Pin dependency versions to avoid pulling vulnerable packages.
The requirements file specifies no version constraints, which means pip install will fetch the latest versions of all packages and their transitive dependencies. OSV Scanner has flagged numerous critical and high-severity vulnerabilities in transitive dependencies that could be pulled in, including:
- aiohttp: 23 CRITICAL issues (SSRF, header injection, DoS, credential leaks)
- gitpython: 9 CRITICAL issues (RCE, path traversal, arbitrary code execution)
- litellm: 13 CRITICAL issues (SSTI, SQL injection, SSRF, eval-based RCE)
- pillow: 6 CRITICAL issues (arbitrary code execution, buffer overflow, DoS)
- pyarrow: 3 CRITICAL issues (arbitrary code execution)
While this is example code, users may run it in environments connected to real data or networks. Unpinned dependencies create a supply-chain risk.
🔒 Recommendation
Generate a pinned requirements.txt by running:
pip install -r requirements.txt
pip freeze > requirements.txt
Then review the frozen versions and update any packages flagged by pip-audit or OSV Scanner. Alternatively, specify minimum safe versions inline:
 # Agenta SDK + LLM client
-agenta
-openai
-python-dotenv
+agenta>=0.28.0
+openai>=1.0.0
+python-dotenv>=1.0.0
For the remaining packages, apply the same pattern after verifying secure minimum versions.
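A quick way to see which entries still need a constraint is to scan the requirements text for lines without any version specifier. The following is a minimal sketch; the helper name `unpinned_requirements` and the sample file contents are illustrative assumptions. It only detects missing specifiers and does not check for vulnerabilities, which remains the job of pip-audit or OSV Scanner. Direct-URL requirements (`pkg @ https://...`) would also be reported as unpinned by this simple check.

```python
import re
from typing import List

# Any comparison operator counts as a version specifier (==, >=, ~=, ...).
_SPECIFIER = re.compile(r"(==|>=|<=|~=|!=|<|>)")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that carry no version specifier at all."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e)
            continue
        if not _SPECIFIER.search(line):
            unpinned.append(line)
    return unpinned


# Illustrative file contents, not the PR's actual requirements.txt.
sample = """\
# Agenta SDK + LLM client
agenta
openai>=1.0.0
python-dotenv
"""
print(unpinned_requirements(sample))
```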
🧹 Nitpick comments (1)
examples/python/cv-screening/Readme.md (1)
20-22: 💤 Low value
Add language identifier to code fence.
The code fence starting at line 20 lacks a language identifier, triggering a markdownlint warning (MD040). While this is ASCII art rather than code, specifying text or leaving it as triple-backticks with no syntax highlighting improves consistency.
📝 Proposed fix
-```
+```text
 PDF upload ──> Markdown (markitdown) ──> prompt fetched from Agenta ──> LLM ──> structured scores
ℹ️ Review info
⚙️ Run configuration: Organization UI | Review profile: CHILL | Plan: Pro Plus | Run ID: `7c1b7401-bc27-47de-b94d-d0d734c5558f`
📥 Commits: Reviewing files that changed from the base of the PR and between aed2d47357cc8d88347011835c7cc1f3f7f08ea7 and c28d1a2dca9c1982a3b8885929de57447d80e256.
⛔ Files ignored due to path filters (4)
* `examples/python/cv-screening/data/sample_cvs/candidate_chef.pdf` is excluded by `!**/*.pdf`
* `examples/python/cv-screening/data/sample_cvs/candidate_it_manager.pdf` is excluded by `!**/*.pdf`
* `examples/python/cv-screening/data/sample_cvs/candidate_it_supervisor.pdf` is excluded by `!**/*.pdf`
* `examples/python/cv-screening/data/testset.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (10)
* `examples/python/Readme.md`
* `examples/python/cv-screening/.env.example`
* `examples/python/cv-screening/Readme.md`
* `examples/python/cv-screening/app.py`
* `examples/python/cv-screening/config.py`
* `examples/python/cv-screening/create_app.py`
* `examples/python/cv-screening/data/.gitignore`
* `examples/python/cv-screening/make_sample_pdfs.py`
* `examples/python/cv-screening/prepare_testset.py`
* `examples/python/cv-screening/requirements.txt`
    response = requests.post(
        f"{host}/api/simple/traces/",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {os.environ['AGENTA_API_KEY']}",
        },
        json={
            "trace": {
                "data": {"outputs": outputs},
                "references": {"evaluator": {"slug": FEEDBACK_EVALUATOR_SLUG}},
                "links": {"invocation": invocation},
            }
        },
        timeout=30,
    )
    return response.status_code in (200, 202)
Handle feedback POST failures explicitly.
requests.post(...) can raise on timeout/connection issues, which can crash feedback submission instead of returning a clean UI error path.
Proposed fix
 def send_feedback(invocation: dict, thumbs_up: bool, comment: str) -> bool:
@@
-    response = requests.post(
-        f"{host}/api/simple/traces/",
-        headers={
-            "Content-Type": "application/json",
-            "Authorization": f"ApiKey {os.environ['AGENTA_API_KEY']}",
-        },
-        json={
-            "trace": {
-                "data": {"outputs": outputs},
-                "references": {"evaluator": {"slug": FEEDBACK_EVALUATOR_SLUG}},
-                "links": {"invocation": invocation},
-            }
-        },
-        timeout=30,
-    )
-    return response.status_code in (200, 202)
+    try:
+        response = requests.post(
+            f"{host}/api/simple/traces/",
+            headers={
+                "Content-Type": "application/json",
+                "Authorization": f"ApiKey {os.environ['AGENTA_API_KEY']}",
+            },
+            json={
+                "trace": {
+                    "data": {"outputs": outputs},
+                    "references": {"evaluator": {"slug": FEEDBACK_EVALUATOR_SLUG}},
+                    "links": {"invocation": invocation},
+                }
+            },
+            timeout=30,
+        )
+    except requests.RequestException:
+        return False
+    return response.ok

    cv_markdown = pdf_to_markdown(uploaded.getvalue())
    with st.expander("Extracted Markdown", expanded=False):
        st.markdown(cv_markdown)

    if st.button("Screen candidate", type="primary"):
        with st.spinner("Evaluating CV against the job spec ..."):
            result = classify_cv(cv_markdown, config)
        st.session_state["screening"] = {"cv": cv_markdown, "result": result}
Guard conversion/classification with user-facing error handling.
The main screening path can raise on PDF parsing, LLM call, or JSON decoding; currently those failures bubble up and break the interaction.
Proposed fix
-    cv_markdown = pdf_to_markdown(uploaded.getvalue())
+    try:
+        cv_markdown = pdf_to_markdown(uploaded.getvalue())
+    except Exception as exc:
+        st.error(f"Could not read this PDF: {exc}")
+        return
@@
     if st.button("Screen candidate", type="primary"):
         with st.spinner("Evaluating CV against the job spec ..."):
-            result = classify_cv(cv_markdown, config)
+            try:
+                result = classify_cv(cv_markdown, config)
+            except Exception as exc:
+                st.error(f"Screening failed: {exc}")
+                return
         st.session_state["screening"] = {"cv": cv_markdown, "result": result}
    try:
        ag.AppManager.create(app_slug=APP_SLUG, app_type="SERVICE:completion")
    except Exception as exc:  # noqa: BLE001 - app may already exist
        print(f" Application not created ({exc}); assuming it already exists.")

    print(f"Committing prompt to variant '{VARIANT_SLUG}' ...")
    try:
        variant = ag.VariantManager.create(
            parameters=PROMPT_CONFIG,
            app_slug=APP_SLUG,
            variant_slug=VARIANT_SLUG,
        )
    except Exception:
        # The variant already exists: commit a new version instead.
        variant = ag.VariantManager.commit(
            parameters=PROMPT_CONFIG,
            app_slug=APP_SLUG,
            variant_slug=VARIANT_SLUG,
        )
Handle duplicate-resource paths explicitly and re-raise real failures.
Both create steps treat any exception as an “already exists” case. That can silently hide real failures (auth, connectivity, API/server errors) and still attempt production deploy.
Proposed fix
     print(f"Creating application '{APP_SLUG}' ...")
     try:
         ag.AppManager.create(app_slug=APP_SLUG, app_type="SERVICE:completion")
     except Exception as exc:  # noqa: BLE001 - app may already exist
-        print(f" Application not created ({exc}); assuming it already exists.")
+        if "already exists" in str(exc).lower():
+            print(f" Application already exists ({exc}).")
+        else:
+            raise
     print(f"Committing prompt to variant '{VARIANT_SLUG}' ...")
     try:
         variant = ag.VariantManager.create(
             parameters=PROMPT_CONFIG,
             app_slug=APP_SLUG,
             variant_slug=VARIANT_SLUG,
         )
-    except Exception:
+    except Exception as exc:
         # The variant already exists: commit a new version instead.
-        variant = ag.VariantManager.commit(
-            parameters=PROMPT_CONFIG,
-            app_slug=APP_SLUG,
-            variant_slug=VARIANT_SLUG,
-        )
+        if "already exists" in str(exc).lower():
+            variant = ag.VariantManager.commit(
+                parameters=PROMPT_CONFIG,
+                app_slug=APP_SLUG,
+                variant_slug=VARIANT_SLUG,
+            )
+        else:
+            raise
PARQUET_URL = (
    "https://huggingface.co/api/datasets/opensporks/resumes"
    "/parquet/default/train/0.parquet"
)
Pin/verify the source dataset artifact for true reproducibility.
The script currently downloads from a mutable source path. If upstream data changes, data/testset.csv can drift over time, which conflicts with the reproducible-build objective.
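Verifying a downloaded artifact against a recorded digest is one common way to detect upstream drift. The sketch below shows the idea in isolation, using stand-in bytes rather than the real Parquet file. The helper name `verify_sha256` is hypothetical, and the digest here is computed on the spot purely for illustration; in practice it would be recorded once from a trusted download and then hard-coded.

```python
import hashlib


def verify_sha256(content: bytes, expected_hex: str) -> bytes:
    """Return content unchanged if its SHA-256 digest matches, else raise."""
    digest = hashlib.sha256(content).hexdigest()
    if digest != expected_hex.lower():
        raise RuntimeError(f"hash mismatch: got {digest}, expected {expected_hex}")
    return content


# Stand-in bytes; in the real script this would be the downloaded Parquet file.
artifact = b"example parquet bytes"
# Recorded once from a trusted copy, then stored as a constant.
known_good = hashlib.sha256(artifact).hexdigest()

verify_sha256(artifact, known_good)  # matching content passes silently
```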
Proposed fix
 import argparse
 import asyncio
 import csv
+import hashlib
 import re
 import sys
 from pathlib import Path
@@
 PARQUET_URL = (
@@
 )
+EXPECTED_PARQUET_SHA256 = "<fill-with-known-good-sha256>"
@@
 def download_dataset() -> pd.DataFrame:
@@
     response = requests.get(PARQUET_URL, timeout=120)
     response.raise_for_status()
-    CACHE_PATH.write_bytes(response.content)
+    content = response.content
+    digest = hashlib.sha256(content).hexdigest()
+    if digest != EXPECTED_PARQUET_SHA256:
+        raise RuntimeError(
+            f"Dataset artifact hash mismatch: got {digest}, expected {EXPECTED_PARQUET_SHA256}"
+        )
+    CACHE_PATH.write_bytes(content)

    for resume_id, expected in CURATED_RESUMES.items():
        matches = df[df["ID"] == resume_id]
        if matches.empty:
            print(f"warning: resume {resume_id} not found in dataset, skipping")
            continue
Don’t silently skip curated IDs; fail fast on missing records.
Skipping missing curated resumes can silently shrink the testset and invalidate evaluation comparisons.
Proposed fix
 def build_testset(df: pd.DataFrame) -> list[dict]:
     rows = []
+    missing_ids = []
     for resume_id, expected in CURATED_RESUMES.items():
         matches = df[df["ID"] == resume_id]
         if matches.empty:
-            print(f"warning: resume {resume_id} not found in dataset, skipping")
+            missing_ids.append(resume_id)
             continue
@@
-    return rows
+    if missing_ids:
+        raise RuntimeError(
+            f"Curated resumes missing from source dataset: {missing_ids}"
+        )
+    return rows

…pt revision
Move all the AI logic out of the Streamlit app into a new screening.py
module (prompt fetch, the LLM call, tracing, feedback), leaving app.py as
a UI-only shell. Any other frontend can import screening.py unchanged.
Tracing improvements so screenings are easy to act on from the UI:
- Auto-instrument the OpenAI client with OpenInference, so every trace has
a child LLM span with the exact messages, token counts, and cost.
- classify_cv takes its inputs as a dict whose keys match the prompt input
variables ({"cv": ...}), and the prompt config is kept out of the trace
(ignore_inputs). The span data then mirrors the completion app's inputs.
- Link each span to the deployed prompt revision via ag.tracing.store_refs,
so traces filter by app/environment and open in the playground on the
right revision with inputs pre-filled.
Also fix create_app.py to read variant.variant_version as an attribute
(VariantManager now returns a ConfigurationResponse, not a dict).
The walkthrough needed a leaner story: the output schema is now tech_match / experience_match / overall_match, each with a short reason, plus the missing-requirements list. overall_match is a holistic hire-or-not judgment, so a requirement like a language can flip it while the other two stay true. The test set drops the bookkeeping columns and carries one expected_* column per dimension; empty cells are skipped by the code evaluator documented in the Readme.
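The skip-empty-cells behavior can be illustrated with a small scoring function. This is a sketch under stated assumptions, not the evaluator documented in the Readme: the function name `score_row`, the column names `expected_tech_match`, `expected_experience_match`, and `expected_overall_match`, the flat boolean output fields, and the string encoding of labels are all guesses based on the description above.

```python
from typing import Dict, Optional

DIMENSIONS = ("tech_match", "experience_match", "overall_match")


def _to_bool(cell: str) -> bool:
    """Interpret a CSV cell as a boolean label (assumed encoding)."""
    return cell.strip().lower() in {"true", "1", "yes"}


def score_row(output: Dict[str, object], row: Dict[str, str]) -> Optional[float]:
    """Fraction of labeled dimensions the model got right; None if none are labeled."""
    checked = 0
    correct = 0
    for dim in DIMENSIONS:
        expected = row.get(f"expected_{dim}", "")
        if not expected.strip():  # empty cell: dimension is unlabeled, skip it
            continue
        checked += 1
        if bool(output.get(dim)) == _to_bool(expected):
            correct += 1
    return correct / checked if checked else None


# Illustrative values only: experience is unlabeled, overall_match disagrees.
output = {"tech_match": True, "experience_match": True, "overall_match": False}
row = {
    "expected_tech_match": "true",
    "expected_experience_match": "",
    "expected_overall_match": "true",
}
print(score_row(output, row))
```

Returning `None` for a fully unlabeled row, rather than 0 or 1, keeps such rows from skewing an aggregate score in either direction.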