Added new demo functionality to evaluate Supervisor Agent #15
dmatrix wants to merge 1 commit into databricks-solutions:main from
Conversation
dmatrix
commented
May 6, 2026
- Added new demo functionality to evaluate the Supervisor Agent
- Updated DABs to deploy the new Databricks notebook as part of the bundle
- Updated the README to describe how to show this demo
Signed-off-by: Jules Damji <dmatrix@comcast.net>
A few minor issues:

1. The endpoint default is hardcoded to your deployment. Consider resolving it instead:

```python
def resolve_endpoint(w, name: str) -> str:
    if name.startswith("mas-") and name.endswith("-endpoint"):
        return name
    for a in w.supervisor_agents.list_supervisor_agents():
        if a.display_name == name:
            return a.endpoint_name or w.supervisor_agents.get_supervisor_agent(name=a.name).endpoint_name
    raise RuntimeError(f"No Supervisor Agent named '{name}'")
```

Default the widget to "Bee Colony Health Advisor" (the canonical name setup_agents.py creates). Also rename the variable: it currently holds an endpoint but is called supervisor_name.
djliden
left a comment
A few minor suggestions—thanks for adding!
I would also suggest getting rid of the "make a judge with genie code" part of the README and replacing it entirely with this; but we can ask genie code to look at the traces.
Rename supervisor_name to supervisor_name_endpoint, and have the widget default to the placeholder string "your_supervisor_endpoint_name".
This comment is a misnomer, since anything under scripts gets deployed anyway; no need to add this explicitly.
Removed
renamed the image
What is the type for
Ahh, I misunderstood; I thought you wanted the eval notebook to run as part of the DAB (not just be deployed).
I don't see any additional commits on this PR. I added these notes as suggestions for clarity. I would also suggest making the eval part runnable via the DAB, maybe with something like:
djliden
left a comment
(added inline suggestions to clarify a few of the earlier suggestions)
```python
dbutils.widgets.text(
    "supervisor_name",
    "mas-f6c439c0-endpoint",
    "Supervisor Agent Endpoint Name",
)
dbutils.widgets.text(
    "judge_model",
    "databricks:/databricks-gpt-5-4",
    "Judge Model URI",
)

supervisor_name = dbutils.widgets.get("supervisor_name")
judge_model = dbutils.widgets.get("judge_model")

print(f"Supervisor: {supervisor_name}")
print(f"Judge model: {judge_model}")
```
Suggested change, replacing:

```python
dbutils.widgets.text(
    "supervisor_name",
    "mas-f6c439c0-endpoint",
    "Supervisor Agent Endpoint Name",
)
dbutils.widgets.text(
    "judge_model",
    "databricks:/databricks-gpt-5-4",
    "Judge Model URI",
)
supervisor_name = dbutils.widgets.get("supervisor_name")
judge_model = dbutils.widgets.get("judge_model")
print(f"Supervisor: {supervisor_name}")
print(f"Judge model: {judge_model}")
```

with:

```python
dbutils.widgets.text(
    "supervisor",
    "Bee Colony Health Advisor",
    "Supervisor Agent (display name or mas-XXXXXXXX-endpoint)",
)
dbutils.widgets.text(
    "judge_model",
    "databricks:/databricks-gpt-5-4",
    "Judge Model URI",
)
supervisor = dbutils.widgets.get("supervisor")
judge_model = dbutils.widgets.get("judge_model")
print(f"Supervisor: {supervisor}")
print(f"Judge model: {judge_model}")
```
This uses the "canonical" name by default (Bee Colony Health Advisor), which we will resolve to the endpoint name, so users do not need to find the endpoint manually most of the time.
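For what it's worth, the resolution order (a literal `mas-*-endpoint` name passes through; a display name gets looked up) can be sanity-checked offline with stubs. The `Fake*` classes below are made-up stand-ins for the SDK's supervisor-agent objects, not real databricks-sdk types:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the SDK's supervisor-agent objects, so the
# resolution logic can run without a workspace connection.
@dataclass
class FakeAgent:
    display_name: str
    name: str
    endpoint_name: Optional[str]

class FakeSupervisorAgents:
    def __init__(self, agents):
        self._agents = agents

    def list_supervisor_agents(self):
        return list(self._agents)

    def get_supervisor_agent(self, name):
        return next(a for a in self._agents if a.name == name)

class FakeWorkspaceClient:
    def __init__(self, agents):
        self.supervisor_agents = FakeSupervisorAgents(agents)

def resolve_endpoint(w, name: str) -> str:
    # Already an endpoint name: pass it through unchanged.
    if name.startswith("mas-") and name.endswith("-endpoint"):
        return name
    # Otherwise treat it as a display name and look up its endpoint.
    for a in w.supervisor_agents.list_supervisor_agents():
        if a.display_name == name:
            return a.endpoint_name or w.supervisor_agents.get_supervisor_agent(name=a.name).endpoint_name
    raise RuntimeError(f"No Supervisor Agent named '{name}'")

w = FakeWorkspaceClient([FakeAgent("Bee Colony Health Advisor", "agent-1", "mas-f6c439c0-endpoint")])
print(resolve_endpoint(w, "Bee Colony Health Advisor"))  # mas-f6c439c0-endpoint
print(resolve_endpoint(w, "mas-deadbeef-endpoint"))      # mas-deadbeef-endpoint
```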
```python
import time
from typing import Literal

import mlflow
from databricks_openai import DatabricksOpenAI
from mlflow.entities import Feedback
from mlflow.genai.judges import make_judge
from mlflow.genai.scorers import Correctness, Guidelines, scorer

client = DatabricksOpenAI()

current_user = (
    spark.sql("SELECT current_user()").first()[0]
)
experiment_name = (
    f"/Users/{current_user}/bee_pollinator_eval"
)
mlflow.openai.autolog()
mlflow.set_experiment(experiment_name)
print(f"MLflow experiment: {experiment_name}")
```
Suggested change, replacing:

```python
import time
from typing import Literal
import mlflow
from databricks_openai import DatabricksOpenAI
from mlflow.entities import Feedback
from mlflow.genai.judges import make_judge
from mlflow.genai.scorers import Correctness, Guidelines, scorer
client = DatabricksOpenAI()
current_user = (
    spark.sql("SELECT current_user()").first()[0]
)
experiment_name = (
    f"/Users/{current_user}/bee_pollinator_eval"
)
mlflow.openai.autolog()
mlflow.set_experiment(experiment_name)
print(f"MLflow experiment: {experiment_name}")
```

with:

```python
import time
from typing import Literal

import mlflow
from databricks.sdk import WorkspaceClient
from databricks_openai import DatabricksOpenAI
from mlflow.entities import Feedback
from mlflow.genai.judges import make_judge
from mlflow.genai.scorers import Correctness, Guidelines, scorer


def resolve_endpoint(w: WorkspaceClient, name: str) -> str:
    """Accept a Supervisor Agent display name or its serving endpoint;
    return the endpoint name (mas-XXXXXXXX-endpoint)."""
    if name.startswith("mas-") and name.endswith("-endpoint"):
        return name
    for a in w.supervisor_agents.list_supervisor_agents():
        if a.display_name == name:
            endpoint = a.endpoint_name
            if not endpoint and a.name:
                endpoint = w.supervisor_agents.get_supervisor_agent(name=a.name).endpoint_name
            if not endpoint:
                raise RuntimeError(
                    f"Supervisor Agent '{name}' has no endpoint_name yet — still provisioning?"
                )
            return endpoint
    available = [a.display_name for a in w.supervisor_agents.list_supervisor_agents()]
    raise RuntimeError(
        f"No Supervisor Agent named '{name}'. Available: {available}"
    )


w = WorkspaceClient()
supervisor_endpoint = resolve_endpoint(w, supervisor)
print(f"Endpoint: {supervisor_endpoint}")

client = DatabricksOpenAI()
current_user = spark.sql("SELECT current_user()").first()[0]
experiment_name = f"/Users/{current_user}/bee_pollinator_eval"
mlflow.openai.autolog()
mlflow.set_experiment(experiment_name)
print(f"MLflow experiment: {experiment_name}")
```
This resolves the supervisor display name into its endpoint name.
WorkspaceClient has no attribute called supervisor_agents, so this won't work. Am debugging it...
https://databricks-sdk-py.readthedocs.io/en/latest/workspace/supervisoragents/supervisor_agents.html <- make sure you're using the most recent version, as this was only recently added
(+ see https://github.com/databricks-solutions/devrel-examples/blob/main/demos/bee-pollinator/scripts/setup_agents.py for some tested usage patterns)
https://databricks-sdk-py.readthedocs.io/en/latest/clients/workspace.html looks like a BETA API??
Yes as is the knowledge assistant API (see https://docs.databricks.com/api/workspace/supervisoragents and https://docs.databricks.com/api/workspace/knowledgeassistants). Both work well with the current version of the SDK. Just have to keep an eye out for any breaking changes in the future.
What databricks-sdk version are you using that has the latest API?
Let's try installing databricks-sdk>=0.106.0.
```python
def predict_supervisor(request: str) -> str:
    """Query the Supervisor Agent and return the response text."""
    response = client.responses.create(
        model=supervisor_name,
        input=[{"role": "user", "content": request}],
    )
    answer = "".join([
        block.text
        for item in response.output
        if hasattr(item, "content")
        for block in item.content
        if hasattr(block, "text")
    ])
    return answer
```
Suggested change, replacing:

```python
def predict_supervisor(request: str) -> str:
    """Query the Supervisor Agent and return the response text."""
    response = client.responses.create(
        model=supervisor_name,
        input=[{"role": "user", "content": request}],
    )
    answer = "".join([
        block.text
        for item in response.output
        if hasattr(item, "content")
        for block in item.content
        if hasattr(block, "text")
    ])
    return answer
```

with:

```python
def predict_supervisor(request: str) -> str:
    """Query the Supervisor Agent and return the response text."""
    response = client.responses.create(
        model=supervisor_endpoint,
        input=[{"role": "user", "content": request}],
    )
    answer = "".join([
        block.text
        for item in response.output
        if hasattr(item, "content")
        for block in item.content
        if hasattr(block, "text")
    ])
    return answer
```
This uses the resolved endpoint.
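As an aside, the text-extraction part of predict_supervisor is pure Python and can be checked with a stub response. The SimpleNamespace objects below only mimic the Responses API output shape for illustration; they are not real SDK objects:

```python
from types import SimpleNamespace

def extract_text(response) -> str:
    # Same comprehension as in the notebook: join every text block from
    # every output item that carries content; items without a .content
    # attribute (e.g. tool-call items) are skipped.
    return "".join(
        block.text
        for item in response.output
        if hasattr(item, "content")
        for block in item.content
        if hasattr(block, "text")
    )

# Stub response mimicking the Responses API output shape (an assumption).
resp = SimpleNamespace(output=[
    SimpleNamespace(content=[
        SimpleNamespace(text="Varroa mites are "),
        SimpleNamespace(text="a leading cause of colony loss."),
    ]),
    SimpleNamespace(id="tool-call-1"),  # no .content attribute, skipped
])
print(extract_text(resp))  # Varroa mites are a leading cause of colony loss.
```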
```markdown
### How to run it

1. Change to your deployed bundle directory `scripts/eval_supervisor.py` into your Databricks workspace
```
I find this instruction hard to follow; did you mean something like:

"Open scripts/eval_supervisor.py in your Databricks workspace (the bundle uploads it under /Workspace/Users//.bundle/bee-pollinator-demo/dev/files/scripts/)"
```python
# Databricks notebook source
# DBTITLE 1,Install dependencies
# MAGIC %pip install mlflow==3.11.0 databricks_openai
```
Suggested change:

```diff
-# MAGIC %pip install mlflow==3.11.0 databricks_openai
+# MAGIC %pip install mlflow>=3.11.0 databricks_openai
```
Not too eager about making the eval notebook part of the DAB run. It should be something the demoer walks through and runs at demo time, speaking to it, and possibly adding realtime monitoring after the fact.
Much better experience for both the demoer and the audience.
Ah, I have not committed the changes yet. :-) Only in my branch. Want to test it before I push.
> Not too eager about making the eval notebook part of the DAB run. It should be something the demoer walks through and runs at demo time, speaking to it, and possibly adding realtime monitoring after the fact.
> Much better experience for both the demoer and the audience.

Sounds good; ignore that comment, then. Maybe later we can add a lightweight judge/eval so the experiment is pre-populated with some traces the presenter can talk through.