Added new demo functionality to evaluate Supervisor Agent by dmatrix · Pull Request #15 · databricks-solutions/devrel-examples

dmatrix · 2026-05-06T17:59:23Z

Added new demo functionality to evaluate Supervisor Agent
Updated the DABs to deploy the new Databricks notebook as part of the bundle
Updated the README to describe how to show this demo

Signed-off-by: Jules Damji <dmatrix@comcast.net>

djliden · 2026-05-07T16:10:15Z

A few minor issues:

1. The endpoint default is hardcoded to your deployment. dbutils.widgets.text("supervisor_name", "mas-f6c439c0-endpoint", ...) won't match anyone else's workspace. Since display_name is unique per workspace (per the SDK), accept the display name and resolve it to the endpoint at startup, something like this:

def resolve_endpoint(w, name: str) -> str:
    if name.startswith("mas-") and name.endswith("-endpoint"):
        return name
    for a in w.supervisor_agents.list_supervisor_agents():
        if a.display_name == name:
            return a.endpoint_name or w.supervisor_agents.get_supervisor_agent(name=a.name).endpoint_name
    raise RuntimeError(f"No Supervisor Agent named '{name}'")

Default the widget to "Bee Colony Health Advisor" (the canonical name setup_agents.py creates). Also rename the var — it currently holds an endpoint but is called supervisor_name.
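A minimal sketch for exercising the lookup above without a live workspace: the same resolution logic, run against hypothetical in-memory stand-ins (`FakeAgent`, `FakeSupervisorAgents`, `FakeWorkspace`). These are not SDK classes; they mimic only the `display_name`, `name`, and `endpoint_name` fields and the `list_supervisor_agents()` / `get_supervisor_agent(name=...)` calls the snippet uses, and the `agents/123` identifier is made up. Unlike the snippet, this version raises when the matching agent has no endpoint yet instead of returning `None`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeAgent:
    # Hypothetical stand-in for an SDK Supervisor Agent record.
    name: str
    display_name: str
    endpoint_name: Optional[str] = None


@dataclass
class FakeSupervisorAgents:
    # Mimics only the two calls resolve_endpoint relies on.
    agents: list = field(default_factory=list)
    # Endpoint names that are only returned by a full get_supervisor_agent() call.
    detail_endpoints: dict = field(default_factory=dict)

    def list_supervisor_agents(self):
        return iter(self.agents)

    def get_supervisor_agent(self, name: str) -> FakeAgent:
        agent = next(a for a in self.agents if a.name == name)
        return FakeAgent(agent.name, agent.display_name, self.detail_endpoints.get(name))


@dataclass
class FakeWorkspace:
    supervisor_agents: FakeSupervisorAgents


def resolve_endpoint(w, name: str) -> str:
    """Return a serving endpoint name given a display name or an endpoint name."""
    # Already an endpoint name: pass it through unchanged.
    if name.startswith("mas-") and name.endswith("-endpoint"):
        return name
    for a in w.supervisor_agents.list_supervisor_agents():
        if a.display_name == name:
            endpoint = a.endpoint_name
            # As in the review snippet: fall back to a full get if the listed record has no endpoint.
            if not endpoint and a.name:
                endpoint = w.supervisor_agents.get_supervisor_agent(name=a.name).endpoint_name
            if not endpoint:
                raise RuntimeError(f"Supervisor Agent '{name}' has no endpoint yet")
            return endpoint
    raise RuntimeError(f"No Supervisor Agent named '{name}'")


ws = FakeWorkspace(FakeSupervisorAgents(
    agents=[FakeAgent("agents/123", "Bee Colony Health Advisor")],
    detail_endpoints={"agents/123": "mas-f6c439c0-endpoint"},
))
print(resolve_endpoint(ws, "Bee Colony Health Advisor"))  # mas-f6c439c0-endpoint
print(resolve_endpoint(ws, "mas-abc12345-endpoint"))      # mas-abc12345-endpoint
```

In the notebook itself the real `WorkspaceClient()` would replace `FakeWorkspace`; the stubs are only for checking the branching.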

PR description claims a databricks.yml change to wire the notebook into the bundle, but the diff doesn't include one—add notebook to DAB?
Nits: eval_rational.png → eval_rationale.png; "Change to your deployed bundle directory scripts/eval_supervisor.py into your Databricks workspace" reads like words got dropped; maybe pin mlflow>=3.11.0 rather than mlflow==3.11.0?

djliden

A few minor suggestions—thanks for adding!

djliden · 2026-05-07T17:11:21Z

I would also suggest getting rid of the "make a judge with genie code" part from the readme and replacing it entirely with this—but we can ask genie code to look at the traces.

dmatrix · 2026-05-07T19:23:21Z

Default the widget to "Bee Colony Health Advisor" (the canonical name setup_agents.py creates). Also rename the var — it currently holds an endpoint but is called supervisor_name.

Renamed supervisor_name to supervisor_name_endpoint, and its widget text will hold the placeholder string "your_supervisor_endpoint_name".

PR description claims a databricks.yml change to wire the notebook into the bundle, but the diff doesn't include one—add notebook to DAB?

That line in the PR description is a misnomer: anything under scripts/ gets deployed anyway, so there is no need to add the notebook explicitly.

I would also suggest getting rid of the "make a judge with genie code" part from the readme and replacing it entirely with this—but we can ask genie code to look at the traces.

Removed

Nits: eval_rational.png → eval_rationale.png; "Change to your deployed bundle directory scripts/eval_supervisor.py into your Databricks workspace"

renamed the image

reads like words got dropped; >=3.11.0 rather than mlflow==3.11.0 maybe?
Not sure where words were dropped. The prerequisites now list mlflow==3.11.0.

def resolve_endpoint(w, name: str) -> str:
    if name.startswith("mas-") and name.endswith("-endpoint"):
        return name
    for a in w.supervisor_agents.list_supervisor_agents():
        if a.display_name == name:
            return a.endpoint_name or w.supervisor_agents.get_supervisor_agent(name=a.name).endpoint_name
    raise RuntimeError(f"No Supervisor Agent named '{name}'")

What is the type of the w argument here? And where should this go in the notebook? It makes sense to place it after fetching the variables from the widgets.

djliden · 2026-05-07T20:34:03Z

Ah, I misunderstood: I thought you wanted the eval notebook to run as part of the DAB (not just be deployed).

removed
renamed the image

I don't see any additional commits on this PR

I added these notes as suggestions for clarity—I would also suggest making the eval part runnable via the DAB, maybe with something like:

...
  jobs:
    setup_demo:
      # ... existing job unchanged ...

    eval_demo:
      name: "[${bundle.target}] bee-pollinator-eval"
      tasks:
        - task_key: evaluate_supervisor
          notebook_task:
            notebook_path: ./scripts/eval_supervisor.py
            source: WORKSPACE
            base_parameters:
              supervisor: "Bee Colony Health Advisor"
              judge_model: "databricks:/databricks-claude-opus-4-7"   # or whatever default
          environment_key: default
      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              - databricks-sdk>=0.106.0
              - databricks-openai>=0.5.0
              - mlflow>=3.11.0
...
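One thing worth checking with a job definition like this: a notebook task's `base_parameters` reach the notebook through its widgets by name, so the keys must match the widget names in eval_supervisor.py (`supervisor` in the suggested widget change, versus `supervisor_name_endpoint` in the rename discussed earlier), or the value won't land where intended. A minimal pre-deploy check, assuming the names are collected by hand; `find_unknown_params` is a hypothetical helper, not part of the Databricks CLI or SDK.

```python
def find_unknown_params(base_parameters: dict, widget_names: set) -> list:
    """Return job parameter keys that do not match any notebook widget name."""
    return sorted(key for key in base_parameters if key not in widget_names)


# Values copied from the job sketch above.
base_parameters = {
    "supervisor": "Bee Colony Health Advisor",
    "judge_model": "databricks:/databricks-claude-opus-4-7",
}

# Widget names from the suggested widget change: everything matches.
print(find_unknown_params(base_parameters, {"supervisor", "judge_model"}))  # []

# Widget names after the supervisor_name_endpoint rename: "supervisor" is orphaned.
print(find_unknown_params(base_parameters, {"supervisor_name_endpoint", "judge_model"}))  # ['supervisor']
```

Whichever widget name wins, the YAML key and the `dbutils.widgets.text(...)` name need to move together.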

djliden

(added inline suggestions to clarify a few of the earlier suggestions)

djliden · 2026-05-07T20:18:07Z

+dbutils.widgets.text(
+    "supervisor_name",
+    "mas-f6c439c0-endpoint",
+    "Supervisor Agent Endpoint Name",
+)
+dbutils.widgets.text(
+    "judge_model",
+    "databricks:/databricks-gpt-5-4",
+    "Judge Model URI",
+)
+
+supervisor_name = dbutils.widgets.get("supervisor_name")
+judge_model = dbutils.widgets.get("judge_model")
+
+print(f"Supervisor: {supervisor_name}")
+print(f"Judge model: {judge_model}")


Suggested change

-dbutils.widgets.text(
-    "supervisor_name",
-    "mas-f6c439c0-endpoint",
-    "Supervisor Agent Endpoint Name",
-)
-dbutils.widgets.text(
-    "judge_model",
-    "databricks:/databricks-gpt-5-4",
-    "Judge Model URI",
-)
-
-supervisor_name = dbutils.widgets.get("supervisor_name")
-judge_model = dbutils.widgets.get("judge_model")
-
-print(f"Supervisor: {supervisor_name}")
-print(f"Judge model: {judge_model}")
+dbutils.widgets.text(
+    "supervisor",
+    "Bee Colony Health Advisor",
+    "Supervisor Agent (display name or mas-XXXXXXXX-endpoint)",
+)
+dbutils.widgets.text(
+    "judge_model",
+    "databricks:/databricks-gpt-5-4",
+    "Judge Model URI",
+)
+
+supervisor = dbutils.widgets.get("supervisor")
+judge_model = dbutils.widgets.get("judge_model")
+
+print(f"Supervisor: {supervisor}")
+print(f"Judge model: {judge_model}")

Uses the "canonical" name (Bee Colony Health Advisor) by default, which we then resolve to the endpoint name, so most of the time users don't need to look up the endpoint manually.

djliden · 2026-05-07T20:21:18Z

+import time
+from typing import Literal
+
+import mlflow
+from databricks_openai import DatabricksOpenAI
+from mlflow.entities import Feedback
+from mlflow.genai.judges import make_judge
+from mlflow.genai.scorers import Correctness, Guidelines, scorer
+
+client = DatabricksOpenAI()
+
+current_user = (
+    spark.sql("SELECT current_user()").first()[0]
+)
+experiment_name = (
+    f"/Users/{current_user}/bee_pollinator_eval"
+)
+mlflow.openai.autolog()
+mlflow.set_experiment(experiment_name)
+print(f"MLflow experiment: {experiment_name}")


Suggested change

-import time
-from typing import Literal
-
-import mlflow
-from databricks_openai import DatabricksOpenAI
-from mlflow.entities import Feedback
-from mlflow.genai.judges import make_judge
-from mlflow.genai.scorers import Correctness, Guidelines, scorer
-
-client = DatabricksOpenAI()
-
-current_user = (
-    spark.sql("SELECT current_user()").first()[0]
-)
-experiment_name = (
-    f"/Users/{current_user}/bee_pollinator_eval"
-)
-mlflow.openai.autolog()
-mlflow.set_experiment(experiment_name)
-print(f"MLflow experiment: {experiment_name}")
+import time
+from typing import Literal
+
+import mlflow
+from databricks.sdk import WorkspaceClient
+from databricks_openai import DatabricksOpenAI
+from mlflow.entities import Feedback
+from mlflow.genai.judges import make_judge
+from mlflow.genai.scorers import Correctness, Guidelines, scorer
+
+def resolve_endpoint(w: WorkspaceClient, name: str) -> str:
+    """Accept a Supervisor Agent display name or its serving endpoint;
+    return the endpoint name (mas-XXXXXXXX-endpoint)."""
+    if name.startswith("mas-") and name.endswith("-endpoint"):
+        return name
+    for a in w.supervisor_agents.list_supervisor_agents():
+        if a.display_name == name:
+            endpoint = a.endpoint_name
+            if not endpoint and a.name:
+                endpoint = w.supervisor_agents.get_supervisor_agent(name=a.name).endpoint_name
+            if not endpoint:
+                raise RuntimeError(
+                    f"Supervisor Agent '{name}' has no endpoint_name yet — still provisioning?"
+                )
+            return endpoint
+    available = [a.display_name for a in w.supervisor_agents.list_supervisor_agents()]
+    raise RuntimeError(
+        f"No Supervisor Agent named '{name}'. Available: {available}"
+    )
+
+w = WorkspaceClient()
+supervisor_endpoint = resolve_endpoint(w, supervisor)
+print(f"Endpoint: {supervisor_endpoint}")
+
+client = DatabricksOpenAI()
+
+current_user = spark.sql("SELECT current_user()").first()[0]
+experiment_name = f"/Users/{current_user}/bee_pollinator_eval"
+mlflow.openai.autolog()
+mlflow.set_experiment(experiment_name)
+print(f"MLflow experiment: {experiment_name}")

Resolves the supervisor name to its endpoint.

WorkspaceClient has no attribute called supervisor_agents, so this won't work. I'm debugging it...

See https://databricks-sdk-py.readthedocs.io/en/latest/workspace/supervisoragents/supervisor_agents.html. Make sure you're using the most recent SDK version, as this API was only recently added.

(+ see https://github.com/databricks-solutions/devrel-examples/blob/main/demos/bee-pollinator/scripts/setup_agents.py for some tested usage patterns)

Per https://databricks-sdk-py.readthedocs.io/en/latest/clients/workspace.html, this looks like a Beta API?

Yes, as is the Knowledge Assistant API (see https://docs.databricks.com/api/workspace/supervisoragents and https://docs.databricks.com/api/workspace/knowledgeassistants). Both work well with the current version of the SDK; we just have to keep an eye out for breaking changes in the future.

Which databricks-sdk version are you using that has the latest API?

Let me try installing databricks-sdk>=0.106.0.
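Since the supervisor_agents service only exists in recent SDK releases (and is marked Beta), the notebook could fail fast with an upgrade hint instead of a bare `AttributeError`. A minimal sketch, assuming the `databricks-sdk>=0.106.0` floor above is the right minimum; `require_supervisor_api` is a hypothetical helper, and the `SimpleNamespace` objects are illustrative stand-ins for an older and a newer `WorkspaceClient()`.

```python
from types import SimpleNamespace


def require_supervisor_api(w, min_version: str = "0.106.0"):
    """Return w.supervisor_agents, or raise a clear error if the SDK is too old."""
    api = getattr(w, "supervisor_agents", None)
    if api is None:
        raise RuntimeError(
            "This databricks-sdk build has no supervisor_agents API. "
            f"Run %pip install 'databricks-sdk>={min_version}', "
            "then dbutils.library.restartPython()."
        )
    return api


# Stand-ins for an old and a new WorkspaceClient (illustration only).
old_client = SimpleNamespace()
new_client = SimpleNamespace(supervisor_agents="supervisor-agents-service")
print(require_supervisor_api(new_client))  # supervisor-agents-service
```

In the notebook this would run right after `w = WorkspaceClient()` and before `resolve_endpoint(w, supervisor)`.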

djliden · 2026-05-07T20:22:29Z

+def predict_supervisor(request: str) -> str:
+    """Query the Supervisor Agent and return the response text."""
+    response = client.responses.create(
+        model=supervisor_name,
+        input=[{"role": "user", "content": request}],
+    )
+    answer = "".join([
+        block.text
+        for item in response.output
+        if hasattr(item, "content")
+        for block in item.content
+        if hasattr(block, "text")
+    ])
+    return answer


Suggested change

-def predict_supervisor(request: str) -> str:
-    """Query the Supervisor Agent and return the response text."""
-    response = client.responses.create(
-        model=supervisor_name,
-        input=[{"role": "user", "content": request}],
-    )
-    answer = "".join([
-        block.text
-        for item in response.output
-        if hasattr(item, "content")
-        for block in item.content
-        if hasattr(block, "text")
-    ])
-    return answer
+def predict_supervisor(request: str) -> str:
+    """Query the Supervisor Agent and return the response text."""
+    response = client.responses.create(
+        model=supervisor_endpoint,
+        input=[{"role": "user", "content": request}],
+    )
+    answer = "".join([
+        block.text
+        for item in response.output
+        if hasattr(item, "content")
+        for block in item.content
+        if hasattr(block, "text")
+    ])
+    return answer

use the resolved endpoint
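The text-extraction step in predict_supervisor can be pulled into a small helper and checked without calling the endpoint. A minimal sketch, assuming (as the code above does) that `response.output` is a list of items, some of which carry a `content` list of blocks with a `text` attribute; `extract_text` is a hypothetical helper name, and the `SimpleNamespace` objects are made-up stand-ins for real response objects.

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    """Concatenate every text block found in response.output."""
    return "".join(
        block.text
        for item in response.output
        if hasattr(item, "content")   # skip items that have no content list
        for block in item.content
        if hasattr(block, "text")     # skip blocks that carry no text
    )


# Hypothetical stand-in for a Responses API result.
fake_response = SimpleNamespace(output=[
    SimpleNamespace(type="function_call"),
    SimpleNamespace(content=[
        SimpleNamespace(text="Part one. "),
        SimpleNamespace(annotations=[]),
        SimpleNamespace(text="Part two."),
    ]),
])
print(extract_text(fake_response))  # Part one. Part two.
```

predict_supervisor would then reduce to building the request and returning `extract_text(response)`.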

djliden · 2026-05-07T20:24:13Z

+
+### How to run it
+
+1. Change to your deployed bundle directory `scripts/eval_supervisor.py` into your Databricks workspace


I find this instruction hard to follow—did you mean something like

"Open scripts/eval_supervisor.py in your Databricks workspace (the bundle uploads it under /Workspace/Users//.bundle/bee-pollinator-demo/dev/files/scripts/)"?

djliden · 2026-05-07T20:27:51Z

@@ -0,0 +1,502 @@
+# Databricks notebook source
+# DBTITLE 1,Install dependencies
+# MAGIC %pip install mlflow==3.11.0 databricks_openai


Suggested change

-# MAGIC %pip install mlflow==3.11.0 databricks_openai
+# MAGIC %pip install "mlflow>=3.11.0" databricks_openai

I'm not too eager about making the eval notebook part of the DAB run. It should be something the demoer walks through and runs live, talking through it, and possibly adding real-time monitoring after the fact.

It's a much better experience for both the demoer and the audience.

Ah, I haven't committed the changes yet :-) They're only in my branch; I want to test them before I push.

I'm not too eager about making the eval notebook part of the DAB run. It should be something the demoer walks through and runs live, talking through it, and possibly adding real-time monitoring after the fact.

It's a much better experience for both the demoer and the audience.

Sounds good—ignore that comment, then. Maybe later we can add a lightweight judge/eval so the experiment is pre-populated with some traces the presenter can talk through.

Added new demo functionality to evaluate Supervisor Agent

c4bc4ec

Signed-off-by: Jules Damji <dmatrix@comcast.net>

djliden requested changes May 7, 2026
