
support llm gateway callout for script #105

Open
atulikumwenayo wants to merge 4 commits into main from callout

atulikumwenayo (Collaborator) commented Jun 2, 2026

Adds two methods that let customers use their AI models to generate text in Script. In the public SDK, the methods call the public LLM Gateway API (the same API used by Function), so customers can test outside of DC. When running in DC, the methods are overridden to call the registered Spark UDF.
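A minimal sketch of the override pattern described above, assuming a simple module-level registry. All names here (`TextGateway`, `ApiTextGateway`, `UdfTextGateway`, `register_gateway`, `generate_text`) are hypothetical, and both the HTTP call and the UDF call are stubbed; it only illustrates how one public entry point can route to the public API when testing locally and to a registered UDF inside DC.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class TextGateway(ABC):
    """Hypothetical interface shared by the local and in-DC implementations."""

    @abstractmethod
    def generate_text(self, prompt: str, model_id: str,
                      max_tokens: Optional[int] = None) -> str: ...


class ApiTextGateway(TextGateway):
    """Local/testing path: would call the public LLM Gateway API (stubbed here)."""

    def generate_text(self, prompt, model_id, max_tokens=None):
        return f"[api:{model_id}] {prompt}"


class UdfTextGateway(TextGateway):
    """In-DC path: delegates to a registered UDF-like callable."""

    def __init__(self, udf: Callable[[str, str, Optional[int]], str]):
        self._udf = udf

    def generate_text(self, prompt, model_id, max_tokens=None):
        return self._udf(prompt, model_id, max_tokens)


# Default used when nothing has overridden it (i.e. outside DC in this sketch).
_gateway: TextGateway = ApiTextGateway()


def register_gateway(gateway: TextGateway) -> None:
    """Assumed hook a runtime would call to swap in its own implementation."""
    global _gateway
    _gateway = gateway


def generate_text(prompt: str, model_id: str, max_tokens: Optional[int] = None) -> str:
    """Public entry point: routes to whichever gateway is currently active."""
    return _gateway.generate_text(prompt, model_id, max_tokens)
```

Because callers only ever use `generate_text`, the same user script runs unchanged in both environments; only the registered gateway differs.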

Sample test and result

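```python
from pyspark.sql.functions import col

# Assumed context: `client` is an existing datacustomcode Client, and WriteMode and
# llm_gateway_generate_text_col are imported from the SDK (import paths not shown here).

df = client.read_dlo("sample_data__dll")

# Generate a single completion and print it
print(client.llm_gateway_generate_text("Define Apache Spark in 20 words or less", "sfdc_ai__DefaultGPT52"))

# Per-row generation: the {desc} and {id} placeholders are bound to the mapped columns
df_upper1 = df.withColumn(
    "description__c",
    llm_gateway_generate_text_col(
        "Transform '{desc}' into upper case, and add '{id}' to the result as suffix separated by ' - ', then return the final result.",
        {"id": col("id__c"), "desc": col("description__c")},
        model_id="sfdc_ai__DefaultGPT52",
        max_tokens=1000,
    ),
)

dlo_name = "sample_data_copy__dll"
client.write_to_dlo(dlo_name, df_upper1, write_mode=WriteMode.APPEND)
```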
```
$ datacustomcode run ./payload/entrypoint.py --sf-cli-org myorg
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
26/06/03 15:58:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2026-06-03 15:58:26.763 | INFO     | datacustomcode.einstein_platform_client:_get_einstein_platform_url:63 - Using Einstein Platform API endpoint: https://test.api.salesforce.com (env=test)
Apache Spark is a distributed computing framework for fast, in-memory big data processing, supporting batch, streaming, SQL, and machine learning.
+------------------------+--------------+-------------+-------------------+-----+-----------------------+
|cdp_sys_sourceversion__c|description__c|datasource__c|datasourceobject__c|id__c|internalorganization__c|
+------------------------+--------------+-------------+-------------------+-----+-----------------------+
|                    NULL|      TEST - 1|UploadedFiles|          data2.csv|    1|                       |
|                    NULL|   ANOTHER - 2|UploadedFiles|          data2.csv|    2|                       |
|                    NULL|     AGAIN - 3|UploadedFiles|          data2.csv|    3|                       |
|                    NULL|      FOUR - 4|UploadedFiles|          data2.csv|    4|                       |
|                    NULL|      FIVE - 5|UploadedFiles|          data2.csv|    5|                       |
+------------------------+--------------+-------------+-------------------+-----+-----------------------+
```
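The result table is consistent with the prompt template being rendered once per row, with each `{name}` placeholder replaced by that row's value from the mapped column. A plain-Python sketch of that substitution step, assuming `str.format_map` semantics (the PR does not say how the UDF renders templates); `render_prompts` is hypothetical, the rows are made-up inputs, and no model is called.

```python
from typing import Dict, List, Mapping

Row = Mapping[str, object]


def render_prompts(template: str, bindings: Dict[str, str], rows: List[Row]) -> List[str]:
    """Render one prompt per row.

    `bindings` maps a placeholder name to a column name, mirroring the
    {"id": col("id__c"), "desc": col("description__c")} argument in the sample.
    """
    prompts = []
    for row in rows:
        # Look up each bound column in this row and substitute it into the template.
        values = {name: row[column] for name, column in bindings.items()}
        prompts.append(template.format_map(values))
    return prompts


# Hypothetical rows, for illustration only.
rows = [
    {"id__c": 1, "description__c": "alpha"},
    {"id__c": 2, "description__c": "beta"},
]
template = "Transform '{desc}' into upper case, and add '{id}' as a suffix separated by ' - '."
for prompt in render_prompts(template, {"id": "id__c", "desc": "description__c"}, rows):
    print(prompt)
```

One consequence of `format_map`-style templating: literal braces in a prompt (for example, a JSON snippet) must be doubled as `{{` and `}}`.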

atulikumwenayo changed the title from "expose llm gateway callout for script" to "support llm gateway callout for script" on Jun 5, 2026
```python
description=(
    "Maximum number of tokens to generate. If None, server default applies."
),
)
```
Contributor

Is this a shared interface across Function & Script? If so, are we ok with this option being there for Function, but presumably ignored?

atulikumwenayo (Collaborator, Author), Jun 5, 2026

A customer who's familiar with the SFAP public API would expect this param in byoc, right? Currently byoc-proxy does not define this param (in its proto), but I think it should. Essentially, we have three paths that are semi-consistent:

  1. SFAP API: with this PR, both Script and Function set this new param, and it is sent to SFAP when testing outside of DC
  2. Script UDF: already defines and honors this param
  3. grpc to byoc-proxy (used by function): does not define this param.

We have two options:

  1. Keep this param and have a followup item to support it in grpc, or
  2. Remove this param from the interface until grpc supports it, meaning the user won't be able to use it in the UDF either (because the SFAP route and the UDF route share the same interface)
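A minimal sketch of how option 1 could behave while the grpc path lacks the field: treat `None` as "not set" and omit it from the SFAP request so the server default applies, and have the grpc adapter warn about a value it cannot forward instead of dropping it silently. `build_sfap_payload`, `build_grpc_request`, and the `maxTokens` key are hypothetical; they are not the actual SFAP or byoc-proxy schemas.

```python
import warnings
from typing import Any, Dict, Optional


def build_sfap_payload(prompt: str, model_id: str,
                       max_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical HTTP body: include max_tokens only when the caller set it."""
    payload: Dict[str, Any] = {"model": model_id, "prompt": prompt}
    if max_tokens is not None:
        payload["maxTokens"] = max_tokens  # key name is an assumption
    return payload


def build_grpc_request(prompt: str, model_id: str,
                       max_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical grpc request: the proto has no such field yet, so warn and drop it."""
    if max_tokens is not None:
        warnings.warn("max_tokens is not supported on this path and will be ignored")
    return {"model": model_id, "prompt": prompt}
```

Warning on the unsupported path makes the Function/Script difference visible to users rather than "presumably ignored", and the warning can be removed once the proto gains the field.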

```python
from datacustomcode.llm_gateway.base import LLMGateway
from datacustomcode.llm_gateway.default import DefaultLLMGateway
from datacustomcode.llm_gateway.spark_base import SparkLLMGateway
from datacustomcode.llm_gateway.spark_default import DefaultSparkLLMGateway
```

Contributor

Will this pull in pyspark imports into Function runtime?

```python
from datacustomcode.llm_gateway.spark_base import SparkLLMGateway

if TYPE_CHECKING:
    from pyspark.sql import Column
```

Contributor

Same question as the note above: will this affect Function?
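The `TYPE_CHECKING` guard only keeps pyspark out of annotations; any runtime use of pyspark inside the helper must also be imported locally so that Function never executes it. A sketch with a hypothetical `make_text_column` helper (not the PR's function): with `from __future__ import annotations`, the `Column` annotations stay strings and are never evaluated at import time, and pyspark is imported only when the helper is actually called.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Dict

if TYPE_CHECKING:  # evaluated only by type checkers, never at runtime
    from pyspark.sql import Column


def make_text_column(template: str, bindings: Dict[str, Column]) -> Column:
    """Hypothetical helper: pyspark is imported only when this is called."""
    from pyspark.sql import functions as F  # deferred runtime import

    # Placeholder body: pack the template and bound columns into one struct
    # column that a text-generation UDF could consume.
    return F.struct(
        F.lit(template).alias("template"),
        *[column.alias(name) for name, column in bindings.items()],
    )
```

Defining this module in a Function runtime therefore costs nothing; only a Script that calls `make_text_column` needs pyspark installed.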

```python
    Returns:
        A Spark ``Column`` that, when evaluated, produces the generated text.
    """
    gateway = Client()._get_spark_llm_gateway()
```

Contributor

Why create a new Client() here? Couldn't we use our own?

Collaborator Author

Client is a singleton, so the same client is reused.
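For reference, a minimal thread-safe singleton looks like the following. This is a generic sketch with a hypothetical `SingletonClient`, not the datacustomcode `Client` implementation, which the PR does not show.

```python
import threading
from typing import Optional


class SingletonClient:
    """Every SingletonClient() call returns the same instance."""

    _instance: Optional["SingletonClient"] = None
    _lock = threading.Lock()

    def __new__(cls) -> "SingletonClient":
        if cls._instance is None:
            with cls._lock:  # guard the first construction across threads
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._initialized = False
                    cls._instance = instance
        return cls._instance

    def __init__(self) -> None:
        # __init__ runs on every SingletonClient() call, so initialize only once.
        if self._initialized:
            return
        self._initialized = True
        self.gateway_calls = 0
```

With this pattern, `Client()._get_spark_llm_gateway()` reaches the already-configured instance rather than building a fresh one, which is the point of the reply.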


2 participants