feat(dotAI): Dot AI LangChain4J - Amazon Bedrock #35242

Open: ihoffmann-dot wants to merge 165 commits into main from dot-ai-langchain-amazon-bedrock

ihoffmann-dot commented Apr 7, 2026

Summary

Adds AWS Bedrock as a supported provider. Bedrock is a managed platform that
exposes multiple model families (Anthropic, Amazon Titan, Cohere, Meta, etc.)
through a unified Converse API, so a single integration covers all of them.

  • Add langchain4j-bedrock dependency
  • Add bedrock case to LangChain4jModelFactory switch
  • Implement buildBedrockChatModel using BedrockRuntimeClient with explicit or IAM role credentials
  • Implement buildBedrockEmbeddingModel with automatic Titan/Cohere dispatch by model ID prefix
  • Add embeddingInputType field to ProviderConfig (Cohere-specific; default: search_document)
  • buildBedrockImageModel throws UnsupportedOperationException (no LangChain4J support)
  • Add 4 unit tests in LangChain4jModelFactoryTest
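
The prefix-based embedding dispatch listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class, enum, and method names are hypothetical, and only the `cohere.` prefix rule, the Titan fallback, and the `search_document` default come from this description. Lowercasing the ID for the comparison is an assumption based on the review discussion.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of prefix-based embedding dispatch.
 * Names are illustrative; no LangChain4J classes are used here.
 */
final class EmbeddingDispatchSketch {

    enum Family { COHERE, TITAN }

    // Default taken from the PR description for the Cohere-only input type.
    static final String DEFAULT_INPUT_TYPE = "search_document";

    /** IDs starting with "cohere." route to Cohere; all others route to Titan. */
    static Family familyFor(String modelId) {
        // Assumption: lowercase a copy for routing only; the original ID is not modified.
        String key = modelId.toLowerCase(Locale.ROOT);
        return key.startsWith("cohere.") ? Family.COHERE : Family.TITAN;
    }

    /** Returns the input type to send, or null for Titan, which does not use it. */
    static String inputTypeFor(Family family, String configured) {
        if (family != Family.COHERE) {
            return null;
        }
        boolean missing = configured == null || configured.isBlank();
        return missing ? DEFAULT_INPUT_TYPE : configured;
    }
}
```

For example, `familyFor("amazon.titan-embed-text-v2:0")` yields `TITAN`, and `inputTypeFor(Family.COHERE, null)` falls back to `search_document`.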

Configuration

{
  "chat": {
    "provider": "bedrock",
    "region": "us-east-1",
    "accessKeyId": "...",
    "secretAccessKey": "...",
    "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
    "maxTokens": 16384,
    "temperature": 1.0
  },
  "embeddings": {
    "provider": "bedrock",
    "region": "us-east-1",
    "accessKeyId": "...",
    "secretAccessKey": "...",
    "model": "amazon.titan-embed-text-v2:0"
  }
}

Notes

  • If accessKeyId / secretAccessKey are omitted, credentials resolve via DefaultCredentialsProvider (IAM role, environment, ~/.aws/credentials).
  • Embedding dispatch: model IDs starting with cohere. → BedrockCohereEmbeddingModel; all others → BedrockTitanEmbeddingModel.
  • embeddingInputType is Cohere-only. Use search_document when indexing content, search_query when embedding a search query. Titan silently ignores this field.
  • Image generation via Bedrock is not available through LangChain4J. Attempting it throws UnsupportedOperationException.
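
The credential fallback in the first note can be pictured as a small decision step that runs before the Bedrock client is built. The sketch below is an assumption-laden illustration: it does not call the AWS SDK, the class and enum names are hypothetical, and rejecting a half-specified key pair is this sketch's own choice rather than confirmed PR behavior.

```java
/**
 * Hypothetical sketch: decide whether to use explicit keys or the default
 * credential chain. No AWS SDK types are involved.
 */
final class CredentialModeSketch {

    enum CredentialMode { STATIC_KEYS, DEFAULT_CHAIN }

    static CredentialMode resolve(String accessKeyId, String secretAccessKey) {
        boolean hasKey = accessKeyId != null && !accessKeyId.isBlank();
        boolean hasSecret = secretAccessKey != null && !secretAccessKey.isBlank();

        if (hasKey && hasSecret) {
            // Both values present: use them as explicit credentials.
            return CredentialMode.STATIC_KEYS;
        }
        if (hasKey || hasSecret) {
            // Assumption of this sketch: a lone key or secret is a configuration error.
            throw new IllegalArgumentException(
                    "accessKeyId and secretAccessKey must be provided together");
        }
        // Neither present: defer to the default chain (IAM role, environment, ~/.aws/credentials).
        return CredentialMode.DEFAULT_CHAIN;
    }
}
```

In a real factory, `DEFAULT_CHAIN` would correspond to handing the client the SDK's `DefaultCredentialsProvider`, as the note describes.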

Related Issue

This PR fixes #35334
EPIC: dotAI Multi-Provider Support #33970

github-actions Bot commented Jun 12, 2026

❌ Codex Review failed — openai.gpt-5.5

The review job failed before producing output. See the run for details.

Run: #27446303016

ihoffmann-dot marked this pull request as ready for review June 12, 2026 22:18
ihoffmann-dot enabled auto-merge June 12, 2026 22:37
riccardoruocco pushed a commit to riccardoruocco/core that referenced this pull request Jun 16, 2026
…sed refs (dotCMS#35761)

Fixes dotCMS#35794

## Summary

Hardens the PR-to-issue linking gate
(`.github/workflows/issue_open-pr.yml` →
`issue_comp_link-issue-to-pr.yml`) and adds a second gate on the linked
issue's metadata.

### Linking-gate fixes (the original three)

- **Shell injection / template interpolation.** `inputs.pr_body`,
`pr_title`, `pr_author`, `pr_url`, `pr_branch`, `pr_merged` were
template-substituted by GitHub Actions directly into the bash `run:`
scripts. PRs whose body contained backticks, `$var(...)`, or unbalanced
parens (e.g. dotCMS#35728's `$markdownTool.blockToMarkdown(json)`) caused a
bash syntax error on the first `Debug workflow inputs` step → entire job
exited code 2 → `Add failure comment to PR` was skipped (its condition
needs `failure_detected=true`, never set) → PR appeared unchecked.
Hoisted all inputs into a job-level `env:` block and reference them as
`$ENV_VAR` in shell, so user-supplied values are treated as data, not
code.
- **Missed markdown-link refs.** The regex
`(close[ds]?|fix(e[ds])?|resolve[ds]?)(:)?\s+#([0-9]+)` requires a
literal `#` immediately after the keyword and misses GitHub's other
valid form `fixes [dotCMS#123](url)` (as in dotCMS#35242). Added a GraphQL
`closingIssuesReferences` lookup as the 4th and final fallback in
`Determine final issue number`.
- **Stale-on-open.** The workflow only triggered on `pull_request:
[opened]`, so editing the body or pushing new commits never re-evaluated
the gate. A once-broken PR stayed broken even after the author fixed the
link. Broadened triggers to `[opened, edited, synchronize, reopened]`.
Existing idempotency (`grep -q "#$issue_number"` before body patch,
`sort -u` on PR-list comments, `Remove failure comment` step) handles
the extra runs; the failure-comment step is now dedup-guarded.
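
To see why the keyword regex misses the markdown-link form, the pattern quoted above can be exercised in isolation. This is a rough Java illustration only: the workflow itself runs in shell, the class and method names are hypothetical, and case-insensitive matching is an assumption.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch that applies the closing-keyword regex to a PR body. */
final class ClosingKeywordSketch {

    // Same pattern as quoted in the commit message; the flag is an assumption.
    private static final Pattern CLOSING_REF = Pattern.compile(
            "(close[ds]?|fix(e[ds])?|resolve[ds]?)(:)?\\s+#([0-9]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the first issue number referenced with a closing keyword, if any. */
    static OptionalInt issueNumber(String prBody) {
        Matcher m = CLOSING_REF.matcher(prBody);
        // Group 4 is the digits after '#'; groups 1-3 are the keyword parts.
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(4)))
                : OptionalInt.empty();
    }
}
```

A body such as `fixes #123` yields 123, while `fixes [dotCMS#123](url)` yields an empty result because `[` sits where the pattern requires `#`. That gap is what the GraphQL fallback is meant to cover.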

### New second gate — linked issue must have a team label

After the link is resolved, the workflow validates that the linked issue
carries a `Team : *` label and fails with a distinct `❌ Linked Issue
Needs Team Label` PR comment if it doesn't. The validation step runs
**before** the side-effect steps (PR-list comment on the issue, PR body
PATCH), so a PR whose linked issue is missing a team label does not
mutate any remote state before failing.
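
The ordering described here (validate first, mutate afterwards) can be sketched as below. The real gate is a GitHub Actions workflow, so this Java version is only an illustration: the names are hypothetical, and reading the `Team : *` glob as a plain `"Team : "` prefix check is an assumption.

```java
import java.util.List;

/** Hypothetical sketch of a gate that validates before running side effects. */
final class TeamLabelGateSketch {

    // Assumption: the "Team : *" pattern is treated as a simple prefix.
    static final String TEAM_PREFIX = "Team : ";

    static boolean hasTeamLabel(List<String> issueLabels) {
        return issueLabels.stream().anyMatch(label -> label.startsWith(TEAM_PREFIX));
    }

    /**
     * Runs the side effects (issue comment, PR body patch) only when the
     * linked issue carries a team label. Returns true if they ran.
     */
    static boolean run(List<String> issueLabels, Runnable sideEffects) {
        if (!hasTeamLabel(issueLabels)) {
            // Fail before touching any remote state.
            return false;
        }
        sideEffects.run();
        return true;
    }
}
```

With this shape, a failed validation leaves nothing to roll back because no mutation has happened yet.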

### Review-driven follow-ups (commit `312ced3e`)

- Step order moved so gates run before side effects (`Link PR to issue`
and the issue's PR-list comment no longer run when validation fails).
- `gh issue view` errors in the team-label step are surfaced distinctly
instead of being collapsed into "no team label."
- `$GITHUB_OUTPUT` heredocs use a random `ghadelimiter_<random>` instead
of literal `EOF` to prevent delimiter collision with user-controlled
content.
- Markdown link characters (`\`, `[`, `]`) in `PR_TITLE` are escaped
before being embedded in the bot's linked-issue comment.
- URL-presence check on the existing PR list switched from `grep -q` to
`grep -qF`.
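
The title-escaping follow-up amounts to prefixing each of the three characters with a backslash before the title is placed inside a markdown link. A minimal Java sketch of that idea follows; the workflow does this in shell, and the class and method names here are hypothetical.

```java
/** Hypothetical sketch: escape characters that would break a markdown link label. */
final class MarkdownEscapeSketch {

    /** Prefixes backslash, '[' and ']' with a backslash; other characters pass through. */
    static String escapeLinkText(String title) {
        StringBuilder out = new StringBuilder(title.length() + 8);
        for (int i = 0; i < title.length(); i++) {
            char c = title.charAt(i);
            if (c == '\\' || c == '[' || c == ']') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Handling the three characters in a single pass means a backslash inserted as an escape is never itself escaped a second time.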

### Operational notes

- Fork PRs are skipped via `if:
github.event.pull_request.head.repo.full_name == github.repository`
because the `pull_request` `GITHUB_TOKEN` is read-only on forks. If this
gate ever becomes a required check, fork contributors will be unchecked
by design — flag for follow-up at that point.
- Explicit `permissions: { pull-requests: write, issues: write }`
declared on the reusable workflow.
- The check remains advisory (not in branch-protection required checks
on `main`).

## Test plan

- [x] Confirm no shell error in `Debug workflow inputs` when the PR body
contains backticks / `$var(...)` (this PR's own description is the test
case).
- [x] GraphQL fallback runs only when neither body regex nor branch-name
extraction succeeds.
- [x] `synchronize` re-fires the check on a new push (empty commit
`2e6b0ed0` validated this).
- [x] Linked-issue gate fails with `❌ Linked Issue Needs Team Label`
when the linked issue has no `Team : *` label.
- [x] Linked-issue gate passes when `Team : *` is restored on the issue.
- [ ] Side-effect steps (`Create new comment`, `Link PR to issue`) are
skipped on a run where team-label validation fails.
🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

if (config.temperature() != null || config.maxTokens() != null) {
    final BedrockChatRequestParameters.Builder params = BedrockChatRequestParameters.builder();
    if (config.temperature() != null) params.temperature(config.temperature());
    if (config.maxTokens() != null) params.maxOutputTokens(config.maxTokens());

same here

github-actions Bot commented Jun 16, 2026

🤖 Bedrock Review — deepseek.v3.2

[🔴 Critical] dotCMS/pom.xml:525 — Adding a new dependency (langchain4j-bedrock) without specifying a version violates dotCMS conventions. Dependency versions must be defined in bom/application/pom.xml only, never in dotCMS/pom.xml. This will cause a build failure.

[🟡 Medium] dotCMS/src/main/java/com/dotcms/ai/client/langchain4j/BedrockModelProviderStrategy.java:139-142 — Logging timeout and maxRetries as debug when unsupported for embedding models is appropriate, but there's no warning to the user that these configurations are silently ignored, which could lead to confusion. Consider adding a Logger.warn for at least one of these cases.

[🟡 Medium] dotCMS/src/main/java/com/dotcms/ai/client/langchain4j/BedrockModelProviderStrategy.java:154 — The config.embeddingInputType() is passed directly to BedrockCohereEmbeddingModel.builder().inputType(...). If the provided value is invalid (not search_document or search_query), the error will come from the LangChain4J library at runtime, not during validation. This is acceptable but could be documented.

[🟡 Medium] dotCMS/src/main/java/com/dotcms/ai/client/langchain4j/BedrockModelProviderStrategy.java:156 — The condition modelLower.startsWith("amazon.titan-") may not match all Titan embedding model IDs (e.g., amazon.titan-embed-g1-text-02). The check should be modelLower.contains("titan") or a more flexible pattern to avoid missing future Titan variants.

[🟡 Medium] dotCMS/src/main/java/com/dotcms/ai/client/langchain4j/BedrockModelProviderStrategy.java:159 — Setting dimensions on BedrockTitanEmbeddingModel may not be supported for all Titan models; passing null could cause an error. The LangChain4J library should handle this, but it's a potential runtime issue.

[🟡 Medium] dotCMS/src/main/java/com/dotcms/ai/client/langchain4j/BedrockModelProviderStrategy.java:239 — The overrideConfiguration method uses AwsRetryStrategy.standardRetryStrategy().toBuilder(). If maxRetries is set to a large number, it could lead to excessive retries and prolonged failures. Ensure the value is reasonable (maybe cap it), though this is currently left to the user.

[🟡 Medium] dotCMS/src/test/java/com/dotcms/ai/client/langchain4j/BedrockModelProviderStrategyTest.java:465 — The test helper bedrockConfig uses hardcoded AWS credentials ("test-access-key", "test-secret-key"). While this is safe for unit tests, ensure no real secrets are ever committed. This is acceptable for test code.

[🟡 Medium] dotCMS/src/test/java/com/dotcms/ai/client/langchain4j/BedrockModelProviderStrategyTest.java:156 — Test test_buildEmbeddingModel_titanMixedCase_routesToTitan uses "AMAZON.TITAN-embed-text-v2:0". The model ID casing may not be preserved by AWS; the test should verify that the routing logic is case-insensitive as intended.

[🟡 Medium] dotCMS/src/test/java/com/dotcms/ai/client/langchain4j/LangChain4jModelFactoryTest.java:350 — Test test_buildEmbeddingModel_bedrock_cohereUppercase_routesToCohere uses "Cohere.embed-english-v3" (mixed case). This is good for verifying case-insensitive routing, but ensure the actual AWS model ID is case-sensitive; the test should reflect that the strategy lowercases the model for routing only.
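
The maxRetries finding above suggests capping the configured value. One way that could look is sketched below, purely as an illustration: the cap of 5, the class, and the method names are hypothetical and not taken from the PR, which currently leaves the value to the user.

```java
/** Hypothetical sketch: clamp a user-supplied retry count to a bounded range. */
final class RetryCapSketch {

    // Hypothetical upper bound; the PR does not define one.
    static final int MAX_RETRIES_CAP = 5;

    /** Returns the default when unset, otherwise the value clamped to [0, MAX_RETRIES_CAP]. */
    static int effectiveMaxRetries(Integer configured, int defaultRetries) {
        if (configured == null) {
            return defaultRetries;
        }
        return Math.max(0, Math.min(configured, MAX_RETRIES_CAP));
    }
}
```

Clamping rather than rejecting keeps an oversized setting from failing startup, at the cost of silently changing the user's value, so a log line at the clamp point would likely be worth adding.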


Run: #27647750313 · tokens: in: 12404 · out: 863 · total: 13267

Labels

AI: Safe To Rollback; Area : Backend (PR changes Java/Maven backend code)

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[FEATURE] dotAI: LangChain4J integration — Phase 2 (AWS Bedrock)

3 participants