Redis-Backed Semantic Search (Phase 4) #3663
…-api-builder into redis-dab-phase4
# Conflicts:
#   src/Cli/Commands/AddOptions.cs
#   src/Cli/Commands/ConfigureOptions.cs
#   src/Cli/Commands/EntityOptions.cs
#   src/Cli/Commands/UpdateOptions.cs
#   src/Cli/ConfigGenerator.cs
#   src/Core/Services/RequestValidator.cs
#   src/Service.GraphQLBuilder/Queries/InputTypeBuilder.cs
#   src/Service.Tests/UnitTests/ConfigValidationUnitTests.cs
Pull request overview
This PR introduces entity-level semantic search backed by Redis vector search, wiring it through DAB’s config model, REST/GraphQL surface area, OpenAPI/schema generation, and the SQL query engine so requests can be narrowed by semantic candidates and (optionally) return a semantic distance/similarity field.
Changes:
- Add semantic-search configuration (`semantic-search` block) to the runtime config model, validator, CLI, and JSON schema.
- Extend REST + GraphQL query surfaces with semantic search inputs (and block unsupported combinations like ordering by `semantic_distance`).
- Implement Redis-based semantic candidate resolution and integrate semantic narrowing + distance enrichment into the SQL query pipeline.
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/Service/Startup.cs | Registers semantic search service into DI and enables HttpClient factory usage. |
| src/Service/Services/SemanticSearch/RedisSemanticSearchService.cs | New Redis FT.SEARCH KNN-based semantic candidate resolver. |
| src/Service.Tests/UnitTests/SemanticSearchTextFlowTests.cs | Adds tests intended to validate semantic search flow (currently not exercising product code). |
| src/Service.Tests/UnitTests/RequestValidatorUnitTests.cs | Adds coverage for rejecting semantic_distance in insert bodies. |
| src/Service.Tests/UnitTests/RequestParserUnitTests.cs | Adds coverage for parsing/validating semantic query params and rejecting semantic_distance ordering. |
| src/Service.Tests/UnitTests/ConfigValidationUnitTests.cs | Adds validator coverage for semantic-search configuration requirements. |
| src/Service.Tests/GraphQLBuilder/QueryBuilderTests.cs | Verifies semantic arguments are added to GraphQL collection queries when enabled. |
| src/Service.Tests/GraphQLBuilder/MultipleMutationBuilderTests.cs | Updates test wiring for new query engine factory dependencies. |
| src/Service.GraphQLBuilder/Sql/SchemaConverter.cs | Adds semanticDistance field to GraphQL schema for semantic-enabled entities. |
| src/Service.GraphQLBuilder/Queries/QueryBuilder.cs | Adds semantic query args (semanticSearch, semanticThreshold) when enabled. |
| src/Service.GraphQLBuilder/Queries/InputTypeBuilder.cs | Excludes semanticDistance from filter/order input generation. |
| src/Core/Services/SemanticSearch/NoOpSemanticSearchService.cs | Adds default no-op semantic search service implementation. |
| src/Core/Services/SemanticSearch/ISemanticSearchService.cs | Introduces semantic search service abstraction. |
| src/Core/Services/RestService.cs | Adds REST semantic request validation + semantic_distance selection behavior. |
| src/Core/Services/RequestValidator.cs | Blocks setting semantic_distance in insert/upsert bodies. |
| src/Core/Services/OpenAPI/OpenApiDocumentor.cs | Adds semantic query params and response field to OpenAPI when enabled. |
| src/Core/Resolvers/SqlQueryEngine.cs | Integrates semantic narrowing + semantic distance enrichment and default ordering behavior. |
| src/Core/Resolvers/Sql Query Structures/SqlQueryStructure.cs | Adds semantic narrowing predicate builder and semantic distance map tracking. |
| src/Core/Resolvers/Factories/QueryEngineFactory.cs | Plumbs semantic search service into SQL query engine construction. |
| src/Core/Parsers/RequestParser.cs | Parses semantic query params and enforces some semantic-specific request constraints. |
| src/Core/Models/SemanticSearchConstants.cs | Adds shared constants for REST/GraphQL semantic names. |
| src/Core/Models/SemanticSearchCandidate.cs | Defines candidate record used for narrowing and distance mapping. |
| src/Core/Models/RestRequestContexts/RestRequestContext.cs | Adds semantic inputs/flags to request context. |
| src/Core/Configurations/RuntimeConfigValidator.cs | Validates semantic-search prerequisites and reserves semantic names. |
| src/Config/ObjectModel/EntitySemanticSearchOptions.cs | Adds semantic-search entity configuration object model. |
| src/Config/ObjectModel/Entity.cs | Adds semantic-search to entity config model. |
| src/Cli/Utils.cs | Adds CLI construction/validation of semantic-search options for entities. |
| src/Cli/ConfigGenerator.cs | Wires semantic-search options into add/update flows; adds runtime cache L2 configure options. |
| src/Cli/Commands/UpdateOptions.cs | Adds CLI parameters for semantic-search entity options. |
| src/Cli/Commands/EntityOptions.cs | Adds CLI options for semantic-search entity configuration. |
| src/Cli/Commands/ConfigureOptions.cs | Adds CLI options for runtime cache level-2 provider/connection-string. |
| src/Cli/Commands/AddOptions.cs | Adds CLI parameters for semantic-search entity options. |
| schemas/dab.draft.schema.json | Adds semantic-search entity schema definition and constraints. |
ajtiwari07
left a comment
Fixed failing validation compile issue in tests: removed invalid Dictionary.AsReadOnly() calls in SemanticSearchTextFlowTests. Commit 3cbca26 has been pushed to this PR branch.
ajtiwari07
left a comment
Update: commit 3cbca26 pushed with CI compile fix in SemanticSearchTextFlowTests (remove invalid Dictionary.AsReadOnly() usage). Re-run should pick up this change.
JerryNixon
left a comment
- Keyless views are not rejected at startup. Spec requires startup failure when semantic search is enabled on a view without resolvable primary keys. Current validation checks table/view type but not resolvable PK metadata. Add metadata validation before runtime.
- Reserved-name validation misses real database columns. Reserved name checks only inspect configured fields/mappings. If the backing table has a real exposed column named semantic_distance and no explicit field config, startup may pass. Validate against resolved metadata/exposed field names.
- Embedding/Redis failures should produce useful sanitized errors with correlation/provider details where available. Current embedding failure becomes empty results, Redis failures collapse into a generic BadRequest, and there is no provider status/error propagation.
    "additionalProperties": false,
    "properties": {
      "enabled": {
        "$ref": "#/$defs/boolean-or-string",
        return result.Embedding;
    }

    private static bool TryParseVectorText(string text, out float[]? vector)
Client-provided vectors bypass the embedding subsystem. TryParseVectorText() accepts raw JSON float arrays from the caller. The specs say callers provide natural language and DAB embeds it; #3331 explicitly excludes client-provided vectors. Remove raw vector input support unless I am missing something. There is no need for this added complexity; I am almost 99.9999% sure users will not use it this way.
        return bytes;
    }

    private async Task<float[]> GetEmbeddingAsync(string semanticSearchValue)
Embedding failure silently returns empty results.
GetEmbeddingAsync() returns [] when embedding is disabled or fails, and the caller returns a successful empty response. The spec says embedding failure must fail the request and must not continue as if Redis returned no matches. Throw a sanitized DAB error instead.
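A minimal sketch of the fail-fast behavior requested above, assuming the embedding call can be modeled as a delegate. `SemanticSearchException` and `EmbeddingGuardSketch` are hypothetical names used only for illustration; the real change would presumably throw DAB's own exception type with an appropriate status code.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for DAB's sanitized error type (an assumption, not the PR's API).
public sealed class SemanticSearchException : Exception
{
    public SemanticSearchException(string message, Exception? inner = null)
        : base(message, inner) { }
}

public static class EmbeddingGuardSketch
{
    // Wraps an embedding call so that failure or an empty vector fails the request
    // instead of flowing on as "no matches".
    public static async Task<float[]> GetEmbeddingOrThrowAsync(
        Func<string, Task<float[]?>> embed, string text)
    {
        float[]? vector;
        try
        {
            vector = await embed(text);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Generic outward message; provider details stay on the inner exception for logging.
            throw new SemanticSearchException("Embedding generation failed.", ex);
        }

        if (vector is null || vector.Length == 0)
        {
            throw new SemanticSearchException("Embedding provider returned no vector.");
        }

        return vector;
    }
}
```

With a guard like this, the caller never sees an empty array, so the "successful empty response" path cannot be reached by an embedding failure.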
    /// Applies a semantic narrowing predicate of the form:
    /// (col1 = ... AND col2 = ...) OR (...)
    /// </summary>
    public void ApplySemanticCandidates(IReadOnlyList<SemanticSearchCandidate> candidates)
SQL narrowing uses all Redis document columns, not primary keys. ApplySemanticCandidates() iterates candidate.ColumnValues, which can include arbitrary non-PK Redis fields. The spec says Redis results provide database key values and DAB builds a key predicate. Narrow only on PrimaryKeyValues. We want simple, intuitive behavior unless there is a technical reason I am missing here.
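To make the suggestion concrete, here is a rough sketch of a predicate builder that reads only primary key values. `Candidate` and `SemanticPredicateSketch` are hypothetical stand-ins, not the PR's types, and the `[column]` quoting assumes a SQL Server style dialect. The column names come from database metadata, never from the Redis document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SemanticSearchCandidate; only the key values matter here.
public sealed record Candidate(IReadOnlyDictionary<string, object> PrimaryKeyValues, double Distance);

public static class SemanticPredicateSketch
{
    // Builds "(pk1 = @p0 AND pk2 = @p1) OR (...)" from primary key columns only.
    public static (string Sql, Dictionary<string, object> Parameters) Build(
        IReadOnlyList<string> primaryKeyColumns,
        IEnumerable<Candidate> candidates)
    {
        if (primaryKeyColumns.Count == 0)
        {
            throw new ArgumentException("At least one primary key column is required.");
        }

        var parameters = new Dictionary<string, object>();
        var groups = new List<string>();

        foreach (Candidate candidate in candidates)
        {
            // A candidate missing any key column cannot identify a row; skip it.
            if (!primaryKeyColumns.All(candidate.PrimaryKeyValues.ContainsKey))
            {
                continue;
            }

            var terms = new List<string>();
            foreach (string column in primaryKeyColumns)
            {
                string name = $"@p{parameters.Count}";
                parameters[name] = candidate.PrimaryKeyValues[column];
                terms.Add($"[{column}] = {name}");
            }

            groups.Add("(" + string.Join(" AND ", terms) + ")");
        }

        // No usable candidates: a predicate that matches nothing.
        string sql = groups.Count == 0 ? "1 = 0" : string.Join(" OR ", groups);
        return (sql, parameters);
    }
}
```

Any extra fields in the candidate dictionary are ignored, so the index contents cannot add conditions on non-key columns.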
    /// <summary>
    /// True when semantic distance should be included in REST output.
    /// </summary>
    public bool IncludeSemanticDistanceInResponse { get; set; }
REST $select behavior violates the spec.
context.IncludeSemanticDistanceInResponse is set but never used by SqlQueryEngine; REST responses add semantic_distance whenever semantic search runs. The spec says $select=id,description must not return semantic_distance unless it is explicitly selected. Omitting it is fine, if for no other reason than that the caller can then predict the shape of the response. Right?
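One way to express that rule, as a sketch only. `SelectSketch` is a hypothetical helper, and the behavior when no $select is supplied (include the distance) is an assumption, since the comment above only covers the explicit $select case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectSketch
{
    public const string DistanceField = "semantic_distance";

    // Decides whether semantic_distance belongs in a REST response.
    public static bool ShouldIncludeDistance(
        bool semanticSearchRan, IReadOnlyCollection<string> selectedFields)
    {
        if (!semanticSearchRan)
        {
            return false;
        }

        // Assumption: with no $select, the default shape includes the distance.
        if (selectedFields.Count == 0)
        {
            return true;
        }

        // With an explicit $select, include the field only when it was asked for.
        return selectedFields.Contains(DistanceField, StringComparer.Ordinal);
    }
}
```

The result of a check like this is what `IncludeSemanticDistanceInResponse` would need to carry into the query engine for the flag to have any effect.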
        return ApplySemanticDistanceAndOrderingIfNeeded(response, structure, dataSourceName, includeRestField: true, includeGraphQlField: false);
    }

    private async Task<(bool shouldReturnEmpty, JsonDocument? emptyResponse)> TryApplySemanticNarrowingAsync(
GraphQL empty semantic result shape is wrong for connections. TryApplySemanticNarrowingAsync() returns JsonDocument.Parse("[]") before connection shaping. For paginated GraphQL fields the expected shape is a connection with items: [], not a raw array. Return through the normal pagination formatter.
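For illustration, the difference between the two empty shapes might look like the sketch below. `EmptyResultSketch` is hypothetical, and the `hasNextPage` and `endCursor` members are assumptions about the connection type; only `items: []` comes from the comment above. In the real fix the document should come out of the existing pagination formatter rather than a literal.

```csharp
using System.Text.Json;

public static class EmptyResultSketch
{
    // Returns an empty result in the shape the caller expects.
    public static JsonDocument Empty(bool isPaginatedGraphQlField)
    {
        // Assumed connection members besides "items": hasNextPage and endCursor.
        const string emptyConnection = "{\"items\":[],\"hasNextPage\":false,\"endCursor\":null}";
        const string emptyList = "[]";

        return JsonDocument.Parse(isPaginatedGraphQlField ? emptyConnection : emptyList);
    }
}
```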
        return similarity;
    }

    private static string NormalizeFieldName(string field)
Spec requires Redis records to include real database primary key column names exactly, including case. NormalizeFieldName() strips JSON paths and accepts case-insensitive matches, which can map the wrong field. Enforce exact PK names and fail/skip invalid records deterministically.
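A sketch of what exact matching could look like, assuming Redis fields arrive as a string dictionary. `ExactKeyMatchSketch` is a hypothetical helper; returning null stands for "skip this record", and a stricter variant could throw instead.

```csharp
using System;
using System.Collections.Generic;

public static class ExactKeyMatchSketch
{
    // Returns the key values when every primary key column is present under its
    // exact name; otherwise null so the caller can skip or fail the record.
    public static Dictionary<string, string>? ExtractKeys(
        IReadOnlyList<string> primaryKeyColumns,
        IReadOnlyDictionary<string, string> redisFields)
    {
        var keys = new Dictionary<string, string>(StringComparer.Ordinal);

        foreach (string column in primaryKeyColumns)
        {
            // Ordinal scan, independent of the input dictionary's comparer:
            // "Id" or "$.id" never satisfies a key column named "id".
            string? match = null;
            foreach (KeyValuePair<string, string> pair in redisFields)
            {
                if (string.Equals(pair.Key, column, StringComparison.Ordinal))
                {
                    match = pair.Value;
                    break;
                }
            }

            if (match is null)
            {
                return null;
            }

            keys[column] = match;
        }

        return keys;
    }
}
```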
    /// <summary>
    /// Resolves semantic candidates by:
    /// 1) generating/retrieving an embedding vector for semantic_search input,
    /// 2) performing a Redis FT.SEARCH KNN query,
The spec says Redis applies the threshold. This PR runs KNN then filters in DAB. Push threshold into the Redis query or explicitly justify/spec-change the behavior.
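As one possible way to push the threshold down, Redis vector range queries (`VECTOR_RANGE`, query dialect 2) filter by distance on the server. The sketch below only assembles the `FT.SEARCH` arguments; `RangeQuerySketch`, the `dist` alias, and the parameter names are illustrative. It takes a maximum distance, and how the configured similarity-threshold maps to that distance is an assumption left to the caller.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class RangeQuerySketch
{
    // Builds FT.SEARCH arguments for a server-side distance cutoff.
    public static List<object> BuildArgs(
        string index, string vectorField, byte[] vectorBlob, double maxDistance, int limit)
    {
        // Range query: only documents within $radius of $vec are returned,
        // with the computed distance exposed under the alias "dist".
        string query = $"@{vectorField}:[VECTOR_RANGE $radius $vec]=>{{$YIELD_DISTANCE_AS: dist}}";

        return new List<object>
        {
            index, query,
            "PARAMS", 4,
            "radius", maxDistance.ToString(CultureInfo.InvariantCulture),
            "vec", vectorBlob,
            "SORTBY", "dist", "ASC",
            "LIMIT", 0, limit,
            "DIALECT", 2
        };
    }
}
```

A client such as StackExchange.Redis could then send these arguments with its generic command execution call; whether a range query or KNN plus a documented DAB-side filter is the better fit is the spec question raised above.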
        return result.Embedding;
    }

    private static bool TryParseVectorText(string text, out float[]? vector)
TryParseVectorText() parses caller input as a float array before embedding. Besides violating spec, this creates avoidable CPU/memory pressure. Remove vector parsing.
            continue;
        }

        results.Add(new SemanticSearchCandidate(primaryKeys, sqlColumns, similarity));
Using Redis document columns beyond PK lets external index contents influence SQL predicates on arbitrary columns. That is brittle and can interact badly with policy/RLS. Treat Redis as candidate-key source only; DAB/database authorization remains authoritative.
Summary
This PR adds end-to-end semantic (vector) search support to DAB, enabling entities to be queried using natural language. Results are retrieved from a Redis vector index via FT.SEARCH KNN and then narrowed in the SQL database to return fully enriched records, with results ordered by semantic distance.
Git Issue: #3332
What's New
Semantic Search Pipeline
- `RedisSemanticSearchService`: new service that takes the user's search text, generates an embedding (or accepts a raw float array), issues a Redis `FT.SEARCH` KNN query against a configured index, and returns primary-key candidates with cosine distances.
- `ApplySemanticCandidates()` in `SqlQueryStructure`: translates Redis candidates into a SQL `WHERE (pk1=… AND pk2=…) OR (…)` predicate, so only the semantically relevant rows are fetched from the database.
- `TryApplySemanticNarrowingAsync()` in `SqlQueryEngine`: orchestrates the full flow: call Redis → deduplicate candidates by PK signature → apply SQL predicate → attach distance scores → re-order results by distance (unless the user specified explicit ordering).
- `SemanticDistanceByPrimaryKeySignature` map on `SqlQueryStructure`: carries Redis distances through to the final JSON response, where they are injected as `semantic_distance` (REST) and `semanticDistance` (GraphQL).
Configuration
- New `semantic-search` config block per entity: `enabled`, `redis-index-name`, `redis-index-type` (hash/json), `redis-index-multiplier`, `similarity-threshold`.
- `runtime.cache.level-2.connection-string` is reused as the Redis connection for both caching and vector search.
- `runtime.embeddings` block: `provider`, `endpoint`, `api-key`, `model`, `api-version`, `dimensions`, `timeout-ms`.
- CLI support: `--semantic-search.enabled`, `--semantic-search.redis-index-name`, `--semantic-search.redis-index-type`, `--semantic-search.redis-index-multiplier`, `--semantic-search.similarity-threshold`.
- Schema updated in `dab.draft.schema.json`.
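Put together, a config using these settings might look roughly like the fragment below. This is a sketch only: the property names follow the list above and the `semantic-search` key used by the CLI flags, while the entity name `Product`, every value, and the exact nesting are placeholders or assumptions rather than values taken from the PR. Other required config sections are omitted.

```json
{
  "runtime": {
    "cache": {
      "level-2": {
        "connection-string": "<redis-connection-string>"
      }
    },
    "embeddings": {
      "provider": "<provider-name>",
      "endpoint": "<endpoint-url>",
      "api-key": "<api-key>",
      "model": "<model-name>",
      "api-version": "<api-version>",
      "dimensions": 1536,
      "timeout-ms": 30000
    }
  },
  "entities": {
    "Product": {
      "semantic-search": {
        "enabled": true,
        "redis-index-name": "<index-name>",
        "redis-index-type": "hash",
        "redis-index-multiplier": 3,
        "similarity-threshold": 0.8
      }
    }
  }
}
```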