SOLR-18130: new param "solrConnection" in solrj-streaming, support HTTP by vyatkinv · Pull Request #4320 · apache/solr

vyatkinv · 2026-04-22T11:14:55Z

https://issues.apache.org/jira/browse/SOLR-18130

Description

This PR updates the solrj-streaming module by replacing all usages of zkHost with solrCloud to enable support for HTTP-based quorum configurations.

Solution

Parameters, fields, and variables in solrj-streaming named zkHost have been renamed to solrCloud.
For backward compatibility, specifying zkHost is still supported.
A shared method has been introduced in an abstract class to resolve the effective solrCloud value using a priority-based approach (e.g., explicit parameter → legacy zkHost → default ZooKeeper host).
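A minimal sketch of that priority order follows. It assumes the parameters arrive as a plain `Map<String, String>` rather than Solr's own parameter types. The class and method names (`SolrConnectionResolver`, `resolve`) are hypothetical and are not the PR's code. The sketch uses `solrConnection`, the name this parameter was later given in this PR.

```java
import java.util.Map;

/** Hypothetical helper illustrating the priority-based lookup (not the PR's actual code). */
final class SolrConnectionResolver {
  private SolrConnectionResolver() {}

  /**
   * Returns the effective connection value.
   *
   * @param params stream/expression parameters (a plain Map stands in for Solr's parameter types)
   * @param defaultZkHost the default ZooKeeper host, used as the last resort
   */
  static String resolve(Map<String, String> params, String defaultZkHost) {
    // 1. An explicit connection parameter wins.
    String explicit = params.get("solrConnection");
    if (explicit != null && !explicit.isBlank()) {
      return explicit;
    }
    // 2. Fall back to the legacy zkHost parameter for backward compatibility.
    String legacy = params.get("zkHost");
    if (legacy != null && !legacy.isBlank()) {
      return legacy;
    }
    // 3. Otherwise use the default ZooKeeper host.
    return defaultZkHost;
  }
}
```

Blank values are treated as absent here. This matches the later review remark that the parameter can be blank inside Solr and default to the server's connection. Whether the real implementation handles blanks the same way is an assumption.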

Tests

Introduced a method that randomly provides either an HTTP or a ZooKeeper record in different scenarios. Additionally, all zkHost parameters have been replaced with connectionString, using getSolrConnection().toString() for substitution.
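Here is a rough sketch of such a test helper. It assumes the helper only needs to choose between two equivalent addresses of the same test cluster. `RandomConnectionPicker` and its arguments are hypothetical. A plain `java.util.Random` stands in for the test framework's seeded randomness.

```java
import java.util.Random;

/** Hypothetical test helper: randomly picks the ZooKeeper or the HTTP form of one cluster's address. */
final class RandomConnectionPicker {
  private RandomConnectionPicker() {}

  static String pick(Random random, String zkAddress, String solrBaseUrl) {
    // Over many runs both branches get exercised: the ZooKeeper path and the HTTP path.
    return random.nextBoolean() ? zkAddress : solrBaseUrl;
  }
}
```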

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.
I have added documentation for the Reference Guide
I have added a changelog entry for my change

epugh · 2026-04-22T16:35:36Z

Shouldn't solrCloud be connectionString? I was hoping down the road we could put more into the connection string like username password or other details that are needed....

vyatkinv · 2026-04-23T03:50:46Z

> Shouldn't solrCloud be connectionString? I was hoping down the road we could put more into the connection string like username password or other details that are needed....

@dsmiley previously suggested naming this parameter solrCloud in the jira ticket comments. However, it looks like there might be some disagreement here, so it would be good to reach a consensus before finalizing the name.

epugh · 2026-04-23T12:00:36Z

> > Shouldn't solrCloud be connectionString? I was hoping down the road we could put more into the connection string like username password or other details that are needed....
>
> @dsmiley previously suggested naming this parameter solrCloud in the jira ticket comments. However, it looks like there might be some disagreement here, so it would be good to reach a consensus before finalizing the name.

Agreed... I believe the intent is to specify how you connect to your running Solr? In the first PR I believe it was connectionString, which makes sense as "hey, I'm going to have to parse this thing, and it tells me how to connect". @dsmiley?

dsmiley

Great work here! Clearly this was a bit more involved than a rename ;-)

"connectionString" is somewhat long and it's ambiguous as to what we're even connecting to. But it's not bad. How about "solrCloudString"? If you don't like that, I'll capitulate and be satisfied with "connectionString".

Incorporating auth is out-of-scope but I'll say now that I'm suspicious that it makes sense to embed secret credentials into this. A conversation for another time.

I imagine there should be some ref-guide updates on this in addition to major-changes-in-solr-10.adoc.

I can see the potential of a follow-on issue to update CLIUtils.getZkHost (and thus CLI commands that call it) to instead resolve a SolrCloud connection string that is not necessarily ZK.

Answers to the PR questions:

It's a test issue -- good catch. Glad you made resolving this mandatory. I wouldn't call this "mandatory SolrCloud parameter" since the param can be blank when it's used inside Solr, defaulting to the server's connection.
Normalize; do not round-trip with "zkHost".
IMO SolrParams should be used for "parameters" like this, not Map. Perhaps too much scope creep here but I leave that to you.
Let's not add support for that to ScoreNodes now, albeit a single comment where we initialize it would be nice. Like "for now limited to the current cluster but we could expand support if needed".

I suggest that you change StreamFactory to not mention "ZK" whatsoever, instead using the name we choose (e.g. getDefaultCloudString). For backwards-compatibility, however, the older method names should exist as deprecated.
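One way to read the backwards-compatibility suggestion is sketched below on a hypothetical, trimmed-down stand-in for StreamFactory (`StreamFactorySketch` is not a real class). The new mode-neutral methods hold the state, and the old ZK-named methods remain as deprecated one-line delegates. `withDefaultZkHost` and `withDefaultSolrConnection` are names that appear later in this thread. `getDefaultSolrConnection` is an assumed getter name.

```java
/** Hypothetical stand-in for StreamFactory showing the rename-with-deprecation pattern. */
class StreamFactorySketch {
  private String defaultSolrConnection;

  /** New, mode-neutral setter. */
  public StreamFactorySketch withDefaultSolrConnection(String solrConnection) {
    this.defaultSolrConnection = solrConnection;
    return this;
  }

  /** New, mode-neutral getter (assumed name). */
  public String getDefaultSolrConnection() {
    return defaultSolrConnection;
  }

  /** @deprecated use {@link #withDefaultSolrConnection(String)}; kept so existing callers still compile. */
  @Deprecated
  public StreamFactorySketch withDefaultZkHost(String zkHost) {
    return withDefaultSolrConnection(zkHost);
  }

  /** @deprecated use {@link #getDefaultSolrConnection()}. */
  @Deprecated
  public String getDefaultZkHost() {
    return getDefaultSolrConnection();
  }
}
```

Because the deprecated methods only delegate, the old and new names always observe the same state. Removing the old names later is therefore a pure API cleanup.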

epugh · 2026-04-30T14:42:47Z

Hi all... So I'm following up after @dsmiley pinged me (thanks!)... What excites me about this is the opportunity to make it simpler for non-Solr experts to access Solr. I don't think you should need to know whether you are using SolrCloud, standalone, or ZK; it's just a "connection string". Yes, every connection string has formatting rules... I would like a name like solrConnection or solrConnString, but not something as specific as "solrCloudConnection" that ties us to a specific mode of Solr. I know that Streaming Expressions are a SolrCloud-specific feature. However, from a user perspective, they may not know that, or care.

Someday I hope to see Streaming Expressions pushed more towards our Data Scientist / ML user base, and they don't care about Cloud versus non-Cloud mode, and also don't care about zkHost... So, having a property that speaks to their concern, "how do I connect", is perfect. Could we have a property like solr or connection?

dsmiley · 2026-04-30T17:00:29Z

+1 to "solrConnection"

> Could we have a property like solr or connection

huh; you just agreed to "solrConnection"?

epugh · 2026-05-01T11:48:07Z

I was just wondering if in the context of a stream expression if saying "connection" was enough. I like "solrConnection" too. We might look at other connections and see what they are named...

dsmiley · 2026-05-01T13:43:59Z

Then let's move forward, @vyatkinv, with "solrConnection" throughout. If at the last minute Eric gets us to change our minds again, we shouldn't need to subsequently change the vast majority of this PR, since "solrConnection" is a perfectly fine parameter, field, and local-var name in all the places we have them.

vyatkinv · 2026-05-02T16:11:18Z

I’ve addressed most of the review comments. Summary of changes in the latest commits:

- Renamed the solrCloud parameter to solrConnection.
- Switched Streams to use CloudSolrClientConnection instead of a raw string.
  To do this, I broke backward compatibility for constructors that accepted parameters directly. I believe this is acceptable since end users are not expected to instantiate stream implementations directly; they typically pass string expressions into StreamFactory. Most of these constructors are used internally, in tests, or not at all.
  If needed, I can restore the old constructors as @Deprecated and have them delegate to the new ones based on CloudSolrClientConnection.
- Fixed minor issues related to formatting and parameter ordering.
- Renamed getSolrConnection to buildSolrConnection.
- Added a new method to CloudSolrClient.Builder so internal Solr code can use the typed builder, while keeping the string-based API for end users.
  For the same reason, I replaced the string-based getCloudSolrClient with a typed variant.
- One concern: currently zkHost can accept HTTP URLs. It seems more consistent if:
  - zkHost only allows ZooKeeper connection strings,
  - Solr URLs only allow HTTP(S),
  - and the new solrConnection supports both.
  Should we add validation in StreamTool, the JDBC driver, and buildSolrConnection() to enforce that zkHost is strictly ZooKeeper, and allow both formats only via the new parameter?
- Renamed the parameter-building method to buildSolrParamsExcept and refactored it to remove the Map<String, String> variant, reusing the unified implementation.
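Below is a minimal sketch of the stricter validation proposed in the list above. It assumes a value counts as a Solr URL exactly when it starts with `http://` or `https://` (case-insensitive), and as a ZooKeeper connection string otherwise. `ConnectionStringRules` and its methods are hypothetical; this is not how the PR implements the check.

```java
import java.util.Locale;

/** Hypothetical sketch of the proposed rules; not the PR's implementation. */
final class ConnectionStringRules {
  private ConnectionStringRules() {}

  /** Assumption: HTTP(S) scheme prefix means Solr URL, anything else means ZooKeeper. */
  static boolean isHttp(String value) {
    String lower = value.trim().toLowerCase(Locale.ROOT);
    return lower.startsWith("http://") || lower.startsWith("https://");
  }

  /** zkHost: ZooKeeper connection strings only. */
  static String requireZkHost(String value) {
    if (isHttp(value)) {
      throw new IllegalArgumentException("zkHost must be a ZooKeeper connection string, got: " + value);
    }
    return value;
  }

  /** Solr URL: HTTP(S) only. */
  static String requireSolrUrl(String value) {
    if (!isHttp(value)) {
      throw new IllegalArgumentException("Solr URL must use http or https, got: " + value);
    }
    return value;
  }
}
```

Under these rules, solrConnection would accept either form and simply branch on `isHttp`. A later review comment on the JDBC driver's ZooKeeper-only check questions whether enforcing this is worth doing at all.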

As follow-up work (either separate issue or PR), I suggest:
a) adding support for HTTP quorum in the StreamTool CLI
b) adding support for HTTP quorum in the JDBC driver

PS: Documentation and new tests have not been added yet.

dsmiley · 2026-05-18T04:50:49Z

-      // The same ZK Host is used, so the ZK ACLs should still be applied
-      cache.setDefaultZKHost(zkClient().getZkServerAddress() + "/random/chroot");
-      cache.getCloudSolrClient(
-          CloudSolrClient.CloudSolrClientConnection.parse(zkClient().getZkServerAddress()));


This is a distinction we're losing, but I don't think it matters.
CC @HoustonPutman

dsmiley · 2026-05-18T13:43:58Z

      registerV2Api(packageLoader.getPackageAPI().readAPI);
      registerV2Api(ZookeeperRead.class);
+    } else {
+      solrClientCache = new InternalSolrClientCache(solrClientProvider.getSolrClient());


Let's have no solrClientCache when no SolrCloud. SCC is linked to solrj-streaming, and it's only used for SolrCloud functions within solrj-streaming.

I'm confused by the fact that StreamHandler, SqlHandler, ExportHandler, and others use SolrClientCache regardless of mode. At least they assign it to a field and pass it to the streaming context. It seems that eliminating SolrClientCache in non-cloud mode is not a trivial task.

I'll try testing what happens when these handlers are called in standalone mode without CloudClientCache. If everything works, I think it's safe to remove it.

If this is actually going to be supported, it'd need a test. If there is no test... don't worry about it. I remember when streaming expressions came about... it kind of lightly requires SolrCloud... not sure anyone put in any real concerted effort to truly not depend on SolrCloud. That said, if we notice easy/simple ways to help it potentially work then that's a good thing.

My understanding is that none of those features were designed to work in non-cloud mode; the assumption was that they wouldn't. There was also a reluctance, I think, to start supporting them in non-cloud mode if that isn't a common use case, since that increases our testing and maintenance burden. As a community, we don't have a shared understanding of what is in cloud mode versus not, and what our overall goal is. You are seeing this in the code ;-).

dsmiley · 2026-05-18T13:59:59Z

FYI consider #3740 (comment) (or skip for now as you wish)

epugh · 2026-05-18T14:02:17Z

> I didn't notice at first that there are also integration tests for running CLI utilities (*.bats). Do I need to add tests for the --solr-connection parameter there?

Since we are moving to --solr-connection, please do update the bats tests.

Maybe one test just to prove that the bin/solr scripts properly pass through --solr-connection?

epugh · 2026-05-18T14:48:11Z

Looks like we refer in the error message to solr-connection, so we probably do need to think about it:

We could introduce it as --solr-connection and then deprecate --solr-url and then in the future make -s be the shortcut for --solr-connection?

epugh · 2026-05-18T15:05:19Z

I did some testing on this branch of which commands connect directly to ZK, and the results were what I expected.

docker run --name zookeeper --restart always -d -p 2181:2181 zookeeper 

# fails
bin/solr zk mkroot mychroot

# succeeds
bin/solr zk mkroot --zk-host localhost:2181 mychroot

# succeeds
bin/solr zk ls --zk-host localhost:2181 /

# fails due to unknown option name
bin/solr zk ls --solr-connection localhost:2181 /

# succeeds
bin/solr zk upconfig -z localhost:2181 -n mynewconfig -d ./server/solr/configsets/sample_techproducts_configs

# succeeds
bin/solr zk downconfig -z localhost:2181 -n mynewconfig -d /tmp

# succeeds
bin/solr cluster -z localhost:2181 --property urlScheme --value https

vyatkinv · 2026-05-24T11:18:31Z

I've taken all the comments into account, and this evening I'll try to add all the necessary integration tests and check whether it's safe to remove the solrClientCache initialization for standalone mode.

vyatkinv · 2026-05-24T13:09:57Z

Integration tests via ./gradlew integrationTests do not work since metrics-core was removed in SOLR-17855:

2026-05-24 13:06:33.753 INFO  (main) [] o.a.s.s.CoreContainerProvider Using logger factory org.apache.logging.slf4j.Log4jLoggerFactory
2026-05-24 13:06:33.760 INFO  (main) [] o.a.s.s.CoreContainerProvider  ___      _       Welcome to Apache Solr™ version 11.0.0-SNAPSHOT e804e358509b5278ff21252689e3632770a6fb3b [snapshot build, details omitted]
2026-05-24 13:06:33.760 INFO  (main) [] o.a.s.s.CoreContainerProvider / __| ___| |_ _   Starting in cloud mode on port 38229
2026-05-24 13:06:33.760 INFO  (main) [] o.a.s.s.CoreContainerProvider \__ \/ _ \ | '_|  Install dir: /home/vova/solr/solr/packaging/build/solr-11.0.0-SNAPSHOT
2026-05-24 13:06:33.760 INFO  (main) [] o.a.s.s.CoreContainerProvider |___/\___/_|_|    Start time: 2026-05-24T13:06:33.760931721Z
2026-05-24 13:06:33.763 INFO  (main) [] o.a.s.s.CoreContainerProvider Solr started with "-XX:+CrashOnOutOfMemoryError" that will crash on any OutOfMemoryError exception. The cause of the OOME will be logged in the crash file at the following path: /home/vova/solr/solr/packaging/build/test-output/solr-home/logs/jvm_crash_3608.log
[2,450s][warning][jfr,system] Could not initialize JDK events. access denied ("java.lang.RuntimePermission" "accessClassInPackage.jdk.internal.event")
Exception in thread "embeddedZkServer" java.lang.NoClassDefFoundError: com/codahale/metrics/Reservoir
	at org.apache.zookeeper.metrics.impl.DefaultMetricsProvider$DefaultMetricsContext.lambda$getSummary$2(DefaultMetricsProvider.java:151)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
	at org.apache.zookeeper.metrics.impl.DefaultMetricsProvider$DefaultMetricsContext.getSummary(DefaultMetricsProvider.java:147)
	at org.apache.zookeeper.server.ServerMetrics.<init>(ServerMetrics.java:81)
	at org.apache.zookeeper.server.ServerMetrics.<clinit>(ServerMetrics.java:46)
	at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:133)
	at org.apache.solr.cloud.SolrZkServer.lambda$start$0(SolrZkServer.java:160)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: com.codahale.metrics.Reservoir
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
	at org.eclipse.jetty.ee10.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:443)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
	... 8 more
	Suppressed: java.lang.ClassNotFoundException: com.codahale.metrics.Reservoir
		at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
		at org.eclipse.jetty.ee10.webapp.WebAppClassLoader.findClass(WebAppClassLoader.java:563)
		at org.eclipse.jetty.ee10.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:467)
		... 9 more

Launching via bin/solr start fails for the same reason

Is there a workaround to run integration tests now?

dsmiley

I'll do a couple of the suggestions I included in this review, finishing today.

I also want to ensure no stream creates a SolrClientCache() (no args) -- should always rely on StreamContext.

dsmiley · 2026-05-24T17:42:29Z

+      streamFactory.withCollectionSolrConnection(defaultCollection, solrConnection);
+      streamFactory.withDefaultSolrConnection(solrConnection);


this line seems redundant with the previous

Why? The first line sets the default collection and connection to it, and the second line sets the default connection for all other collections if solrConnection is not specified in the expression itself.
If we remove any of them, the behavior will change.

This will be relevant when explicitly enabling external clusters. By default, indeed, one connection is always established.
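The following is a small model of the two registrations being discussed. It assumes connections are plain strings and that a solrConnection given in the expression itself takes precedence. `ConnectionRegistry` and its methods are hypothetical analogues of `withCollectionSolrConnection` / `withDefaultSolrConnection`, not the StreamFactory API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model: per-collection connections plus a fallback default. */
final class ConnectionRegistry {
  private final Map<String, String> byCollection = new HashMap<>();
  private String defaultConnection;

  /** Analogous to withCollectionSolrConnection(collection, connection). */
  ConnectionRegistry withCollectionConnection(String collection, String connection) {
    byCollection.put(collection, connection);
    return this;
  }

  /** Analogous to withDefaultSolrConnection(connection). */
  ConnectionRegistry withDefaultConnection(String connection) {
    this.defaultConnection = connection;
    return this;
  }

  /** Assumed order: value from the expression, then collection-specific entry, then the default. */
  String connectionFor(String collection, String explicitFromExpression) {
    if (explicitFromExpression != null) {
      return explicitFromExpression;
    }
    String registered = byCollection.get(collection);
    return registered != null ? registered : defaultConnection;
  }
}
```

In the snippet under review both calls receive the same value, so the lookup result is the same either way. The two registrations only diverge once a collection is mapped to a different (external) cluster.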

dsmiley · 2026-05-24T19:07:05Z

-      streamFactory.withDefaultZkHost(defaultZkhost);
+      var solrConnection = CloudSolrClient.CloudSolrClientConnection.parse(defaultZkhost);
+      streamFactory.withCollectionSolrConnection(defaultCollection, solrConnection);
+      streamFactory.withDefaultSolrConnection(solrConnection);


this line appears redundant with the previous

dsmiley · 2026-05-24T19:09:08Z

+    if (!solrConnection.isZookeeper()) {
+      throw new SQLException(
+          String.format(
+              Locale.ROOT, "Expected ZooKeeper connection string, but got: '%s'.", schemaName));
+    }


not sure there's any point in ensuring this.

epugh · 2026-05-25T14:16:19Z

> Launching via bin/solr start fails for the same reason
>
> Is there a workaround to run integration tests now?

This is a great thing to drop a note to dev@solr.apache.org about, and let the person who merged the code know about the issue!

dsmiley · 2026-05-25T14:52:42Z

I already did yesterday: https://issues.apache.org/jira/browse/SOLR-17855
Ignore that in this issue for now.

vyatkinv · 2026-05-25T18:32:28Z

Added integration tests for CLI utilities with --solr-connection (locally, I reverted the metrics-core removal). I still need coverage to ensure no connections are made to external clusters. I'll look into the best approach tomorrow; the closest existing examples seem to be test_auth.bats and test_adminconsole_urls.bats.

vyatkinv · 2026-05-25T19:10:27Z

gradlew_integration_tests.txt

dsmiley · 2026-05-26T03:40:58Z

I'll pursue the SolrClientCache security matter in another JIRA issue TBD. Sorry to see this JIRA issue get so much scope creep. We can merge something here but just won't release it until the security matter is handled.

vyatkinv · 2026-05-26T04:21:49Z

Good.
A little later I'll implement InternalSolrClientCache#getFrom(CoreContainer cc) and security tests in separate PRs.

dsmiley · 2026-05-26T13:22:34Z

> InternalSolrClientCache#getFrom(CoreContainer cc)

I'll get it; it was my idea anyway.
Can we also just back out ISCC altogether from this PR to de-scope it? The work will not be lost.

I leave it to @epugh to review the CLI aspects of this PR. Ideally that'd be a separate PR; it's a large scope divergence. I don't have time/experience or frankly interest to review the CLI.

vyatkinv · 2026-05-26T16:48:00Z

> Can we also just back out ISCC altogether from this PR to de-scope it? The work will not be

I'm rolling back InternalSolrClientCache from this PR

github-actions Bot added client:solrj tests labels Apr 22, 2026

vyatkinv marked this pull request as draft April 22, 2026 11:15

vyatkinv changed the title ~~[WIP] SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming"~~ WIP: SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming" Apr 22, 2026

vyatkinv changed the title ~~WIP: SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming"~~ SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming" Apr 22, 2026

SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-stream…

c877f2a

…ing" module

vyatkinv force-pushed the SOLR-18130-solrj-streaming-solr-cloud-parameter branch from 27f4ae5 to c877f2a April 22, 2026 13:36

vyatkinv added 2 commits April 23, 2026 10:48

SOLR-18130: add getMapWithExclusions, fix errors, refactored lost str…

7fa9646

…eams

SOLR-18130: add changelog

b374cb6

dsmiley reviewed Apr 24, 2026

vyatkinv added 5 commits May 2, 2026 18:57

Merge branch 'main' into SOLR-18130-solrj-streaming-solr-cloud-parameter

46a7eea

SOLR-18130: rename solrCloud to solrConnection

8e2e63a

SOLR-18130: reorder params in init()

01c350e

SOLR-18130: using a typed solrConnection instead of a string one

2bd00e1

SOLR-18130: rework of buildSolrParamsExcept, replace Map<String,Strin…

b4cd769

…g> by SolrParams in streams, which used it

github-actions Bot added module:sql cat:search cat:cli labels May 2, 2026

vyatkinv added 4 commits May 2, 2026 23:18

SOLR-18130: Fix changelog entry

43ad271

SOLR-18130: fix reflow

238e487

SOLR-18130: fix param order

a4ff28c

SOLR-18130: replace varargs by set in buildSolrParamsExcept

c4f5adc

SOLR-18130: more pretty changelogs

93d9577

dsmiley reviewed May 18, 2026

vyatkinv added 5 commits May 24, 2026 15:27

Merge branch 'main' into SOLR-18130-solrj-streaming-solr-cloud-parameter

32434d8

# Conflicts: # solr/core/src/java/org/apache/solr/core/CoreContainer.java # solr/solrj-streaming/src/java/org/apache/solr/client/solrj/io/SolrClientCache.java

SOLR-18130: Correction of code review comments

ddf6a1e

SOLR-18130: rename system property to 'solr.cloud.external.enabled', …

89b62e5

…remove redundant property cleanup

SOLR-18130: remove InternalSolrClientCacheTest

488ba47

SOLR-18130: unused imports

e804e35

dsmiley reviewed May 24, 2026

dsmiley added 4 commits May 24, 2026 18:18

rename to withCollectionUseThisConnection

8ec8e6c

InternalSolrClientCache: move to cloud

2e33078

getConnectionForCollection + misc

5086c9d

SQL ConnectionImpl: forbidden inside Solr

2eb2e42

vyatkinv added 2 commits May 25, 2026 23:22

SOLR-18130: code review fixes

5613d36

SOLR-18130: add bats tests for CLI tools

101db40

SOLR-18130: revert InternalSolrClientCache

1dcc233
