SOLR-18130: new param "solrConnection" in solrj-streaming, support HTTP#4320
SOLR-18130: new param "solrConnection" in solrj-streaming, support HTTP#4320vyatkinv wants to merge 45 commits into
Conversation
27f4ae5 to
c877f2a
Compare
|
Shouldn't |
@dsmiley previously suggested naming this parameter |
Agreed... I believe the intent is to specify how you connect to your running Solr? In the first PR I believe it was |
dsmiley
left a comment
There was a problem hiding this comment.
Great work here! Clearly this was a bit more involved than a rename ;-)
"connectionString" is somewhat long and it's ambiguous as to what we're even connecting to. But it's not bad. How about "solrCloudString"? If you don't like that, I'll capitulate and be satisfied with "connectionString".
Incorporating auth is out-of-scope but I'll say now that I'm suspicious that it makes sense to embed secret credentials into this. A conversation for another time.
I imagine there should be some ref-guide updates on this in addition to major-changes-in-solr-10.adoc.
I can see the potential of a follow-on issue to update CLIUtils.getZkHost (and thus CLI commands that call it) to instead resolve a SolrCloud connection string that is not necessarily ZK.
Answers to the PR questions:
- It's a test issue -- good catch. Glad you made resolving this mandatory. I wouldn't call this "mandatory SolrCloud parameter" since the param can be blank when it's used inside Solr, defaulting to the server's connection.
- Normalize; do not round-trip with "zkHost".
- IMO SolrParams should be used for "parameters" like this, not Map. Perhaps too much scope creep here but I leave that to you.
- Let's not add support for that to ScoreNodes now, albeit a single comment where we initialize it would be nice. Like "for now limited to the current cluster but we could expand support if needed".
I suggest that you change StreamFactory to not mention "ZK" whatsoever, instead using the name we choose (e.g. getDefaultCloudString). For backwards-compatibility, however, the older method names should exist as deprecated.
|
Hi all... So I'm following up after @dsmiley pinged me (thanks!)... What I am excited about this is the opportunity to make it simpler for non Solr experts to access solr. I don't think that you should need to know if it's solrCloud or standalone or zk that you are using, it's just a "connection string". Yes, every connection string has rules of formatting... I would like a name like solrConnection or solrConnString but not something that is as specific as "solrCloudConnection" that ties us to a specific mode of Solr. I know that "Streaming Expressions" is a solr cloud specific feature. However, from a user perspective, they may not know that, or care. Someday I hpoe to see Streaming Expressions pushed more towards our Data Scientist / ML user base, and they don't care about Cloud versus non Cloud mode, and also don't care about zkHost... So, having a property that speaks to their concern "how do I connect" is perfect. Could we have a property like |
|
+1 to "solrConnection"
huh; you just agreed to "solrConnection"? |
|
I was just wondering if in the context of a stream expression if saying "connection" was enough. I like "solrConnection" too. We might look at other connections and see what they are named... |
|
Then lets move forward @vyatkinv with "solrConnection" throughout. If at the last minute Eric gets us to change our minds again, we shouldn't need to subsequently change the vast majority of this PR since "solrConnection" is a perfectly fine parameter & field & local-var name in all places we have such. |
…g> by SolrParams in streams, which used it
|
I’ve addressed most of the review comments. Summary of changes in the latest commits:
As follow-up work (either separate issue or PR), I suggest: PS: Documentation and new tests are not added yet. |
| // The same ZK Host is used, so the ZK ACLs should still be applied | ||
| cache.setDefaultZKHost(zkClient().getZkServerAddress() + "/random/chroot"); | ||
| cache.getCloudSolrClient( | ||
| CloudSolrClient.CloudSolrClientConnection.parse(zkClient().getZkServerAddress())); |
There was a problem hiding this comment.
This is a distinction we're loosing but I don't think it matters.
CC @HoustonPutman
| registerV2Api(packageLoader.getPackageAPI().readAPI); | ||
| registerV2Api(ZookeeperRead.class); | ||
| } else { | ||
| solrClientCache = new InternalSolrClientCache(solrClientProvider.getSolrClient()); |
There was a problem hiding this comment.
Let's have no solrClientCache when no SolrCloud. SCC is linked to solrj-streaming, and it's only used for SolrCloud functions within solrj-streaming.
There was a problem hiding this comment.
I'm confused by the fact that StreamHandler, SqlHandler, ExportHandler, and others use SolrClientCache regardless of mode. At least they assign it to a field and pass it to the streaming context. It seems that eliminating SolrClientCache in non-cloud mode is not a trivial task.
There was a problem hiding this comment.
I'll try testing what happens when these handlers are called in standalone mode without CloudClientCache. If everything works, I think it's safe to remove it.
There was a problem hiding this comment.
If this is actually going to be supported, it'd need a test. If there is no test... don't worry about it. I remember when streaming expressions came about... it kind of lightly requires SolrCloud... not sure anyone put in any real concerted effort to truly not depend on SolrCloud. That said, if we notice easy/simple ways to help it potentially work then that's a good thing.
There was a problem hiding this comment.
My understanding is that none of those features were designed to work in non cloud mode, the assumption was that they wouldn't. There also was a relunctance to I think to start supporting it in non cloud mode if that isnt' a common use case, as then it increases our testing and maintenance burden. As a community, we don't have a shared understanding of what is in cloud mode versus not, and what our overall goal is. You are seeing this in the code ;-).
|
FYI consider #3740 (comment) (or skip for now as you wish) |
Since we are moving to Maybe one test just to prove that the |
|
I did some testing on this branch on what commands connect direct to ZK, and the results where what I expected. |
# Conflicts: # solr/core/src/java/org/apache/solr/core/CoreContainer.java # solr/solrj-streaming/src/java/org/apache/solr/client/solrj/io/SolrClientCache.java
…remove redundant property cleanup
|
I've taken all the comments into account, and this evening I'll try to add all the necessary integration tests and check whether it's safe to remove the solrClientCache initialization for standalone mode. |
|
Integration tests via Launching via Is there a workaround to run integration tests now? |
dsmiley
left a comment
There was a problem hiding this comment.
I'll do a couple of the suggestions I included in this review, finishing today.
I also want to ensure no stream creates a SolrClientCache() (no args) -- should always rely on StreamContext.
| streamFactory.withCollectionSolrConnection(defaultCollection, solrConnection); | ||
| streamFactory.withDefaultSolrConnection(solrConnection); |
There was a problem hiding this comment.
this line seems redundant with the previous
There was a problem hiding this comment.
Why? The first line sets the default collection and connection to it, and the second line sets the default connection for all other collections if solrConnection is not specified in the expression itself.
If we remove any of them, the behavior will change.
There was a problem hiding this comment.
This will be relevant when explicitly enabling external clusters. By default, indeed, one connection is always established.
| streamFactory.withDefaultZkHost(defaultZkhost); | ||
| var solrConnection = CloudSolrClient.CloudSolrClientConnection.parse(defaultZkhost); | ||
| streamFactory.withCollectionSolrConnection(defaultCollection, solrConnection); | ||
| streamFactory.withDefaultSolrConnection(solrConnection); |
There was a problem hiding this comment.
this line appears redundant with the previous
| if (!solrConnection.isZookeeper()) { | ||
| throw new SQLException( | ||
| String.format( | ||
| Locale.ROOT, "Expected ZooKeeper connection string, but got: '%s'.", schemaName)); | ||
| } |
There was a problem hiding this comment.
not sure there's any point in ensuring this.
This is a great thing to drop a note to dev@solr.apache.org about, and let the person who merged the code know about the issue! |
|
I already did yesterday: https://issues.apache.org/jira/browse/SOLR-17855 |
|
Added integration tests for CLI utilities with |
|
I'll pursue the SolrClientCache security matter in another JIRA issue TBD. Sorry to see this JIRA issue get so much scope creep. We can merge something here but just won't release it until the security matter is handled. |
|
Good |
|
I leave it to @epugh to review the CLI aspects of this PR. Ideally that'd be a separate PR; it's a large scope divergence. I don't have time/experience or frankly interest to review the CLI. |
I'm rolling back |

https://issues.apache.org/jira/browse/SOLR-18130
Description
This PR updates the solrj-streaming module by replacing all usages of zkHost with solrCloud to enable support for HTTP-based quorum configurations.
Solution
Parameters, fields and variables in solrj-streaming, named as
zkHostrenamed tosolrCloudFor backward compatibility, specifying zkHost is still supported.
A shared method has been introduced in an abstract class to resolve the effective solrCloud value using a priority-based approach (e.g., explicit parameter → legacy zkHost → default Zookeeper host).
Tests
Introduced method, that randomly provides either an HTTP or Zookeeper record in different scenarios. Additionally, all zkHost parameters have been replaced with connectionString, using
getSolrConnection().toString()for substitution.Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.