SOLR-17855: Remove Dropwizard dependency from core#4404
Conversation
mlbiscoc
left a comment
There was a problem hiding this comment.
There are a few things to note:
- The
OVERSEERSTATUSapi no longer including rates, percentiles and timing information. Only requests and error info. If we still care about this overseer info, I can properly collect to OTEL and have it exposed via our metrics api and OTLP instead. - Dropwizard
metrics-core:3.2.5is still being pulled in places where we embed ZooKeeper which is Solrs testing framework.
dsmiley
left a comment
There was a problem hiding this comment.
Note that Solr has RTimer, which maybe might be an adequate substitute in some places where you removed stuff completely?
I'm okay with losing some metrics relating to the Overseer. Someone can invest in bringing it back if they so please. I'm more okay with this, I admit, because I want folks to disable the Overseer instead ;-)
It'll be very necessary to add to major-changes-in-solr-10.adoc here on metrics that have disappeared.
| scheduledExecutor.shutdown(); | ||
| } else { | ||
| log(meter.getCount() + " docs at " + meter.getMeanRate() + " doc/s"); | ||
| double rate = count / ((System.nanoTime() - meterStartNanos) / 1e9); |
There was a problem hiding this comment.
I'd prefer to see something that requires less trust in the time math as a code reviewer... like some JDK API calls to produce a readable equivalent.
There was a problem hiding this comment.
if we wanted to bring back such timings, I'd rather it be done with annotations similar to tracing @WithSpan. Alas, OTEL has no metric equivalent, which would be for something more light-weight that isn't span-worthy. I think talking to ZK is span-worthy as it's another process/system, albeit it's so fast that someone would typically prefer to disable such. Ideally Spans would have levels like logging but they don't; people have proposed it. I did see this interesting development, which is somewhat related as it allows disabling spans for an entire "scope".
|
@mlbiscoc I think we may be able to elminate JettySolrRunnerWithMetrics as well... I poked around and it appears that it is no longer used in testing, it exists for two classes, but isn't actively used in the asserts, it just sits there... I played around a bit and got this patch file: |
I took a look and applied it to this PR. I think it's worth adding. The jetty metrics were essentially a no-op in 10 at this point anyways. |
| @@ -0,0 +1,9 @@ | |||
| title: Remove Dropwizard Metrics as a direct Solr dependency. The `OVERSEERSTATUS` response no longer includes per-operation timing metrics or queue stats; success/error counts are preserved. | |||
There was a problem hiding this comment.
We can also advise that tracing is a substitute for per-operation metrics.
| } | ||
|
|
||
| private static void logIndexingRate(long docCount, long startNanos) { | ||
| long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos); |
There was a problem hiding this comment.
as we're logging at docs per SECOND, why convert to milliseconds at an intermediate step? It's okay if we round to a whole number... docs/sec should be rather high anyway. It's also okay if you want to add a fraction.
|
|
||
| ==== Overseer Status Metrics Removed | ||
|
|
||
| The `OVERSEERSTATUS` Collection API response no longer includes per-operation timing metrics but per-operation `requests` and `errors` counts still exist. |
There was a problem hiding this comment.
mention that OTEL tracing can be used to get similar information indirectly
https://issues.apache.org/jira/browse/SOLR-17855
After migration to OTEL, dropwizard metrics are not used anymore for metrics collection but there were still a few references to it in solr-core. This removes those dropwizard instruments so we can get rid of the dependency from core but it is still needed in test-framework where we embed zookeeper.