nvme-cli: add nvme top command for real-time monitoring#3333
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new interactive nvme top command to provide a continuously updating, top-like view of NVMe subsystem / namespace / path health and performance, and adds reusable utility building blocks (dashboard framework + table enhancements) to support it.
Changes:
- Extend the table utility to support float/double values and printing to an arbitrary
FILE *stream. - Add a generic terminal dashboard framework (scrolling, resize handling, uevent-driven refresh).
- Add
nvme topcommand wiring via the print-ops layer, plus helper functions for formatting and stat aggregation.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| util/table.h | Adds float/double value setters and table_print_stream() API |
| util/table.c | Implements float/double formatting and stream-based table printing |
| util/sighdl.h | Exposes new SIGWINCH flag |
| util/sighdl.c | Installs SIGWINCH handler alongside SIGINT |
| util/meson.build | Builds new util/dashboard.c |
| util/dashboard.h | New public dashboard API for live rendering + events |
| util/dashboard.c | Implements terminal dashboard rendering, input, uevent handling, resize support |
| nvme.c | Adds top builtin command and --delay option |
| nvme-top.h | Declares helper APIs for formatting and stat aggregation |
| nvme-top.c | Implements formatting and stat aggregation helpers (IOPS/BW/lat/util) |
| nvme-print.h | Adds top print-op and nvme_show_top() entrypoint |
| nvme-print.c | Wires nvme_show_top() into the print dispatch |
| nvme-print-stdout.c | Implements stdout backend for the interactive nvme top dashboard |
| nvme-builtin.h | Registers the top command in the CLI |
| meson.build | Adds nvme-top.c to the build |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@igaw I have just sent v2 upstream addressing most comments from Copilot. |
The table API automatically adjusts column width based on the width of the value being printed. While table_print_XXX() already supports unsigned, unsigned long, and long data types, the corresponding helper table_get_value_width() does not account for these types. Add support for unsigned, unsigned long, and long in table_get_value_ width() so that column width calculation is consistent with the supported print helpers. This will be used by the nvme top dashboard, where several statistics are represented using these data types. Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://patch.msgid.link/20260511115555.2638335-2-nilay@linux.ibm.com Signed-off-by: Daniel Wagner <wagi@kernel.org>
table_print_centered() open-codes the logic to determine the width of a table value, even though a helper already exists for this purpose. Replace the open-coded width calculation with a call to table_get_value_width() to avoid duplication and keep the behavior consistent across the table helpers. Also, ensure that table_get_value_width() returns -1 on failure so the caller could then handle error as needed. Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://patch.msgid.link/20260511115555.2638335-3-nilay@linux.ibm.com Signed-off-by: Daniel Wagner <wagi@kernel.org>
The table_print_XXX() APIs do not currently support printing values of type float or double. Add support for float and double so that these data types can be used with the table printing helpers. This will be later used for printing nvme-top stat. While at it, switch error reporting to nvme_show_error() for consistency with the rest of the code. Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://patch.msgid.link/20260511115555.2638335-4-nilay@linux.ibm.com Signed-off-by: Daniel Wagner <wagi@kernel.org>
The table APIs currently print output only to stdout. However, callers may need to direct output to other destinations such as stderr, a file, or an in-memory buffer. Add support for passing a FILE * stream so callers can choose where the table output is written. This will be used by the nvme top dashboard to control how statistics are displayed. Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://patch.msgid.link/20260511115555.2638335-5-nilay@linux.ibm.com Signed-off-by: Daniel Wagner <wagi@kernel.org>
Add a sigaction handler for SIGWINCH so that nvme-top can detect terminal window size changes. This allows the dashboard layout to be adjusted and redrawn when the terminal is resized. To avoid undefined behavior and async-signal-safety the data type for nvme_sigwinch_received is defined as volatile sig_atomic_t. While we are at it, also update the data type of nvme_sigint_received to volatile sig_atomic_t. Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://patch.msgid.link/20260511115555.2638335-6-nilay@linux.ibm.com Signed-off-by: Daniel Wagner <wagi@kernel.org>
Add a generic dashboard framework to support interactive, top-like views. The framework renders data within a fixed-size window backed by a larger buffer and provides event-driven APIs for updating the display. Supported events: - key input (ESC, Enter, 'q') - escape sequences (up/down arrows) - SIGWINCH for terminal resize handling - periodic refresh via timeout - kobject uevents (via netlink) to detect NVMe topology changes (subsystem, controller, namespace) - reverse video highlighting for rows Exported APIs: - FILE *dashboard_init(struct dashboard_ctx **db_ctx, int interval) Initialize the dashboard context. Returns a FILE * stream used by the caller to write data for rendering. The returned context is used by all other dashboard APIs. The @interval is specified in seconds which represents the dashboard refresh interval. - void dashboard_exit(struct dashboard_ctx *db_ctx) Tear down the dashboard and free all resources. - int dashboard_draw_frame(struct dashboard_ctx *db_ctx, int scroll) Render the current frame. If @scroll is non-zero, adjust the view based on scrolling (up/down); otherwise render a new buffer. - enum event_type dashboard_wait_for_event(struct dashboard_ctx *db_ctx) Wait for events such as key input, timeout, uevent, or SIGWINCH. Callers are expected to implement an event loop, and then based on returned event type take the action. For instance, if the returned event is timeout then caller shall update/refill the data store buffer and invoke dashboard_draw_frame() so that new/updated data could be rendered on dashboard . If the returned event type is up/down arrow keys then caller shall adjust the data start index in data store buffer and invoke dashboard_draw_frame() for rendering data. If the returned event type is key press 'ESC' or 'q' or kobject uevent then user shall take action as appropriate. Reverse video helpers: dashboard_{set,reset}_header_row_reverse() dashboard_{set,reset}_footer_row_reverse() dashboard_{set,reset}_data_row_reverse() Additional getters/setters: dashboard_get_interval() dashboard_get_header_rows() dashboard_set_header_rows() dashboard_get_footer_rows() dashboard_set_footer_rows() dashboard_get_data_rows() dashboard_get_data_start() dashboard_set_data_start() dashboard_get_frame_data_rows() This framework is intended for use by nvme-top and similar tools that require dynamic, event-driven terminal dashboards. Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://patch.msgid.link/20260511115555.2638335-7-nilay@linux.ibm.com Signed-off-by: Daniel Wagner <wagi@kernel.org>
Add a new "nvme top" CLI command that provides an interactive, top-like dashboard for real-time monitoring of NVMe devices and paths. The dashboard presents continuously updated information about NVMe fabrics/PCIe paths and devices, similar in spirit to tools such as top or iotop. It helps administrators observe device health, detect path degradation, identify multipath imbalances, and catch transient failures. The interface consists of two views: * Subsystem list view Displays all NVMe subsystems present on the system. Users can navigate the list using arrow keys and select a subsystem for detailed inspection. * Subsystem detail view Shows detailed statistics for the selected subsystem. When native multipath is enabled, this includes namespace head statistics, path statistics, path health, and controller summary. Without multipath, it displays namespace statistics and controller summary. Users can switch between views using the ESC/return key, exit with 'q', and navigate using arrow keys. This command builds on the generic dashboard framework to provide a flexible and extensible real-time monitoring interface. Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://patch.msgid.link/20260511115555.2638335-8-nilay@linux.ibm.com Signed-off-by: Daniel Wagner <wagi@kernel.org>
| struct timespec now; | ||
| long interval_ns, rem_interval_ns, start_time_ns, cur_time_ns; | ||
|
|
||
| clock_gettime(CLOCK_MONOTONIC, &now); | ||
| cur_time_ns = now.tv_sec * NSEC_PER_SEC + now.tv_nsec; |
There was a problem hiding this comment.
in other places we use the __u64 type for this.
There was a problem hiding this comment.
okay, fixed this using __u64
| /* compute remaining time */ | ||
| calc_rem_time(db_ctx, &t0); | ||
| goto again; |
There was a problem hiding this comment.
Yep, there is something off with the remaining timeout when pselect fails. I suspect this
again:
ts.tv_sec = interval_sec;
ts.tv_nsec = interval_nsec;should be
again:
ts.tv_sec = db_ctx->rem_interval.tv_sec;
ts.tv_nsec = db_ctx->rem_interval.tv_nsec;There was a problem hiding this comment.
Yes correct, I'll fix this.
| nvme_sigwinch_received = false; | ||
| if (sigaction(SIGWINCH, &act, NULL) == -1) | ||
| return -errno; |
There was a problem hiding this comment.
Good point. This could have an impact when using nvme-cli in non-top mode. I think we should handle this differently.
There was a problem hiding this comment.
Yeah I am fixing this in next PR.
| void dashboard_set_data_raw_reverse(struct dashboard_ctx *db_ctx, int row); | ||
| void dashboard_reset_data_raw_reverse(struct dashboard_ctx *db_ctx); | ||
|
|
||
| void dashboard_set_header_raw_reverse(struct dashboard_ctx *db_ctx, int row); | ||
| void dashboard_reset_header_raw_reverse(struct dashboard_ctx *db_ctx); | ||
|
|
||
| void dashboard_set_footer_raw_reverse(struct dashboard_ctx *db_ctx, int row); | ||
| void dashboard_reset_footer_raw_reverse(struct dashboard_ctx *db_ctx); |
There was a problem hiding this comment.
Yes this is type, will fix this in next PR.
| while (1) { | ||
| stdout_top_print_subsys_topology_header(db_ctx, stream); | ||
| ret = stdout_top_print_subsys_topology(db_ctx, stream, s); | ||
| if (ret) | ||
| break; |
There was a problem hiding this comment.
just use:
__cleanup_nvme_global_ctx struct libnvme_global_ctx *ctx = NULL;and I think it would be great to move the new dashboard/top feature into a new file, e.g. nvme-print-stdout-top.c. This files gets a bit too big and we might need to disable this for the windows port.
There was a problem hiding this comment.
Yeah makes sense. I will be moving all top related code under new file nvme-print-stdout-top.c
| libnvme_subsystem_for_each_ctrl(s, c) | ||
| num_ctrl++; | ||
|
|
||
| if (!num_ctrl) | ||
| continue; |
There was a problem hiding this comment.
Not totally sure if this is valid... starring at the code.
There was a problem hiding this comment.
This is a edge case and I am not sure if we could really have any NVMe subsystem with zero controller. The reason we ignore it because if we have got any NVMe subsystem which has no controller then does it even make sense to display it? So my initial thought was to ignore displaying any such subsystem.
But now again staring at the code it seems that if we do that then we may not be able to have 1:1 mapping between displayed row index and the selected subsys pointer, as Copilot has correctly pointed out.
So I'd rather not ignore such subsystem with zero controller but still display it. I see there's no issue displaying such subsystem as anyways all properties of such subsystems would be also zero (#paths, #namespaces, IOPS, BW etc...)
|
@shroffni thanks for the update. Copilot has found a few more things, which are non trivial... The signal handling one is the one we should definitely address before merging, as it can introduce regressions. |
Yes I will address new review comments. Most of them are valid, makes sense. But I wonder why it didn't raise some of those comments during initial review (as some of those comments still apply to the v1 code). |
I think the review is somewhat limited on the context size of the LLM. Thus it just might have found the low hanging fruits first, run out of space and on the second pass it found more. Remember it is mimicking humans :) Anyway, if you have a Copilot subscription you could push a it to your github repo and then ask Copilot for an review. In case you don't have one, you could create a PR (and update it with |
Okay I don't have a Copilot subscription. So I may just raise PR (after addressing latest review comments) and let you know; so you may then ask Copilot to review once more! |
|
Let's close this one as is been replaced. |
Monitoring NVMe devices and paths in production is currently limited to
static snapshots via nvme-cli. While this is sufficient for basic
inspection, it is not ideal for NVMe-oF (fabrics) deployments where path
conditions can change dynamically due to varying network latency,
congestion, or link failures.
In multipath environments, administrators often need continuous
visibility into path state, ANA status, queue depth, link speed, and
error counters. Today, this typically requires repeatedly invoking
commands or relying on ad-hoc tooling, making it harder to quickly
identify issues.
This patch series introduces "nvme top", a tool for real-time monitoring
of NVMe devices and fabrics paths, similar in spirit to tools such as
top or iotop. The goal is to provide a continuously updating view of
device and path health, enabling faster detection of link degradation,
multipath imbalances, and transient failures.
The series first adds the necessary building blocks for supporting a
top-like dashboard. The initial patches extend the table APIs (including
support for additional data types such as unsigned, long, float, and
double) and introduce a generic dashboard framework. The final patch
adds the nvme top command built on top of this framework.
Future work:
current implementation relies on kobject uevents for topology change,
but namespace path add/delete events are not exported by the kernel
since they are associated with hidden gendisk kobjects. This may
require explicit uevent generation from the NVMe driver for namespace
path changes.
As usual feedback, comments, and suggestions are welcome!