Skip to content

nvme-cli: add nvme top command for real-time monitoring#3333

Closed
igaw wants to merge 7 commits into
linux-nvme:masterfrom
igaw:top
Closed

nvme-cli: add nvme top command for real-time monitoring#3333
igaw wants to merge 7 commits into
linux-nvme:masterfrom
igaw:top

Conversation

@igaw
Copy link
Copy Markdown
Collaborator

@igaw igaw commented Apr 30, 2026

Monitoring NVMe devices and paths in production is currently limited to
static snapshots via nvme-cli. While this is sufficient for basic
inspection, it is not ideal for NVMe-oF (fabrics) deployments where path
conditions can change dynamically due to varying network latency,
congestion, or link failures.

In multipath environments, administrators often need continuous
visibility into path state, ANA status, queue depth, link speed, and
error counters. Today, this typically requires repeatedly invoking
commands or relying on ad-hoc tooling, making it harder to quickly
identify issues.

This patch series introduces "nvme top", a tool for real-time monitoring
of NVMe devices and fabrics paths, similar in spirit to tools such as
top or iotop. The goal is to provide a continuously updating view of
device and path health, enabling faster detection of link degradation,
multipath imbalances, and transient failures.

The series first adds the necessary building blocks for supporting a
top-like dashboard. The initial patches extend the table APIs (including
support for additional data types such as unsigned, long, float, and
double) and introduce a generic dashboard framework. The final patch
adds the nvme top command built on top of this framework.

Future work:

  • Export NVMe statistics to external monitoring systems (e.g. Grafana).
  • Improve topology change detection in multipath configurations. The
    current implementation relies on kobject uevents for topology change,
    but namespace path add/delete events are not exported by the kernel
    since they are associated with hidden gendisk kobjects. This may
    require explicit uevent generation from the NVMe driver for namespace
    path changes.
  • Wire nvme top into an MCP pipeline and feed it to an LLM

As usual feedback, comments, and suggestions are welcome!

Comment thread nvme-print-stdout.c Fixed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new interactive nvme top command to provide a continuously updating, top-like view of NVMe subsystem / namespace / path health and performance, and adds reusable utility building blocks (dashboard framework + table enhancements) to support it.

Changes:

  • Extend the table utility to support float/double values and printing to an arbitrary FILE * stream.
  • Add a generic terminal dashboard framework (scrolling, resize handling, uevent-driven refresh).
  • Add nvme top command wiring via the print-ops layer, plus helper functions for formatting and stat aggregation.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
util/table.h Adds float/double value setters and table_print_stream() API
util/table.c Implements float/double formatting and stream-based table printing
util/sighdl.h Exposes new SIGWINCH flag
util/sighdl.c Installs SIGWINCH handler alongside SIGINT
util/meson.build Builds new util/dashboard.c
util/dashboard.h New public dashboard API for live rendering + events
util/dashboard.c Implements terminal dashboard rendering, input, uevent handling, resize support
nvme.c Adds top builtin command and --delay option
nvme-top.h Declares helper APIs for formatting and stat aggregation
nvme-top.c Implements formatting and stat aggregation helpers (IOPS/BW/lat/util)
nvme-print.h Adds top print-op and nvme_show_top() entrypoint
nvme-print.c Wires nvme_show_top() into the print dispatch
nvme-print-stdout.c Implements stdout backend for the interactive nvme top dashboard
nvme-builtin.h Registers the top command in the CLI
meson.build Adds nvme-top.c to the build

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread util/dashboard.h
Comment thread util/dashboard.c Outdated
Comment thread util/dashboard.h Outdated
Comment thread util/sighdl.h Outdated
Comment thread util/dashboard.c
Comment thread util/dashboard.c
Comment thread nvme-print-stdout.c Outdated
Comment thread util/table.h
Comment thread util/table.c
Comment thread util/dashboard.c Outdated
@shroffni
Copy link
Copy Markdown
Contributor

@igaw I have just sent v2 upstream addressing most comments from Copilot.

shroffni added 7 commits May 11, 2026 14:17
The table API automatically adjusts column width based on the width of
the value being printed. While table_print_XXX() already supports
unsigned, unsigned long, and long data types, the corresponding helper
table_get_value_width() does not account for these types.

Add support for unsigned, unsigned long, and long in table_get_value_
width() so that column width calculation is consistent with the
supported print helpers.

This will be used by the nvme top dashboard, where several statistics
are represented using these data types.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Link: https://patch.msgid.link/20260511115555.2638335-2-nilay@linux.ibm.com
Signed-off-by: Daniel Wagner <wagi@kernel.org>
table_print_centered() open-codes the logic to determine the width of a
table value, even though a helper already exists for this purpose.

Replace the open-coded width calculation with a call to
table_get_value_width() to avoid duplication and keep the behavior
consistent across the table helpers. Also, ensure that
table_get_value_width() returns -1 on failure so the caller could
then handle error as needed.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Link: https://patch.msgid.link/20260511115555.2638335-3-nilay@linux.ibm.com
Signed-off-by: Daniel Wagner <wagi@kernel.org>
The table_print_XXX() APIs do not currently support printing values of
type float or double.

Add support for float and double so that these data types can be used
with the table printing helpers. This will be later used for printing
nvme-top stat.

While at it, switch error reporting to nvme_show_error() for
consistency with the rest of the code.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Link: https://patch.msgid.link/20260511115555.2638335-4-nilay@linux.ibm.com
Signed-off-by: Daniel Wagner <wagi@kernel.org>
The table APIs currently print output only to stdout. However, callers
may need to direct output to other destinations such as stderr, a file,
or an in-memory buffer.

Add support for passing a FILE * stream so callers can choose where the
table output is written.

This will be used by the nvme top dashboard to control how statistics
are displayed.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Link: https://patch.msgid.link/20260511115555.2638335-5-nilay@linux.ibm.com
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Add a sigaction handler for SIGWINCH so that nvme-top can detect
terminal window size changes.

This allows the dashboard layout to be adjusted and redrawn when the
terminal is resized. To avoid undefined behavior and async-signal-safety
the data type for nvme_sigwinch_received is defined as volatile
sig_atomic_t.

While we are at it, also update the data type of nvme_sigint_received
to volatile sig_atomic_t.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Link: https://patch.msgid.link/20260511115555.2638335-6-nilay@linux.ibm.com
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Add a generic dashboard framework to support interactive, top-like
views. The framework renders data within a fixed-size window backed by
a larger buffer and provides event-driven APIs for updating the display.

Supported events:

- key input (ESC, Enter, 'q')
- escape sequences (up/down arrows)
- SIGWINCH for terminal resize handling
- periodic refresh via timeout
- kobject uevents (via netlink) to detect NVMe topology changes
  (subsystem, controller, namespace)
- reverse video highlighting for rows

Exported APIs:

- FILE *dashboard_init(struct dashboard_ctx **db_ctx, int interval)
  Initialize the dashboard context. Returns a FILE * stream used by the
  caller to write data for rendering. The returned context is used by
  all other dashboard APIs. The @interval is specified in seconds which
  represents the dashboard refresh interval.

- void dashboard_exit(struct dashboard_ctx *db_ctx)
  Tear down the dashboard and free all resources.

- int dashboard_draw_frame(struct dashboard_ctx *db_ctx, int scroll)
  Render the current frame. If @scroll is non-zero, adjust the view
  based on scrolling (up/down); otherwise render a new buffer.

- enum event_type dashboard_wait_for_event(struct dashboard_ctx *db_ctx)
  Wait for events such as key input, timeout, uevent, or SIGWINCH.
  Callers are expected to implement an event loop, and then based on
  returned event type take the action. For instance, if the returned
  event is timeout then caller shall update/refill the data store buffer
  and invoke dashboard_draw_frame() so that new/updated data could be
  rendered on dashboard . If the returned event type is up/down arrow
  keys then caller shall adjust the data start index in data store
  buffer and invoke dashboard_draw_frame() for rendering data. If the
  returned event type is key press 'ESC' or 'q' or kobject uevent then
  user shall take action as appropriate.

Reverse video helpers:

dashboard_{set,reset}_header_row_reverse()
dashboard_{set,reset}_footer_row_reverse()
dashboard_{set,reset}_data_row_reverse()

Additional getters/setters:

dashboard_get_interval()
dashboard_get_header_rows()
dashboard_set_header_rows()
dashboard_get_footer_rows()
dashboard_set_footer_rows()
dashboard_get_data_rows()
dashboard_get_data_start()
dashboard_set_data_start()
dashboard_get_frame_data_rows()

This framework is intended for use by nvme-top and similar tools that
require dynamic, event-driven terminal dashboards.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Link: https://patch.msgid.link/20260511115555.2638335-7-nilay@linux.ibm.com
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Add a new "nvme top" CLI command that provides an interactive,
top-like dashboard for real-time monitoring of NVMe devices and
paths.

The dashboard presents continuously updated information about
NVMe fabrics/PCIe paths and devices, similar in spirit to tools
such as top or iotop. It helps administrators observe device
health, detect path degradation, identify multipath imbalances,
and catch transient failures.

The interface consists of two views:

* Subsystem list view
  Displays all NVMe subsystems present on the system. Users can
  navigate the list using arrow keys and select a subsystem for
  detailed inspection.

* Subsystem detail view
  Shows detailed statistics for the selected subsystem. When
  native multipath is enabled, this includes namespace head
  statistics, path statistics, path health, and controller
  summary. Without multipath, it displays namespace statistics
  and controller summary.

Users can switch between views using the ESC/return key, exit with
'q', and navigate using arrow keys.

This command builds on the generic dashboard framework to provide
a flexible and extensible real-time monitoring interface.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Link: https://patch.msgid.link/20260511115555.2638335-8-nilay@linux.ibm.com
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

Comment thread util/dashboard.c
Comment on lines +330 to +334
struct timespec now;
long interval_ns, rem_interval_ns, start_time_ns, cur_time_ns;

clock_gettime(CLOCK_MONOTONIC, &now);
cur_time_ns = now.tv_sec * NSEC_PER_SEC + now.tv_nsec;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in other places we use the __u64 type for this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, fixed this using __u64

Comment thread util/dashboard.c
Comment thread util/dashboard.c
Comment on lines +412 to +414
/* compute remaining time */
calc_rem_time(db_ctx, &t0);
goto again;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, there is something off with the remaining timeout when pselect fails. I suspect this

again:
		ts.tv_sec = interval_sec;
		ts.tv_nsec = interval_nsec;

should be

again:
		ts.tv_sec = db_ctx->rem_interval.tv_sec;
		ts.tv_nsec = db_ctx->rem_interval.tv_nsec;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes correct, I'll fix this.

Comment thread util/sighdl.c
Comment on lines +31 to +33
nvme_sigwinch_received = false;
if (sigaction(SIGWINCH, &act, NULL) == -1)
return -errno;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This could have an impact when using nvme-cli in non-top mode. I think we should handle this differently.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I am fixing this in next PR.

Comment thread util/dashboard.h
Comment on lines +39 to +46
void dashboard_set_data_raw_reverse(struct dashboard_ctx *db_ctx, int row);
void dashboard_reset_data_raw_reverse(struct dashboard_ctx *db_ctx);

void dashboard_set_header_raw_reverse(struct dashboard_ctx *db_ctx, int row);
void dashboard_reset_header_raw_reverse(struct dashboard_ctx *db_ctx);

void dashboard_set_footer_raw_reverse(struct dashboard_ctx *db_ctx, int row);
void dashboard_reset_footer_raw_reverse(struct dashboard_ctx *db_ctx);
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a typo

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is type, will fix this in next PR.

Comment thread nvme-print-stdout.c
Comment on lines +7234 to +7238
while (1) {
stdout_top_print_subsys_topology_header(db_ctx, stream);
ret = stdout_top_print_subsys_topology(db_ctx, stream, s);
if (ret)
break;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use:

	__cleanup_nvme_global_ctx struct libnvme_global_ctx *ctx = NULL;

and I think it would be great to move the new dashboard/top feature into a new file, e.g. nvme-print-stdout-top.c. This files gets a bit too big and we might need to disable this for the windows port.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense. I will be moving all top related code under new file nvme-print-stdout-top.c

Comment thread nvme-print-stdout.c
Comment on lines +7381 to +7385
libnvme_subsystem_for_each_ctrl(s, c)
num_ctrl++;

if (!num_ctrl)
continue;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not totally sure if this is valid... starring at the code.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a edge case and I am not sure if we could really have any NVMe subsystem with zero controller. The reason we ignore it because if we have got any NVMe subsystem which has no controller then does it even make sense to display it? So my initial thought was to ignore displaying any such subsystem.

But now again staring at the code it seems that if we do that then we may not be able to have 1:1 mapping between displayed row index and the selected subsys pointer, as Copilot has correctly pointed out.

So I'd rather not ignore such subsystem with zero controller but still display it. I see there's no issue displaying such subsystem as anyways all properties of such subsystems would be also zero (#paths, #namespaces, IOPS, BW etc...)

@igaw
Copy link
Copy Markdown
Collaborator Author

igaw commented May 11, 2026

@shroffni thanks for the update. Copilot has found a few more things, which are non trivial... The signal handling one is the one we should definitely address before merging, as it can introduce regressions.

@shroffni
Copy link
Copy Markdown
Contributor

@shroffni thanks for the update. Copilot has found a few more things, which are non trivial... The signal handling one is the one we should definitely address before merging, as it can introduce regressions.

Yes I will address new review comments. Most of them are valid, makes sense. But I wonder why it didn't raise some of those comments during initial review (as some of those comments still apply to the v1 code).
Anyways, is there any way for me to ask Copilot for review before I submit the code so that I don't have to again re-spin a new version?

@igaw
Copy link
Copy Markdown
Collaborator Author

igaw commented May 11, 2026

@shroffni thanks for the update. Copilot has found a few more things, which are non trivial... The signal handling one is the one we should definitely address before merging, as it can introduce regressions.

Yes I will address new review comments. Most of them are valid, makes sense. But I wonder why it didn't raise some of those comments during initial review (as some of those comments still apply to the v1 code). Anyways, is there any way for me to ask Copilot for review before I submit the code so that I don't have to again re-spin a new version?

I think the review is somewhat limited on the context size of the LLM. Thus it just might have found the low hanging fruits first, run out of space and on the second pass it found more. Remember it is mimicking humans :)

Anyway, if you have a Copilot subscription you could push a it to your github repo and then ask Copilot for an review. In case you don't have one, you could create a PR (and update it with git push -f) and I just ask for a Copilot for a review. As my subscription is sponsored by github anyway, I don't mind.

@shroffni
Copy link
Copy Markdown
Contributor

@shroffni thanks for the update. Copilot has found a few more things, which are non trivial... The signal handling one is the one we should definitely address before merging, as it can introduce regressions.

Yes I will address new review comments. Most of them are valid, makes sense. But I wonder why it didn't raise some of those comments during initial review (as some of those comments still apply to the v1 code). Anyways, is there any way for me to ask Copilot for review before I submit the code so that I don't have to again re-spin a new version?

I think the review is somewhat limited on the context size of the LLM. Thus it just might have found the low hanging fruits first, run out of space and on the second pass it found more. Remember it is mimicking humans :)

Anyway, if you have a Copilot subscription you could push a it to your github repo and then ask Copilot for an review. In case you don't have one, you could create a PR (and update it with git push -f) and I just ask for a Copilot for a review. As my subscription is sponsored by github anyway, I don't mind.

Okay I don't have a Copilot subscription. So I may just raise PR (after addressing latest review comments) and let you know; so you may then ask Copilot to review once more!

@shroffni
Copy link
Copy Markdown
Contributor

@igaw I have just raised this PR: #3350

Addressed comments from the last review. You may ask Copilot to review once more. If all looks good to you then
I'd spin a new patchset and send it upstream.

@igaw
Copy link
Copy Markdown
Collaborator Author

igaw commented May 12, 2026

Let's close this one as is been replaced.

@igaw igaw closed this May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants