MULTIARCH-6023: PowerVS IPI install signal handling to prevent deprovision failures#80917
MULTIARCH-6023: PowerVS IPI install signal handling to prevent deprovision failures#80917Neha-dot-Yadav wants to merge 3 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThe script now uses separate TERM and EXIT traps, with TERM terminating child jobs and exiting 143. Both installer phases run in the background with PID-based waiting, and the step config updates timeout and grace period values. ChangesPowervs install TERM handling
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh (1)
972-976: 🔒 Security & Privacy | 🔴 Critical | ⚡ Quick winSame redirection-order bug as
create cluster— stderr bypasses the filter here too.🔒 Proposed fix
- openshift-install wait-for install-complete --dir="${dir}" 2>&1 > >(grep --line-buffered -v 'password\|X-Auth-Token\|UserData:') & + openshift-install wait-for install-complete --dir="${dir}" > >(grep --line-buffered -v 'password\|X-Auth-Token\|UserData:') 2>&1 & INSTALL_PID=$! wait $INSTALL_PID ret=$?🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh` around lines 972 - 976, The redirection order in the openshift-install wait-for install-complete command is incorrect, causing stderr to bypass the grep filter. The current order `2>&1 > >(grep ...)` redirects stderr before stdout is piped to grep, so stderr goes to the original stdout instead of through the filter. Move the `2>&1` redirection to come after the process substitution `> >(grep ...)` so that both stdout and stderr are properly filtered through the grep command that removes sensitive information like passwords and authentication tokens.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh`:
- Around line 940-944: The redirection order in the openshift-install command is
incorrect, causing stderr to bypass the secret-scrubbing grep filter. Currently,
stderr is redirected to stdout before stdout is redirected to the grep process
substitution, allowing sensitive information like passwords and tokens to leak
into CI logs unredacted. Fix this by swapping the redirection order in the
openshift-install invocation so that stdout is first redirected to the grep
process substitution using > >(...), and then stderr is merged into it using
2>&1, ensuring both output streams pass through the password, X-Auth-Token, and
UserData filtering.
- Around line 707-710: Split the combined trap statement handling both EXIT and
TERM into two separate trap handlers to prevent prepare_next_steps from
executing twice and to preserve the correct exit code. Create a TERM trap that
kills background children and exits the script immediately, and a separate EXIT
trap that captures the exit code before running prepare_next_steps, ensuring the
function reads the actual installation result rather than the status from the
wait command.
---
Duplicate comments:
In
`@ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh`:
- Around line 972-976: The redirection order in the openshift-install wait-for
install-complete command is incorrect, causing stderr to bypass the grep filter.
The current order `2>&1 > >(grep ...)` redirects stderr before stdout is piped
to grep, so stderr goes to the original stdout instead of through the filter.
Move the `2>&1` redirection to come after the process substitution `> >(grep
...)` so that both stdout and stderr are properly filtered through the grep
command that removes sensitive information like passwords and authentication
tokens.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b04622d5-638c-4b9d-86ce-8f97b938dad7
📒 Files selected for processing (1)
ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh
|
/pj-rehearse periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-powervc-multi-p-p |
|
@hamzy: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@hamzy: job(s): periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-powervc-multi-p-p either don't exist or were not found to be affected, and cannot be rehearsed |
|
/pj-rehearse periodic-ci-openshift-multiarch-main-nightly-4.22-ocp-e2e-ovn-powervs-capi-multi-p-p |
|
@hamzy: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Neha-dot-Yadav, prb112 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/retitle MULTIARCH-6023: PowerVS IPI install signal handling to prevent deprovision failures |
|
@Neha-dot-Yadav: This pull request references MULTIARCH-6023 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/pj-rehearse periodic-ci-openshift-multiarch-main-nightly-4.22-ocp-e2e-ovn-powervs-capi-multi-p-p |
|
@hamzy: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
New changes are detected. LGTM label has been removed. |
|
[REHEARSALNOTIFIER]
A total of 43 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
/pj-rehearse periodic-ci-openshift-multiarch-main-nightly-4.22-ocp-e2e-ovn-powervs-capi-multi-p-p |
|
@Neha-dot-Yadav: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@Neha-dot-Yadav: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/hold |
| as: ipi-install-powervs-install | ||
| from: upi-installer | ||
| grace_period: 2h | ||
| timeout: 30m0s # Temporary: reduced from default for TERM trap testing |
Problem
When PowerVS IPI installations timeout, the deprovision step fails with:
Skipping: /tmp/secret/metadata.json not found.This prevents cleanup of cloud resources (VMs, networks, DNS records), leading to stale resource and subsequent job failures.
Root Cause
The script has two issues with signal handling:
openshift-install ... | grep ... # Bash is blocked hereWhen SIGTERM arrives, it's queued but cannot be processed until the pipeline completes. If the pipeline never completes (stuck installation), SIGKILL arrives after the grace period and terminates everything without running cleanup.
Solution
Summary by CodeRabbit
This PR improves the reliability of PowerVS IPI installation cleanup in OpenShift CI by fixing signal-handling behavior during install timeouts, preventing deprovision failures that previously left stale cloud resources behind (e.g., VMs, networks, and DNS records) and could cause later job failures.
What changed (practical impact)
In
ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh, the script’s termination/cleanup logic was reworked so that when the install step is interrupted (such as by timeout), the script can still execute its “next steps”/cleanup path instead of failing early with errors likeSkipping: /tmp/secret/metadata.json not found.How it fixes timeout/deprovision failures
TERMtrap (rather than having handlers that could be overwritten), ensuring termination signals trigger the intended cleanup behavior.openshift-install ... | grep ...(a blocking pipeline) was replaced with runningopenshift-installin the background while filtering output via process substitution, so the script remains responsive to signals.waitto preserve interruptibility: The script captures the installer PID, waits on it, and propagates the status so the process can be cleanly interrupted while still running termination cleanup logic.These adjustments apply to both:
openshift-install --dir=... create clusteropenshift-install wait-for install-completeTemporary CI timing tweak to validate TERM handling
In
ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-ref.yaml, the steptimeoutwas set to 30 minutes and thegrace_periodreduced to 15 minutes (temporarily) to exercise/test the TERM-triggered cleanup behavior during timeouts.