[MINOR] Reject traversal segments in note and folder paths#5227
[MINOR] Reject traversal segments in note and folder paths#5227jongyoul wants to merge 3 commits intoapache:masterfrom
Conversation
NotebookRepo.buildNoteFileName composes a filesystem path or object-store
key from a user-supplied note path. The previous implementation only
required a leading "/" but otherwise concatenated the value verbatim, so
paths such as "/../etc/foo" would compose to a location outside the
configured notebook root for backends that perform a raw filesystem
operation (FileSystemNotebookRepo) or build an object key directly from
the string (S3 / Azure / GCS).
Changes:
- NotebookRepo: add a default rejectTraversalSegments helper that
URL-decodes the path repeatedly (capped at five layers) and refuses
any "." or ".." segment. buildNoteFileName now calls it, so every
NotebookRepo implementation is hardened in one place.
- FileSystemNotebookRepo: the folder-level move and remove APIs build
the path directly from the input rather than via buildNoteFileName,
so apply rejectTraversalSegments to those inputs as well.
- NotebookService.renameNote: normalize the rebuilt note path through
normalizeNotePath, matching what other rename / move entry points
already do (createNote, cloneNote, moveFolder).
Tests:
- NotebookRepoPathValidationTest (27): rejects "..", ".", URL-encoded
variants such as "%2e%2e" and double-encoded variants, and the
excessive-encoding-layer case. Accepts realistic note names including
Korean characters and trailing dots inside a segment ("..." stays
valid, only exact "..", "." segments are rejected).
- Existing FileSystemNotebookRepoTest (2) and NotebookServiceTest (5)
continue to pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address reviewer feedback (simplify pass) on the previous commit: - Move rejectTraversalSegments and decodeRepeatedly out of NotebookRepo default methods into a new final NotebookPathValidator class. Default methods on a public interface can be overridden, which would let an implementation accidentally — or intentionally — bypass the check; a static helper on a final class makes that impossible. - Drop the duplicate decodeRepeatedly that lived as a private static in NotebookService and route normalizeNotePath through the same helper. - Cache the segment-split Pattern as a static final field so the regex is compiled once instead of on every save / get / move call. - Drop the InMemoryStub boilerplate from the test and exercise the utility class directly; the stub existed only to instantiate an interface that no longer carries the helpers. - Keep the existing "Exceeded maximum decode attempts" error message so existing assertion tests in NotebookServiceTest continue to pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens Zeppelin’s notebook path handling against path traversal by rejecting .. / . path segments (including URL-encoded forms) when composing note storage paths/keys, and aligns service-layer rename behavior with existing normalization.
Changes:
- Introduces
NotebookPathValidatorwith shared helpers to repeatedly URL-decode note paths (capped) and reject traversal segments. - Applies traversal-segment rejection in
NotebookRepo.buildNoteFileName(...)so all backends using it get consistent protection. - Fixes missed normalization in
NotebookService.renameNote(...)and adds traversal checks for folder move/remove inFileSystemNotebookRepo.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
zeppelin-server/src/main/java/org/apache/zeppelin/notebook/repo/NotebookPathValidator.java |
Adds centralized URL-decoding and traversal-segment rejection utilities. |
zeppelin-server/src/main/java/org/apache/zeppelin/notebook/repo/NotebookRepo.java |
Enforces traversal-segment rejection in the default note filename/key builder. |
zeppelin-plugins/notebookrepo/filesystem/src/main/java/org/apache/zeppelin/notebook/repo/FileSystemNotebookRepo.java |
Validates folder paths for traversal segments before folder move/remove operations. |
zeppelin-server/src/main/java/org/apache/zeppelin/service/NotebookService.java |
Uses the shared decoder and normalizes rename paths before moving notes. |
zeppelin-server/src/test/java/org/apache/zeppelin/notebook/repo/NotebookRepoPathValidationTest.java |
Adds test coverage for traversal rejection and repeated-decoding behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| NotebookPathValidator.rejectTraversalSegments(folderPath); | ||
| NotebookPathValidator.rejectTraversalSegments(newFolderPath); | ||
| // [ZEPPELIN-4195] newFolderPath parent path maybe not exist | ||
| this.fs.tryMkDir(new Path(notebookDir, folderPath.substring(1)).getParent()); |
There was a problem hiding this comment.
@Copilot You are right — the mkDir line targets the wrong parent. However, that line was not introduced by this PR (the PR only added the two rejectTraversalSegments calls just above it; the buggy tryMkDir(folderPath...) predates this change in master). Since this PR is a focused security fix for path traversal, I would prefer to keep its scope tight and address the move(folderPath, newFolderPath, ...) mkdir bug in a separate follow-up.
Filed as ZEPPELIN-6415.
| /** | ||
| * Repeatedly URL-decodes {@code encoded} until it stabilises, capped at | ||
| * {@value #MAX_DECODE_LAYERS} layers. | ||
| */ | ||
| public static String decodeRepeatedly(String encoded) throws IOException { | ||
| String previous = encoded; | ||
| for (int attempt = 0; attempt < MAX_DECODE_LAYERS; attempt++) { | ||
| String decoded = URLDecoder.decode(previous, StandardCharsets.UTF_8); | ||
| if (decoded.equals(previous)) { | ||
| return decoded; | ||
| } | ||
| previous = decoded; | ||
| } | ||
| throw new IOException("Exceeded maximum decode attempts. Possible malicious input."); |
There was a problem hiding this comment.
@Copilot Good catch on both. Fixed in dd53762:
URLDecoder.decodeis now wrapped in a try/catch that re-throwsIllegalArgumentExceptionasIOException— keeps the declared failure mode consistent.- The loop now runs
MAX_DECODE_LAYERS + 1iterations (pass <= MAX_DECODE_LAYERS), so the constant truly caps the number of accepted encoding layers: detecting stability after N layers needs N decodes plus one confirmation pass. Updated the Javadoc and added two tests — one for the malformed-encoding wrap, one for the exact 5-layer boundary.
…ract Wrap URLDecoder.decode IllegalArgumentException as IOException so the declared IOException contract is preserved for malformed percent-encoding, and run the loop MAX_DECODE_LAYERS + 1 times so the constant truly caps the number of accepted encoding layers (an extra pass is needed to confirm the result has stabilised).
tbonelee
left a comment
There was a problem hiding this comment.
LGTM, approving. A couple of questions that might be worth a follow-up:
- The folder-level
move(folderPath, ...)andremove(folderPath, ...)inS3NotebookRepo,OSSNotebookRepo,AzureNotebookRepo, etc. don't go throughbuildNoteFileNameeither, so they look like they'd benefit from the samerejectTraversalSegmentscall you added toFileSystemNotebookRepo. Was leaving them out intentional, or worth a follow-up? - The segment splitter is
/+, so backslash variants like/foo/..\barpass through. I'm not sure how realistic this is, but on Windows +LocalFileSystemI wonder ifjava.io.Filecould end up interpreting\as a separator and resolving outside the notebook dir? Might be worth considering[\\/]+, or it may not be a case we need to worry about.
What is this PR for?
NotebookRepo.buildNoteFileNamecomposes a filesystem path or object-store key from a user-supplied note path. The previous implementation required only a leading/but otherwise concatenated the value verbatim, so a value such as/../etc/foowould compose to a location outside the configured notebook root for backends that perform a raw filesystem operation (FileSystemNotebookRepo) or build an object key directly from the string (S3 / Azure / GCS).This PR centralises a small validation helper at the
NotebookRepointerface level so every backend gets the same defence, fixes one folder-level path that bypassedbuildNoteFileName, and adds the missingnormalizeNotePathcall inNotebookService.renameNote.What type of PR is it?
Improvement
Todos
NotebookRepo.rejectTraversalSegments(URL-decoded, recursive, capped at 5 layers).rejectTraversalSegmentsfrom the defaultbuildNoteFileName, so every implementation is covered.FileSystemNotebookRepo's folder-levelmove/remove, which build the path without going throughbuildNoteFileName.normalizeNotePathinNotebookService.renameNoteto matchcreateNote,cloneNote, andmoveFolder.NotebookRepoPathValidationTest(27):..,., URL-encoded variants (%2e%2e,%2E%2E), double-encoded variants, and an excessive-encoding-layer payload. Realistic note names including Korean characters and...inside a segment are accepted.What is the Jira issue?
N/A —
[MINOR]change.How should this be tested?
All test sets pass locally.
Questions
IOExceptioninstead of silently composing to an out-of-root location.