Feature 2446 no proxy env#2448
Open
yhirose wants to merge 13 commits into
Open
Conversation
f20c562 to
f0d1675
Compare
In preparation for NO_PROXY support (#2446), centralize the proxy-enabled decision in a single helper so the upcoming bypass logic can be added in one place rather than to six divergent call sites. The helper's body for now is identical to the existing condition; the host parameter is unused until set_no_proxy() lands. Refactored sites: ClientImpl::create_client_socket ClientImpl::handle_request (HTTP request rewrite) ClientImpl::setup_redirect_client ClientImpl::process_request (SSL is_proxy_enabled flag) SSLClient::setup_proxy_connection SSLClient::ensure_socket_connection The two prepare_default_headers Proxy-Authorization injection blocks (currently gated only on proxy auth credentials being set) are intentionally not wrapped here. Doing so would change behavior in the rare misconfiguration case where credentials are set without set_proxy, so the gating is deferred to the NO_PROXY commit where it becomes meaningful. No behavior change. All 608 unit tests and the 22 squid-backed proxy tests pass.
Building block for the upcoming set_proxy_from_env (#2446). Parses "http(s)://[user[:pass]@]host[:port][/...]" into a detail::ProxyUrl struct. Rejects: - empty input - any control character (< 0x20 or 0x7F), including CR/LF/NUL — these would otherwise let a malicious env value inject extra header lines into a CONNECT request or Proxy-Authorization header - schemes other than http and https - ports outside [1, 65535] - malformed IPv6 host literals (validated via inet_pton(AF_INET6)) - non-numeric or trailing-garbage port strings Notes: - userinfo is split on the LAST '@' so passwords containing '@' are preserved in the password field - if no port is present, defaults to 80 (http) / 443 (https) - integer parse goes through detail::from_chars to stay compatible with -fno-exceptions builds The helper has no callers yet; it lands consumer-side when set_proxy_from_env arrives. All 608 unit tests pass.
Building blocks for the upcoming Client::set_no_proxy (#2446): - NoProxyEntry / NoProxyKind: parsed list entry (wildcard, hostname suffix, IPv4 CIDR, IPv6 CIDR) - NormalizedTarget: pre-normalized form of the connection's target host (lowercase, brackets stripped, trailing dot stripped, with inet_pton already attempted) - parse_no_proxy_entry / parse_no_proxy_list: token / list parsing. Port-specific entries are rejected by design — cpp-httplib's other host-keyed APIs (e.g. set_hostname_addr_map) are hostname-only, so supporting host:port for NO_PROXY alone would be inconsistent. - ipv4_in_cidr / ipv6_in_cidr: CIDR membership. IPv4 special-cases prefix=0 to avoid the (1u << 32) shift UB. IPv6 uses byte-wise memcmp plus a masked partial-byte compare. - normalize_target: prepares the target host for matching. Routes every IP literal through inet_pton so "127.0.0.1" vs "127.000.000.001" vs decimal-form integers cannot be used to bypass a NO_PROXY entry via alternate string forms. - host_matches_no_proxy: matches a normalized target against an entry list. Hostname suffix matching uses a dot-boundary rule so "evilexample.com" does NOT match the entry "example.com". IPv4 and IPv6 entries match only their own address family — IPv4-mapped IPv6 ("::ffff:1.2.3.4") is not cross-matched against IPv4 entries. These helpers have no callers yet; they land consumer-side in the upcoming set_no_proxy / set_proxy_from_env commits. All 608 unit tests pass.
Implements the user-facing half of #2446 (set_proxy_from_env follows in the next commit). When a NO_PROXY pattern matches the target host, the client now bypasses the configured proxy and the corresponding Proxy-Authorization header is suppressed. Public API: - Client::set_no_proxy(const std::vector<std::string> &patterns) Patterns: "*", hostname suffix (e.g. "example.com" or ".example.com"), IPv4/IPv6 CIDR (e.g. "10.0.0.0/8", "fe80::/10"), or single IP literals. Replaces any previous list. Malformed entries are silently dropped. Internals: - is_proxy_enabled_for_host now consults no_proxy_entries_, normalizing the target through inet_pton so leading-zero or alternate-form IPs cannot be used to bypass an entry. - prepare_default_headers gates both Proxy-Authorization injection blocks (basic and bearer) on is_proxy_enabled_for_host(host_). Previously, Proxy-Authorization was sent whenever proxy auth credentials were configured, even when the request was going direct to the target. With NO_PROXY now in play, that path would leak proxy credentials to the destination server — analog of the redirect-leak class of bugs (cf. CVE-2023-32681 in Python requests, GHSA-6hrp-7fq9-3qv2 in cpp-httplib). - setup_redirect_client now takes the redirect target host as a parameter and re-evaluates is_proxy_enabled_for_host against it. no_proxy_entries_ is always copied to the redirect client so the bypass policy follows across redirects. This is the cross-origin leak surface that GHSA-c3h8-fqq4-xm4g lives in; centralizing the decision through is_proxy_enabled_for_host removes the chance of branch divergence. - copy_settings copies no_proxy_entries_. The slight behavior change for the rare misconfiguration "set proxy_basic_auth without set_proxy" — Proxy-Authorization is no longer sent in that case — is deliberate. The header has no addressee when the proxy is unset. All 608 unit tests and 22 squid-backed proxy integration tests pass.
Final user-facing piece for #2446. Reads proxy-related environment variables and configures the client. - HTTPS clients (SSLClient) read https_proxy / HTTPS_PROXY - HTTP clients read http_proxy (lowercase only — see below) - Both also read no_proxy / NO_PROXY - Returns true if at least one variable was found and applied The lowercase-only http_proxy rule mitigates httpoxy / CVE-2016-5385. In CGI / FastCGI environments the uppercase HTTP_PROXY collides with the HTTP_* namespace used to expose request headers, so a remote attacker controlling the "Proxy:" header can inject a proxy URL. cpp-httplib follows curl, Go, and Python requests in honoring only the lowercase form. https_proxy/HTTPS_PROXY and no_proxy/NO_PROXY do not have this problem because their names don't begin with HTTP_. Scheme dispatch uses virtual is_ssl(): an SSLClient picks https_proxy and a plain ClientImpl picks http_proxy. There is intentionally no cross-scheme fallback — the two variables describe different traffic. set_proxy_from_env() reads getenv() synchronously and is documented as "call once at startup" — concurrent setenv from other threads is undefined. All 608 unit tests pass.
27 black-box tests exercising the public Client API only (no detail::
calls, BORDER-friendly; no EXPECT_NO_THROW, -fno-exceptions-friendly).
In-process proxy mock + target server. Each test asserts which side
of the routing decision each request landed on, and what headers (in
particular Proxy-Authorization) the receiving side saw.
Coverage:
Suffix matching (dot-boundary rule)
- exact-host match
- subdomain match
- "evilexample.com" does NOT match "example.com" ← regression
guard for the classic NO_PROXY suffix-match pitfall
- "example.com.evil.com" does NOT match
- leading-dot pattern still matches the bare domain (Go/curl
convention)
- case-insensitive
- trailing-dot host normalization
Wildcard
- "*" bypasses everything
IP normalization
- exact IPv4 match
- "::1" matches "0:0:0:0:0:0:0:1" via inet_pton
- IPv4-mapped IPv6 ("::ffff:127.0.0.1") is NOT cross-matched
against an IPv4 entry
CIDR
- basic v4 in-cidr / not-in-cidr
- "0.0.0.0/0" (prefix=0; verifies no shift UB)
- bare IP treated as /32
- malformed prefix (/33) silently dropped → no NO_PROXY effect
Proxy-Authorization handling
- suppressed when NO_PROXY matches the target
- sent when NO_PROXY does not match
Backward compat
- default behavior unchanged when set_no_proxy is never called
Parsing edge cases
- port-specific entries ("host:port") rejected
- empty / whitespace tokens dropped
Cross-origin redirect (analog of GHSA-6hrp-7fq9-3qv2)
- redirect target in NO_PROXY → redirect leg goes direct, no
Proxy-Authorization carried over
set_proxy_from_env (Unix only — uses setenv/unsetenv)
- lowercase http_proxy applied
- uppercase HTTP_PROXY ignored (httpoxy / CVE-2016-5385)
- NO_PROXY-only env returns true and applies the bypass list
- CRLF in env value rejected (cf. CVE-2026-21428)
- empty env value treated as unset
635 tests (608 prior + 27 new) pass under both the regular and the
split builds.
Adds two subsections under "Proxy server support":
- "Bypass the proxy for specific hosts (NO_PROXY)" — set_no_proxy,
pattern syntax, dot-boundary rule, IP normalization, limitations
(no port-specific entries, no v4-mapped v6 cross-match, replace
semantics).
- "Read proxy settings from the environment" — set_proxy_from_env,
which variables are read, the lowercase-only http_proxy rule with
an inline httpoxy / CVE-2016-5385 explanation, threading
expectations.
Documentation only. Closes the doc gap from #2446.
Replaces the now-incorrect Note at the bottom of c16-proxy ("cpp-httplib
does not read HTTP_PROXY...") with the actual API.
JA is the master per the project's translation workflow; the EN
translation lands in the same PR. Both pages remain `status: "draft"`
for normal review.
Adds two sections:
- Bypass the proxy for specific hosts (set_no_proxy):
pattern syntax, dot-boundary rule, case-insensitivity, IP
normalization via inet_pton, port-specific-entries unsupported,
malformed entries dropped.
- Read proxy settings from the environment (set_proxy_from_env):
which variables are read, lowercase-only http_proxy with an
inline httpoxy / CVE-2016-5385 explanation, threading caveat.
Apply seven post-implementation cleanups:
- Move ProxyUrl, ProxyEnvSettings and most helper forward declarations
below the BORDER. Only NoProxyKind/NoProxyEntry/NormalizedTarget stay
above (they are used as ClientImpl members or by inline cache state).
This shrinks the public header surface area considerably.
- Drop ProxyUrl::scheme: the field was write-only after parsing. Track
is_https as a local during parse_proxy_url and use it for the
default-port branch directly.
- Hoist the duplicate is_proxy_enabled_for_host(host_) gate in
write_request: the previous form had two adjacent gates bracketing
an unrelated end-server bearer-token block. Reordering puts the two
proxy-auth blocks together under a single gate.
- Drop the redundant trim_copy + empty-check inside parse_no_proxy_list:
detail::split already trims each token and skips empties, so the inner
work was dead code.
- Cache normalize_target(host_) on the client. host_ is const, so the
normalized form is invariant for the client's lifetime. The gate is
called up to 7 times per request when NO_PROXY is configured;
caching avoids repeating two heap allocations + two inet_pton calls
per request. Cross-host calls (only setup_redirect_client passing
next_host) still compute fresh.
- Trim narrative comments in setup_redirect_client and
set_proxy_from_env: replace WHAT-narration with single-line WHY
statements.
- Drop test comments that paraphrased their own test name.
All 635 unit tests pass under both the regular and split builds.
The previous design had two intermediate structs that existed only to
ferry parsed values between helper functions and the consuming method:
- detail::ProxyUrl: filled by parse_proxy_url, drained back into
proxy_host_ / proxy_port_ / proxy_basic_auth_* by set_proxy_from_env.
- detail::ProxyEnvSettings: bundle of two ProxyUrl + a NoProxyEntry
vector returned by read_proxy_env, drained by set_proxy_from_env.
Both bundles had exactly one producer and exactly one consumer. Drop
them and let the parsing flow directly into ClientImpl state:
- New private member ClientImpl::apply_proxy_url(url) parses a proxy
URL and, on success, assigns the result to proxy_host_, proxy_port_,
and proxy_basic_auth_*. Same validation as before (CRLF rejection,
scheme allowlist, port range, IPv6 bracket validation), same commit-
on-success ordering — the local variables are kept until every check
has passed so a malformed URL leaves no partial state.
- set_proxy_from_env now reads getenv() directly, dispatches between
https_proxy / http_proxy via virtual is_ssl(), and applies via
apply_proxy_url. NO_PROXY is parsed in place via parse_no_proxy_list.
Net effect:
- Two structs and two free helper functions removed (~150 lines of
declaration + body deleted).
- set_proxy_from_env body grows ~20 lines (still well under 50).
- Per-request hot path is unchanged (NoProxyEntry / NormalizedTarget
cache stays). Setup path is marginally faster (no intermediate
string copies through ProxyUrl / ProxyEnvSettings).
635 unit tests pass under both the regular and split builds.
The new code carried inline doc comments (15-line set_no_proxy block,
18-line set_proxy_from_env block, plus narrating comments inside parser
bodies, plus section dividers in the test file) that were heavy
compared to the rest of the codebase — neighboring setters like
set_proxy / set_proxy_basic_auth carry no doc at all, the test file
does not use sub-section dividers, and the README / cookbook already
document the behavior in detail.
Removed:
- Public-API doc blocks on set_no_proxy and set_proxy_from_env.
- Narrating comments inside parse_no_proxy_entry, normalize_target,
apply_proxy_url, host_matches_no_proxy that were just describing
the obvious code structure.
- Multi-line BORDER-rationale meta comments.
- In-test sub-section dividers ("// ---- Hostname suffix matching",
etc.) and per-class doc comments on the test fixtures.
- Test-side comments that paraphrased their own test name.
- Redundant ordering comments inside setup_redirect_client.
Kept:
- Security WHY comments (CRLF rejection, dot-boundary suffix matching,
httpoxy / CVE-2016-5385, GHSA-6hrp-7fq9-3qv2 analog, CVE-2026-21428).
- Regression-target WHY comments (UB shift on prefix=0).
- Non-obvious external knowledge (detail::split already trims).
635 unit tests still pass under both the regular and split builds.
f0d1675 to
c20b661
Compare
Three regression guards added during review of an alternate NO_PROXY implementation (PR #2449). All three pass on the current implementation and surface bugs in the alternate one: - BareIPv6LiteralMatchesIPv6Cidr: a host given as a bare IPv6 literal (no surrounding brackets) must still be recognized as IPv6 for CIDR matching. An implementation that only detects IPv6 when the host string starts with '[' would split the host at the first ':' and misclassify it as a hostname. - TrailingDotOnEntryIsNormalized: trailing dots must be canonicalized on BOTH sides — host and entry. An implementation that strips the host-side trailing dot only would fail to match host "example.com" against entry "example.com." because the substring lengths differ. - ValidEntryWithSurroundingWhitespaceStillMatches: an entry with leading/trailing whitespace must still match. An implementation that feeds raw tokens directly to inet_pton would reject valid CIDRs (" 10.0.0.0/8 ") because of the spaces. 635 unit tests pass.
Adopts the unified 16-byte address representation suggested by the alternate NO_PROXY implementation in PR #2449. Both v4 and v6 entries now share one storage type and one matcher; the v4/v6 distinction is only the address-family flag and the max prefix length. - detail::NoProxyEntry: replaces in_addr v4_net + in6_addr v6_net with a single IPBytes net (std::array<uint8_t, 16>). v4 occupies the first 4 bytes, v6 fills all 16. - detail::NormalizedTarget: replaces in_addr v4 + in6_addr v6 with a single IPBytes ip. - Replaces detail::ipv4_in_cidr and detail::ipv6_in_cidr with one detail::ip_in_cidr that takes the address, the network, the prefix length and the family's max bits (32 for v4, 128 for v6). The mask is constructed by the byte-fill approach from the previous v6 helper, which is straightforward to read and avoids the shift UB that the v4 helper had to special-case. - The NoProxyKind enum keeps IPv4Cidr / IPv6Cidr as separate values so the match dispatch stays explicit and IPv4 entries cannot accidentally cross-match an IPv6 target (the same address-family isolation the previous code had). Net change: -28 lines + -1 helper function. All 30 NoProxyTest cases plus 643 unit tests pass under both the regular and split builds.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The primary goal of this pull request is to provide a reference implementation with Opus 4.7 (#2446), focusing not only on functionality but also on security by ensuring there are no vulnerabilities.