32 changes: 22 additions & 10 deletions API-INTERNAL.md
@@ -79,12 +79,18 @@ If the requested key is a collection, it will return an object with all the coll
<dd><p>Remove a key from Onyx and update the subscribers</p>
</dd>
<dt><a href="#retryOperation">retryOperation()</a></dt>
<dd><p>Handles storage operation failures based on the error type:</p>
<dd><p>Handles storage operation failures based on the error class (see lib/storage/errors.ts).
The connection layer (createStore) owns connection/transport recovery; this operation layer owns
capacity recovery (eviction) so that a given failure is retried by exactly one layer:</p>
<ul>
<li>Storage capacity errors: evicts data and retries the operation</li>
<li>Invalid data errors: logs an alert and throws an error</li>
<li>Non-retriable errors: logs an alert and resolves without retrying</li>
<li>Other errors: retries the operation</li>
<li>INVALID_DATA: logs an alert and throws (the same data will always fail).</li>
<li>TRANSIENT / FATAL: the connection layer already retried (transient) or exhausted its heal budget
and alerted (fatal). Retrying here would only re-amplify, so we skip the write quietly.</li>
<li>CAPACITY: evicts the least recently accessed evictable key and retries, under a session-level
circuit breaker (see lib/StorageCircuitBreaker.ts) that halts the loop once eviction stops making
progress or failures storm — the per-operation budget alone cannot stop a session-wide storm.</li>
<li>UNKNOWN: the provider couldn&#39;t classify it — log the full error shape (name + message +
provider) once so it&#39;s visible, then bounded retry without eviction.</li>
</ul>
</dd>
<dt><a href="#broadcastUpdate">broadcastUpdate()</a></dt>
@@ -318,11 +324,17 @@ Remove a key from Onyx and update the subscribers
<a name="retryOperation"></a>

## retryOperation()
Handles storage operation failures based on the error type:
- Storage capacity errors: evicts data and retries the operation
- Invalid data errors: logs an alert and throws an error
- Non-retriable errors: logs an alert and resolves without retrying
- Other errors: retries the operation
Handles storage operation failures based on the error class (see lib/storage/errors.ts).
The connection layer (createStore) owns connection/transport recovery; this operation layer owns
capacity recovery (eviction) so that a given failure is retried by exactly one layer:
- INVALID_DATA: logs an alert and throws (the same data will always fail).
- TRANSIENT / FATAL: the connection layer already retried (transient) or exhausted its heal budget
and alerted (fatal). Retrying here would only re-amplify, so we skip the write quietly.
- CAPACITY: evicts the least recently accessed evictable key and retries, under a session-level
circuit breaker (see lib/StorageCircuitBreaker.ts) that halts the loop once eviction stops making
progress or failures storm — the per-operation budget alone cannot stop a session-wide storm.
- UNKNOWN: the provider couldn't classify it — log the full error shape (name + message +
provider) once so it's visible, then bounded retry without eviction.

**Kind**: global function
<a name="broadcastUpdate"></a>
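
The per-class behaviour documented for `retryOperation()` can be read as a small decision table. The sketch below is only an illustration of that table: the real implementation is in `lib/OnyxUtils.ts`, whose diff is not rendered on this page. `chooseRecovery`, `RecoveryAction`, and the `breakerTripped` parameter are hypothetical names, and skipping the write when the breaker is open is an assumption drawn from the "halts the loop" wording.

```typescript
// Same values as StorageErrorClass in lib/storage/errors.ts, repeated here so the sketch is self-contained.
const StorageErrorClass = {
    TRANSIENT: 'transient',
    CAPACITY: 'capacity',
    INVALID_DATA: 'invalidData',
    FATAL: 'fatal',
    UNKNOWN: 'unknown',
} as const;

type StorageErrorClassValue = (typeof StorageErrorClass)[keyof typeof StorageErrorClass];

// Hypothetical action names used only for this illustration; they are not part of the Onyx API.
type RecoveryAction = 'throw' | 'skip' | 'evictAndRetry' | 'retryWithoutEviction';

function chooseRecovery(errorClass: StorageErrorClassValue, breakerTripped: boolean): RecoveryAction {
    switch (errorClass) {
        case StorageErrorClass.INVALID_DATA:
            // The same data will always fail, so surface the error.
            return 'throw';
        case StorageErrorClass.TRANSIENT:
        case StorageErrorClass.FATAL:
            // The connection layer already handled (or gave up on) these.
            return 'skip';
        case StorageErrorClass.CAPACITY:
            // Assumption: an open breaker means no eviction and no retry.
            return breakerTripped ? 'skip' : 'evictAndRetry';
        default:
            // UNKNOWN: bounded retry, no eviction.
            return 'retryWithoutEviction';
    }
}
```

Keeping the decision in one pure function like this would make the "exactly one layer retries a given failure" rule easy to unit test, though the PR may structure it differently.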
151 changes: 102 additions & 49 deletions lib/OnyxUtils.ts

Large diffs are not rendered by default.

120 changes: 120 additions & 0 deletions lib/StorageCircuitBreaker.ts
@@ -0,0 +1,120 @@
import * as Logger from './Logger';

/**
* Process-scoped circuit breaker for storage CAPACITY failures.
*
* The per-operation retry budget in `OnyxUtils.retryOperation` cannot stop a session-level storm:
* each evict -> OnyxDerived recompute -> new write starts its own fresh budget, so a full disk or
* exhausted quota can drive tens of thousands of evict+retry cycles that never make progress and
* freeze the app. This breaker is the session-level brake — `retryOperation` consults it before
* every eviction.
*
* It trips when EITHER:
* - capacity failures within {@link ROLLING_WINDOW_MS} exceed {@link FAILURE_THRESHOLD}, or
* - {@link NO_PROGRESS_CAP} consecutive evictions are each immediately followed by another capacity
* failure (the eviction freed nothing the next write could use — a no-progress cycle). This is a
* cheap proxy for `getDatabaseSize()`, which is costly and only reports origin-wide usage.
*
* On trip it emits exactly ONE alert and self-resets once the rolling window clears, so a persistent
* condition produces at most one alert per window instead of one log line per failed write.
*/

/** Rolling window over which capacity failures are counted, and how long a trip stays open. */
const ROLLING_WINDOW_MS = 60 * 1000;

/** Capacity failures within the window above which the breaker trips (storm backstop). */
const FAILURE_THRESHOLD = 50;

/** Number of consecutive no-progress evictions (evict -> still capacity failure) at which the breaker trips. */
const NO_PROGRESS_CAP = 5;

let failureTimestamps: number[] = [];
let consecutiveNoProgressEvictions = 0;
let evictionAwaitingResult = false;
let trippedUntil = 0;

function reset(): void {
    failureTimestamps = [];
    consecutiveNoProgressEvictions = 0;
    evictionAwaitingResult = false;
    trippedUntil = 0;
}

/** Whether the breaker is currently open. Self-resets once the window since the trip has cleared. */
function isTripped(): boolean {
    if (trippedUntil === 0) {
        return false;
    }
    if (Date.now() >= trippedUntil) {
        reset();
        return false;
    }
    return true;
}

function trip(reason: string): void {
    trippedUntil = Date.now() + ROLLING_WINDOW_MS;
    Logger.logAlert(`Storage circuit breaker tripped: ${reason}. Halting eviction/retry for ${ROLLING_WINDOW_MS / 1000}s to stop a storage failure storm.`);
}

/**
* Record a CAPACITY failure. Call once per capacity failure in `retryOperation`, BEFORE deciding
* whether to evict; then check {@link isTripped} to decide whether to proceed.
*/
function recordCapacityFailure(): void {
    // While open, recording is a no-op: no extra timestamps, no second alert, and nothing to keep the
    // window from clearing. `isTripped()` self-resets here once the window has elapsed.
    if (isTripped()) {
        return;
    }

    const now = Date.now();
    failureTimestamps = failureTimestamps.filter((timestamp) => now - timestamp < ROLLING_WINDOW_MS);

    // A fresh storm (nothing left in the window) resets the no-progress tracking so a stale eviction
    // from an earlier, unrelated incident can't be miscounted as no-progress for this one.
    if (failureTimestamps.length === 0) {
        consecutiveNoProgressEvictions = 0;
        evictionAwaitingResult = false;
    }

    // We evicted on the previous cycle and we're back here with another capacity failure, so that
    // eviction freed no usable space.
    if (evictionAwaitingResult) {
        consecutiveNoProgressEvictions += 1;
        evictionAwaitingResult = false;
    }

    failureTimestamps.push(now);

    if (failureTimestamps.length > FAILURE_THRESHOLD) {
        trip(`${failureTimestamps.length} capacity failures within ${ROLLING_WINDOW_MS / 1000}s`);
        return;
    }
    if (consecutiveNoProgressEvictions >= NO_PROGRESS_CAP) {
        trip(`${consecutiveNoProgressEvictions} consecutive evictions freed no usable space`);
    }
}

/** Record that `retryOperation` just evicted a key, so the next capacity failure counts as no-progress. */
function recordEviction(): void {
    evictionAwaitingResult = true;
}

/**
* Record that a storage write SUCCEEDED. If an eviction was awaiting its verdict, the eviction freed
* usable space — so it must NOT later be miscounted as a no-progress cycle by the next capacity
* failure. Clear the pending flag and reset the consecutive no-progress streak (a success breaks the
* streak). No-op when no eviction is pending (the common case), so it's cheap to call on every write.
*/
function recordWriteSuccess(): void {
    if (!evictionAwaitingResult) {
        return;
    }
    evictionAwaitingResult = false;
    consecutiveNoProgressEvictions = 0;
}

const StorageCircuitBreaker = {recordCapacityFailure, recordEviction, recordWriteSuccess, isTripped, reset, ROLLING_WINDOW_MS, FAILURE_THRESHOLD, NO_PROGRESS_CAP};

export default StorageCircuitBreaker;
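
To make the call protocol concrete (record the failure, check the breaker, then evict), here is a minimal self-contained model of the no-progress rule only. It is a sketch, not the module above: the rolling-window storm threshold, the timed self-reset, and the `Logger` alert are deliberately left out, and `createNoProgressBreaker` and `countEvictionsUntilTrip` are hypothetical names introduced for this example.

```typescript
// Same cap as in StorageCircuitBreaker.ts.
const NO_PROGRESS_CAP = 5;

// Simplified model: tracks only the consecutive no-progress streak.
function createNoProgressBreaker() {
    let consecutiveNoProgressEvictions = 0;
    let evictionAwaitingResult = false;
    let tripped = false;

    return {
        recordCapacityFailure(): void {
            if (tripped) {
                return;
            }
            // A failure right after an eviction means that eviction freed nothing usable.
            if (evictionAwaitingResult) {
                consecutiveNoProgressEvictions += 1;
                evictionAwaitingResult = false;
            }
            if (consecutiveNoProgressEvictions >= NO_PROGRESS_CAP) {
                tripped = true;
            }
        },
        recordEviction(): void {
            evictionAwaitingResult = true;
        },
        recordWriteSuccess(): void {
            if (!evictionAwaitingResult) {
                return;
            }
            // A successful write after an eviction breaks the streak.
            evictionAwaitingResult = false;
            consecutiveNoProgressEvictions = 0;
        },
        isTripped: (): boolean => tripped,
    };
}

// Hypothetical caller loop in which every write fails with a capacity error.
function countEvictionsUntilTrip(): number {
    const breaker = createNoProgressBreaker();
    let evictions = 0;
    for (let attempt = 0; attempt < 100; attempt++) {
        breaker.recordCapacityFailure();
        if (breaker.isTripped()) {
            break;
        }
        breaker.recordEviction();
        evictions += 1;
    }
    return evictions;
}

console.log(countEvictionsUntilTrip()); // 5
```

In this model the loop performs five evictions and the sixth capacity failure trips the breaker, which matches the `>= NO_PROGRESS_CAP` check in the real module. If each eviction is followed by `recordWriteSuccess()`, the streak never builds up.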
13 changes: 13 additions & 0 deletions lib/storage/__mocks__/index.ts
@@ -1,11 +1,24 @@
import MemoryOnlyProvider, {mockStore, setMockStore} from '../providers/MemoryOnlyProvider';
import classifyIDBError from '../providers/IDBKeyValProvider/classifyError';
import classifySQLiteError from '../providers/classifySQLiteError';
import {StorageErrorClass} from '../errors';

const init = jest.fn(MemoryOnlyProvider.init);

init();

// Tests exercise retryOperation against both IndexedDB- and SQLite-shaped errors, so the mock facade
// classifies with each engine's real (native-dep-free) classifier in turn. Mirrors how the real facade
// delegates to the active provider; here we cover both engines since one mock stands in for all.
const classifyError = (error: unknown) => {
    const idbClass = classifyIDBError(error);
    return idbClass === StorageErrorClass.UNKNOWN ? classifySQLiteError(error) : idbClass;
};

const StorageMock = {
    init,
    classifyError: jest.fn(classifyError),
    getStorageProvider: jest.fn(() => MemoryOnlyProvider),
    getItem: jest.fn(MemoryOnlyProvider.getItem),
    multiGet: jest.fn(MemoryOnlyProvider.multiGet),
    setItem: jest.fn(MemoryOnlyProvider.setItem),
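
The mock's "try the IndexedDB classifier, fall back to the SQLite one on UNKNOWN" pattern generalises to any number of classifiers. The sketch below shows that chaining under stated assumptions: `chainClassifiers`, `classifyAsIDB`, and `classifyAsSQLite` are hypothetical, and the two matchers are simplified placeholders, since the contents of `classifySQLiteError` are not shown in this diff.

```typescript
type ErrorClass = 'capacity' | 'unknown';
type Classifier = (error: unknown) => ErrorClass;

// Placeholder matchers for illustration only; the real per-engine classifiers match more cases.
const classifyAsIDB: Classifier = (error) => (String(error).toLowerCase().includes('quotaexceedederror') ? 'capacity' : 'unknown');
const classifyAsSQLite: Classifier = (error) => (String(error).toLowerCase().includes('disk is full') ? 'capacity' : 'unknown');

// Returns the first non-'unknown' verdict, trying the classifiers in order.
function chainClassifiers(...classifiers: Classifier[]): Classifier {
    return (error) => {
        for (const classify of classifiers) {
            const result = classify(error);
            if (result !== 'unknown') {
                return result;
            }
        }
        return 'unknown';
    };
}

const classifyForTests = chainClassifiers(classifyAsIDB, classifyAsSQLite);
```

Order matters only when two engines would classify the same error differently; with disjoint dialects, as the mock assumes, the chain behaves like a union of the matchers.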
43 changes: 43 additions & 0 deletions lib/storage/errors.ts
@@ -0,0 +1,43 @@
import type {ValueOf} from 'type-fest';

/**
* Shared vocabulary for storage write failures. The *classes* are engine-agnostic; the *matching*
* is not — each storage provider knows its own error dialect and owns its classifier (see each
* provider's `classifyError`). This module deliberately holds NO string matchers: it is the common
* taxonomy the two reacting layers agree on, while the per-engine knowledge lives with the engine.
*
* - the connection layer (`createStore`) recovers TRANSIENT and FATAL errors by reopening the DB, and
* - the operation layer (`OnyxUtils.retryOperation`) recovers CAPACITY by eviction and retries UNKNOWN.
*
* This module has no Onyx dependencies (and no engine dependencies) so it can live in the storage
* layer, and be imported by every provider, without creating an import cycle.
*/
const StorageErrorClass = {
    /** Connection/transport failure (stale connection). Owner: connection layer (reopen + retry once). */
    TRANSIENT: 'transient',
    /** Quota exceeded / disk full. Owner: operation layer (evict and retry). */
    CAPACITY: 'capacity',
    /** Non-serializable payload. Never retriable: the same data will always fail. */
    INVALID_DATA: 'invalidData',
    /** Backing-store corruption. Owner: connection layer (budgeted heal, then give up). */
    FATAL: 'fatal',
    /**
     * Unmatched by the active provider. Owner: operation layer (bounded retry, and log the shape so
     * recurring cases can be promoted into one of the classes above).
     */
    UNKNOWN: 'unknown',
} as const;

type StorageErrorClassValue = ValueOf<typeof StorageErrorClass>;

Review comment (Contributor):

NAB: I think `ValueOf<typeof StorageErrorClass>` is actually a touch clearer at the call sites than this named alias.


/**
* Normalizes any thrown value into a lowercased `{name, message}` pair for matching. Shared by every
* provider's classifier so they all extract the error the same way.
*/
function getErrorParts(error: unknown): {name: string; message: string} {
    if (error instanceof Error || error instanceof DOMException) {
        return {name: (error.name ?? '').toLowerCase(), message: (error.message ?? '').toLowerCase()};
    }
    return {name: '', message: String(error ?? '').toLowerCase()};
}

export {StorageErrorClass, getErrorParts};
export type {StorageErrorClassValue};
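
A few sample inputs show what classifiers receive from `getErrorParts`. The copy below drops the `DOMException` branch so that the snippet also runs in environments where that global may be missing; that omission is a simplification for this example, not a suggested change to the module.

```typescript
// Simplified copy of getErrorParts (Error branch only) for illustration.
function getErrorParts(error: unknown): {name: string; message: string} {
    if (error instanceof Error) {
        return {name: (error.name ?? '').toLowerCase(), message: (error.message ?? '').toLowerCase()};
    }
    return {name: '', message: String(error ?? '').toLowerCase()};
}

// Error instances keep both parts, lowercased.
const fromError = getErrorParts(new TypeError('Failed To Write'));
// -> {name: 'typeerror', message: 'failed to write'}

// Non-Error values have no name; the stringified value becomes the message.
const fromString = getErrorParts('QuotaExceededError');
// -> {name: '', message: 'quotaexceedederror'}

// null and undefined normalize to empty strings rather than 'null' or 'undefined'.
const fromNull = getErrorParts(null);
// -> {name: '', message: ''}
```

Because a thrown string carries its text in `message` only, a classifier that wants to catch both shapes has to check `name` and `message`, as the `quotaexceedederror` matcher in `classifyIDBError` does.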
6 changes: 6 additions & 0 deletions lib/storage/index.ts
@@ -49,6 +49,12 @@ const storage: Storage = {
        return provider;
    },

    /**
     * Classifies a write error using the active provider's own classifier. Synchronous and pure;
     * never wrapped in tryOrDegradePerformance.
     */
    classifyError: (error) => provider.classifyError(error),

    /**
     * Initializes all providers in the list of storage providers
     * and enables fallback providers if necessary
45 changes: 45 additions & 0 deletions lib/storage/providers/IDBKeyValProvider/classifyError.ts
@@ -0,0 +1,45 @@
import {StorageErrorClass, getErrorParts} from '../../errors';
import type {StorageErrorClassValue} from '../../errors';

/**
* Classifies an IndexedDB write failure into the shared storage taxonomy (lib/storage/errors.ts).
* Matching is done on the lowercased error name and message. This is the IndexedDB engine's own
* dialect — it is NOT shared with other engines.
*/
function classifyIDBError(error: unknown): StorageErrorClassValue {
    const {name, message} = getErrorParts(error);

    // Non-serializable data passed to IDBObjectStore.put; retrying is futile.
    if (message.includes("failed to execute 'put' on 'idbobjectstore'")) {
        return StorageErrorClass.INVALID_DATA;
    }

    // Browser quota exceeded.
    if (name.includes('quotaexceedederror') || message.includes('quotaexceedederror')) {
        return StorageErrorClass.CAPACITY;
    }

    // Backing-store corruption (Chromium LevelDB). Recoverable only via a budgeted reopen.
    if (message.includes('internal error opening backing store')) {
        return StorageErrorClass.FATAL;
    }

    // Transient connection/transport failures: the cached connection is stale and a reopen fixes it.
    // - InvalidStateError: connection closed between getDB() resolving and db.transaction().
    // - AbortError: write transaction aborted (connection close / versionchange / sibling abort).
    // - Safari/WebKit IDB server termination for backgrounded tabs.
    if (
        name.includes('invalidstateerror') ||
        name.includes('aborterror') ||
        message.includes('connection to indexed database server lost') ||
        message.includes('connection is closing') ||
        // This is related to https://github.com/Expensify/react-native-onyx/pull/796; remove this comment when #796 is merged.
        message.includes('idb write transaction aborted without an error')
    ) {
        return StorageErrorClass.TRANSIENT;
    }

    return StorageErrorClass.UNKNOWN;
}

export default classifyIDBError;