Skip to content

Reindexer: Mock reindex function to reduce parallel contention#1210

Merged
brandur merged 1 commit intomasterfrom
brandur-mock-reindex
Apr 13, 2026
Merged

Reindexer: Mock reindex function to reduce parallel contention#1210
brandur merged 1 commit intomasterfrom
brandur-mock-reindex

Conversation

@brandur
Copy link
Copy Markdown
Contributor

@brandur brandur commented Apr 13, 2026

This one's aimed at fixing an intermittently failing test case:

https://github.com/riverqueue/river/actions/runs/24322049420/job/71009998068?pr=1208

--- FAIL: TestReindexer (0.00s)
    --- FAIL: TestReindexer/ReindexesConfiguredIndexes (10.07s)
        reindexer_test.go:219: Reusing idle postgres schema "maintenance_2026_04_13t01_54_14_schema_04" [user facing: "maintenance_2026_04_13t01_54_14_schema_04"] after cleaning in 26.436456ms [4 generated] [7 reused]
        test_signal.go:95: timed out waiting on test signal after 10s
        logger.go:256: time=2026-04-13T01:54:27.581Z level=INFO msg="maintenance.Reindexer: Signaled to stop during index build; attempting to clean up concurrent artifacts"
        riverdbtest.go:293: Checked in postgres schema "maintenance_2026_04_13t01_54_14_schema_04"; 1 idle schema(s) [4 generated] [10 reused]
    --- FAIL: TestReindexer/ReindexesMinimalSubsetofIndexes (10.14s)
        reindexer_test.go:183: Reusing idle postgres schema "maintenance_2026_04_13t01_54_14_schema_01" [user facing: "maintenance_2026_04_13t01_54_14_schema_01"] after cleaning in 28.042877ms [4 generated] [10 reused]
        test_signal.go:95: timed out waiting on test signal after 10s
        reindexer_test.go:211:
                Error Trace:	/home/runner/work/river/river/internal/maintenance/reindexer_test.go:211
                Error:      	Should be false
                Test:       	TestReindexer/ReindexesMinimalSubsetofIndexes
        logger.go:256: time=2026-04-13T01:54:28.444Z level=INFO msg="maintenance.Reindexer: Signaled to stop during index build; attempting to clean up concurrent artifacts"
        riverdbtest.go:293: Checked in postgres schema "maintenance_2026_04_13t01_54_14_schema_01"; 1 idle schema(s) [5 generated] [24 reused]
FAIL
FAIL	github.com/riverqueue/river/internal/maintenance	18.764s

I'm diagnosing with Claude's help here, but what appears to be happening
is that although a reindex operation in Postgres is often fast, it is
still a heavy operation, and can slow down even further when there's a
lot of concurrent activity hammering a database.

Many reindexer test cases run in parallel, and it appears that was
happening here is that we got a reindex that exceeded our maximum
timeout of 10x. We have some evidence this during the test run from the
runtime of 10.07s and the line:

Signaled to stop during index build; attempting to clean up concurrent artifacts

Here, I'm putting forward a solution proposed by Claude, which is to
mock out the reindex operation, especially where we have a number of
reindexes going in parallel. The tests ReindexOneSuccess and
ReindexSkippedWithReindexArtifact still put load on the real reindex
operation, so we're not going full mock here, and should see our
intermittency considerably reduced while still being confident that
everything still works.

@brandur brandur force-pushed the brandur-mock-reindex branch from 74790ee to 9bb0e45 Compare April 13, 2026 05:25
@brandur brandur requested a review from bgentry April 13, 2026 05:31
Copy link
Copy Markdown
Contributor

@bgentry bgentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this seems appropriate; we can maintain coverage that the reindexing actually works while mocking out where needed to avoid timing issues in tests.

Comment thread internal/maintenance/reindexer_test.go Outdated
Comment on lines +195 to +197
mockExec := newReindexerExecutorMock(bundle.exec)
var reindexedNames []string
var reindexedMu sync.Mutex
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'd normally stack these in a var block, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally. Thanks, fixed!

This one's aimed at fixing an intermittently failing test case:

https://github.com/riverqueue/river/actions/runs/24322049420/job/71009998068?pr=1208

    --- FAIL: TestReindexer (0.00s)
        --- FAIL: TestReindexer/ReindexesConfiguredIndexes (10.07s)
            reindexer_test.go:219: Reusing idle postgres schema "maintenance_2026_04_13t01_54_14_schema_04" [user facing: "maintenance_2026_04_13t01_54_14_schema_04"] after cleaning in 26.436456ms [4 generated] [7 reused]
            test_signal.go:95: timed out waiting on test signal after 10s
            logger.go:256: time=2026-04-13T01:54:27.581Z level=INFO msg="maintenance.Reindexer: Signaled to stop during index build; attempting to clean up concurrent artifacts"
            riverdbtest.go:293: Checked in postgres schema "maintenance_2026_04_13t01_54_14_schema_04"; 1 idle schema(s) [4 generated] [10 reused]
        --- FAIL: TestReindexer/ReindexesMinimalSubsetofIndexes (10.14s)
            reindexer_test.go:183: Reusing idle postgres schema "maintenance_2026_04_13t01_54_14_schema_01" [user facing: "maintenance_2026_04_13t01_54_14_schema_01"] after cleaning in 28.042877ms [4 generated] [10 reused]
            test_signal.go:95: timed out waiting on test signal after 10s
            reindexer_test.go:211:
                    Error Trace:	/home/runner/work/river/river/internal/maintenance/reindexer_test.go:211
                    Error:      	Should be false
                    Test:       	TestReindexer/ReindexesMinimalSubsetofIndexes
            logger.go:256: time=2026-04-13T01:54:28.444Z level=INFO msg="maintenance.Reindexer: Signaled to stop during index build; attempting to clean up concurrent artifacts"
            riverdbtest.go:293: Checked in postgres schema "maintenance_2026_04_13t01_54_14_schema_01"; 1 idle schema(s) [5 generated] [24 reused]
    FAIL
    FAIL	github.com/riverqueue/river/internal/maintenance	18.764s

I'm diagnosing with Claude's help here, but what appears to be happening
is that although a reindex operation in Postgres is often fast, it is
still a heavy operation, and can slow down even further when there's a
lot of concurrent activity hammering a database.

Many reindexer test cases run in parallel, and it appears that was
happening here is that we got a reindex that exceeded our maximum
timeout of 10x. We have some evidence this during the test run from the
runtime of 10.07s and the line:

    Signaled to stop during index build; attempting to clean up concurrent artifacts

Here, I'm putting forward a solution proposed by Claude, which is to
mock out the reindex operation, especially where we have a number of
reindexes going in parallel. The tests `ReindexOneSuccess` and
`ReindexSkippedWithReindexArtifact` still put load on the real reindex
operation, so we're not going full mock here, and should see our
intermittency considerably reduced while still being confident that
everything still works.
@brandur brandur force-pushed the brandur-mock-reindex branch from 9bb0e45 to 3f3fe4a Compare April 13, 2026 15:32
@brandur brandur merged commit 3c62677 into master Apr 13, 2026
37 of 39 checks passed
@brandur brandur deleted the brandur-mock-reindex branch April 13, 2026 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants