Confidence: medium · Effort: small (<1 d)
Problem
The strict and ILP-2 kernels switched their bit-frequency counter to the packed form of 12 uint32 words (bcnt_p[12] with the BCNT_ADD/BCNT_GET macros at bruteforce.cu:705-706); the accompanying comment explicitly notes a 12-register saving. The legacy bruteforce_kernel (mode_policy 0/1 branch) was never migrated and still declares int bcnt[24], which occupies 24 registers.
Files: src/bruteforce.cu:1031-1034 (declaration), src/bruteforce.cu:1067-1074 (updates), src/bruteforce.cu:1102-1110 (read)
Suggested fix
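Replace int bcnt[24] in bruteforce_kernel with the packed bcnt_p[12] representation and the existing BCNT_ADD/BCNT_GET macros, covering the declaration, update, and read sites listed above, so the kernel fits the __launch_bounds__(256, 4) register budget more comfortably.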
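For illustration only, here is a minimal sketch of how 24 counters can share 12 32-bit words. The macro names SKETCH_BCNT_ADD / SKETCH_BCNT_GET, the helper functions, and the two-16-bit-fields-per-word layout are assumptions made for this example; the real BCNT_ADD/BCNT_GET definitions at bruteforce.cu:705-706 may pack differently. The code is plain C++ so it can be checked on the host; inside a kernel the functions would be marked __device__ and the loop unrolled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout (not taken from bruteforce.cu): counter i lives in
// word i/2, in the low 16 bits when i is even and the high 16 bits when odd.
#define SKETCH_BCNT_ADD(p, i, v) \
    ((p)[(i) >> 1] += static_cast<uint32_t>(v) << (((i) & 1) * 16))
#define SKETCH_BCNT_GET(p, i) \
    (((p)[(i) >> 1] >> (((i) & 1) * 16)) & 0xFFFFu)

// Adds each of the low 24 bits of `word24` to its own counter.
// In a CUDA kernel this loop should be fully unrolled so every index is a
// compile-time constant; otherwise the array may be placed in local memory.
inline void sketch_tally(uint32_t (&bcnt_p)[12], uint32_t word24) {
    for (int b = 0; b < 24; ++b) {
        SKETCH_BCNT_ADD(bcnt_p, b, (word24 >> b) & 1u);
    }
}

// Bounds-checked read of counter i.
inline uint32_t sketch_get(const uint32_t (&bcnt_p)[12], int i) {
    assert(i >= 0 && i < 24);
    return SKETCH_BCNT_GET(bcnt_p, i);
}
```

With this layout each counter is limited to 65535; a count that exceeds it would carry into the neighbouring field, so the migration should confirm that the legacy path's maximum count per bit stays within whatever field width the real macros use.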
Why it matters
Higher occupancy on the legacy path, which is still used for captures without MI metadata.
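To put the saving in context, the per-thread register ceiling implied by the launch bounds can be estimated with a rough calculation. The sketch below assumes a register file of 65,536 32-bit registers per SM, which should be checked against the target GPU; the constant and function names are hypothetical, and allocation granularity is ignored.

```cpp
// Assumption: 65,536 32-bit registers per SM (verify for the target GPU).
constexpr int kSketchRegsPerSM = 65536;

// Upper bound on registers per thread if `min_blocks_per_sm` blocks of
// `threads_per_block` threads must all be resident on one SM.
constexpr int sketch_regs_per_thread(int threads_per_block,
                                     int min_blocks_per_sm) {
    return kSketchRegsPerSM / (threads_per_block * min_blocks_per_sm);
}

// __launch_bounds__(256, 4): 65536 / (256 * 4) = 64 registers per thread.
static_assert(sketch_regs_per_thread(256, 4) == 64,
              "per-thread budget under the stated assumption");
```

Under that assumption, 12 registers is a little under a fifth of the 64-register ceiling. Whether the compiler realises the full saving is best confirmed by comparing the register counts reported by nvcc with -Xptxas -v before and after the change.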