Priority: P2 · Confidence: medium · Effort: large
Context
The project already ships three back-ends (CUDA in bruteforce.cu, OpenCL in bruteforce_cl.cpp + dmrcrack.cl, CPU in bruteforce.c) plus a CUDA/HIP shim (include/gpu_compat.h). They implement the same bruteforce_* API but are selected at compile/link time (one .cu/.cpp/.c per binary), with logic duplicated between them (see the single-source-kernel issue).
Proposal
Introduce a thin runtime backend interface (include/backend.h): a vtable of init / get_device_count / alloc / free / copy / launch / synchronize. The core attack loop then talks only to that API, and the back-ends become interchangeable, optionally selected at runtime (auto-detect CUDA → OpenCL → CPU, taking the first one that is available).
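A minimal sketch of what include/backend.h could look like, in C. Every name here (backend_t, kernel_fn, copy_dir_t, mem_alloc/mem_free, backend_select, run_square) is hypothetical, not existing project code. Only the CPU back-end is implemented. The CUDA and OpenCL entries are placeholders that always report no devices, standing in for real probes such as cudaGetDeviceCount or clGetDeviceIDs, so auto-detection falls through to the CPU. For simplicity a kernel is a host function pointer; real GPU back-ends would need a kernel handle (e.g. a CUfunction or cl_kernel) instead.

```c
/* backend.h sketch: a runtime-selectable back-end vtable (hypothetical names). */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { COPY_HOST_TO_DEVICE, COPY_DEVICE_TO_HOST } copy_dir_t;

/* Simplification: a kernel is a host function run once per work item.
   Real GPU back-ends would take a CUfunction / cl_kernel handle instead. */
typedef void (*kernel_fn)(size_t idx, void *args);

typedef struct backend {
    const char *name;
    int   (*init)(void);                  /* 0 on success */
    int   (*get_device_count)(void);
    void *(*mem_alloc)(size_t bytes);     /* "alloc" in the issue */
    void  (*mem_free)(void *ptr);         /* "free" in the issue */
    int   (*copy)(void *dst, const void *src, size_t bytes, copy_dir_t dir);
    int   (*launch)(kernel_fn k, size_t n_items, void *args);
    int   (*synchronize)(void);
} backend_t;

/* ---- CPU back-end: host memory doubles as "device" memory ---- */
static int   cpu_init(void)          { return 0; }
static int   cpu_device_count(void)  { return 1; }
static void *cpu_alloc(size_t bytes) { return malloc(bytes); }
static void  cpu_free(void *ptr)     { free(ptr); }
static int cpu_copy(void *dst, const void *src, size_t bytes, copy_dir_t dir) {
    assert(dir == COPY_HOST_TO_DEVICE || dir == COPY_DEVICE_TO_HOST);
    (void)dir;                            /* same address space either way */
    memcpy(dst, src, bytes);
    return 0;
}
static int cpu_launch(kernel_fn k, size_t n_items, void *args) {
    for (size_t i = 0; i < n_items; i++) k(i, args);  /* serial "grid" */
    return 0;
}
static int cpu_synchronize(void) { return 0; }  /* launches already block */

static const backend_t cpu_backend = {
    "cpu", cpu_init, cpu_device_count, cpu_alloc, cpu_free,
    cpu_copy, cpu_launch, cpu_synchronize,
};

/* ---- Placeholders: real probes would call cudaGetDeviceCount() or
        clGetPlatformIDs()/clGetDeviceIDs(). Here they always fail. ---- */
static int unavailable_init(void) { return -1; }
static int no_devices(void)       { return 0; }
static const backend_t cuda_backend = {
    .name = "cuda", .init = unavailable_init, .get_device_count = no_devices };
static const backend_t opencl_backend = {
    .name = "opencl", .init = unavailable_init, .get_device_count = no_devices };

/* Auto-detect in priority order CUDA -> OpenCL -> CPU; first usable wins. */
const backend_t *backend_select(void) {
    const backend_t *order[] = { &cuda_backend, &opencl_backend, &cpu_backend };
    for (size_t i = 0; i < sizeof order / sizeof order[0]; i++)
        if (order[i]->init() == 0 && order[i]->get_device_count() > 0)
            return order[i];
    return NULL;
}

/* Example kernel: each work item squares one element (stand-in for one key). */
typedef struct { uint32_t *data; } square_args;
static void square_kernel(size_t idx, void *args) {
    square_args *a = args;
    a->data[idx] *= a->data[idx];
}

/* Shape of the core loop: it only ever calls through the vtable. */
int run_square(const backend_t *be, uint32_t *host, size_t n) {
    size_t bytes = n * sizeof *host;
    uint32_t *dev = be->mem_alloc(bytes);
    if (dev == NULL) return -1;
    square_args args = { dev };
    /* Each step returns 0 on success; || stops at the first failure. */
    int rc = be->copy(dev, host, bytes, COPY_HOST_TO_DEVICE)
          || be->launch(square_kernel, n, &args)
          || be->synchronize()
          || be->copy(host, dev, bytes, COPY_DEVICE_TO_HOST);
    be->mem_free(dev);
    return rc ? -1 : 0;
}
```

With this shape, adding HIP or another vendor means adding one more backend_t table and one entry in the priority array; run_square, standing in for the attack loop, stays unchanged. A multi-GPU version would presumably also pass a device index to alloc/copy/launch, which this sketch leaves out.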
Honest scope notes
- This is an architectural change, not a speed-up: it mainly buys maintainability and clean multi-backend/multi-GPU support.
- RTC / NVRTC (compiling kernels at runtime) is part of the Hashcat model, but its big speed win comes from baking the salt into the kernel as a constant so the compiler can eliminate dead code. For RC4-40 the dominant cost is the KSA (256 swaps per key), which RTC can't shortcut, so expect roughly no throughput gain. Pursue it (if at all) for single-source maintainability, not for 2× speed-up claims.
- Do the single-source-kernel consolidation first; it's the prerequisite and delivers most of the maintainability win on its own.
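To make the KSA point concrete, here is the textbook RC4 key schedule and keystream generator in C. This is generic RC4, not the project's kernel, and the function names rc4_ksa and rc4_keystream are illustrative. Every candidate key needs the full 256-iteration KSA before the first keystream byte can be checked, and each swap index j depends on the key and on the state left by all earlier iterations, so it is only known at run time for that key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Textbook RC4 key-scheduling algorithm (KSA): 256 swaps per key. */
void rc4_ksa(uint8_t S[256], const uint8_t *key, size_t key_len) {
    assert(key_len > 0);                  /* i % key_len needs a non-empty key */
    for (int i = 0; i < 256; i++) S[i] = (uint8_t)i;
    uint8_t j = 0;
    for (int i = 0; i < 256; i++) {
        /* j depends on the key byte and on the state produced by all earlier
           swaps, so the swap target is data-dependent and must be computed
           for every candidate key; a baked-in salt does not remove it. */
        j = (uint8_t)(j + S[i] + key[i % key_len]);
        uint8_t t = S[i]; S[i] = S[j]; S[j] = t;
    }
}

/* RC4 keystream generation (PRGA), run on the state left by the KSA. */
void rc4_keystream(uint8_t S[256], uint8_t *out, size_t n) {
    uint8_t i = 0, j = 0;
    for (size_t k = 0; k < n; k++) {
        i = (uint8_t)(i + 1);
        j = (uint8_t)(j + S[i]);
        uint8_t t = S[i]; S[i] = S[j]; S[j] = t;
        out[k] = S[(uint8_t)(S[i] + S[j])];
    }
}
```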
Why it matters
Foundation for vendor-neutral support (NVIDIA/AMD/Intel) without per-backend code drift. Lower priority than the single-source kernel consolidation and multi-GPU support.