Commit 0bb39fc

Added CLAUDE.md and README.md
1 parent 5993472 commit 0bb39fc

2 files changed

Lines changed: 18 additions & 24 deletions

File tree

CLAUDE.md

Lines changed: 15 additions & 21 deletions
Original file line number | Diff line number | Diff line change
@@ -26,15 +26,12 @@ cached-data/
2626
3. If tokens are shared (same underlying user), the budget is split evenly across categories
2727
4. For each category × platform (12 combos), checks if cached JSON is fresh (<23h)
2828
5. If stale, queries GitHub Search API with platform-specific topics/languages/keywords
29-
6. Filters repos that have **real releases with platform installers** via two methods:
30-
- **Extension matching**: Direct installer files (`.apk`, `.exe`, `.dmg`, `.deb`, etc.)
31-
- **Keyword matching**: Generic archives (`.zip`, `.tar.gz`) with platform keywords in the filename (e.g. `myapp-macos-arm64.zip`, `myapp-win-x64.tar.gz`)
29+
6. Filters repos that have **real releases with platform installers** — only dedicated installer formats count (no generic archives like `.zip` or `.tar.gz`)
3230
7. Repos with NSFW/inappropriate topics or descriptions are excluded via `BLOCKED_TOPICS`
3331
8. Verifies ALL candidates — no artificial caps. Stops gracefully when per-platform budget is exhausted or rate limit drops below `RATE_LIMIT_FLOOR` (50)
3432
9. Never saves 0-repo results; never overwrites good cached data with poor results
35-
10. Waits 65s between platforms for search API rate limit (30 req/min) to reset
36-
11. Saves results to `cached-data/{category}/{platform}.json`
37-
12. GitHub Actions commits and pushes changes
33+
10. Saves results to `cached-data/{category}/{platform}.json`
34+
11. GitHub Actions commits and pushes changes
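
The freshness check in step 4 can be sketched as below. This is a minimal illustration, not the script's actual code: the function name `is_cache_fresh` and the `last_updated` parameter are hypothetical, and how the real script records the cache timestamp (file mtime, a field inside the JSON, or something else) is not stated here. Only `CACHE_VALIDITY_HOURS = 23` comes from the documentation.

```python
from datetime import datetime, timedelta, timezone

CACHE_VALIDITY_HOURS = 23  # value taken from the constants table


def is_cache_fresh(last_updated, now=None):
    """Return True if the cached result is younger than the TTL.

    `last_updated` is a timezone-aware datetime; where it comes from
    is an assumption of this sketch.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated < timedelta(hours=CACHE_VALIDITY_HOURS)
```

A 23-hour TTL (rather than 24) presumably lets a daily scheduled run treat the previous day's cache as stale even if the job starts slightly early.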
3835

3936
### Token Strategy
4037
- 3 GitHub Classic PATs (scope: `public_repo`), each from a **separate GitHub account**
@@ -59,8 +56,9 @@ cached-data/
5956
- `verify_installers()` stops when per-platform budget is exhausted (not just global floor)
6057

6158
**Search API throttling:**
62-
- 65-second pause between platforms within a category to let the 30 req/min limit reset
63-
- Only pauses if the previous platform actually ran searches (cached platforms skip it)
59+
- Sliding window rate limiter (`_acquire_search_slot()`) tracks timestamps of all search API calls
60+
- Automatically pauses when approaching 28 calls per 60-second window (GitHub allows 30, 2 left as buffer)
61+
- Pacing is per-call — no blunt inter-platform pauses needed; search calls are spaced automatically
6462
- `_update_rate_info()` ignores search API headers to prevent core rate tracking pollution
6563

6664
**Safety caps:**
@@ -77,17 +75,11 @@ cached-data/
7775

7876
Each platform has: topics, installer file extensions, scoring keywords (high/medium/low), primary/secondary languages, and frameworks. See `PLATFORMS` dict.
7977

80-
**Installer detection** uses two layers:
81-
1. **Extension matching** — dedicated installer files:
82-
- Android: `.apk`, `.aab`
83-
- Windows: `.msi`, `.exe`, `.msix`
84-
- macOS: `.dmg`, `.pkg`, `.app.zip`
85-
- Linux: `.appimage`, `.deb`, `.rpm`
86-
2. **Keyword matching** — generic archives (`.zip`, `.tar.gz`, `.tar.xz`, `.tar.bz2`, `.7z`) with platform keywords in the filename:
87-
- Android: `android`
88-
- Windows: `win64`, `win32`, `windows`, `-win-`, etc.
89-
- macOS: `macos`, `darwin`, `osx`, `-mac-`, etc.
90-
- Linux: `linux`, `-linux-`, etc.
78+
**Installer detection** — only dedicated installer file extensions count (generic archives like `.zip`/`.tar.gz` are ignored):
79+
- Android: `.apk`, `.aab`
80+
- Windows: `.msi`, `.exe`, `.msix`
81+
- macOS: `.dmg`, `.pkg`
82+
- Linux: `.appimage`, `.deb`, `.rpm`
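
Extension-only detection can be illustrated with a short sketch. The extension lists are the ones given above; the names `INSTALLER_EXTENSIONS` and `detect_platforms` are hypothetical and do not claim to match the `PLATFORMS` dict or `_check_assets()` in the script. Lower-casing the filename is an assumption made so that `.AppImage` assets match `.appimage`.

```python
# Dedicated installer extensions per platform, as listed in the documentation.
INSTALLER_EXTENSIONS = {
    "android": (".apk", ".aab"),
    "windows": (".msi", ".exe", ".msix"),
    "macos": (".dmg", ".pkg"),
    "linux": (".appimage", ".deb", ".rpm"),
}


def detect_platforms(asset_names):
    """Return the set of platforms that have at least one installer asset."""
    found = set()
    for name in asset_names:
        lower = name.lower()
        for platform, extensions in INSTALLER_EXTENSIONS.items():
            # str.endswith accepts a tuple of suffixes.
            if lower.endswith(extensions):
                found.add(platform)
    return found
```

With this rule a release containing `MyApp-1.0.AppImage`, `setup.exe`, and `myapp-macos-arm64.zip` counts for Linux and Windows only; the macOS archive is ignored because `.zip` is not a dedicated installer format.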
9183

9284
### Content Filtering
9385

@@ -113,6 +105,8 @@ All search queries include `fork:true` to discover forked repositories with plat
113105
| `CACHE_VALIDITY_HOURS` | 23 | Cache TTL |
114106
| `MAX_CONCURRENT_REQUESTS` | 25 | HTTP concurrency (core API) |
115107
| `MAX_SEARCH_CONCURRENT` | 5 | Search API concurrency |
108+
| `SEARCH_RATE_LIMIT` | 28 | Max search calls per 60s window (GitHub allows 30) |
109+
| `SEARCH_RATE_WINDOW` | 60 | Sliding window in seconds |
116110
| `RELEASE_CHECK_BATCH` | 40 | Repos verified per batch |
117111
| `REQUEST_TIMEOUT` | 20 | Per-request timeout (seconds) |
118112
| `MAX_RETRIES` | 3 | Per-request retry limit |
@@ -168,6 +162,6 @@ Each `{platform}.json`:
168162
- Python 3.11, no type-checking or linting configured
169163
- No tests beyond `validate_releases.py`
170164
- Each category creates its own `GitHubClient` — release cache is per-client, shared across platforms within the same category
171-
- Platforms are processed sequentially within each category (with 65s search rate limit pause between)
172-
- Typical runtime: 15-25 minutes with 3 dedicated tokens
165+
- Platforms are processed sequentially within each category (search API pacing handled by sliding window rate limiter)
166+
- Typical runtime: 10-20 minutes with 3 dedicated tokens
173167
- The `_check_assets()` helper inside `get_latest_stable_release()` detects installers for ALL platforms in one pass, so cross-platform repos benefit all platforms from a single release check

README.md

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -18,7 +18,7 @@ Each category is fetched across 4 platforms:
1818
|---|---|---|
1919
| **Android** | `.apk`, `.aab` | Kotlin, Java |
2020
| **Windows** | `.exe`, `.msi`, `.msix` | C#, C++, Rust |
21-
| **macOS** | `.dmg`, `.pkg`, `.app.zip` | Swift, Objective-C |
21+
| **macOS** | `.dmg`, `.pkg` | Swift, Objective-C |
2222
| **Linux** | `.AppImage`, `.deb`, `.rpm` | C++, Rust, C |
2323

2424
## Requirements
@@ -160,9 +160,9 @@ If the workflow fails, it automatically creates a GitHub issue labeled `automati
160160
### Rate limit strategy
161161

162162
- Each category uses its own token with an independent 5,000 req/hr budget
163-
- Search API rate limits (30/min) are tracked separately from core API limits
163+
- Search API rate limits (30/min) are paced by a sliding window rate limiter (28 calls per 60s window)
164164
- Release verification results are cached across platforms within the same category
165-
- Platforms are processed sequentially to avoid rate-limit thrashing
165+
- Platforms are processed sequentially; search API pacing is automatic
166166

167167
## Project Structure
168168
