CLAUDE.md: 99 additions & 20 deletions
@@ -22,48 +22,115 @@ cached-data/
 ## How It Works

 1. `fetch_all_categories.py` reads 3 per-category tokens (`GH_TOKEN_TRENDING`, `GH_TOKEN_NEW_RELEASES`, `GH_TOKEN_MOST_POPULAR`), falling back to `GITHUB_TOKEN`
-2. Each category gets its own `GitHubClient` with a dedicated token (5,000 req/hr each = 15,000 total)
-3. For each category × platform (12 combos), checks if cached JSON is fresh (<23h)
-4. If stale, queries GitHub Search API with platform-specific topics/languages/keywords
-5. Filters repos that have **real releases with platform installers** (e.g. `.apk` for android, `.exe`/`.msi` for windows)
-6. Verifies ALL candidates — no artificial caps. Stops gracefully when rate limit drops below `RATE_LIMIT_FLOOR` (50)
-7. Saves results to `cached-data/{category}/{platform}.json`
-8. GitHub Actions commits and pushes changes
+2. Each category gets its own `GitHubClient` with a dedicated token
+3. If tokens are shared (same underlying user), the budget is split evenly across categories
+4. For each category × platform (12 combos), checks if cached JSON is fresh (<23h)
+5. If stale, queries GitHub Search API with platform-specific topics/languages/keywords
+6. Filters repos that have **real releases with platform installers** via two methods:
+   - **Extension matching**: Direct installer files (`.apk`, `.exe`, `.dmg`, `.deb`, etc.)
+   - **Keyword matching**: Generic archives (`.zip`, `.tar.gz`) with platform keywords in the filename (e.g. `myapp-macos-arm64.zip`, `myapp-win-x64.tar.gz`)
+7. Repos with NSFW/inappropriate topics or descriptions are excluded via `BLOCKED_TOPICS`
+8. Verifies ALL candidates — no artificial caps. Stops gracefully when per-platform budget is exhausted or rate limit drops below `RATE_LIMIT_FLOOR` (50)
+9. Never saves 0-repo results; never overwrites good cached data with poor results
+10. Waits 65s between platforms for search API rate limit (30 req/min) to reset
+11. Saves results to `cached-data/{category}/{platform}.json`
+12. GitHub Actions commits and pushes changes

 ### Token Strategy
-- 3 GitHub Classic PATs (scope: `public_repo`), one per category
-- Stored as GitHub Actions secrets: `GH_TOKEN_TRENDING`, `GH_TOKEN_NEW_RELEASES`, `GH_TOKEN_MOST_POPULAR`
+- 3 GitHub Classic PATs (scope: `public_repo`), each from a **separate GitHub account**
+- GitHub rate limits are per-user (not per-token), so 3 accounts = 3 independent 5,000 req/hr pools = 15,000 total
+- Stored as GitHub Actions repository secrets: `GH_TOKEN_TRENDING`, `GH_TOKEN_NEW_RELEASES`, `GH_TOKEN_MOST_POPULAR`
 - Backward compatible: falls back to single `GITHUB_TOKEN` if per-category tokens aren't set
+- If shared tokens detected, budget is automatically split evenly across categories
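The per-category token lookup with its `GITHUB_TOKEN` fallback might look roughly like the sketch below. The environment variable names come from the text above; the function name `resolve_tokens`, the mapping `CATEGORY_TOKEN_VARS`, and the `(token, is_dedicated)` return shape are assumptions for illustration, not the script's actual API.

```python
import os

# Env var names are taken from the document; everything else here is hypothetical.
CATEGORY_TOKEN_VARS = {
    "trending": "GH_TOKEN_TRENDING",
    "new-releases": "GH_TOKEN_NEW_RELEASES",
    "most-popular": "GH_TOKEN_MOST_POPULAR",
}


def resolve_tokens(env=os.environ):
    """Map each category to (token, is_dedicated), falling back to GITHUB_TOKEN."""
    fallback = env.get("GITHUB_TOKEN")
    resolved = {}
    for category, var in CATEGORY_TOKEN_VARS.items():
        dedicated = env.get(var)
        # Use the dedicated token when present, otherwise the shared fallback.
        resolved[category] = (dedicated or fallback, bool(dedicated))
    return resolved
```

Passing the environment as a parameter keeps the lookup easy to exercise with a plain dict instead of real secrets.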
+
+### Rate Limit Management
+
+**Two independent rate limits at play:**
+
+| Limit | Pool | Scope |
+|---|---|---|
+| Core API | 5,000/hr per user | Release checks, rate_limit endpoint |
+| Search API | 30/min per user | `search/repositories` queries |
+
+**Core API budget system:**
+- `main()` detects shared tokens and caps each category to its fair share
+- `process_category()` divides the category's budget evenly across 4 platforms
+- Budget recalculates after each platform — unused budget carries forward
+- `verify_installers()` stops when per-platform budget is exhausted (not just global floor)
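The budget arithmetic described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`category_budget`, `platform_budget`, `plan_budgets`) and the constant name `MIN_PLATFORM_BUDGET` are hypothetical, and the real script may round or carry budget forward differently. The 100-request minimum is the one the "Safety caps" list mentions.

```python
MIN_PLATFORM_BUDGET = 100  # minimum per platform; constant name is an assumption


def category_budget(remaining, categories_sharing_token):
    """Fair share of the core-API pool when several categories share one user."""
    return remaining // max(1, categories_sharing_token)


def platform_budget(category_remaining, platforms_left):
    """Even split across the platforms still to run, never below the minimum."""
    fair_share = category_remaining // max(1, platforms_left)
    return max(MIN_PLATFORM_BUDGET, fair_share)


def plan_budgets(category_remaining, usage_per_platform):
    """Simulate the budget each platform receives, given hypothetical usage.

    Recomputing the split before every platform is what lets unused
    requests carry forward to the platforms that run later.
    """
    budgets = []
    platforms_left = len(usage_per_platform)
    for used in usage_per_platform:
        budget = platform_budget(category_remaining, platforms_left)
        budgets.append(budget)
        category_remaining -= min(used, budget)
        platforms_left -= 1
    return budgets
```

For example, with 1,600 requests and a first platform that uses only 100 of its 400, the later platforms receive 500, 550, and 700 in this model.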
+
+**Search API throttling:**
+- 65-second pause between platforms within a category to let the 30 req/min limit reset
+- Only pauses if the previous platform actually ran searches (cached platforms skip it)
+- `_update_rate_info()` ignores search API headers to prevent core rate tracking pollution
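The conditional pause between platforms could be modeled like this. It is only a sketch: `process_platforms`, `fetch_platform`, and `SEARCH_PAUSE_SECONDS` are hypothetical names, and the assumption is that each platform fetch can report whether it actually issued search queries.

```python
import time

SEARCH_PAUSE_SECONDS = 65  # value from the document; constant name is an assumption


def process_platforms(platforms, fetch_platform, sleep=time.sleep):
    """Run platforms in order, pausing only after one that ran searches.

    fetch_platform(platform) is assumed to return True when it hit the
    Search API (cache miss) and False when it was served from cache.
    """
    previous_ran_searches = False
    for platform in platforms:
        if previous_ran_searches:
            # Give the 30 req/min search window time to reset.
            sleep(SEARCH_PAUSE_SECONDS)
        previous_ran_searches = fetch_platform(platform)
```

Injecting `sleep` makes the pause logic checkable without waiting 65 seconds per platform.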
+
+**Safety caps:**
+- `_wait_for_rate_limit()` never sleeps more than 60s (prevents workflow timeout)
+- Minimum budget of 100 requests per platform regardless of remaining
+- Workflow timeout: 45 minutes

 ### Categories
-- **trending**: High star velocity + recent activity. Sorted by trending score (platform score + velocity x 10)
+- **trending**: High star velocity + recent activity. Sorted by trending score (platform score + velocity × 10)
 - **new-releases**: Repos with stable releases in last 14 days. Sorted by release date
-- **most-popular**: Repos with 5000+ stars. Sorted by star count
+- **most-popular**: Repos with 5,000+ stars. Sorted by star count

 ### Platform Detection
-Each platform has defined: topics, installer file extensions, scoring keywords (high/medium/low), primary/secondary languages, and frameworks. See `PLATFORMS` dict in fetch script.
+
+Each platform has: topics, installer file extensions, scoring keywords (high/medium/low), primary/secondary languages, and frameworks. See `PLATFORMS` dict.
+2. **Keyword matching** — generic archives (`.zip`, `.tar.gz`, `.tar.xz`, `.tar.bz2`, `.7z`) with platform keywords in the filename:
+   - Android: `android`
+   - Windows: `win64`, `win32`, `windows`, `-win-`, etc.
+   - macOS: `macos`, `darwin`, `osx`, `-mac-`, etc.
+   - Linux: `linux`, `-linux-`, etc.
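Combining extension matching with keyword matching for a release asset filename might look like the sketch below. The keyword and archive lists mirror the ones quoted above, but they are abbreviated (the text says "etc."), the extension-to-platform mapping is an assumption based on common usage, and `platforms_for_asset` is a hypothetical name rather than the script's `_check_assets()` helper.

```python
# Abbreviated, illustrative tables; the real PLATFORMS dict is richer.
INSTALLER_EXTENSIONS = {
    "android": (".apk",),
    "windows": (".exe", ".msi"),
    "macos": (".dmg",),
    "linux": (".deb",),
}
ARCHIVE_EXTENSIONS = (".zip", ".tar.gz", ".tar.xz", ".tar.bz2", ".7z")
PLATFORM_KEYWORDS = {
    "android": ("android",),
    "windows": ("win64", "win32", "windows", "-win-"),
    "macos": ("macos", "darwin", "osx", "-mac-"),
    "linux": ("linux",),
}


def platforms_for_asset(filename):
    """Return the set of platforms a release asset appears to target."""
    name = filename.lower()
    matched = set()
    # Method 1: direct installer extensions.
    for platform, extensions in INSTALLER_EXTENSIONS.items():
        if name.endswith(extensions):
            matched.add(platform)
    # Method 2: generic archives that carry a platform keyword in the name.
    if name.endswith(ARCHIVE_EXTENSIONS):
        for platform, keywords in PLATFORM_KEYWORDS.items():
            if any(keyword in name for keyword in keywords):
                matched.add(platform)
    return matched
```

Returning a set of platforms per asset fits the note elsewhere in this file that one release check can serve every platform at once.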
+
+### Content Filtering
+
+`BLOCKED_TOPICS` set (~40 terms) excludes repos with NSFW/inappropriate content. Checked against both repo topics (set intersection) and description (substring match) during candidate collection, before any API calls are wasted on verification.
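The two checks described above (set intersection on topics, substring match on the description) can be sketched as below. The two terms in `BLOCKED_TOPICS` are placeholders, not the actual list of roughly 40, and `is_blocked` is a hypothetical helper name. The `topics` and `description` keys follow the shape of a GitHub repository search result.

```python
# Placeholder terms for illustration only; the real set has about 40 entries.
BLOCKED_TOPICS = {"nsfw", "adult"}


def is_blocked(repo):
    """Return True if a repo dict should be excluded from candidates."""
    topics = {topic.lower() for topic in repo.get("topics") or []}
    # Topic check: any overlap with the blocked set.
    if topics & BLOCKED_TOPICS:
        return True
    # Description check: substring match, tolerating a null description.
    description = (repo.get("description") or "").lower()
    return any(term in description for term in BLOCKED_TOPICS)
```

Note that plain substring matching can produce false positives when a blocked term happens to appear inside a longer, unrelated word.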
+
+### Cache Protection
+
+- Cache files are valid for 23 hours (`CACHE_VALIDITY_HOURS`)
+- Stale caches with fewer than the minimum threshold repos are refetched (30 for trending/most-popular, 10 for new-releases)
+- **Never saves 0 repos** — if a fetch returns 0, existing cache is preserved
+- **Never overwrites good data with poor results** — if fetch returns fewer than threshold but cache has more, cache is kept
+- `FORCE_REFRESH` env var bypasses cache loading entirely
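The save-or-keep decision in the list above might reduce to something like the following. This is a sketch, not the script's code: `MIN_REPOS` and `should_save` are hypothetical names, and the thresholds are the ones quoted above (30 for trending/most-popular, 10 for new-releases).

```python
# Thresholds from the document; the dict and function names are assumptions.
MIN_REPOS = {"trending": 30, "most-popular": 30, "new-releases": 10}


def should_save(category, fetched_count, cached_count):
    """Decide whether freshly fetched results may replace the cached file."""
    if fetched_count == 0:
        # Never save 0 repos; keep whatever cache exists.
        return False
    threshold = MIN_REPOS[category]
    if fetched_count < threshold and cached_count > fetched_count:
        # Poor result and the cache has more: keep the cache.
        return False
    return True
```

Under this model a below-threshold fetch is still saved when the existing cache is even smaller, which is one reading of "cache has more".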
+
+### Fork Inclusion
+
+All search queries include `fork:true` to discover forked repositories with platform installers.

 ## Key Constants

 | Constant | Value | Notes |
 |---|---|---|
-| `RATE_LIMIT_FLOOR` | 50 | Stop verifying when rate limit drops below this |
+| `RATE_LIMIT_FLOOR` | 50 | Global minimum — stop verifying below this |
1. `check-rate-limit` — Checks all 3 tokens, reports dedicated vs fallback, gates on >1000 remaining
+2. `fetch-and-update` — Runs the script, validates JSON, commits and pushes (retries push up to 3 times with rebase)
+3. `notify-on-failure` — Auto-creates a GitHub issue labeled `automation, category-fetch, bug`
+
+**Workflow inputs:**
+- `force_refresh` (boolean) — Skip all caches when triggered manually
+
+**Timeout:** 45 minutes
+
 ## Development Notes

 - Python 3.11, no type-checking or linting configured
 - No tests beyond `validate_releases.py`
 - Each category creates its own `GitHubClient` — release cache is per-client, shared across platforms within the same category
-- Platforms are processed sequentially within each category to avoid rate-limit thrashing
-- The workflow retries `git push` up to 3 times with rebase on conflict
-- On failure, the workflow auto-creates a GitHub issue labeled `automation, category-fetch, bug`
+- Platforms are processed sequentially within each category (with 65s search ratelimit pause between)
+- Typical runtime: 15-25 minutes with 3 dedicated tokens
+- The `_check_assets()` helper inside `get_latest_stable_release()` detects installers for ALL platforms in one pass, so cross-platform repos benefit all platforms from a single release check