Commit 72847a9

Add new llmsTxtAdoption metric to HTTPArchive reports configuration
1 parent 631e0ec commit 72847a9

2 files changed

Lines changed: 129 additions & 0 deletions

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
---
name: add-httparchive-metric-report
description: Add new metrics to HTTPArchive reports config. USE FOR adding performance metrics, adoption/percentage metrics, or custom metric analysis from crawl data. Chooses timeseries vs histogram based on data type.
---

# Adding Metrics to HTTPArchive Reports

## Documentation Reference

See [reports.md](../../../reports.md) for the complete architecture, troubleshooting, and configuration details.

## Quick Implementation

Add metrics to the `config._metrics` object in `includes/reports.js`. The system automatically generates reports for each metric across all lenses (all, top1k, wordpress, etc.).
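
As a rough mental model of that per-lens expansion (an assumption, not the code in `includes/reports.js`), the sketch below renders one metric template once per lens. `LENSES`, its placeholder filters, `expandAcrossLenses`, the stub `ctx.ref`, and the empty `devRankFilter` are all hypothetical.

```javascript
// Hypothetical lens table. The real lens names and SQL filters live in the
// reports configuration; these fragments are placeholders only.
const LENSES = {
  all: { sql: '' },
  top1k: { sql: 'AND TRUE /* placeholder top1k filter */' },
  wordpress: { sql: 'AND TRUE /* placeholder wordpress filter */' },
};

// Stub Dataform context: ref() returns a backtick-quoted table name.
const ctx = { ref: (dataset, table) => `\`${dataset}.${table}\`` };

// A metric query in the (ctx, params) => string shape used by the config.
const exampleQuery = (ctx, params) => `
  SELECT client, COUNT(0) AS pages
  FROM ${ctx.ref('crawl', 'pages')}
  WHERE date = '${params.date}'
    AND is_root_page
    ${params.lens.sql}
    ${params.devRankFilter}
  GROUP BY client
  ORDER BY client`;

// Render one query per lens for a single metric.
function expandAcrossLenses(metricName, queryFn, date) {
  return Object.entries(LENSES).map(([lensName, lens]) => ({
    metric: metricName,
    lens: lensName,
    // devRankFilter is left empty; its real value is not shown in this doc.
    sql: queryFn(ctx, { date, lens, devRankFilter: '' }),
  }));
}

const reports = expandAcrossLenses('exampleMetric', exampleQuery, '2025-07-01');
for (const r of reports) console.log(r.metric, r.lens);
```

The point of the sketch is that a metric is written once and the lens filter is injected at render time, which is why every query must include `${params.lens.sql}`.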

## Metric Type Selection

| Type | Use For | Don't Use For |
|------|---------|---------------|
| **Timeseries** | Percentiles, adoption rates, trends, **boolean/presence metrics** | N/A (most versatile) |
| **Histogram** | Continuous value distributions (page weight, load times) | Boolean/binary (only 2 states) |

**Key Rule:** Always use timeseries for boolean/adoption metrics; histogram only for continuous distributions.
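
The Key Rule can be expressed as a small heuristic over sample values. `allowedMetricTypes` is a hypothetical helper, not part of the reports tooling, and treating any sample with two or fewer distinct values as binary is an assumption that can misfire on very small samples.

```javascript
// Hypothetical helper applying the Key Rule to a sample of metric values:
// binary data (booleans, or only two distinct values) is timeseries-only,
// while continuous numeric data may use either report type.
function allowedMetricTypes(sampleValues) {
  const present = sampleValues.filter((v) => v !== null && v !== undefined);
  if (present.length === 0) {
    throw new Error('need at least one non-null sample value');
  }
  const allBoolean = present.every((v) => typeof v === 'boolean');
  const allNumeric = present.every((v) => typeof v === 'number' && Number.isFinite(v));
  if (!allBoolean && !allNumeric) {
    throw new Error('expected boolean or numeric sample values');
  }
  // Two or fewer distinct values (e.g. true/false or 0/1) is binary data,
  // which a histogram cannot describe usefully.
  const isBinary = allBoolean || new Set(present).size <= 2;
  return isBinary ? ['timeseries'] : ['timeseries', 'histogram'];
}

console.log(allowedMetricTypes([true, false, null, true]));
console.log(allowedMetricTypes([0, 1, 1, 0]));
console.log(allowedMetricTypes([512, 2048, 1536, 90000]));
```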

## Required SQL Patterns

Every metric MUST include:
- `date = '${params.date}'`
- `AND is_root_page`
- `${params.lens.sql}`
- `${params.devRankFilter}`
- `FROM ${ctx.ref('crawl', 'pages')}`
- `GROUP BY client ORDER BY client` (histogram queries also group by `bin`)
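
A quick way to catch a missing required pattern is to search the template's source text. `REQUIRED_FRAGMENTS` and `missingFragments` are hypothetical, and plain substring matching is deliberately crude: it can be fooled by a fragment that appears only inside a SQL comment.

```javascript
// Hypothetical lint check: which required fragments are missing from a
// metric's query template? It reads the template's source text, so the
// ${...} placeholders are compared literally instead of being rendered.
const REQUIRED_FRAGMENTS = [
  "date = '${params.date}'",
  'AND is_root_page',
  '${params.lens.sql}',
  '${params.devRankFilter}',
  "${ctx.ref('crawl', 'pages')}",
  'GROUP BY client',
  'ORDER BY client',
];

function missingFragments(templateSource) {
  // Collapse whitespace so "GROUP BY\n    client" matches "GROUP BY client".
  const normalized = templateSource.replace(/\s+/g, ' ');
  return REQUIRED_FRAGMENTS.filter((fragment) => !normalized.includes(fragment));
}

// Example template that forgot ${params.devRankFilter}.
const queryFn = (ctx, params) => `
  SELECT client, COUNT(0) AS pages
  FROM ${ctx.ref('crawl', 'pages')}
  WHERE date = '${params.date}'
    AND is_root_page
    ${params.lens.sql}
  GROUP BY
    client
  ORDER BY
    client
`;

// Function.prototype.toString returns the arrow function's source text.
console.log(missingFragments(queryFn.toString()));
```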

## Quick Patterns

### Timeseries - Adoption/Percentage
```sql
ROUND(SAFE_DIVIDE(COUNTIF(condition), COUNT(0)) * 100, 2) AS pct_pages
```
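
To see how the adoption pattern treats `FALSE` and `NULL`, the sketch below reproduces it on in-memory rows. The `llmsTxtValid` field and the sample rows are made up; `COUNTIF` counts only rows where the condition is `TRUE`, while `COUNT(0)` counts every row in the group.

```javascript
// In-memory emulation of the adoption pattern, grouped and ordered by client:
//   ROUND(SAFE_DIVIDE(COUNTIF(condition), COUNT(0)) * 100, 2) AS pct_pages
// The rows and the llmsTxtValid field are made-up sample data.

// Like BigQuery SAFE_DIVIDE: null instead of an error when the divisor is 0.
function safeDivide(x, y) {
  return y === 0 ? null : x / y;
}

function pctPagesByClient(rows, condition) {
  const counts = new Map(); // client -> { hits, total }
  for (const row of rows) {
    const c = counts.get(row.client) ?? { hits: 0, total: 0 };
    c.total += 1; // COUNT(0) counts every row
    if (condition(row) === true) c.hits += 1; // COUNTIF skips false and NULL
    counts.set(row.client, c);
  }
  return [...counts.keys()].sort().map((client) => {
    const { hits, total } = counts.get(client);
    const ratio = safeDivide(hits, total);
    // Math.round matches ROUND(x, 2) for the non-negative values used here.
    return { client, pct_pages: ratio === null ? null : Math.round(ratio * 100 * 100) / 100 };
  });
}

const rows = [
  { client: 'mobile', llmsTxtValid: true },
  { client: 'mobile', llmsTxtValid: null },
  { client: 'desktop', llmsTxtValid: true },
  { client: 'desktop', llmsTxtValid: false },
  { client: 'desktop', llmsTxtValid: null },
];
console.log(pctPagesByClient(rows, (row) => row.llmsTxtValid));
```

Pages with a `NULL` value still count toward the denominator, so they lower the percentage rather than being excluded.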

### Timeseries - Percentiles
```sql
-- Dividing by 1024 converts bytes to KB; drop it for metrics that are not byte counts.
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(101)] / 1024, 2) AS p10,
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(251)] / 1024, 2) AS p25,
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(501)] / 1024, 2) AS p50,
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(751)] / 1024, 2) AS p75,
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(901)] / 1024, 2) AS p90
-- For continuous metrics, add AND FLOAT64(metric) > 0 to the WHERE clause.
```
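
The percentile pattern can be approximated in JavaScript to sanity-check expected values. This is a sketch: it uses exact nearest-rank selection on sorted data, whereas BigQuery's `APPROX_QUANTILES` is approximate, so results can differ slightly. `quantileAtOffset` and `percentilesInKb` are hypothetical names.

```javascript
// Exact stand-in for APPROX_QUANTILES(x, 1001)[OFFSET(i)]. BigQuery returns
// 1001 + 1 approximate boundaries from min to max, so OFFSET(i) is close to
// the i/1001 quantile; here we take the nearest rank in sorted data instead.
function quantileAtOffset(sorted, offset, numQuantiles = 1001) {
  if (sorted.length === 0) return null;
  const index = Math.round((offset / numQuantiles) * (sorted.length - 1));
  return sorted[index];
}

// Offsets used in the percentile pattern.
const OFFSETS = { p10: 101, p25: 251, p50: 501, p75: 751, p90: 901 };

function percentilesInKb(byteValues) {
  // Mirrors AND FLOAT64(metric) > 0: drop zeros and missing values first.
  const sorted = byteValues
    .filter((v) => typeof v === 'number' && v > 0)
    .sort((a, b) => a - b);
  const result = {};
  for (const [name, offset] of Object.entries(OFFSETS)) {
    const bytes = quantileAtOffset(sorted, offset);
    // ROUND(bytes / 1024, 2): bytes -> KB with two decimals.
    result[name] = bytes === null ? null : Math.round((bytes / 1024) * 100) / 100;
  }
  return result;
}

// Sample data: 1 KB .. 11 KB plus a zero and a missing value that get filtered.
const sample = [0, null, ...Array.from({ length: 11 }, (_, i) => (i + 1) * 1024)];
console.log(percentilesInKb(sample));
```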

### Histogram - Distribution Bins
```sql
-- Core binning pattern in the innermost subquery (grouped by client and bin):
CAST(FLOOR(FLOAT64(metric) / bin_size) * bin_size AS INT64) AS bin,
COUNT(0) AS volume
-- Wrap with pdf: volume / SUM(volume) OVER (PARTITION BY client)
-- Wrap with cdf: SUM(pdf) OVER (PARTITION BY client ORDER BY bin)
```
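
The three histogram steps (bin, then pdf, then cdf per client) can be traced on in-memory data. `histogram` and the sample rows are hypothetical; the comments map each step to the SQL fragment it mirrors.

```javascript
// In-memory trace of the histogram pattern:
//   bin    = CAST(FLOOR(value / binSize) * binSize AS INT64)
//   volume = COUNT(0) per (client, bin)
//   pdf    = volume / SUM(volume) OVER (PARTITION BY client)
//   cdf    = SUM(pdf) OVER (PARTITION BY client ORDER BY bin)
function histogram(rows, binSize) {
  const volumes = new Map(); // client -> Map(bin -> volume)
  for (const { client, value } of rows) {
    const bin = Math.floor(value / binSize) * binSize;
    if (!volumes.has(client)) volumes.set(client, new Map());
    const clientBins = volumes.get(client);
    clientBins.set(bin, (clientBins.get(bin) ?? 0) + 1);
  }
  const out = [];
  for (const client of [...volumes.keys()].sort()) {
    const clientBins = volumes.get(client);
    let total = 0;
    for (const v of clientBins.values()) total += v;
    let cdf = 0;
    for (const bin of [...clientBins.keys()].sort((a, b) => a - b)) {
      const volume = clientBins.get(bin);
      const pdf = volume / total;
      cdf += pdf; // running sum within the client, ordered by bin
      out.push({ client, bin, volume, pdf, cdf });
    }
  }
  return out;
}

// Made-up sample values for two clients, binned in steps of 100.
const rows = [
  { client: 'desktop', value: 100 },
  { client: 'desktop', value: 150 },
  { client: 'desktop', value: 250 },
  { client: 'desktop', value: 900 },
  { client: 'mobile', value: 50 },
];
const bins = histogram(rows, 100);
for (const b of bins) console.log(b);
```

Each client's cdf ends at 1, which is a cheap check that the `PARTITION BY client` clauses are in place.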

## Examples

```javascript
// Entry added to config._metrics in includes/reports.js
llmsTxtAdoption: {
  SQL: [
    {
      type: 'timeseries',
      query: DataformTemplateBuilder.create((ctx, params) => `
        SELECT
          client,
          ROUND(SAFE_DIVIDE(
            COUNTIF(SAFE.BOOL(custom_metrics.other.llms_txt_validation.valid)),
            COUNT(0)
          ) * 100, 2) AS pct_pages
        FROM ${ctx.ref('crawl', 'pages')}
        WHERE
          date = '${params.date}'
          AND is_root_page
          ${params.lens.sql}
          ${params.devRankFilter}
        GROUP BY client
        ORDER BY client
      `)
    }
  ]
}
```
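
To preview the SQL an entry generates, the template function can be called with stub arguments. This assumes `DataformTemplateBuilder.create` keeps a `(ctx, params) => string` function that can be invoked later; the stub below does only that, and the `ctx` and `params` values are hypothetical, so the real builder may behave differently.

```javascript
// Hypothetical stand-in for DataformTemplateBuilder: create() just returns
// the template function so it can be rendered later.
const DataformTemplateBuilder = {
  create: (templateFn) => templateFn,
};

const llmsTxtAdoption = {
  SQL: [
    {
      type: 'timeseries',
      query: DataformTemplateBuilder.create((ctx, params) => `
        SELECT
          client,
          ROUND(SAFE_DIVIDE(
            COUNTIF(SAFE.BOOL(custom_metrics.other.llms_txt_validation.valid)),
            COUNT(0)
          ) * 100, 2) AS pct_pages
        FROM ${ctx.ref('crawl', 'pages')}
        WHERE
          date = '${params.date}'
          AND is_root_page
          ${params.lens.sql}
          ${params.devRankFilter}
        GROUP BY client
        ORDER BY client
      `),
    },
  ],
};

// Stub context and parameters (all values hypothetical) to preview the SQL.
const ctx = { ref: (dataset, table) => `\`${dataset}.${table}\`` };
const params = { date: '2025-07-01', lens: { sql: '' }, devRankFilter: '' };
const sql = llmsTxtAdoption.SQL[0].query(ctx, params);
console.log(sql);
```

Pasting the printed SQL into the BigQuery console (with a real table name) is a quick way to check syntax before running the full pipeline.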

See [reports.md](../../../reports.md) for complete histogram and timeseries examples.

## Implementation

1. Open `includes/reports.js` and locate `config._metrics` (around line 42).
2. Add the metric before the closing `}` of `_metrics`, adding a comma after the closing `}` of the metric that precedes it.
3. Use the patterns above for the timeseries/histogram structure.
4. Include all required SQL patterns.
5. Run `get_errors` to verify.

## Key Notes

- **Continuous metrics:** Add `AND FLOAT64(metric) > 0` to the WHERE clause before calculating percentiles.
- **Custom metrics:** Use `SAFE.BOOL()` and `SAFE_DIVIDE()`; they return `NULL` instead of raising an error when a JSON value is not a boolean or the divisor is zero.
- **Auto-processing:** Metrics run across all lenses automatically.
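
The effect of `SAFE.BOOL()` inside `COUNTIF` can be modeled roughly as follows. This is a simplified assumption: BigQuery's `BOOL()` raises an error for a JSON value that is not a boolean and the `SAFE.` prefix turns that error into `NULL`, so only a JSON `true` is counted. `safeBool` and `countIf` are hypothetical names; JavaScript `undefined` stands in for a missing field and `'true'` for a JSON string.

```javascript
// Rough model of COUNTIF(SAFE.BOOL(json_field)):
// a JSON boolean passes through; anything else (string, number, null,
// missing field) becomes null, and COUNTIF counts only true.
function safeBool(jsonValue) {
  return typeof jsonValue === 'boolean' ? jsonValue : null;
}

function countIf(values, predicate) {
  return values.filter((v) => predicate(v) === true).length;
}

const validFlags = [true, false, undefined, 'true', 1, null, true];
const hits = countIf(validFlags, safeBool);
console.log(hits);
```

Because a string `"true"` is not counted, a custom metric that stores booleans as strings would report 0% adoption under this pattern.
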

includes/reports.js

Lines changed: 25 additions & 0 deletions
    @@ -99,6 +99,31 @@ const config = {
               `)
             }
           ]
    +    },
    +    llmsTxtAdoption: {
    +      SQL: [
    +        {
    +          type: 'timeseries',
    +          query: DataformTemplateBuilder.create((ctx, params) => `
    +            SELECT
    +              client,
    +              ROUND(SAFE_DIVIDE(
    +                COUNTIF(SAFE.BOOL(custom_metrics.other.llms_txt_validation.valid)),
    +                COUNT(0)
    +              ) * 100, 2) AS pct_pages
    +            FROM ${ctx.ref('crawl', 'pages')}
    +            WHERE
    +              date = '${params.date}'
    +              AND is_root_page
    +              ${params.lens.sql}
    +              ${params.devRankFilter}
    +            GROUP BY
    +              client
    +            ORDER BY
    +              client
    +          `)
    +        }
    +      ]
         }
       }
     };
