| 1 | +--- |
| 2 | +name: add-httparchive-metric-report |
| 3 | +description: Add new metrics to HTTPArchive reports config. USE FOR adding performance metrics, adoption/percentage metrics, or custom metric analysis from crawl data. Chooses timeseries vs histogram based on data type. |
| 4 | +--- |
| 5 | + |
| 6 | +# Adding Metrics to HTTPArchive Reports |
| 7 | + |
| 8 | +## Documentation Reference |
| 9 | + |
| 10 | +See [reports.md](../../../reports.md) for complete architecture, troubleshooting, and configuration details. |
| 11 | + |
| 12 | +## Quick Implementation |
| 13 | + |
| 14 | +Add metrics to `includes/reports.js` in the `config._metrics` object. The system automatically generates reports across all lenses (all, top1k, wordpress, etc.). |
| 15 | + |
| 16 | +## Metric Type Selection |
| 17 | + |
| 18 | +| Type | Use For | Don't Use For | |
| 19 | +|------|---------|---------------| |
| 20 | +| **Timeseries** | Percentiles, adoption rates, trends, **boolean/presence metrics** | N/A (most versatile) | |
| 21 | +| **Histogram** | Continuous value distributions (page weight, load times) | Boolean/binary (only 2 states) | |
| 22 | + |
| 23 | +**Key Rule:** Always use timeseries for boolean/adoption metrics; histogram only for continuous distributions. |
| 24 | + |
| 25 | +## Required SQL Patterns |
| 26 | + |
| 27 | +Every metric MUST include: |
| 28 | +- `date = '${params.date}'` |
| 29 | +- `AND is_root_page` |
| 30 | +- `${params.lens.sql}` |
| 31 | +- `${params.devRankFilter}` |
| 32 | +- `${ctx.ref('crawl', 'pages')}` |
| 33 | +- `GROUP BY client ORDER BY client` |
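One way to catch a missing fragment before running the query is a plain substring check over the template source. The sketch below is hypothetical: `REQUIRED_FRAGMENTS`, `normalize`, and `missingRequiredFragments` are illustrative names, not part of `includes/reports.js`, and the check is only a heuristic (it would miss, for example, a `GROUP BY` that lists additional columns).

```javascript
// Hypothetical lint helper; not part of includes/reports.js.
// Each entry is the literal text expected in the query template source.
// Double-quoted strings keep ${...} as plain text (no interpolation).
const REQUIRED_FRAGMENTS = [
  "date = '${params.date}'",
  "AND is_root_page",
  "${params.lens.sql}",
  "${params.devRankFilter}",
  "${ctx.ref('crawl', 'pages')}",
  "GROUP BY client ORDER BY client",
];

// Collapse runs of whitespace so multi-line SQL matches single-line fragments.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Returns the required fragments that do not appear in the template source.
function missingRequiredFragments(templateSource) {
  const haystack = normalize(templateSource);
  return REQUIRED_FRAGMENTS.filter((fragment) => !haystack.includes(fragment));
}

// Usage: a made-up template source that lacks the rank filter.
const incomplete = [
  "SELECT client, COUNT(0) AS pages",
  "FROM ${ctx.ref('crawl', 'pages')}",
  "WHERE date = '${params.date}'",
  "  AND is_root_page",
  "  ${params.lens.sql}",
  "GROUP BY client",
  "ORDER BY client",
].join("\n");

console.log(missingRequiredFragments(incomplete));
```

An empty array means every required fragment was found; anything else names the pieces still to add.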
| 34 | + |
| 35 | +## Quick Patterns |
| 36 | + |
| 37 | +### Timeseries - Adoption/Percentage |
| 38 | +```sql |
| 39 | +ROUND(SAFE_DIVIDE(COUNTIF(condition), COUNT(0)) * 100, 2) AS pct_pages |
| 40 | +``` |
| 41 | + |
| 42 | +### Timeseries - Percentiles |
| 43 | +```sql |
| 44 | +ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(101)] / 1024, 2) AS p10, |
| 45 | +ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(251)] / 1024, 2) AS p25, |
| 46 | +ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(501)] / 1024, 2) AS p50, |
| 47 | +ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(751)] / 1024, 2) AS p75, |
| 48 | +ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(901)] / 1024, 2) AS p90 |
| 49 | +-- Add AND FLOAT64(metric) > 0 to WHERE for continuous metrics; the / 1024 divisor assumes a byte-valued metric (bytes to KB)
| 50 | +``` |
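The five percentile columns differ only in offset and alias, so they can be generated instead of typed by hand. BigQuery documents `APPROX_QUANTILES(expr, n)` as returning `n + 1` boundaries, so with `n = 1001` each offset lands within roughly 0.1 percentage points of its nominal percentile. The helper below is a hypothetical sketch: `PERCENTILE_OFFSETS`, `percentileColumns`, and the `divisor` parameter are illustrative and do not exist in `includes/reports.js`.

```javascript
// Hypothetical helper; not part of includes/reports.js.
// Offsets mirror the pattern above: 1001 quantiles, offsets 101 to 901.
const PERCENTILE_OFFSETS = { p10: 101, p25: 251, p50: 501, p75: 751, p90: 901 };

// Builds the percentile SELECT columns for one metric expression.
// divisor: unit conversion applied to each value (1024 for bytes to KB).
function percentileColumns(metricExpr, divisor = 1024) {
  return Object.entries(PERCENTILE_OFFSETS)
    .map(
      ([alias, offset]) =>
        `ROUND(APPROX_QUANTILES(FLOAT64(${metricExpr}), 1001)[OFFSET(${offset})] / ${divisor}, 2) AS ${alias}`
    )
    .join(",\n");
}

// Usage: "metric" is the same placeholder used in the SQL pattern.
console.log(percentileColumns("metric"));
```

Passing `1` as the divisor leaves values in their original unit, which may suit metrics that are not byte counts.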
| 51 | + |
| 52 | +### Histogram - Distribution Bins |
| 53 | +```sql |
| 54 | +-- Core binning pattern in innermost subquery: |
| 55 | +CAST(FLOOR(FLOAT64(metric) / bin_size) * bin_size AS INT64) AS bin, |
| 56 | +COUNT(0) AS volume |
| 57 | +-- Wrap with pdf: volume / SUM(volume) OVER (PARTITION BY client) |
| 58 | +-- Wrap with cdf: SUM(pdf) OVER (PARTITION BY client ORDER BY bin) |
| 59 | +``` |
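To see what the three layers compute, the same arithmetic can be run in memory for a single client. This is an illustrative JavaScript sketch, not the SQL that the reports pipeline runs; the `histogram` function and the sample values are made up for the example.

```javascript
// Illustrative only: mirrors bin -> volume -> pdf -> cdf for one client.
function histogram(values, binSize) {
  // Innermost step: FLOOR(value / bin_size) * bin_size, then COUNT(0) per bin.
  const volumes = new Map();
  for (const value of values) {
    const bin = Math.floor(value / binSize) * binSize;
    volumes.set(bin, (volumes.get(bin) || 0) + 1);
  }

  // Denominator for pdf: SUM(volume) over the whole client partition.
  const total = values.length;
  const bins = [...volumes.keys()].sort((a, b) => a - b);

  // cdf: running SUM(pdf), accumulated in ascending bin order.
  let running = 0;
  return bins.map((bin) => {
    const volume = volumes.get(bin);
    const pdf = volume / total;
    running += pdf;
    return { bin, volume, pdf, cdf: running };
  });
}

// Made-up sample values with a bin size of 100.
const rows = histogram([120, 180, 250, 90, 310, 130], 100);
console.log(rows);
```

The `cdf` of the last bin should come out at 1 (up to floating-point rounding), which is a quick sanity check for the SQL version as well.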
| 60 | + |
| 61 | +## Examples |
| 62 | + |
| 63 | +```javascript |
| 64 | +llmsTxtAdoption: { |
| 65 | + SQL: [ |
| 66 | + { |
| 67 | + type: 'timeseries', |
| 68 | + query: DataformTemplateBuilder.create((ctx, params) => ` |
| 69 | + SELECT |
| 70 | + client, |
| 71 | + ROUND(SAFE_DIVIDE( |
| 72 | + COUNTIF(SAFE.BOOL(custom_metrics.other.llms_txt_validation.valid)), |
| 73 | + COUNT(0) |
| 74 | + ) * 100, 2) AS pct_pages |
| 75 | + FROM ${ctx.ref('crawl', 'pages')} |
| 76 | + WHERE |
| 77 | + date = '${params.date}' |
| 78 | + AND is_root_page |
| 79 | + ${params.lens.sql} |
| 80 | + ${params.devRankFilter} |
| 81 | + GROUP BY client |
| 82 | + ORDER BY client |
| 83 | + `) |
| 84 | + } |
| 85 | + ] |
| 86 | +} |
| 87 | +``` |
| 88 | + |
| 89 | +See [reports.md](../../../reports.md) for complete histogram + timeseries examples. |
| 90 | + |
| 91 | +## Implementation |
| 92 | + |
| 93 | +1. Open `includes/reports.js`, locate `config._metrics` (line ~42) |
| 94 | +2. Add metric before closing `}` of `_metrics` |
| 95 | +3. Use patterns above for timeseries/histogram structure |
| 96 | +4. Include all required SQL patterns |
| 97 | +5. Run `get_errors` to verify |
| 98 | + |
| 99 | +## Key Notes |
| 100 | + |
| 101 | +- **Continuous metrics:** Add `AND FLOAT64(metric) > 0` to the `WHERE` clause so non-positive values are excluded from percentile calculations
| 102 | +- **Custom metrics:** Use `SAFE.BOOL()` and `SAFE_DIVIDE()` for safety |
| 103 | +- **Auto-processing:** Metrics run across all lenses automatically |
| 104 | + |