Commit 91ec26d

remote functions (#67)
* remote function
* connection
* masthead update
* formatting
* extend description
* reservation off
* tf update
* dataform export routine
* bq connections
* spark procedure role
* docker update
* lint
* lint
* lint
* lint
* more spark roles
* mh submodule
* submodule
* lint
* lint
* use connections from dataform
* sync with latest version
* fix package versions
* remove submodule
* update packages
* test
* test
* rewrite triggers
* adjust bq export
* mh roles update
* cleanup
* packages update
* test
* packages update
* arguments renamed
* current nodejs
* Update definitions/output/reports/tech_report_technologies.js
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* remove reservation
* switch order
* deactivate spark procedure
* deactivate standard exports
* updated function calling
* standard reports export draft
* update description
* dependabot update
* fix month
* fix query generation
* update
* fix env var
* fix export config
* formatting
* fix

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent f34e8d6 commit 91ec26d

46 files changed, +1422 -799 lines changed

.github/dependabot.yml

Lines changed: 13 additions & 1 deletion
@@ -10,6 +10,18 @@ updates:
     schedule:
       interval: "weekly"
   - package-ecosystem: "npm"
-    directory: "/src"
+    directory: "/infra/bigquery-export"
+    schedule:
+      interval: "weekly"
+  - package-ecosystem: "npm"
+    directory: "infra/dataform-export"
+    schedule:
+      interval: "weekly"
+  - package-ecosystem: "npm"
+    directory: "infra/dataform-trigger"
+    schedule:
+      interval: "weekly"
+  - package-ecosystem: "terraform"
+    directory: "infra/tf/"
     schedule:
       interval: "weekly"

Makefile

Lines changed: 6 additions & 1 deletion
@@ -11,4 +11,9 @@ tf_plan:
 
 tf_apply:
 	terraform -chdir=infra/tf init && terraform -chdir=infra/tf apply -auto-approve
-	cd infra/bigquery-export/ && npm install && npm run buildpack
+
+bigquery_export_deploy:
+	cd infra/bigquery-export && npm install && npm run buildpack
+
+#bigquery_export_spark_deploy:
+# cd infra/bigquery_export_spark && gcloud builds submit --region=global --tag us-docker.pkg.dev/httparchive/bigquery-spark-procedures/firestore_export:latest

README.md

Lines changed: 36 additions & 3 deletions
@@ -47,7 +47,7 @@ Consumers:
 
 ### Triggering workflows
 
-In order to unify the workflow triggering mechanism, we use [a Cloud Run function](./src/README.md) that can be invoked in a number of ways (e.g. listen to PubSub messages), do intermediate checks and trigger the particular Dataform workflow execution configuration.
+In order to unify the workflow triggering mechanism, we use [a Cloud Run function](./infra/README.md) that can be invoked in a number of ways (e.g. listen to PubSub messages), do intermediate checks and trigger the particular Dataform workflow execution configuration.
 
 ## Contributing
 
@@ -59,5 +59,38 @@ In order to unify the workflow triggering mechanism, we use [a Cloud Run functio
 
 #### Workspace hints
 
-1. In `workflow_settings.yaml` set `env_name: dev` to process sampled data.
-2. In `includes/constants.js` set `today` or other variables to a custome value.
+1. In `workflow_settings.yaml` set `environment: dev` to process sampled data.
+2. For development and testing, you can modify variables in `includes/constants.js`, but note that these are programmatically generated.
+
+## Repository Structure
+
+- `definitions/` - Contains the core Dataform SQL definitions and declarations
+  - `output/` - Contains the main pipeline transformation logic
+  - `declarations/` - Contains referenced tables/views declarations and other resources definitions
+- `includes/` - Contains shared JavaScript utilities and constants
+- `infra/` - Infrastructure code and deployment configurations
+  - `dataform-trigger/` - Cloud Run function for workflow automation
+  - `tf/` - Terraform configurations
+  - `bigquery-export/` - BigQuery export configurations
+- `docs/` - Additional documentation
+
+## Development Setup
+
+1. Install dependencies:
+
+   ```bash
+   npm install
+   ```
+
+2. Available Scripts:
+
+   - `npm run format` - Format code using Standard.js, fix Markdown issues, and format Terraform files
+   - `npm run lint` - Run linting checks on JavaScript, Markdown files, and compile Dataform configs
+
+## Code Quality
+
+This repository uses:
+
+- Standard.js for JavaScript code style
+- Markdownlint for Markdown file formatting
+- Dataform's built-in compiler for SQL validation

definitions/output/reports/cwv_tech_adoption.js

Lines changed: 15 additions & 2 deletions
@@ -13,7 +13,6 @@ publish('cwv_tech_adoption', {
 DELETE FROM ${ctx.self()}
 WHERE date = '${pastMonth}';
 `).query(ctx => `
-/* {"dataform_trigger": "report_cwv_tech_complete", "date": "${pastMonth}", "name": "adoption", "type": "report"} */
 SELECT
   date,
   app AS technology,
@@ -30,4 +29,18 @@ GROUP BY
   app,
   rank,
   geo
-`)
+`).postOps(ctx => `
+  SELECT
+    reports.run_export_job(
+      JSON '''{
+        "destination": "firestore",
+        "config": {
+          "database": "tech-report-apis-${constants.environment}",
+          "collection": "adoption",
+          "type": "report",
+          "date": "${pastMonth}"
+        },
+        "query": "SELECT STRING(date) AS date, * EXCEPT(date) FROM ${ctx.self()} WHERE date = '${pastMonth}'"
+      }'''
+    );
+`)
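
The `postOps` payload added here recurs, with small variations, in the other `cwv_tech_*` report definitions in this commit. A minimal sketch of how that payload could be built in one place, assuming a hypothetical helper `firestoreExportPostOps` (it is not part of this commit) and taking the payload keys (`destination`, `config`, `query`) from the diff. The table name and date passed in are placeholders:

```javascript
// Hypothetical helper, not part of the commit: builds the SQL string that
// the definitions in this diff pass to postOps(). Payload keys mirror the diff.
function firestoreExportPostOps ({ environment, collection, type, table, date }) {
  const config = {
    database: `tech-report-apis-${environment}`,
    collection,
    type
  }

  // In the diff, 'dict' exports select the whole table, while 'report'
  // exports carry a date and select a single month.
  let query = `SELECT * FROM ${table}`
  if (type === 'report') {
    config.date = date
    query = `SELECT STRING(date) AS date, * EXCEPT(date) FROM ${table} WHERE date = '${date}'`
  }

  // JSON.stringify keeps the payload valid JSON regardless of the values.
  const payload = JSON.stringify({ destination: 'firestore', config, query }, null, 2)
  return `SELECT reports.run_export_job(JSON '''${payload}''');`
}

// Placeholder values for illustration only.
const sql = firestoreExportPostOps({
  environment: 'dev',
  collection: 'adoption',
  type: 'report',
  table: '`project.reports.cwv_tech_adoption`',
  date: '2025-01-01'
})
console.log(sql)
```

In a Dataform definition this would be called as `.postOps(ctx => firestoreExportPostOps({ ..., table: ctx.self() }))`; whether factoring it out is worthwhile depends on how much the payloads are expected to diverge.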

definitions/output/reports/cwv_tech_categories.js

Lines changed: 15 additions & 3 deletions
@@ -5,7 +5,6 @@ publish('cwv_tech_categories', {
   type: 'table',
   tags: ['crux_ready']
 }).query(ctx => `
-/* {"dataform_trigger": "report_cwv_tech_complete", "name": "categories", "type": "dict"} */
 WITH pages AS (
   SELECT DISTINCT
     client,
@@ -50,7 +49,7 @@ technology_stats AS (
   SELECT
     technology,
     category_obj AS categories,
-    SUM(origins.dektop + origins.mobile) AS total_origins
+    SUM(origins.desktop + origins.mobile) AS total_origins
   FROM ${ctx.ref('reports', 'cwv_tech_technologies')}
   GROUP BY
     technology,
@@ -91,4 +90,17 @@ SELECT
   ) AS origins,
   NULL AS technologies
 FROM total_pages
-`)
+`).postOps(ctx => `
+  SELECT
+    reports.run_export_job(
+      JSON '''{
+        "destination": "firestore",
+        "config": {
+          "database": "tech-report-apis-${constants.environment}",
+          "collection": "categories",
+          "type": "dict"
+        },
+        "query": "SELECT * FROM ${ctx.self()}"
+      }'''
+    );
+`)

definitions/output/reports/cwv_tech_core_web_vitals.js

Lines changed: 15 additions & 2 deletions
@@ -68,7 +68,6 @@ return Object.values(vitals)
 DELETE FROM ${ctx.self()}
 WHERE date = '${pastMonth}';
 `).query(ctx => `
-/* {"dataform_trigger": "report_cwv_tech_complete", "date": "${pastMonth}", "name": "core_web_vitals", "type": "report"} */
 SELECT
   date,
   app AS technology,
@@ -98,4 +97,18 @@ GROUP BY
   app,
   rank,
   geo
-`)
+`).postOps(ctx => `
+  SELECT
+    reports.run_export_job(
+      JSON '''{
+        "destination": "firestore",
+        "config": {
+          "database": "tech-report-apis-${constants.environment}",
+          "collection": "core_web_vitals",
+          "type": "report",
+          "date": "${pastMonth}"
+        },
+        "query": "SELECT STRING(date) AS date, * EXCEPT(date) FROM ${ctx.self()} WHERE date = '${pastMonth}'"
+      }'''
+    );
+`)

definitions/output/reports/cwv_tech_lighthouse.js

Lines changed: 15 additions & 2 deletions
@@ -54,7 +54,6 @@ return Object.values(lighthouse)
 DELETE FROM ${ctx.self()}
 WHERE date = '${pastMonth}';
 `).query(ctx => `
-/* {"dataform_trigger": "report_cwv_tech_complete", "date": "${pastMonth}", "name": "lighthouse", "type": "report"} */
 SELECT
   date,
   app AS technology,
@@ -75,4 +74,18 @@ GROUP BY
   app,
   rank,
   geo
-`)
+`).postOps(ctx => `
+  SELECT
+    reports.run_export_job(
+      JSON '''{
+        "destination": "firestore",
+        "config": {
+          "database": "tech-report-apis-${constants.environment}",
+          "collection": "lighthouse",
+          "type": "report",
+          "date": "${pastMonth}"
+        },
+        "query": "SELECT STRING(date) AS date, * EXCEPT(date) FROM ${ctx.self()} WHERE date = '${pastMonth}'"
+      }'''
+    );
+`)

definitions/output/reports/cwv_tech_page_weight.js

Lines changed: 15 additions & 2 deletions
@@ -46,7 +46,6 @@ return Object.values(pageWeight)
 DELETE FROM ${ctx.self()}
 WHERE date = '${pastMonth}';
 `).query(ctx => `
-/* {"dataform_trigger": "report_cwv_tech_complete", "date": "${pastMonth}", "name": "page_weight", "type": "report"} */
 SELECT
   date,
   app AS technology,
@@ -65,4 +64,18 @@ GROUP BY
   app,
   rank,
   geo
-`)
+`).postOps(ctx => `
+  SELECT
+    reports.run_export_job(
+      JSON '''{
+        "destination": "firestore",
+        "config": {
+          "database": "tech-report-apis-${constants.environment}",
+          "collection": "page_weight",
+          "type": "report",
+          "date": "${pastMonth}"
+        },
+        "query": "SELECT STRING(date) AS date, * EXCEPT(date) FROM ${ctx.self()} WHERE date = '${pastMonth}'"
+      }'''
+    );
+`)

definitions/output/reports/cwv_tech_technologies.js

Lines changed: 14 additions & 2 deletions
@@ -5,7 +5,6 @@ publish('cwv_tech_technologies', {
   type: 'table',
   tags: ['crux_ready']
 }).query(ctx => `
-/* {"dataform_trigger": "report_cwv_tech_complete", "name": "technologies", "type": "dict"} */
 WITH pages AS (
   SELECT DISTINCT
     client,
@@ -86,4 +85,17 @@ SELECT
     MAX(IF(client = 'mobile', origins, 0)) AS mobile
   ) AS origins
 FROM total_pages
-`)
+`).postOps(ctx => `
+  SELECT
+    reports.run_export_job(
+      JSON '''{
+        "destination": "firestore",
+        "config": {
+          "database": "tech-report-apis-${constants.environment}",
+          "collection": "technologies",
+          "type": "dict"
+        },
+        "query": "SELECT * FROM ${ctx.self()}"
+      }'''
+    );
+`)

definitions/output/reports/reports_dynamic.js

Lines changed: 71 additions & 11 deletions
@@ -1,11 +1,48 @@
 const configs = new reports.HTTPArchiveReports()
 const metrics = configs.listMetrics()
 
+const bucket = 'httparchive'
+const storagePath = '/reports/'
+
+function generateExportQuery (metric, sql, params, ctx) {
+  let query = ''
+  if (sql.type === 'histogram') {
+    query = `
+SELECT
+  * EXCEPT(date)
+FROM ${ctx.self()}
+WHERE date = '${params.date}'
+`
+  } else if (sql.type === 'timeseries') {
+    query = `
+SELECT
+  FORMAT_DATE('%Y_%m_%d', date) AS date,
+  * EXCEPT(date)
+FROM ${ctx.self()}
+`
+  } else {
+    throw new Error('Unknown SQL type')
+  }
+
+  const queryOutput = query.replace(/[\r\n]+/g, ' ')
+  return queryOutput
+}
+
+function generateExportPath (metric, sql, params) {
+  if (sql.type === 'histogram') {
+    return `${storagePath}${params.date.replaceAll('-', '_')}/${metric.id}.json`
+  } else if (sql.type === 'timeseries') {
+    return `${storagePath}${metric.id}.json`
+  } else {
+    throw new Error('Unknown SQL type')
+  }
+}
+
 const iterations = []
 for (
-  let month = constants.currentMonth; month >= constants.currentMonth; month = constants.fnPastMonth(month)) {
+  let date = constants.currentMonth; date >= constants.currentMonth; date = constants.fnPastMonth(date)) {
   iterations.push({
-    date: month,
+    date,
     devRankFilter: constants.devRankFilter
   })
 }
@@ -18,29 +55,52 @@ if (iterations.length === 1) {
         type: 'incremental',
         protected: true,
         bigquery: sql.type === 'histogram' ? { partitionBy: 'date', clusterBy: ['client'] } : {},
-        schema: 'reports',
-        tags: ['crawl_complete', 'http_reports']
+        schema: 'reports'
+        // tags: ['crawl_complete', 'http_reports']
       }).preOps(ctx => `
 --DELETE FROM ${ctx.self()}
 --WHERE date = '${params.date}';
-      `).query(ctx => `
-/* {"dataform_trigger": "report_complete", "date": "${params.date}", "name": "${metric.id}", "type": "${sql.type}"} */` +
-        sql.query(ctx, params))
+      `).query(
+        ctx => sql.query(ctx, params)
+      ).postOps(ctx => `
+  SELECT
+    reports.run_export_job(
+      JSON '''{
+        "destination": "cloud_storage",
+        "config": {
+          "bucket": "${bucket}",
+          "name": "${generateExportPath(metric, sql, params)}"
+        },
+        "query": "${generateExportQuery(metric, sql, params, ctx)}"
+      }'''
+    );
+      `)
     })
   })
 } else {
   iterations.forEach((params, i) => {
     metrics.forEach(metric => {
       metric.SQL.forEach(sql => {
         operate(metric.id + '_' + sql.type + '_' + params.date, {
-          tags: ['crawl_complete']
+          // tags: ['crawl_complete']
         }).queries(ctx => `
 DELETE FROM reports.${metric.id}_${sql.type}
 WHERE date = '${params.date}';
 
-/* {"dataform_trigger": "report_complete", "date": "${params.date}", "name": "${metric.id}", "type": "${sql.type}"} */
-INSERT INTO reports.${metric.id}_${sql.type}` +
-          sql.query(ctx, params))
+INSERT INTO reports.${metric.id}_${sql.type}` + sql.query(ctx, params)
+        ).postOps(ctx => `
+  SELECT
+    reports.run_export_job(
+      JSON '''{
+        "destination": "cloud_storage",
+        "config": {
+          "bucket": "${bucket}",
+          "name": "${generateExportPath(metric, sql, params)}"
+        },
+        "query": "${generateExportQuery(metric, sql, params, ctx)}"
+      }'''
+    );
+        `)
       })
     })
   })
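
To see what `generateExportPath` and `generateExportQuery` produce for each report type, the two helpers can be run outside Dataform. The sketch below reproduces their logic from the diff. The `ctx` stub, the metric id `bytesTotal`, the table name and the date are illustrative assumptions; in the pipeline `ctx.self()` is supplied by Dataform and the metrics come from `reports.HTTPArchiveReports()`.

```javascript
const storagePath = '/reports/'

// Same logic as generateExportPath in the diff: histograms get one file per
// date folder, timeseries get a single file per metric.
function generateExportPath (metric, sql, params) {
  if (sql.type === 'histogram') {
    return `${storagePath}${params.date.replaceAll('-', '_')}/${metric.id}.json`
  } else if (sql.type === 'timeseries') {
    return `${storagePath}${metric.id}.json`
  }
  throw new Error('Unknown SQL type')
}

// Same logic as generateExportQuery in the diff: line breaks are collapsed to
// spaces so the query can sit inside a JSON string value.
function generateExportQuery (metric, sql, params, ctx) {
  let query
  if (sql.type === 'histogram') {
    query = `
SELECT
  * EXCEPT(date)
FROM ${ctx.self()}
WHERE date = '${params.date}'
`
  } else if (sql.type === 'timeseries') {
    query = `
SELECT
  FORMAT_DATE('%Y_%m_%d', date) AS date,
  * EXCEPT(date)
FROM ${ctx.self()}
`
  } else {
    throw new Error('Unknown SQL type')
  }
  return query.replace(/[\r\n]+/g, ' ')
}

// Stubs standing in for Dataform's context and the metric config (assumed values).
const ctx = { self: () => '`reports.bytesTotal_histogram`' }
const metric = { id: 'bytesTotal' }
const params = { date: '2025-01-01' }

console.log(generateExportPath(metric, { type: 'histogram' }, params))
// /reports/2025_01_01/bytesTotal.json
console.log(generateExportPath(metric, { type: 'timeseries' }, params))
// /reports/bytesTotal.json
console.log(generateExportQuery(metric, { type: 'histogram' }, params, ctx))
```

Only the histogram query filters on `params.date`; the timeseries query exports every date in the table, which matches the single, undated file path it is written to.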
