`.agents/skills/optimize-storage-costs/SKILL.md` (24 additions, 13 deletions)

## Table Categories

Masthead Data identifies these tables through lineage analysis, which only sees the pipeline references visible to it. Modification timestamps are therefore a critical cross-check:

| Type | Definition | Indicators | Watch for |
|------|------------|------------|-----------|
| **Dead-end** | Regularly updated, no downstream consumption | Updated but never read in 30+ days | External writers outside the lineage graph (manual jobs, independent pipelines) |
| **Unused** | No upstream or downstream activity | No reads/writes in 30+ days | A recent `lastModifiedTime` despite the "Unused" flag suggests an external writer; **do not drop without verification** |

### Key Signal

If a table is flagged `Unused` **and** has a recent modification timestamp, something outside Masthead's visibility is writing to it. This always warrants investigation before dropping.
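The key signal can be checked per table. This is a minimal sketch, assuming "recent" means modified within the last 30 days (the document does not define it) and that the flag arrives as the `subtype` value `'Unused table'` used in the insights query. `needs_verification` and `fetch_last_modified` are hypothetical helper names; the lookup uses `Client.get_table(...).modified` from the `google-cloud-bigquery` client.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "recent" means modified within the last 30 days, mirroring the
# 30+ day inactivity window behind Masthead's flags.
RECENT_WINDOW = timedelta(days=30)


def needs_verification(subtype: str, last_modified: datetime,
                       now: Optional[datetime] = None) -> bool:
    """True when Masthead reports 'Unused table' but the table was modified recently."""
    now = now or datetime.now(timezone.utc)
    return subtype == "Unused table" and now - last_modified <= RECENT_WINDOW


def fetch_last_modified(table_id: str) -> Optional[datetime]:
    """Read a table's last modification time (Table.modified) via the BigQuery client."""
    # Imported lazily so needs_verification() works without google-cloud-bigquery.
    from google.cloud import bigquery

    return bigquery.Client().get_table(table_id).modified
```

For example, `needs_verification("Unused table", fetch_last_modified("my-project.my_dataset.my_table"))` returns `True` for a placeholder table that is flagged unused yet was written to in the past 30 days.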

## When to Use

## Prerequisites

- Masthead Data agent v0.2.7+ installed (for accurate lineage)
- Access to Masthead insights dataset: `masthead-prod.httparchive.insights`
- BigQuery permissions to query insights and drop tables
SAFE.INT64(overview.num_bytes) / POW(1024, 4) AS total_tib,
SAFE.FLOAT64(overview.cost_30d) AS cost_usd_30d,
SAFE.FLOAT64(overview.savings_30d) AS savings_usd_30d
FROM \`masthead-prod.httparchive.insights\`
WHERE category = 'Cost'
AND subtype IN ('Dead end table', 'Unused table')
AND overview.num_bytes IS NOT NULL
AND SAFE.FLOAT64(overview.savings_30d) > 10
AND target_resource NOT LIKE '%analytics_%' -- Filter out low-impact GA intraday tables
ORDER BY savings_usd_30d DESC"> storage_waste.csv
```

**Note:** Sorting by `savings_usd_30d` instead of `total_tib` puts the tables with the largest potential savings at the top of the review list.

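In GoogleSQL `LIKE` patterns, `%` matches any sequence of characters and `_` matches exactly one character, so `'%analytics_%'` excludes any `target_resource` containing `analytics` followed by at least one more character. That covers every table in GA4 export datasets (`analytics_<property_id>`), not only intraday ones, plus any other name where `analytics` is followed by a character. The sketch below emulates just those two wildcards with a regular expression to check a pattern locally; it is an approximation (no escape handling), not BigQuery's implementation, and the resource names are hypothetical.

```python
import re

# Hypothetical target_resource values, for illustration only.
SAMPLE_NAMES = [
    "my-project.analytics_123456.events_intraday_20240101",
    "my-project.analytics.daily_summary",
    "my-project.web_vitals.pages",
]


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE pattern using only % and _ wildcards into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any sequence, including empty
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """LIKE must match the whole value, hence fullmatch."""
    return like_to_regex(pattern).fullmatch(value) is not None


for name in SAMPLE_NAMES:
    print(f"{name}: excluded={like(name, '%analytics_%')}")
```

The second name is excluded too, because `_` matches the `.` after `analytics`.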
**Alternative: Use Masthead UI**
- Navigate to [Dictionary page](https://app.mastheadata.com/dictionary?tab=Tables&deadEnd=true)
- Filter by `Dead-end` or `Unused` labels

- Is this a backup or archive table? (consider alternative storage)
- Is there a downstream dependency not captured in lineage?
- Is this table part of an active experiment or migration?
- **For repo-managed projects:** Search the codebase (e.g., `grep` for the table name in model definitions and scripts) to confirm ownership. Table names can be misleading; for example, `cwv_tech_*` tables may look like current outputs but could be legacy.
- **Check for disabled producers:** If a Dataform `publish()` call has `disabled: true` but the underlying BigQuery table still exists, either the table is abandoned (no recent modifications) or an external process has taken over (recent modifications). Both cases warrant investigation.
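Both repository checks can be scripted as a first pass. This is a minimal sketch, assuming a locally checked-out Dataform-style repository; the searched file extensions, the plain substring match, and the `disabled: true` regex are assumptions, and the disabled flag is reported per file rather than per `publish()` call, so treat hits as leads for manual review. `find_references` is a hypothetical helper name.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed file types for model definitions and scripts; adjust for your repo.
SEARCH_EXTENSIONS = {".sqlx", ".js", ".sql", ".py", ".yaml", ".yml"}
# Coarse match for `disabled: true` in a SQLX config block or publish() options.
DISABLED_RE = re.compile(r"\bdisabled\s*:\s*true\b")


def find_references(repo_root: str, table_name: str) -> List[Tuple[Path, bool]]:
    """Return (file, has_disabled_flag) for each file that may define or use table_name."""
    hits = []
    for path in sorted(Path(repo_root).rglob("*")):
        if not path.is_file() or path.suffix not in SEARCH_EXTENSIONS:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        # Dataform names a table after its SQLX file unless config overrides it,
        # so the defining file may never spell out the table name.
        if table_name in text or path.stem == table_name:
            hits.append((path, DISABLED_RE.search(text) is not None))
    return hits
```

A file reported with `True` both mentions or defines the table and contains `disabled: true`; compare that against the table's `lastModifiedTime` to decide between "abandoned" and "external writer".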

### Step 3: Drop Approved Tables

## Decision Framework

| Monthly Savings | Action | Recency Check |
|-----------------|--------|---------------|
| < $10 | Consider keeping (low ROI) | Skip if `lastModifiedTime` > 12 months old (hygiene only) |
| $10-$100 | Review and drop if unused | Check modification date; recent writes require owner verification |
| $100-$1000 | Priority review, likely drop | Mandatory verification if modified in last 30 days |
| > $1000 | Immediate investigation required | Always verify external writer before any action |
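The framework can be applied row by row to `storage_waste.csv`. This is a minimal sketch under stated assumptions: "recent" means modified within 30 days (the table only says so for the $100-$1000 band), "12 months" is approximated as 365 days, boundary values such as exactly $100 fall into the lower band, and the `< $10` row is read as "no recency check needed for tables untouched for over 12 months". `recommend` is a hypothetical helper; since the export query already keeps only rows with savings above $10, the first branch matters only for other inputs.

```python
# Assumptions not fixed by the table: "recent" = modified within 30 days,
# "12 months" ~ 365 days, and a boundary value such as $100 uses the lower band.
RECENT_DAYS = 30
STALE_DAYS = 365


def recommend(savings_usd_30d: float, days_since_modified: float) -> str:
    """Map one candidate table to the decision-framework action and recency check."""
    recent = days_since_modified <= RECENT_DAYS
    if savings_usd_30d < 10:
        if days_since_modified > STALE_DAYS:
            return "Consider keeping (low ROI); hygiene only, skip recency check"
        return "Consider keeping (low ROI)"
    if savings_usd_30d <= 100:
        if recent:
            return "Review; recent writes, verify with owner"
        return "Review and drop if unused"
    if savings_usd_30d <= 1000:
        if recent:
            return "Priority review; mandatory verification"
        return "Priority review, likely drop"
    return "Immediate investigation; verify external writer before any action"
```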

## Key Notes

- **Dead-end tables** may indicate pipeline issues; investigate before dropping.
- **Unused tables with recent modifications** are the highest-priority cases to investigate. The gap between Masthead's "no lineage" signal and actual writes means an external dependency exists.
- Dropped tables can be restored from time travel (7 days by default) or, by contacting Google Cloud support, from fail-safe storage (a further 7 days after the time travel window).
- Consider archiving to Cloud Storage if compliance requires retention.
- Coordinate with data teams before dropping shared datasets.
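Within the time travel window, a dropped table can be recovered by copying from a snapshot decorator (`table@<milliseconds since epoch>`) that points to a moment before the drop. This is a minimal sketch following the pattern of Google's "restore a deleted table" sample for the `google-cloud-bigquery` Python client; the helper names and table IDs are hypothetical, `as_of` must be timezone-aware and earlier than the drop, and it does not cover fail-safe recovery.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snapshot_table_id(table_id: str, as_of: datetime) -> str:
    """Append a snapshot decorator in milliseconds since the Unix epoch."""
    # Integer timedelta division avoids float rounding in the millisecond value.
    epoch_ms = (as_of - EPOCH) // timedelta(milliseconds=1)
    return f"{table_id}@{epoch_ms}"


def restore_dropped_table(table_id: str, recovered_table_id: str, as_of: datetime) -> None:
    """Copy the table as it existed at `as_of` into recovered_table_id."""
    # Imported lazily so snapshot_table_id() works without google-cloud-bigquery.
    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.copy_table(snapshot_table_id(table_id, as_of), recovered_table_id)
    job.result()  # Block until the copy job finishes; raises if it failed.
```

Restoring into a new table ID (for example `my_table_restored`) leaves room to compare the recovered data before renaming or re-pointing consumers.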