Commit e75446d

CCM-12860: Create Alarms for Metrics Exceeding Thresholds (#370)
* Add alarms
* Add alarms dir and fix anomaly metric
* add alarms readme
* fix return data
* Fix some values and increase cert expiry from 14 to 30
* Make sqs msg age alarms same period; some tf lint
* Fix expiry unit tests
* Split into files
* Split alarm modules into files
* Fix tf
* revert me
* Fix dirs
* fix path
* fix new resources
* Fix for each
* Fix patch tests
* Add wait to patch tests
* Add wait status to post tests
* Add new; fix error logs alarm
* Fix letter queue alarm
* Peer review comp tests
* Peer review platform; inline api gw alarms; name convention
* Add optional alarm trigger for PR env
* Move alarm toggle to account level
1 parent 2778623 commit e75446d

37 files changed

Lines changed: 752 additions & 63 deletions

infrastructure/terraform/components/api/README.md

Lines changed: 7 additions & 0 deletions
@@ -17,6 +17,7 @@ No requirements.
 | <a name="input_core_environment"></a> [core\_environment](#input\_core\_environment) | Environment of Core | `string` | `"prod"` | no |
 | <a name="input_default_tags"></a> [default\_tags](#input\_default\_tags) | A map of default tags to apply to all taggable resources within the component | `map(string)` | `{}` | no |
 | <a name="input_disable_gateway_execute_endpoint"></a> [disable\_gateway\_execute\_endpoint](#input\_disable\_gateway\_execute\_endpoint) | Disable the execution endpoint for the API Gateway | `bool` | `true` | no |
+| <a name="input_enable_alarms"></a> [enable\_alarms](#input\_enable\_alarms) | Enable CloudWatch alarms for this deployed environment | `bool` | `true` | no |
 | <a name="input_enable_api_data_trace"></a> [enable\_api\_data\_trace](#input\_enable\_api\_data\_trace) | Enable API Gateway data trace logging | `bool` | `false` | no |
 | <a name="input_enable_backups"></a> [enable\_backups](#input\_enable\_backups) | Enable backups | `bool` | `false` | no |
 | <a name="input_enable_event_cache"></a> [enable\_event\_cache](#input\_enable\_event\_cache) | Enable caching of events to an S3 bucket | `bool` | `true` | no |
@@ -46,6 +47,10 @@ No requirements.
 | <a name="module_amendment_event_transformer"></a> [amendment\_event\_transformer](#module\_amendment\_event\_transformer) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
 | <a name="module_amendments_queue"></a> [amendments\_queue](#module\_amendments\_queue) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.24/terraform-sqs.zip | n/a |
 | <a name="module_authorizer_lambda"></a> [authorizer\_lambda](#module\_authorizer\_lambda) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
+| <a name="module_ddb_alarms_letter_queue"></a> [ddb\_alarms\_letter\_queue](#module\_ddb\_alarms\_letter\_queue) | ../../modules/alarms-ddb | n/a |
+| <a name="module_ddb_alarms_letters"></a> [ddb\_alarms\_letters](#module\_ddb\_alarms\_letters) | ../../modules/alarms-ddb | n/a |
+| <a name="module_ddb_alarms_mi"></a> [ddb\_alarms\_mi](#module\_ddb\_alarms\_mi) | ../../modules/alarms-ddb | n/a |
+| <a name="module_ddb_alarms_suppliers"></a> [ddb\_alarms\_suppliers](#module\_ddb\_alarms\_suppliers) | ../../modules/alarms-ddb | n/a |
 | <a name="module_domain_truststore"></a> [domain\_truststore](#module\_domain\_truststore) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-s3bucket.zip | n/a |
 | <a name="module_eventpub"></a> [eventpub](#module\_eventpub) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.31/terraform-eventpub.zip | n/a |
 | <a name="module_eventsub"></a> [eventsub](#module\_eventsub) | ../../modules/eventsub | n/a |
@@ -54,6 +59,7 @@ No requirements.
 | <a name="module_get_letters"></a> [get\_letters](#module\_get\_letters) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
 | <a name="module_get_status"></a> [get\_status](#module\_get\_status) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
 | <a name="module_kms"></a> [kms](#module\_kms) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-kms.zip | n/a |
+| <a name="module_lambda_alarms"></a> [lambda\_alarms](#module\_lambda\_alarms) | ../../modules/alarms-lambda | n/a |
 | <a name="module_letter_status_updates_queue"></a> [letter\_status\_updates\_queue](#module\_letter\_status\_updates\_queue) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.24/terraform-sqs.zip | n/a |
 | <a name="module_letter_updates_transformer"></a> [letter\_updates\_transformer](#module\_letter\_updates\_transformer) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
 | <a name="module_logging_bucket"></a> [logging\_bucket](#module\_logging\_bucket) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-s3bucket.zip | n/a |
@@ -62,6 +68,7 @@ No requirements.
 | <a name="module_post_letters"></a> [post\_letters](#module\_post\_letters) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
 | <a name="module_post_mi"></a> [post\_mi](#module\_post\_mi) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
 | <a name="module_s3bucket_test_letters"></a> [s3bucket\_test\_letters](#module\_s3bucket\_test\_letters) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-s3bucket.zip | n/a |
+| <a name="module_sqs_alarms"></a> [sqs\_alarms](#module\_sqs\_alarms) | ../../modules/alarms-sqs | n/a |
 | <a name="module_sqs_letter_updates"></a> [sqs\_letter\_updates](#module\_sqs\_letter\_updates) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-sqs.zip | n/a |
 | <a name="module_sqs_supplier_allocator"></a> [sqs\_supplier\_allocator](#module\_sqs\_supplier\_allocator) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-sqs.zip | n/a |
 | <a name="module_supplier_allocator"></a> [supplier\_allocator](#module\_supplier\_allocator) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+resource "aws_cloudwatch_metric_alarm" "apigw_five_xx" {
+  count = local.alarms_enabled ? 1 : 0
+
+  alarm_name        = "${local.csi}-apigw-5xx"
+  alarm_description = "RELIABILITY: API Gateway 5xx responses"
+
+  namespace   = "AWS/ApiGateway"
+  metric_name = "5XXError"
+  statistic   = "Sum"
+  period      = 60
+
+  evaluation_periods  = 1
+  threshold           = 0
+  comparison_operator = "GreaterThanThreshold"
+  treat_missing_data  = "notBreaching"
+
+  dimensions = local.apigw_alarm_dimensions
+
+  actions_enabled           = false
+  alarm_actions             = []
+  ok_actions                = []
+  insufficient_data_actions = []
+  tags                      = local.default_tags
+}
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+resource "aws_cloudwatch_metric_alarm" "apigw_latency_anomaly" {
+  count = local.alarms_enabled ? 1 : 0
+
+  alarm_name          = "${local.csi}-apigw-latency-anomaly"
+  alarm_description   = "RELIABILITY: API Gateway latency anomaly"
+  comparison_operator = "GreaterThanUpperThreshold"
+  evaluation_periods  = 5
+  datapoints_to_alarm = 3
+  threshold_metric_id = "ad1"
+  treat_missing_data  = "notBreaching"
+
+  actions_enabled           = false
+  alarm_actions             = []
+  ok_actions                = []
+  insufficient_data_actions = []
+  tags                      = local.default_tags
+
+  metric_query {
+    id = "m1"
+    metric {
+      metric_name = "Latency"
+      namespace   = "AWS/ApiGateway"
+      stat        = "Average"
+      period      = 60
+      dimensions  = local.apigw_alarm_dimensions
+    }
+    return_data = true
+  }
+
+  metric_query {
+    id          = "ad1"
+    expression  = "ANOMALY_DETECTION_BAND(m1, 2)"
+    label       = "Latency (expected)"
+    return_data = true
+  }
+}
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+resource "aws_cloudwatch_metric_alarm" "apigw_latency_threshold" {
+  count = local.alarms_enabled ? 1 : 0
+
+  alarm_name        = "${local.csi}-apigw-latency-threshold"
+  alarm_description = "RELIABILITY: API Gateway latency above threshold"
+
+  namespace   = "AWS/ApiGateway"
+  metric_name = "Latency"
+  statistic   = "Average"
+  period      = 60
+
+  evaluation_periods  = 5
+  threshold           = 29000
+  comparison_operator = "GreaterThanThreshold"
+  treat_missing_data  = "notBreaching"
+
+  dimensions = local.apigw_alarm_dimensions
+
+  actions_enabled           = false
+  alarm_actions             = []
+  ok_actions                = []
+  insufficient_data_actions = []
+  tags                      = local.default_tags
+}

infrastructure/terraform/components/api/cloudwatch_metric_alarm_apim_auth_cert_expirty.tf renamed to infrastructure/terraform/components/api/cloudwatch_metric_alarm_apim_auth_cert_expiry.tf

File renamed without changes.
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+locals {
+  alarms_enabled = var.enable_alarms
+
+  apigw_alarm_dimensions = {
+    ApiName = aws_api_gateway_rest_api.main.name
+    Stage   = aws_api_gateway_stage.main.stage_name
+  }
+
+  lambda_alarm_targets = {
+    authorizer_lambda           = module.authorizer_lambda.function_name
+    get_letter                  = module.get_letter.function_name
+    get_letters                 = module.get_letters.function_name
+    get_letter_data             = module.get_letter_data.function_name
+    get_status                  = module.get_status.function_name
+    patch_letter                = module.patch_letter.function_name
+    post_letters                = module.post_letters.function_name
+    post_mi                     = module.post_mi.function_name
+    update_letter_queue         = module.update_letter_queue.function_name
+    upsert_letter               = module.upsert_letter.function_name
+    amendment_event_transformer = module.amendment_event_transformer.function_name
+    letter_updates_transformer  = module.letter_updates_transformer.function_name
+    mi_updates_transformer      = module.mi_updates_transformer.function_name
+    supplier_allocator          = module.supplier_allocator.function_name
+  }
+
+  sqs_alarm_targets = {
+    sqs_letter_updates          = module.sqs_letter_updates.sqs_queue_name
+    amendments_queue            = module.amendments_queue.sqs_queue_name
+    letter_status_updates_queue = module.letter_status_updates_queue.sqs_queue_name
+    sqs_supplier_allocator      = module.sqs_supplier_allocator.sqs_queue_name
+  }
+}
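
Reviewer note: `lambda_alarm_targets` and `sqs_alarm_targets` map a short label to a resource name, which suggests they are iterated with `for_each` (the commit message mentions "Fix for each"). The `lambda_alarms` module call is not part of this excerpt, so the block below is only a sketch of how such a call might look. The input names `alarm_prefix` and `function_name` are assumptions (the first mirrors the `alarms-ddb` call sites in this commit), not the confirmed interface of `../../modules/alarms-lambda`.

```hcl
# Hypothetical sketch, not the file from this commit.
module "lambda_alarms" {
  # Filter the map instead of using count, so disabling alarms yields an
  # empty map and no module instances are created.
  for_each = {
    for label, name in local.lambda_alarm_targets :
    label => name if local.alarms_enabled
  }

  source = "../../modules/alarms-lambda"

  alarm_prefix  = "${local.csi}-${each.key}" # assumed input name
  function_name = each.value                 # assumed input name
  tags          = local.default_tags
}
```

Keying instances by the static map labels rather than by the function names keeps the `for_each` keys known at plan time, even when the function names themselves are only known after apply.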

infrastructure/terraform/components/api/module_authorizer_lambda.tf

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ module "authorizer_lambda" {
 
   lambda_env_vars = {
     CLOUDWATCH_NAMESPACE                     = "/aws/api-gateway/supplier/alarms",
-    CLIENT_CERTIFICATE_EXPIRATION_ALERT_DAYS = 14,
+    CLIENT_CERTIFICATE_EXPIRATION_ALERT_DAYS = 30,
     APIM_SUPPLIER_ID_HEADER                  = "NHSD-Supplier-ID",
     SUPPLIERS_TABLE_NAME                     = aws_dynamodb_table.suppliers.name
   }
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+module "ddb_alarms_letter_queue" {
+  count        = local.alarms_enabled ? 1 : 0
+  source       = "../../modules/alarms-ddb"
+  alarm_prefix = local.csi
+  table_name   = aws_dynamodb_table.letter_queue.name
+  tags         = local.default_tags
+}
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+module "ddb_alarms_letters" {
+  count        = local.alarms_enabled ? 1 : 0
+  source       = "../../modules/alarms-ddb"
+  alarm_prefix = local.csi
+  table_name   = aws_dynamodb_table.letters.name
+  tags         = local.default_tags
+}
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+module "ddb_alarms_mi" {
+  count        = local.alarms_enabled ? 1 : 0
+  source       = "../../modules/alarms-ddb"
+  alarm_prefix = local.csi
+  table_name   = aws_dynamodb_table.mi.name
+  tags         = local.default_tags
+}
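
Reviewer note: each `alarms-ddb` call site passes the same three inputs (`alarm_prefix`, `table_name`, `tags`). The module's own `variables.tf` is not shown in this excerpt; a minimal sketch of declarations that would accept these calls could look like the following. The types, descriptions, and the `{}` default are assumptions inferred from the call sites.

```hcl
# Hypothetical variables.tf for ../../modules/alarms-ddb (not from this commit).
variable "alarm_prefix" {
  description = "Prefix applied to every alarm name, e.g. the component CSI"
  type        = string
}

variable "table_name" {
  description = "Name of the DynamoDB table the alarms monitor"
  type        = string
}

variable "tags" {
  description = "Tags applied to the alarm resources"
  type        = map(string)
  default     = {} # assumed default; call sites always pass local.default_tags
}
```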
