restructure TF-managed services #354

LesnyRumcajs · 2023-12-07T16:50:31Z

Summary of changes
Changes introduced in this pull request:

Restructured the IaC code as outlined here and here.
Fixed several issues in the existing code, including inconsistencies, unused variables (or entire files), stray artefacts, and logic errors (some may remain; I only tackled those that caught my attention).
Created plenty of follow-up issues, since this PR was getting too big and other initiatives needed unblocking.

Migrated services:

Sync check
Daily snapshot

Follow-ups:

For the daily snapshot service, allow having only calibnet OR mainnet snapshots (Allow specifying calibnet and/or mainnet snapshot for the daily snapshot #376)
Allow multiple instances of a single service to be created (e.g., 3 instances of the snapshot service) with a single count = N (Allow multiple instances of a single module #372)
Add assertions (Add tag assertions to Terraform modules #375)
Use lower-privilege API keys for ingestion and higher-privilege ones for provisioning (Use lower privilege users for data ingestion, higher for provisioning #377)
Create a distinct Terraform user for automation (Create distinct user for DO automation #378)
Figure out log-based alarms once frustration levels go down (Add log-based alerts for the snapshot service #379)
Filecoin node on mainnet (Migrate Forest node services to Terragrunt #373)
Filecoin node on calibnet (Migrate Forest node services to Terragrunt #373)
(and some more that are not linked)
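The count = N idea in #372 relies on Terraform's count meta-argument, which Terraform 0.13 and later accept on module blocks. Below is a minimal sketch of what that could look like in plain Terraform. It assumes a hypothetical ../modules/daily-snapshot module with a hypothetical name input and ip output; it is not taken from this PR, and wiring it through Terragrunt is exactly what the follow-up still has to work out:

```hcl
# Hypothetical sketch: three instances of one service module via count.
# The module path, the `name` input and the `ip` output are illustrative assumptions.
module "daily_snapshot" {
  source = "../modules/daily-snapshot"
  count  = 3

  # count.index takes the values 0, 1, 2; appended here to keep droplet names unique
  name = "prod-forest-snapshot-${count.index}"
}

# With count set, the module becomes a list; the splat collects every instance's `ip` output
output "snapshot_ips" {
  value = module.daily_snapshot[*].ip
}
```

Each instance is then addressed as module.daily_snapshot[0], module.daily_snapshot[1], and so on, the same indexed form the plan output uses for module.monitoring[0].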

Reference issue to close (if applicable)

Closes #363
Closes #318

Other information and links

joshdougall · 2024-01-10T01:47:59Z

tf-managed/README.md

+# Structure
+
+```
+├── common <- common code, shared between all modules (TODO maybe move it to modules?)


I like this folder structure. The only change I would make is to rename common to something like scripts, to make the folder's purpose more explicit. Since the root directory is tf-managed, we can assume that modules refers to Terraform modules, and the scripts folder can grow and later be split by script language (e.g., ruby). Just a thought!

@joshdougall Will do, thanks for the review. The end goal is to get rid of this directory altogether and hide all those scripts behind Docker (and/or Packer!) images.

github-actions · 2024-01-18T09:50:40Z

Forest: Snapshot Service Infrastructure Plan: success

data.local_file.init: Reading...
data.external.sources_tar: Reading...
data.local_file.init: Read complete after 0s [id=6c33e49974387292c0336588b86d24ceec3c19e1]
data.digitalocean_ssh_keys.keys: Reading...
data.digitalocean_project.forest_project: Reading...
data.external.sources_tar: Read complete after 0s [id=-]
data.local_file.sources: Reading...
data.local_file.sources: Read complete after 0s [id=1856d7c8b967c9a68d2c699aee6767229dd641ce]
data.digitalocean_ssh_keys.keys: Read complete after 0s [id=ssh_keys/14512061520513425405]
data.digitalocean_project.forest_project: Read complete after 1s [id=da5e6601-7fd9-4d02-951e-390f7feb3411]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # digitalocean_droplet.forest will be created
  + resource "digitalocean_droplet" "forest" {
      + backups              = false
      + created_at           = (known after apply)
      + disk                 = (known after apply)
      + graceful_shutdown    = false
      + id                   = (known after apply)
      + image                = "docker-20-04"
      + ipv4_address         = (known after apply)
      + ipv4_address_private = (known after apply)
      + ipv6                 = false
      + ipv6_address         = (known after apply)
      + locked               = (known after apply)
      + memory               = (known after apply)
      + monitoring           = true
      + name                 = "prod-forest-snapshot"
      + price_hourly         = (known after apply)
      + price_monthly        = (known after apply)
      + private_networking   = (known after apply)
      + region               = "fra1"
      + resize_disk          = true
      + size                 = "s-4vcpu-16gb-amd"
      + ssh_keys             = [
          + "00:a0:c0:54:5f:40:22:10:52:8a:04:48:f9:c8:db:00",
          + "04:77:74:e8:81:92:9d:1e:cb:d3:5d:0d:fa:83:56:f6",
          + "31:fd:e9:da:70:df:ef:33:af:a2:ea:a1:fd:69:a7:9d",
          + "37:1e:1a:fc:25:2d:5a:a7:1f:49:b2:6d:53:5c:0e:45",
          + "41:91:6d:f9:f7:27:44:30:7f:a4:6f:36:e8:97:ad:cb",
          + "5a:a8:6d:02:66:21:e9:f7:27:b2:1c:6e:89:0f:65:77",
          + "5f:d6:ad:06:b8:2d:4a:ef:0a:ac:97:bf:37:b0:7a:4c",
          + "77:09:d9:32:61:65:81:08:d1:e2:50:9b:ec:28:02:62",
          + "88:95:97:77:a1:1f:bf:e8:3a:84:20:7d:a9:4c:74:6d",
          + "99:ea:ec:bf:9f:d1:b2:52:02:b2:78:a2:57:25:a0:e7",
          + "9c:18:88:44:c4:d6:74:84:07:9a:3c:9a:f6:17:f3:e4",
          + "b6:03:52:e0:49:14:03:90:19:37:69:c3:c7:d0:e7:69",
          + "bb:7a:cc:18:56:7a:cb:2b:07:d7:8b:30:86:b8:b5:41",
          + "c7:f9:b0:49:24:aa:30:36:4e:5f:d4:a3:ab:43:49:e8",
          + "d3:6d:af:8e:a4:b9:8f:b8:38:2b:56:06:5f:38:48:a7",
          + "e4:0e:85:24:75:5e:f3:b1:77:c4:7d:a2:3a:1e:00:b1",
          + "f7:de:2d:83:ce:e7:c3:13:2c:ca:3a:f0:4b:4e:46:da",
          + "fa:48:60:7b:b0:c4:86:70:e9:fa:e9:f8:fb:c7:2e:72",
          + "fa:62:10:64:1b:77:eb:78:a5:ba:e0:86:ff:76:7e:97",
          + "fe:42:94:20:d0:a9:24:67:5f:de:78:c1:bb:8b:6c:92",
        ]
      + status               = (known after apply)
      + tags                 = [
          + "iac",
          + "prod",
        ]
      + urn                  = (known after apply)
      + user_data            = (sensitive value)
      + vcpus                = (known after apply)
      + volume_ids           = (known after apply)
      + vpc_uuid             = (known after apply)
    }

  # digitalocean_firewall.forest-firewall will be created
  + resource "digitalocean_firewall" "forest-firewall" {
      + created_at      = (known after apply)
      + droplet_ids     = (known after apply)
      + id              = (known after apply)
      + name            = "prod-forest-snapshot"
      + pending_changes = (known after apply)
      + status          = (known after apply)

      + inbound_rule {
          + port_range                = "22"
          + protocol                  = "tcp"
          + source_addresses          = [
              + "0.0.0.0/0",
              + "::/0",
            ]
          + source_droplet_ids        = []
          + source_kubernetes_ids     = []
          + source_load_balancer_uids = []
          + source_tags               = []
        }
      + inbound_rule {
          + port_range                = "2345"
          + protocol                  = "tcp"
          + source_addresses          = [
              + "0.0.0.0/0",
              + "::/0",
            ]
          + source_droplet_ids        = []
          + source_kubernetes_ids     = []
          + source_load_balancer_uids = []
          + source_tags               = []
        }
      + inbound_rule {
          + port_range                = "53"
          + protocol                  = "udp"
          + source_addresses          = [
              + "0.0.0.0/0",
              + "::/0",
            ]
          + source_droplet_ids        = []
          + source_kubernetes_ids     = []
          + source_load_balancer_uids = []
          + source_tags               = []
        }
      + inbound_rule {
          + port_range                = "80"
          + protocol                  = "tcp"
          + source_addresses          = [
              + "0.0.0.0/0",
              + "::/0",
            ]
          + source_droplet_ids        = []
          + source_kubernetes_ids     = []
          + source_load_balancer_uids = []
          + source_tags               = []
        }

      + outbound_rule {
          + destination_addresses          = [
              + "0.0.0.0/0",
              + "::/0",
            ]
          + destination_droplet_ids        = []
          + destination_kubernetes_ids     = []
          + destination_load_balancer_uids = []
          + destination_tags               = []
          + port_range                     = "53"
          + protocol                       = "udp"
        }
      + outbound_rule {
          + destination_addresses          = [
              + "0.0.0.0/0",
              + "::/0",
            ]
          + destination_droplet_ids        = []
          + destination_kubernetes_ids     = []
          + destination_load_balancer_uids = []
          + destination_tags               = []
          + port_range                     = "all"
          + protocol                       = "tcp"
        }
    }

  # digitalocean_project_resources.connect_forest_project will be created
  + resource "digitalocean_project_resources" "connect_forest_project" {
      + id        = (known after apply)
      + project   = "da5e6601-7fd9-4d02-951e-390f7feb3411"
      + resources = (known after apply)
    }

  # module.monitoring[0].newrelic_alert_policy.alert will be created
  + resource "newrelic_alert_policy" "alert" {
      + account_id          = (known after apply)
      + id                  = (known after apply)
      + incident_preference = "PER_POLICY"
      + name                = "prod-forest-snapshot alert policy"
    }

  # module.monitoring[0].newrelic_notification_channel.slack-channel[0] will be created
  + resource "newrelic_notification_channel" "slack-channel" {
      + account_id     = (known after apply)
      + active         = true
      + destination_id = "f902e020-5993-4425-9ae3-133084fc870d"
      + id             = (known after apply)
      + name           = "prod-forest-snapshot slack"
      + product        = "IINT"
      + status         = (known after apply)
      + type           = "SLACK"

      + property {
          + key   = "channelId"
          + value = "C05BHMZ7GTT"
        }
      + property {
          + key   = "customDetailsSlack"
          + value = "issue id - {{issueId}}"
        }
    }

  # module.monitoring[0].newrelic_nrql_alert_condition.disk_space will be created
  + resource "newrelic_nrql_alert_condition" "disk_space" {
      + account_id                   = (known after apply)
      + aggregation_window           = (known after apply)
      + description                  = "Alert when disk space usage is high on an the service host"
      + enabled                      = true
      + entity_guid                  = (known after apply)
      + id                           = (known after apply)
      + name                         = "High Disk Utilization"
      + policy_id                    = (known after apply)
      + type                         = "static"
      + violation_time_limit         = (known after apply)
      + violation_time_limit_seconds = 259200

      + critical {
          + operator              = "above"
          + threshold             = 95
          + threshold_duration    = 300
          + threshold_occurrences = "all"
        }

      + nrql {
          + query = "SELECT latest(diskUsedPercent) FROM StorageSample where entityName = 'prod-forest-snapshot'"
        }

      + warning {
          + operator              = "above"
          + threshold             = 85
          + threshold_duration    = 300
          + threshold_occurrences = "all"
        }
    }

  # module.monitoring[0].newrelic_workflow.alerting-workflow-slack[0] will be created
  + resource "newrelic_workflow" "alerting-workflow-slack" {
      + account_id            = (known after apply)
      + destinations_enabled  = true
      + enabled               = true
      + enrichments_enabled   = true
      + guid                  = (known after apply)
      + id                    = (known after apply)
      + last_run              = (known after apply)
      + muting_rules_handling = "NOTIFY_ALL_ISSUES"
      + name                  = "prod-forest-snapshot slack alerting workflow"
      + workflow_id           = (known after apply)

      + destination {
          + channel_id            = (known after apply)
          + name                  = (known after apply)
          + notification_triggers = (known after apply)
          + type                  = (known after apply)
        }

      + issues_filter {
          + filter_id = (known after apply)
          + name      = "prod-forest-snapshot alerting workflow filter"
          + type      = "FILTER"

          + predicate {
              + attribute = "labels.policyIds"
              + operator  = "EXACTLY_MATCHES"
              + values    = (known after apply)
            }
        }
    }

Plan: 7 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + ip = [
      + (known after apply),
    ]

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: /home/runner/work/forest-iac/forest-iac/tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "/home/runner/work/forest-iac/forest-iac/tfplan"

github-actions · 2024-01-18T09:51:00Z

Forest: Sync Check Service Infrastructure Plan: success

data.external.sources_tar: Reading...
data.local_file.init: Reading...
data.local_file.init: Read complete after 0s [id=8f06c32ae54175fe92d10d4176a7080693818664]
data.digitalocean_ssh_keys.keys: Reading...
data.digitalocean_project.forest_project: Reading...
data.external.sources_tar: Read complete after 0s [id=-]
data.local_file.sources: Reading...
data.local_file.sources: Read complete after 0s [id=820c3ca54b1bc8f5ac0eeb455a0493917908745f]
data.digitalocean_ssh_keys.keys: Read complete after 0s [id=ssh_keys/14512061520513425405]
data.digitalocean_project.forest_project: Read complete after 2s [id=da5e6601-7fd9-4d02-951e-390f7feb3411]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # digitalocean_droplet.forest will be created
  + resource "digitalocean_droplet" "forest" {
      + backups              = false
      + created_at           = (known after apply)
      + disk                 = (known after apply)
      + graceful_shutdown    = false
      + id                   = (known after apply)
      + image                = "docker-20-04"
      + ipv4_address         = (known after apply)
      + ipv4_address_private = (known after apply)
      + ipv6                 = false
      + ipv6_address         = (known after apply)
      + locked               = (known after apply)
      + memory               = (known after apply)
      + monitoring           = true
      + name                 = "prod-sync-check"
      + price_hourly         = (known after apply)
      + price_monthly        = (known after apply)
      + private_networking   = (known after apply)
      + region               = "fra1"
      + resize_disk          = true
      + size                 = "s-4vcpu-16gb-amd"
      + ssh_keys             = [
          + "00:a0:c0:54:5f:40:22:10:52:8a:04:48:f9:c8:db:00",
          + "04:77:74:e8:81:92:9d:1e:cb:d3:5d:0d:fa:83:56:f6",
          + "31:fd:e9:da:70:df:ef:33:af:a2:ea:a1:fd:69:a7:9d",
          + "37:1e:1a:fc:25:2d:5a:a7:1f:49:b2:6d:53:5c:0e:45",
          + "41:91:6d:f9:f7:27:44:30:7f:a4:6f:36:e8:97:ad:cb",
          + "5a:a8:6d:02:66:21:e9:f7:27:b2:1c:6e:89:0f:65:77",
          + "5f:d6:ad:06:b8:2d:4a:ef:0a:ac:97:bf:37:b0:7a:4c",
          + "77:09:d9:32:61:65:81:08:d1:e2:50:9b:ec:28:02:62",
          + "88:95:97:77:a1:1f:bf:e8:3a:84:20:7d:a9:4c:74:6d",
          + "99:ea:ec:bf:9f:d1:b2:52:02:b2:78:a2:57:25:a0:e7",
          + "9c:18:88:44:c4:d6:74:84:07:9a:3c:9a:f6:17:f3:e4",
          + "b6:03:52:e0:49:14:03:90:19:37:69:c3:c7:d0:e7:69",
          + "bb:7a:cc:18:56:7a:cb:2b:07:d7:8b:30:86:b8:b5:41",
          + "c7:f9:b0:49:24:aa:30:36:4e:5f:d4:a3:ab:43:49:e8",
          + "d3:6d:af:8e:a4:b9:8f:b8:38:2b:56:06:5f:38:48:a7",
          + "e4:0e:85:24:75:5e:f3:b1:77:c4:7d:a2:3a:1e:00:b1",
          + "f7:de:2d:83:ce:e7:c3:13:2c:ca:3a:f0:4b:4e:46:da",
          + "fa:48:60:7b:b0:c4:86:70:e9:fa:e9:f8:fb:c7:2e:72",
          + "fa:62:10:64:1b:77:eb:78:a5:ba:e0:86:ff:76:7e:97",
          + "fe:42:94:20:d0:a9:24:67:5f:de:78:c1:bb:8b:6c:92",
        ]
      + status               = (known after apply)
      + tags                 = [
          + "iac",
          + "prod",
        ]
      + urn                  = (known after apply)
      + user_data            = (sensitive value)
      + vcpus                = (known after apply)
      + volume_ids           = (known after apply)
      + vpc_uuid             = (known after apply)
    }

  # digitalocean_project_resources.connect_forest_project will be created
  + resource "digitalocean_project_resources" "connect_forest_project" {
      + id        = (known after apply)
      + project   = "da5e6601-7fd9-4d02-951e-390f7feb3411"
      + resources = (known after apply)
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + ip = [
      + (known after apply),
    ]

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: /home/runner/work/forest-iac/forest-iac/tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "/home/runner/work/forest-iac/forest-iac/tfplan"

ruseinov

Looks good to me.
The structure seems nice and straightforward.
nit: I'm not convinced on live directory naming though, if those are terragrunt configs - just name it terragrunt instead?

GitHub Actions if conditions are surely a pain to test, but I see no way around it, as long as they have been tested.
The same goes for wget-ing dependencies: it's not a problem to re-run the job if it ever fails.

LesnyRumcajs · 2024-01-19T13:38:46Z

> nit: I'm not convinced on live directory naming though, if those are terragrunt configs - just name it terragrunt instead?

This follows the structure outlined in Terraform Up and Running and the Terragrunt guide
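For readers unfamiliar with that convention: Terragrunt's docs keep reusable modules in one place and a live directory holding one terragrunt.hcl per environment and service, each pointing at a module and supplying inputs. A minimal sketch of one such file follows; the live/prod/daily-snapshot path, the module path, and the input names are hypothetical and not copied from this PR (fra1 and prod are borrowed from the plan output):

```hcl
# live/prod/daily-snapshot/terragrunt.hcl (hypothetical path)

# Inherit shared settings (e.g. remote state configuration) from the nearest parent terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

# The reusable module this live config instantiates;
# '//' separates the directory Terragrunt copies from the module subdirectory inside it
terraform {
  source = "../../../modules//daily-snapshot"
}

# Environment-specific values passed to the module as Terraform variables (names are illustrative)
inputs = {
  environment = "prod"
  region      = "fra1"
}
```

A dev environment would then be a sibling live/dev/daily-snapshot/terragrunt.hcl with different inputs, while both share the same module code.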

ruseinov · 2024-01-19T13:40:41Z

> > nit: I'm not convinced on live directory naming though, if those are terragrunt configs - just name it terragrunt instead?
>
> This follows the structure outlined in Terraform Up and Running and the Terragrunt guide

Let's keep it as is then. Still not a fan of the naming though :)

samuelarogbonlo

LGTM

LesnyRumcajs and others added 2 commits December 7, 2023 17:50

first blood

b9d56e8

[pre-commit.ci] auto fixes from pre-commit.com hooks

5ed4d32

for more information, see https://pre-commit.ci

LesnyRumcajs changed the title ~~first blood~~ restructure TF-managed services Dec 7, 2023

LesnyRumcajs added 3 commits December 8, 2023 12:05

dry env

95d3fb4

hclfmt

c96a716

prod fix

18e4640

This was referenced Dec 14, 2023

Dev snapshot service uses prod bucket #363

Closed

Every Terraform-managed service should be Dockerized #364

Closed

LesnyRumcajs and others added 6 commits December 15, 2023 14:30

get env from root

a5b321f

adapt daily snapshot

1b5d857

defragment snapshot main

596cb78

monitoring wip

6833137

[pre-commit.ci] auto fixes from pre-commit.com hooks

35026eb

for more information, see https://pre-commit.ci

remove unused

b08eefb

LesnyRumcajs force-pushed the terragrunt-iac branch from b280290 to b08eefb on January 2, 2024 18:56

LesnyRumcajs and others added 8 commits January 3, 2024 10:51

defragment sync check

7be2c6e

fix state keying

94d548b

mail alerts

c9f90a9

slack notifications

14a5d31

add synthetic monitoring snapshots age

f3ee148

[pre-commit.ci] auto fixes from pre-commit.com hooks

d68ee83

for more information, see https://pre-commit.ci

bolt new node version

fd8f1cd

log as metrics

3b3a395

joshdougall reviewed Jan 10, 2024

LesnyRumcajs and others added 2 commits January 11, 2024 17:41

cleanups, tinkering

8a9bb7d

[pre-commit.ci] auto fixes from pre-commit.com hooks

fd1273d

for more information, see https://pre-commit.ci

LesnyRumcajs force-pushed the terragrunt-iac branch from 4988256 to b389d93 on January 11, 2024 16:49

js lint

4c78c91

LesnyRumcajs force-pushed the terragrunt-iac branch from b389d93 to 4c78c91 on January 11, 2024 17:17

add live/ docs

6ab3476

LesnyRumcajs force-pushed the terragrunt-iac branch 3 times, most recently from 70c3bbe to ef63832 on January 18, 2024 17:15

cleanup log-based alerts

d963124

LesnyRumcajs force-pushed the terragrunt-iac branch 4 times, most recently from e31f8ed to 793b611 on January 18, 2024 18:51

tinker

f93ef18

LesnyRumcajs force-pushed the terragrunt-iac branch from 793b611 to f93ef18 on January 18, 2024 19:26

LesnyRumcajs mentioned this pull request Jan 19, 2024

Add snapshot monitor workflow and alerting #384

Closed

LesnyRumcajs added 3 commits January 19, 2024 11:40

self-review

4582f70

Merge branch 'main' into terragrunt-iac

12eb6e0

self-review 2

8495a11

LesnyRumcajs marked this pull request as ready for review January 19, 2024 10:54

LesnyRumcajs requested a review from a team as a code owner January 19, 2024 10:54

LesnyRumcajs requested review from hanabi1224, jdjaustin and lemmih and removed request for a team January 19, 2024 10:54

ruseinov approved these changes Jan 19, 2024

fix wildcards workflows

249a74b

samuelarogbonlo approved these changes Jan 22, 2024

lemmih approved these changes Jan 22, 2024

LesnyRumcajs merged commit e9aeae3 into main Jan 22, 2024
9 checks passed

LesnyRumcajs deleted the terragrunt-iac branch January 22, 2024 16:36
