EC2 Disk Utilization Alerts Using CloudWatch and SNS Notifications To Microsoft Teams

My company operates a few self-managed GitLab EC2 instances in AWS. One of the challenges was to figure out a way to monitor the health of those instances, in particular disk utilization.

Looking at CloudWatch metrics, one notable metric is missing from EC2 instances: disk utilization. This is because disk usage is only visible from within the guest operating system, so AWS cannot collect it at the hypervisor level without an agent running on the instance.

Our challenge was to figure out a way to obtain that metric and, on top of that, set up alerting through Microsoft Teams. Teams is the tool of choice in most medium to large corporations, and using it removes the need to set up email subscriptions on SNS (although there are plenty of other options when it comes to alerting via SNS).

Solution Architecture Diagram


Prerequisites

  • AWS Account with permissions to create and edit IAM roles, Lambda functions, SNS Topics, and CloudWatch Alarms
  • Admin rights on Microsoft Teams to create channels and configure connectors
  • GitLab EC2 instance with access to public Internet

Setup IAM Role On EC2 Instance

EC2 instance already has an attached role

  1. From AWS IAM console, select “roles” on the left menu panel.
  2. Search for the role used by Gitlab EC2 instance and click on the name.
  3. Under “permissions policies” section, click on “Add permissions” button.
  4. Select “Attach policies”
  5. Search for “CloudWatchAgentServer” managed IAM policy.
  6. Check off the policy and click the attach button.

EC2 Instance does not use an IAM Role

  1. From AWS IAM console, select “roles” from the left menu panel
  2. Click on “create role” button.
  3. Select “AWS Service” as trusted entity type.
  4. Select “EC2” as use case and click next.
  5. Search for “CloudWatchAgentServer” in Add Permissions page and check off the IAM policy and hit next button.
  6. Fill in the name for the role and optionally the tags, then click the “create role” button.

Setup IAM Role and Policy for Lambda Function

Setup Policy for Lambda Customer KMS Key

If you use a customer KMS key for Lambda functions, you will need to create an IAM policy to enable the role used by Lambda function for access.

Before we begin, make sure you go to AWS KMS console and retrieve your KMS Key’s ARN.

1. From AWS IAM console, select “policies” form the left menu panel.

2. Click on “create policy” button.

3. Select the “JSON” tab and paste the code below, which grants decrypt access on the key. Be sure to enter your KMS key’s ARN as the value of the Resource field.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LambdaKMSKeyAccess",
                "Effect": "Allow",
                "Action": [
                    "kms:Decrypt"
                ],
                "Resource": "<your kms key ARN>"
            }
        ]
    }

4. Click on “Next: Tags” button to optionally add tags.

5. Click on “Next: Review” button and provide a name and description for the IAM Policy; As an example, I have named my IAM Policy “gitlab-alert-lambda-kms-access”.

6. Click “Create policy” button.

7. From AWS IAM console, select “roles” from the left menu panel

8. Click on “create role” button.

9. Select “AWS Service” as trusted entity type.

10. Select “Lambda” as use case and click next.

11. Search for “gitlab-alert-lambda-kms-access” in Add Permissions page and check off the IAM policy.

12. Search for “AWSLambdaBasicExecutionRole” on the same page and check off the IAM policy.

13. Click the next button.

14. Fill in the name for the role and optionally the tags, then click the “create role” button; As an example, I named my role “gitlab-alert-lambda”.

Setup CloudWatch Agent On EC2 Instance

Note: The following commands are based on the Amazon Linux 2 AMI and should work for CentOS as well. For installation instructions for other operating systems, visit the AWS documentation.

  1. SSH into your EC2 instance.
  2. sudo yum update -y && sudo yum upgrade -y
  3. sudo yum install amazon-cloudwatch-agent -y
  4. With the editor of your choice, edit /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
  5. Paste the following JSON text and modify as needed
  {
    "agent": {
      "metrics_collection_interval": 60,
      "run_as_user": "cwagent"
    },
    "metrics": {
      "append_dimensions": {
        "InstanceId": "${aws:InstanceId}"
      },
      "metrics_collected": {
        "disk": {
          "measurement": [
            "disk_used_percent"
          ],
          "metrics_collection_interval": 60,
          "resources": [
            "/"
          ]
        }
      }
    }
  }
Explanation of the JSON Properties

  • metrics_collected: metrics that CloudWatch agent will send to CloudWatch; This would be the disk used percentage specified by the code above.
  • resources: specifies the disks that CloudWatch agent should collect metrics on; This would be the primary volume. Since it is defined as an array, you may add additional volumes as necessary.
  • metrics_collection_interval: The interval, in seconds, at which the CloudWatch agent collects metrics on the EC2 instance.

For more configuration parameters, see this AWS Documentation.

6. Save the configuration file and exit the editor.

7. Restart CloudWatch agent and check the logs by executing the following.

sudo systemctl restart amazon-cloudwatch-agent

tail -f /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log

The metrics should become available in CloudWatch in the next few minutes.

Create SNS Topic

  1. Log into AWS Management Console and navigate to SNS.
  2. Select “Create SNS Topic” button.
  3. Create a standard SNS Topic named “gitlab-ee-alerts” and configure encryption as necessary.

4. Optionally, add email subscriptions by clicking into the new topic, then clicking the “create subscription” button under the “subscriptions” tab.

Setup CloudWatch Alarm

  1. Navigate to AWS CloudWatch console.
  2. From the left vertical menu panel, expand “Alarms” section and select “all alarms”.
  3. Select “create alarm” button.
  4. Click on “select metric” button.
  5. Under “Custom namespaces” section of Metrics, select “CWAgent”

6. Select “InstanceId, device, fstype, path” box; These are the types of dimensions for which metrics are reported.

7. Find the entry that matches your GitLab instance name and instance Id and check the checkbox next to it.

8. Click on the “select metric” button.

9. Configure the alarm with your desired threshold configurations and click “next”.

For my scenario, I have the following configurations:

  • Statistic: Average
  • Period: 10 mins
  • Threshold Type: Static
  • Whenever disk_used_percent is: Greater/Equal than 90

10. In the configure actions step, we will create 2 notifications; The first notification is for the “in alarm” state. While creating this notification, select “Select an existing SNS topic” and choose the “gitlab-ee-alerts” topic.

11. At the bottom of notifications section, click on “add notification” button.

12. In this new alarm, set the state trigger to “OK” and use the same configurations as the one shown above.

13. Scroll down to the bottom of the page and click “next”.

14. Set the name of the alarm as “gitlab-high-disk-utilization” and a meaningful description such as “Gitlab Server disk utilization has reached 90% and above” and click “next”

15. Review the contents and select “create alarm”.
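For repeatable setups, the same alarm can also be defined programmatically. Below is a minimal sketch using boto3; the alarm name, metric, threshold, and topic name mirror the walkthrough above, while the dimension values and the region/account parts of the topic ARN are placeholders you would replace with your own:

```python
# Minimal sketch: recreate the console-configured alarm with boto3.
# The InstanceId/device/fstype values below are placeholders -- substitute your own.
alarm_params = {
    "AlarmName": "gitlab-high-disk-utilization",
    "AlarmDescription": "Gitlab Server disk utilization has reached 90% and above",
    "Namespace": "CWAgent",
    "MetricName": "disk_used_percent",
    "Dimensions": [
        {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},
        {"Name": "device", "Value": "xvda1"},
        {"Name": "fstype", "Value": "xfs"},
        {"Name": "path", "Value": "/"},
    ],
    "Statistic": "Average",
    "Period": 600,  # 10 minutes, in seconds
    "EvaluationPeriods": 1,
    "Threshold": 90.0,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    # Notify on both state transitions, matching the two notifications above.
    "AlarmActions": ["arn:aws:sns:<region>:<account number>:gitlab-ee-alerts"],
    "OKActions": ["arn:aws:sns:<region>:<account number>:gitlab-ee-alerts"],
}

def create_alarm(dry_run: bool = True) -> dict:
    """Create the CloudWatch alarm; with dry_run=True the AWS call is skipped."""
    if not dry_run:
        import boto3  # imported lazily so the sketch can be read offline
        boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
    return alarm_params
```

Scripting the alarm this way makes it easy to stamp out the same alert for additional GitLab instances later.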

Setup Microsoft Teams Channel

  1. In Microsoft Teams, select the “Teams” tab.
  2. In your group, click on the ellipsis (…) and select “add channel”.
  3. Create a channel named “Gitlab Alert” and click “add”.
  4. Mouse over “Gitlab Alert” channel and click on the ellipsis (…).
  5. Select “Connectors”.
  6. Search for “incoming webhook” and click the “configure” button next to it.

7. Upload an image for Gitlab and provide a name such as “Gitlab Alert” for the webhook and click on “create”.

8. Copy the generated URL and save this URL. This URL will be used in the Lambda Function section below.
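Before wiring up the Lambda function, you can sanity-check the webhook by posting a test card to it. This is an optional sketch; the MessageCard shape is the standard one Teams incoming webhooks accept, and the HookUrl environment variable is one you set yourself to the URL copied above:

```python
import json
import os
from urllib.request import Request, urlopen

def build_test_card(title: str, text: str) -> dict:
    """Build a minimal MessageCard payload accepted by Teams incoming webhooks."""
    return {
        "@context": "https://schema.org/extensions",
        "@type": "MessageCard",
        "themeColor": "64a837",
        "title": title,
        "text": text,
    }

if __name__ == "__main__":
    hook_url = os.environ.get("HookUrl")  # set this to the URL you copied
    if hook_url:
        card = build_test_card("Webhook test", "Hello from the alerting pipeline!")
        req = Request(hook_url, json.dumps(card).encode("utf-8"),
                      headers={"Content-Type": "application/json"})
        urlopen(req)
```

If a card shows up in the “Gitlab Alert” channel, the webhook is good to go.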

Create Lambda Function To Call Microsoft Teams Webhook

1. Go to AWS Lambda under AWS Management Console.

2. Create a lambda function named “gitlab-ms-teams-alert”.

3. Expand “Change default execution role” section under “Permissions” heading.

4. Select “Use existing role” and choose the lambda role you’ve created (ie: “gitlab-alert-lambda” role).

5. Copy and paste the following code into the code section.

import json
import logging
import os
import re

from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

HOOK_URL = os.environ['HookUrl']

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info("Event: " + str(event))
    message = json.loads(event['Records'][0]['Sns']['Message'])
    logger.info("Message: " + str(message))

    metric_dimensions = message['Trigger']['Dimensions']
    instance_id_result = [item['value'] for item in metric_dimensions \
        if item['name'] == 'InstanceId']
    instance_id = f"{instance_id_result[0]}" if instance_id_result != [] else ""
    alarm_name = message['AlarmName']
    old_state = message['OldStateValue']
    new_state = message['NewStateValue']
    reason = message['NewStateReason']
    alarm_description = message['AlarmDescription']

    # Format message
    is_missing_datapoints = re.findall('no datapoints were received', reason)
    is_alarm_in_ok_state = new_state == "OK"
    if is_missing_datapoints:
        output_message = "CloudWatch metrics is unable to obtain " \
            "datapoints for Gitlab server. Ensure " \
            "CloudWatch Agent is " \
            "running using the command `sudo tail -f /opt/aws/" \
            "amazon-cloudwatch-agent/logs/" \
            "amazon-cloudwatch-agent.log` on the instance. " \
            "Restart the service by running " \
            "`sudo systemctl restart amazon-cloudwatch-agent` if necessary."
    elif is_alarm_in_ok_state:
        output_message = "GitLab Instance is fully operational."
    else:
        output_message = f"{alarm_description}!"

    data = {
        "colour": "64a837",
        "title": f"Resolved: {alarm_name}",
        "text": output_message
    }
    if new_state.lower() == 'alarm':
        data = {
            "colour": "d63333",
            "title": f"Alert: {alarm_name}",
            "text": output_message
        }

    message = {
        "@context": "https://schema.org/extensions",
        "@type": "MessageCard",
        "themeColor": data["colour"],
        "title": data["title"],
        "text": data["text"]
    }

    req = Request(HOOK_URL, json.dumps(message).encode('utf-8'))
    try:
        response = urlopen(req)
        logger.info("Message posted")
        return {"status": "200 OK"}
    except HTTPError as e:
        logger.error("Request failed: %d %s", e.code, e.reason)
    except URLError as e:
        logger.error("Server connection failed: %s", e.reason)

6. Select “Configuration” tab, then select “Environment Variables” from the side menu.

7. Click on the “edit” button and enter “HookUrl” as key and the WebHook URL as the value and click “save” button.

8. From the side menu, select “Triggers”.

9. Click on “Add trigger” button.

10. Select “SNS” trigger.

11. Enter “gitlab-ee-alerts” as the SNS Topic name and click “add” button.
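To see what the function actually receives, note that SNS wraps the alarm as a JSON string under Records[0].Sns.Message. Below is a trimmed, made-up payload containing only the fields the handler reads, along with the same dimension-extraction logic:

```python
import json

# Made-up alarm payload; only the fields the handler reads are included.
sample_alarm = {
    "AlarmName": "gitlab-high-disk-utilization",
    "AlarmDescription": "Gitlab Server disk utilization has reached 90% and above",
    "OldStateValue": "OK",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed: 1 datapoint was greater than the threshold",
    "Trigger": {
        "Dimensions": [
            {"value": "i-0123456789abcdef0", "name": "InstanceId"},
            {"value": "/", "name": "path"},
        ]
    },
}
event = {"Records": [{"Sns": {"Message": json.dumps(sample_alarm)}}]}

# The same extraction the handler performs:
message = json.loads(event["Records"][0]["Sns"]["Message"])
dims = message["Trigger"]["Dimensions"]
instance_id = next((d["value"] for d in dims if d["name"] == "InstanceId"), "")
print(instance_id)  # i-0123456789abcdef0
```

Feeding an event like this through the Lambda console’s test feature is a quick way to exercise the alert path without waiting for a real alarm.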


Troubleshooting

CloudWatch Alarm is not triggering the SNS Topic to send alerts to the Microsoft Teams channel due to unauthorized access to the SNS Topic encryption key.

“Failed to execute action arn:aws:sns:<region>:<account number>:gitlab-ee-alerts. Received error: CloudWatch Alarms does not have authorization to access the SNS topic encryption key.”

This error indicates that the SNS Topic you’ve configured uses a customer managed KMS key to which the CloudWatch service does not have access.

To provide access, go to the AWS KMS console, select “customer managed keys” on the left menu panel, select the SNS topic key, and add the following statement to the key policy:

    {
        "Sid": "Allow_CloudWatch_for_CMK",
        "Effect": "Allow",
        "Principal": {
            "Service": "cloudwatch.amazonaws.com"
        },
        "Action": [
            "kms:Decrypt",
            "kms:GenerateDataKey*"
        ],
        "Resource": "*"
    }

Credits And Reference

1. Monitoring EC2 disk usage with CloudWatch Metrics and Alarms – Marbot

2. Why I didn’t receive SNS notification for my CloudWatch alarm trigger – AWS SNS Knowledge Center


Docker Image Pushed To AWS ECR Does Not Behave As Expected

While working on a continuous integration continuous delivery (CICD) pipeline to deploy an application, I came across a weird problem. The image the GitLab runner pushed to AWS Elastic Container Registry (ECR) worked during the build stage; however, once the deployment’s image URL was set to the image repository and tag, it would only work some of the time. Other times it would cause an Error or a CrashLoopBackOff due to a missing dependency, which seemed impossible since my Dockerfile specifically includes a step to install the dependencies.

Background Information

This section is aimed at providing some background information to those who are not familiar with Docker, Kubernetes, and AWS Elastic Container Registry, as the topics covered require some in-depth knowledge.

Docker: A tool that allows you to package applications into an “image” that runs on top of a container runtime, reliably across different machine and operating system configurations. This tool is revolutionary because containers, unlike traditional virtual machines, do not require a full-blown operating system to run applications, which makes it very lightweight and cost effective to run many applications on a single machine.

Kubernetes: A container orchestration tool that can run and manage thousands of containerized applications across a cluster of machines.

AWS Elastic Container Registry (ECR): A repository managed by AWS that allows you to store your private Docker images. Think of it as GitHub for Docker images.

Kubernetes Pod: A compute unit representing an application containing one or more containers.

Kubernetes Pod Error State: A Pod enters the Error state when it tries to start up, but the application exits unexpectedly due to an exception.

Kubernetes Pod CrashLoopBackoff State: When a Kubernetes Pod fails to start up, the cluster will attempt to restart the Pod as a self-healing process. If the crash occurs too many times, the cluster will break the infinite crash loop by transitioning the state of the Pod into CrashLoopBackOff state.

How Did It Happen?

  • I was working on deployments to non-dev environments, which use a git tag specifying the version (example: v1.00).
  • To refine the CICD pipeline and docker build process, I had to update the same git tag over and over again by forcing the tag to point to a new commit.
  • The repository in AWS ECR does not enforce tag immutability, allowing the same tag to be pushed repeatedly.
  • The Pod would start up, but an error would be emitted in the logs stating that the Python package “Sanic” was missing.
  • In my Dockerfile, there was a specific step that would copy all the files over to an “app” folder that I created in the earlier steps and run pip3 install -r requirements.txt.
  • Looking at the CICD pipeline output of the docker build step, I could see that sanic had been installed successfully.
  • I tried troubleshooting this issue by pulling the image straight from ECR onto my Macbook Pro as soon as the docker build step had completed. Running the image on my Mac and executing pip3 list, I could see sanic was present.
  • After deploying the application to Kubernetes, which involves setting the image in the cluster to the image repository and tag in ECR, pulling the same tag now resulted in all dependencies missing.

What Happened?

As the CICD pipeline continuously pushed updates using the same tag to AWS ECR, the tag wasn’t being reliably applied to the new image.

This is because each image “committed” to AWS ECR must be unique in order for ECR to differentiate which particular “commit” Docker/Kubernetes should download. From what I’ve learned from James Walker, Docker image tags need to be immutable because once an image has been published, it is assumed that there may be downstream consumers that utilize this image, and allowing another image to replace an existing one may lead to unpredictable behavior. To get around this problem, AWS ECR creates an untagged commit, providing a unique SHA to differentiate it from the rest of the commits.

The Solution

The solution is to use unique Docker image tags. You might be wondering how this can be done given my scenario of developing a CICD pipeline for a particular version. The approach I took was to append a timestamp as the build number of the same release.

A tag name must be valid ASCII and may contain lowercase and uppercase letters, digits, underscores, periods and dashes. A tag name may not start with a period or a dash and may contain a maximum of 128 characters.

Docker Documentation

Unfortunately, Docker image tags do not allow the use of the plus symbol, which is what the semver standard recommends for build metadata; I chose to use an underscore to denote build numbers instead.

This is how I now tag Docker images: v<semver version number>_<YYYYMMDDHHmm>

Example: v1.2.1_202203232307

This translates to v1.2.1 (2022 March 23 23:07).
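To make the scheme reproducible, here is a small helper (the function name is my own) that builds such a tag from a semver string and a UTC timestamp:

```python
from datetime import datetime, timezone
from typing import Optional

def build_image_tag(semver: str, now: Optional[datetime] = None) -> str:
    """Append a YYYYMMDDHHmm build timestamp to a semver tag."""
    now = now or datetime.now(timezone.utc)
    return f"{semver}_{now.strftime('%Y%m%d%H%M')}"

print(build_image_tag("v1.2.1", datetime(2022, 3, 23, 23, 7)))  # v1.2.1_202203232307
```

Calling it without a timestamp uses the current UTC time, which is what a CICD job would do at build time.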

What I’ve Learned From This Incident

The purpose of a Git Tag is to give a particular commit a name to denote a specific milestone in your application development. Thus it is a bad idea to constantly change the commit that the name is referencing. This is also the reason that the Git CLI requires you to use the -f (force) option to trigger this change; otherwise it would throw an error.

From the Docker perspective, it is bad practice to continuously push to the same tag in any image repository, as it defeats immutability and can cause unpredictable behavior; for example, the issue I was having.

If you’ve read this far, you’re probably encountering this very issue right now. I hope this article has helped you resolve an issue that took me and my colleagues 2 days to figure out. Best of luck.

A Change In Direction. Again.

Back in November of 2021, I was given an option to join a special project team at my company. The title would be Cloud Operations Engineer. This was a step up from my previous role, Cloud Operations Analyst. With a bigger title comes bigger pay. The offer was simply too good to refuse. Once again, I’m back in the Cloud and DevOps role.

What’s interesting about this change is that the role forces me to accelerate my learning, from knowing only the basics of Kubernetes to becoming an intermediate engineer. I’m thrust into learning Helm for deployments, creating advanced Terraform scripts, troubleshooting Kubernetes issues, working directly with GitLab runners, and working much more closely with Docker and AWS ECR.

One of the tools I’ve built recently was for a security architect at my company who wanted to be able to change a specific source IP address to a new IP address across all security groups. I wanted to build a great user experience even for a tool with such a simple purpose. Having had great experiences working with the Terraform CLI and kubectl, I took inspiration from the way those tools work during my design process. The tool was delivered to the security architect 3 weeks ago, and he was very happy knowing it would save him time. As part of my ethos of building great, long-lasting tools, it includes a built-in version checker that notifies the user when a new version is available, paving the way for possible future updates, from bug fixes to new features.

While it took 2 weeks, including weekends, to complete this tool, it was a wonderful feeling to learn new things about creating a CLI tool with Python 3, to build on top of my previous work, to put a smile on someone’s face, and to know that I’ve built a tool that works so well and efficiently that I would use it myself.

Jest Unit Testing: Mocking an ES Module within a test file

What is Mocking?

Mocking in unit testing is swapping out the functionality of something and implementing custom logic to return a predictable response. This is useful when running unit tests to assert that the result matches your expectations after passing data into a function especially if the function performs some database or API calls where data retrieved may change over time.

Mocking User Defined ES Module Within Your Test File

Here’s a sample module that I want to mock in my unit test for demonstration purposes:


import axios from 'axios';

export const namedImport = async (message: string): Promise<string> =>
  (await axios.get('', { params: { message } })).data;

const MyDefaultImport = (a: number, b: number): number => a + b;
export default MyDefaultImport;

In the test file, we want to use jest.mock() to mock out our module. Let’s see it in action.


import MyDefaultImport, { namedImport } from '../src/myModule';

jest.mock('../src/myModule', () => ({
  __esModule: true,
  namedImport: async (message: string): Promise<string> => new Promise((resolve) => resolve('namedImport response')),
  default: (a: number, b: number): number => 5,
}));

describe('Test Mock', () => {
  it('namedImport function returns "namedImport response"', async () => {
    const result = await namedImport('hi');
    expect(result).toBe('namedImport response');
  });

  it('MyDefaultImport returns 5', () => {
    const result = MyDefaultImport(5, 5);
    expect(result).toBe(5);
  });
});

In this example, I’ve switched out the logic of namedImport to always return ‘namedImport response’ and for MyDefaultImport function to always return the number 5.

The premise of this example is to show how to mock named and default exports.

It is important to note that __esModule: true in the object returned by the callback function passed to jest.mock() is vital to signal that an ES module is being mocked. Without it, default exports will fail. Node will complain that MyDefaultImport is not a function (it will be undefined).

I’ve also provided a mock example of an asynchronous function as well as a synchronous function and how the syntax differs. In the asynchronous function, I am only returning resolve() because I am only concerned about the happy path in this example; In your code, feel free to add a reject parameter and logic for that scenario; And if you have no idea what I’m talking about, here’s Mozilla’s explanation of JavaScript promise.

ES Module Partial Mocking

I’ve shown how an entire module can be mocked above; But what if I only want namedImport to be mocked above and for MyDefaultImport to retain its original functionality?

We use ...jest.requireActual('<path to the module file>') within the object returned by the callback function passed to jest.mock()


import MyDefaultImport, { namedImport } from '../src/myModule';

jest.mock('../src/myModule', () => ({
  __esModule: true,
  ...jest.requireActual('../src/myModule'),
  namedImport: async (message: string): Promise<string> => new Promise((resolve) => resolve('namedImport response')),
}));

describe('Test Mock', () => {
  it('namedImport function returns "namedImport response"', async () => {
    const result = await namedImport('hi');
    expect(result).toBe('namedImport response');
  });
});

How to Spy on a mocked import?

In addition to mocking the implementation of a function, sometimes it is desirable to assert the arguments we’ve passed into the function as well as events acted upon the function, such as the number of times it was called. To do this, we use a spy.

As an example, we will use our namedImport async function to demonstrate how to apply a spy to this function.

The key is to pass our function into jest.fn().

jest.mock('../src/myModule', () => ({
  __esModule: true,
  namedImport: jest.fn(async (message: string): Promise<string> => new Promise((resolve) => resolve('namedImport response'))),
}));

An alternative to this method would be to create our spy first jest.fn() then add mockImplementation() method on the chain and pass in our mocked implementation.

jest.fn().mockImplementation(async (message: string): Promise<string> => new Promise((resolve) => resolve('namedImport response')))

Beware of Gotchas

Here are some gotchas that I’ve encountered while learning this that I hope will help you advance quicker.

1. Ensure the mock function’s signature matches the actual function’s signature.

It may seem obvious, but I was stuck for a few hours because I missed a tiny detail in the function signature. Changing the signature changes the nature of the function; it’s like calling your dog Reggy when his name is Remy. Node will simply throw an error saying that the thing you’re calling is undefined, and rightly so.

2. jest.mock() must be called outside your tests!

The Jest library is injected into the global namespace of the test environment when you run npm test, and calls to jest.mock() are hoisted to the top of the file before your imports. When you call jest.mock() within your describe() / test() / it() functions, that hoisting no longer applies, so the real module has already been loaded by the time your mock is registered.

3. Keep your imports in your test file at the top.

Jest will hoist jest.mock() to the beginning of your test file. As a result, there’s no difference between putting it before or after your module imports.

A Change in Direction

My job title has recently changed from Cloud Operations Analyst to Systems Developer. For the folks who are worried that I will no longer make posts regarding AWS, Linux, Terraform, and all the other DevOps related topics, fear not: my quest for learning is always ongoing.

To assure you that my resolve in learning DevOps and Cloud technologies has not softened, I obtained my AWS Cloud Practitioner Certification just two weeks ago! While the certification is entry level, it is only the beginning of many more certifications to come.

It does, however, shift my priorities to the needs of my new team. One such task is to write unit tests for a complex React application that uses Redux, TypeScript, and Auth0, to name a few.

The upcoming series will be based on my learnings from picking up ReactJS once again to my humble beginnings in diving into TypeScript and working with Jest and React Testing Library in my quest to be the team’s test junkie.

Automating Installation of Nginx on EC2 Instances

A recent task at work prompted me to automate the installation of nginx on EC2 instances. This process needed to be automated to reduce manpower overhead, but most of all, to allow auto scaling groups to provision fresh EC2 instances with nginx installed and configured on the fly.


Resources

  • 1x Security Group
  • 1x EC2 Launch template
  • 1x AWS Autoscaling Group Target
  • 1x AWS Autoscaling Group
  • 1x S3 bucket
  • 1x SSL Certificate for your domain name.


The strategy for getting this solution to work is to set up the instrumental components first, then assemble them together.

Create Security Groups

  1. Log into AWS EC2 console and search for EC2 service.
  2. On the left menu panel, scroll down to “Network and Security” section and select “Security Groups”.
  3. Create a new security group; Provide a descriptive name such as “http-only-access”.
  4. Select the inbound tab and add 2 rules using the HTTP preset.
  5. Set the first rule to source for IPv4 addresses.
  6. Set the second rule to source ::/0 for IPv6 addresses.
  7. Save the settings and select the “outbound” tab.
  8. Add the same rules as step 5 and 6 and save.
  9. Go back to inbound tab and add another rule. Take note of the security group Id (ie: sg-06284b5e0dc13bef5).
  10. Add another http preset rule but set the source to the security group Id.
  11. Repeat step 10 in outbound tab with the same security group Id.
  12. Repeat steps 1-11, replacing “HTTP” with “HTTPS”.

Customize Nginx configuration file

  1. Download nginx.conf file.
  2. Open it with a text editor (ie: Visual Studio Code, notepad)
  3. Search for and modify the following line:
rewrite ^/$ https://<destination url> redirect;

4. Change the text between the ^ and $ symbols to the path at which you wish the redirect to occur. If you use the example above with just a single forward slash, visiting the root of your domain name triggers an immediate redirect.

Here’s a live example:

rewrite ^/my-special-page.html$ https://<destination url> redirect;

The example above will trigger a redirect from https://<your domain name>/my-special-page.html to the destination URL.

5. Change “https://<destination url>” to the URL to which you wish to forward your users.

6. Change the word “redirect” to “permanent” if you wish to signal a permanent redirect. For temporary redirect, leave it at “redirect”.
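The matching half of the rewrite rule is an ordinary regular expression, so you can sanity-check it outside nginx before deploying; for example, in Python:

```python
import re

# The pattern from the live example above: match exactly /my-special-page.html.
# Note the unescaped dot matches any character, just as it does in the nginx rule.
pattern = re.compile(r"^/my-special-page.html$")

print(bool(pattern.search("/my-special-page.html")))    # True
print(bool(pattern.search("/some-other-page.html")))    # False
print(bool(pattern.search("/my-special-page.html/x")))  # False
```

The ^ and $ anchors are what keep the redirect from firing on every path that merely contains the page name.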

Create S3 Bucket

  1. From the services search bar at the top in AWS Console, search for S3.
  2. Create an S3 bucket for this type of automation (example: example-automation-bucket)
  3. Copy nginx.conf into the s3 bucket.
  4. Take note of the s3 path of where nginx.conf file is stored.

Create IAM Role and Policy for EC2 Instance

  1. From the services search bar at the top in AWS Console, search for IAM.
  2. From the left menu bar, expand “Access management” section and click on “Policies”.
  3. Click on “Create policy” button.
  4. Copy and paste the following policy, replacing the text wrapped in angle brackets with your actual values:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowS3BucketAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::<bucket name>",
                    "arn:aws:s3:::<bucket name>/*"
                ]
            },
            {
                "Sid": "ListAllMyS3Buckets",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*"
            }
        ]
    }

5. Save the policy and provide a name for it (ie: ec2-nginx-redirector-s3-access).

6. From the left menu bar, expand “Access management” section and click on “Roles”.

7. Click on “Create role” button.

8. For trusted entity type select “AWS Service” and choose “EC2” as the service.

9. When selecting permission policies, search for the policy created in steps 4 and 5.

10. Provide a name for the role (ie: ec2-nginx-redirector) and save.

Create Launch Template

  1. From the services search bar at the top in AWS Console, search for EC2.
  2. On the left menu panel, scroll down to the “instances” section and select “Launch Templates”.
  3. Select “Create Launch Template” orange button.
  4. Provide a name for the launch template.
  5. For AMI, choose “Amazon Linux 2 HVM SSD (x86)”.
  6. In the Instance Type selection section, choose one of t2.micro, t3.micro, or t4g.micro**.

**: t4g.micro is a new instance type being promoted by AWS at the time of writing. The promotion is for 750 hours of free usage until June 30th, 2021 (00:00 UTC time). Use of this instance will incur charges outside of the 750 hours or if the promotion has expired, whichever comes first.

7. In the “Network settings” section, click on “security groups” dropdown menu and select the security group you’ve created above (ie: http-only-access).

8. Expand the “Advance Details” section.

9. Under “IAM instance profile”, select the IAM role you’ve created above (ie: ec2-nginx-redirector).

10. Scroll down to “userdata” textfield.

11. Copy and paste the following code:

#!/bin/bash
sudo amazon-linux-extras enable nginx1
sudo yum clean metadata && sudo yum install -y nginx
aws s3 cp s3://<bucket name>/nginx.conf /etc/nginx/nginx.conf
sudo systemctl enable nginx
sudo systemctl start nginx

12. Change “<bucket name>” to the actual name of the bucket you’ve created to store nginx.conf.

13. Save the launch template.

Create Auto Scaling Group

  1. Still within AWS Console EC2 service, select “Auto Scaling Groups” under “Auto Scaling” section.
  2. Click “Create Auto Scaling group”.
  3. Provide a name for the Auto Scaling Group (ie: nginx-redirectors).
  4. In the “Launch Template” section, select the launch template you’ve created above and click “next”.
  5. Under “Network” section, select the appropriate VPC (or use the default provided VPC if you have not configured any).
  6. Select all subnets from the subnets drop down box.
  7. In the advanced options page, leave everything at their default values and click “next”.
  8. In the “configure group size and scaling policies” page, set the desired, minimum, and maximum capacity. I would recommend maximum capacity to be at least 2 or more; Hit next once you’re done.
  9. Click “Next” in the “Add notifications page”.
  10. Add any tags you desire and click next; This is an optional step.

Create Auto Scaling target Group

  1. Click into the name of the new auto scaling group you’ve just created.
  2. Click on “Actions” button and select “Launch instance from template”.
  3. From the left menu bar within AWS Console EC2 service, select “Target Groups”.
  4. Select “Create Target Groups”.
  5. Provide a target group name.
  6. Ensure protocol selected is “HTTP” and port number is “80”.
  7. Select the VPC used for Auto Scaling Group.
  8. Leave Protocol version in HTTP1.
  9. For health checks, leave protocol in HTTP.
  10. Set the health check path to the redirect path you’ve specified earlier in nginx.conf.
  11. Add optional tags.
  12. Click Next.
  13. Under available instances, check off the instances that were provisioned by the Auto Scaling Group.
  14. Click on “Include as pending below” button and hit “Create target group” button.

Create Application Load Balancer

  1. Still within AWS Console EC2 service, select “Load Balancers” under “Load Balancing” section.
  2. Click on “Create Load Balancer” button.
  3. Click on the “create” button under Application Load Balancer.
  4. Provide a name for the balancer (ie: nginx-redirector).
  5. Under listeners, select “HTTPS” as the listener protocol.
  6. Under “Availability Zones”, select the VPC and subnets that was selected in the Auto Scaling Group.
  7. Click “Next” to configure security settings.
  8. For certificate type, select “Upload a certificate to ACM”.
  9. In the private key, paste the private key you’ve received from your DNS vendor.
  10. In the certificate body, paste in the server certificate.
  11. In the certificate chain, paste in the bottom certificate block followed by the top certificate block. This will allow the certificate to conform with the format specified by AWS Amazon Certificate Manager (ACM).
  12. For security policy, select “ELBSecurityPolicy-FS-1-2-Res-2020-10”; this is the latest version at the time of writing.
  13. Click “Next” to configure security groups.
  14. Select the HTTPs security group you’ve configured above and click “next”.
  15. Configure routing by selecting “Existing target group” from the “target group” dropdown menu and point it to the new target group you’ve created above.
  16. Review and deploy the load balancer; it may take a few minutes to provision the balancer.
  17. Take note of the DNS name of the load balancer.

At this point, once the load balancer has been provisioned, a DNS name will be created for it. Visiting that DNS name in a browser will send the request to the redirecting EC2 instances provisioned by the Auto Scaling Group.

Create Route 53 Entry

  1. Create a hosted Zone for your website.
  2. Create an Alias record pointing to the CNAME of the load balancer above.

Bash Shell Script Explanation

  1. Enable the nginx1 topic from the amazon-linux-extras package manager provided by Amazon.
  2. Clear the yum metadata cache and install nginx.
  3. Copy the nginx configuration file from your S3 bucket to nginx’s configuration folder.
  4. Enable the nginx service so that it automatically starts the next time the instance boots up.
  5. Start the nginx service.
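For reference, a minimal nginx.conf along these lines could implement the redirect described above; the health check path and redirect destination below are placeholders for illustration, not values from this post:

```nginx
# Hypothetical nginx.conf sketch -- adjust paths and destination to your setup.
events {}

http {
  server {
    listen 80;

    # Health check path referenced later by the target group.
    location /health {
      return 200 'OK';
    }

    # Redirect everything else to the destination site.
    location / {
      return 301 https://www.example.com$request_uri;
    }
  }
}
```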


Creating Nested Conditional Dynamic Terraform Blocks

While working on an assignment to update our team’s AWS Cognito User Pool, which I had built using Terraform, I faced a problem: the software development team was revising the pool’s custom attributes, and they gave me a heads up that additional custom attributes may be added at a later date.

When I initially created the user pool, I made the mistake of hard coding the schema with the expectation that the schema was set in stone. Instead of continuing to hard code the attributes, which would be cumbersome to maintain in the future and make my Terraform code longer than necessary, I’ve taken on the challenge to refactor the code.

My weapon of choice? Dynamic Blocks.

Dynamic Blocks are great because they keep Terraform code DRY (Don’t Repeat Yourself). You provide a list of data, and dynamic blocks generate the type of block you define for each element.

Foundations of Dynamic Blocks

Dynamic blocks have 3 distinctive components:

  • The type that specifies the type of block you wish to generate dynamically
  • for_each meta-argument which allows you to reference a variable that contains a list of elements as data used for each block dynamically generated
  • Content block that specifies the contents of the dynamic block

To demonstrate the use of a dynamic block, I will use the example of defining recovery mechanisms for an AWS Cognito user pool:

resource "aws_cognito_user_pool" "main" {
  // ... other user pool settings omitted for brevity

  account_recovery_setting {
    recovery_mechanism {
      name     = "verified_email"
      priority = 1
    }
  }
}

Here’s an example of the same recovery_mechanism block written using a dynamic block:

resource "aws_cognito_user_pool" "main" {
  // ... other user pool settings omitted for brevity

  account_recovery_setting {
    dynamic "recovery_mechanism" {
      for_each = var.user_pool_account_recovery_mechanisms
      content {
        name     = recovery_mechanism.value["name"]
        priority = recovery_mechanism.value["priority"]
      }
    }
  }
}

The keyword “dynamic” indicates that the block being defined is a dynamic block. “recovery_mechanism” is the type of block to be generated dynamically, and it is a block type defined under the aws_cognito_user_pool resource. The for_each meta-argument allows me to specify the variable that contains a list of recovery mechanisms, where each element looks like:

    {
      name     = "verified_email"
      priority = 1
    }

The content block allows me to specify where to obtain the values for the recovery mechanism block’s name and priority parameters. Notice that in the content block, when referring to each recovery mechanism, I used “recovery_mechanism.value” followed by square brackets containing the name of the key as a string. This is how dynamic blocks refer to the item being iterated over: you must use the dynamic block’s type as the reference point to access those values.
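For completeness, the variable referenced by for_each could be declared along these lines; this is a sketch, and your variable name and default values may differ:

```hcl
variable "user_pool_account_recovery_mechanisms" {
  type = list(object({
    name     = string
    priority = number
  }))
  # Illustrative default: a single verified-email recovery mechanism.
  default = [
    {
      name     = "verified_email"
      priority = 1
    }
  ]
}
```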

Nested Dynamic Blocks

Now that we know how dynamic blocks work, how do we define nested dynamic blocks? Under what circumstances should nested dynamic blocks be used? And how can we make nested dynamic blocks conditional?

When Should Nested Dynamic Blocks Be Used?

Nested dynamic blocks should be used when the block you generate dynamically itself contains a child block.

Going back to my original goal, I have a list of custom user pool attributes that need to be passed to aws_cognito_user_pool resource. This is a great use case for nested dynamic blocks. Each schema block defines a custom attribute and within that custom attribute, the string_attribute_constraints block may be defined.

resource "aws_cognito_user_pool" "main" {
  schema {
    name                = "my-custom-attribute"
    attribute_data_type = "String"
    required            = false
    mutable             = true

    string_attribute_constraints {
      min_length = 4
      max_length = 256
    }
  }
}

Defining Nested Dynamic Blocks

It comes as no surprise that nested dynamic blocks have the same core components as regular dynamic blocks, so they work in much the same way. The trick, however, is figuring out how to structure the data to take advantage of nested dynamic blocks.

The solution is to use a map:

    {
      name                = "my-custom-attribute"
      attribute_data_type = "String"
      is_required         = false
      is_mutable          = true

      string_attribute_constraints = [
        {
          min_length = 4
          max_length = 256
        }
      ]
    }

In our Terraform codebase, this is what my code looks like:

resource "aws_cognito_user_pool" "main" {
  // ... other user pool settings omitted for brevity

  dynamic "schema" {
    for_each = var.user_pool_custom_attributes
    content {
      name                = schema.value["name"]
      attribute_data_type = schema.value["attribute_data_type"]
      mutable             = schema.value["is_mutable"]
      required            = schema.value["is_required"]

      dynamic "string_attribute_constraints" {
        for_each = lookup(schema.value, "string_attribute_constraints", [])
        content {
          min_length = string_attribute_constraints.value["min_length"]
          max_length = string_attribute_constraints.value["max_length"]
        }
      }
    }
  }
}

In the inner dynamic block, I’m defining the “string_attribute_constraints” block as dynamic. Notice the for_each argument uses the Terraform lookup function, which checks for a “string_attribute_constraints” key within each schema map. This is how a nested dynamic block obtains its data.

for_each pulls in a list of maps that you can iterate over; you access each item by calling “string_attribute_constraints.value” and each value by providing the name of the key in square brackets. Dynamic blocks allow you to reference the item being iterated over by using the type of the dynamic block, as shown above.

Making Nested Dynamic Blocks Optional

We now have Cognito user pool custom attributes generated as schema blocks, with “string_attribute_constraints” blocks generated in a consistent manner. The only issue is that for each custom attribute we define right now, the “string_attribute_constraints” block is always generated. This is problematic because a custom attribute’s data type can be a string or a number.

To make the string_attribute_constraints block optional, we lean on the fact that the for_each meta-argument instructs Terraform to iterate through a list or map. All we have to do is create a list containing a single map as the nested value. For custom attributes that do not require a string_attribute_constraints block, we simply do not define this key: the lookup function falls back to the default we provide, which in this case is an empty list, and the nested dynamic block is not generated.
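To make this concrete, here is a sketch of what the variable might look like; the attribute names and values are illustrative. The second attribute is a number, so it omits the string_attribute_constraints key and the lookup default of [] suppresses the nested block:

```hcl
variable "user_pool_custom_attributes" {
  # "any" because the maps have different keys: only string attributes
  # carry the optional string_attribute_constraints list.
  type = any
  default = [
    {
      name                = "my-custom-attribute"
      attribute_data_type = "String"
      is_required         = false
      is_mutable          = true

      string_attribute_constraints = [
        {
          min_length = 4
          max_length = 256
        }
      ]
    },
    {
      name                = "my-number-attribute"
      attribute_data_type = "Number"
      is_required         = false
      is_mutable          = true
    }
  ]
}
```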



env command

The env command prints and sets environment variables. Executing “env” with no arguments prints all of the environment variables.

Alternatively, the “printenv” command also prints environment variables.
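As an aside, the same environment the shell manipulates is visible to the programs it launches. Here is how a Python process reads and sets it; the variable name below is just an example:

```python
import os

# Set a variable for the current process (similar to `export VAR=value`
# in the shell before launching a child process).
os.environ['MY_APP_MODE'] = 'debug'

# Read a single variable back, like `printenv MY_APP_MODE`.
mode = os.environ.get('MY_APP_MODE')

# Snapshot all variables, like running `env` with no arguments.
all_vars = dict(os.environ)
```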

Automatic IAM User Access Key Rotation

The company I work at uses a hybrid cloud setup. While modern applications are able to utilize HashiCorp Vault to provision temporary access, applications that continue to run on legacy on-premises servers are unable to do so. In these scenarios, an IAM user needs to be provisioned in order to generate an AWS Access Key and Secret Access Key to grant programmatic access.

Following best security practices, these keys need to be rotated from time to time. Having many of these access keys generated would increase the team’s operational workload.

Here’s how I automated this process.

AWS Services Used

  • SNS Topic
  • Secrets Manager
  • EventBridge (Formerly known as CloudWatch Events)
  • AWS Lambda
  • Identity Access Management


It’s important for us to get a clear picture of what we’re trying to achieve and how all of these AWS services are connected to one another. Here’s how it works:

  • An IAM User is created with Access Key generated
  • 3 EventBridge rules are used to trigger a lambda function based on the number of days elapsed:
    • Every 90 days – Create a new access key
    • Every 104 days – Deactivate the old access key
    • Every 118 days – Delete the old access key
  • The lambda function receives 1) the IAM username and 2) the action to perform, and performs the corresponding action on the IAM user.
  • Secrets Manager stores the new access key and holds records of previous access keys


  1. Create the IAM user with the “Programmatic access” type selected and generate the AWS Access Key.
  • For all IAM users and roles created, skip attaching or creating IAM policies. This will be revisited once we have created all necessary resources.

2. Create IAM Role for our Lambda function.

  • When creating this role, set the trusted entity type to be AWS service and the usecase to be “Lambda”.

3. Create a Secrets Manager secret with the secret name matching the IAM username you intend to use.

  • Select “Other type of secrets” option as secret type
  • For secret key names, use “AccessKey” for IAM Access Key and “SecretKey” for IAM secret access key
  • Keep key rotation disabled
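The secret string is a small JSON document with those two keys. A sketch of the round trip, using obviously fake key values (the real ones come from iam.create_access_key() inside the Lambda function):

```python
import json

# Hypothetical key values for illustration only.
secret_string = json.dumps({
    'AccessKey': 'AKIAEXAMPLEKEYID',
    'SecretKey': 'example-secret-access-key',
})

# An application reading the secret (e.g. via get_secret_value)
# parses it back the same way:
credentials = json.loads(secret_string)
access_key_id = credentials['AccessKey']
```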

4. Create Lambda Function that will process IAM key rotation requests.

Use the following code for the lambda function. Be sure to modify the parts of the code surrounded by angle brackets.

import json
import boto3
from botocore.exceptions import ClientError

AWS_REGION_NAME = '<your region here>'
SNS_TOPIC_ARN = '<your sns topic arn>'
ACCESS_KEY_SECRET_NAME = '<iam username>'

iam = boto3.client('iam')
secretmanager = boto3.client('secretsmanager')
sns = boto3.client('sns', region_name=AWS_REGION_NAME)


def create_key(iam_username):
    '''Generates a new access key on behalf of the user and stores the new
    access key in Secrets Manager. Then, sends a notification email to users
    to notify them to rotate the key for their applications. Returns
    a JSON with status 200 if successful and 500 if an error occurs.

    iam_username - The IAM user's username as a string.
    '''
    try:
        response = iam.create_access_key(UserName=iam_username)
        access_key = response['AccessKey']['AccessKeyId']
        secret_key = response['AccessKey']['SecretAccessKey']
        json_data = json.dumps(
            {'AccessKey': access_key, 'SecretKey': secret_key})
        secretmanager.put_secret_value(
            SecretId=iam_username, SecretString=json_data)

        emailmsg = 'Hello,\n\n' \
            'A new access key has been created for key rotation. \n\n' \
            f'Access Key Id: {access_key}\n' \
            f'Secrets Manager Secret Id: {iam_username}'

        emailmsg = f'{emailmsg}\n\n' \
            'Please obtain the new access key information from ' \
            'secrets manager using the secret Id provided above in ' \
            f'{AWS_REGION_NAME} and update your application within 14 days ' \
            'to avoid interruption.\n'

        sns.publish(TopicArn=SNS_TOPIC_ARN, Message=emailmsg,
                    Subject='AWS Access Key Rotation: New key is available '
                            f'for {iam_username}')
        print(f'New access key has been created for {iam_username}')
        return {'status': 200}
    except ClientError as e:
        print(e)
        return {'status': 500}


def deactivate_key(iam_username):
    '''Finds the secret that stores the user's previous access key
    and marks it as inactive. Then, sends a notification email to users to
    remind them to rotate the key for their applications. Returns
    a JSON with status 200 if successful and 500 if an error occurs.

    iam_username - The IAM user's username as a string.
    '''
    try:
        previous_secret_value = secretmanager.get_secret_value(
            SecretId=iam_username, VersionStage='AWSPREVIOUS')
        previous_secret_data = json.loads(
            previous_secret_value['SecretString'])
        previous_access_key = previous_secret_data['AccessKey']

        print(f'Deactivating access key {previous_access_key} '
              f'for IAM user {iam_username}')

        iam.update_access_key(AccessKeyId=previous_access_key,
                              Status='Inactive', UserName=iam_username)

        emailmsg = 'Hello,\n\n' \
            f'The previous access key {previous_access_key}'

        emailmsg = f'{emailmsg} has been disabled for {iam_username}.\n\n' \
            'This key will be deleted in the next 14 days. ' \
            'If your application has lost access, be sure to update the ' \
            'access key.\n You can find the new key by looking up the secret ' \
            f'"{iam_username}" under secrets manager via AWS Console ' \
            f'in {AWS_REGION_NAME}.\n\n'

        sns.publish(
            TopicArn=SNS_TOPIC_ARN, Message=emailmsg,
            Subject='AWS Access Key Rotation: Previous key deactivated '
                    f'for {iam_username}')
        print('Access key has been deactivated')
        return {'status': 200}
    except ClientError as e:
        print(e)
        return {'status': 500}


def delete_key(iam_username):
    '''Deletes the deactivated access key on the given IAM user. Returns
    a JSON with status 200 if successful, 500 on error, and 400 if the
    secret value does not match the inactive key.

    iam_username - The IAM user's username as a string.
    '''
    try:
        previous_secret_value = secretmanager.get_secret_value(
            SecretId=iam_username, VersionStage='AWSPREVIOUS')
        previous_secret_string = json.loads(
            previous_secret_value['SecretString'])
        previous_access_key_id = previous_secret_string['AccessKey']
        print(f'previous_access_key_id: {previous_access_key_id}')
        keylist = iam.list_access_keys(UserName=iam_username)[
            'AccessKeyMetadata']

        for key in keylist:
            key_status = key['Status']
            key_id = key['AccessKeyId']

            print(f'key id: {key_id}')
            print(f'key status: {key_status}')

            if key_status == 'Inactive':
                if previous_access_key_id == key_id:
                    print('Deleting previous access key from IAM user')
                    iam.delete_access_key(
                        UserName=iam_username, AccessKeyId=key_id)
                    print(f'Previous access key: '
                          f'{key_id} has been deleted for user '
                          f'{iam_username}.')
                    return {'status': 200}
                else:
                    print('Secrets manager previous value doesn\'t match '
                          'the inactive IAM key value')
                    return {'status': 400}
            else:
                print('Previous key is still active')
        return {'status': 200}
    except ClientError as e:
        print(e)
        return {'status': 500}


def lambda_handler(event, context):
    action = event["action"]
    iam_username = event["username"]
    status = {'status': 500}

    print(f'Detected Action: {action}')
    print(f'Detected IAM username: {iam_username}')

    if action == "create":
        status = create_key(iam_username)
    elif action == "deactivate":
        status = deactivate_key(iam_username)
    elif action == "delete":
        status = delete_key(iam_username)

    return status

5. Create Event bridge rule to trigger creating access key.

  • Select “default” event bus
  • Define the pattern to use “Schedule”
    • Set Fixed Rate to every 90 days
  • Set the target as “Lambda function”
    • Set the Function to the lambda function name in step 4
    • Set “Configure input” setting to “Constant (JSON text)”
      • Set value to { "action": "create", "username": "<the iam username in step 1>" }
  • Add tags as necessary
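To sanity-check the constant input before wiring it to the real function, you can feed the same JSON to a stub handler locally; the username below is a placeholder, not a value from this post:

```python
import json

def lambda_handler(event, context):
    # Minimal stand-in for the real handler in step 4: it only echoes
    # the routing fields that EventBridge supplies as constant input.
    return {'action': event['action'], 'username': event['username']}

# The constant JSON text configured on the 90-day rule:
event = json.loads('{"action": "create", "username": "example-iam-user"}')
result = lambda_handler(event, None)
```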

6. Create Event bridge rule to trigger deactivating access key.

  • Select “default” event bus
  • Define the pattern to use “Schedule”
    • Set Fixed Rate to every 104 days
  • Set the target as “Lambda function”
    • Set the Function to the lambda function name in step 4
    • Set “Configure input” setting to “Constant (JSON text)”
      • Set value to { "action": "deactivate", "username": "<the iam username in step 1>" }

7. Create Event bridge rule to trigger deleting deactivated access key.

  • Select “default” event bus
  • Define the pattern to use “Schedule”
    • Set Fixed Rate to every 118 days
  • Set the target as “Lambda function”
    • Set the Function to the lambda function name in step 4
    • Set “Configure input” setting to “Constant (JSON text)”
      • Set value to { "action": "delete", "username": "<the iam username in step 1>" }

8. Create IAM Policy to enable our lambda function to 1) access the Secrets Manager secret, 2) manage the IAM user’s access keys, and 3) publish to the SNS topic.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:PutSecretValue"
            ],
            "Resource": "<secrets manager secret arn>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateAccessKey",
                "iam:UpdateAccessKey",
                "iam:DeleteAccessKey"
            ],
            "Resource": "<iam user arn>"
        },
        {
            "Effect": "Allow",
            "Action": "iam:ListAccessKeys",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "sns:Publish",
            "Resource": "<sns topic arn>"
        }
    ]
}

9. Create IAM Policy to grant IAM User permissions to access secrets manager secret that stores AWS Access Key and Secret Access Key.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
               "<list of actions the API access should grant>"
            ],
            "Resource": [
               "<the resources access should be granted to>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": "<secrets manager secret arn>"
        }
    ]
}

10. Revisit IAM user and attach the new policy created in step 9.

11. Revisit IAM Lambda role and attach the new policy created in step 8.

12. Attach AWS Managed IAM Policy “AWSLambdaBasicExecutionRole” to IAM lambda role.

13. Revisit the secrets manager secret in step 3 and add the following policy to “Resource Permissions”:

{
  "Version" : "2012-10-17",
  "Statement" : [ {
    "Sid" : "AllowLambdaFunctionReadWriteAccess",
    "Effect" : "Allow",
    "Principal" : {
      "AWS" : "<lambda iam role>"
    },
    "Action" : [ "secretsmanager:GetSecretValue", "secretsmanager:PutSecretValue" ],
    "Resource" : "<the arn of the secret>"
  }, {
    "Sid" : "AllowIAMUserReadAccess",
    "Effect" : "Allow",
    "Principal" : {
      "AWS" : "<the arn of the iam user>"
    },
    "Action" : "secretsmanager:GetSecretValue",
    "Resource" : "<the arn of the secret>"
  } ]
}


By using EventBridge rules, we can set schedules to trigger the lambda function and pass the event data needed to process key rotation. In our design, we give the developers 14 days to rotate their keys and an additional 14-day grace period before deleting the old keys permanently. Notifications are delivered to the software developers by subscribing them to the SNS topic.

We use Secrets Manager to store the access key and secret access key. Since Secrets Manager maintains versions of each secret, the lambda function can leverage this to match the last issued access key Id in order to deactivate it during key rotation.

The IAM user and policy are set up to grant the user access to the access key secret, along with whatever permissions the software developers need. The IAM role and policy allow the lambda to execute via the AWSLambdaBasicExecutionRole managed policy, with a custom or inline policy granting access to the access key secret and to the SNS topic for publishing notifications during key rotation and deactivation.

Coming Soon

In this post, you’ve seen how I build these resources manually using AWS Console. Benhur P.’s post provides you with an overview of how to create the same resources using CloudFormation.

In a follow-up post at a later date, I will demonstrate how this can be done using Terraform, a tool that I use on a daily basis.


This post was made possible by these amazing authors; my ideas were expressed and built on top of their work.

lsof command

The lsof command is short for “list open files”. It lists all files that are open on the operating system. This is great for troubleshooting networking issues because every open socket on a Linux/Unix operating system is treated as a file.


lsof -i <protocol><@hostname or host address>:<service or port>

-i by default will prioritize IPv4 addresses above IPv6 but will list both. If you prefer the tool to list only one, specify a 4 or 6 after the “i” (example: lsof -i4 …)

Example of Searching TCP Connection By App

The following will list all sockets opened with TCP protocol for Brave Browser.

lsof -i tcp | grep Brave

Example Output

Brave\x20 10426 user   23u  IPv4 0x1bf754502b0433d3      0t0  TCP <local address>:<port>-><remote address>:<port> (ESTABLISHED)
Brave\x20 10426 user   35u  IPv4 0x1bf754502d4c1313      0t0  TCP <local address>:<port>-><remote address>:<port> (ESTABLISHED)

Example of Searching TCP Connection By Port

The following will list all sockets opened on TCP Protocol over the port number 10426

lsof -i tcp:10426

Example Output

Brave\x20 10426 user   23u  IPv4 0x1bf754502b0433d3      0t0  TCP <local address>:<port>-><remote address>:<port> (ESTABLISHED)
Brave\x20 10426 user   35u  IPv4 0x1bf754502d4c1313      0t0  TCP <local address>:<port>-><remote address>:<port> (ESTABLISHED)

By doing so you can see that TCP port 10426 is being used by Brave Browser.