My company operates a few self-managed GitLab EC2 instances in AWS. One of the challenges was to figure out a way to monitor the health of those instances, in particular disk utilization.
Looking at CloudWatch metrics, one key metric that is missing for EC2 instances is disk utilization. AWS cannot obtain this information from outside the instance, so it has to be reported by an agent running on the guest operating system.
Our challenge was to capture that metric and, on top of that, set up alerting through Microsoft Teams. Teams is the tool of choice in most medium to large corporations, and using it removes the need to set up email subscriptions on SNS (although there are plenty of other alerting options available through SNS).
Solution Architecture Diagram

Pre-Requisites
- AWS account with permissions to create and edit IAM roles, Lambda functions, SNS topics, and CloudWatch alarms
- Admin rights on Microsoft Teams to create channels and configure connectors
- GitLab EC2 instance with access to public Internet
Setup IAM Role On EC2 Instance
If the EC2 instance already has an attached role
- From AWS IAM console, select “roles” on the left menu panel.
- Search for the role used by the GitLab EC2 instance and click on its name.
- Under “permissions policies” section, click on “Add permissions” button.
- Select “Attach policies”
- Search for “CloudWatchAgentServer” managed IAM policy.
- Check off the policy and click the attach button.
If the EC2 instance does not use an IAM role
- From AWS IAM console, select “roles” from the left menu panel
- Click on “create role” button.
- Select “AWS Service” as trusted entity type.
- Select “EC2” as use case and click next.
- Search for “CloudWatchAgentServer” in Add Permissions page and check off the IAM policy and hit next button.
- Fill in a name for the role and optionally the tags, then click the “create role” button. (A scripted alternative for both paths is sketched below.)
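If you prefer to script these role steps instead of clicking through the console, here is a minimal boto3 sketch for the “no existing role” path; the role name is a placeholder I chose for illustration, and for the “role already attached” path only the attach_role_policy call is needed.

import json
import boto3

iam = boto3.client("iam")

ROLE_NAME = "gitlab-ee-cloudwatch-agent"  # placeholder name, pick your own

# Trust policy that lets EC2 instances assume the role
# (skip create_role if the instance already has a role attached).
ec2_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}
iam.create_role(
    RoleName=ROLE_NAME,
    AssumeRolePolicyDocument=json.dumps(ec2_trust_policy)
)

# Attach the AWS-managed policy that lets the CloudWatch agent publish metrics.
iam.attach_role_policy(
    RoleName=ROLE_NAME,
    PolicyArn="arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
)

Keep in mind that a brand-new role also needs an instance profile before it can be attached to an EC2 instance; the console creates one for you automatically, a script has to do it explicitly.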
Setup IAM Role and Policy for Lambda Function
Setup Policy for Lambda Customer KMS Key
If you use a customer-managed KMS key for your Lambda functions, you will need to create an IAM policy that grants the role used by the Lambda function access to that key.
Before we begin, make sure you go to AWS KMS console and retrieve your KMS Key’s ARN.
1. From AWS IAM console, select “policies” from the left menu panel.
2. Click on “create policy” button.
3. Select the “JSON” tab and paste the following code below. Be sure to enter your KMS key’s ARN as the value of the Resource field.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LambdaKMSKeyAccess",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": "<your kms key ARN>"
        }
    ]
}
4. Click on “Next: Tags” button to optionally add tags.
5. Click on “Next: Review” button and provide a name and description for the IAM Policy; As an example, I have named my IAM Policy “gitlab-alert-lambda-kms-access”.
6. Click “Create policy” button.
7. From AWS IAM console, select “roles” from the left menu panel
8. Click on “create role” button.
9. Select “AWS Service” as trusted entity type.
10. Select “Lambda” as the use case and click next.
11. Search for “gitlab-alert-lambda-kms-access” in Add Permissions page and check off the IAM policy.
12. Search for “AWSLambdaBasicExecutionRole” on the same page and check off the IAM policy.
13. Click the next button.
14. Fill in a name for the role and optionally the tags, then click the “create role” button; As an example, I named my role “gitlab-alert-lambda”. (A boto3 sketch of these role steps follows below.)
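As with the EC2 role, these steps can be scripted. The sketch below assumes the “gitlab-alert-lambda-kms-access” policy from steps 1–6 already exists; replace <account id> with your own account number.

import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets the Lambda service assume the role.
lambda_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}
iam.create_role(
    RoleName="gitlab-alert-lambda",
    AssumeRolePolicyDocument=json.dumps(lambda_trust_policy)
)

# Basic execution permissions (CloudWatch Logs) plus the customer KMS policy.
iam.attach_role_policy(
    RoleName="gitlab-alert-lambda",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
)
iam.attach_role_policy(
    RoleName="gitlab-alert-lambda",
    PolicyArn="arn:aws:iam::<account id>:policy/gitlab-alert-lambda-kms-access"
)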
Setup CloudWatch Agent On EC2 Instance
Note: The following commands are based on the Amazon Linux 2 AMI and should also work for CentOS. For installation instructions for other operating systems, visit the AWS documentation.
1. SSH into your EC2 instance.
2. sudo yum update -y && sudo yum upgrade -y
3. sudo yum install amazon-cloudwatch-agent -y
4. With the editor of your choice, edit /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
5. Paste the following JSON text and modify as needed
{
    "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "cwagent"
    },
    "metrics": {
        "append_dimensions": {
            "InstanceId": "${aws:InstanceId}"
        },
        "metrics_collected": {
            "disk": {
                "measurement": [
                    "used_percent"
                ],
                "metrics_collection_interval": 60,
                "resources": [
                    "/"
                ]
            }
        }
    }
}
Explanation of the JSON Properties
- metrics_collected: the metrics that the CloudWatch agent will send to CloudWatch; in this case, the disk used percentage specified by the configuration above.
- resources: the disks that the CloudWatch agent should collect metrics on; here, the primary volume. Since it is defined as an array, you may add additional volumes as necessary.
- metrics_collection_interval: the interval, in seconds, at which the CloudWatch agent collects metrics on the EC2 instance.
For more configuration parameters, see this AWS Documentation.
6. Save the configuration file and exit the editor.
7. Restart the CloudWatch agent and check the logs by executing the following.
sudo systemctl restart amazon-cloudwatch-agent
tail -f /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
The metrics should become available in CloudWatch in the next few minutes.
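To confirm the metric is arriving, you can look under the CWAgent namespace in the CloudWatch console, or run a quick boto3 check such as the sketch below (the instance id is a placeholder).

import boto3

cloudwatch = boto3.client("cloudwatch")

# List the disk_used_percent metrics reported for your GitLab instance.
response = cloudwatch.list_metrics(
    Namespace="CWAgent",
    MetricName="disk_used_percent",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}]
)

for metric in response["Metrics"]:
    print(metric["MetricName"], metric["Dimensions"])

The dimensions printed here (InstanceId, device, fstype, path) are the same ones you will select when creating the CloudWatch alarm below.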
Create SNS Topic
1. Log into the AWS Management Console and navigate to SNS.
2. Click the “Create topic” button.
3. Create a standard SNS topic named “gitlab-ee-alerts” and configure encryption as necessary.
4. Optionally, add email subscriptions by clicking into the new topic, then clicking the “create subscription” button under the “subscriptions” tab. (A boto3 sketch of these steps follows below.)
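A rough boto3 equivalent of these steps; the topic name matches the one above, while the KMS key alias and email address are placeholders.

import boto3

sns = boto3.client("sns")

# Create the standard topic; the KMS key id/alias is optional.
topic = sns.create_topic(
    Name="gitlab-ee-alerts",
    Attributes={"KmsMasterKeyId": "alias/aws/sns"}
)

# Optional email subscription; the recipient must confirm via email.
sns.subscribe(
    TopicArn=topic["TopicArn"],
    Protocol="email",
    Endpoint="ops@example.com"
)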
Setup CloudWatch Alarm
1. Navigate to AWS CloudWatch console.
2. From the left vertical menu panel, expand “Alarms” section and select “all alarms”.
3. Select “create alarm” button.
4. Click on “select metric” button.
5. Under “Custom namespaces” section of Metrics, select “CWAgent”

6. Select “InstanceId, device, fstype, path” box; These are the types of dimensions for which metrics are reported.
7. Find the entry that matches your GitLab instance name and instance Id and check the checkbox next to it.
8. Click on “select metric button”.
9. Configure the alarm with your desired threshold configurations and click “next”.
For my scenario, I have the following configurations:
- Statistic: Average
- Period: 10 mins
- Threshold Type: Static
- Whenever disk_used_percent is: Greater/Equal (>=) 90 (these settings are mirrored in the boto3 sketch after this list)
10. In the “Configure actions” step, we will create 2 notifications. The first notification is for the “In alarm” state; while creating this notification, choose “Select an existing SNS topic” and pick the “gitlab-ee-alerts” topic.
11. At the bottom of notifications section, click on “add notification” button.
12. In this new notification, set the state trigger to “OK” and use the same configuration as the one shown above.
13. Scroll down to the bottom of the page and click “next”.
14. Set the name of the alarm as “gitlab-high-disk-utilization” and a meaningful description such as “Gitlab Server disk utilization has reached 90% and above” and click “next”
15. Review the contents and select “create alarm”.
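For reference, the same alarm can be created with boto3. This sketch mirrors the settings above; the instance id, device, and fstype values are placeholders and must match the dimensions reported under the CWAgent namespace exactly, and the topic ARN needs your region and account number.

import boto3

cloudwatch = boto3.client("cloudwatch")

TOPIC_ARN = "arn:aws:sns:<region>:<account number>:gitlab-ee-alerts"

cloudwatch.put_metric_alarm(
    AlarmName="gitlab-high-disk-utilization",
    AlarmDescription="Gitlab Server disk utilization has reached 90% and above",
    Namespace="CWAgent",
    MetricName="disk_used_percent",
    Dimensions=[
        {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},
        {"Name": "path", "Value": "/"},
        {"Name": "device", "Value": "xvda1"},
        {"Name": "fstype", "Value": "xfs"}
    ],
    Statistic="Average",
    Period=600,                # 10 minutes
    EvaluationPeriods=1,
    Threshold=90,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[TOPIC_ARN],  # "In alarm" notification
    OKActions=[TOPIC_ARN]      # "OK" notification
)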
Setup Microsoft Teams Channel
1. In Microsoft Teams, select the “Teams” tab.
2. In your group, click on the ellipsis (…) and select “add channel”.
3. Create a channel named “Gitlab Alert” and click “add”.
4. Mouse over the “Gitlab Alert” channel and click on the ellipsis (…).
5. Select “Connectors”.
6. Search for “incoming webhook” and click the “configure” button next to it.
7. Upload an image for Gitlab and provide a name such as “Gitlab Alert” for the webhook and click on “create”.
8. Copy the generated URL and save it; it will be used in the Lambda function section below. (A quick script to test the webhook follows below.)
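Before wiring up the Lambda function, you can verify the webhook works with a short Python script. This sketch posts a test MessageCard in the same format the Lambda function uses; replace the placeholder with your copied URL.

import json
from urllib.request import Request, urlopen

HOOK_URL = "<your webhook url>"  # placeholder; paste the URL you just copied

card = {
    "@context": "https://schema.org/extensions",
    "@type": "MessageCard",
    "themeColor": "64a837",
    "title": "Test: Gitlab Alert channel",
    "text": "If you can read this, the incoming webhook is working."
}

req = Request(
    HOOK_URL,
    json.dumps(card).encode("utf-8"),
    headers={"Content-Type": "application/json"}
)
urlopen(req).read()

A card should appear in the “Gitlab Alert” channel within a few seconds.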
Create Lambda Function To Call Microsoft Teams Webhook
1. Go to AWS Lambda under AWS Management Console.
2. Create a lambda function named “gitlab-ms-teams-alert”.
3. Expand “Change default execution role” section under “Permissions” heading.
4. Select “Use existing role” and choose the Lambda role you’ve created (i.e., the “gitlab-alert-lambda” role).

5. Copy and paste the following code into the code section.
import json
import logging
import os
import re
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

HOOK_URL = os.environ['HookUrl']

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    logger.info("Event: " + str(event))
    message = json.loads(event['Records'][0]['Sns']['Message'])
    logger.info("Message: " + str(message))

    metric_dimensions = message['Trigger']['Dimensions']
    instance_id_result = [item['value'] for item in metric_dimensions
                          if item['name'] == 'InstanceId']
    instance_id = f"{instance_id_result[0]}" if instance_id_result != [] else ""

    alarm_name = message['AlarmName']
    old_state = message['OldStateValue']
    new_state = message['NewStateValue']
    reason = message['NewStateReason']
    alarm_description = message['AlarmDescription']

    # Format message
    is_missing_datapoints = re.findall('no datapoints were received', reason)
    is_alarm_in_ok_state = new_state == "OK"

    if is_missing_datapoints:
        output_message = f"CloudWatch metrics is unable to obtain " \
                         f"datapoints for Gitlab server. Ensure " \
                         f"CloudWatch Agent is " \
                         f"running using the command `sudo tail -f /opt/aws/" \
                         f"amazon-cloudwatch-agent/logs/" \
                         f"amazon-cloudwatch-agent.log` on the instance. " \
                         f"Restart the service by running " \
                         f"`sudo systemctl restart amazon-cloudwatch-agent` if necessary."
    elif is_alarm_in_ok_state:
        output_message = f"GitLab Instance is fully " \
                         f"operational."
    else:
        output_message = f"{alarm_description}!"

    data = {
        "colour": "64a837",
        "title": f"Resolved: {alarm_name}",
        "text": output_message
    }

    if new_state.lower() == 'alarm':
        data = {
            "colour": "d63333",
            "title": f"Alert: {alarm_name}",
            "text": output_message
        }

    message = {
        "@context": "https://schema.org/extensions",
        "@type": "MessageCard",
        "themeColor": data["colour"],
        "title": data["title"],
        "text": data["text"]
    }

    req = Request(HOOK_URL, json.dumps(message).encode('utf-8'))
    try:
        response = urlopen(req)
        response.read()
        logger.info("Message posted")
        return {"status": "200 OK"}
    except HTTPError as e:
        logger.error("Request failed: %d %s", e.code, e.reason)
    except URLError as e:
        logger.error("Server connection failed: %s", e.reason)
6. Select “Configuration” tab, then select “Environment Variables” from the side menu.
7. Click on the “edit” button and enter “HookUrl” as key and the WebHook URL as the value and click “save” button.
8. From the side menu, select “Triggers”.
9. Click on “Add trigger” button.
10. Select “SNS” trigger.
11. Enter “gitlab-ee-alerts” as the SNS topic name and click the “add” button. (A sketch for testing the function locally follows below.)
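To sanity-check the handler before (or after) adding the trigger, you can invoke it locally with a hand-built, SNS-shaped event. The sketch below assumes the handler lives in the default lambda_function.py module and only fills in the fields the handler actually reads; the values are illustrative, and the webhook URL must be a real one for a card to be posted.

import json
import os

# The handler reads HookUrl at import time, so set it first.
os.environ["HookUrl"] = "<your webhook url>"  # placeholder

from lambda_function import lambda_handler

# Minimal CloudWatch-alarm-style SNS message with only the fields the handler uses.
alarm_message = {
    "AlarmName": "gitlab-high-disk-utilization",
    "AlarmDescription": "Gitlab Server disk utilization has reached 90% and above",
    "OldStateValue": "OK",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed: 1 datapoint was greater than or equal to the threshold (90.0).",
    "Trigger": {
        "Dimensions": [{"name": "InstanceId", "value": "i-0123456789abcdef0"}]
    }
}

event = {"Records": [{"Sns": {"Message": json.dumps(alarm_message)}}]}
print(lambda_handler(event, None))

With a real webhook URL, this should drop a red “Alert: gitlab-high-disk-utilization” card into the Teams channel.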
Troubleshooting
CloudWatch Alarm is not triggering SNS Topic to send alerts to Microsoft Teams Channel due to unauthorized access to SNS Topic encryption key.
“Failed to execute action arn:aws:sns:<region>:<account number>:gitlab-ee-alerts. Received error: CloudWatch Alarms does not have authorization to access the SNS topic encryption key.”
This error indicates that the SNS topic you’ve configured uses a customer-managed KMS key to which the CloudWatch service does not have access.
To provide access, go to the AWS KMS console, select “customer managed keys” on the left menu panel, select the key used by the SNS topic, and add the following statement to its key policy (a boto3 sketch for applying it follows the statement):
{
    "Sid": "Allow_CloudWatch_for_CMK",
    "Effect": "Allow",
    "Principal": {
        "Service": [
            "cloudwatch.amazonaws.com"
        ]
    },
    "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey*"
    ],
    "Resource": "*"
}
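If you prefer to apply the change programmatically, a boto3 sketch along these lines should work; the key id is a placeholder, and the existing policy is fetched and extended first because put_key_policy replaces the whole policy document.

import json
import boto3

kms = boto3.client("kms")
KEY_ID = "<your sns topic kms key id>"  # placeholder

# Fetch the current key policy, append the CloudWatch statement, and write it back.
policy = json.loads(kms.get_key_policy(KeyId=KEY_ID, PolicyName="default")["Policy"])
policy["Statement"].append({
    "Sid": "Allow_CloudWatch_for_CMK",
    "Effect": "Allow",
    "Principal": {"Service": ["cloudwatch.amazonaws.com"]},
    "Action": ["kms:Decrypt", "kms:GenerateDataKey*"],
    "Resource": "*"
})
kms.put_key_policy(KeyId=KEY_ID, PolicyName="default", Policy=json.dumps(policy))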
Credits And Reference
1. Monitoring EC2 disk usage with CloudWatch Metrics and Alarms – Marbot
2. Why I didn’t receive SNS notification for my CloudWatch alarm trigger – AWS SNS Knowledge Center