slic-watch
Automatic, best-practice CloudWatch Dashboards and Alarms for your SAM, CloudFormation, CDK and Serverless Framework applications.
SLIC Watch supports: AWS Lambda, API Gateway, DynamoDB, Kinesis Data Streams, SQS Queues, Step Functions, ECS (Fargate or EC2), SNS, EventBridge, Application Load Balancer and AppSync.
Supported tools include:
- ⚡️ Serverless Framework v3 via the SLIC Watch Serverless Plugin
- 🐿 AWS SAM, 📦 AWS CDK and ☁️ CloudFormation using the CloudFormation Macro, published in the Serverless Application Repository (SAR).
Getting Started with Serverless Framework
If you are using AWS SAM or CloudFormation, skip to the section below.
- 📦 Install the plugin:

    npm install serverless-slic-watch-plugin --save-dev

- 🖋️ Add the plugin to the plugins section of serverless.yml:

    plugins:
      - serverless-slic-watch-plugin

- 🪛 Optionally, add some configuration for the plugin to the custom -> slicWatch section of serverless.yml. Here, you can specify a reference to the SNS topic for alarms. This is optional, but it's usually something you want so you can receive alarm notifications via email, Slack, etc.

    custom:
      slicWatch:
        alarmActionsConfig: { alarmActions: [{'Fn::Ref': myTopic}] }

See the Configuration section below for more detailed instructions on fine tuning SLIC Watch to your needs.
- 🚢 Deploy your application in the usual way, for example:

    sls deploy

- 👀 Head to the CloudWatch section of the AWS Console to check out your new dashboards 📈 and alarms ⏰!
Getting Started with AWS SAM, CDK or CloudFormation
ℹ️ IMPORTANT: If you are using AWS SAM, CDK, or plain CloudFormation, the SLIC Watch Macro must be deployed to your AWS account and region before you do anything else. Once that's done, it is very simple to add this macro as a transform to your SAM or CloudFormation template.
Deploying the SLIC Watch Macro
It would be nice if CloudFormation allowed us to publicly publish a macro so you don't need this step, but for now, you can deploy the SLIC Watch Macro using any of the following options. We have made the macro available as a Serverless Application Repository (SAR) app. This SAR app is used in Options 1 and 2 below. Option 3 is a manual option where you deploy the macro from this repository directly without using SAR.
- Option 1 (using the Serverless Application Repository (SAR) console): Go to SLIC Watch in the Serverless Application Repository and click the Deploy button.
- Option 2 (using SAR with CloudFormation): If you prefer to automate the deployment of SAR apps using Infrastructure as Code, you can add the SAR app as a resource in any CloudFormation template. Note that this cannot be the same template as the application in which you want to use SLIC Watch! The snippet of CloudFormation is as follows.

    Resources:
      ...
      SlicWatchMacro:
        Type: AWS::Serverless::Application
        Properties:
          Location:
            ApplicationId: arn:aws:serverlessrepo:eu-west-1:949339270388:applications/slic-watch-app
            SemanticVersion: <enter latest version>

To determine the list of available versions, you can use the AWS CLI:

    aws serverlessrepo list-application-versions \
      --application-id arn:aws:serverlessrepo:eu-west-1:949339270388:applications/slic-watch-app

- Option 3 (manual Macro deployment using SAM directly from source):

    npm install
    sam build --base-dir . --template-file cf-macro/template.yaml
    sam deploy --guided

Adding the SLIC Watch Transform to SAM or CloudFormation templates
Once you have deployed the macro, you can start using SLIC Watch in SAM or CloudFormation templates by adding this to the Transform section:

    Transform:
      - ...
      - SlicWatch-v3

🪛 Optionally, add some configuration for the plugin to the Metadata -> slicWatch section of template.yml. Here, you can specify a reference to the SNS topic for alarms. This is optional, but it's usually something you want so you can receive alarm notifications via email, Slack, etc.

    Metadata:
      slicWatch:
        enabled: true
        alarmActionsConfig:
          alarmActions:
            - !Ref MonitoringTopic

See the Configuration section below for more detailed instructions on fine tuning SLIC Watch to your needs.
If you want to override the default alarm and dashboard settings for an individual Lambda Function resource, add the slicWatch property to that resource's Metadata section.
Adding the SLIC Watch Transform to CDK Apps
Once you have deployed the macro, add it to each CDK Stack in the constructor of the class that extends Stack. This must be done for every Stack in the CDK App.

    // JavaScript/TypeScript:
    export class MyStack extends cdk.Stack {
      constructor (scope: cdk.App, id: string, props?: cdk.StackProps) {
        super(scope, id, props)
        this.addTransform('SlicWatch-v3')
        ...
      }
    }

    # Python:
    self.add_transform("SlicWatch-v3")

    // C#:
    this.AddTransform("SlicWatch-v3");

    // Java:
    this.addTransform("SlicWatch-v3");

🪛 Optionally, add some configuration for the plugin as below:

    this.templateOptions.metadata = {
      slicWatch: {
        enabled: true,
        alarmActionsConfig: {
          alarmActions: ["arn:aws:sns:eu-west-1:xxxxxxx:topic"],
          okActions: ["arn:aws:sns:eu-west-1:xxxxxxx:topic"],
          actionsEnabled: true
        }
      }
    }

Features
CloudWatch Alarms and Dashboard widgets are created for all supported resources in the CloudFormation stack generated by the Serverless Framework. This includes generated resources as well as resources specified explicitly in the resources section.
Any feature can be configured or disabled completely. See the Configuration section for details.
Lambda Functions
Lambda Function alarms are created for:
- Errors
- Throttles, as a percentage of the number of invocations
- Duration, as a percentage of the function's configured timeout
- Invocations, disabled by default
- IteratorAge, for functions triggered by an Event Source Mapping
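
As a rough illustration of how percentage-based thresholds like the ones above can be turned into concrete numbers, here is a minimal TypeScript sketch. The helper names (durationThresholdMs, throttlesPercent) and the 95% figure are hypothetical and are not taken from the SLIC Watch source. The sketch only relies on the fact that Lambda timeouts are configured in seconds while CloudWatch reports Duration in milliseconds.

```typescript
// Hypothetical helper (not part of SLIC Watch): converts a percentage of a
// function's configured timeout into a Duration threshold in milliseconds.
function durationThresholdMs (timeoutSeconds: number, percentOfTimeout: number): number {
  if (timeoutSeconds <= 0) {
    throw new RangeError('timeoutSeconds must be positive')
  }
  if (percentOfTimeout <= 0 || percentOfTimeout > 100) {
    throw new RangeError('percentOfTimeout must be in the range (0, 100]')
  }
  // Timeout is in seconds, CloudWatch Duration is in milliseconds.
  return (timeoutSeconds * 1000 * percentOfTimeout) / 100
}

// Hypothetical helper: expresses throttles as a percentage of invocations.
// Returning 0 when there were no invocations is a choice made for this sketch.
function throttlesPercent (throttles: number, invocations: number): number {
  if (invocations === 0) {
    return 0
  }
  return (throttles * 100) / invocations
}

// A function with a 30 second timeout and an illustrative 95% threshold.
console.log(durationThresholdMs(30, 95)) // 28500
// 5 throttles against 200 invocations.
console.log(throttlesPercent(5, 200)) // 2.5
```

Expressing thresholds relative to the timeout or the invocation count means the same configuration stays meaningful for functions with very different timeouts and traffic levels.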
Lambda dashboard widgets show:
Errors | Throttles | Duration Average, P95 and Maximum |
---|---|---|
Invocations | Concurrent Executions | Iterator Age |
API Gateway
API Gateway alarms are created for:
- 5XX Errors
- 4XX Errors
- Latency
API Gateway dashboard widgets show:
5XX Errors | 4XX Errors | Latency | Count |
---|---|---|---|
DynamoDB
DynamoDB alarms are created for:
- Read Throttle Events (Table and GSI)
- Write Throttle Events (Table and GSI)
- UserErrors
- SystemErrors
Dashboard widgets are created for tables and GSIs:
ReadThrottleEvents (Table) | WriteThrottleEvents (Table) |
---|---|
ReadThrottleEvents (GSI) | WriteThrottleEvents (GSI) |
Kinesis Data Streams
Kinesis data stream alarms are created for:
- Iterator Age
- Read Provisioned Throughput Exceeded
- Write Provisioned Throughput Exceeded
- PutRecord.Success
- PutRecords.Success
- GetRecords.Success
Kinesis data stream dashboard widgets show:
Iterator Age | Read Provisioned Throughput Exceeded | Write Provisioned Throughput Exceeded |
---|---|---|
SQS Queues
SQS Queue alarms are created for:
- Age Of Oldest Message (disabled by default). If enabled, a threshold in seconds should be specified.
- In Flight Messages Percentage. This is a percentage of the AWS hard limits (20,000 messages for FIFO queues and 120,000 for standard queues).
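
To make the In Flight Messages Percentage concrete, the following TypeScript sketch converts a percentage into an absolute message count using the two hard limits quoted above. The function and constant names are hypothetical, and rounding down is an assumption of this sketch rather than documented SLIC Watch behaviour.

```typescript
type QueueKind = 'standard' | 'fifo'

// In-flight message hard limits as quoted in this README.
const IN_FLIGHT_HARD_LIMIT: Record<QueueKind, number> = {
  standard: 120000,
  fifo: 20000
}

// Hypothetical helper (not part of SLIC Watch): the number of in-flight
// messages that corresponds to a given percentage of the hard limit.
function inFlightMessagesThreshold (kind: QueueKind, percent: number): number {
  if (percent <= 0 || percent > 100) {
    throw new RangeError('percent must be in the range (0, 100]')
  }
  // Rounding down is a choice made for this sketch.
  return Math.floor((IN_FLIGHT_HARD_LIMIT[kind] * percent) / 100)
}

console.log(inFlightMessagesThreshold('fifo', 80)) // 16000
console.log(inFlightMessagesThreshold('standard', 80)) // 96000
```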
SQS queue dashboard widgets show:
Messages Sent, Received and Deleted | Messages Visible | Age of Oldest Message |
---|---|---|
Step Functions
Step Function alarms are created for:
- Execution Throttled
- Executions Failed
- Executions Timed Out
The dashboard contains one widget per Step Function:
ExecutionsFailed, ExecutionThrottled, ExecutionsTimedOut |
---|
ECS / Fargate
ECS alarms are created for Fargate or EC2 clusters:
- Memory Utilization
- CPU Utilization
SNS
SNS alarms are created for:
- Number of Notifications Filtered Out due to Invalid Attributes
- Number of Notifications Failed
SNS Topic dashboard widgets show:
Messages Filtered Out - Invalid Attributes | Notifications Failed |
---|---|
EventBridge
EventBridge alarms are created for:
- Failed Invocations
- Throttled Rules
EventBridge Rule dashboard widgets show:
Failed Invocations | Invocations |
---|---|
Application Load Balancer
Application Load Balancer alarms are created for:
- HTTP Code ELB 5XX Count
- Rejected Connection Count
- HTTP Code Target 5XX Count
- UnHealthy Host Count
- Lambda Internal Error
- Lambda User Error
Application Load Balancer dashboard widgets show:
HTTP Code ELB 5XX Count | HTTP Code Target 5XX Count | Rejected Connection Count |
---|---|---|
UnHealthy Host Count | Lambda User Error | Lambda Internal Error |
AppSync
AppSync alarms are created for:
- 5XX Error
- Latency
AppSync dashboard widgets show:
5XX Error, Latency, 4XX Error, Request |
---|
Connect Server Error, Disconnect Server Error, Subscribe Server Error, Unsubscribe Server Error, PublishDataMessageServerError |
Configuration
Configuration is entirely optional - SLIC Watch provides defaults that work out of the box.
Note: Alarm configuration is cascading. This means that configuration properties are automatically propagated from parent nodes to child nodes (unless an override is present at the given node).
You can customize the configuration:
- at the top level, for all resources in each service, and/or
- at the level of individual resources
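
The cascading behaviour described above can be pictured with a small TypeScript sketch. This is an illustration of the general idea only, under the assumption that every non-object property is inherited by all descendant nodes; it is not the SLIC Watch implementation, and the cascade function and the example keys (Period, Lambda, Errors) are chosen here purely for demonstration.

```typescript
type ConfigValue = string | number | boolean | ConfigNode
interface ConfigNode { [key: string]: ConfigValue }

function isNode (value: ConfigValue): value is ConfigNode {
  return typeof value === 'object' && value !== null
}

// Hypothetical sketch: copies every non-object property of a node (plus
// anything inherited from its ancestors) down to each child node.
// A value set directly on a child overrides the inherited one.
function cascade (node: ConfigNode, inherited: ConfigNode = {}): ConfigNode {
  const scalars: ConfigNode = { ...inherited }
  for (const [key, value] of Object.entries(node)) {
    if (!isNode(value)) {
      scalars[key] = value
    }
  }
  const result: ConfigNode = { ...scalars }
  for (const [key, value] of Object.entries(node)) {
    if (isNode(value)) {
      result[key] = cascade(value, scalars)
    }
  }
  return result
}

const exampleConfig: ConfigNode = {
  enabled: true,
  Period: 60,
  Lambda: {
    Errors: { Threshold: 0 },
    Invocations: { enabled: false, Threshold: 100 }
  }
}

// Errors inherits enabled and Period from the top level, while
// Invocations keeps its own enabled: false override.
console.log(JSON.stringify(cascade(exampleConfig), null, 2))
```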
Top-level configuration
Top-level SLIC Watch configuration can be specified for all resources of each type:
- For Serverless Framework applications, in the custom → slicWatch section of serverless.yml:

    custom:
      slicWatch:
        enabled: true
        ...

- For CloudFormation or SAM templates, in the Metadata → slicWatch section of the template:

    Metadata:
      slicWatch:
        enabled: true
        ...

- For CDK Stacks, the top-level SLIC Watch configuration can be set as follows:

    this.templateOptions.metadata = {
      slicWatch: {
        enabled: true,
        ...
      }
    }

- The alarmActionsConfig may optionally be added to specify one or more SNS Topic destinations for all alarm status changes to ALARM and OK. If you omit destination topics, alarms are still created but are not sent to any destination. For example:

    slicWatch:
      alarmActionsConfig:
        alarmActions:  # Defaults to no actions
          - arn:aws:sns:eu-west-1:123456789012
        okActions:  # Defaults to no actions
          - arn:aws:sns:eu-west-1:123456789012
        actionsEnabled: true  # Defaults to true

- Alarms or dashboards can be disabled at any level in the configuration by adding enabled: false. You can even disable all plugin functionality by specifying enabled: false at the top level of the plugin configuration.
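
One way to think about alarmActionsConfig is as the source of the AlarmActions, OKActions and ActionsEnabled properties of each AWS::CloudWatch::Alarm resource. The TypeScript sketch below shows that mapping with the defaults stated above (no actions, actions enabled). It is a simplified assumption for illustration, not the SLIC Watch implementation: the interface and function names are hypothetical, actions are modelled as plain ARN strings, and the topic ARN in the usage example is a made-up placeholder.

```typescript
// Shape of the configuration as described in this README.
// Simplification for this sketch: actions are plain ARN strings.
interface AlarmActionsConfig {
  alarmActions?: string[]
  okActions?: string[]
  actionsEnabled?: boolean
}

// The matching AWS::CloudWatch::Alarm properties.
interface AlarmActionProperties {
  AlarmActions: string[]
  OKActions: string[]
  ActionsEnabled: boolean
}

// Hypothetical helper (not part of SLIC Watch): applies the documented
// defaults when a setting is omitted.
function toAlarmActionProperties (config: AlarmActionsConfig = {}): AlarmActionProperties {
  return {
    AlarmActions: config.alarmActions ?? [],
    OKActions: config.okActions ?? [],
    ActionsEnabled: config.actionsEnabled ?? true
  }
}

// With no configuration, alarms are still created but notify nobody.
console.log(toAlarmActionProperties())

// Placeholder topic ARN, used only for illustration.
console.log(toAlarmActionProperties({
  alarmActions: ['arn:aws:sns:eu-west-1:123456789012:example-topic']
}))
```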
The complete set of supported options, along with their defaults, is shown in default-config.js.
Example projects are also provided for reference.
Resource-level configuration
Alarms and dashboards for each resource can be customised using CloudFormation metadata. This configuration will take precedence over the top-level configuration.

    Resources:
      regularQueue:
        Type: AWS::SQS::Queue
        Metadata:
          slicWatch:
            alarms:
              InFlightMessagesPc:
                Threshold: 95
            dashboard:
              ApproximateAgeOfOldestMessage:
                yAxis: right
              NumberOfMessagesReceived:
                enabled: false

This can be done for any supported resource in a CloudFormation or SAM template. It can also be done for CDK with the following syntax.

    const dlq = new sqs.Queue(this, 'DeadLetterQueue')
    const cfnDlq = dlq.node.defaultChild as CfnResource
    cfnDlq.cfnOptions.metadata = {
      slicWatch: {
        alarms: {
          InFlightMessagesPc: {
            Threshold: 95
          }
        },
        dashboard: {
          ApproximateAgeOfOldestMessage: {
            yAxis: 'right'
          },
          NumberOfMessagesReceived: {
            enabled: false
          }
        }
      }
    }

Serverless Framework function-level configuration
Function-level configuration works a bit differently for Serverless Framework functions. Here, the slicWatch configuration parameter is set directly on the function. For each function, add the slicWatch property to configure specific overrides for alarms and dashboards relating to the AWS Lambda Function resource.

    functions:
      hello:
        handler: basic-handler.hello
        slicWatch:
          dashboard:
            enabled: false  # No Lambda widgets will be created for this function
          alarms:
            Invocations:
              Threshold: 2  # The invocation threshold is specific to
                            # this function's expected invocation count

To disable all alarms for any given function, use:

    functions:
      hello:
        handler: basic-handler.hello
        slicWatch:
          alarms:
            enabled: false

SAM/CloudFormation function-level configuration

    Resources:
      LambdaFunction1:
        Type: AWS::Serverless::Function  # Can also be applied to AWS::Lambda::Function resources
        Properties:
          Handler: lambda1.functionHandler
        Metadata:
          slicWatch:
            alarms:
              Invocations:
                Threshold: 3
            dashboard:
              enabled: true

To disable all alarms for any given function, use:

    Resources:
      LambdaFunction1:
        Type: AWS::Serverless::Function  # Can also be applied to AWS::Lambda::Function resources
        Properties:
          Handler: lambda1.functionHandler
        Metadata:
          slicWatch:
            alarms:
              enabled: false

CDK function-level configuration

    // hello is assumed to be a lambda.Function construct defined elsewhere
    declare const hello: lambda.Function;
    const cfnFuncHello = hello.node.defaultChild as CfnResource;
    cfnFuncHello.cfnOptions.metadata = {
      slicWatch: {
        alarms: {
          Invocations: {
            Threshold: 2
          }
        }
      }
    }

A note on CloudWatch cost
This plugin creates additional CloudWatch resources that, apart from a limited free tier, have an associated cost. Depending on what you enable, SLIC Watch creates one dashboard and multiple alarms. The number of each depends on the number of resources in your stack and the number of stacks you have.
Check out the AWS CloudWatch Pricing page to understand the cost impact of creating CloudWatch resources.
References
Other Projects
- serverless-plugin-aws-alerts
- Real World Serverless Application - Serverless Operations
- CDK Watchful
- CDK Patterns - The CloudWatch Dashboard
Reading
- AWS Well Architected Serverless Applications Lens
- How to Monitor Lambda with CloudWatch Metrics - Yan Cui
LICENSE
Apache - LICENSE