AWS DevOps & Developer Productivity Blog
StackSets Deployment Strategies: Balancing Speed, Safety, and Scale to Optimize Deployments for Different Organizational Needs
AWS CloudFormation StackSets enables organizations to deploy infrastructure consistently across multiple AWS accounts and regions. However, success depends on choosing the right deployment strategy that balances three critical factors: deployment speed, operational safety, and organizational scale. This guide explores proven StackSets deployment strategies specifically designed for multi-account infrastructure management.
Understanding StackSets Deployment Fundamentals
What are StackSets Actually Used For?
Unlike single-account AWS CloudFormation templates, StackSets are specifically designed for multi-account infrastructure governance. Common use cases include:
- Security baselines: deploying IAM policies, security groups, and access controls across all accounts
- Compliance controls: rolling out AWS Config rules, AWS CloudTrail configurations, and audit requirements
- Organizational standards: establishing consistent VPC configurations, tagging policies, and naming conventions
- Shared services: deploying monitoring solutions, logging infrastructure, and backup policies
- Cost management: implementing budget controls, cost allocation tags, and resource optimization policies
The Multi-Account Challenge
Managing infrastructure across dozens or hundreds of AWS accounts presents unique challenges:
Single Account (CFN Template)    Multi-Account (StackSets)
App A                            Org Unit A (50 accounts)
  |                                |
[Deploy Once]                    [Deploy consistently across all]
  |                                |
Success/Fail                     Complex success/failure matrix
Figure: Multi-account and multi-Region CloudFormation deployment complexity
The Speed-Safety-Scale Triangle
Every StackSets deployment strategy involves trade-offs between three factors: Speed (how quickly changes propagate across your organization), Safety (risk mitigation and failure containment), and Scale (the ability to manage hundreds of accounts efficiently).
Prerequisites
Before implementing any of the deployment strategies described in this guide, ensure you have:
- AWS CLI Installation
  - Install the latest version of AWS CLI by following the AWS CLI installation guide
  - Verify installation with: aws --version
- AWS Profile Configuration
  - Configure your AWS credentials using: aws configure
  - For details on configuration, see AWS CLI configuration basics
  - Ensure your profile has appropriate permissions for CloudFormation StackSets operations as described in AWS StackSets prerequisites
- Proper Account Access: the commands in this guide must be executed from either:
  - The management account of your AWS Organization
  - OR a delegated administrator account for CloudFormation
For information on setting up a delegated administrator, see Register a delegated administrator
Note: StackSets deployments using service-managed permissions cannot be performed from standalone accounts.
Verify you’re using the correct account with:
# For management account
aws organizations describe-organization
# For delegated admin
aws cloudformation list-stack-sets --call-as DELEGATED_ADMIN
AWS CLI to check the usage of an Organization and not a Standalone account
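Service-managed permissions also rely on trusted access between AWS Organizations and StackSets. The following is a minimal sketch of how that could be checked from the management account. The function name is ours, and the check assumes the StackSets service principal is member.org.stacksets.cloudformation.amazonaws.com.

```shell
#!/usr/bin/env bash
# Sketch: confirm that trusted access for StackSets is enabled in AWS Organizations.
# Assumes management-account credentials; the function name is illustrative.

STACKSETS_PRINCIPAL="member.org.stacksets.cloudformation.amazonaws.com"

# Prints "enabled" or "disabled"; returns 1 when disabled, 2 when the CLI call fails.
stacksets_trusted_access() {
  local principals
  principals=$(aws organizations list-aws-service-access-for-organization \
    --query 'EnabledServicePrincipals[].ServicePrincipal' \
    --output text) || return 2
  # Text output separates values with tabs; put one principal per line before matching.
  if tr '\t' '\n' <<<"${principals}" | grep -qxF "${STACKSETS_PRINCIPAL}"; then
    echo "enabled"
  else
    echo "disabled"
    return 1
  fi
}

# Usage (not executed here):
#   stacksets_trusted_access || echo "Enable trusted access before creating StackSets"
```

If the function reports "disabled", trusted access can be activated from the StackSets console in the management account.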
Core Deployment Strategies
As explained in the StackSet documentation:
- “For a more conservative deployment, set Maximum Concurrent Accounts to 1, and Failure Tolerance to 0. Set your lowest-impact region to be first in the Region Order. Start with one region.”
- “For a faster deployment, increase the values of Maximum Concurrent Accounts and Failure Tolerance as needed. ”
Based on this guidance, we propose several deployment strategies below, depending on the speed, safety, and scale you want to achieve.
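To see what the percentage-based preferences mean for a given OU, it can help to turn them into account counts. The sketch below assumes the rounding rules described in the StackSets operation preferences documentation: percentages round down, a maximum concurrency of zero becomes 1, and under the default STRICT_FAILURE_TOLERANCE concurrency mode the initial concurrency is capped at the failure tolerance plus one. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: translate percentage preferences into account counts.
# Usage: effective_batch ACCOUNTS MAX_CONCURRENT_PCT FAILURE_TOLERANCE_PCT

effective_batch() {
  local accounts=$1 max_pct=$2 tol_pct=$3
  # Integer division rounds down, as CloudFormation does for percentages.
  local max_count=$(( accounts * max_pct / 100 ))
  if (( max_count < 1 )); then
    max_count=1   # rounding down to zero is replaced by 1
  fi
  local tol_count=$(( accounts * tol_pct / 100 ))
  # Assumption: default STRICT_FAILURE_TOLERANCE mode, where the initial
  # concurrency is the lower of max_count and tol_count + 1.
  local strict_cap=$(( tol_count + 1 ))
  local initial=$max_count
  if (( strict_cap < initial )); then
    initial=$strict_cap
  fi
  echo "max_concurrent=${max_count} failure_tolerance=${tol_count} initial_concurrency=${initial}"
}

# Usage (not executed here):
#   effective_batch 200 5 0
#   effective_batch 50 80 20
```

For 200 accounts with 5% concurrency and 0% tolerance, this yields a maximum of 10 concurrent accounts but an initial concurrency of 1, which is why a zero failure tolerance behaves very conservatively.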
1. Sequential Deployment: Maximum Safety
Sequential deployment is best for critical security updates, compliance requirements, and first-time organizational rollouts.
Here are some possible use cases:
- Security baseline updates: New IAM policies affecting root access
- Compliance rollouts: SOX, HIPAA, or PCI-DSS control implementations
- Critical infrastructure changes: VPC security group modifications
- Organizational policy changes: New AWS Config rules for audit compliance
Implementation Example:
For this example, download the ConfigRuleCloudtrailEnabled.yml template from the CloudFormation sample library in the AWS documentation. It configures an AWS Config rule that checks whether AWS CloudTrail is enabled. Then follow these steps:
Step 1: Create the StackSet
With the AWS CLI:
# Create Stackset for security baseline
# StackSet operation managed from us-east-1
aws cloudformation create-stack-set \
--stack-set-name security-baseline \
--template-body file://ConfigRuleCloudtrailEnabled.yml \
--capabilities CAPABILITY_NAMED_IAM \
--permission-model SERVICE_MANAGED \
--auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
--region us-east-1
AWS CLI to create a security-baseline Stackset
The expected response should be similar to the following:
{"StackSetId": "security-baseline: ...."}
Step 2: Create Stack Instances
Before you launch the below command, you need to adjust the values of the following parameters:
- OrganizationalUnitIds: replace the placeholder “ou-test” in the command below with the ID of the target OU you want to deploy to. We recommend creating a new test OU in the console or via the CLI for this test.
- regions: if needed, change the “us-east-1 eu-west-1” value to list every Region you want to deploy to. AWS Config must be active in the accounts and Regions that you choose; otherwise you’ll get an error when deploying the stack.
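Real OU IDs look different from the “ou-test” placeholder. As a small sketch, the helper below checks a value against the OU ID pattern documented for AWS Organizations (ou-, a 4 to 32 character root suffix, then 8 to 32 more characters). The function name is ours, and a root ID (r-...) is deliberately not covered.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an OU ID before passing it to --deployment-targets.
# Pattern assumed from the AWS Organizations API reference.

is_valid_ou_id() {
  [[ "$1" =~ ^ou-[0-9a-z]{4,32}-[a-z0-9]{8,32}$ ]]
}

# To look up the IDs in your organization (not executed here):
#   root_id=$(aws organizations list-roots --query 'Roots[0].Id' --output text)
#   aws organizations list-organizational-units-for-parent \
#     --parent-id "${root_id}" \
#     --query 'OrganizationalUnits[].[Id,Name]' --output text
```

For example, is_valid_ou_id ou-ab12-cdefgh34 succeeds, while is_valid_ou_id ou-test fails.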
# Deploy security baseline to production accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1 and eu-west-1
# SEQUENTIAL = One region at a time, sequentially
# MaxConcurrentPercentage = Deploy to 5% of accounts at once
# FailureTolerancePercentage = Stop on first failure
aws cloudformation create-stack-instances \
--stack-set-name security-baseline \
--deployment-targets OrganizationalUnitIds=ou-test \
--regions us-east-1 eu-west-1 \
--region us-east-1 \
--operation-preferences RegionConcurrencyType=SEQUENTIAL,MaxConcurrentPercentage=5,FailureTolerancePercentage=0
AWS CLI to create security-baseline Stack Instances sequentially for maximum safety
The CLI output should look like the following:
{"OperationId": ....}
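The OperationId can be used to follow the operation until it finishes. Here is a minimal polling sketch built on describe-stack-set-operation. The function name and the 15-second default interval are our own choices, and a delegated administrator would also need to add --call-as DELEGATED_ADMIN.

```shell
#!/usr/bin/env bash
# Sketch: poll a StackSet operation until it reaches a terminal status.
# Usage: wait_for_operation STACK_SET_NAME OPERATION_ID [INTERVAL_SECONDS]

wait_for_operation() {
  local stack_set=$1 operation_id=$2 interval=${3:-15} status
  while true; do
    status=$(aws cloudformation describe-stack-set-operation \
      --stack-set-name "${stack_set}" \
      --operation-id "${operation_id}" \
      --region us-east-1 \
      --query 'StackSetOperation.Status' \
      --output text) || return 2
    echo "operation ${operation_id}: ${status}"
    case "${status}" in
      SUCCEEDED) return 0 ;;
      FAILED|STOPPED) return 1 ;;
      *) sleep "${interval}" ;;   # QUEUED, RUNNING, or STOPPING: keep waiting
    esac
  done
}

# Usage (not executed here):
#   wait_for_operation security-baseline "<OperationId from the output above>"
```

The return code makes the function usable as a gate in a script: 0 for success, 1 for a failed or stopped operation, 2 if the CLI call itself fails.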
Or create the StackSet and add the Stacks with the AWS Console:
In the CloudFormation Console, click “Create StackSet”
AWS CloudFormation Console: create a security-baseline Stackset
Upload your template from S3 or from your computer and click Next:
AWS CloudFormation Console: specify a template
Specify the StackSet name and parameters and click Next:
AWS CloudFormation Console: specify the StackSet name and parameters
Configure StackSet options and click Next:
AWS CloudFormation Console: configure the StackSet options
Set deployment options and click Next:
AWS CloudFormation Console: set deployment options
AWS CloudFormation Console: set more deployment options
Then Review and Submit.
To keep this post concise, we provide the CLI output and Console screenshots for this example only. The “Parallel Deployment” and “Balanced Approach” examples follow the same steps; you just need to update the StackSet operation preferences.
A real-world example would be a financial services company deploying new MFA requirements across 200 production accounts. They could use sequential deployment with a maximum concurrency of 5% to ensure each batch is validated before proceeding.
2. Parallel Deployment: Maximum Speed
Parallel deployment is best for non-critical updates, development environments, and routine maintenance.
Here are some possible use cases:
- Development account standardization: Rolling out new development tools
- Monitoring infrastructure: Deploying Amazon CloudWatch dashboards and alarms
- Cost optimization: Implementing automated resource cleanup policies
- Non-production updates: Updating development and staging environments
Implementation Example:
For this example, copy the .yml template from this re:Post article about monitoring IAM events into a file named “monitoring-baseline.yml”, and use it in the following commands.
Step 1: Create the StackSet
# Create Stackset for monitoring baseline
# StackSet operation managed from us-east-1
aws cloudformation create-stack-set \
--stack-set-name monitoring-baseline \
--template-body file://monitoring-baseline.yml \
--capabilities CAPABILITY_NAMED_IAM \
--permission-model SERVICE_MANAGED \
--auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
--region us-east-1
AWS CLI to create a monitoring-baseline Stackset
Step 2: Create Stack Instances
Just like in the previous example, before you launch the below command, you need to adjust the values of the OrganizationalUnitIds and regions parameters.
# Deploy monitoring baseline to dev and sandbox accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1 and eu-west-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 80% of accounts at once
# FailureTolerancePercentage = Tolerate failures in 20% of accounts
aws cloudformation create-stack-instances \
--stack-set-name monitoring-baseline \
--deployment-targets OrganizationalUnitIds=ou-development,ou-sandbox \
--regions us-east-1 eu-west-1 \
--region us-east-1 \
--operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=80,FailureTolerancePercentage=20
AWS CLI to create monitoring-baseline Stack Instances in parallel with high value for max concurrent percentage for maximum speed
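With a 20% failure tolerance, an operation can succeed overall while some accounts fail, so it is worth listing the failed targets afterwards. The following sketch uses list-stack-set-operation-results; the two function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: list and count the account/Region targets that failed in an operation.
# Usage: failed_targets STACK_SET_NAME OPERATION_ID

failed_targets() {
  local stack_set=$1 operation_id=$2
  aws cloudformation list-stack-set-operation-results \
    --stack-set-name "${stack_set}" \
    --operation-id "${operation_id}" \
    --region us-east-1 \
    --query "Summaries[?Status=='FAILED'].[Account,Region,StatusReason]" \
    --output text
}

# Counts non-empty result lines; prints 0 when nothing failed.
count_failed_targets() {
  failed_targets "$@" | awk 'NF { n++ } END { print n + 0 }'
}

# Usage (not executed here):
#   failed_targets monitoring-baseline "<OperationId>"
#   count_failed_targets monitoring-baseline "<OperationId>"
```

Each output line holds the account ID, the Region, and the reason reported for the failure, which is usually enough to decide whether to retry or investigate.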
3. Progressive Deployment: Balanced Approach or Multi Phase Approach (Recommended)
For most production scenarios with moderate risk tolerance, we recommend either a Balanced Approach or a Multi-Phase Implementation.
Balanced Approach
For this example, to make it easier, you can create a copy of “monitoring-baseline.yml” created previously, and name it “balanced-template.yml”.
cp monitoring-baseline.yml balanced-template.yml
bash command to copy the monitoring-baseline.yml file to balanced-template.yml
Then you can use it in the following command lines.
Step 1: Create the StackSet
# Create Stackset for a balanced creation
# StackSet operation managed from us-east-1
aws cloudformation create-stack-set \
--stack-set-name balanced-deployment \
--template-body file://balanced-template.yml \
--capabilities CAPABILITY_NAMED_IAM \
--permission-model SERVICE_MANAGED \
--auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
--region us-east-1
AWS CLI to create a balanced-deployment Stackset
Step 2: Create Stack Instances
You need to adjust the values of the OrganizationalUnitIds and regions parameters.
# Deploy the balanced template to the target OUs (replace the OU IDs below with your own)
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1, eu-west-1 and ap-southeast-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 25% of accounts at once
# FailureTolerancePercentage = Tolerate failures in 8% of accounts
aws cloudformation create-stack-instances \
--stack-set-name balanced-deployment \
--deployment-targets OrganizationalUnitIds=ou-development,ou-sandbox \
--regions us-east-1 eu-west-1 ap-southeast-1 \
--region us-east-1 \
--operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=25,FailureTolerancePercentage=8
AWS CLI to create balanced-deployment Stack Instances in parallel with low max concurrent percentage for a balanced deployment
Multi-Phase Implementation:
Step 1: Create the StackSet
# Create StackSet for the multi-phase rollout
# Skip this step if the balanced-deployment StackSet already exists from the Balanced Approach
# StackSet operation managed from us-east-1
aws cloudformation create-stack-set \
--stack-set-name balanced-deployment \
--template-body file://balanced-template.yml \
--capabilities CAPABILITY_NAMED_IAM \
--permission-model SERVICE_MANAGED \
--auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
--region us-east-1
AWS CLI to create a balanced-deployment Stackset
Phase 1: Pilot Accounts (10% of target)
Phase 1: Create Pilot Stack Instances
You need to adjust the values of the OrganizationalUnitIds, Accounts, and regions parameters.
# Deploy monitoring baseline to the pilot accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1
# SEQUENTIAL = Deployment in sequence
# MaxConcurrentPercentage = 100% Deploy full speed for small pilot
# FailureTolerancePercentage = Zero tolerance in pilot
# With SERVICE_MANAGED permissions, Accounts must be combined with an OU and an AccountFilterType
# ou-pilot is a placeholder for the OU that contains the pilot accounts
aws cloudformation create-stack-instances \
--stack-set-name balanced-deployment \
--deployment-targets OrganizationalUnitIds=ou-pilot,Accounts=pilot-account-1,pilot-account-2,AccountFilterType=INTERSECTION \
--regions us-east-1 \
--region us-east-1 \
--operation-preferences RegionConcurrencyType=SEQUENTIAL,MaxConcurrentPercentage=100,FailureTolerancePercentage=0
AWS CLI to create balanced-deployment Stack Instances sequentially for maximum safety in Pilot accounts
Wait for Pilot validation before proceeding to Phase 2
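One way to make the pilot validation concrete is to check that no stack instance is left in a status other than CURRENT before moving on. This is only a sketch: the function name is ours, and it assumes that at this point the StackSet contains pilot instances only.

```shell
#!/usr/bin/env bash
# Sketch: gate the next phase on every existing stack instance being CURRENT.
# Usage: pilot_is_healthy STACK_SET_NAME

pilot_is_healthy() {
  local stack_set=$1 not_current
  # JSON output so that the JMESPath length() is evaluated on the full result.
  not_current=$(aws cloudformation list-stack-instances \
    --stack-set-name "${stack_set}" \
    --region us-east-1 \
    --query "length(Summaries[?Status!='CURRENT'])" \
    --output json) || return 2
  [ "${not_current}" -eq 0 ]
}

# Usage (not executed here):
#   if pilot_is_healthy balanced-deployment; then
#     echo "Pilot validated, proceed to Phase 2"
#   else
#     echo "Pilot has OUTDATED or INOPERABLE instances, stop here"
#   fi
```

In practice this check would sit alongside functional validation of the deployed resources, which depends on what the template creates.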
Phase 2: Early Adopter OUs (30% of target)
Phase 2: Create Early Adopter Stack Instances
You need to adjust the values of the OrganizationalUnitIds and regions parameters.
# Deploy monitoring baseline to the early adopter OU
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1, eu-west-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 25% of accounts at once
# FailureTolerancePercentage = Tolerate failures in 5% of accounts
aws cloudformation create-stack-instances \
--stack-set-name balanced-deployment \
--deployment-targets OrganizationalUnitIds=ou-early-adopter \
--regions us-east-1 eu-west-1 \
--region us-east-1 \
--operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=25,FailureTolerancePercentage=5
AWS CLI to create balanced-deployment Stack Instances in parallel with low max concurrent percentage for a balanced deployment in Early Adopter OU
Wait for Early Adopter validation before proceeding to Phase 3
Phase 3: Full Deployment (Remaining 60%)
Phase 3: Full Deployment
You need to adjust the values of the OrganizationalUnitIds and regions parameters.
# Deploy monitoring baseline to production accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1, eu-west-1 and ap-southeast-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 40% of accounts at once for higher speed after validation
# FailureTolerancePercentage = Tolerate failures in 10% of accounts for moderate tolerance
aws cloudformation create-stack-instances \
--stack-set-name balanced-deployment \
--deployment-targets OrganizationalUnitIds=ou-standard-prod,ou-legacy-prod \
--regions us-east-1 eu-west-1 ap-southeast-1 \
--region us-east-1 \
--operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=40,FailureTolerancePercentage=10
AWS CLI to create balanced-deployment Stack Instances in parallel with a higher max concurrent percentage for the full deployment in the remaining OUs
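Because the three phases differ mainly in their operation preferences, those values can be kept in one place. The sketch below maps a phase name to the preferences described in the comments of each phase (100/0 for the pilot, 25/5 for early adopters, 40/10 for the full deployment). The function name and the phase names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: central lookup of --operation-preferences per rollout phase.
# Usage: phase_preferences pilot|early-adopter|full

phase_preferences() {
  case "$1" in
    pilot)
      echo "RegionConcurrencyType=SEQUENTIAL,MaxConcurrentPercentage=100,FailureTolerancePercentage=0" ;;
    early-adopter)
      echo "RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=25,FailureTolerancePercentage=5" ;;
    full)
      echo "RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=40,FailureTolerancePercentage=10" ;;
    *)
      echo "unknown phase: $1" >&2
      return 1 ;;
  esac
}

# Usage (not executed here):
#   aws cloudformation create-stack-instances \
#     --stack-set-name balanced-deployment \
#     --deployment-targets OrganizationalUnitIds=ou-early-adopter \
#     --regions us-east-1 eu-west-1 \
#     --region us-east-1 \
#     --operation-preferences "$(phase_preferences early-adopter)"
```

Keeping the values in a single function makes it harder for the comments and the actual preferences of a phase to drift apart.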
Using Step Functions for Orchestration
AWS Step Functions provides a serverless workflow service that can orchestrate StackSets deployments with advanced control flow, error handling, and state management capabilities. This approach enhances your multi-account deployments with features not available through standard StackSets operations alone.
Key benefits include:
- Advanced Deployment Orchestration: Coordinate multi-phase rollouts with validation gates
- Human Approval Workflows: Implement manual approval steps for critical changes
- Enhanced Error Handling: Define sophisticated retry policies and fallback mechanisms
- Visual Monitoring: Track deployment progress through the Step Functions visual console
Real-World Use Case: Compliance Control Rollout
In regulated industries, AWS Step Functions enables a phased approach that combines automation with necessary governance. For instance, you can:
- Deploy compliance controls to test accounts
- Run automated validation and generate compliance reports
- Obtain manual approval from compliance team
- Deploy to production accounts with comprehensive monitoring
This approach ensures consistent governance while maintaining the complete audit trail required for regulatory compliance.
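The four steps above form a gated sequence: each step runs only if the previous one succeeded. The shell sketch below illustrates that control flow locally. It is not a Step Functions state machine, and the step names in the usage comment are hypothetical functions you would implement yourself (for example, wrapping the CLI commands shown earlier).

```shell
#!/usr/bin/env bash
# Sketch: run rollout steps in order and halt at the first one that fails.
# Usage: run_gated_rollout STEP_FUNCTION [STEP_FUNCTION ...]

run_gated_rollout() {
  local step
  for step in "$@"; do
    echo "running ${step}"
    if ! "${step}"; then
      echo "halted at ${step}" >&2
      return 1
    fi
  done
  echo "rollout complete"
}

# Usage (not executed here; all four step functions are placeholders):
#   run_gated_rollout deploy_to_test_accounts run_compliance_validation \
#     request_compliance_approval deploy_to_production_accounts
```

In Step Functions, the same idea would typically be expressed as a chain of task states, with the approval step waiting on a callback, and with retries and error handling defined per state.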
Monitoring and Optimization
AWS CloudFormation StackSets does not provide extensive built-in Amazon CloudWatch metrics for monitoring StackSet operations and health, which is why the custom monitoring implementation in this post is valuable.
Here’s what AWS does and doesn’t provide out of the box:
What AWS provides natively:
- Basic records of AWS API calls via AWS CloudTrail (which show that operations happened but don’t track success rates or performance)
- General service quotas and throttling metrics for CloudFormation as a whole
- CloudFormation provides some metrics for individual stacks, but not consolidated StackSet-specific metrics
What requires custom implementation (as in our blog post):
- Success rate metrics for StackSet operations across accounts
- Deployment completion time tracking
- Configuration drift detection and monitoring
- Account-specific failure analysis
- Comprehensive dashboards that show StackSet health across your organization
The code in our blog post demonstrates how to implement the success rate custom metrics by:
- Gathering data from the CloudFormation API about StackSet operations
- Calculating the success rate metrics for StackSet deployments
- Creating custom Amazon CloudWatch metrics in a custom namespace (like “StackSetMonitoring”)
- Setting up alerts for issues
This explains why organizations need to implement custom monitoring solutions like the one shown in our blog post rather than relying solely on built-in metrics.
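The gather, calculate, and publish steps listed above can be sketched with the AWS CLI alone. This is an illustration rather than the Lambda implementation from the template: the function names are ours, the rate is an integer percentage over finished operations only, and the StackSetMonitoring namespace and SuccessRate metric name follow the names used in this post.

```shell
#!/usr/bin/env bash
# Sketch: compute a StackSet success rate and publish it as a custom metric.

# Prints the integer success percentage for the given operation statuses.
# Unfinished operations (QUEUED, RUNNING, STOPPING) are ignored.
# Returns 1 when no operation has finished yet.
success_rate() {
  local total=0 ok=0 status
  for status in "$@"; do
    case "${status}" in
      SUCCEEDED) ok=$(( ok + 1 )); total=$(( total + 1 )) ;;
      FAILED|STOPPED) total=$(( total + 1 )) ;;
    esac
  done
  if (( total == 0 )); then
    return 1
  fi
  echo $(( ok * 100 / total ))
}

# Usage: publish_success_rate STACK_SET_NAME
publish_success_rate() {
  local stack_set=$1 statuses rate
  statuses=$(aws cloudformation list-stack-set-operations \
    --stack-set-name "${stack_set}" \
    --region us-east-1 \
    --query 'Summaries[].Status' \
    --output text) || return 2
  # Word splitting of ${statuses} is intentional: one argument per status.
  rate=$(success_rate ${statuses}) || { echo "no finished operations"; return 0; }
  aws cloudwatch put-metric-data \
    --namespace StackSetMonitoring \
    --metric-name SuccessRate \
    --dimensions "StackSetName=${stack_set}" \
    --value "${rate}" \
    --unit Percent \
    --region us-east-1
}
```

For example, success_rate SUCCEEDED SUCCEEDED FAILED RUNNING prints 66, because two of the three finished operations succeeded.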
Automated Monitoring Implementation: example of a custom metric to monitor the StackSet operations success rate
The following AWS CloudFormation template provides real-time monitoring and alerting for AWS CloudFormation StackSet operations through automated infrastructure deployment. This solution creates a complete monitoring system using an AWS Lambda function, Amazon EventBridge rules, Amazon SNS notifications, and Amazon CloudWatch dashboards to track StackSet success and failure rates. The core Lambda function, named StackSetMonitor, monitors all active StackSets in your account, calculating success rates and publishing custom metrics to Amazon CloudWatch under the StackSetMonitoring namespace.
Below are a few examples of custom metrics that could be implemented based on this AWS CloudFormation template:
- Count of all operations (CREATE, UPDATE, DELETE) per StackSet over time periods
- Number of stack instances with configuration drift (requires additional API calls)
- Average time taken for StackSet operations to complete
- Rate of StackSet operations to identify peak usage times
- Number of individual stack instances that failed during operations
- Number of retried operations (indicates infrastructure issues)
- …
Here’s the StackSetMonitor.yml CloudFormation Template:
# StackSetMonitor.yml
# CFN template for monitoring AWS CloudFormation StackSet operations with real-time alerts, metrics, and dashboards.
AWSTemplateFormatVersion: '2010-09-09'
Description: 'CloudFormation template for StackSet operation monitoring using CloudWatch and SNS'
Parameters:
  StackSetName:
    Type: String
    Description: 'Name of the StackSet to monitor'
    Default: 'security-baseline'
    MinLength: 1
    MaxLength: 128
    AllowedPattern: '[a-zA-Z][-a-zA-Z0-9]*'
    ConstraintDescription: 'Must be a valid StackSet name (1-128 characters, alphanumeric and hyphens, must start with a letter)'
  VpcId:
    Type: String
    Description: 'VPC ID where the Lambda function will be deployed (leave empty to create new VPC)'
    Default: ''
  SubnetIds:
    Type: CommaDelimitedList
    Description: 'List of subnet IDs for the Lambda function (leave empty to create new subnets)'
    Default: ''
  SecurityGroupIds:
    Type: CommaDelimitedList
    Description: 'List of security group IDs for the Lambda function (leave empty to create new security group)'
    Default: ''
Conditions:
  CreateVPC: !Equals [!Ref VpcId, '']
  CreateVPCAndSubnets: !And [!Equals [!Ref VpcId, ''], !Equals [!Join [',', !Ref SubnetIds], '']]
  HasCustomSecurityGroups: !Not [!Equals [!Join [',', !Ref SecurityGroupIds], '']]
Resources:
  # KMS Key for CloudWatch Logs encryption
  LogsKMSKey:
    Type: AWS::KMS::Key
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      Description: 'KMS Key for StackSet Monitor CloudWatch Logs and Lambda environment variable encryption'
      EnableKeyRotation: true
      KeyPolicy:
        Version: '2012-10-17'
        Statement:
          - Sid: Enable IAM User Permissions
            Effect: Allow
            Principal:
              AWS: !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:root'
            Action: 'kms:*'
            Resource: '*'
          - Sid: Allow CloudWatch Logs
            Effect: Allow
            Principal:
              Service: !Sub 'logs.${AWS::Region}.amazonaws.com'
            Action:
              - 'kms:Encrypt'
              - 'kms:Decrypt'
              - 'kms:ReEncrypt*'
              - 'kms:GenerateDataKey*'
              - 'kms:DescribeKey'
            Resource: '*'
            Condition:
              ArnEquals:
                'kms:EncryptionContext:aws:logs:arn':
                  - !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/StackSetMonitor'
                  - !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/cloudformation/stacksets'
          - Sid: Allow Lambda Service
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action:
              - 'kms:Encrypt'
              - 'kms:Decrypt'
              - 'kms:ReEncrypt*'
              - 'kms:GenerateDataKey*'
              - 'kms:DescribeKey'
            Resource: '*'
  LogsKMSKeyAlias:
    Type: AWS::KMS::Alias
    Properties:
      AliasName: alias/stackset-monitor-logs
      TargetKeyId: !Ref LogsKMSKey
  # VPC Resources (created when no existing VPC is provided)
  StackSetMonitorVPC:
    Type: AWS::EC2::VPC
    Condition: CreateVPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: StackSetMonitor-VPC
        - Key: Purpose
          Value: VPC for StackSet Monitor Lambda function
  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-Subnet-1
        - Key: Purpose
          Value: Private subnet for StackSet Monitor Lambda
  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-Subnet-2
        - Key: Purpose
          Value: Private subnet for StackSet Monitor Lambda
  PrivateRouteTable1:
    Type: AWS::EC2::RouteTable
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-RT-1
  PrivateRouteTable2:
    Type: AWS::EC2::RouteTable
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-RT-2
  PrivateSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Condition: CreateVPC
    Properties:
      RouteTableId: !Ref PrivateRouteTable1
      SubnetId: !Ref PrivateSubnet1
  PrivateSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Condition: CreateVPC
    Properties:
      RouteTableId: !Ref PrivateRouteTable2
      SubnetId: !Ref PrivateSubnet2
  # VPC Endpoints for AWS Services (no internet access needed)
  CloudFormationVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.cloudformation
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - cloudformation:ListStackSets
              - cloudformation:ListStackSetOperations
              - cloudformation:ListStackInstances
              - cloudformation:DescribeStackInstance
              - cloudformation:DescribeStacks
              - cloudformation:GetTemplate
            Resource: '*'
  CloudWatchVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.monitoring
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - cloudwatch:PutMetricData
            Resource: '*'
  SNSVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.sns
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - sns:Publish
            Resource: '*'
  EventsVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.events
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - events:PutEvents
            Resource: '*'
  LogsVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.logs
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - logs:CreateLogGroup
              - logs:CreateLogStream
              - logs:PutLogEvents
            Resource: '*'
  SQSVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.sqs
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - sqs:SendMessage
            Resource: '*'
  STSVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.sts
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - sts:AssumeRole
              - sts:GetCallerIdentity
              - sts:AssumeRoleWithWebIdentity
            Resource: '*'
  # Security Group for Lambda function
  LambdaSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for StackSet Monitor Lambda function
      VpcId: !If
        - CreateVPC
        - !Ref StackSetMonitorVPC
        - !Ref VpcId
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 10.0.0.0/16
          Description: HTTPS to VPC Endpoints
        - IpProtocol: tcp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS TCP to VPC for name resolution
        - IpProtocol: udp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS UDP to VPC for name resolution
      Tags:
        - Key: Name
          Value: StackSetMonitor-Lambda-SG
        - Key: Purpose
          Value: Security group for StackSet Monitor Lambda
  VPCEndpointSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Condition: CreateVPC
    Properties:
      GroupDescription: Security group for VPC Endpoints
      VpcId: !Ref StackSetMonitorVPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          SourceSecurityGroupId: !Ref LambdaSecurityGroup
          Description: HTTPS from Lambda security group
        - IpProtocol: tcp
          FromPort: 53
          ToPort: 53
          SourceSecurityGroupId: !Ref LambdaSecurityGroup
          Description: DNS TCP from Lambda security group
        - IpProtocol: udp
          FromPort: 53
          ToPort: 53
          SourceSecurityGroupId: !Ref LambdaSecurityGroup
          Description: DNS UDP from Lambda security group
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 10.0.0.0/16
          Description: HTTPS outbound within VPC
        - IpProtocol: tcp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS TCP outbound within VPC
        - IpProtocol: udp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS UDP outbound within VPC
      Tags:
        - Key: Name
          Value: StackSetMonitor-VPCEndpoint-SG
        - Key: Purpose
          Value: Security group for VPC Endpoints
  # Dead Letter Queue for Lambda function
  StackSetMonitorDLQ:
    Type: AWS::SQS::Queue
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      QueueName: StackSetMonitor-DLQ
      MessageRetentionPeriod: 1209600 # 14 days
      KmsMasterKeyId: alias/aws/sqs
      Tags:
        - Key: Purpose
          Value: Dead Letter Queue for StackSet Monitor Lambda
  StackSetAlertsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: StackSetAlerts
      DisplayName: StackSet Monitoring Alerts
      KmsMasterKeyId: alias/aws/sns
  StackSetLogGroup:
    Type: AWS::Logs::LogGroup
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      LogGroupName: /aws/cloudformation/stacksets
      RetentionInDays: 30
      KmsKeyId: !GetAtt LogsKMSKey.Arn
  LambdaLogGroup:
    Type: AWS::Logs::LogGroup
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      LogGroupName: /aws/lambda/StackSetMonitor
      RetentionInDays: 30
      KmsKeyId: !GetAtt LogsKMSKey.Arn
  StackSetMonitoringDashboard:
    Type: AWS::CloudWatch::Dashboard
    Properties:
      DashboardName: StackSetMonitoring
      DashboardBody: !Sub |
        {
          "widgets": [
            {
              "type": "metric",
              "width": 24,
              "height": 8,
              "properties": {
                "metrics": [
                  [ "StackSetMonitoring", "SuccessRate", "StackSetName", "${StackSetName}" ]
                ],
                "region": "${AWS::Region}",
                "title": "StackSet Operations",
                "period": 300,
                "stat": "Average"
              }
            },
            {
              "type": "log",
              "width": 24,
              "height": 6,
              "properties": {
                "query": "SOURCE '/aws/lambda/StackSetMonitor' | fields @timestamp, @message\n| sort @timestamp desc\n| limit 20",
                "region": "${AWS::Region}",
                "title": "Latest StackSet Monitor Logs",
                "view": "table"
              }
            }
          ]
        }
  # Consolidated rule to catch ALL StackSet events for comprehensive monitoring
  AllStackSetOperationsRule:
    Type: AWS::Events::Rule
    Properties:
      Name: AllStackSetOperationsRule
      Description: "Rule for monitoring all CloudFormation StackSet operations with failure notifications"
      EventPattern: {source: ["aws.cloudformation"], detail-type: ["CloudFormation StackSet Operation Status Change"]}
      State: ENABLED
      Targets:
        - Id: ProcessAllEvents
          Arn: !GetAtt StackSetMonitorLambda.Arn
        - Id: NotifyFailure
          Arn: !Ref StackSetAlertsTopic
          InputTransformer:
            InputPathsMap:
              "stackSetId": "$.detail.stack-set-id"
              "operationId": "$.detail.operation-id"
              "status": "$.detail.status"
              "time": "$.time"
            InputTemplate: '"StackSet Event: ID: <stackSetId>, Op: <operationId>, Status: <status>, Time: <time>"'