
Chaos Engineering for AWS


Test system reliability across the AWS ecosystem & beyond

If you’re running AWS-based applications, your system reliability isn’t guaranteed. If you’re not testing disruptions from dependencies, scaling demand, or inconsistent redundancies, you’re relying on a strategy of hope.

While AWS Fault Injection Service (FIS) can be a helpful tool for initial testing, it has a limited scope. You can run network, resource, and application tests across your AWS systems and beyond with a tool like Steadybit.

With one central reliability testing platform, you can execute experiments to validate your system behaviors across hosts, containers, clusters, databases, service meshes, and more. Install the AWS extension for Steadybit to instantly discover reliability vulnerabilities and verify risks with pre-built experiment templates.


Achieve AWS operational excellence

Chaos engineering is a critical practice for adhering to the reliability pillar of the AWS Well-Architected Framework. By running chaos experiments over time, you can verify that your architecture meets your workload requirements and ensure continuity as systems change. AWS provides configuration recommendations, and you can test whether these are implemented across your organization with Reliability Advice in Steadybit.

Proactive reliability testing can also provide teams with the documentation they need to comply with regulations like the Digital Operational Resilience Act (DORA) in the European Union. For example, DORA requires organizations to prove that they have disaster recovery plans and test them regularly. With chaos tests, your team can rehearse your incident response playbooks, validate monitoring alerts, and build an audit trail for streamlined compliance.


Map out your performance limits with reliability tests

Push your systems to their limits with controlled chaos experiments. With Steadybit, you have access to a library of AWS experiment templates that enable your team to run complex failure scenarios in minutes. For example, you can simulate an availability zone outage to verify that your load balancer correctly reroutes traffic. You can also scale your EKS clusters up and down and validate that your application performance meets your expectations.
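To make the availability zone example concrete, here is a minimal Python sketch of the kind of pass/fail check a team might apply to traffic observed during a zone outage experiment. It is an illustration only, not Steadybit's implementation: the `Sample` shape, the `zone_outage_passed` function, and the 1% error threshold are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One request observed during the experiment (hypothetical shape)."""
    zone: str  # availability zone that served the request
    ok: bool   # True if the response was successful


def zone_outage_passed(samples, impaired_zone, max_error_rate=0.01):
    """Return True if traffic avoided the impaired zone and errors stayed low.

    The 1% default threshold is an assumed example value.
    """
    if not samples:
        return False  # no observations means the hypothesis was not verified
    served_by_impaired = sum(1 for s in samples if s.zone == impaired_zone)
    error_rate = sum(1 for s in samples if not s.ok) / len(samples)
    return served_by_impaired == 0 and error_rate <= max_error_rate
```

A check like this turns "the load balancer reroutes traffic" into two measurable conditions: no requests land in the impaired zone, and the overall error rate stays within an agreed budget.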

By testing RDS failover times and EBS performance degradation directly, you ensure your data layer consistently meets its recovery time objectives. Shift reactive SRE work into proactive testing and learning. By finding performance limits earlier in the software development lifecycle, you can prevent critical incidents for your customers.
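As a rough illustration of comparing a failover against a recovery time objective, the Python sketch below computes the longest outage from a log of connectivity probes. The probe format, the function names, and the example threshold are assumptions made for this example; in practice the probe data would come from your own monitoring.

```python
def longest_outage_seconds(probes):
    """Return the longest outage found in a time-sorted probe log.

    probes: iterable of (timestamp_seconds, ok) tuples (assumed format).
    An outage runs from the first failed probe to the next successful one.
    Returns infinity if the target never recovered within the log.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in probes:
        if not ok and outage_start is None:
            outage_start = timestamp  # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None  # outage ended
    if outage_start is not None:
        return float("inf")  # still failing at the end of the log
    return longest


def meets_rto(probes, rto_seconds):
    """Return True if every observed outage recovered within the objective."""
    return longest_outage_seconds(probes) <= rto_seconds
```

For example, a log that first fails at second 5 and first succeeds again at second 20 yields a 15-second outage, which would meet a 30-second objective but miss a 10-second one.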

Find Easy Reliability Wins in Steadybit

In this quick walkthrough, you can tour the Steadybit platform and see exactly how accessible reliability testing can be.

Build reliability experiments for AWS

Use no-code fault injections and health checks to stress test the reliability of your AWS systems.


Run an AWS FIS Experiment

Kick off an existing AWS FIS experiment.

Fill Diskspace

Fill the tmp disk space of the function.

Inject Latency

Run this action to inject latency into the function.

Inject Exception

Inject an exception into the function.

Blackhole Subnet

This attack simulates a network outage of a subnet.

Blackhole Zone

Simulate an outage of an entire availability zone.

Block TCP Connections

Block outbound connections to specified hosts during execution.

Change EC2 Instance State

Reboot, terminate, stop or hibernate EC2 instances.

Block DNS

This attack blocks access to DNS servers.

Block Traffic

Block network traffic (incoming and outgoing) for a set time.

Delay Outgoing Traffic

Inject latency into egress network traffic.

Fill Disk

Run this action to write data to the disk.

Stress Memory

Stress memory with ongoing reallocation.

Stress IO

Generate read/write operations on hard disks.

Reboot RDS Instance

Reboot a single RDS database instance.

Return Static Response

Return a static response for a given load balancer listener.

Trigger DB Cluster Failover

Trigger failover by promoting a standby instance to primary.

Trigger DB Instance Stop

Run this action to stop a DB instance for a given time.

Trigger ElastiCache Failover

Trigger failover by promoting a replica node to primary.

Trigger MSK Broker Reboot

Leverage the AWS MSK API to trigger a broker reboot.

Limit Network Threads

Change the number of network threads per broker.

Explore the Action Library

Browse the full catalog of actions.

Steadybit makes chaos engineering easy for teams

With one platform, you can detect issues automatically and run experiments to validate system behaviors.

Chaos Engineering for Your AWS Systems

Install the Steadybit extension for AWS to instantly discover targets and run attacks and checks on your systems.

Steadybit's Integration with AWS FIS

Utilize your existing FIS tests and expand them with Steadybit's extension framework.

Running an RDS Reboot Attack

Test your database failover processes with a chaos experiment for RDS.


Use open source extensions to deploy across technologies

Steadybit has a hybrid architecture that enables open source customization. With open source extensions for popular technologies in the Reliability Hub, it’s easy to roll out chaos engineering across systems.

  • Support for any configuration: Cloud, Multi-cloud, On-Prem, Air-gapped, Kubernetes, VMs, Serverless, Service Mesh, Message Brokers, etc.
  • Inject faults and run health checks at the network, resource, and application layers
  • Visualize your systems and group targets with discovered metadata

Get a Personalized Demo

Ready to hear more about Steadybit?

Schedule a demo with our team to see a platform walk-through and get your questions answered.
