Extending Steadybit - Part 1 - An Overview


You can extend Steadybit to make it a perfect match for your systems. We already provide some OSS extensions. This article will give you an overview of the available extension points.

No two systems look the same. Today, a vast number of different technologies are used in software development. At Steadybit, we aim to support the most common technologies out of the box so that you can get started immediately. But sooner or later, you will encounter some technology that Steadybit does not support. For this reason, Steadybit has several extension points that let you add custom functionality.

How Extensions Work

Extensions implement a well-defined HTTP interface that the agent uses to control the extension. Extensions are deployed alongside the agent on your infrastructure. Steadybit doesn’t care how you implement or deploy the extension. The extensions we’re providing are implemented using Go and packaged as container images.

The agent picks up extensions using auto-discovery or via configuration and reports them to the platform. The platform makes no distinction between your custom extensions and those provided by Steadybit. We plan to deliver all Steadybit attacks, checks, and other agent capabilities as extensions in the future. Modularizing the agent allows finer-grained permission restrictions and reduces resource consumption by removing unnecessary capabilities from your deployment.

Custom Attacks, Checks, Actions

Attacks, checks, load tests: all of these are actions. From an implementation perspective, every step in an experiment is an action. Attacks act upon targets provided by the discovery (this is needed for RBAC), while other actions may or may not act upon targets.

An extension can contribute custom actions by implementing the interface defined in the ActionKit, which also provides type bindings for TypeScript and Go.

An action describes itself and is divided into prepare, start, and stop steps you need to implement. If you need to pass around some state between those, the agent manages that state for the extension.
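The lifecycle described above can be sketched in Go roughly as follows. This is a minimal illustration and not the ActionKit API: the `State` type, its fields, and the functions `prepare`, `start`, `stop`, and `roundTrip` are all hypothetical names chosen for this example. The point it tries to show is that the extension itself can stay stateless, because each step returns its state and receives it again on the next call.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// State is hypothetical: whatever the action needs to remember between steps.
// In this sketch it is serialized to JSON, as it would be when sent over HTTP.
type State struct {
	Target  string `json:"target"`
	Started bool   `json:"started"`
}

// prepare validates the input and returns the initial state.
func prepare(target string) (State, error) {
	if target == "" {
		return State{}, fmt.Errorf("target must not be empty")
	}
	return State{Target: target}, nil
}

// start performs the action (only simulated here) and returns the updated state.
func start(s State) State {
	s.Started = true
	return s
}

// stop rolls back what start did, relying only on the state it receives.
func stop(s State) State {
	s.Started = false
	return s
}

// roundTrip simulates the agent keeping the state between two steps:
// the state is serialized, stored, and handed back unchanged.
func roundTrip(s State) (State, error) {
	raw, err := json.Marshal(s)
	if err != nil {
		return State{}, err
	}
	var out State
	err = json.Unmarshal(raw, &out)
	return out, err
}

func main() {
	s, err := prepare("demo-service")
	if err != nil {
		panic(err)
	}
	s = start(s)
	// The agent would hold on to the state here and pass it to stop later.
	if s, err = roundTrip(s); err != nil {
		panic(err)
	}
	fmt.Println("started:", s.Started)
	s = stop(s)
	fmt.Println("started:", s.Started)
}
```

Because `stop` depends only on the state it is given, a rollback in this sketch does not rely on anything the extension process kept in memory.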

A defined lifecycle is crucial for rolling back attacks and cleaning up any allocated resources. We don’t want to run arbitrary shell scripts and leave a messy system behind.

In addition to these basics, actions may do the following:

Provide artefacts

The action can provide any file to be attached to an experiment run so that the users can inspect the file after the experiment.
For example, the Postman extension uses this to save the summary JSON file.

Provide metrics and widget configurations

The action can provide metrics that will be recorded during the experiment run. The experiment view will show these metrics when the action contributes a widget configuration.
For example, the Datadog extension uses this to visualize the monitor status in the experiment view.

Provide log messages

The action can contribute log messages to the agent log, which is recorded for the experiment run.
For example, the Postman extension uses this to forward output from the Postman collection execution.

Custom Targets

Discovery is where Steadybit looks at all your systems and identifies the targets that may be attacked.

Unless you are writing a custom attack for targets that are already discovered, you will most likely implement the discovery interface in your extension.

A discovery describes itself, the target types it will provide, and implements the identification of targets in the environment.

For example, the AWS extension uses this to discover RDS instances that the reboot RDS attack can then target.
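As a rough sketch of what the target identification part of a discovery could look like in Go, consider the following. Everything here is an assumption made for illustration: the `Target` struct and its fields, the `example.database` target type, the `/discovery/targets` path, and the hard-coded instance names do not come from the Steadybit interface definition.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Target is a hypothetical shape for a discovered target.
type Target struct {
	ID         string              `json:"id"`
	Type       string              `json:"type"`
	Attributes map[string][]string `json:"attributes"`
}

// toTargets maps instance names to targets of the given type.
// In a real extension the names would come from an inventory or cloud API.
func toTargets(targetType string, instances []string) []Target {
	targets := make([]Target, 0, len(instances))
	for _, name := range instances {
		targets = append(targets, Target{
			ID:         name,
			Type:       targetType,
			Attributes: map[string][]string{"example.instance.name": {name}},
		})
	}
	return targets
}

// targetsHandler answers a discovery request with the current targets as JSON.
func targetsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	targets := toTargets("example.database", []string{"orders-db", "users-db"})
	if err := json.NewEncoder(w).Encode(targets); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

func main() {
	// Call the handler in-process instead of starting a server.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/discovery/targets", nil)
	targetsHandler(rec, req)
	fmt.Print(rec.Body.String())
}
```

The attributes are kept as a map of string lists in this sketch so that a target can carry several values per key; whether the real interface models attributes this way is not stated in this article.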

Consuming Steadybit Events

Extensions can also consume events from Steadybit. Using the EventKit is a good choice when you can’t use the platform’s webhooks, for example, for security reasons.

You can use the events to write audit logs to an internal system, report events to your monitoring, or send custom notifications.

When the extension describes itself as a listener and implements the corresponding endpoint, it will receive events happening in the platform.

For example, the Datadog extension uses this to publish Steadybit events to Datadog.
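A listener endpoint for the audit log use case might look roughly like the Go sketch below. The `Event` struct, its `name` and `principal` fields, the `/events` path, and the sample payload are hypothetical and only stand in for whatever the EventKit actually defines.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Event is a hypothetical, simplified event payload.
type Event struct {
	Name      string `json:"name"`
	Principal string `json:"principal"`
}

// auditLine formats an event as a single audit log line.
func auditLine(e Event) string {
	return fmt.Sprintf("event=%s principal=%s", e.Name, e.Principal)
}

// eventsHandler decodes an incoming event and records it.
func eventsHandler(w http.ResponseWriter, r *http.Request) {
	var e Event
	if err := json.NewDecoder(r.Body).Decode(&e); err != nil {
		http.Error(w, "invalid event payload", http.StatusBadRequest)
		return
	}
	// Printing stands in for writing to an internal audit system.
	fmt.Println(auditLine(e))
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	// Send one sample event to the handler in-process.
	body := strings.NewReader(`{"name":"experiment.started","principal":"alice"}`)
	rec := httptest.NewRecorder()
	eventsHandler(rec, httptest.NewRequest(http.MethodPost, "/events", body))
	fmt.Println("status:", rec.Code)
}
```

Keeping the formatting in `auditLine` separate from the HTTP handling makes it easy to swap the print statement for a call to a monitoring or notification system.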

Existing Open Source Extensions

We already provide several open-source extensions. It may be worth looking at these if you want to implement your own. You don't need to publish your extension, but we think it would be awesome.

AWS Extension

Provides an attack that changes the state of EC2 instances, a discovery for RDS instances, and an attack to reboot those RDS instances.

Kubernetes Extension

Provides an attack to trigger deployment restarts.

Kong Extension

Provides a discovery for Kong services and routes and attacks to terminate requests to those.

Datadog Extension

Provides a check collecting monitor status information from Datadog, with optional status verification. Implements an event listener publishing Steadybit events to Datadog.

Prometheus Extension

Provides a metrics action collecting metric information from Prometheus, with optional metric checks.

Postman Extension

Provides an action to execute a Postman collection via the Postman Cloud API.


This overview of the available extension points and their use cases is the starting point. In future blog posts, we will guide you through writing an extension.