Steadybit Academy

Reliability Advice

Detecting Issues Automatically with Reliability Advice

Even before you run your first experiment, you can use Reliability Advice in Steadybit to scan your targets for common reliability gaps.

Reliability Advice is a feature that continually checks whether your targets are adhering to certain policies or best practices.

For example, if you have Kubernetes-based targets, you can turn on Reliability Advice to check how your pods compare to the reliability best practices outlined by Kube Score, an open source collection of configuration policies.

In this lesson, we’ll explore how you can use and customize Reliability Advice to detect new issues automatically.

Utilizing Reliability Advice

Picking up where the last lesson left off, let’s return to the “Explorer” view in Steadybit. In the left-hand menu, you’ll see the option to toggle on Reliability Advice.

When you do that, you’ll notice that some of your targets are now color-coded in the following way:

Red (“Action Needed”): There is an issue that needs to be fixed. No need to run any experiment.
Orange (“Validation Needed”): This could be an issue. Run this recommended experiment to check.
Green (“Implemented”): A previously flagged issue has been fixed and validated.

Without needing to run any experiments, you can make reliability improvements by reviewing all targets marked in red. For each one, you will see the specific type of issue and exactly where in your configuration you need to make an edit.

For example, in the “fashion-bestseller” deployment shown above, there is no pod redundancy. If that one pod fails, your service will become unavailable. In the “Instructions” section, you’ll see specific guidance on how to add replicas to fix this and improve your service reliability.
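To make the redundancy check concrete, here is a minimal sketch in Python of the kind of rule involved. It is not Steadybit's or Kube Score's implementation: the function name `has_pod_redundancy`, the `minimum` threshold, and the manifest contents are assumptions made for illustration. The only Kubernetes facts it relies on are that a Deployment's replica count lives in `spec.replicas` and defaults to 1 when omitted.

```python
# Hypothetical helper for illustration; not part of Steadybit or Kube Score.
def has_pod_redundancy(deployment: dict, minimum: int = 2) -> bool:
    """Return True if the Deployment manifest requests at least `minimum` replicas."""
    # Kubernetes defaults spec.replicas to 1 when the field is omitted.
    replicas = deployment.get("spec", {}).get("replicas", 1)
    return replicas >= minimum


# Illustrative manifest (assumed contents) for the deployment discussed above.
fashion_bestseller = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "fashion-bestseller"},
    "spec": {"replicas": 1},
}

# A single replica means one pod failure takes the service down.
print(has_pod_redundancy(fashion_bestseller))

# After applying the fix (adding a second replica), the check passes.
fashion_bestseller["spec"]["replicas"] = 2
print(has_pod_redundancy(fashion_bestseller))
```

In a real cluster the fix is made in the Deployment manifest itself and then deployed, as the “Instructions” section describes; the sketch only shows what the rule is looking for.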

Once you have adjusted the Kubernetes configuration and deployed it to your cluster, Steadybit will automatically detect the change. Depending on the type of issue, the Advice moves to the next stage: orange (“Validation Needed”) with a recommended experiment, or green (“Implemented”) if no validation is needed.

Validating Fixes with Experiments

When an issue is in the “Validation Needed” stage, Reliability Advice will provide you with the relevant template you need to create and run an experiment that validates your fix.

For example, in the “toys-bestseller” deployment shown above, validation is needed to ensure that during an availability outage, pods across multiple zones are still able to support your service. Just click “Create Experiment” to jump into the “Editor” tab and review the experiment template for this specific use case. You can make adjustments to the experiment design and run it when you’re ready.
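As a rough illustration of why this case needs an experiment rather than a static check alone, the sketch below (Python, with hypothetical function names and made-up zone values) tests whether pods are scheduled in more than one zone. It assumes you already have the labels of the node hosting each pod and uses the well-known Kubernetes node label `topology.kubernetes.io/zone`. Passing this check shows only that pods are spread out; whether the service actually stays available during an outage is what the recommended experiment validates.

```python
from typing import Dict, List, Set

# Well-known Kubernetes node label that carries the zone name.
ZONE_LABEL = "topology.kubernetes.io/zone"


# Hypothetical helpers for illustration; not part of Steadybit.
def zones_hosting_pods(node_labels_per_pod: List[Dict[str, str]]) -> Set[str]:
    """Collect the distinct zones of the nodes that host the given pods."""
    return {
        labels[ZONE_LABEL]
        for labels in node_labels_per_pod
        if ZONE_LABEL in labels
    }


def is_spread_across_zones(node_labels_per_pod: List[Dict[str, str]]) -> bool:
    """Return True if the pods run in at least two different zones."""
    return len(zones_hosting_pods(node_labels_per_pod)) >= 2


# Made-up example data: two pods, each on a node in a different zone.
toys_bestseller_pods = [
    {ZONE_LABEL: "zone-a"},
    {ZONE_LABEL: "zone-b"},
]

print(is_spread_across_zones(toys_bestseller_pods))
```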

Once the experiment runs, the results will show one of two outcomes: either the issue persists and needs attention, or the issue is no longer a reliability concern. In the second case, the Advice updates from orange to green, that is, from “Validation Needed” to “Implemented”.
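The stages and transitions described in this lesson can be summarized as a small state machine. The Python sketch below is a mental model only, not Steadybit code: the names `AdviceStage`, `after_fix_deployed`, and `after_experiment` are hypothetical, and leaving the stage unchanged after a failed experiment is an assumption, since the lesson says only that the issue then needs attention.

```python
from enum import Enum


class AdviceStage(Enum):
    ACTION_NEEDED = "Action Needed"          # red
    VALIDATION_NEEDED = "Validation Needed"  # orange
    IMPLEMENTED = "Implemented"              # green


def after_fix_deployed(stage: AdviceStage, needs_validation: bool) -> AdviceStage:
    """Stage after a fix for a red Advice is detected in the cluster."""
    if stage is not AdviceStage.ACTION_NEEDED:
        return stage
    if needs_validation:
        return AdviceStage.VALIDATION_NEEDED
    return AdviceStage.IMPLEMENTED


def after_experiment(stage: AdviceStage, experiment_passed: bool) -> AdviceStage:
    """Stage after the recommended experiment has run."""
    if stage is AdviceStage.VALIDATION_NEEDED and experiment_passed:
        return AdviceStage.IMPLEMENTED
    # Assumption: a failed experiment leaves the stage as it was.
    return stage


stage = AdviceStage.ACTION_NEEDED
stage = after_fix_deployed(stage, needs_validation=True)
stage = after_experiment(stage, experiment_passed=True)
print(stage.value)
```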

Adding Custom Reliability Advice

As we mentioned earlier, the default Advice in Steadybit consists of 13 policies based on Kube Score, so it only applies to Kubernetes-based targets. If you want to add custom Advice to expand the checks for Kubernetes or to add checks for other types of targets, you can do that using our AdviceKit.

If you’re interested in learning more, you can read the AdviceKit documentation.

Lesson Summary

In this lesson, we outlined why Reliability Advice is helpful for detecting reliability issues fast and validating improvements. We covered the three stages of Advice: “Action Needed”, “Validation Needed”, and “Implemented”. Lastly, we explained how you can add your own Advice to run custom checks on your targets.
