Step Functions: Unix Pipes in the Cloud
Welcome to the second post in a series exploring the foundational components for building a platform on AWS that enables Proactive Ops.
Pipes are a key component of Unix. They pass streams of text from one process to the next, allowing users to chain commands so that the output of one becomes the input of the next. Pipes can help you find the 5 processes consuming the most CPU time. They can also fetch a webpage and output all the external links in alphabetical order. The possibilities are almost endless.
Back in 2007, Yahoo! was so inspired by Unix pipes that it released a product called “Pipes”. Until it was shut down in 2015, Yahoo Pipes allowed users to fetch data from the web and “remix” it by passing it through a series of commands. It was the first product to bring the pipes philosophy to the web and build a mass user base.
Streaming JSON not Text
Modern workflow automation tools have embraced the Unix pipes philosophy. Some examples include the consumer focused IFTTT, the more business focused Zapier and Make (formerly Integromat), the data centric Hevo, the open core n8n, and the open source Huginn. These low/no code platforms offer tools for building workflows that react to events. These JSON events are processed and transformed by the various steps in a user defined workflow.
The principle of Unix pipes is that programs should accept streams of text and output streams of text. In 2023, much of the text streamed to clients over the web is structured data in the form of JSON. Most modern languages and frameworks support JSON out of the box. Instead of text in, text out, the modern web is more a case of JSON in, JSON out.
Step Functions as Pipes
At Re:Invent 2016, Amazon joined the low code workflow party when it released Step Functions. AWS Step Functions is another workflow focused implementation of Unix pipes. Like the other tools listed above, it includes a visual workflow builder. Being an Amazon product, Step Functions integrates with most AWS services, including support for interacting with around 10 000 of the platform’s API endpoints.
JSON is everywhere in the cloud, and this is especially true in Amazon Step Functions. The Amazon States Language used for defining Step Functions state machines is an elaborate JSON based domain specific language. Each step in a Step Functions state machine takes a JSON object as input and is expected to return a JSON object. AWS Step Functions is the cloud version of Unix pipes.
Instead of being a basic cloud implementation of Unix pipes, Step Functions takes this pattern and extends it into a powerful framework for building automated processes.
Composition
To get the most out of Step Functions, each workflow should be composed of small reusable steps. Too often a single Lambda function executes all the steps of a task end to end. A state machine can instead manage the execution of smaller steps in a more flexible flow.
Amazon’s small but useful set of intrinsic functions allows some logic to exist without the need for a Lambda call. Rather than reproduce the documentation, I will just call out the hashing functions and UUID generator as my favourite intrinsic functions. Let’s hope Amazon continues to expand this library of functions.
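To sketch what this looks like in practice, a single Pass state can mint a correlation ID and hash a payload with no Lambda involved. The state and field names here are illustrative, not from any real workflow:

```json
{
  "GenerateIds": {
    "Type": "Pass",
    "Parameters": {
      "requestId.$": "States.UUID()",
      "bodyHash.$": "States.Hash($.body, 'SHA-256')"
    },
    "End": true
  }
}
```

The `.$` suffix on a key tells Step Functions to evaluate the value as a path or intrinsic function rather than treat it as a literal string.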
Often, intrinsic functions alone aren’t enough to build a functional workflow. Amazon provides integrations with various AWS services. These come as optimised integrations, such as invoking a Lambda function or triggering another Step Functions state machine. Where optimised integrations don’t exist, SDK integrations provide a more rudimentary path for calling AWS services.
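An SDK integration is just a Task state whose resource ARN names the service and API action. As a rough sketch, listing the objects in an S3 bucket looks something like this (the bucket and state names are placeholders):

```json
{
  "ListReports": {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:s3:listObjectsV2",
    "Parameters": {
      "Bucket": "example-reports-bucket"
    },
    "Next": "ProcessReports"
  }
}
```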
When you need business logic or to call a third party service, there are Lambda functions. Last week I shared my thoughts about building small reusable Lambdas and calling them using the Lambda Invoke API. Step Functions is a great place to use such Lambda functions. Steps calling small functions encourage code reuse and rapid iteration.
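Invoking one of those small Lambdas from a step uses the optimised `lambda:invoke` integration. A minimal sketch, where the function and state names are hypothetical:

```json
{
  "EnrichRecord": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "enrich-record",
      "Payload.$": "$"
    },
    "ResultSelector": {
      "record.$": "$.Payload"
    },
    "Next": "StoreRecord"
  }
}
```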
Combining these three components allows teams to build powerful state machines with very little code.
Paths are Amazon’s jq
Working with raw JSON on the command line isn’t for the faint of heart. jq makes manipulating JSON objects on the command line a little easier. In Step Functions, Paths allow manipulation of the JSON data on the way into and out of each step in the state machine. Paths use JsonPath syntax for specifying selectors.
Paths allow manipulation of the data at every stage of its journey through a step: as it enters the step, as values are passed as parameters to whatever the step calls (such as a Lambda function or AWS API endpoint), as the result of that call is selected, and as the response leaves the step. This provides a high degree of flexibility within a flow.
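The stages above map onto a Task state’s path fields: InputPath trims the incoming data, Parameters shapes the payload, ResultSelector picks from the raw result, ResultPath merges it back into the state, and OutputPath filters what leaves the step. A sketch with illustrative names:

```json
{
  "LookupUser": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "InputPath": "$.order",
    "Parameters": {
      "FunctionName": "lookup-user",
      "Payload": {
        "userId.$": "$.userId"
      }
    },
    "ResultSelector": {
      "user.$": "$.Payload"
    },
    "ResultPath": "$.customer",
    "OutputPath": "$",
    "Next": "NotifyUser"
  }
}
```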
Visual Debugging
Released less than two years ago, Amazon’s Workflow Studio provides a visual editor for building workflows. This is a great place for new users to start exploring the power of Step Functions. Users drag and drop components from a palette of available options to build out the workflow.
Workflow Studio wasn’t the first visual component in Step Functions. There has always been a heavy emphasis on displaying information visually in Step Functions. The product has rendered flow charts of state machines since launch. The workflow execution state is communicated by colouring the steps in the flow chart. This rendering of the status has become more polished over time.
A well designed Step Function is often a lot easier to debug than a traditional application or even a Lambda function. When there is an issue with an application, debugging starts with the logs. In the case of a failed state machine, the Step Functions console highlights which step failed. The input and output of the step are a click away, as are the error message and logs.
A misbehaving state machine is just as easy to debug. It is possible to see the flow of the state machine. If it takes the wrong execution path the full context for the selected path is available immediately. There’s no need to add extra code for debugging, it’s all there, even before you realise you needed it.
Feature Flagging
These days all the cool kids are doing feature flags. Inside a state machine it is possible to splice in an experiment. There are several ways of implementing this. One way is to add an extra step that generates a random number, followed by a choice step that selects a path based on the value of the random number. The threshold value can be stored in SSM Parameter Store to allow adjustment without further deployments. Another approach is to inspect the value of a property and choose an execution path.
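A minimal sketch of the random number approach, using the States.MathRandom intrinsic. The state names and the hardcoded 10% threshold are illustrative; as noted above, a real flow would likely read the threshold from SSM Parameter Store:

```json
{
  "StartAt": "RollDice",
  "States": {
    "RollDice": {
      "Type": "Pass",
      "Parameters": {
        "roll.$": "States.MathRandom(1, 100)"
      },
      "ResultPath": "$.experiment",
      "Next": "ChoosePath"
    },
    "ChoosePath": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.experiment.roll",
          "NumericLessThanEquals": 10,
          "Next": "NewBehaviour"
        }
      ],
      "Default": "CurrentBehaviour"
    },
    "NewBehaviour": { "Type": "Pass", "End": true },
    "CurrentBehaviour": { "Type": "Pass", "End": true }
  }
}
```

Roughly one in ten executions will take the NewBehaviour path; the rest fall through to the default.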
Standard or Express
Step Functions come in two flavours - standard and express. Amazon offers both flavours under the Step Functions brand, and both use Amazon States Language to define workflows. Beyond that, they behave very differently.
Standard is the default version of Step Functions. These workflows can track state for up to a year, while express flows have a maximum execution time of 5 minutes. Express Step Functions are designed for high volume workloads. When invoked asynchronously, express workflows guarantee at-least-once execution; when invoked synchronously, it is at-most-once. Standard workflows guarantee exactly-once execution.
There are other differences between the express and standard Step Functions. Conceptually the express state machines can be thought of as a fancy way of defining a Lambda function.
It Comes at a Price
There’s always a catch. In the case of Step Functions, it is cost. For standard workflows, each account gets 4000 state transitions per month for free. After that, transitions are charged at 0.000025USD each, or 25USD per million, in most regions. For express workflows, the pricing is time based and matches that of Lambda functions.
While Step Functions can get expensive in high transaction volume environments, it can be expensive to have engineers properly instrument applications. How much does an extra couple of hours of downtime for a critical workflow cost your organisation? Could that be avoided if someone can quickly and easily see where the problem is?
In this post I’ve only scratched the surface of Amazon Step Functions. Unlike a quick bash one liner, Step Functions allows teams to build far more powerful solutions. Rather than being quick throwaways to get the job done, Step Functions can be the foundation of a critical business process. The additional cost may be worth it if it allows your team to iterate more tightly and build better tooling.
If you’re not familiar with Step Functions, start learning how you can use it to build automated workflows. In future posts, I will share some example workflows that can help your team catch issues before they become major problems. 🌊
Need Help?
If you want to adopt Proactive Ops, but you're not sure where to start, get in touch! I am happy to help you get started.
Proactive Ops is produced on the unceded territory of the Ngunnawal people. We acknowledge the Traditional Owners and pay respect to Elders past and present.