An Introduction to AWS Step Functions – Part 2

In Part 1, I took a whirlwind tour and left example code for creating a Lambda function and a Step Function. The thing is, I did not explain how the code works. That is what this article is about. I am going to use the same code from the previous post, but this time I will walk through it and explain it.

Jumping right into it: this is the entire code for the Step Function from the previous post. As of right now, the definition has to be json; I did not find any information about yaml support. AWS provides documentation for the Amazon States Language, and it is pretty thorough, but it could be a bit clearer. My explanation may or may not improve on it.

{
  "Comment": "A Retry example of the Amazon States Language using an AWS Lambda Function",
  "StartAt": "LambdaFunction",
  "States": {
    "LambdaFunction": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:aws-serverless-repository-hello-w-helloworldpython-1JQ8TEEDUAHCE",
      "ResultPath": "$.taskresult",
      "Retry": [
        {
          "ErrorEquals": ["CustomError"],
          "IntervalSeconds": 1,
          "MaxAttempts": 2,
          "BackoffRate": 2.0
        },
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 30,
          "MaxAttempts": 2,
          "BackoffRate": 2.0
        },
        {
          "ErrorEquals": ["States.ALL"],
          "IntervalSeconds": 5,
          "MaxAttempts": 5,
          "BackoffRate": 2.0
        }
      ],
      "Next": "ChoiceState"
    },
    "ChoiceState": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.taskresult.value1",
          "StringEquals": "value1",
          "Next": "SuccessState"
        },
        {
          "Variable": "$.taskresult.count",
          "NumericLessThan": 5,
          "Next": "LambdaFunction"
        }
      ],
      "Default": "FailState"
    },
    "SuccessState": {
      "Type": "Succeed"
    },
    "FailState": {
      "Type": "Fail",
      "Cause": "Invalid response.",
      "Error": "ErrorA"
    }
  }
}

Looking at this for the first time, it all seems a bit overwhelming. However, after breaking it down, you will see that while you have to be precise with it, Step Functions are fairly easy to read. (Writing them is still a PITA, but that is just what it is.) When you get down to it, this Step Function definition only contains three top-level fields: Comment, StartAt, and States. That is it.

Let us take a quick look at the first two mentioned above, Comment and StartAt. The former is just a comment so you know what the purpose of the Step Function is, though you could use it for whatever comment you like. Then there is StartAt. StartAt links to a State, and the States section is what contains all the real meat of the Step Function. The value used for StartAt is the name of one of the States.
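
Stripped of everything else, a minimal definition shows how the three fields fit together. This is just a sketch; the state name FirstState is a placeholder for this example:

{
  "Comment": "What this state machine is for",
  "StartAt": "FirstState",
  "States": {
    "FirstState": {
      "Type": "Pass",
      "End": true
    }
  }
}

The value of StartAt has to match one of the keys under States exactly, including case, or the definition will not validate.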

States

States are where everything really happens when it comes to Step Functions. They are the individual units that, when strung together, make the magic happen. When you look at it from this perspective, it sounds ridiculously easy. But, just like everything else, the devil is in the details. The key to understanding States is to know the various components that make up a State, and the types of States that are available.

Since this is an introduction, and based on my code here, I am not going to go through all the options. However, there are docs from Amazon on it. The thing is, it is not simple. Based on the docs, the only required field is Type. Great, but what are the various types? How do I link them together to make a cohesive whole? What are these other options, and how do I pass variables from one section to the next and use them over and over again in a loop? This information is available, but it is scattered all over the documentation, and you have to piece it all together to figure out what is available.

What are the State types?

  • Pass – used to take input and pass it to output. Basically, used to debug Step Functions so you know what the heck is happening between States (see the short example after this list).
  • Task – this is where the work gets done. It can be a Lambda function or any other supported service. This will take your input and provide output to another State to be consumed.
  • Choice – handles branching in your Step Function by looking at the output from the previous State and sending the Step Function to the next decided upon State in the workflow.
  • Wait – allows you to put a pause in your Step Function to wait for an action to finish, like deploying CloudFormation or some other long running task.
  • Succeed – the state machine has finished with success. This one is actually simple and does just what it says.
  • Fail – the end state where all has not gone according to plan. This is the opposite of Succeed, and allows for some reporting on what went wrong.
  • Parallel – run States at the same time. I have not yet worked with this, so I do not have as much information as I would like on this type of State.
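
As a quick illustration of the Pass type, here is a sketch of a State that injects a fixed blob into the execution data so you can see what is flowing between States. The state name DebugPass and the field contents are made up for this example:

"DebugPass": {
  "Type": "Pass",
  "Result": { "note": "made it this far" },
  "ResultPath": "$.debug",
  "Next": "LambdaFunction"
},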

In the example above, I use only 4 of the States from above: Task, Choice, Succeed, and Fail. Mainly because I wanted to touch on a good bit of the basics without going too far down the rabbit hole. For my example, my Task was a Lambda function. To me this is the most logical way to use Step Functions, but there are probably many more that I have not considered. I also wanted to employ a loop, because I wanted to test flow control and passing variables to and from Lambda functions. I could have strung the Lambda functions together in a straight line, but I wanted to test out branching as well.

The big factor to remember is that the order of the States does not matter. What matters is linking which State comes next. You could use a single Choice State to manage the flow through the entire Step Function, and based upon the input move onto any number of other States or back to a previous one.

      ],
      "Next": "ChoiceState"
    },

This little block of code is a great example. The Next keyword is used, and the value that follows is the name of the next State block to be executed. The name of each State is the key that opens its block in the States object. In the example above, Next does not point to a State type; it points to a State that I happened to name ChoiceState. In retrospect, much of this could have been named more cleanly, but it was an example that I put together rather quickly. This is the declaration:

    },
    "ChoiceState": {
      "Type": "Choice",
      "Choices": [

I am going to end here for today. The next section is going to be about passing data from the Step Function to Lambda, and how to reuse it. But, I want to work on a smaller block of code and focus on just the one thing.

Getting Started with AWS Step Functions Part I

AWS came out with Step Functions a few years ago, and up until recently, I have not had the opportunity to dive in and give them a try. Yes, I could build my own pipeline or state machine, but the idea behind Step Functions is that it does most of the heavy lifting for you. That, and it ties into other AWS services. As such, I decided to dive into getting started, and looked at the demo options and walkthroughs that were available. None of them met my needs, so I rolled my own.

The idea is to see how I can create a Step Function that will run multiple loops, and call a Lambda function multiple times. What I wanted to test was the following:

  • Pass Variables into the Step Function and see how they are handled
  • Call a Lambda function multiple times
  • Create a loop using the Step Function DSL
  • Test output from Lambda and make a decision based upon it
  • Figure out any gotchas and how to trigger Step Functions

Let’s dive in. Now, this is the final result; it took me a few iterations to actually get to this point. Smarter people than I might be able to get it done in one go, but not I.

Lambda Code

I came up with a simple Lambda function written in Python 3.6. All that I wanted to do was to perform a loop with Step Functions and then output the values. Simple. And as you can see, this code is pretty simple. It could be streamlined, but it was quick and easy to write.

def lambda_handler(event, context):
    # Echo the keys that were passed in when the execution was started
    print('value1 = ' + event['key1'])
    print('value2 = ' + event['key2'])
    print('value3 = ' + event['key3'])

    # On the first pass there is no taskresult yet, so the count starts at 0
    taskresult = event.get('taskresult', None)
    if taskresult is None:
        count = 0
    else:
        count = taskresult.get('count', None)

    if count is None:
        count = 0
    else:
        count = count + 1

    if count < 5:
        # Not done yet: the Choice state will route back to the Lambda task
        output = {
            'count': count,
            'value1': 'ThereIsNoSpoon'
        }
    else:
        # Done: value1 now satisfies the Choice state's StringEquals check
        output = {
            'value1': event['key1'],
            'value2': event['key2'],
            'count': count
        }
    return output
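
If you want to sanity check the loop logic before wiring it into a Step Function, you can drive the handler locally. This little harness is just for illustration and is not part of the deployed function; it feeds each result back in under taskresult the same way the state machine's ResultPath does:

if __name__ == '__main__':
    event = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
    # Simulate the Step Function loop: feed the output back in as taskresult
    for _ in range(6):
        result = lambda_handler(event, None)
        event['taskresult'] = result
    print(result)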

Now we need to move on to the body of what we are working on, and that would be the Step Function. Step Functions have their own domain specific language (DSL) that is used to define the state machine. I wanted more than just a “Hello World” example. The idea was to loop through a Step Function, make sure that I could call the Lambda function multiple times, and then end in either a success or a failed state.

AWS Step Function Code

{
  "Comment": "A Retry example of the Amazon States Language using an AWS Lambda Function",
  "StartAt": "LambdaFunction",
  "States": {
    "LambdaFunction": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:aws-serverless-repository-hello-w-helloworldpython-1JQ8TEEDUAHCE",
      "ResultPath": "$.taskresult",
      "Retry": [
        {
          "ErrorEquals": ["CustomError"],
          "IntervalSeconds": 1,
          "MaxAttempts": 2,
          "BackoffRate": 2.0
        },
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 30,
          "MaxAttempts": 2,
          "BackoffRate": 2.0
        },
        {
          "ErrorEquals": ["States.ALL"],
          "IntervalSeconds": 5,
          "MaxAttempts": 5,
          "BackoffRate": 2.0
        }
      ],
      "Next": "ChoiceState"
    },
    "ChoiceState": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.taskresult.value1",
          "StringEquals": "value1",
          "Next": "SuccessState"
        },
        {
          "Variable": "$.taskresult.count",
          "NumericLessThan": 5,
          "Next": "LambdaFunction"
        }
      ],
      "Default": "FailState"
    },
    "SuccessState": {
      "Type": "Succeed"
    },
    "FailState": {
      "Type": "Fail",
      "Cause": "Invalid response.",
      "Error": "ErrorA"
    }
  }
}

The way that this code works is as follows. Everything revolves around States, so you have to move from State to State. This is a key concept when it comes to Step Functions. Now, there are multiple State types, but I am not going to go into that here. The key factor is that the state machine keeps looping back through the Lambda function until it returns the value that the Choice state is looking for. Looking at it now, it looks like a bunch of gobbledygook. I am going to have to come back and write up how this works later.

This is what the visual representation looks like when viewed in the AWS Step Functions console. There is a defined ‘Start’ and ‘Stop’, and the other stages match the States named in the previous section. The diagram maps directly onto the definition, so you can follow the flow through the model.

The cool thing about AWS Step Functions is that they guarantee a run, which matters in situations where you need to be sure the code is executed. That guarantee is also where most of the cost comes from: running Lambda triggered from SQS would be cheaper, but not as easy to make reliable.

Back on with our stuff now: executing the AWS Step Function. Right now, I am not going to go into the logic around passing variables around. Nevertheless, you will need to understand that when writing your own, and I am going to have to revisit it.

Execution. A couple of items to note:

  • Each execution has to have a unique name.
    • Note, this will bite you when you are testing, and think about it when executing via automation.
  • It takes input just like Lambda, via json.
  • Making a small change in the inputs can cause madness.

This is the input json that I used:

{
  "key1": "value1",
  "key2": "value2",
  "key3": "value3"
}
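
Kicking one off by hand in the console works, but for automation you will want to script it. Here is a minimal sketch using boto3; the state machine ARN below is a placeholder, and generating the execution name from a UUID is one way to satisfy the unique-name requirement:

import json
import uuid

import boto3

sfn = boto3.client('stepfunctions')

# The ARN below is a placeholder; substitute your own state machine's ARN.
response = sfn.start_execution(
    stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:MyStateMachine',
    name='test-run-' + str(uuid.uuid4()),  # execution names must be unique
    input=json.dumps({
        'key1': 'value1',
        'key2': 'value2',
        'key3': 'value3'
    })
)
print(response['executionArn'])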

The output will also be in json, and you can see the results in the visual display.

{
  "key1": "value1",
  "key2": "value2",
  "key3": "value3",
  "taskresult": {
    "value1": "value1",
    "value2": "value2",
    "count": 5
  }
}


I will go into more detail on the breakdown of the Step Function in the next post. There is a lot to be covered, and this just scratches the surface.

Configuring AWSCLI and Python on Windows

Doing things on a Windows 10 machine has become a rather interesting experiment. What started as a place to play video games and run a few programs that are not easily available on Linux has turned into a test of whether I can now do everything on Windows 10 that I could do on Linux.

It turns out I wanted to test the Cloud Directory service from AWS, and I figured the two easiest ways to do this would be via the AWS CLI and Python. It did not occur to me that I had neither of these installed until I opened up Cmder.exe and typed

aws
and then
python
and both came back as not found.

Wait, what? Where are my programs!

So, now I need to install and configure both of these. The test is to see how easy or difficult it is to get this set up on a Windows machine. Here is a quick list of the normal steps I take to install the AWS CLI on most any Linux machine:

  1. Install Python
  2. Configure a virtual environment to hold my cli tools
  3. Use pip to install aws cli
  4. Configure aws cli
  5. Test that it all works

The first step in setting all of this up is to get Python installed on your machine. The AWS CLI is based on Python, and as such you need to have Python installed in order to use it. Now, some people will install the AWS CLI at the root of the machine and use the system’s globally installed Python. But having worked with multiple versions of Python at the same time, and on projects that use different libraries, I almost always set up a virtual environment to run my Python programs and other sundry programs from. This way, I don’t cross contaminate my streams, and I have a clearly defined idea of which versions I am using on different projects.

Installing Python

This is a relatively straightforward task. Download the Python installer that suits your needs and follow the install prompts. I chose Python 3.6.7 because it is the version that I am already using when running some Lambda programs in AWS, and because there are some changes in 3.7 that have broken a few libraries. One big one is ‘async’ and ‘await’ becoming keywords. After the install finishes, restart your favorite command line tool. I run bash via git, and use cmder.exe as my shell program.

Once you have it installed, you should be able to run python --version to verify that Python is available in your workspace. This should output ‘Python 3.6.7’, or whichever version of Python you installed.

Setting up Virtual Environment and AWS CLI

The next part is to create the virtual environment and then use it to install the AWS CLI. First we run Python to create the virtual environment, then we activate the environment and install the AWS CLI. It is just a few simple commands, and you should then be up and running.

c:\ericv\dev\> python -m venv p367
c:\ericv\dev\> p367\Scripts\activate.bat
(p367) c:\ericv\dev\> pip install awscli
(p367) c:\ericv\dev\> aws help

And bamm! You are done. Now, there is still configuring the AWS CLI with your credentials, but that is another issue. It took me longer to write this up than it took to do the install, which in and of itself is a good thing to know. Now, the question is whether I will run into any more problems. But so far, so good.
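
For completeness, if the CLI has not been pointed at your credentials yet, the usual first step is aws configure. The values below are placeholders for this sketch:

(p367) c:\ericv\dev\> aws configure
AWS Access Key ID [None]: AKIAEXAMPLEKEY
AWS Secret Access Key [None]: ExampleSecretAccessKey
Default region name [None]: us-east-1
Default output format [None]: json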


Why go Serverless

The question is not why you would want to go serverless, the question is why would you not go serverless.

Let us think about this for just a second. In a typical environment, you would not only have to think about the flow of the application and how its pieces fit together, but also about the underlying architecture. Now that DevOps is becoming more common, it is developers who sometimes find themselves having to worry about these issues. Or, in a more traditional shop, there is the team that the application is “thrown over the fence” to that has to deal with the underpinnings. In other companies, it is the DevOps group that works with the Dev team on designing and handling the infrastructure. But no matter how you slice it, the infrastructure is a huge headache that has to be managed.

In a serverless world, you have to ensure that the application runtime you are programming for matches your development environment. But for almost all intents and purposes, that is where it ends. When dealing with servers, there are a number of factors you have to take into account. For a moment, consider all the items that have to be considered when dealing with a classic server architecture:

  • Physical or Virtual Hardware
  • Operating Systems
    • Patches
    • Package installation
  • Monitoring
    • server uptime
    • server performance
  • Scheduling such as cron.

The list goes on and on; this is just a cursory list to get you thinking about it. When dealing with servers, there is no magic button to push that takes care of all of your issues. You have to deploy the system, patch it, configure it for the application, install the application, add it to inventory, and add monitoring. All of this is time taken away from the application that you are trying to develop. Oh, and I failed to mention all the security considerations that must also be taken into account. (And yes, you still have to think about security when dealing with serverless.)

The normal response to this is “my application can never be serverless.” There are some rare occasions where this is true, but in many situations it is simply not true anymore. With various services from Amazon and Azure, you can make serverless work for a wide variety of applications. My area of specialty is AWS, and as such, most of my examples will come from there. If you wanted to build a serverless application in AWS, you would probably use the following services:

  • Lambda — the serverless code execution platform
  • Cloudwatch — Monitoring is good
  • API Gateway — the HTTP front door for your Lambda functions
  • S3 — for hosting the front end static content, possibly powered by a javascript framework
  • DynamoDB — serverless NoSQL database for application state
  • CloudFront — CDN for delivering the content
  • Cognito — user sign-up and authentication

Using these tools in various ways can allow you to create applications of all sorts. In fact, I believe that tying in other parts of the AWS family of tools would let you create serverless applications in ways that I have never thought of.

But how do you start using serverless? It is best to look at your current environment, find a task that is short lived and runs on a machine, and then move it. Or, an even better use case would be to find a repetitive task that needs to be automated, and try doing it with Lambda. This will allow you to get your feet wet with something that is already being accomplished, while not breaking existing support models. Finding something simple is the best way to get a win under your belt and gain the confidence to keep expanding your use of it.

The big thing to remember is that all this serverless stuff can be used in all areas of the stack, from application development to application and server support. For example, automating the backups of long lived servers could be done via Lambda; that is an infrastructure helper (there is a sketch of the idea below). Converting images into thumbnails and providing them to the application would be an application specific task, and it could be one that supports a classic application running on servers.
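
To make the backup example concrete, here is a minimal sketch of a Python Lambda function that snapshots EBS volumes. The Backup tag it filters on is an assumption for this example, and you would schedule it with something like a CloudWatch Events rule:

import boto3

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    # Find volumes that have been tagged for backup (tag name is an assumption)
    volumes = ec2.describe_volumes(
        Filters=[{'Name': 'tag:Backup', 'Values': ['true']}]
    )['Volumes']
    for volume in volumes:
        # Snapshot each volume so it can be restored later
        snapshot = ec2.create_snapshot(
            VolumeId=volume['VolumeId'],
            Description='Automated backup of ' + volume['VolumeId']
        )
        print('Created snapshot ' + snapshot['SnapshotId'])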

Do not let the idea of serverless scare you off. Take a moment, see where you could make it fit, and give it a try. Worst case, you don’t like it and have lost a few cycles. Best case, you have found a new tool to add to your quiver.


Building a Windows 2012R2 Instance in AWS with Terraform

Terraform is an application by HashiCorp that is designed to treat infrastructure as code. Lately, I have been working with it to begin automation of resources within AWS, and have been quite pleased.

Let’s get started with building out a Windows 2012 R2 server with Terraform on AWS.

You are going to need to have the following items configured in AWS in order for this to work, as I am not going to be using Terraform to build out these items.

Since the purpose of this test is not to create a VPC, subnets, security groups, etc., all of those need to be created beforehand. You will need the identifiers from these later, for use in building out the server. I always recommend building out your VPC in its own stack and never mixing it with others; it is a vital piece of the infrastructure that should be touched as little as possible.

Items Needed for this Demo:

  • IAM Instance Role
  • VPC Security Group ID
  • Subnet ID
  • Key to use for Instance Creation

For this example you are going to need just two files: variables.tf and main.tf. Technically, you can name them any good old name you want, as long as they end in .tf. I actually find main.tf and variables.tf a bit generic, so I am going to go ahead and give the files more descriptive names. I use Atom for editing my files, and descriptive filenames are nice.

mkdir stand_alone_windows_2012
cd stand_alone_windows_2012
touch win2012_test_instance.tf
touch win2012_test_variables.tf

Now let us add the file that we are going to need for the variables. (I will admit that there are some values hard coded into the second file, but that is because I have been testing this to get it working.)

variable "admin_password" {
  description = "Windows Administrator password to login as."
}

variable "aws_region" {
  description = "AWS region to launch servers."
  default = "us-west-2"
}

# Windows Server 2012 R2 Base
variable "aws_amis" {
  default = {
    us-east-1 = "ami-3f0c4628"
    us-west-2 = "ami-b871aad8"
  }
}

variable "key_name" {
  description = "Name of the SSH keypair to use in AWS."
  default = {
    "us-east-1" = "AWS Keypair"
    "us-west-2" = "AWS Keypair"
  }
}

variable "aws_instance_type" {
  default = "m4.large"
}

variable "aws_subnet_id" {
  default = {
    "us-east-1" = "subnet-xxxxxxxx"
    "us-west-2" = "subnet-xxxxxxxx"
  }
}

variable "aws_security_group" {
  default = {
    "us-east-1" = "sg-xxxxxxxx"
    "us-west-2" = "sg-xxxxxxxx"
  }
}

variable "node_name" {
  default = "not_used"
}

You will need to go through this file, update the variables as needed, and create any resources that you do not happen to have. This would include subnets and VPC-based security groups.

The next file is win2012_test_instance.tf. This is what does all the heavy lifting. In my example, I am also installing Chef, but that is because I plan on automating my entire infrastructure, not just server creation.

# Specify the provider and access details
provider "aws" {
  region = "${var.aws_region}"
}

data "template_file" "init" {
    /*template = "${file("user_data")}"*/
    template = <
  winrm quickconfig -q & winrm set winrm/config/winrs @{MaxMemoryPerShellMB="300"} & winrm set winrm/config @{MaxTimeoutms="1800000"} & winrm set winrm/config/service @{AllowUnencrypted="true"} & winrm set winrm/config/service/auth @{Basic="true"}


  netsh advfirewall firewall add rule name="WinRM in" protocol=TCP dir=in profile=any localport=5985 remoteip=any localip=any action=allow
  $admin = [ADSI]("WinNT://./administrator, user")
  $admin.SetPassword("${var.admin_password}")
  iwr -useb https://omnitruck.chef.io/install.ps1 | iex; install -project chefdk -channel stable -version 0.16.28

EOF

    vars {
      admin_password = "${var.admin_password}"
    }
}

resource "aws_instance" "win2012_instance" {
  connection {
    type = "winrm"
    user = "Administrator"
    password = "${var.admin_password}"
  }
  instance_type = "${var.aws_instance_type}"
  ami = "${lookup(var.aws_amis, var.aws_region)}"
  key_name = "${var.key_name}"
  tags {
    Name = "MY_DYNAMIC_STATIC_NAME",
    Env = "TEST"
  }
  key_name = "${lookup(var.key_name, var.aws_region)}"
  iam_instance_profile = "STATIC_ROLE_NAME_SHOULD_BE_A_VARIABLE"
  tenancy = "dedicated"
  subnet_id = "${lookup(var.aws_subnet_id, var.aws_region)}"
  vpc_security_group_ids = ["${lookup(var.aws_security_group, var.aws_region)}"]
  /*user_data = "${file("user_data")}"*/
  user_data = "${data.template_file.init.rendered}"
}
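
With both files in place, the usual Terraform workflow applies. A quick sketch, assuming Terraform is installed and your AWS credentials are configured; the password value is a placeholder:

terraform init
terraform plan -var "admin_password=SuperSecret123!"
terraform apply -var "admin_password=SuperSecret123!"

Running plan before apply shows you exactly what Terraform intends to create, which is a good habit when a dedicated-tenancy m4.large is on the line.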