Setting up the cfnmason project for Python

Cfnmason is yet another tool for working with CloudFormation stacks. It is not meant to replace CloudFormation the way Terraform does, but to make building and managing stacks easier. I wrote a version of this ages ago in Ruby, but with most of my work now in Python, I am creating a new version in Python. As I have never created an exportable Python package, this series will track the process of building and releasing a new PyPI package.

Oh, and to keep things interesting, I am doing this on a mix of Windows 10 and Linux.

Creating the default layout using Poetry

The first thing that we need to do is to create the base package. I could do it by hand, but I want to try out the Poetry package and see if I will hate the decision later.

c:\dev> poetry new cfnmason
...
c:\dev\cfnmason> tree /f /a

|   pyproject.toml
|   README.rst
|
+---cfnmason
|       __init__.py
|
\---tests
        test_cfnmason.py
        __init__.py
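
Poetry also generates the pyproject.toml for you. The exact contents depend on your Poetry version, but it will look something like this sketch (the author entry and version constraints here are placeholders):

[tool.poetry]
name = "cfnmason"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.7"

[tool.poetry.dev-dependencies]
pytest = "^5.2"

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"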

Init a new Git repo and add a .gitignore file

These are more civilized times, so I almost always create a Git repository when working on a project, even small ones. They may or may not be public, and the platform I use to host my code fluctuates. This time I am opting for GitHub, and I have uploaded the code.

I have been using gitignore.io to generate base .gitignore files for ages. Type in a few operating systems, the language you are coding in, any IDEs, and so on, and you have a useful .gitignore file right out of the gate. Sure, you could do it by hand, but this is quick and easy, and it can always be edited later.
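
For a project like this one, the generated file includes entries along these lines (a heavily trimmed sketch; the real gitignore.io output is much longer):

# Python
__pycache__/
*.py[cod]
.venv/
build/
dist/
*.egg-info/

# Windows
Thumbs.db

# Linux
*~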

Modifying the Readme file

Poetry starts you off with a README.rst file. I don’t know about you, but I have been working with Markdown for ages. It is common across a number of platforms, and there is support for it in a number of editors. I understand that RST is really designed for technical documentation, but I can burn that bridge later. For now, I need a decent starting point.

I could have written out the README by hand, but that would have been long and tedious. Instead, I am using a template. A template gets me up and running quickly: I can add and remove the parts I do or do not need, and hopefully I will not forget a major section. As this is the first pass, I am not going to fill in the entire thing, but I will at least get the project name in and note that the project is a work in progress.
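
The skeleton I am starting from looks roughly like this (the section names are my own picks, not anything mandated by a standard):

# cfnmason

A tool for making CloudFormation stacks easier to build and manage.

Status: work in progress.

## Installation

TODO

## Usage

TODO

## License

MIT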

The license file

The last thing needed before I start working on the code is the license file. Which license you choose is up to you; personally, I like the MIT license. It lets people do what they want. As this is an open-source project, I feel that people should be able to use it as much or as little as they want.

The easiest way to do this is through the GitHub console. Just add a new file via the web interface and name it LICENSE or LICENSE.md, in all caps. You will then be given the option to choose from a list of known licenses.

That is it and next steps

Hmmm. This took longer to write than the work to get the base package up and running, and that is to be expected. But it should also show some of the thought and consideration that goes into creating a new project, especially one that is going to be open to the world.

With that, I will leave you to it. In the next couple of days, I am going to start adding the libraries needed to begin working on the project. As I have written this in Ruby before, I have a rough layout in my head of how I want the application to work, and I know which libraries are needed for the core functionality.

The question I have for myself is whether I should take the time to update the design docs first. Having a design doc is a great way to ensure that you are not missing any features. I am going to have to sleep on it.

An Introduction to AWS Step Functions – Part 2

In Part 1, I went through a whirlwind tour and left example code showing how to create a Lambda function and a Step Function. The thing is, I did not explain how the code works. That is what this article is about: I am going to use the same code from the previous post, but walk through it and explain it. We shall see how well the formatting holds up.

Jumping right into it: this is the entire code for the Step Function from the previous post. As of right now, this code has to be JSON; I did not find any information about YAML support. Here is the documentation provided by AWS. It is pretty thorough, but it could be a bit clearer. My explanation may be better or worse.

{
  "Comment": "A Retry example of the Amazon States Language using an AWS Lambda Function",
  "StartAt": "LambdaFunction",
  "States": {
    "LambdaFunction": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:aws-serverless-repository-hello-w-helloworldpython-1JQ8TEEDUAHCE",
      "ResultPath": "$.taskresult",
      "Retry": [
        {
          "ErrorEquals": ["CustomError"],
          "IntervalSeconds": 1,
          "MaxAttempts": 2,
          "BackoffRate": 2.0
        },
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 30,
          "MaxAttempts": 2,
          "BackoffRate": 2.0
        },
        {
          "ErrorEquals": ["States.ALL"],
          "IntervalSeconds": 5,
          "MaxAttempts": 5,
          "BackoffRate": 2.0
        }
      ],
      "Next": "ChoiceState"
    },
    "ChoiceState": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.taskresult.value1",
          "StringEquals": "value1",
          "Next": "SuccessState"
        },
        {
          "Variable": "$.taskresult.count",
          "NumericLessThan": 5,
          "Next": "LambdaFunction"
        }
      ],
      "Default": "FailState"
    },
    "SuccessState": {
      "Type": "Succeed"
    },
    "FailState": {
      "Type": "Fail",
      "Cause": "Invalid response.",
      "Error": "ErrorA"
    }
  }
}

Looking at this for the first time, it all seems a bit overwhelming. After breaking it down, however, you will see that while you have to be precise with them, Step Functions are fairly easy to read. (Writing them is still a PITA, but that is just what it is.) When you get down to it, the Step Function configuration file only contains three top-level sections: ‘Comment‘, ‘StartAt‘, and ‘States‘. That is it.

Let us take a quick look at the first two mentioned above, Comment and StartAt. The former is just a comment so you know what the purpose of the Step Function is, though you could use it for whatever comment you like. Then there is StartAt. StartAt links to a State: its value is the name of one of the entries in the States section, and that State is where execution begins. The States section is what contains all the real meat of the Step Function.
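
Stripped down to just those three sections, a state machine skeleton looks like this (a minimal sketch of my own; "HelloWorld" is a made-up state name, not part of the example above):

[code]
{
  "Comment": "The smallest possible state machine",
  "StartAt": "HelloWorld",
  "States": {
    "HelloWorld": {
      "Type": "Pass",
      "End": true
    }
  }
}
[/code]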

States

States are where everything really happens when it comes to Step Functions. They are the individual units that, when strung together, make the magic happen. When you look at it from this perspective, it sounds ridiculously easy. But, just like everything else, the devil is in the details. The key to understanding States is knowing the various components that make up a State, and the types of States that are available.

Since this is an introduction based on my code here, I am not going to go through all the options. However, there are docs from Amazon covering them. The thing is, it is not simple. Based on the docs, the only required field is Type. Great, but what are the various types? How do I link them together to make a cohesive whole? What are these other options, and how do I pass variables from one section to the next and use them over and over again in a loop? This information is available, but it is scattered all over the documentation, and you have to piece it together yourself to figure out what is available.

What are the State types?

  • Pass – used to take input and pass it straight through to output. Basically, it is used to debug Step Functions, so you know what the heck is happening between States. (A minimal sketch appears just after this list.)
  • Task – this is where the work gets done. It can be a Lambda function or any other supported service. It takes your input and provides output for another State to consume.
  • Choice – handles branching in your Step Function by looking at the output from the previous State and sending the workflow on to the next decided-upon State.
  • Wait – allows you to put a pause in your Step Function to wait for an action to finish, like deploying CloudFormation or some other long-running task.
  • Succeed – the state machine has finished with success. This one is actually simple and does just what it says.
  • Fail – the end state where all has not gone according to plan. This is the opposite of Succeed, and it allows for some reporting on what went wrong.
  • Parallel – run States at the same time. I have not yet worked with this, so I do not have as much information as I would like on this type of State.
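
Here is what a hypothetical Pass state might look like (this one is not part of the example above; the Result value gets injected into the output at ResultPath so you can inspect it downstream):

[code]
"DebugState": {
  "Type": "Pass",
  "Result": { "note": "checkpoint reached" },
  "ResultPath": "$.debug",
  "Next": "SuccessState"
}
[/code]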

In the example above, I use only four of the State types: Task, Choice, Succeed, and Fail. Mainly, I wanted to touch on a good bit of the basics without going too far down the rabbit hole. For my example, my Task was a Lambda function; to me this is the most logical way to use Step Functions, but there are probably many uses I have not considered. I also wanted to employ a loop, because I wanted to test flow control and passing variables to and from Lambda functions. I could have strung the Lambda functions together in a straight line, but I wanted to test out branching as well.

The big thing to remember is that the order of the States does not matter; what matters is declaring which State comes next. You could use a single Choice State to manage the flow through the entire Step Function and, based upon the input, move on to any number of other States or back to a previous one.

[code]
  ],
  "Next": "ChoiceState"
},
[/code]

This little block of code is a great example. The Next keyword is used, and its value is the name of the next State block to be executed. The name of each State is the key that starts its block in the JSON template. Note that in the example above, Next is not pointing at a State of type Choice per se; it is pointing at a State that I happened to name ChoiceState. In retrospect, much of this could have been done more cleanly, but it was an example I put together rather quickly. This is the declaration:

[code]
},
"ChoiceState": {
  "Type": "Choice",
  "Choices": [
[/code]

I am going to end here for today. The next post will be about passing data from the Step Function to Lambda, and how to reuse it. I want to work with a smaller block of code and focus on just that one thing.

DevOps is more than just about developers

There. It has been said. Take a moment, and think about it.

There is a push in the community to talk about Continuous Integration / Continuous Deployment (CI/CD) and to tie it all to the DevOps movement: which practices get code to market the fastest, with the minimal amount of defects. I will admit that I agree wholeheartedly that CI/CD, or some sort of pipeline, is important to allow software development teams to move fast and innovate, or at least to roll out code more often than every six months. I would almost argue, though, that CI/CD is more akin to Agile and Scrum methodologies than it is to DevOps.

When it comes to DevOps, there is a side of the equation that is not commonly addressed: Infrastructure and Operations.

Infrastructure Ops

InfraOps, or whatever you want to call it, is the glue that holds together all the services that applications run on top of. It is the stuff people take for granted when it works, but cry foul over when it goes down. When servers go down at 2 a.m., InfraOps is what keeps the lights on.

It is easy to dismiss this side of the DevOps house. But let's quickly run through the items that need to be covered here.

Infrastructure Operations List

  • System Deployment
    • Operating systems
    • Patching
  • Virtualization (Docker, AWS, Azure)
  • CMDB
  • Monitoring
  • Software Deployment
    • 3rd Party
    • In house
  • Load Balancers
  • Hardware Failover
  • Disaster Recovery
  • Caching Servers
  • Database Systems
  • Development Environments
  • Backups

This is by no means an exhaustive list of everything that must be handled for a solution to be properly deployed and supported, but it should bring to light the side of the DevOps world that is often overlooked. It does not matter how good your code is if you do not have an infrastructure that will stay online and support your applications. This is frequently an understaffed area that needs more attention.

Automation is king

The only way to ensure that the infrastructure can support ongoing systems, deployments, and updates is with automation. Every time a machine is built by hand, a configuration file is manually changed, or updates are performed manually, you have introduced risk into your environment. Each of these interactions is potentially a slowdown in the pipeline or, worse, a manual change that never gets documented.

There are a number of tools and processes built around infrastructure automation: Chef, Ansible, CFEngine, Salt; the list goes on. In some places, people have rolled their own. The specific tool matters less than moving away from manual intervention toward a more dynamic, scalable infrastructure. These tools all require in-depth knowledge of the underlying systems as well as the ability to code, but the end goal is DevOps on the infrastructure side as well as the development side of the house.

There are places where the tools to support automation are still lacking. While new SaaS solutions are filling some of these gaps, if you need to run your monitoring system in house, many of the existing solutions are not built around dynamic infrastructure and ephemeral machines. The tools need to catch up with the needs of the users. In the meantime, the back-end DevOps folks write workarounds to handle situations like this.

Moving Forward

Moving forward, let us try to look at all aspects of DevOps: from development to application deployment, and from hardware to infrastructure and scaling design. There is still much work to be done, but we are moving in the right direction.

Building a Windows 2012R2 Instance in AWS with Terraform

Terraform is an application by HashiCorp that is designed to treat infrastructure as code. Lately, I have been working with it to begin automating resources within AWS, and I have been quite pleased.

Let's get started with building out a Windows 2012 R2 server on AWS with Terraform.

Since the purpose of this exercise is not to create a VPC, subnets, security groups, and so on, you will need to have those items configured in AWS beforehand; I am not going to use Terraform to build them here. You will need their identifiers later when building out the server. I always recommend building out your VPC in its own stack and never mixing it with others; it is a vital piece of the infrastructure that should be touched as little as possible.

Items Needed for this Demo:

  • IAM Instance Role
  • VPC Security Group ID
  • Subnet ID
  • Key Pair to use for Instance Creation

For this example you are going to need just two files, conventionally called variables.tf and main.tf. Technically, you can name them anything you want, as long as they end in .tf. If anything, I find main.tf and variables.tf more confusing, so I am going to change the file names. I use Atom for editing my files, and descriptive filenames are nice.

mkdir stand_alone_windows_2012
cd stand_alone_windows_2012
touch win2012_test_instance.tf
touch win2012_test_variables.tf

Now let us add the file with the variables we are going to need. (I will admit that there are some values hard-coded into the second file, but that is because I have been testing this to get it working.)

variable "admin_password" {
  description = "Windows Administrator password to login as."
}

variable "aws_region" {
  description = "AWS region to launch servers."
  default = "us-west-2"
}

# Windows Server 2012 R2 Base
variable "aws_amis" {
  default = {
    us-east-1 = "ami-3f0c4628"
    us-west-2 = "ami-b871aad8"
  }
}

variable "key_name" {
  description = "Name of the SSH keypair to use in AWS."
  default = {
    "us-east-1" = "AWS Keypair"
    "us-west-2" = "AWS Keypair"
  }
}

variable "aws_instance_type" {
  default = "m4.large"
}

variable "aws_subnet_id" {
  default = {
    "us-east-1" = "subnet-xxxxxxxx"
    "us-west-2" = "subnet-xxxxxxxx"
  }
}

variable "aws_security_group" {
  default = {
    "us-east-1" = "sg-xxxxxxxx"
    "us-west-2" = "sg-xxxxxxxx"
  }
}

variable "node_name" {
  default = "not_used"
}

You will need to go through this file, update the variables as needed, and create any resources that you do not already have. This includes subnets and VPC-based security groups.
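
If you would rather not edit the defaults directly, you can drop your own values into a terraform.tfvars file next to the .tf files, which Terraform loads automatically (the values below are placeholders):

admin_password = "ChangeMe123!"
aws_region     = "us-west-2"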

The next file is win2012_test_instance.tf. This is what does all the heavy lifting. In my example, I am also installing Chef, but that is because I plan on automating my entire infrastructure, not just server creation.

# Specify the provider and access details
provider "aws" {
  region = "${var.aws_region}"
}

data "template_file" "init" {
    /*template = "${file("user_data")}"*/
    template = <<EOF
<script>
  winrm quickconfig -q & winrm set winrm/config/winrs @{MaxMemoryPerShellMB="300"} & winrm set winrm/config @{MaxTimeoutms="1800000"} & winrm set winrm/config/service @{AllowUnencrypted="true"} & winrm set winrm/config/service/auth @{Basic="true"}
</script>
<powershell>
  netsh advfirewall firewall add rule name="WinRM in" protocol=TCP dir=in profile=any localport=5985 remoteip=any localip=any action=allow
  $admin = [ADSI]("WinNT://./administrator, user")
  $admin.SetPassword("${var.admin_password}")
  iwr -useb https://omnitruck.chef.io/install.ps1 | iex; install -project chefdk -channel stable -version 0.16.28
</powershell>
EOF

    vars {
      admin_password = "${var.admin_password}"
    }
}

resource "aws_instance" "win2012_instance" {
  connection {
    type = "winrm"
    user = "Administrator"
    password = "${var.admin_password}"
  }
  instance_type = "${var.aws_instance_type}"
  ami = "${lookup(var.aws_amis, var.aws_region)}"
  key_name = "${var.key_name}"
  tags {
    Name = "MY_DYNAMIC_STATIC_NAME",
    Env = "TEST"
  }
  key_name = "${lookup(var.key_name, var.aws_region)}"
  iam_instance_profile = "STATIC_ROLE_NAME_SHOULD_BE_A_VARIABLE"
  tenancy = "dedicated"
  subnet_id = "${lookup(var.aws_subnet_id, var.aws_region)}"
  vpc_security_group_ids = ["${lookup(var.aws_security_group, var.aws_region)}"]
  /*user_data = "${file("user_data")}"*/
  user_data = "${data.template_file.init.rendered}"
}
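
With both files in place, the run is the usual Terraform cycle, something like the following (the password value is a placeholder, quoting will differ on Windows cmd, and on the older Terraform versions this was written against, the init step may not be needed):

terraform init
terraform plan -var 'admin_password=ChangeMe123!'
terraform apply -var 'admin_password=ChangeMe123!'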