AWS Systems Manager: my experience as an Ansible user
Being a DevOps engineer in this day and age is demanding. There are so many technologies that we must learn in order to get software tested and deployed faster and more safely than ever before. There is always a new tool that we “need” to learn, as it will magically solve all our problems. One of my recent experiences with an unfamiliar tool was with AWS Systems Manager, also known as SSM. I have mostly used Ansible as one of my main tools to deploy and provision tech stacks, both on-prem and in the cloud. There is an overlap between the capabilities that Ansible provides and what SSM offers. In a recent project, I was faced with an environment in AWS that was already being managed by SSM. Prior to this, I had heard of SSM, but I had no professional experience with it. This article is my take on using SSM as an Ansible lover. I have been using SSM for approximately 3 months, and there are probably more nuances that are not covered in this article.
What is SSM?
AWS Systems Manager (SSM) is a service offered by Amazon Web Services that allows you to control your AWS resources. SSM can perform actions on your AWS resources on your behalf and can run automated actions on a system, which is very similar to what Ansible does. Some examples of actions are patching a system, ensuring that a baseline is kept, deploying software, and gathering information from your resources.
AWS SSM is not a single tool. It is more like a collection of tools and services that complement each other. SSM can help keep your cloud and on-prem environments in a desired state, give you visibility into the state of each resource, deliver software to endpoints, run automation, and much more.
What I like about AWS SSM
There are many features that I enjoyed while using SSM.
The first thing to note is the different architectural style. In Ansible, the control node reaches out to the instances over SSH, without the need to install any software on the node you are trying to reach. In AWS SSM, you need an agent installed on each instance, and the instance must be given the right permissions in order to use it. The agent on the node reaches out to SSM to find out what actions to perform. Depending on your needs, this can be a big difference: the client initiates the connection and is not so dependent on a control server, which might be exactly what you need.
I like that it uses the same governance context as other AWS services: authentication and authorization via IAM, with audit via CloudTrail.
No need for a jumpbox to perform an action
The first thing that I liked about SSM compared to Ansible is that there is no need for any kind of control node or jumpbox in the environment in order to run a playbook. I have many memories of having to set up SSH on the node I wanted to control and getting my inventory and parameters just right so that I could use the playbook. In AWS SSM, you can write a document, upload it to SSM, and then run the document directly on any SSM-managed instance. You interface with the AWS API, and AWS SSM acts on the instances for you. Ansible, by contrast, depends on having direct access to each managed instance.
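To make the "talk to the AWS API instead of the host" idea concrete, here is a minimal sketch in Python. It assumes boto3 is available and that `ssm_client` is created with `boto3.client("ssm")`; the function names, the instance ID, and the comment text are placeholders of my own, and `AWS-RunShellScript` is the AWS-managed document for running shell commands.

```python
from typing import Any, Dict, List


def build_run_command_request(instance_ids: List[str], commands: List[str]) -> Dict[str, Any]:
    """Build the keyword arguments for the SSM SendCommand API call."""
    return {
        "InstanceIds": instance_ids,
        # AWS-managed document that runs shell commands on Linux hosts.
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": commands},
        "Comment": "ad hoc command sent without a jumpbox",  # placeholder text
    }


def run_on_instances(ssm_client: Any, instance_ids: List[str], commands: List[str]) -> str:
    """Send the command through the AWS API and return the command ID."""
    response = ssm_client.send_command(**build_run_command_request(instance_ids, commands))
    return response["Command"]["CommandId"]
```

With real credentials, something like `run_on_instances(boto3.client("ssm"), ["i-0123456789abcdef0"], ["uptime"])` would queue the command. No SSH connection from my machine to the instance is involved.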
Easy to set up
AWS SSM is relatively simple to get up and running since the agent is already installed in many popular AMIs. All that is needed is to add a few permissions. Once the instance profile is set up, you can use SSM right away. As with anything in AWS, this can all be automated so that EC2 instances are given the right role upon creation.
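The permission setup can be sketched roughly as follows. This assumes boto3 and an `iam_client` created with `boto3.client("iam")`; the function names and the profile name are my own placeholders. `AmazonSSMManagedInstanceCore` is the AWS-managed policy that grants the agent its core permissions, but your organization may require a narrower custom policy.

```python
import json
from typing import Any

# AWS-managed policy with the core permissions the SSM Agent needs.
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def ec2_trust_policy() -> str:
    """Return a trust policy that lets EC2 instances assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    })


def create_ssm_instance_profile(iam_client: Any, name: str) -> None:
    """Create a role and an instance profile (both called `name`) for SSM."""
    iam_client.create_role(RoleName=name, AssumeRolePolicyDocument=ec2_trust_policy())
    iam_client.attach_role_policy(RoleName=name, PolicyArn=SSM_CORE_POLICY_ARN)
    iam_client.create_instance_profile(InstanceProfileName=name)
    iam_client.add_role_to_instance_profile(InstanceProfileName=name, RoleName=name)
```

Attaching the resulting profile to an instance at launch is what lets the agent register itself with SSM.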
No need for inventory
As you start managing instances, the inventory gets populated automatically, so you can always perform actions on any of the managed instances. In the AWS Management Console, you get a populated table of all the nodes you are managing.
SSM provides the “Parameter Store”, which allows you to keep all your variables in one place. Parameters can be either encrypted or unencrypted and can be referenced directly in your automation. AWS also provides Secrets Manager, which is very similar but adds helpful features such as automatic key rotation.
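The same list of managed nodes is also available from the API. Below is a minimal sketch, assuming boto3 and an `ssm_client` created with `boto3.client("ssm")`; the function name and the shape of the returned dictionaries are my own choices.

```python
from typing import Any, Dict, List


def list_managed_instances(ssm_client: Any) -> List[Dict[str, Any]]:
    """Collect every node SSM knows about, following pagination tokens."""
    instances: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {}
    while True:
        page = ssm_client.describe_instance_information(**kwargs)
        for info in page["InstanceInformationList"]:
            instances.append({
                "id": info["InstanceId"],
                "platform": info.get("PlatformName"),
                "ping": info.get("PingStatus"),
            })
        token = page.get("NextToken")
        if not token:
            return instances
        # Ask for the next page on the following loop iteration.
        kwargs["NextToken"] = token
```

This is roughly the information I would otherwise have had to maintain by hand in an Ansible inventory file.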
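As a small illustration of Parameter Store, here is a sketch that writes an encrypted value and reads it back. It assumes boto3 and an `ssm_client` from `boto3.client("ssm")`; the helper names and the parameter path in the usage note are hypothetical.

```python
from typing import Any


def store_secret(ssm_client: Any, name: str, value: str) -> None:
    """Write an encrypted parameter, replacing any existing value."""
    ssm_client.put_parameter(Name=name, Value=value, Type="SecureString", Overwrite=True)


def read_parameter(ssm_client: Any, name: str) -> str:
    """Read a parameter, asking SSM to decrypt it if it is a SecureString."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

For example, `store_secret(client, "/myapp/db_password", "s3cret")` followed by `read_parameter(client, "/myapp/db_password")` would round-trip the value, provided the caller has the IAM permissions to use the parameter and its KMS key.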
I can still use Ansible playbooks!!
You can run your Ansible playbooks with SSM directly on the managed instances. These playbooks can be stored in either a GitHub repo or an S3 bucket. As a best practice, the playbooks can be put into the S3 bucket by a pipeline of your choice. I do wish it offered more places to get your artifacts from.
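Here is a hedged sketch of what running a playbook from S3 could look like. It assumes the AWS-managed `AWS-ApplyAnsiblePlaybooks` document and its `SourceType`, `SourceInfo`, `InstallDependencies`, and `PlaybookFile` parameters; check the document's current parameter list in your account before relying on it. The function name, bucket URL, and tag values are placeholders.

```python
import json
from typing import Any, Dict


def build_playbook_request(s3_url: str, playbook_file: str,
                           tag_key: str, tag_value: str) -> Dict[str, Any]:
    """Build SendCommand arguments that run a playbook fetched from S3."""
    return {
        "DocumentName": "AWS-ApplyAnsiblePlaybooks",
        # Select instances by tag instead of listing instance IDs.
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {
            "SourceType": ["S3"],
            # SourceInfo is a JSON string describing where the playbook lives.
            "SourceInfo": [json.dumps({"path": s3_url})],
            # "False" assumes Ansible is already installed on the hosts.
            "InstallDependencies": ["False"],
            "PlaybookFile": [playbook_file],
        },
    }
```

The resulting dictionary can be passed to `boto3.client("ssm").send_command(**request)`. Note that every parameter value is a list of strings, which is how SendCommand expects document parameters.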
Can manage instances on-prem, other clouds, or in AWS
AWS SSM can manage instances in other accounts and even on-prem machines, although the actions that you can perform on them are more constrained and more expensive.
Jumping right into any instance straight from the EC2 console
I love being able to just jump into the EC2 service page, find the machine that I am targeting, and within two clicks be dropped into a shell to troubleshoot the server. What a delight.
Can schedule your automations
Running documents can be automated in many ways, including State Manager, an AWS API call, or a Lambda function that you set up beforehand. This is great, as we can apply actions depending on the state of our account.
Comprehensive patch management
Patch Manager allows you to automate most of the mundane patching that needs to take place in your environment. I haven't used this feature much, so I don't know all the ins and outs of it, but my team seems to like it. I believe that you define a patching schedule and Patch Manager applies patches on that schedule. SSM even keeps track of the software installed on each machine and which version of each package is installed.
What I don't like about AWS SSM
Although you can manage your infrastructure with AWS SSM, there were a couple of things that I did not like about SSM and a few gotchas that I wasn't aware of.
You write Bash or PowerShell if you don’t want Ansible installed on the host
The first thing is that if you don't want Ansible installed on every machine that you manage, you are left with writing Bash or PowerShell scripts to manage your machines. One reason not to install Ansible is that you grow your attack surface with every piece of software that you install. Therefore, as the list of required software grows, Ansible might seem superfluous when other options exist. It is up to the organization to determine how much risk it wants to take on. In my case, the organization chose to reduce risk by not installing Ansible on each machine, and therefore I was left writing Bash and PowerShell to provision my machines.
Although Bash and PowerShell are not bad, I find writing Ansible playbooks much easier because of the much higher level of abstraction. Writing at this higher level allows the administrator to write states in simple terms and let Ansible figure out how to carry out what the administrator wants. It is also less error-prone, and easier to maintain as it is easier to understand.
You can't run Ansible playbooks using SSM on Windows machines, so you must use PowerShell
Because Ansible playbooks that are run with SSM are executed locally on the host, this capability is only available on Linux hosts and not on Windows machines. As I mentioned before, writing in a higher-level language makes things less error-prone and much easier to maintain than a script.
SSM documents are a little strange to compose
In SSM, you run documents, which are like Ansible playbooks but are more Amazon-centric and broader in scope. Let me explain. A lot of Ansible modules deal with performing actions on a machine. The actions available in SSM documents mostly deal with performing actions on your account, with a few options for performing actions on machines (AWS-RunShellScript, AWS-RunAnsiblePlaybook, AWS-RunPowerShellScript). Unlike Ansible modules, these are for running scripts, which again are more error-prone and harder to maintain and understand than playbooks. When you need to perform actions on a machine, you are left with only these options, and hopefully you will have the option of running either Salt or Ansible. If your environment is Windows heavy, then you are left with running PowerShell and no other provisioning module. I guess AWS is counting on the user to leverage Ansible and Salt rather than reinventing the wheel. However, I wish there were other options for managing Windows.
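To show what composing a document looks like, here is a minimal sketch that builds a Command document as a Python dictionary and registers it. It assumes schema version 2.2 and the `aws:runShellScript` plugin; the function names, step name, and document name are placeholders of my own, and `ssm_client` is assumed to come from `boto3.client("ssm")`.

```python
import json
from typing import Any, Dict, List


def build_command_document(description: str, commands: List[str]) -> Dict[str, Any]:
    """Build a schema 2.2 Command document with a single shell step."""
    return {
        "schemaVersion": "2.2",
        "description": description,
        "mainSteps": [{
            "action": "aws:runShellScript",  # plugin that runs shell commands
            "name": "runCommands",           # placeholder step name
            "inputs": {"runCommand": commands},
        }],
    }


def register_document(ssm_client: Any, name: str, document: Dict[str, Any]) -> None:
    """Upload the document so it can be run on managed instances."""
    ssm_client.create_document(
        Content=json.dumps(document),
        Name=name,
        DocumentType="Command",
        DocumentFormat="JSON",
    )
```

Notice that the host-level work is still just a list of shell lines inside `runCommand`. That is the part that feels thin compared to a playbook full of idempotent modules.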
Price can be difficult to determine early in a project
Although the free tier offering from AWS is generous for many of the tools in SSM, you have to be really mindful of what you are doing or you can get an unexpected charge. As an example, we can take a look at the Automation service, as this is something that I got a lot of exposure to. At the time of this writing, the Automation service's free tier allows 100,000 automation steps per account per month and 5,000 seconds of step duration. There is a small charge per step and per second of step duration after that. The cost can be hard to determine, as you might have a large environment and still not have all the requirements from the customer. In that case, you don't know how many automations you will use, how many steps each automation is going to take, how long each automation will take, or how many instances it will run on.
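A rough way to reason about this is to put the free-tier numbers quoted above into a small estimator. This is only a sketch: the function name is my own, and the per-step and per-second prices are parameters that you must fill in from the current AWS pricing page.

```python
# Free-tier allowances as quoted in this article; verify against current pricing.
FREE_STEPS_PER_MONTH = 100_000
FREE_SECONDS_PER_MONTH = 5_000


def estimate_automation_cost(steps: int, duration_seconds: int,
                             price_per_step: float, price_per_second: float) -> float:
    """Estimate the monthly Automation charge after the free tier is used up."""
    billable_steps = max(0, steps - FREE_STEPS_PER_MONTH)
    billable_seconds = max(0, duration_seconds - FREE_SECONDS_PER_MONTH)
    return billable_steps * price_per_step + billable_seconds * price_per_second
```

For instance, 200 instances running a 12-step automation 60 times a month comes to 144,000 steps, of which 44,000 are billable. The hard part early in a project is that none of those three inputs is known yet.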
Compare the two - The Verdict
With SSM, you might split provisioning technology
Not being able to run playbooks on Windows splits our approach into PowerShell scripts for Windows and Ansible playbooks or Bash scripts for Linux, which is not ideal. I have been more successful at keeping state when Ansible controls both Windows and Linux systems and all configuration is written and delivered the same way.
Having access to AWS state directly
Running playbooks on SSM feels like you have much broader access to services that you might not have access to with Ansible by itself. Don't get me wrong, you can do so much with Ansible on AWS, and you could probably get all the information you need with Ansible's AWS modules, but SSM just integrates with all the other services in a more natural way. For example, when I run a document, Automation already has an inventory for me and can use tags to select the instances that I need. Another example is that pushing parameters to Parameter Store and reading them back is very easy to do with Automation.
Setting up SSM documents to run automatically
State Manager can run SSM documents at a specified interval on a specific set of instances, whereas with Ansible, I would have to set up a cron job, set up an API to run a playbook, or even use Ansible Tower. With SSM, I can store a playbook and run it with the CLI. You could even create a document that runs Ansible playbooks for you.
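As a sketch of the State Manager side, the snippet below creates an association that runs a document on a schedule against instances selected by tag. It assumes boto3 and an `ssm_client` from `boto3.client("ssm")`; the function names, association name, tag, and schedule are placeholders.

```python
from typing import Any, Dict


def build_association_request(document_name: str, association_name: str,
                              tag_key: str, tag_value: str,
                              schedule: str) -> Dict[str, Any]:
    """Build CreateAssociation arguments for a scheduled, tag-targeted run."""
    return {
        "Name": document_name,  # the SSM document to run
        "AssociationName": association_name,
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "ScheduleExpression": schedule,  # e.g. "rate(30 minutes)"
    }


def schedule_document(ssm_client: Any, request: Dict[str, Any]) -> str:
    """Create the association and return its ID."""
    response = ssm_client.create_association(**request)
    return response["AssociationDescription"]["AssociationId"]
```

A call such as `schedule_document(client, build_association_request("AWS-RunShellScript", "demo-web-check", "Role", "web", "rate(30 minutes)"))` would stand in for the cron job or Tower schedule I would otherwise maintain, although a real association for that document would also need its `Parameters`.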
There is still so much that I was not able to cover as I haven't touched every corner of this service, but I do like the service and I can see myself using it more, especially if installing Ansible on each instance is not a problem.
I honestly do like the abilities that AWS SSM gives administrators. As a native AWS service, it works well with other services, and you can start imagining the possibilities for automation. However, as an Ansible user, I still feel that something is missing when I am not using Ansible to perform the on-host actions, as Bash and PowerShell are no substitute for Ansible. If I had to set up an environment the way that I like, I would keep using Ansible to provision machines and would use SSM for scheduled tasks, applying updates, and maintaining desired state configuration.