- May 2, 2023

Secure IT Infrastructure with ABAC and SCP

Anyone who starts using a multi-account environment on AWS will face many governance and compliance challenges. As the team responsible for an enterprise's AWS environment, you have to scale the business by enabling agility for developer teams while protecting your critical infrastructure at the same time. I've faced this challenge in my organisation and came up with a solution approach I want to share with you.

This blog post covers the lessons I learned from scaling IT infrastructure in an AWS environment for a mid-sized industrial enterprise (10,000+ employees). I will illustrate how you, as an account provider, can spin up infrastructure in any member account of your AWS organization and protect it against unwanted manipulation.

I strongly recommend this article if you have to deploy AWS resources in accounts you do not own. In addition, it gives you a practical approach to protecting automated resources via Attribute-Based Access Control (ABAC). This approach helps you increase your product quality when developing microservices. It applies common best practices, such as an adaptation of the open-closed principle (OCP) and separation of concerns (SoC), to IaC.

Step 1: Learn from the experts

When I started to implement AWS Control Tower in my organization, I read tons of documentation and reverse-engineered the solution approaches. I discovered that the architects had introduced protection mechanisms to secure their own infrastructure. This is where Service Control Policies (SCPs) come in handy:
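As a rough illustration, an SCP of this kind can be written in Python as a plain dictionary that serializes to the policy JSON. The action list is shortened, and the exempted role name `AWSControlTowerExecution` as well as the `Sid` are assumptions modeled on the Control Tower guardrails, so treat this as a sketch rather than the exact policy AWS ships.

```python
import json


def control_tower_lambda_scp() -> dict:
    """Sketch of an SCP that protects Lambda functions by name prefix.

    The actions are a shortened, illustrative subset and the exempted
    role ARN is an assumption, not the exact AWS-managed guardrail.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ProtectControlTowerLambda",
                "Effect": "Deny",
                "Action": [
                    "lambda:CreateFunction",
                    "lambda:DeleteFunction",
                    "lambda:UpdateFunctionCode",
                    "lambda:UpdateFunctionConfiguration",
                ],
                # Only functions whose name starts with the prefix are covered.
                "Resource": ["arn:aws:lambda:*:*:function:aws-controltower-*"],
                # Every principal except the exempted role is denied.
                "Condition": {
                    "ArnNotLike": {
                        "aws:PrincipalArn": "arn:aws:iam::*:role/AWSControlTowerExecution"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(control_tower_lambda_scp(), indent=2))
```

The protection hinges on the resource name pattern, which is why it only covers resources that follow that naming scheme.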

The example above describes an SCP that protects all Lambda functions whose names start with the "aws-controltower-" prefix. Typically, such SCPs are attached at the organization root, which gives them global scope. If you take a closer look at this setup, you will find that it doesn't really scale for an enterprise. You may have good protection for your Control Tower environment, but you cannot apply the policies to your own infrastructure. In addition, SCPs have a hard quota of 5120 bytes per document, and you can attach a maximum of 5 SCPs to the root, any organizational unit, or any account.

SCPs are a great control to protect your infrastructure. However, the limit of 5120 bytes per document has a valid reason: speed! AWS has to evaluate your SCPs for API calls, which has an impact on your overall performance. This means: use this control with caution and never implement ad hoc changes.
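Because the size quota is easy to hit, it can help to check a policy before attaching it. The sketch below measures the compact JSON serialization of a policy against the 5120-byte quota mentioned above. How AWS counts whitespace is not covered here, so the compact size is best read as a lower bound, and the function names are made up for this example.

```python
import json

SCP_MAX_BYTES = 5120  # quota per SCP document, as stated in this article
SCP_MAX_ATTACHED = 5  # SCPs per root, organizational unit, or account


def scp_size_bytes(policy: dict) -> int:
    """Size of the policy serialized without optional whitespace."""
    compact = json.dumps(policy, separators=(",", ":"))
    return len(compact.encode("utf-8"))


def fits_quota(policy: dict) -> bool:
    """True if the compact serialization stays within the size quota."""
    return scp_size_bytes(policy) <= SCP_MAX_BYTES


if __name__ == "__main__":
    empty_policy = {"Version": "2012-10-17", "Statement": []}
    print(scp_size_bytes(empty_policy), fits_quota(empty_policy))
```

Running such a check in the pipeline that manages your SCPs surfaces the limit before a deployment fails.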

Step 2: Understand the use case

Before creating any solution design, I want to share my use case with you:

1. I want to be able to manage my resources in customer accounts via Terraform. The setup should ensure that all deployed resources can only be managed by a specific deployment role.
2. I need to isolate my deployed resources from each other. If different services are deployed inside a member account, a service role should not be able to touch resources managed by a foreign deployment role.
3. I want to keep my operational effort to a minimum. If Terraform needs to spin up new resources, I want my access rights to scale without any policy change if possible.
4. I want to ensure that the Terraform role cannot be assumed by human beings (exception: break-glass access).
5. I want to implement my IaC role in a loosened least-privilege manner. Real least privilege is not possible with (3) in mind. For example, I allow a subset of read commands.
6. I want to be able to extend the pattern to other microservices. To do my daily work, I also need a way to enable operations by humans.

NEVER run your automation with administrator privileges! Try to find your own way to secure your environment. Feel free to copy my ideas, and make sure your infrastructure is protected and potential attackers have a minimal blast radius if your automation gets hijacked.

Step 3: Implementation

With the requirements defined, I looked for the best possible setup for the implementation. I quickly found out that I needed to work with Attribute-Based Access Control (ABAC). This kind of access pattern is typically used in IAM policies. However, it is not limited to IAM policies: you can also use it in SCPs.

In this section, I will walk through the solution for requirements (1) to (6) above:

Requirement 1: Protect deployed infrastructure in member accounts

This requirement can be covered with the following SCP:
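A minimal sketch of what the first statement of such an SCP could look like, built in Python and serialized to JSON. The tag key `automationcontext` and the exempted action `sns:CreateTopic` come from this article. The role path pattern and the `Sid` are assumptions made up for the example.

```python
import json

TAG_KEY = "automationcontext"  # tag key used in this article

# Assumption: all deployment roles live under a dedicated IAM path.
DEPLOYMENT_ROLE_PATTERN = "arn:aws:iam::*:role/automation/*"
# Actions exempted from the deny; sns:CreateTopic is named in this article.
EXEMPT_ACTIONS = ["sns:CreateTopic"]


def protect_tagged_resources_statement() -> dict:
    """Deny changes to tagged resources for everyone but deployment roles."""
    return {
        "Sid": "ProtectAutomationResources",
        "Effect": "Deny",
        # NotAction: the deny covers every action except the listed ones.
        "NotAction": EXEMPT_ACTIONS,
        "Resource": "*",
        # Both condition operators must match for the deny to apply.
        "Condition": {
            # "false" means the tag key is present on the resource.
            "Null": {f"aws:ResourceTag/{TAG_KEY}": "false"},
            # Principals outside the deployment role path are denied.
            "ArnNotLike": {"aws:PrincipalArn": DEPLOYMENT_ROLE_PATTERN},
        },
    }


if __name__ == "__main__":
    scp = {"Version": "2012-10-17", "Statement": [protect_tagged_resources_statement()]}
    print(json.dumps(scp, indent=2))
```

Actions that carry no resource tag in the request context do not match the `Null` condition, which is one reason the behavior differs between services.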

The first statement ensures that no one can manipulate resources that have the tag "automationcontext" set. Unfortunately, I faced several issues with this policy.

I had to include "NotAction" in the first SCP statement. A lot of actions aren't bound to any resource, and the implementation differs between AWS services. Terraform often throws an error when the list action is not available. The "Create" action is also implemented differently across AWS services. For example, "iam:CreateRole" worked without any problems, whereas for the "sns:CreateTopic" call I had to add the action to the NotAction element of the SCP above. I had to loosen the constraints to get this working properly.

After some tests, I found additional flaws and decided to add the following SCP statements:

- Some services do not support ABAC! Two of them (and this was a real pain) are Lambda and DynamoDB, so I had to enhance my policy for those services with an additional statement. To check your services against any IAM feature, I recommend the page "AWS services that work with IAM" in the AWS Identity and Access Management documentation. You can even go further and block any create action if the prefix is not present. However, I decided not to do this in this demo setup, since it has to be applied and tested for each service.
- I had to ensure that nobody can touch my Terraform roles. I know this is implicitly included in the first statement. However, to be 100% sure, I decided to add an explicit statement for this use case.
- The automation tag itself is also protected explicitly.
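The three additions could be sketched as follows. Everything beyond the tag key is an assumption for illustration: the name prefix `automation-`, the role path, the `stacksets-exec-*` pattern for the principal that is allowed to manage the roles, and the chosen action subsets.

```python
import json

TAG_KEY = "automationcontext"  # tag key used in this article
NAME_PREFIX = "automation-"  # hypothetical naming prefix
ROLE_PATTERN = "arn:aws:iam::*:role/automation/*"  # hypothetical role path
ROLE_ADMIN_PATTERN = "arn:aws:iam::*:role/stacksets-exec-*"  # assumption
NOT_DEPLOYMENT_ROLE = {"ArnNotLike": {"aws:PrincipalArn": ROLE_PATTERN}}


def extra_statements() -> list:
    """Three extra deny statements: prefix, role, and tag protection."""
    return [
        {   # Services without tag support are protected by name prefix.
            "Sid": "ProtectByPrefix",
            "Effect": "Deny",
            "Action": [
                "lambda:DeleteFunction",
                "lambda:UpdateFunctionCode",
                "lambda:UpdateFunctionConfiguration",
                "dynamodb:DeleteTable",
                "dynamodb:UpdateTable",
            ],
            "Resource": [
                f"arn:aws:lambda:*:*:function:{NAME_PREFIX}*",
                f"arn:aws:dynamodb:*:*:table/{NAME_PREFIX}*",
            ],
            "Condition": NOT_DEPLOYMENT_ROLE,
        },
        {   # Explicit protection of the deployment roles themselves.
            "Sid": "ProtectDeploymentRoles",
            "Effect": "Deny",
            "Action": [
                "iam:AttachRolePolicy",
                "iam:DeleteRole",
                "iam:DetachRolePolicy",
                "iam:PutRolePolicy",
                "iam:UpdateAssumeRolePolicy",
            ],
            "Resource": ROLE_PATTERN,
            "Condition": {"ArnNotLike": {"aws:PrincipalArn": ROLE_ADMIN_PATTERN}},
        },
        {   # Requests that set or remove the tag key are denied for others.
            "Sid": "ProtectAutomationTag",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "ForAnyValue:StringEquals": {"aws:TagKeys": [TAG_KEY]},
                **NOT_DEPLOYMENT_ROLE,
            },
        },
    ]


if __name__ == "__main__":
    print(json.dumps(extra_statements(), indent=2))
```

The prefix statement only works if every resource of those services follows the naming convention, so it needs testing per service.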

Requirement 2: Isolate resources by automation context

This task was actually quite easy. I just had to add the "standard" ABAC constraint. The following SCP states that a role with a specific automation context can only touch resources with the same automation context.
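One way to express this constraint is a deny statement that compares the resource tag with the principal tag through a policy variable. The sketch below assumes that both the role and its resources carry the `automationcontext` tag. The `Sid` and the function name are made up for the example.

```python
import json

TAG_KEY = "automationcontext"  # tag key used in this article


def isolate_by_context_statement() -> dict:
    """Deny access when principal and resource contexts differ."""
    return {
        "Sid": "IsolateAutomationContexts",
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {
            # Apply only when both the principal and the resource are tagged.
            "Null": {
                f"aws:PrincipalTag/{TAG_KEY}": "false",
                f"aws:ResourceTag/{TAG_KEY}": "false",
            },
            # The variable is resolved to the caller's tag value at request time.
            "StringNotEquals": {
                f"aws:ResourceTag/{TAG_KEY}": "${aws:PrincipalTag/" + TAG_KEY + "}"
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(isolate_by_context_statement(), indent=2))
```

Because the comparison uses a variable instead of fixed values, a new automation context needs no change to the policy.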

You will face some problems with specific services. As always, there are exceptions. For example, if you use Instance Scheduler, you may need to touch resources that are managed by other automation products. As already mentioned, my approach was to strengthen the security of my setup, not to eliminate every chance of misconfiguration. You should always go for a multi-layered (defense in depth) approach and implement measures at different layers of your architecture.

Requirement 3: Scale your deployment Role (for IaC)

Requirement 4: I want to ensure that the Terraform role cannot be assumed by human beings

Requirement 5: Implement the IaC role in a loosened least-privilege manner.

Now that we have our SCPs set, we also need to take care of the role that is used by our IaC framework, such as Terraform or CloudFormation. This one is a little bit tricky, as it combines several best practices and makes use of different IAM features. I will try to break down the most important parts for you:

The most important features you should take into consideration when deploying IaC Roles:

- The creation of IaC roles should happen over StackSets: We've locked our environment and enabled the creation of IaC roles via only two options: break-glass access or StackSets. I recommend managing your IaC roles over StackSets, which allows you to prepare everything you need for your deployment without worrying that SCPs will stop you.
- Adapt the MaxSessionDuration: Most deployments do not need more than a minute to finish. Adapt the session duration to your needs (IAM accepts values between 1 and 12 hours). Be careful when you also launch long-running actions, such as stack set updates in a management account.
- Use a resource path for your policies and roles: You can limit your actions in IAM to a specific resource path. This helps your SOC team quickly map a role to the feature team that maintains the IAM resource. In addition, you do not need to look into the tags (which may not be maintained properly by each feature team) to gain a first insight.
- Use the generic ABAC statement: This statement offers you full flexibility in your setup. When creating resources with IaC, you cannot know what will be built before the development has started or finished. You gain a lot of freedom with this statement.
- Make use of permission boundaries: Your IaC framework will most probably create IAM resources. Make sure to attach permission boundaries to prevent IaC from creating any dangerous IAM roles outside of your "sandbox". More information can be found at https://github.com/aws-samples/example-permissions-boundary
- Make use of read-only rights: List actions aren't bound to any resource. You need to grant read-only rights in order to work with IaC. I know this also enables your role to see foreign resources. However, if the automationcontext tag is used on other resources, chances are high that your IaC sandbox cannot perform any describe or get action on resources with a foreign context.
- Ensure that you can create resources (also for services with ABAC support): I found out that different services evaluate permissions for create actions differently in combination with ABAC. Some will allow you to create the resource. Others will prevent you from creating it.
- Ensure that you can work with services outside of ABAC support: Not all services support ABAC. You need to add additional statements to enable your IaC framework to work with services that do not support it.
- Ensure that your IaC role cannot be assumed by humans: I typically do not want humans to interact with my IaC deployments. This is why I only allow CodeBuild (or the calling execution instance) to assume the IaC role. For the sake of productivity, you may choose to loosen this constraint in your test or dev environment.
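Several of these points can be combined in one role definition. The sketch below prepares the arguments for boto3's `iam.create_role` call without actually calling AWS, plus a generic ABAC identity policy. The role name, the `/automation/` path, the boundary policy ARN, and the read-only action subset are assumptions for illustration.

```python
import json

TAG_KEY = "automationcontext"  # tag key used in this article
ROLE_PATH = "/automation/"  # hypothetical resource path


def iac_role_params(context: str, account_id: str = "123456789012") -> dict:
    """Keyword arguments for iam.create_role (built here, not executed)."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Only CodeBuild may assume the role, no human principals.
                "Effect": "Allow",
                "Principal": {"Service": "codebuild.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return {
        "RoleName": f"{context}-deployment",  # hypothetical naming scheme
        "Path": ROLE_PATH,
        "MaxSessionDuration": 3600,  # seconds; the lowest value IAM accepts
        # Hypothetical boundary policy limiting what the role can grant.
        "PermissionsBoundary": f"arn:aws:iam::{account_id}:policy{ROLE_PATH}iac-boundary",
        "AssumeRolePolicyDocument": json.dumps(trust_policy),
        "Tags": [{"Key": TAG_KEY, "Value": context}],
    }


def generic_abac_policy() -> dict:
    """Identity policy: full access within the own context, plus some reads."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageOwnContext",
                "Effect": "Allow",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        f"aws:ResourceTag/{TAG_KEY}": "${aws:PrincipalTag/" + TAG_KEY + "}"
                    }
                },
            },
            {   # Illustrative subset of read actions not bound to a resource.
                "Sid": "ReadOnlySubset",
                "Effect": "Allow",
                "Action": ["ec2:Describe*", "iam:Get*", "iam:List*"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(iac_role_params("billing"), indent=2))
    print(json.dumps(generic_abac_policy(), indent=2))
```

With boto3 installed and credentials in place, the first dictionary could be passed as `iam.create_role(**iac_role_params("billing"))`. Here, "billing" is just a placeholder context.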

Requirement 6: I want to be able to extend the pattern to other microservices.

With all the work done so far, we are at the end of this article. The last requirement doesn't need any additional action. You may have noticed that we've locked ourselves out of the whole setup. Some teams may want to run operational tasks on the provisioned resources. You can spin up additional service roles in the CloudFormation stack set that enable some actions on your deployed infrastructure. As an alternative, you could also enable the SessionAttributes on your SSO roles and deploy a permission set to interact with your infrastructure. Be aware that you can only assume one automation context at a given time. It's not possible to extend the permissions to multiple microservices.
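The single-context limitation follows from a session carrying one value per tag key. The toy model below illustrates the comparison in Python. It is a simplification and not the IAM evaluation logic itself, and the context names "billing" and "shipping" are made up.

```python
TAG_KEY = "automationcontext"  # tag key used in this article


def may_operate(principal_tags: dict, resource_tags: dict) -> bool:
    """Simplified model: tagged resources require a matching context."""
    resource_context = resource_tags.get(TAG_KEY)
    if resource_context is None:
        # Untagged resources are outside the scope of this model.
        return True
    return principal_tags.get(TAG_KEY) == resource_context


if __name__ == "__main__":
    operator = {TAG_KEY: "billing"}  # one context per session
    print(may_operate(operator, {TAG_KEY: "billing"}))
    print(may_operate(operator, {TAG_KEY: "shipping"}))
```

An operator who needs to work on a second microservice would have to start a separate session with the other context.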

With this article, I have shown a new approach to how an enterprise can host multiple services inside an organization with a high level of isolation while enabling its developer teams to be productive and innovative. This article is not meant to be copied 1:1, as it has to be adapted to your needs. However, it shows a lot of best practices that can also be used in many other ways. Your infrastructure is protected by default:

- All critical services (KMS, IAM, Secrets Manager, Parameter Store, ...) support ABAC and are protected against any actions from outside (even if you are a local admin).
- The design ensures security at many different layers of AWS (permission boundaries, SCPs, ABAC policies). Each of these layers increases your level of security. Even if you have some overprivileged roles or policies in your environment, you can be sure that a lot of dangerous actions are stopped by design.
- The design enforces microservice best practices. You do not need to worry that a feature team takes a shortcut and directly calls a Lambda function from a different microservice or fetches data from DynamoDB/RDS that is outside the scope of its own microservice.

Since the whole construct is far from trivial, I recommend that you gain your own experience with it. Even though the setup may sound complex, it is easy for your feature teams to consume, and the additional SCPs will have no impact on your existing deployments. I enjoyed exploring all the features and learned a lot about the internals of AWS IAM.