AWS IAM Roles for Kubernetes Service Accounts
Current Scenario
Imagine a scenario in which you are running an application on your Kubernetes cluster as a deployment which fetches user profiles from your S3 bucket and creates a presigned URL so that users can access their profile images. Normally, to provide access to the S3 bucket, you would create an IAM user in AWS, create a policy, and attach that policy to the user. Then you would load that user's AWS access and secret keys into your environment variables using Kubernetes secrets.
Our core application
Imagine we have a Flask application running with a function called create_presigned_url which generates a presigned URL for the objects that are present in our S3 bucket, using the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY that are provided as environment variables.
from flask import Flask
import logging
import os
import boto3
from botocore.exceptions import ClientError

app = Flask(__name__)

#......................[snip].....................

def create_presigned_url(bucket_name, object_name, expiration=3600):
    # Generate a presigned URL for the S3 object
    s3_client = boto3.client(
        's3',
        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY")
    )
    try:
        response = s3_client.generate_presigned_url('get_object',
                                                    Params={'Bucket': bucket_name,
                                                            'Key': object_name},
                                                    ExpiresIn=expiration)
    except ClientError as e:
        logging.error(e)
        return None

    # The response contains the presigned URL
    return response

#......................[snip].....................

app.run(host='0.0.0.0', port=8000)
Reference: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html
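The [snip] sections hold the actual routes. For context, a hypothetical route (not part of the original snippet; the bucket name and key pattern are assumptions here) might use this helper like so:

@app.route('/profile-image/<username>')
def profile_image(username):
    # Generate a short-lived link for this user's profile picture.
    url = create_presigned_url('test-bucket-for-profile-pictures', f'{username}.png')
    if url is None:
        return 'Could not generate URL', 500
    return {'url': url}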
Dockerfile used to create the image
There is nothing fancy here. This is the Dockerfile that is used to build the Docker image which runs the Flask application.
FROM python:3.7-alpine
RUN apk update && apk upgrade
COPY . /app
WORKDIR /app
ENV FLASK_APP=app.py
RUN pip3 install -r requirements.txt
RUN chown -R nobody:nobody /app
RUN chmod 700 -R /app
USER nobody
CMD ["python3","app.py"]
Kubernetes Manifest files
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test
---
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID: c29fc3dlZXRfdGhhdF95b3Vfd291bGRfZGVjb2RlX3RoaXM=
  AWS_SECRET_ACCESS_KEY: SV9zdXJlbHlfd291bGRfbm90X2xlYWtfYXdzX3NlY3JldF9vbl9hX2FydGljbGU=
kind: Secret
metadata:
  name: aws-secrets
type: Opaque
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      serviceAccountName: test
      containers:
      - envFrom:
        - secretRef:
            name: aws-secrets
        image: shishirsub10/test
        name: test
Here, I have created a Kubernetes secret, a service account and a deployment where the secret called aws-secrets is injected inside the pod test.
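As an aside, rather than hand-encoding the base64 values, such a Secret can be generated with kubectl (placeholders stand in for the actual keys):

kubectl create secret generic aws-secrets \
    --from-literal=AWS_ACCESS_KEY_ID=<access-key> \
    --from-literal=AWS_SECRET_ACCESS_KEY=<secret-key> \
    --dry-run=client -o yaml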
Let us apply the above manifest file.
╭─ubuntu@kubernetes ~
╰─$ kubectl apply -f deploy.yaml
serviceaccount/test created
secret/aws-secrets created
deployment.apps/test created
Checking the secrets mounted inside the pod
╭─ubuntu@kubernetes ~
╰─$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-7d88b97d78-tsfdw 1/1 Running 0 22s
╭─ubuntu@kubernetes ~
╰─$ kubectl exec test-7d88b97d78-tsfdw -- env | grep -i 'AWS'
AWS_ACCESS_KEY_ID=so_sweet_that_you_would_decode_this
AWS_SECRET_ACCESS_KEY=I_surely_would_not_leak_aws_secret_on_a_article
We can see the secrets mounted inside the container as environment variables. Now, these credentials will be used by our Flask application to create the presigned URLs for S3 objects.
With this setup, we will always have the problem of rotating the AWS secrets. These credentials might end up in configmaps, Kubernetes secrets, CI/CD pipelines as well as source code, where they might be visible to everyone working inside the company. Meanwhile, employees with access to the AWS secrets might leave, and because these credentials are not rotated as often as we would like, ex-employees retain access to the S3 bucket.
Real-world examples where ex-employees have carried out malicious activities on AWS infrastructure:
- https://www.zdnet.com/article/former-cisco-engineer-sentenced-to-prison-for-deleting-16k-webex-accounts/
- https://www.securitynewspaper.com/2019/03/25/it-employee-was-fired-and-deletes-all-his-former-employers-aws-cloud-servers/
OIDC Implementation
To tackle these issues, let us use OIDC authentication: a role is created on AWS which is then attached to the Kubernetes service account associated with a deployment, giving the pod access to exactly the resources it needs.
What is OIDC?
OpenID Connect (OIDC) is an identity layer built on top of the OAuth 2.0 framework. It allows third-party applications to verify the identity of the end-user and to obtain basic user profile information. OIDC uses JSON web tokens (JWTs), which you can obtain using flows conforming to the OAuth 2.0 specifications.
Reference: https://auth0.com/docs/authenticate/protocols/openid-connect-protocol
Changing the code
Let us start with the source code. The boto3 client was explicitly passed the AWS access and secret keys, and that needs to be removed. By default, boto3 probes several methods through which AWS resources can be accessed, and a web identity token is one of them.
import logging
import boto3
from botocore.exceptions import ClientError

def create_presigned_url(bucket_name, object_name, expiration=3600):
    # Generate a presigned URL for the S3 object
    s3_client = boto3.client(
        's3'
    )
    try:
        response = s3_client.generate_presigned_url('get_object',
                                                    Params={'Bucket': bucket_name,
                                                            'Key': object_name},
                                                    ExpiresIn=expiration)
    except ClientError as e:
        logging.error(e)
        return None

    # The response contains the presigned URL
    return response
Right now, the application will still work as expected, since it falls back to the AWS access and secret keys that are present as environment variables.
Boto3 will look in several locations when searching for credentials. The mechanism in which Boto3 looks for credentials is to search through a list of possible locations and stop as soon as it finds credentials. The order in which Boto3 searches for credentials is:
- Passing credentials as parameters in the boto3.client() method
- Passing credentials as parameters when creating a Session object
- Environment variables
- Shared credential file (~/.aws/credentials)
- AWS config file (~/.aws/config)
- Assume Role provider
- Boto2 config file (/etc/boto.cfg and ~/.boto)
- Instance metadata service on an Amazon EC2 instance that has an IAM role configured
Reference: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
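To confirm which provider in this chain actually supplied the credentials at runtime, you can inspect the resolved credentials object. A small diagnostic sketch, not part of the application itself:

import boto3

# Ask boto3 for the credentials it resolved and print which provider won.
session = boto3.session.Session()
credentials = session.get_credentials()
print(credentials.method)
# 'env' while the keys come from environment variables;
# 'assume-role-with-web-identity' once the OIDC setup below is in place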
Let us create a role and policy on AWS first, before removing the Kubernetes secrets containing the AWS credentials.
Creating a role on AWS
- Log into the AWS console and go to Identity and Access Management (IAM).
- We have to create a new role for the Kubernetes service account.
IAM » Roles » Create Role
- Select Web Identity as the trusted entity type, then select the Identity Provider for the right EKS cluster and set the audience to sts.amazonaws.com. You have to create an Identity Provider first if you have not already done so.
- Click on Next.
Skip the "Adding a new provider" section below if you have already created an Identity Provider on AWS.
Adding a new provider
If you do not have an identity provider, you can create a new one from IAM » Identity Providers » Add Provider.
a. Click on Add provider.
b. We want to use OpenID Connect as the provider type. We can get the provider URL from our EKS cluster info page.
c. Get the value of the OpenID Connect provider URL from EKS » Clusters » Select the relevant cluster » Overview » OpenID Connect Provider URL.
d. Paste the URL and click on Get thumbprint.
e. Set the value of the audience to sts.amazonaws.com.
Reference: https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html
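Alternatively, the cluster's OIDC issuer URL can be fetched from the CLI (substitute your cluster name):

aws eks describe-cluster --name <cluster-name> \
    --query "cluster.identity.oidc.issuer" --output text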
Creating a new policy
After selecting the trusted entity, we now have to either create a new policy with fine-grained permissions or attach an existing policy to this role.
Example policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:List*",
                "s3:Describe*"
            ],
            "Resource": [
                "arn:aws:s3:::test-bucket-for-profile-pictures/*",
                "arn:aws:s3:::test-bucket-for-profile-pictures"
            ]
        }
    ]
}
- Either create a new policy or select the existing policy and click Next. (Note that generate_presigned_url only signs the request locally, so URL generation succeeds regardless of permissions; for the generated link to actually work, the role also needs s3:GetObject on the objects.)
- We have to set the name, review and create the role.
Now we have to make a small change so that the serviceaccount is associated with this role.
- View the trust relationship of the recently created role from IAM » Roles » Select the newly created role » Trust Relationships.
- We have to change the aud to sub and the "sts.amazonaws.com" to "system:serviceaccount:<namespace>:<service-account-name>". For our use case it would be "system:serviceaccount:production:test", as in the sketch below.
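After the edit, the trust policy should look roughly like this (the OIDC provider ARN and condition key depend on your region and cluster ID, shown here as placeholders; the account ID matches the role ARN used later):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::123456789:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc-id>"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.<region>.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:production:test"
                }
            }
        }
    ]
}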
Now all the work on the AWS side is done. We just have to note the ARN of the role from the role's summary page.
Adding annotations on the service accounts
- Let us edit the serviceaccount that we used on the deployment.
╭─ubuntu@kubernetes ~
╰─$ kubectl edit sa test
- Add the role's ARN as an annotation on the service account. The annotations that we have to add are:
- eks.amazonaws.com/role-arn: ARN-of-the-role
- eks.amazonaws.com/sts-regional-endpoints: "true" - Setting this to true makes the pod use the regional STS endpoints, which helps reduce latency
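Equivalently, the annotations can be applied without opening an editor (using the role ARN from the previous step):

kubectl annotate serviceaccount test -n production \
    eks.amazonaws.com/role-arn=arn:aws:iam::123456789:role/oidc-role \
    eks.amazonaws.com/sts-regional-endpoints=true

Either way, the service account should end up looking like this: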
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/oidc-role
    eks.amazonaws.com/sts-regional-endpoints: "true"
  name: test
  namespace: production
secrets:
- name: test-token-zhbzn
- Now let us delete the secrets that we have created earlier and change the deployment.
╭─ubuntu@kubernetes ~
╰─$ kubectl get secrets
NAME TYPE DATA AGE
aws-secrets Opaque 2 36m
default-token-9p6nz kubernetes.io/service-account-token 3 4d2h
test-token-zhbzn kubernetes.io/service-account-token 3 36m
╭─ubuntu@kubernetes ~
╰─$ kubectl delete secret aws-secrets
secret "aws-secrets" deleted
- Changing the deployment such that the AWS credentials are not loaded inside the pod as environment variables.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      serviceAccountName: test
      containers:
      - image: shishirsub10/test
        name: test
- Let us rollout restart the deployment to ensure the changes that we have made get reflected.
╭─ubuntu@kubernetes ~
╰─$ kubectl rollout restart deployment test
deployment.apps/test restarted
If everything goes as expected, you will see two new environment variables injected into the pod, AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, and our application should keep working without any problem.
Token reflected in the environment variables
╭─ubuntu@kubernetes ~
╰─$ k exec test-5b6467794-9j4gr -- env | grep "AWS_ROLE\|AWS_WEB"
AWS_ROLE_ARN=arn:aws:iam::*********:role/oidc-role
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
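The file at AWS_WEB_IDENTITY_TOKEN_FILE contains a JWT issued by the cluster's OIDC provider. As a quick sanity check you can decode its payload from inside the pod (a small sketch; the claim values will reflect your own cluster):

import base64
import json

# Read the projected service-account token and decode the JWT payload.
with open('/var/run/secrets/eks.amazonaws.com/serviceaccount/token') as f:
    token = f.read().strip()

payload = token.split('.')[1]
payload += '=' * (-len(payload) % 4)  # restore stripped base64 padding
print(json.dumps(json.loads(base64.urlsafe_b64decode(payload)), indent=2))
# Expect "aud" to contain sts.amazonaws.com, "iss" to be the cluster's
# OIDC issuer URL, and "sub" to be system:serviceaccount:production:test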
How often are these tokens rotated?
The kubelet requests and stores the token on behalf of the pod. By default, the kubelet refreshes the token if it is older than 80 percent of its total TTL, or if the token is older than 24 hours.
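Under the hood, the EKS pod identity webhook delivers this token as a projected service-account token volume. The volume it injects into the pod spec looks roughly like this (a sketch based on the documented defaults; 86400 seconds = 24 hours):

volumes:
- name: aws-iam-token
  projected:
    sources:
    - serviceAccountToken:
        audience: sts.amazonaws.com
        expirationSeconds: 86400
        path: token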