XpoLog deployment on AWS EKS with external configuration on EFS

What is Amazon EKS?

image-20250826-082833.png

Amazon Elastic Kubernetes Service (EKS) is AWS's managed service that makes it easy to run, manage, and scale containerized applications using Kubernetes on the AWS cloud.

Think of it like this: Kubernetes is a powerful but complex engine for orchestrating containers. Instead of building, securing, and maintaining the most complicated parts of that engine yourself, EKS provides a fully managed, highly available, and secure Kubernetes control plane as a service.

Key Benefits of EKS

  • Managed Control Plane: This is the core advantage. AWS automatically manages the availability, scalability, and patching of the Kubernetes control plane components (like etcd and the API server). This frees you from significant operational overhead and lets you focus on your applications.

  • High Availability: The EKS control plane is distributed across multiple AWS Availability Zones (AZs), eliminating any single point of failure and ensuring your cluster's brain is always running.

  • Seamless AWS Integration: EKS is deeply integrated with the AWS ecosystem. It works effortlessly with services like:

    • IAM for secure authentication and authorization.

    • VPC for isolated and secure networking.

    • Elastic Load Balancers (ALB/NLB) for exposing your services.

    • EFS & EBS for persistent storage solutions.

  • Pure Kubernetes Experience: EKS runs upstream, certified Kubernetes. This means you get a standard, community-tested experience, and any tools or add-ons that work with Kubernetes will work with EKS.

How It Works: Control Plane vs. Worker Nodes

An EKS cluster is primarily composed of two parts:

  1. The EKS Control Plane: Managed entirely by AWS. You don't see the underlying instances, but you interact with them through the Kubernetes API (e.g., using kubectl).

  2. Worker Nodes: These are the EC2 instances where your application containers (Pods) actually run. You provision and manage these nodes within your VPC and are responsible for them. They register themselves with the control plane to form the complete cluster.

Why EKS Matters for This Guide

EKS provides the robust, production-grade Kubernetes environment where our applications will run and generate logs. We will deploy Fluent Bit as a DaemonSet across all the worker nodes in our EKS cluster. Fluent Bit's task is to reliably collect logs from every application on every node and forward them to a central location. By using EKS, we start with a secure and scalable foundation for our entire logging pipeline.


Create New Cluster

Open your AWS GUI

Search for EKS:

image-20250826-081905.png
image-20250716-075448.png

Press “Create cluster”

image-20250716-075654.png

The EKS Cluster IAM Role: Your Cluster's AWS Passport

When you create an EKS cluster, you're asked to select a "Cluster IAM role." This is one of the most important configuration steps as it defines the permissions your cluster has to interact with other AWS services.

In simple terms, this IAM role is what the Kubernetes control plane uses to make AWS API calls on your behalf.

Why is it Necessary?

Think of the EKS control plane as a manager hired by you but living in a separate AWS-managed building. This manager (the control plane) needs a set of keys (the IAM role) to access and manage resources within your building (your AWS account and VPC).

Without this role, the control plane would be isolated and unable to perform essential tasks, such as:

  • Networking: Creating and managing Elastic Network Interfaces (ENIs) in your VPC subnets for pod networking.

  • Load Balancing: Provisioning and configuring Application or Network Load Balancers when you create a Kubernetes Service of type LoadBalancer.

  • Storage: Interacting with services like EBS when creating PersistentVolumes.

This role acts as a secure "passport," granting the EKS service just enough permission to manage these resources without giving it full access to your entire AWS account.

What Permissions Does It Need?

You don't have to figure out the permissions yourself. AWS provides a managed policy specifically for this purpose called AmazonEKSClusterPolicy. This policy contains all the necessary permissions (ec2:CreateNetworkInterface, elasticloadbalancing:RegisterTargets, etc.) that the control plane requires to function correctly.

When you create the cluster using the AWS Management Console, it will often guide you to create a new role and will automatically attach this policy for you.

Key Takeaway

The Cluster IAM Role is the security link between the AWS-managed Kubernetes control plane and the resources running in your own AWS account. You are granting the EKS service explicit permission to manage cluster-related resources on your behalf.


Creating the EKS Cluster IAM Role

You will create a new IAM role that the EKS service can assume. The AWS console simplifies this process by pre-selecting the correct trust relationship and permissions policy for you.

Here are the step-by-step instructions:

  1. Navigate to the IAM Console (or press “Create recommended role”) in your AWS account.

  2. On the left-hand navigation pane, click on Roles, then click the "Create role" button.

  3. Step 1: Select Trusted Entity

image-20250826-115348.png

  • For "Trusted entity type," choose AWS service.

  • Under "Use case," select EKS from the dropdown menu.

  • This will reveal another option below. Choose EKS - Auto Cluster.

  • Click Next.

  4. Step 2: Add Permissions

image-20250826-115500.png

  • The console will automatically select the required permissions policy: AmazonEKSClusterPolicy.

  • You don't need to do anything else on this screen. Simply click Next.

  5. Step 3: Name, Review, and Create

image-20250826-100236.png

 

  • Role name: Give your role a descriptive name that you will easily recognize. For example: my-eks-cluster-role or EKSClusterRoleForGuide.

  • Review the details to ensure the trusted entity is eks.amazonaws.com and the attached policy is AmazonEKSClusterPolicy.

  • Click the "Create role" button at the bottom.

The Role to Select

image-20250826-100456.png

Now, when you are creating your EKS cluster and you get to the "Cluster IAM role" dropdown menu, you will select the role you just created (e.g., my-eks-cluster-role).

This explicitly grants the EKS control plane the permissions defined in the AmazonEKSClusterPolicy to manage resources within your account.
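If you prefer the CLI, the same role can be created there. A minimal sketch, assuming the role name my-eks-cluster-role from above; the live `aws iam` calls are commented out and should be run with credentials for your own account:

```shell
# Write the trust policy that lets the EKS service assume the role
cat > eks-cluster-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "eks.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
grep -o 'eks.amazonaws.com' eks-cluster-trust.json   # sanity check: prints the trusted service

# With credentials configured, create the role and attach the managed policy:
# aws iam create-role --role-name my-eks-cluster-role \
#   --assume-role-policy-document file://eks-cluster-trust.json
# aws iam attach-role-policy --role-name my-eks-cluster-role \
#   --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
```

The trust policy is exactly what the console's "EKS" use case pre-selects for you: only eks.amazonaws.com may assume the role.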


After the Cluster Role, the next critical component is the Node IAM Role.


 

The Node IAM Role: The Worker's Toolkit

While the Cluster Role is for the EKS control plane (the manager), the Node IAM Role is attached to each of your EC2 worker nodes. This role grants the necessary permissions for the nodes themselves to function correctly within the cluster and interact with other AWS services.

Think of this as the toolkit you give to each individual worker on your team. Each worker node needs this toolkit to perform its core job.

Why is it Necessary?

Your worker nodes are not just passive machines; they are active participants in the Kubernetes cluster. The kubelet (the primary "node agent") running on each node, and the pods scheduled on them, need permissions to:

  • Join the Cluster: A node needs permission to communicate with the EKS control plane to register itself and receive workloads.

  • Pull Container Images: To run your applications, the nodes must have permission to pull container images from Amazon ECR (Elastic Container Registry).

  • Manage Networking: The AWS VPC CNI plugin, which handles pod networking, runs on each node and needs permissions to manage network interfaces.

  • Access Other AWS Services: If a pod on a node needs to access an S3 bucket or a DynamoDB table, it will (by default) inherit permissions from this role.

How to Create the Node IAM Role

The creation process is similar to the Cluster Role, but with a different trusted entity and different policies.

  1. Navigate to the IAM Console (or press “Create recommended role”), go to Roles, and click "Create role".

  2. Step 1: Select Trusted Entity

image-20250826-101117.png

 

  • For "Trusted entity type," choose AWS service.

  • Under "Use case," select EC2. This is because your worker nodes are EC2 instances.

  • Click Next.

  3. Step 2: Add Permissions

image-20250826-101336.png

  • In the search bar, find and attach the following three AWS managed policies. You must attach all of them:

    • AmazonEKSWorkerNodePolicy: provides the worker nodes with the minimum permissions needed to communicate with the EKS control plane.

    • AmazonEKS_CNI_Policy: allows the Amazon VPC CNI plugin (the aws-node DaemonSet) to give Kubernetes pods in EKS IP addresses from your VPC subnets and let them communicate with other resources (pods, services, and AWS infrastructure).

    • AmazonEC2ContainerRegistryReadOnly: allows nodes to pull images from ECR.

  • Click Next.

  4. Step 3: Name, Review, and Create

image-20250826-101433.png

 

  • Role name: Give it a clear name, such as my-eks-node-role.

  • Review the configuration to ensure the trusted entity is ec2.amazonaws.com and the three required policies are attached.

  • Click "Create role".

Where This Role is Used

You will select this role (my-eks-node-role) later in the EKS setup process, specifically when you create a Node Group for your cluster. Assigning this role to the node group ensures that every EC2 instance launched within it has the correct permissions to operate as a functional EKS worker node.
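The CLI equivalent follows the same pattern as the cluster role, with EC2 as the trusted service and the three policies attached in a loop. A sketch, assuming the role name my-eks-node-role; the live `aws iam` calls are commented out:

```shell
# Trust policy: the EC2 instances (your worker nodes) assume this role
cat > eks-node-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ec2.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
grep -o 'ec2.amazonaws.com' eks-node-trust.json   # sanity check: prints the trusted service

# With credentials configured, create the role and attach all three policies:
# aws iam create-role --role-name my-eks-node-role \
#   --assume-role-policy-document file://eks-node-trust.json
# for p in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy AmazonEC2ContainerRegistryReadOnly; do
#   aws iam attach-role-policy --role-name my-eks-node-role \
#     --policy-arn "arn:aws:iam::aws:policy/$p"
# done
```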


Choosing Your Cluster's Network: The VPC

At this step, you're defining the private network space where your entire EKS cluster will live. VPC stands for Virtual Private Cloud, and you can think of it as your own logically isolated, fenced-off area within the vast AWS cloud. All your cluster's resources—the worker nodes, the pods, and the internal load balancers—will be launched inside this VPC.

Key Requirements for an EKS VPC

For an EKS cluster to be resilient and function correctly, the VPC you select must meet a few critical requirements:

  • Multiple Subnets: The VPC must have at least two subnets.

  • Multiple Availability Zones (AZs): Crucially, these subnets must be in different Availability Zones. An AZ is a distinct data center within an AWS region. Spanning your cluster across multiple AZs ensures high availability, so if one data center has an issue, your cluster can continue running in another.

  • Public and Private Subnets: A production-ready setup includes both public and private subnets:

    • Public Subnets: These are for internet-facing resources, primarily your public-facing load balancers. They have a direct route to an AWS Internet Gateway.

    • Private Subnets: This is where your worker nodes should live for security. They don't have public IP addresses and can access the internet securely through a NAT Gateway that resides in a public subnet.
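To make the multi-AZ requirement concrete, you can count the distinct Availability Zones behind your subnets. The AZ list below is a hypothetical example; in practice you would feed it from the commented `aws ec2 describe-subnets` call:

```shell
# Hypothetical AZ list; with credentials, derive it from your VPC instead:
# AZS=$(aws ec2 describe-subnets --filters Name=vpc-id,Values=<your-vpc-id> \
#        --query 'Subnets[].AvailabilityZone' --output text)
AZS="eu-north-1a eu-north-1b eu-north-1c"

# Count unique AZs; EKS requires subnets in at least two of them
COUNT=$(echo "$AZS" | tr ' ' '\n' | sort -u | wc -l)
echo "distinct AZs: $COUNT"
```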

 

Your Options

You have two main choices on the EKS creation screen:

  1. Use an Existing VPC: If you already have a VPC configured that meets the requirements above, you can select it. This is common in established AWS environments.

  2. Let AWS Create a New VPC: For this guide, and for anyone new to EKS, this is the highly recommended option. AWS provides a CloudFormation template that automatically creates a new VPC perfectly configured for EKS. It will set up the public and private subnets across multiple AZs, create the necessary route tables, and provision an Internet Gateway and NAT Gateways for you.

For this guide, select the default VPC or follow the prompts to have AWS create a new VPC for you. This will prevent common networking issues and ensure your cluster is built on a solid, secure, and highly available foundation.

Subnets:

image-20250826-102716.png

Leave all the subnets selected just as they are.

For EKS to function correctly, it needs to be aware of all the available subnets in its VPC. Deselecting any of them could lead to issues with networking, load balancing, or node placement. Simply accept the default selection and proceed to the next step.

Press Create and wait for status “Active”.

image-20250826-104222.png

Connect your terminal to the cloud and new cluster:

Open your terminal:

aws configure
AWS Access Key ID:
AWS Secret Access Key:

AWS Management Console

  1. Sign in to the AWS Console.

  2. Navigate to IAM (under “Security, Identity, & Compliance”).

  3. In the left sidebar, click Users, then select your user name.

  4. Go to the Security credentials tab.

  5. Under Access keys, you’ll see your existing Access Key IDs (but you can only view the ID, not the secret).

    • If you need a new key, click Create access key, give it a name/description, and you’ll be shown both the Access Key ID and the Secret Access Key one time.

image-20250716-081244.png
image-20250716-081318.png
image-20250716-081401.png
image-20250716-081545.png

Connect kubectl to EKS:

aws eks update-kubeconfig --region <cluster_region> --name <cluster_name>
image-20250828-051000.png
aws eks update-kubeconfig --region eu-north-1 --name andrey-test-fb
image-20250716-082342.png
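A quick way to confirm the kubeconfig update took effect: by default, `aws eks update-kubeconfig` names the new context after the cluster ARN. A small sketch composing the expected name from this guide's values (substitute your own), to compare against `kubectl config current-context`:

```shell
# Values from this guide; replace with your own region, account ID, and cluster name
REGION="eu-north-1"; ACCOUNT_ID="655536767854"; CLUSTER="andrey-test-fb"
echo "arn:aws:eks:${REGION}:${ACCOUNT_ID}:cluster/${CLUSTER}"

# Compare against the live cluster (requires the kubeconfig from above):
# kubectl config current-context
# kubectl get nodes   # should list your worker nodes as Ready
```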

Optional: Set environment variables:

export CLUSTER="andrey-test-fb"
export REGION="eu-north-1"
export ACCOUNT_ID="655536767854"

Deploy test App (Apache)

nano apache-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apache
  template:
    metadata:
      labels:
        app: apache
    spec:
      containers:
        - name: apache
          image: httpd:2.4
          ports:
            - containerPort: 80
kubectl apply -f apache-deployment.yaml

Create Namespace

kubectl create namespace logging
image-20250716-082749.png

Stand up a shared, ReadWriteMany Amazon EFS volume in EKS.

0) What you’ll need (inputs)

  • An existing EKS cluster and kubectl/helm access.

  • IDs for your VPC and private subnets where the worker nodes run.

  • The security group (SG) of your node group(s).

  • AWS CLI configured for the right account/region.

EFS must sit in the same VPC as your nodes, with mount targets in each AZ where nodes run. NFS (port 2049/TCP) must be allowed from node SGs → EFS mount targets.

1) Create the EFS file system + networking

# ---------- change these ----------
export AWS_REGION=eu-north-1
export VPC_ID=vpc-00150a1cd684700d9
export SUBNET_IDS="subnet-00ab400cacdce9d40 subnet-0a2f471dec4cb6d70 subnet-0be575b8b3138c576"  # private subnets with nodes (1 per AZ used)
export NODE_SG_ID=sg-0d258d04e8ab37139  # SG attached to your node group
# ----------------------------------

 

# 1.1 Security group for EFS mount targets (allows NFS from nodes)
EFS_SG_ID=$(aws ec2 create-security-group \
  --group-name eks-efs-sg --description "NFS from EKS nodes" \
  --vpc-id $VPC_ID --query GroupId --output text --region $AWS_REGION)

aws ec2 authorize-security-group-ingress --group-id $EFS_SG_ID \
  --protocol tcp --port 2049 --source-group $NODE_SG_ID --region $AWS_REGION

# 1.2 Create the EFS file system (encrypted at rest; pick a KMS key if you have one)
FS_ID=$(aws efs create-file-system \
  --performance-mode generalPurpose \
  --encrypted \
  --region $AWS_REGION \
  --query FileSystemId --output text)

echo "EFS=$FS_ID"
image-20250901-070306.png

Wait until EFS is available

Check status:

aws efs describe-file-systems --file-system-id $FS_ID --region $AWS_REGION \
  --query 'FileSystems[*].LifeCycleState'

It must return:

"available"
image-20250901-061546.png
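Rather than re-running the check by hand, you can poll until the state flips to "available". A sketch with the AWS call stubbed out so it runs without credentials; swap the stub body for the commented `describe-file-systems` line when running it for real:

```shell
efs_state() {
  # Real call (uses FS_ID and AWS_REGION exported above):
  # aws efs describe-file-systems --file-system-id "$FS_ID" --region "$AWS_REGION" \
  #   --query 'FileSystems[0].LifeCycleState' --output text
  echo "available"   # stubbed here so the sketch runs anywhere
}

# Loop until the file system reports "available"
until [ "$(efs_state)" = "available" ]; do
  echo "still creating..."
  sleep 5
done
echo "EFS ready"
```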

Create a mount target in each subnet used by the nodes:

for sn in $SUBNET_IDS; do
  aws efs create-mount-target \
    --file-system-id $FS_ID \
    --subnet-id $sn \
    --security-groups $EFS_SG_ID \
    --region $AWS_REGION
done
image-20250901-061716.png

Why this matters: EFS mount targets must allow inbound NFS/2049 from your node SGs. Without this, the CSI driver will hang on mount.

2) Install the Amazon EFS CSI driver as an EKS add-on (recommended)

The driver needs IAM permissions to manage EFS access points for dynamic provisioning. The managed policy is AmazonEFSCSIDriverPolicy. We’ll use EKS Pod Identity (or use IRSA if you prefer).

2.1 Create Pod Identity association (eksctl)

export CLUSTER=andreyXpologTest
export AWS_REGION=eu-north-1
export ROLE_NAME=AmazonEKS_EFS_CSI_DriverRole

# make sure the Pod Identity Agent add-on exists (required for associations)
aws eks describe-addon --cluster-name "$CLUSTER" \
  --addon-name eks-pod-identity-agent --region "$AWS_REGION" >/dev/null 2>&1 \
  || aws eks create-addon --cluster-name "$CLUSTER" \
       --addon-name eks-pod-identity-agent --region "$AWS_REGION"

# create the association with the correct policy ARN
eksctl create podidentityassociation \
  --cluster "$CLUSTER" \
  --namespace kube-system \
  --service-account-name efs-csi-controller-sa \
  --role-name "$ROLE_NAME" \
  --permission-policy-arns arn:aws:iam::aws:policy/service-role/AmazonEFSCSIDriverPolicy
image-20250901-072246.png

2.2 Install the add-on

aws eks create-addon \
  --cluster-name $CLUSTER \
  --addon-name aws-efs-csi-driver \
  --region $AWS_REGION

The EFS driver add-on is the AWS-supported path and keeps versions aligned with your cluster. Pod Identity (or IRSA) ensures the controller has exactly the permissions it needs.

3) Namespace + StorageClass + PVC

nano storage.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: logging
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0107a1b4c87fb0213  # your EFS FS ID
  basePath: "/logging"
  directoryPerms: "0750"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  ensureUniqueDirectory: "true"
  subPathPattern: "${.PVC.namespace}/${.PVC.name}"
mountOptions:
  - tls
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: xpolog-pvc
  namespace: logging
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: efs-sc
  resources:
    requests:
      storage: 20Gi
kubectl apply -f storage.yaml
image-20250901-075913.png
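For orientation, with basePath "/logging" and subPathPattern "${.PVC.namespace}/${.PVC.name}", the provisioned access-point directory on EFS is composed as sketched below. Because ensureUniqueDirectory is "true", the driver additionally appends a unique suffix to the final directory name:

```shell
# Values taken from storage.yaml above
BASE="/logging"; NS="logging"; PVC="xpolog-pvc"

# Directory path the StorageClass parameters resolve to
# (the actual directory also gets a unique suffix appended)
echo "${BASE}/${NS}/${PVC}"
```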

Checks:

kubectl -n logging get pvc xpolog-pvc
image-20250901-080024.png

Xpolog Deployment:

nano deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xpolog
  namespace: logging
  labels:
    app: xpolog
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: xpolog
  template:
    metadata:
      labels:
        app: xpolog
    spec:
      # IMPORTANT: match the AP-created ownership (UID/GID 1000)
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      initContainers:
        # This initContainer runs as root to fix permissions
        - name: fix-permissions
          image: 1200km/xplg:7.Release-9787
          securityContext:
            runAsUser: 0  # run as root to chown
          command: ["/bin/sh", "-c"]
          args:
            - |
              set -ex
              # Copy original app files to the shared volume
              cp -r /opt/xplg-service/. /workdir/
              # Change ownership of the copied files to user 1000
              chown -R 1000:1000 /workdir
          volumeMounts:
            - name: xplg-workdir
              mountPath: /workdir
        # The original initContainer that prepares the EFS volume
        - name: init-efs
          image: public.ecr.aws/amazonlinux/amazonlinux:2023
          securityContext:
            runAsUser: 1000
            runAsGroup: 1000
          command: ["/bin/bash", "-c"]
          args:
            - |
              set -e
              mkdir -p /efs/config /efs/data /efs/logs
              chmod -R 0770 /efs || true
          volumeMounts:
            - name: xpolog-storage
              mountPath: /efs
      containers:
        - name: xpolog
          image: 1200km/xplg:7.Release-9787
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 30303
          env:
            - name: JAVA_TOOL_OPTIONS
              value: "-Xmx4g -Dxpolog.uid.structure=master"
          readinessProbe:
            httpGet:
              path: /  # you may still need to change this to /health or another path
              port: 30303
            initialDelaySeconds: 60  # increased for safety
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 6
          livenessProbe:
            httpGet:
              path: /  # you may still need to change this to /health or another path
              port: 30303
            initialDelaySeconds: 90  # increased for safety
            periodSeconds: 20
            timeoutSeconds: 5
            failureThreshold: 6
          resources:
            requests: { cpu: "500m", memory: "6Gi" }
            limits: { cpu: "1000m", memory: "8Gi" }
          volumeMounts:
            # Mount the shared volume with correct permissions
            - name: xplg-workdir
              mountPath: /opt/xplg-service
            # The original PVC mounts
            - name: xpolog-storage
              mountPath: /home/data
              subPath: data
            - name: xpolog-storage
              mountPath: /opt/xplg/config
              subPath: config
      volumes:
        # The PVC for persistent data
        - name: xpolog-storage
          persistentVolumeClaim:
            claimName: xpolog-pvc
        # The shared volume for the application files
        - name: xplg-workdir
          emptyDir: {}
kubectl apply -f deployment.yaml

Checks:

kubectl -n logging get pods
image-20250901-081707.png

Exposure stage, where you make XpoLog accessible inside the cluster and outside (from your browser).

nano exposure.yaml
# Internal service for in-cluster access
apiVersion: v1
kind: Service
metadata:
  name: xpolog
  namespace: logging
spec:
  type: ClusterIP
  selector:
    app: xpolog
  ports:
    - name: http
      port: 30303
      targetPort: 30303
      protocol: TCP
---
# External service (NLB) for outside access
apiVersion: v1
kind: Service
metadata:
  name: xpolog-public
  namespace: logging
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: xpolog
  ports:
    - name: http-public
      port: 30443
      targetPort: 30303
      protocol: TCP
kubectl apply -f exposure.yaml

 

Add a rule that allows inbound traffic on the NodePort (31216) from any source (0.0.0.0/0).

Run the following command in your terminal to authorize the connection.

aws ec2 authorize-security-group-ingress \
  --group-id sg-0d258d04e6ab37139 \
  --protocol tcp \
  --port 31216 \
  --cidr 0.0.0.0/0

After running this command, wait about 30-60 seconds, and then try to access your URL again. It should now load successfully.
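Port 31216 here is the NodePort that Kubernetes assigned to the xpolog-public Service; by default NodePorts are allocated from the 30000-32767 range, which is why a high port in that range is what needs opening. A small sketch of the check (the commented kubectl line retrieves your actual NodePort):

```shell
PORT=31216  # NodePort from this guide; find yours with:
# kubectl -n logging get svc xpolog-public -o jsonpath='{.spec.ports[0].nodePort}'

# Confirm the port falls inside the default NodePort allocation range
if [ "$PORT" -ge 30000 ] && [ "$PORT" -le 32767 ]; then
  echo "$PORT is in the default NodePort range"
fi
```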


External configuration

Open the GUI and paste the external configuration directory:

/opt/xplg/config

Restart XpoLog.

image-20250901-095012.png

Validation Checklist for XpoLog Deployment on AWS EKS with EFS and External Access

Below is a compact, copy-paste test plan for this setup. It checks pod health, EFS, mounts, and both external and internal GUI access.

Assumes: namespace logging, Deployment xpolog, PVC xpolog-pvc, public LB Service xpolog-public, ClusterIP Service xpolog.


1) Pod status

# quick health
kubectl -n logging get deploy xpolog
kubectl -n logging get pods -l app=xpolog -o wide
kubectl -n logging logs deploy/xpolog --tail=150

✅ Success looks like: READY 1/1, Running, and logs show “XpoLog started”.

image-20250901-104824.png

2) EFS status (cluster + k8s objects)

# EFS CSI driver components healthy?
kubectl -n kube-system get pods -l app.kubernetes.io/name=aws-efs-csi-driver

# StorageClass exists and points to your FS ID
kubectl get storageclass efs-sc -o yaml | egrep 'provisioner|fileSystemId|provisioningMode|basePath'

# PVC bound to a PV
kubectl -n logging get pvc xpolog-pvc
kubectl -n logging describe pvc xpolog-pvc

✅ Success looks like:

  • EFS CSI controller/daemonset pods Running

  • efs-sc shows provisioner: efs.csi.aws.com and your fileSystemId: fs-0107a1b4c87fb0213

  • PVC STATUS: Bound

image-20250901-104938.png

3) Access pod & verify EFS is mounted and writable

# get the pod name & jump in
POD=$(kubectl -n logging get pod -l app=xpolog -o jsonpath='{.items[0].metadata.name}')

# show mounts and disk usage for the EFS-backed paths
kubectl -n logging exec -it "$POD" -- sh -lc '
  echo "== whoami =="; id;
  echo "== mounts (look for efs.csi.aws.com / nfs4) ==";
  mount | egrep "nfs4|efs|/opt/xplg/config|/home/data" || true;
  echo "== df -h for mounted dirs ==";
  df -h /opt/xplg/config /home/data 2>/dev/null || true;
  echo "== rw test ==";
  touch /opt/xplg/config/_rw_$(date +%s) && ls -l /opt/xplg/config | tail -n 3
'

✅ Success looks like:

  • mount shows NFS4/EFS for /opt/xplg/config and /home/data

  • df -h returns sizes for those paths

  • touch succeeds and you see the new file

image-20250901-105137.png

4) External GUI (through the NLB)

Your pod serves HTTP on 30303. The public Service exposes port 30443 but still forwards HTTP to 30303. Use http:// (not https://) unless you add an Ingress with TLS.

ELB=$(kubectl -n logging get svc xpolog-public -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "NLB: $ELB"

# quick CLI test
curl -sv "http://$ELB:30443/" | head -n 20

✅ Success looks like an HTTP status line (200/3xx) and some HTML.
If it times out but the next test works, open the NodePort on the node SG.

image-20250901-105309.png

Optional (bypass NLB; proves Service/NodePort path):

NODEPORT=$(kubectl -n logging get svc xpolog-public -o jsonpath='{.spec.ports[0].nodePort}')
NODEIP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
echo "NodePort test -> http://$NODEIP:$NODEPORT/"
curl -sv "http://$NODEIP:$NODEPORT/" | head -n 20

5) Internal GUI (via port-forward)

Option A (native port):

# terminal 1: keep running
kubectl -n logging port-forward svc/xpolog 30303:30303

Then visit: http://localhost:30303

Option B (keep your 30443 habit, still HTTP):

# terminal 1: keep running
kubectl -n logging port-forward svc/xpolog 30443:30303

Then visit: http://localhost:30443

✅ Each request prints “Handling connection for …” in the port-forward terminal.


image-20250901-115605.png