Kubernetes Configuration: Migrating Elasticsearch from EC2 Instances

In our previous article, I explored the question: Why migrate to Kubernetes? I looked at the resources needed for this migration, and you might remember that I showed you the architecture of an Elasticsearch cluster running on Kubernetes.

In this article, I will describe the essential configuration the adjoe Cloud Engineering team needed to carry out in order to migrate Elasticsearch from EC2 instances to Kubernetes. We’re talking ECK operators, Helm charts – you name it.

Let’s dive deeper into the details.

Configuring an Elasticsearch Cluster via ECK Helm Chart

Elastic offers the ECK (Elastic Cloud on Kubernetes) operator, which not only makes deploying Elasticsearch and Kibana simple; it also goes further by handling most of the mundane tasks that would otherwise require human intervention. We’re talking about tasks such as upgrading or updating the cluster and adding or removing nodes. And the ECK operator does all of this for us without any downtime.

We use Terraform to manage our Kubernetes cluster, which runs on AWS EKS. The ECK operator was installed on the Kubernetes cluster using the ECK Helm chart.

resource "helm_release" "es_operator" {
 name             = "elasticsearch-operator"
 repository       = "https://helm.elastic.co"
 chart            = "eck-operator"
 create_namespace = true
 namespace        = "elastic-system"
 version          = var.eck_operator_version
}

We then create a Helm chart to deploy the following custom resource, which tells the ECK operator to create an Elasticsearch cluster. This is what it looks like.

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
 annotations:
   eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone"
 name: {{ .Values.cluster_name }}
spec:
 version: {{ .Values.es_version }}
 auth:
   fileRealm:
   - secretName: secret-basic-auth
 http:
   service:
     spec:
       type: NodePort
       ports:
       - name: http
         port: 9200
         targetPort: 9200
   tls:
     selfSignedCertificate:
       disabled: true
 nodeSets:
   - name: masters
     count: {{ .Values.master_count }}
     config:
       node.attr.zone: ${ZONE}
       cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
       bootstrap.memory_lock: true
       node.roles: ["master"]
       xpack.ml.enabled: true
     podTemplate:
       spec:
        # restricts Elasticsearch nodes so they are only scheduled on Kubernetes hosts tagged with any of the specified instance types.
         affinity:
           nodeAffinity:
             requiredDuringSchedulingIgnoredDuringExecution:
               nodeSelectorTerms:
               - matchExpressions:
                 - key: node.kubernetes.io/instance-type
                   operator: In
                   values: {{- range .Values.kube_es_master_instance_type }}
                     - {{ . }}
                     {{- end }}
         containers:
           - name: elasticsearch
             env:
               - name: ZONE
                 valueFrom:
                   fieldRef:
                     fieldPath: metadata.annotations['topology.kubernetes.io/zone']
             resources:
               requests:
                 memory: {{ .Values.master_memory_request }}
                 cpu: {{ .Values.master_cpu_request }}
               limits:
                 memory: {{ .Values.master_memory_limit }}
                 cpu: {{ .Values.master_cpu_limit }}
           # Pod topology spread constraints to spread the Pods across availability zones in the Kubernetes cluster.
         topologySpreadConstraints:
           - maxSkew: {{.Values.kube_es_master_maxSkew}}
             topologyKey: topology.kubernetes.io/zone
             whenUnsatisfiable: DoNotSchedule
             labelSelector:
               matchLabels:
                 elasticsearch.k8s.elastic.co/cluster-name: {{ .Values.cluster_name }}
     volumeClaimTemplates:
       - metadata:
           name: elasticsearch-data
         spec:
           accessModes:
             - ReadWriteOnce
           resources:
             requests:
               storage: {{ .Values.master_disk_size }}
           storageClassName: {{ .Values.storage_class }}
   - name: data
     count: {{ .Values.data_count }}
     config:
       node.attr.zone: ${ZONE}
       cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
       bootstrap.memory_lock: true
       node.roles: ["data"]
     podTemplate:
       spec:
       # restricts Elasticsearch nodes so they are only scheduled on Kubernetes hosts tagged with any of the specified instance types.
         affinity:
           nodeAffinity:
             requiredDuringSchedulingIgnoredDuringExecution:
               nodeSelectorTerms:
               - matchExpressions:
                 - key: node.kubernetes.io/instance-type
                   operator: In
                   values: {{- range .Values.kube_es_data_instance_type }}
                     - {{ . }}
                     {{- end }}
         containers:
           - name: elasticsearch
             env:
               - name: ZONE
                 valueFrom:
                   fieldRef:
                     fieldPath: metadata.annotations['topology.kubernetes.io/zone']
             resources:
               requests:
                 memory: {{ .Values.data_memory_request }}
                 cpu: {{ .Values.data_cpu_request }}
               limits:
                 memory: {{ .Values.data_memory_limit }}
                 cpu: {{ .Values.data_cpu_limit }}
       # Pod topology spread constraints to spread the Pods across availability zones in the Kubernetes cluster.
         topologySpreadConstraints:
           - maxSkew: {{.Values.kube_es_data_maxSkew}}
             topologyKey: topology.kubernetes.io/zone
             whenUnsatisfiable: DoNotSchedule
             labelSelector:
               matchLabels:
                 elasticsearch.k8s.elastic.co/cluster-name: {{ .Values.cluster_name }}
     volumeClaimTemplates:
       - metadata:
           name: elasticsearch-data
         spec:
           accessModes:
             - ReadWriteOnce
           resources:
             requests:
               storage: {{ .Values.data_disk_size }}
           storageClassName: {{ .Values.storage_class }}

There are two nodeSet configurations: one for the Elasticsearch master nodes, the other for the data nodes. Both sections consist of similar settings.

One of the most important parts of the configuration is the topology spread constraints. These were specified so that the pods are spread across all availability zones; we didn’t want all or most of the pods to end up scheduled in a single availability zone. We also wanted the pods to be scheduled only on certain instance types, which is expressed in both sections via node affinity.

How Do You Configure Kibana via Helm Chart?

Then we come to Kibana. We not only needed the Kibana pod but also a proxy container to handle all the incoming traffic, authenticate the user with Google SSO, and then redirect the authenticated user to Kibana. SSO is also available as a feature in Kibana; however, we use the basic Elasticsearch license, which does not include this feature. Hence we used OAuth2 Proxy for Google authentication. 

Here’s the Helm chart for Kibana.

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
 name: kibana
spec:
 version: {{ .Values.es_version }}
 http:
   service:
     spec:
       type: NodePort
       ports:
       - name: http
         port: 80
         targetPort: 3000
   tls:
     selfSignedCertificate:
       disabled: true
 count: 1
 elasticsearchRef:
   name: {{ .Values.cluster_name }}
 config:
   server.publicBaseUrl: {{ .Values.kibana_url }}
   xpack.security.authc.providers:
     anonymous.anonymous1:
       order: 0
       credentials:
         username: "xxx"
         password: {{ .Values.es_readonly_password }}
     basic.basic1:
       order: 1
 podTemplate:
   spec:
     containers:
     - name: kibana
       resources:
         requests:
           memory: {{ .Values.kibana_memory_request }}
           cpu: {{ .Values.kibana_cpu_request }}
         limits:
           memory: {{ .Values.kibana_memory_limit }}
           cpu: {{ .Values.kibana_cpu_limit }}
       volumeMounts:
       - name: elasticsearch-templates
         mountPath: /etc/elasticsearch-templates
         readOnly: true
     - name: kibana-proxy
       image: 'quay.io/oauth2-proxy/oauth2-proxy:latest'
       imagePullPolicy: IfNotPresent
       args:
         - --cookie-secret={{ .Values.cookie_secret }}
         - --client-id={{ .Values.client_id }}
         - --client-secret={{ .Values.client_secret }}
         - --upstream=http://localhost:5601
         - --email-domain=example.com
         - --footer=-
         - --http-address=http://:3000
         - --redirect-url={{ .Values.redirect_url }}
         - --custom-sign-in-logo=https://path/to/logo
       ports:
         - containerPort: 3000
           name: http
           protocol: TCP
       resources:
         requests:
           memory: {{ .Values.proxy_memory_request }}
           cpu: {{ .Values.proxy_cpu_request }}
         limits:
           memory: {{ .Values.proxy_memory_limit }}
           cpu: {{ .Values.proxy_cpu_limit }}
     volumes:
       - name: elasticsearch-templates
         configMap:
           name: ilm-and-index-templates

As you might notice in the spec, a service of type NodePort is requested. An ingress load balancer (not shown in this configuration) routes HTTP and HTTPS traffic through this service to port 3000 of the kibana-proxy container. The .Values.{variable} fields in these configurations are placeholders for values that are passed in from the various environments.
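
The ingress itself is outside the scope of these charts, but to make the traffic path concrete, here’s a minimal sketch of what such an ingress could look like. It assumes the AWS Load Balancer Controller and ECK’s naming convention for the Kibana service (kibana-kb-http for a Kibana resource named kibana); the hostname is hypothetical.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana
spec:
  ingressClassName: alb   # assumes the AWS Load Balancer Controller; adjust for your ingress controller
  rules:
    - host: kibana.example.com   # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kibana-kb-http   # service created by the ECK operator for the Kibana resource named "kibana"
                port:
                  number: 80           # the service port defined above; it forwards to port 3000 of kibana-proxy

If the AWS Load Balancer Controller is used with its default instance target type, the backing service has to be of type NodePort, which fits the service type chosen above.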

Automating Post-Cluster Setup Configurations

Once the charts were deployed and the cluster was up and running, we also needed to carry out some configurations that would normally be done manually. These included creating index templates, index lifecycle management (ILM) policies, roles, data views (index patterns), and so on.

To automate these tasks, we used Kubernetes resources – that is, ConfigMaps and CronJobs. We created all the necessary API request bodies as JSON files and mounted them into the pods as volumes via a ConfigMap, along with a shell script that sends those requests to Elasticsearch. The script is executed daily by a CronJob, as well as once after the Kibana pod is spawned.

Here’s the CronJob.

apiVersion: batch/v1
kind: CronJob
metadata:
 name: script-execution
spec:
 schedule: "0 5 * * *"
 jobTemplate:
   spec:
     template:
       spec:
         containers:
         - name: script-execution
           image: alpine/curl:latest
           imagePullPolicy: IfNotPresent
           command:
           - /bin/sh
           - -c
           - sh /etc/elasticsearch-templates/execution-script.sh
           volumeMounts:
           - name: elasticsearch-templates
             mountPath: /etc/elasticsearch-templates
             readOnly: true
         restartPolicy: OnFailure
         volumes:
           - name: elasticsearch-templates
             configMap:
               name: ilm-and-index-templates

The ConfigMap holding these files is created and populated using Terraform.

resource "kubernetes_config_map" "ilm-and-index-templates" {
 metadata {
   name      = "ilm-and-index-templates"
   namespace = var.namespace
 }
 data = {
    "application-indices.json" = templatefile("${path.module}/templates/application-indices.json",
     {
       ENV            = var.environment
       APP_LOG_PREFIX = var.app_log_prefix
     }
   )
   "execution-script.sh" = templatefile("${path.module}/templates/execution-script.sh",
     {
       ENV             = var.environment
       CLUSTER_NAME    = var.cluster_name
       ES_USER         = var.es_user
       ES_PASSWORD     = data.kubernetes_secret.pass.data["elastic"]
      }
    )
   "delete-old-indices-policy.json" = templatefile("${path.module}/templates/delete-old-indices-policy.json",
     {
       RETENTION_AGE = var.retention_age
     }
   )
 }
}
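
The delete-old-indices-policy.json template itself isn’t shown in this article. As a rough illustration of what the RETENTION_AGE placeholder feeds into, an ILM policy that simply deletes indices once they reach the configured age could look like this (a minimal sketch following the Elasticsearch ILM API, not necessarily our exact template):

{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "${RETENTION_AGE}",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}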

To give you an idea, here’s the shell script, too.

# Life Cycle Policy
curl -s -XPUT "http://${ES_USER}:${ES_PASSWORD}@${CLUSTER_NAME}-es-http.elasticsearch.svc:9200/_ilm/policy/Delete_app_indices" -H 'Content-Type: application/json' -d @/etc/elasticsearch-templates/delete-old-indices-policy.json


# Index templates
curl -s -XPUT "http://${ES_USER}:${ES_PASSWORD}@${CLUSTER_NAME}-es-http.elasticsearch.svc:9200/_index_template/application-indices" -H 'Content-Type: application/json' -d @/etc/elasticsearch-templates/application-indices.json

Following on from Our Kubernetes Configuration

In the next article, I discuss the lessons we have learned as a team while migrating Elasticsearch from EC2 instances to Kubernetes.

  • What could we have done initially in order to optimize the time it took to find a working solution?
  • Was this migration worthwhile? Or was it just some fancy way to run Elasticsearch without any added benefit to the simple installation on EC2 (or a physical node)?

Stay tuned for my final article!
