To Russia with Love: Deploying Kubernetes in Foreign Locations

20-03-2019 / CloudOps

A few years ago, CloudOps started working with a large European client who wanted to migrate their workloads into Google Cloud Platform (GCP). They were looking to modernize their application with containers and use the most mature Kubernetes distribution at the time: Google Kubernetes Engine (GKE). They additionally wanted their application platform to use libraries of automation recipes based on Terraform and Ansible. CloudOps helped them through this migration process, and the architecture was completed in March, 2018.

Deploying Kubernetes in Exotic Locations

The existing GCP architecture had a hardware-based load balancer that served as the entry point and could serve content both from our CDN and from the actual application running inside Kubernetes. The architecture comprised many components. Redis was used for session caching. Fluentd was the logging container, which integrated directly with GCP’s Stackdriver. We created a custom Fluentd container that logged to a custom ELK stack. All of these components had to be migrated to an on-prem provider based in Russia.

The architecture met our client’s key technical and business requirements. However, they had a large number of customers in the Russian Federation, whose data localization law (152-FZ) requires personal data of Russian individuals to be stored in Russia. Our client therefore needed to augment their containerized infrastructure in GCP with a solution hosted in Russia. There were a few steps along the way.

Finding a Cloud Provider

To date, no hyper-scale cloud providers, GCP included, operate in Russia, so a traditional on-prem VM hosting solution was the only option.

Finding a Russian on-prem provider was the first step. We evaluated about five, and they varied greatly in quality. We eventually chose a VMware Cloud Director-based provider that had fully 152-FZ compliant environments and could boast a few Terraform plugins.

Automating and/or Provisioning VMs

The cloud provider had a Terraform provider plugin, but it was difficult to use because of the geographic distance between their API servers and our teams, which made Terraform state unreliable and caused many timeouts. Additionally, the plugins were written for outdated, incompatible versions of VMware products.

All this meant we couldn’t use most of our automation recipes. As we couldn’t reliably use this Terraform provider to automate much of the work, we ended up having to manually configure new recipes for this specific use case.

Our hosts needed two flavours of Debian 9 base images, with 40 and 100 gigabyte root disks respectively. We created a base image to run our automation against so we could produce the VMs needed to build the infrastructure. It was a very manual and time-consuming process, but it gave us compute resources in Russia. With time, the UI became easier to use.

The UI itself was a funny story. It was an old, Flash-based version of the Cloud Director administrative UI, and the provider recommended we use Internet Explorer to make sure all its features worked. This should raise many red flags in the reader’s mind, as it did in the author’s. Opting for caution, we ran the UI inside a dedicated VM with nothing else on it for a few months before trusting the application. Fortunately, the UI has been working without issues for over a year now. We successfully provisioned all the VMs and could move on to the next step.

Choosing a Kubernetes Distribution

As we couldn’t use or extend any public cloud managed Kubernetes service, namely GKE, EKS, or AKS, we had to find a Kubernetes distribution that was completely self-contained and easy to operate.

We didn’t want to use Kubespray or kubeadm, as both take too long to install and are complex to configure and run. Both have also historically had issues creating multi-master setups and were known for making clusters difficult to operate in the long run.

We decided to use Rancher Kubernetes Engine (RKE), which I think is the best custom Kubernetes installer to date. All RKE requires to install a Kubernetes cluster is a VM that runs Docker, preferably a Docker version compatible with the target Kubernetes version. You can still run non-compatible versions, but there are more risks. You also need an SSH login, but that’s it – really!
There are many reasons why RKE is increasingly being viewed as the best open source Kubernetes distribution. Chick-fil-A wrote an interesting article explaining why they reached that conclusion too.

Operating Rancher Kubernetes Engine

To run RKE, we downloaded the single ‘rke’ binary and created a single YAML file that would specify our Kubernetes cluster.

RKE has a set of commands to operate, install, or decommission clusters. The basic command, ‘rke up’, will connect SSH tunnels, detect the state of your cluster, and then bootstrap a Kubernetes installation.

Altogether, this allows you to run Kubernetes entirely inside Docker containers using images that come from Rancher. It took us five minutes to bootstrap a Kubernetes cluster with three master nodes and five worker nodes. The multi-master setup worked out of the box with a fully distributed etcd. Once the cluster was provisioned, the RKE binary exited and output a kubeconfig file with a client certificate that could be used to interact with the cluster. Customizing configurations was simple with RKE.

      nodes:
        - address: masterworker
          port: "22"
          role:
            - controlplane
            - etcd
            - worker
      services:
        kube-api:
          service_cluster_ip_range: 10.43.0.0/21
      network:
        plugin: canal
      authentication:
        strategy: x509
      ssh_key_path: "/path/to/key"
      authorization:
        mode: rbac
      ignore_docker_version: true
      cluster_name: "my-cluster"

The configuration above is a minimal RKE configuration file, showing what must be set. At the very least, you need a single node that you define as a controlplane, an etcd, a worker node, or any combination thereof. Define the IP ranges for your pods and services. In our case, we were using the Canal CNI, so packets were encapsulated and we didn’t need to pick addresses that could be routed outside the cluster. Pick any reasonable range – I find 10.43 to be a good range. Define the cluster DNS server (the 11th IP of the service address range is standard, so .10). Finally, enable RBAC and tell it to ignore Docker versions.
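As a hedged sketch, those pod, service, and DNS settings live under the services section of cluster.yaml; the 10.42 pod range below is RKE’s default rather than something dictated by our setup, and all of the values should be adjusted to your environment:

      # Sketch only: pod and service IP ranges plus the cluster DNS server.
      services:
        kube-controller:
          cluster_cidr: 10.42.0.0/21               # pod range (RKE default prefix)
          service_cluster_ip_range: 10.43.0.0/21
        kube-api:
          service_cluster_ip_range: 10.43.0.0/21
        kubelet:
          cluster_domain: cluster.local
          cluster_dns_server: 10.43.0.10           # 11th IP of the service range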

RKE is easy to install and is known for its simple lifecycle management.

To add new nodes, simply add them to the appropriate section in cluster.yaml and run ‘rke up’ once again.

To decommission nodes, remove them from the cluster.yaml and run ‘rke up.’
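A minimal sketch of what that looks like in the nodes section of cluster.yaml (the new address below is purely illustrative):

      nodes:
        - address: masterworker
          port: "22"
          role: [controlplane, etcd, worker]
        - address: new-worker-01     # illustrative new node; delete this entry to decommission it
          port: "22"
          role: [worker]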

You can also fix specific components of Kubernetes without upgrading the entire cluster by changing the Docker image to the desired version.
To upgrade Kubernetes entirely, download the new RKE binary, adjust cluster.yaml, and run ‘rke up.’ RKE will do rolling restarts and upgrades. Whatever you do, remember not to skip in-between versions when upgrading.
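As a hedged example of both operations in cluster.yaml (the version strings below are illustrative, not the versions we actually ran):

      # Bump the overall release, or pin an individual component image,
      # then run 'rke up' to apply a rolling update.
      kubernetes_version: v1.13.5-rancher1-2               # example release tag
      system_images:
        kubernetes: rancher/hyperkube:v1.13.5-rancher1     # example component override
        etcd: rancher/coreos-etcd:v3.2.24-rancher1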

Solving for Key Cloud Features

RKE is a great tool but is not a complete solution in and of itself for our use case. A few key features were missing:

– The ability to communicate with Google Container Registry to pull images
– An analog for CloudSQL
– Container Storage Interface (CSI Driver) that can enable persistent volumes and persistent volume claims
– Load balancer
– Analog for Google Object Storage
– Analog for CloudCDN

Our use case needed each of these features. Let’s take a look at how we solved each of these problems in order.

Google Container Registry

To pull images from Google Container Registry from outside of GCP, we used a Kubernetes secret containing a GCR service account key. We were able to do so because none of these images contained customer data. With a proper Kubernetes secret, it’s easy to enable remote pulling of images from GCR outside of GCP:

      apiVersion: v1
      data:
        .dockerconfigjson: BASE64_ENCODED_SERVICE_ACCOUNT_JSON_KEY
      kind: Secret
      metadata:
        name: my-gcr-secret
        namespace: default
      type: kubernetes.io/dockerconfigjson

And then reference that secret in the deployment manifest:

      spec:
        imagePullSecrets:
          - name: my-gcr-secret

Analog for CloudSQL

To solve for running CloudSQL on-prem, we just had to install MySQL.

We used the same Debian 9 image that we used in Kubernetes clusters. We were able to use some of our existing Ansible playbooks to configure MySQL 5.7.

We did a standard deployment of a single master with two slaves. One was a real-time slave used for reporting and failover. The other was a time-delayed slave that shipped binlogs instantly but didn’t apply them for thirty days. This allowed us to restore to a point in time even if corruption had been replicated.

CloudSQL and Google Container Registry were both fairly straightforward problems to solve.

Storage Solution – CDN

Object Storage required a more interesting solution.

At the time, Rook/Ceph was not production ready on-premises, so we used GlusterFS, an off-the-shelf technology from Red Hat, to provide object-like storage functionality.

We set up two NGINX ingress nodes to serve content from GlusterFS, acting as a Content Delivery Network (CDN) as well as reverse proxying the Kubernetes cluster service NodePorts and Ingress ports.

Each Ingress node gets replicated data in GlusterFS on a dedicated device brick and serves data from GlusterFS on a specific path for all CDN data.

The GlusterFS volume was then mounted on Kubernetes worker nodes for read/write and added to the deployment as a hostPath volume under /cdn-data, allowing each pod to write content for the CDN.
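A minimal sketch of that wiring in the deployment spec, assuming the GlusterFS volume is already mounted at /cdn-data on every worker node (the container and volume names are illustrative):

      spec:
        containers:
          - name: app
            volumeMounts:
              - name: cdn-data
                mountPath: /cdn-data        # pods write CDN content here
        volumes:
          - name: cdn-data
            hostPath:
              path: /cdn-data               # GlusterFS mount point on the worker node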

This was an adequate solution for object storage and CDN.

Storage Solution – Redis

Depending on which flavour of Redis you use, you may need persistent volume claims to store its session cache data. We were using the Helm version of Redis as a session cache, which does require PVCs. Since we had no PVC capability yet, we temporarily reconfigured the application to use the SQL database as a session cache instead of Redis as a workaround.

Final Storage Solution – Rook/Ceph

In December 2018, at KubeCon in Seattle, Rook/Ceph was promoted to an ‘incubating’ project at the CNCF, and the Ceph driver was marked production ready. We decided it was time for us to start using this technology, albeit in a limited manner: to provide persistent storage to the Redis session cache and possibly to the Elasticsearch component of the ELK stack. This way, if we experienced operational difficulties with Rook/Ceph, we’d at most lose logging data (not the end of the world) and sessions would invalidate (also not a huge deal, relatively speaking).

There is a minor configuration change that is required for RKE’s cluster.yaml file to allow it to pick up the CSI Driver that Rook will install once it’s running:

      services:
        kubelet:
          extra_args:
            volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
          extra_binds:
            - /usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec

Running ‘rke up’ on an existing cluster will apply this change in place.

Finally, we can add the Rook Helm chart repo to our local Helm install and install the Rook operator (v0.9.3 as of this writing):

      helm repo add rook-stable https://charts.rook.io/stable
      helm install --namespace rook-ceph-system rook-stable/rook-ceph

From there we used the examples in the Rook source tree located under /cluster/examples/kubernetes/ceph to create a ‘Ceph Cluster’ (cluster.yaml) and a Storage Class (storageclass.yaml) to allocate a Ceph Replica Pool and tie it together with a Kubernetes Storage Class (called rook-ceph-block).
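For reference, a hedged sketch of a minimal CephCluster resource in the spirit of Rook’s v0.9 example cluster.yaml; the image tag and storage settings are illustrative rather than our exact values:

      apiVersion: ceph.rook.io/v1
      kind: CephCluster
      metadata:
        name: rook-ceph
        namespace: rook-ceph
      spec:
        cephVersion:
          image: ceph/ceph:v13.2.2        # example Ceph Mimic image
        dataDirHostPath: /var/lib/rook
        mon:
          count: 3
          allowMultiplePerNode: false
        storage:
          useAllNodes: true
          useAllDevices: false
          directories:
            - path: /var/lib/rook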

IMPORTANT: the default values are definitely not production ready here. Specifically, the replicated.size configuration item in storageclass.yaml defaults to 1, which means every bit of data stored in the Rook/Ceph cluster exists only once. Depending on the number of nodes in your cluster, you may want to increase this. For our use case, in production, we went with replicated.size: 3.
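A hedged sketch of the corresponding storageclass.yaml, following the structure of the v0.9 example with the replica size raised to the value we used:

      apiVersion: ceph.rook.io/v1
      kind: CephBlockPool
      metadata:
        name: replicapool
        namespace: rook-ceph
      spec:
        replicated:
          size: 3              # the upstream example defaults to 1
      ---
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: rook-ceph-block
      provisioner: ceph.rook.io/block
      parameters:
        blockPool: replicapool
        clusterNamespace: rook-ceph
        fstype: xfs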

We could now install Redis with Helm, telling it to use that storage class:

      helm install --name=redis stable/redis \
        --set master.persistence.storageClass=rook-ceph-block \
        --set slave.persistence.storageClass=rook-ceph-block

Finding a Logging Solution

Fluentd is the de facto standard logging agent for Kubernetes and is also used by GKE. It’s implemented in Ruby, with a lighter-weight agent written in C (Fluent Bit) under development.

It has many input and output plugins and has been under active development for a few years. We needed our logging solution to provide a comprehensive monitoring configuration for Kubernetes, have sane application log parsing out of the box, and sink to Elasticsearch (the E in ELK).

We had to use a custom configuration to do so. Version 1.2 depends on a non-standard Debian library that must exist on the worker node, and it causes very high load if that library is absent. The out-of-the-box JSON-in-JSON parsing also didn’t work. We found this to be the best version of the Fluentd image for our use case:

       fluent/fluentd-kubernetes-daemonset:v1.3.1-debian-elasticsearch-1.3

We appended the following configuration to the Fluentd image’s kubernetes.conf to enable proper JSON parsing of our application logs and to cast some log item types for easier searching in Elasticsearch:

      <filter **>                # match pattern for the application's container logs
        @type parser
        <parse>
          @type json
          json_parser json
          types elapsed_time:float,status_code:integer,bytes_sent:integer
        </parse>
        replace_invalid_sequence true
        emit_invalid_record_to_error false
        key_name log
        reserve_data true
      </filter>

We placed the complete contents into a ConfigMap called ‘fluentd-config’ with a single key, ‘kubernetes.conf’, which contained the entire Fluentd config. We then mounted that ConfigMap into the Fluentd DaemonSet, overriding the kubernetes.conf from inside the image:

      volumes:
        - name: fluentd-config
          configMap:
            name: fluentd-config

      volumeMounts:
        - name: fluentd-config
          subPath: kubernetes.conf
          mountPath: /fluentd/etc/kubernetes.conf
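For completeness, a sketch of that ConfigMap; the kube-system namespace is an assumption about where the Fluentd DaemonSet runs, and the configuration body is elided:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: fluentd-config
        namespace: kube-system        # assumed namespace of the Fluentd DaemonSet
      data:
        kubernetes.conf: |
          # full Fluentd configuration, including the parser filter shown above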

Finding More Compute Resources

The Russian installation initially had a separate dedicated ELK cluster (3 VMs for HA). It was both underutilized and overprovisioned.

We wanted to move that compute capacity into the Kubernetes cluster and run ELK inside it. We were able to use the official elastic-stack Helm chart to do so.

We wiped the ELK cluster VMs with a fresh Debian 9 image, added the old ELK nodes to the RKE cluster.yaml file as ‘worker’ nodes, and ran ‘rke up.’ A few minutes later, the compute capacity from the ELK cluster had been moved to the Kubernetes cluster.

Finally, we installed ELK using Helm:

      helm install --name elk stable/elastic-stack \
        --set elasticsearch.data.persistence.size=50Gi \
        --set elasticsearch.data.persistence.storageClass=rook-ceph-block \
        --set elasticsearch.master.persistence.storageClass=rook-ceph-block \
        --set kibana.env.ELASTICSEARCH_URL=http://elk-elasticsearch-client:9200

IMPORTANT: the chart’s default settings for ELK assume a certain performance profile. The default JVM heap configuration for Elasticsearch is quite low compared to a standalone (outside of Kubernetes) cluster, so be aware of how much data you send there daily. You MUST use a curator to close indices and eventually clean them up. For our use case, we decided to keep 3 days of open indices in memory and trimmed data older than 90 days. You may opt to keep indices open longer, and therefore increase the JVM heap size for the Elasticsearch server, which can be set with an override value on the Helm chart.
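As a hedged illustration of that retention policy, a Curator action file along these lines closes indices after 3 days and deletes them after 90; the ‘logstash-’ index prefix is an assumption about how the Fluentd output names its indices:

      actions:
        1:
          action: close
          description: Close indices older than 3 days
          options:
            ignore_empty_list: true
          filters:
            - filtertype: pattern
              kind: prefix
              value: logstash-
            - filtertype: age
              source: name
              direction: older
              timestring: '%Y.%m.%d'
              unit: days
              unit_count: 3
        2:
          action: delete_indices
          description: Delete indices older than 90 days
          options:
            ignore_empty_list: true
          filters:
            - filtertype: pattern
              kind: prefix
              value: logstash-
            - filtertype: age
              source: name
              direction: older
              timestring: '%Y.%m.%d'
              unit: days
              unit_count: 90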

Load Balancing

The Russian cloud provider didn’t have the ability to assign hardware load balancers to front the infrastructure. The solution we chose used two NGINX ingress nodes with round-robin DNS resolution for every request. It was an adequate solution, able to reverse proxy all traffic on ports 80 and 443 while serving CDN traffic on another path. These edge nodes performed no SSL termination, which may have reset some SSL sessions; instead, TLS was terminated by the NGINX Ingress controller inside Kubernetes.

End State

There were a few unexpected challenges along the way, but we successfully built a Russian on-prem augmentation to our client’s GCP infrastructure. We had to adapt our usual processes, and the Russian deployment looked quite different from the main infrastructure, but everything ended up working well. Deploying Kubernetes on-prem took more time because we couldn’t rely on our tried and true recipes and processes. Nonetheless, it was great to see the infrastructure up and running.

Learn more about modernizing your application with containers and sign up for one of our hands-on DevOps workshops available remotely and in a variety of cities. Visit our workshops calendar for more information.
