Custom Connection Limit for Load Balancers

You need to set a custom limit for incoming connections on your load balancer? Here you can learn how to achieve this!

About the Connection Limit

The connection limit specifies the maximum allowed number of connections per second for a load balancer listener (open frontend port).

You may ask: why set a connection limit in the first place?
The most obvious reason is to prevent request flooding against the exposed services running in your Kubernetes cluster. You can set a tighter limit and simply let your load balancer drop connections once they exceed the service capacity. But if you handle that properly in your cluster, or if you use something like autoscaling to scale your service on demand, you might want to increase the limit on your load balancer.

Problems with High Connection Limits

In earlier versions of our load balancer service Octavia, we in fact had no default limit for incoming connections. But we soon came across some issues.

Problems occurred when a load balancer was configured with more than three listeners. Since the connection limit is applied to each listener, setting no limit at all caused the HAProxy processes to crash once the number of listeners increased further. More details about the problem can be found in the corresponding bug report.

To prevent our customers from running into this issue, we decided to set a default connection limit of 50000.
In this post I will show you how to customise the limit yourself.

 

Service Annotations for Load Balancers

Since the Kubernetes clusters in NWS are running on top of OpenStack, we rely on the “Cloud Provider OpenStack”. In the load balancer documentation of the provider, we can find a section with all available Kubernetes service annotations for the load balancers.

The annotation we need in order to set a custom connection limit is “loadbalancer.openstack.org/connection-limit”.

Applying the Annotation

The annotation needs to be set on the service of the load balancer. It shows up as type "LoadBalancer" in the service list.

~ $ kubectl get service
NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP       PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   10.254.194.107   193.111.222.333   80:30919/TCP,443:32016/TCP   2d
ingress-nginx-controller-admission   ClusterIP      10.254.36.121    <none>            443/TCP                      2d
kubernetes                           ClusterIP      10.254.0.1       <none>            443/TCP                      2d17h
my-k8s-app                           ClusterIP      10.254.192.193   <none>            80/TCP                       47h

In my example the service was set up automatically when I installed an NGINX Ingress Controller from a Helm chart.
Nevertheless, I can simply edit this service and set the annotation:

~ $ kubectl edit service ingress-nginx-controller

Now set the annotation for the connection limit to the desired value (“100000” in my example):

apiVersion: v1
kind: Service
metadata:
  annotations:
    loadbalancer.openstack.org/connection-limit: "100000"
    meta.helm.sh/release-name: ingress-nginx
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2021-09-08T08:40:59Z"
...

Save the changes and exit the editor. You should be prompted with:

service/ingress-nginx-controller edited

If not, check for typos and make sure the value is enclosed in double quotes.
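Alternatively, you can set the annotation without opening an editor. A minimal sketch using kubectl annotate (service name as in the example above; --overwrite is only needed if the annotation already exists), which also stores the value as a string automatically:

~ $ kubectl annotate service ingress-nginx-controller loadbalancer.openstack.org/connection-limit="100000" --overwrite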

After successfully editing the service, it usually takes around 10 to 15 seconds until the changes are applied via API calls on the actual OpenStack load balancer.

You can check the status of your load balancer in the web interface of NWS.
The provisioning status will change to "PENDING_UPDATE" while the connection limit is being set and will return to "ACTIVE" as soon as the changes have been applied successfully.

 

Conclusion

It can be useful to set custom connection limits on your load balancers, and as you just saw, it is also fairly easy to accomplish. Just be aware that you might run into the problem mentioned above when using many listeners in conjunction with a very high connection limit.

Automatic Fedora CoreOS Updates for your Kubernetes

You want automated Fedora CoreOS updates for your Kubernetes? And what do Zincati and libostree have to do with it? Here is a quick overview!

Fedora CoreOS is used as the operating system for many Kubernetes clusters. This container-focused operating system scores particularly well with simple, automatic updates. Unlike traditional distributions, it is not updated package by package. Instead, Fedora CoreOS first creates a new, updated image of the system and finalizes the update with a reboot. rpm-ostree in combination with Cincinnati and Zincati ensures a smooth process.

Before we take a closer look at the components, let's first clarify how you can enable automatic updates for your NWS Kubernetes cluster.

How do you activate automatic updates for your NWS Kubernetes Cluster?

In the NWS portal you can easily choose between different update mechanisms. Click on “Update Fedora CoreOS” in the context menu of your Kubernetes cluster and choose between immediate, periodic and lock-based.

(Screenshot: settings for automatic periodic updates)

Immediate applies updates immediately to all of your Kubernetes nodes and finalizes the update with a reboot.

Periodic updates your nodes only during a freely selectable maintenance window. In addition to the days of the week, you can also specify the start time and the length of the maintenance window.

Lock-based uses the FleetLock protocol to coordinate the updates. Here, a lock manager is used to coordinate the finalization of updates. This ensures that nodes do not finalize updates and reboot at the same time. In addition, the update process is stopped if problems occur, so that other nodes do not perform the update either.

Disable deactivates automatic updates.

So far, so good! But what are rpm-ostree and Zincati?

Updates but different!

 

The introduction of container-based applications has also made it possible to standardize and simplify the underlying operating systems. Reliable, automatic updates and the control of these – by the operator of the application – additionally reduce the effort for maintenance and coordination.

 

rpm-ostree creates the images

rpm-ostree is a hybrid of libostree and libdnf and thus a mixture of an image-based and a package-based system. libostree describes itself as a git for operating system binaries, with each commit containing a bootable file tree. A new release of Fedora CoreOS therefore corresponds to an rpm-ostree commit, maintained and provided by the CoreOS team. libdnf provides the familiar package management features, making the base provided by libostree extensible for users.
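If you are curious what these commits look like on a node, rpm-ostree itself can show them. A quick sketch (run directly on a Fedora CoreOS node, for example via SSH); the output lists the currently booted deployment and any pending or rollback deployments:

$ rpm-ostree status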

Taints and Tolerations

Nodes on which containers cannot be started or which are unreachable are given a so-called taint by Kubernetes (e.g. not-ready or unreachable). As a counterpart, pods on such nodes are given a toleration. This also happens during a Fedora CoreOS update: on reboot, pods are automatically marked with tolerationSeconds=300, which restarts your pods on other nodes after 5 minutes. Of course, you can find more about taints and tolerations in the Kubernetes documentation.

 

Cincinnati and Zincati distribute the updates

To distribute the rpm-ostree commits, Cincinnati and Zincati are used. The latter is a client that regularly asks the Fedora CoreOS Cincinnati server for updates. As soon as a suitable update is available, rpm-ostree prepares a new, bootable file tree. Depending on the chosen strategy, Zincati finalizes the update by rebooting the node.
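Zincati runs as a systemd unit on every Fedora CoreOS node. If you want to follow what the update agent is doing, a small sketch (again run on the node itself):

$ systemctl status zincati.service
$ journalctl -u zincati.service -f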

 

 

What are the advantages?

 

Easy rollback

With libostree it is easy to restore the old state: you just have to boot into the previous rpm-ostree commit, which is also available as an entry in the GRUB bootloader menu.
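A hedged sketch of such a rollback (run on the affected node; the --reboot flag boots straight back into the previous deployment):

$ rpm-ostree rollback --reboot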

Low effort

Fedora CoreOS can update itself without manual intervention. In combination with Kubernetes, applications are also automatically moved to the currently available nodes.

Flexible Configuration

Zincati offers a simple and flexible configuration that will hopefully allow any user to find a suitable update strategy.

Better Quality

The streamlined image-based approach makes it easier and more accurate to test each version as a whole.

 

Only time will tell whether this hybrid of an image-based and a package-based operating system will prevail. Fedora CoreOS – as the basis for our NWS Managed Kubernetes – significantly simplifies the update process while still giving our customers straightforward control.

X-Forwarded-For and Proxy Protocol

You want to know how to get the IP addresses of your clients in your Kubernetes cluster? In five minutes you'll have an overview!

From HTTP client to application

In the nginx-Ingress-Controller tutorial, we showed how to make an application publicly accessible. In the case of the NETWAYS Cloud, your Kubernetes cluster uses an OpenStack load balancer, which forwards the client requests to an nginx ingress controller in the Kubernetes cluster. This then distributes all requests to the corresponding pods.

With all this forwarding of requests, the connection details of the clients get lost without further configuration. Since this problem did not first arise with Kubernetes, the tried and tested solutions X-Forwarded-For and the Proxy Protocol are used.

In order not to lose track in the buzzword bingo between service, load balancer, ingress, proxy, client and application, you can look at the path of an HTTP request from the client to the application through the components of a Kubernetes cluster in this example.

(Figure: the path of an HTTP request to the application in the Kubernetes cluster)

 

Client IP Addresses with X-Forwarded-For

If you use HTTP, the client IP address can be stored in the X-Forwarded-For (XFF) header and passed along. XFF is an entry in the HTTP header and is supported by most proxy servers. In this example, the load balancer places the client IP address in the XFF entry and forwards the request. All other proxy servers and the applications can therefore see in the XFF entry from which address the request was originally sent.

In Kubernetes, the load balancer is configured via annotations on the service object. If you set loadbalancer.openstack.org/x-forwarded-for: "true" there, the load balancer is configured accordingly. Of course, it is also important that the next proxy does not overwrite the X-Forwarded-For header again. In the case of nginx, you can set the option use-forwarded-headers in its ConfigMap.

Service

---
kind: Service
apiVersion: v1
metadata:
  name: loadbalanced-service
  annotations:
    loadbalancer.openstack.org/x-forwarded-for: "true"
spec:
  selector:
    app: echoserver
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP

nginx ConfigMap

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx
data:
  use-forwarded-headers: "true"

Since the HTTP header is used, it is not possible to enrich HTTPS connections with the client IP address. Here, one must either terminate the TLS/SSL protocol at the load balancer or fall back on the proxy protocol.

 

Client Information with Proxy Protocol

If you use X-Forwarded-For, you are obviously limited to HTTP. In order to enable HTTPS and other applications behind load balancers and proxies to access the connection information of the clients, the so-called Proxy Protocol was invented. Technically, a small header with the client's connection information is added by the load balancer. The next hop (here nginx) must of course also understand the protocol and handle it accordingly. Besides classic proxies, other applications such as MariaDB or Postfix also support the Proxy Protocol.

To activate the proxy protocol, you must add the annotation loadbalancer.openstack.org/proxy-protocol to the service object. The protocol must also be activated for the accepting proxy.

Service Loadbalancer

---
kind: Service
apiVersion: v1
metadata:
  name: loadbalanced-service
  annotations:
    loadbalancer.openstack.org/proxy-protocol: "true"
spec:
  selector:
    app: echoserver
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP

Nginx ConfigMap

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx
data:
  use-proxy-protocol: "true"

In most cases, however, you will fall back on the Helm chart of the nginx ingress controller. There, a corresponding configuration is even easier.

 

nginx-Ingress-Controller and Helm

If you use Helm to install the nginx-ingress-controller, the configuration is very clear. The proxy protocol is activated for both the nginx and the load balancer via the Helm values file:

nginx-ingress.values:

---
controller:
  config:
    use-proxy-protocol: "true"
  service:
    annotations:
      loadbalancer.openstack.org/proxy-protocol: "true"
    type: LoadBalancer

 

$ helm install my-ingress stable/nginx-ingress -f nginx-ingress.values

The easiest way to test whether everything works as expected is to use the Google Echoserver. This is a small application that simply returns the HTTP request to the client. As described in the nginx-Ingress-Controller tutorial, we need a deployment with service and ingress. The former starts the echo server, the service makes it accessible in the cluster and the ingress configures the nginx so that the requests are forwarded to the deployment.

 

Deployment

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
spec:
  selector:
    matchLabels:
      app: echoserver
  replicas: 1
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      containers:
      - name: echoserver
        image: gcr.io/google-containers/echoserver:1.8
        ports:
          - containerPort: 8080

Service

---
apiVersion: v1
kind: Service
metadata:
  name: echoserver-svc
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: echoserver

Ingress

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: echoserver-ingress
spec:
  rules:
  - host: echoserver.nws.netways.de
    http:
      paths:
        - backend:
            serviceName: echoserver-svc
            servicePort: 80

For testing purposes, it's easiest to add a fake entry to your /etc/hosts so that echoserver.nws.netways.de points to the public IP address of your nginx ingress controller. curl echoserver.nws.netways.de will then show you everything that the echo server knows about your client, including the IP address in the X-Forwarded-For header.
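If you would rather not touch /etc/hosts, curl can also send the Host header directly to the external IP address of your ingress controller. A small sketch (<EXTERNAL-IP> is a placeholder for the address shown by kubectl get service):

$ curl -H "Host: echoserver.nws.netways.de" http://<EXTERNAL-IP>/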

 

Conclusion

In the Kubernetes cluster, the proxy protocol is probably the better choice for most use cases. The well-known Ingress controllers support the proxy protocol and TLS/SSL connections can be configured and terminated in the K8s cluster. The quickest way to find out what information arrives at your application is to use Google’s echo server.

Kubernetes Alerting with Prometheus Alert manager

In a previous post, Sebastian explained how to monitor your Kubernetes cluster with the Prometheus Operator. This post builds on that and shows how to set up notifications via email and as push notifications with the Alert Manager.

Install the Monitoring Stack with Helm

In addition to the method shown by Sebastian for deploying the Prometheus Operator, there is another variant that can be used to set up a complete Prometheus stack in the Kubernetes cluster. This is done with the help of the package manager Helm. The sources of the Helm chart for the Prometheus stack can be found on GitHub. There you will also find a file containing all configuration options and default values of the monitoring stack. Besides the Prometheus Operator and Grafana, the Prometheus Node Exporter, Kube State Metrics and the Alert manager are also part of the stack.

New at this point are Kube State Metrics and the alert manager. Kube State Metrics connects to the Kubernetes API and can therefore retrieve metrics about all resources in the cluster. The alert manager can alert based on a set of rules for selected metrics. In the Helm setup, each component can be configured via the options in the values file. It makes sense to use a PVC for both Prometheus and the alert manager, so that the metrics as well as the alert history and the alert silences are persisted.

In my case, a Persistent Volume Claim (PVC) is created for both with the following values. The name of the storage class can of course vary.

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: nws-storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: nws-storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

I save this configuration as a file called stack.values in the directory prom-config. Now you can deploy the stack.
First you have to add the appropriate repos:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add stable https://kubernetes-charts.storage.googleapis.com/
helm repo update

The stack is then created:

helm install nws-prometheus-stack prometheus-community/kube-prometheus-stack -f prom-config/stack.values

Using port forwarding, you can open the Grafana web interface in your browser and log in with the credentials admin/prom-operator.

kubectl port-forward service/nws-prometheus-stack-grafana 3002:80

Then open http://localhost:3002/ in the browser.

Some predefined dashboards are already included. However, in order to display metrics for all dashboards (scheduler, Etcd, etc.), further options have to be set in the values of Prometheus. In this post, however, I will not go into how these can be configured. Instead, as I mentioned at the beginning, I would like to demonstrate how to set up alarms. This is where the alert manager comes into play. In Grafana, however, you look for it in vain. A look at the Prometheus Web UI makes more sense. So first set up a port forward for it again:

kubectl port-forward service/nws-prometheus-stack-kube-prometheus 3003:9090

Then open http://localhost:3003/ in the browser.

Under Alerts you can see all the alarm rules that are already pre-configured in the stack. By clicking on any rule, you can see its definition. However, nothing can be configured here either. Using the buttons “Inactive”, “Pending” and “Firing”, the rules can be shown or hidden with their respective status.

We go to the next web interface – Alert manager. Here, too, we need a port forward.

kubectl port-forward service/nws-prometheus-stack-kube-alertmanager 3004:9093

The web interface (accessible at http://localhost:3004/) is similar to that of Prometheus. Under Alerts, however, you have a history of alerts that have already been triggered. Silences shows an overview of the suppressed alerts and Status contains version information and the operating status as well as the current config of the alert manager. Unfortunately, this config cannot be changed here either.

So how do we configure the alert manager?

Configuring the Alert Manager

Some will have guessed it already: by adjusting the values of the Helm Chart. So we write a new values file that contains only the alert manager config. The documentation shows us which options are available and how they can be set. We start with a basic example and first add global SMTP settings:

alertmanager-v1.values

alertmanager:
  config:
    global:
      resolve_timeout: 5m
      smtp_from: k8s-alertmanager@example.com
      smtp_smarthost: mail.example.com:587
      smtp_require_tls: true
      smtp_auth_username: k8s-alertmanager@example.com
      smtp_auth_password: xxxxxxxxx
    route:
      receiver: 'k8s-admin'
      routes: []
    receivers:
    - name: 'k8s-admin'
      email_configs:
      - to: k8s-admin@example.com

This enables the alert manager to send out e-mails. Of course, it is also necessary to define at least one recipient. Under “receivers” you can simply enter the name and e-mail address as in this example. Please make sure that the text indentation is correct, otherwise there may be start-up problems with the alert manager when rolling out the config.
To ensure that the alerts are actually delivered, the routes must be defined under "route". If you first want to send all triggered alerts to one contact, you can do this as in my example. We will take a closer look at what else can be done with routes in a later example. With a Helm upgrade we can deploy our new configuration for the alert manager. We keep all the values already set by means of "--reuse-values":

helm upgrade --reuse-values -f prom-config/alertmanager-v1.values nws-prometheus-stack prometheus-community/kube-prometheus-stack

And how do we test this now?

Testing Alerts

If you don't want to take down a node straight away, you can simply restart the alert manager. When it starts, the "Watchdog" alert rule is triggered. This rule exists only to test the proper functioning of the alerting pipeline. This is how you can carry out the restart:

kubectl rollout restart statefulset.apps/alertmanager-nws-prometheus-stack-kube-alertmanager

Shortly after the restart, the e-mail should arrive. If not, first check whether the alert manager is still starting. If it hangs in a crash loop, you can find out what is going wrong by looking at the pod’s logs. If a config option is set incorrectly, you can adjust it again in the values and deploy again with a Helm upgrade. If you have mistyped an option key, you can roll back to a previous status using the Helm Rollback:

helm rollback nws-prometheus-stack 1
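To look at the pod logs mentioned above, something like the following works. A sketch only: the pod name is derived from the StatefulSet used in the restart command, and the container name alertmanager is the usual one in operator-managed setups, so check both against your environment:

kubectl get pods | grep alertmanager
kubectl logs alertmanager-nws-prometheus-stack-kube-alertmanager-0 -c alertmanager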

This gives us a rudimentary monitoring system for the Kubernetes cluster itself. Of course, you can add more receivers and therefore also notify several contacts.

Add your own Metrics and Alert Rules

Now we’ll take a look at what the routes can be useful for. However, we will first boot up a few pods for testing and create different namespaces. We will also take a quick look at the Prometheus Blackbox Exporter.

Scenario: In a K8s cluster, different environments are operated via namespaces – e.g. development and production. If the sample app of the development namespace fails, the on-call team should not be alerted. The on-call team is only responsible for failures of the sample app of the production namespace and receives notifications via email and mobile phone. The developers are informed separately by e-mail about problems with the sample app from the development namespace.

We define the two namespaces “nws-production” and “nws-development” in namespaces-nws.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: nws-development
  labels:
    name: nws-development
---
apiVersion: v1
kind: Namespace
metadata:
  name: nws-production
  labels:
    name: nws-production

kubectl apply -f ./prom-config/namespaces-nws.yaml

Now we start two sample apps that alternately return HTTP 200 and HTTP 500 in a 60 second interval. I am using a simple image that I created for this purpose (sources on GitHub).

sample-app.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nws-sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nws-sample-app
  template:
    metadata:
      labels:
        app: nws-sample-app
    spec:
      containers:
      - name: nws-sample-app
        image: gagaha/alternating-http-response
        ports:
        - containerPort: 80

kubectl apply -f prom-config/sample-app.yaml -n nws-production
kubectl apply -f prom-config/sample-app.yaml -n nws-development

We then expose the app in the cluster:

kubectl expose deployment nws-sample-app -n nws-production
kubectl expose deployment nws-sample-app -n nws-development

Now, however, we need a component that can query the availability of these apps via HTTP request and provide them as metrics for Prometheus. The Prometheus Blackbox Exporter is a good choice for this. It can be used to check not only HTTP/HTTPS requests but also connections with the DNS, TCP and ICMP protocols.

First we have to deploy the Blackbox Exporter in the cluster. Again, I use the official Helm Chart.

helm install nws-blackbox-exporter prometheus-community/prometheus-blackbox-exporter

Now we have to tell Prometheus how to scrape the Blackbox Exporter. The targets are the interesting part here: these are the HTTP endpoints that are to be queried by the Blackbox Exporter. We write the additional configuration into a file and deploy it via a Helm upgrade:
prom-blackbox-scrape.yaml

prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
     - job_name: 'nws-blackbox-exporter'
       metrics_path: /probe
       params:
         module: [http_2xx]
       static_configs:
        - targets:
           - http://nws-sample-app.nws-production.svc
           - http://nws-sample-app.nws-development.svc
       relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: nws-blackbox-exporter-prometheus-blackbox-exporter:9115

helm upgrade --reuse-values -f prom-config/prom-blackbox-scrape.yaml nws-prometheus-stack prometheus-community/kube-prometheus-stack

If we then start the port forward for Prometheus again, we can see the two new targets of the nws-blackbox-exporter under http://localhost:3003/targets. With that, metrics are already coming in to Prometheus. However, we also have to define new alert rules so that alerts can be issued for these metrics.
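If you want to check a probe by hand before relying on the alert rules, you can also query the Blackbox Exporter directly. A sketch (the service name is taken from the scrape config above, and /probe with the module and target parameters is the exporter's standard endpoint; run the curl in a second terminal while the port forward is active):

kubectl port-forward service/nws-blackbox-exporter-prometheus-blackbox-exporter 9115:9115
curl 'http://localhost:9115/probe?module=http_2xx&target=http://nws-sample-app.nws-production.svc'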

We edit the rules directly via Kubectl:

kubectl edit prometheusrules nws-prometheus-stack-kube-k8s.rules

Before the k8s.rules we add our new rule:

...
spec:
  groups:
  - name: blackbox-exporter
    rules:
    - alert: HttpStatusCode
      annotations:
        description: |-
          HTTP status code is not 200-399
            VALUE = {{ $value }}
            LABELS: {{ $labels }}
        summary: HTTP Status Code (instance {{ $labels.instance }})
      expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
      for: 30s
      labels:
        severity: error
  - name: k8s.rules
    rules:
...
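Editing the generated PrometheusRule object works, but a later Helm upgrade of the stack may overwrite it again. A more upgrade-safe alternative is to put the rule into its own PrometheusRule object. A hedged sketch, assuming the chart's default rule selector (which picks up rules labelled with the Helm release name) and a hypothetical file name prom-config/blackbox-rules.yaml:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-exporter-rules
  labels:
    release: nws-prometheus-stack
spec:
  groups:
  - name: blackbox-exporter
    rules:
    - alert: HttpStatusCode
      annotations:
        summary: "HTTP Status Code (instance {{ $labels.instance }})"
      expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
      for: 30s
      labels:
        severity: error

kubectl apply -f prom-config/blackbox-rules.yaml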

Now we only have to define the contact details of the different recipients and routes.

Under route you can specify different recipients in routes. Of course, these recipients must also be defined under receivers. Match conditions can also be set for a route; the notification is then only sent to the specified recipient if the conditions apply.

Here is the config for the scenario:
alertmanager-v2.values

alertmanager:
  config:
    global:
      resolve_timeout: 5m
      smtp_from: k8s-alertmanager@example.com
      smtp_smarthost: mail.example.com:587
      smtp_require_tls: true
      smtp_auth_username: k8s-alertmanager@example.com
      smtp_auth_password: xxxxxxxxx
    route:
      receiver: 'k8s-admin'
      repeat_interval: 5m
      routes:
      - receiver: 'dev_mail'
        match:
          instance: http://nws-sample-app.nws-development.svc
      - receiver: 'bereitschaft'
        match:
          instance: http://nws-sample-app.nws-production.svc
    receivers:
    - name: 'k8s-admin'
      email_configs:
      - to: k8s-admin@example.com
    - name: 'dev_mail'
      email_configs:
      - to: devs@example.com
    - name: 'bereitschaft'
      email_configs:
      - to: bereitschaft@example.com
      pushover_configs:
      - user_key: xxxxxxxxxxxxxxxxxxxxxxxxxxx
        token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

helm upgrade --reuse-values -f prom-config/alertmanager-v2.values nws-prometheus-stack prometheus-community/kube-prometheus-stack

It is best to restart the alert manager afterwards:

kubectl rollout restart statefulset.apps/alertmanager-nws-prometheus-stack-kube-alertmanager

Provided the configuration is correct, alerts should start arriving soon.

Conclusion

Setting up the alert manager can be time-consuming. However, you also have many configuration options and can set up your notifications via rules as you need them. If you wish, you can also edit the message templates and therefore adapt the format and the information contained.

Logging with Loki and Grafana in Kubernetes

You already know the most important building blocks for starting your application from our tutorial series. Are you still missing metrics and logs for your applications? After this blog post, you can tick off the latter.

Logging with Loki and Grafana in Kubernetes – an Overview

One of the best-known, heavyweight solutions for collecting and managing your logs is also available for Kubernetes. This usually consists of Logstash or Fluentd for collecting, paired with Elasticsearch for storing and Kibana or Graylog for visualising your logs.

In addition to this classic combination, a new, more lightweight stack has been available for a few years now with Loki and Grafana! The basic architecture hardly differs from the familiar setups.

Promtail collects the logs of all containers on each Kubernetes node and sends them to a central Loki instance. This aggregates all logs and writes them to a storage back-end. Grafana is used for visualisation, which fetches the logs directly from the Loki instance.

The biggest difference from the familiar stacks is probably the lack of Elasticsearch. This saves resources and effort, since no triple-replicated full-text index has to be stored and administered. Especially when you start building up your application, a lean and simple stack sounds appealing. As the application landscape grows, individual Loki components can be scaled up to spread the load across multiple servers.

No full text index? How does it work?

Of course, Loki does not do without an index for quick searches, but only metadata (similar to Prometheus) is indexed. This greatly reduces the effort required to run the index. For your Kubernetes cluster, Labels are therefore mainly stored in the index and your logs are automatically organised using the same metadata as your applications in your Kubernetes cluster. Using a time window and the Labels, Loki quickly and easily finds the logs you are looking for.

To store the index, you can choose from various databases. Besides the two cloud databases BigTable and DynamoDB, Loki can also store its index in Cassandra or locally in BoltDB. The latter does not support replication and is mainly suitable for development environments. Loki offers another option, boltdb-shipper, which is currently still under development. It is primarily intended to remove the dependency on a replicated database by regularly storing snapshots of the index in the chunk storage (see below).

A quick example

A pod produces two log streams with stdout and stderr. These log streams are split into so-called chunks and compressed as soon as a certain size has been reached or a time window has expired.

A chunk therefore contains compressed logs of a stream and is limited to a maximum size and time unit. These compressed data records are then stored in the chunk storage.

Label vs. Stream

A combination of exactly the same labels (including their values) defines a stream. If you change a label or its value, a new stream is created. For example, the logs from stdout of an nginx pod are in a stream with the labels: pod-template-hash=bcf574bc8, app=nginx and stream=stdout.

In Loki’s index, these chunks are linked with the stream’s labels and a time window. A search in the index must therefore only be filtered by labels and time windows. If one of these links matches the search criteria, the chunk is loaded from the storage and the logs it contains are filtered according to the search query.

Chunk Storage

The compressed and fragmented log streams are stored in the chunk storage. As with the index, you can also choose between different storage back-ends. Due to the size of the chunks, an object store such as GCS, S3, Swift or our Ceph object store is recommended. Replication is automatically included and the chunks are automatically removed from the storage based on an expiry date. In smaller projects or development environments, you can of course also start with a local file system.

Visualisation with Grafana

Grafana is used for visualisation. Preconfigured dashboards can easily be imported. LogQL is used as the query language. This in-house creation of Grafana Labs leans heavily on PromQL from Prometheus and is just as quick to learn. A query consists of two parts:
First, you filter for the corresponding chunks using labels and the Log Stream Selector. With = you always make an exact comparison and =~ allows the use of regex. As usual, the selection is negated with !
After you have limited your search to certain chunks, you can expand it with a search expression. Here, too, you can use various operators such as |= and |~ to further restrict the result. A few examples are probably the quickest way to show the possibilities:

Log Stream Selector:

{app = "nginx"}
{app != "nginx"}
{app =~ "ngin.*"}
{app !~ "nginx$"}
{app = "nginx", stream != "stdout"}
Search Expression:


{app = "nginx"} |= "192.168.0.1"
{app = "nginx"} != "192.168.0.1"
{app = "nginx"} |~ "192.*" 
{app = "nginx"} !~ "192$"

Further possibilities such as aggregations are explained in detail in the official documentation of LogQL.
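If you want to try an aggregation yourself, LogQL's range functions work much like their PromQL counterparts. Two small examples (using the same app label as above):

count_over_time({app = "nginx"}[1h])
sum(rate({app = "nginx"} |= "error" [5m]))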

After this short introduction to the architecture and functionality of Grafana Loki, we will of course start right away with the installation. A lot more information and possibilities for Grafana Loki are of course available in the official documentation.

Get it running!

You would like to just try out Loki?

With the NWS Managed Kubernetes Cluster you can do without the details! With just one click you can start your Loki Stack and always have your Kubernetes Cluster in full view!

 

As usual with Kubernetes, a running example is deployed faster than reading the explanation. Using Helm and a few variables, your lightweight logging stack is quickly installed. First, we initialise two Helm repositories. Besides Grafana, we also add the official Helm stable charts repository. After two short helm repo add commands we have access to the required Loki and Grafana charts.

Install Helm

$ brew install helm
$ apt install helm
$ choco install kubernetes-helm

You don’t have the right sources? On helm.sh you will find a brief guide for your operating system.

 

$ helm repo add loki https://grafana.github.io/loki/charts
$ helm repo add stable https://kubernetes-charts.storage.googleapis.com/

 

Install Loki and Grafana

For your first Loki stack you do not need any further configuration. The default values fit very well and helm install does the rest. Before installing Grafana, we first set its configuration using the well-known Helm values files. Save it under the name grafana.values.

In addition to the administrator password, the freshly installed Loki is set as the data source. For visualisation, we import a dashboard and the required plugins. This way you install a Grafana that is already configured for Loki and can get started directly after the deployment.

grafana.values:

---
adminPassword: supersecret

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Loki
      type: loki
      url: http://loki-headless:3100
      jsonData:
        maxLines: 1000

plugins:
  - grafana-piechart-panel

dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: default
        orgId: 1
        folder:
        type: file
        disableDeletion: true
        editable: false
        options:
          path: /var/lib/grafana/dashboards/default

dashboards:
  default:
    Logging:
      gnetId: 12611
      revision: 1
      datasource: Loki

 

The actual installation is done with the help of helm install. The first parameter is a freely selectable name. With its help, you can also quickly get an overview:

$ helm install loki loki/loki-stack
$ helm install loki-grafana stable/grafana -f grafana.values
$ kubectl get all -n kube-system -l release=loki

 

After deployment, you can log in as admin with the password supersecret. To access the Grafana web interface directly, you still need a port forward:

$ kubectl --namespace kube-system port-forward service/loki-grafana 3001:80

The logs of your running pods should be immediately visible in Grafana. Try the queries under Explore and discover the dashboard!
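A first query to type into Explore could look like this. A sketch: the label names depend on your promtail configuration, and namespace is part of promtail's default Kubernetes setup:

{namespace="kube-system"} |= "error"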

 

Logging with Loki and Grafana in Kubernetes – the Conclusion

With Loki, Grafana Labs offers a new approach to central log management. The use of low-cost and easily available object stores makes the time-consuming administration of an Elasticsearch cluster superfluous. The simple and fast deployment is also ideal for development environments. While the two alternatives Kibana and Graylog offer a powerful feature set, for some administrators Loki with its streamlined and simple stack may be more enticing.

Manage Kubernetes Nodegroups

As of this week, our customers can use the “node group feature” for their NWS Managed Kubernetes Cluster plan. What are node groups and what can I do with them? Our seventh blog post in the series explains this and more.

What are Node Groups?

With node groups, it is possible to create several Kubernetes node groups and manage them independently of each other. A node group describes a number of virtual machines that share various attributes as a group. Essentially, it determines which flavour – i.e. which VM model – is to be used within this group. However, other attributes can also be selected. Each node group can be scaled independently of the others at any time.

Why Node Groups?

Node groups are suitable for running pods on specific nodes. For example, it is possible to define a group with the "availability zone" attribute. In addition to the already existing default node group, which distributes the nodes more or less arbitrarily across all availability zones, further node groups can be created, each of which is explicitly started in only one availability zone. Within the Kubernetes cluster, you can then assign your pods to the corresponding availability zones or node groups.

Show existing Node Groups


The first image shows our exemplary Kubernetes cluster “k8s-ses”. This currently has two nodegroups: “default-master” and “default-worker”.

 

Create a Node Group

A new nodegroup can be created via the ‘Create Nodegroup’ dialogue with the following options:

  • Name: Name of the nodegroup, which can later be used as a label for K8s
  • Flavor: Size of the virtual machines used
  • Node Count: Number of initial nodes, can be increased and decreased later at any time
  • Availability Zone: A specific availability zone
  • Minimum Node Count: The node group must not contain fewer nodes than the defined value
  • Maximum Node Count: The node group cannot grow to more than the specified number of nodes

The last two options are particularly relevant for autoscaling, as they limit the automatic mechanism.


You will then see the new node group in the overview. Provisioning the nodes takes only a few minutes. The node count of each group can be changed individually at any time, and node groups can also be removed again.

 

Using Node Groups in the Kubernetes Cluster

Within the Kubernetes cluster, you can see your new nodes after they have been provisioned and are ready for use.

 
kubectl get nodes -L magnum.openstack.org/role
NAME                                 STATUS   ROLES    AGE   VERSION   ROLE
k8s-ses-6osreqalftvz-master-0        Ready    master   23h   v1.18.2   master
k8s-ses-6osreqalftvz-node-0          Ready    <none>   23h   v1.18.2   worker
k8s-ses-6osreqalftvz-node-1          Ready    <none>   23h   v1.18.2   worker
k8s-ses-zone-a-vrzkdalqjcud-node-0   Ready    <none>   31s   v1.18.2   zone-a
k8s-ses-zone-a-vrzkdalqjcud-node-1   Ready    <none>   31s   v1.18.2   zone-a
k8s-ses-zone-a-vrzkdalqjcud-node-2   Ready    <none>   31s   v1.18.2   zone-a
k8s-ses-zone-a-vrzkdalqjcud-node-3   Ready    <none>   31s   v1.18.2   zone-a
k8s-ses-zone-a-vrzkdalqjcud-node-4   Ready    <none>   31s   v1.18.2   zone-a

The node labels magnum.openstack.org/nodegroup and magnum.openstack.org/role bear the name of the node group for nodes that belong to the group. There is also the label topology.kubernetes.io/zone, which carries the name of the Availability Zone.

Deployments or pods can be assigned to nodes or groups with the help of the nodeSelectors:

 
nodeSelector:
  magnum.openstack.org/role: zone-a
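If you would rather select nodes by availability zone instead of by role, the zone label mentioned above can be used in the same way. A small sketch (<availability-zone> is a placeholder for the name of your zone):

nodeSelector:
  topology.kubernetes.io/zone: <availability-zone>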

 

Would you like to see for yourself how easy a Managed Kubernetes plan is at NWS? Then try it out right now at: https://nws.netways.de/de/kubernetes/