We rely on containers and Kubernetes to operate the NETWAYS Cloud and our other services. Among other things, this approach allows us to cleanly separate different microservices on the same hypervisor and operate multi-tenant SaaS solutions.
However, this architecture also creates new attack surfaces, such as container breakouts.
Software such as Falco, a graduated CNCF project, can help here. Falco monitors data sources such as the kernel or Kubernetes audit logs in real time and raises an alert when something suspicious happens.
Falco: The motion detector on our systems
Falco runs in our data center as a Docker container on every hypervisor and captures predefined events at kernel level. Falco's architecture makes this possible. Depending on the target system, it uses either a kernel module or a so-called eBPF probe. The module or probe is loaded into the kernel and passes the captured events to the Falco agent in user space.
When an event matches one of the shipped or custom rules, Falco writes an alert to stdout in its container. Fluent Bit collects it from there and writes it to our OpenSearch backend for auditing purposes.
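The following is a minimal sketch of the relevant part of falco.yaml for this kind of setup. It assumes a recent Falco release. To our knowledge, the engine section was introduced in Falco 0.38, and older versions select the driver differently. Enabling JSON output is also our assumption, not something stated above: it lets Fluent Bit parse each alert as a structured record instead of a plain text line.

```yaml
# Excerpt from falco.yaml (sketch, not a complete configuration)

# How Falco captures syscalls:
#   kmod        = kernel module
#   ebpf        = classic eBPF probe
#   modern_ebpf = CO-RE eBPF probe bundled with the Falco binary
engine:
  kind: modern_ebpf

# Emit one JSON object per alert, so a log shipper such as
# Fluent Bit can parse the individual fields.
json_output: true

# Write alerts to stdout, where the container runtime's log
# collection (and therefore Fluent Bit) picks them up.
stdout_output:
  enabled: true
```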
But which events are recorded by Falco and lead to an alarm? And how do you formulate further rules? The software provides its own configuration format for this.
Falco Rules and Condition Syntax
Falco rules are defined in one or more YAML files. These files can contain macros, rules, overrides and exceptions. All conditions are expressed in the project's own condition syntax.
The official documentation provides an example rule:

```yaml
- rule: shell_in_container
  desc: notice shell activity within a container
  condition: >
    evt.type = execve and
    evt.dir = < and
    container.id != host and
    (proc.name = bash or
    proc.name = ksh)
  output: >
    shell in a container |
    user=%user.name container_id=%container.id container_name=%container.name
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline
  priority: WARNING
```

The example contains the mandatory fields rule, desc, condition, output and priority. Every Falco rule consists of these fields, plus optional ones. Together they define which events the rule monitors and how it reports them.
Depending on the type of event, Falco exposes various fields with information about it, e.g. proc.name, container.id or evt.type. These fields can also be used in the output, so each alert carries the context needed for auditing and notification.
The kernel module or eBPF probe captures the raw syscall events, and the Falco agent enriches them with context such as process and container metadata. Both approaches allow the agent to ‘watch’ what other processes on the system are doing at syscall level and compare these observations with the defined rules.
In plain terms, the condition in the example above requires the following:
- evt.type = execve: the observed syscall is execve, i.e. a program is being executed
- evt.dir = <: the event is the syscall's exit event, so the call has already returned
- container.id != host: the process runs inside a container, not directly on the host
- (proc.name = bash or proc.name = ksh): the executed program is bash or ksh
As the example shows, these comparisons can be combined with and, or and not, and grouped with parentheses. This makes it possible to write arbitrarily complex rules for different scenarios, for example to monitor our cloud platform for anomalies at the hypervisor level.
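Conditions that recur across rules are usually factored out into macros and lists. The sketch below restructures the same rule this way. The names my_spawned_process, my_container and my_shell_binaries are hypothetical. We chose them so they do not clash with the similar macros in Falco's default ruleset.

```yaml
# Hypothetical list of shell binaries to watch for
- list: my_shell_binaries
  items: [bash, ksh]

# Hypothetical macro: exit event of an execve syscall (the call has returned)
- macro: my_spawned_process
  condition: evt.type = execve and evt.dir = <

# Hypothetical macro: the event originates from a container, not the host
- macro: my_container
  condition: container.id != host

# Same logic as the original rule, expressed with the macros and list above
- rule: shell_in_container
  desc: notice shell activity within a container
  condition: my_spawned_process and my_container and proc.name in (my_shell_binaries)
  output: >
    shell in a container |
    user=%user.name container_id=%container.id container_name=%container.name
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline
  priority: WARNING
```

With this structure, adding another shell such as zsh only requires extending the list. None of the rules that use it need to change.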
Falco ships with an extensive set of useful rules for containers, Kubernetes and Linux servers in general.
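Shipped rules, like any rule loaded earlier, can be adapted in a separate rules file that is loaded afterwards, so the originals never need to be edited. The sketch below assumes Falco 0.36 or later, which to our knowledge introduced the override key. It also assumes a hypothetical case in which shells started by cron inside containers are expected. The override appends an extra clause to the condition of the shell_in_container rule from above.

```yaml
# Custom rules file, loaded after the file that defines shell_in_container
- rule: shell_in_container
  # Text appended to the end of the existing condition
  condition: and not proc.pname = cron
  override:
    condition: append
```

Because the original condition groups its or clauses in parentheses, the appended and not ... clause applies to the whole rule as intended.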
Action and reaction: Sidekick and Talon
For our main reason for using Falco – auditability of events in our infrastructure – the setup described above is already sufficient. Falco provides most of the rules during installation, we add or adapt the rest, and during operation Fluent Bit writes the observed events to our OpenSearch.
But what if you want to react differently or more immediately to events? The Falco project provides accompanying software for this purpose:
- Falco Sidekick: a notification daemon for Falco that can forward events to more than 50 supported services, from Slack to SMS.
- Falco Talon: a response engine for Falco that can receive events either directly from Falco or via Sidekick.
Sidekick is particularly useful if, unlike in our setup, events need to reach several destinations. For example, it could do all of the following at once: forward events to a storage backend such as OpenSearch or Loki, post a message to a Slack or MS Teams channel via webhook, and send an SMS to the colleague on call.
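The sketch below shows such a fan-out. Falco posts every alert to Sidekick via its HTTP output, and Sidekick then forwards it to each configured destination. We assume Falcosidekick is reachable under the hostname falcosidekick on its default port 2801. The hostname, the Slack webhook URL and the Loki address are placeholders. The Sidekick keys follow its configuration reference as we understand it and should be checked against the deployed version.

```yaml
# falco.yaml (excerpt): send every alert to Falcosidekick over HTTP
json_output: true          # Sidekick expects JSON payloads
http_output:
  enabled: true
  url: "http://falcosidekick:2801/"

---
# Falcosidekick config.yaml (excerpt); all endpoints are placeholders
slack:
  webhookurl: "https://hooks.slack.com/services/XXX/YYY/ZZZ"
  minimumpriority: "warning"     # only warning and above reach the chat
loki:
  hostport: "http://loki:3100"   # no minimum priority: all events are stored
```

Setting a minimum priority per destination keeps chat channels quiet, while the storage backend still receives every event for auditing.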
Talon, on the other hand, enables a Falco setup to actively react to observed events. The tool is still in an early stage of development and some features are certainly still missing, but it is already able to terminate Pods in Kubernetes environments, create NetworkPolicies or trigger serverless functions in public cloud environments, for example.
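Talon decides how to react based on its own rules file. The minimal sketch below is based on our reading of the Talon documentation at the time of writing. Given how young the project is, the format should be treated as an assumption. The file defines an action that uses the kubernetes:terminate actionner and a Talon rule that triggers it whenever the Falco rule shell_in_container from above fires. The names "Terminate Pod" and "Kill pods with interactive shells" are hypothetical.

```yaml
# Talon rules file (sketch); the format may change as Talon evolves

# Hypothetical action: terminate the pod the event originated from
- action: Terminate Pod
  actionner: kubernetes:terminate

# Hypothetical Talon rule: react to alerts from the Falco rule shell_in_container
- rule: Kill pods with interactive shells
  match:
    rules:
      - shell_in_container
  actions:
    - action: Terminate Pod
```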
“Just get started” with Falco
In contrast to other solutions in the area of real-time detection of anomalies or security risks, such as Tetragon, Falco offers an advantage that should not be underestimated: you can simply get started:
Thanks to packages for various Linux distributions, container images and a Helm chart, the security tool is easy to install even in heterogeneous environments. The supplied rule catalog covers a large number of supported events, helps with the first steps and can be adapted, extended or overridden as required. And thanks to a small ecosystem with tools such as Sidekick or Talon, you don’t have to reinvent the wheel to process events according to your requirements or even react to them automatically.
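The following values file for the official Falco Helm chart illustrates how small a first step can be. It selects the modern eBPF driver and deploys Sidekick alongside Falco. The key names reflect the chart as we know it. They should be verified with helm show values falcosecurity/falco for the chart version in use.

```yaml
# values.yaml for the Falco Helm chart (sketch)
driver:
  kind: modern_ebpf   # alternatives include kmod and ebpf

# Also deploy Falcosidekick as part of the same release
falcosidekick:
  enabled: true
```

After adding the falcosecurity chart repository, something like helm install falco falcosecurity/falco -f values.yaml installs Falco with these settings.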
Of course, this does not mean that Falco is the only or most obvious solution for all conceivable scenarios. We opted for Falco to monitor our infrastructure so that we could get started quickly with a relatively small team and without much specific knowledge of the Linux kernel and syscalls.
However, if you have more time, know-how and resources available, Tetragon, which is also based on eBPF and is more finely configurable, may also be worth a look. This gives you more powerful options for reactions and proactive action, at the expense of generally applicable rules and simple configuration.
But that’s a topic for a future blog post.