Prometheus is a simple and effective open-source monitoring system. In the years since we published the article Monitoring Microservices with Prometheus, the system has graduated from the Cloud Native Computing Foundation (CNCF) and become the preferred monitoring tool for distributed systems. One of the reasons for this, as mentioned in our previous article, is its intuitive simplicity. It doesn’t try to do anything fancy. It provides a data store, data scrapers, an alerting mechanism and a very simple user interface.
Deploying Prometheus and the associated Alertmanager tool can be a complicated task, but there is tooling available to simplify and automate the process, such as the Prometheus Operator project.
In this blog post, we explain what operators are in general, how the Prometheus operator works and how to configure it to best use Prometheus and Alertmanager.
Operators
As stated in our article Kubernetes Operators Explained, operators are a kind of software extension to Kubernetes. They provide a consistent approach to handling all of an application’s operational processes automatically, without human intervention, which they achieve through close cooperation with the Kubernetes API.
Operators are built on two key principles of Kubernetes: Custom Resources (CRs), implemented here by way of Custom Resource Definitions (CRDs), and custom controllers. A CR is an extension of the Kubernetes API that provides a place where you can store and retrieve structured data—the desired state of your application. Custom controllers observe these CRs and use the information they hold to adjust the Kubernetes cluster to the desired state.
The Prometheus Operator
The main purpose of this operator is to simplify and automate the configuration and management of the Prometheus monitoring stack running on a Kubernetes cluster. Essentially it is a custom controller that monitors the new object types introduced through the following CRDs:
- Prometheus: defines a desired Prometheus deployment, which the operator runs as a StatefulSet
- Alertmanager: defines a desired Alertmanager deployment
- ServiceMonitor: declaratively specifies how groups of Kubernetes services should be monitored
- PodMonitor: declaratively specifies how groups of pods should be monitored
- Probe: declaratively specifies how groups of ingresses or static targets should be monitored
- PrometheusRule: defines a desired set of Prometheus alerting and/or recording rules
- AlertmanagerConfig: declaratively specifies subsections of the Alertmanager configuration
Why the Prometheus Operator
As mentioned above, using the operator can drastically reduce the effort needed to configure, implement, and manage all the components of a Prometheus monitoring stack. It also provides dynamic updates of resources, such as alerting and recording rules, with no downtime.
Using the introduced CRDs is relatively straightforward, and they offer a turnkey solution for adopting operational best practices for this stack. Furthermore, this approach makes it possible to run multiple instances, even with different versions of Prometheus, as sketched below.
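As a minimal sketch of that last point, two instances pinned to different releases could be declared side by side through the version field of the Prometheus CR. The names and version strings below are placeholders, and the referenced ServiceAccount is the one created later in this post:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-main          # placeholder name
spec:
  version: v2.26.0               # placeholder release
  serviceAccountName: prometheus # ServiceAccount created in the RBAC section below
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-canary        # placeholder name for a second instance
spec:
  version: v2.25.2               # a different placeholder release
  serviceAccountName: prometheus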
Using the Prometheus Operator
Prerequisites
To follow the examples shown in this post, it is necessary to meet the following requirements:
- Kubernetes cluster: for testing purposes, we recommend using Kind to run a local cluster using Docker containers; Minikube can be used as an alternative (see the example commands after this list)
- kubectl command-line tool: installed and configured to connect to the cluster
- A web application exposing Prometheus metrics: we’re using the microservices-demo, which simulates the user-facing part of an e-commerce website and exposes a /metrics endpoint for each service. Follow its documentation to deploy it on the cluster
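For reference, a minimal local setup could look roughly like the following. The Kind cluster name is arbitrary, and the manifest URL for the sock shop demo is an assumption based on the project’s repository layout, so check its documentation if the path has moved:
kind create cluster --name prometheus-demo
kubectl create namespace sock-shop
kubectl apply -f https://raw.githubusercontent.com/microservices-demo/microservices-demo/master/deploy/kubernetes/complete-demo.yaml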
Deploy the Operator
We start by deploying the Prometheus Operator into the cluster. We have to create all the CRDs that define the Prometheus, Alertmanager, and ServiceMonitor abstractions used to configure the monitoring stack—as well as the Prometheus Operator controller and Service.
This can be done using the bundle.yaml file from the Prometheus Operator GitHub repository:
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/master/bundle.yaml
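The CRDs shipped in this bundle are fairly large, and on some cluster versions a plain kubectl apply can fail with an error about the metadata.annotations size limit. If that happens, server-side apply (or kubectl create) is a common workaround:
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/master/bundle.yaml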
First we verify that all the CRDs were created:
kubectl get crds
The output should be similar to this:
NAME CREATED AT
alertmanagerconfigs.monitoring.coreos.com 2021-04-20T19:34:44Z
alertmanagers.monitoring.coreos.com 2021-04-20T19:34:57Z
podmonitors.monitoring.coreos.com 2021-04-20T19:35:00Z
probes.monitoring.coreos.com 2021-04-20T19:35:01Z
prometheuses.monitoring.coreos.com 2021-04-20T19:35:06Z
prometheusrules.monitoring.coreos.com 2021-04-20T19:35:11Z
servicemonitors.monitoring.coreos.com 2021-04-20T19:35:12Z
thanosrulers.monitoring.coreos.com 2021-04-20T19:35:14Z
Then we check that the operator Deployment was created in the current namespace (default) and that its pod is in the Running state:
kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
prometheus-operator 1/1 1 1 5m59s
kubectl get pods
NAME READY STATUS RESTARTS AGE
prometheus-operator-5b5887c64b-w7gqj 1/1 Running 0 6m1s
Finally, we confirm that the operator service has also been created:
kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus-operator ClusterIP None <none> 8080/TCP 8m25s
RBAC Permissions
The Prometheus server needs access to the Kubernetes API to scrape targets and reach the Alertmanager clusters. Therefore, a ServiceAccount is required to provide access to those resources; it must be created and bound to a ClusterRole accordingly:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
Add the above to a manifest file rbac.yaml, then apply:
kubectl apply -f rbac.yaml
Check that the role was created and bound to the ServiceAccount:
kubectl describe clusterrolebinding prometheus
Name: prometheus
Labels: <none>
Annotations: <none>
Role:
Kind: ClusterRole
Name: prometheus
Subjects:
Kind Name Namespace
---- ---- ---------
ServiceAccount prometheus default
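As an additional sanity check, the granted permissions can be tested by impersonating the ServiceAccount with kubectl auth can-i; both commands below should answer yes:
kubectl auth can-i list pods --as=system:serviceaccount:default:prometheus
kubectl auth can-i get configmaps --as=system:serviceaccount:default:prometheus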
Deploy and Configure
Prometheus
After creating the Prometheus ServiceAccount and giving it access to the Kubernetes API, we can deploy the Prometheus instance.
Create a file prometheus.yaml with this content:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi
This manifest defines the serviceMonitorNamespaceSelector, serviceMonitorSelector and podMonitorSelector fields to specify which CRs to include. In this example, the {} value is used to match all the existing CRs. If, for instance, we wanted to match only the ServiceMonitors in the sock-shop namespace, we could use the following matchLabels value:
serviceMonitorNamespaceSelector:
  matchLabels:
    name: sock-shop
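Note that serviceMonitorNamespaceSelector matches labels on the Namespace object itself, so the above only works if the sock-shop namespace actually carries a name=sock-shop label. If it doesn’t, the label can be added with:
kubectl label namespace sock-shop name=sock-shop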
Apply the file:
kubectl apply -f prometheus.yaml
Check that the instance is in the Running state:
kubectl get prometheus
NAME VERSION REPLICAS AGE
prometheus 10s
kubectl get pods
NAME READY STATUS RESTARTS AGE
prometheus-prometheus-0 2/2 Running 5 10s
A prometheus-operated service should also have been created:
kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus-operated ClusterIP None <none> 9090/TCP 17s
Access the server by forwarding a local port to the service:
kubectl port-forward svc/prometheus-operated 9090:9090
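The web UI should then be reachable at http://localhost:9090. While the port-forward is running, a quick request against the built-in health endpoint confirms the server is up:
curl -s http://localhost:9090/-/healthy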
ServiceMonitor
The operator uses ServiceMonitors to define the sets of targets to be monitored by Prometheus. A ServiceMonitor uses label selectors to define which Services to monitor, the namespaces to look for them in, and the port on which the metrics are exposed.
Create a file service-monitor.yaml with the following content to add a ServiceMonitor so that the Prometheus server scrapes only its own metrics endpoints:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus
  labels:
    name: prometheus
spec:
  selector:
    matchLabels:
      operated-prometheus: "true"
  namespaceSelector:
    any: true
  endpoints:
  - port: web
The ServiceMonitor only matches Services carrying the operated-prometheus: "true" label, which is added automatically to all Prometheus instances, and scrapes the port named web on all the underlying endpoints. As the namespaceSelector is set to any: true, Services matching the selected labels are included regardless of their namespace.
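Apply the manifest:
kubectl apply -f service-monitor.yaml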
After applying the manifest, the Prometheus endpoints that were picked up as scrape targets should be shown on the Prometheus UI targets page.
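If you prefer the command line over the UI, the same information is exposed through the HTTP API while the port-forward is active; the exact output shape may differ between Prometheus versions:
curl -s http://localhost:9090/api/v1/targets | grep scrapeUrl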
PodMonitor
There could be use cases that require scraping Pods directly, without a direct association with Services (for instance, scraping sidecars). The operator also includes a PodMonitor CR, which is used to declaratively specify groups of pods that should be monitored.
As an example, we’re using the front-end app from the microservices-demo project, which, as we mentioned before, simulates the user-facing part of an e-commerce website and exposes a /metrics endpoint.
Define a PodMonitor in a manifest file podmonitor.yaml to select only this deployment’s pods from the sock-shop namespace. Even though the pods could be selected using a ServiceMonitor, we’ve used a targetPort field instead, because the pod exposes metrics on port 8079 and doesn’t define a port name:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: front-end
  labels:
    name: front-end
spec:
  namespaceSelector:
    matchNames:
    - sock-shop
  selector:
    matchLabels:
      name: front-end
  podMetricsEndpoints:
  - targetPort: 8079
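Apply the manifest:
kubectl apply -f podmonitor.yaml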
The front-end endpoint should now have been added as a Prometheus target.
Additional Scrape Configuration
It’s possible to append additional scrape configurations to the Prometheus instance via a Secret. These configurations must follow the Prometheus configuration format, and the user is responsible for making sure that they are valid.
You can see an example below, where an additional job is added to Prometheus to scrape the catalogue service endpoints. First, generate the prometheus-additional-job.yaml file that declares the job to scrape the catalogue service:
- job_name: "catalogue"
  static_configs:
    - targets: ["catalogue.sock-shop"]
Then create the additional-scrape-configs.yaml Secret manifest from the prometheus-additional-job.yaml content:
kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional-job.yaml --dry-run=client -oyaml > additional-scrape-configs.yaml
Check the additional-scrape-configs.yaml content and apply it:
apiVersion: v1
data:
  prometheus-additional-job.yaml: LSBqb2JfbmFtZTogImNhdGFsb2d1ZSIKICBzdGF0aWNfY29uZmlnczoKICAgIC0gdGFyZ2V0czogWyJjYXRhbG9ndWUuc29jay1zaG9wIl0K
kind: Secret
metadata:
  creationTimestamp: null
  name: additional-scrape-configs
kubectl apply -f additional-scrape-configs.yaml
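To double-check that the Secret really carries the intended scrape job, the stored value can be decoded again (note the escaped dot in the jsonpath key):
kubectl get secret additional-scrape-configs -o jsonpath='{.data.prometheus-additional-job\.yaml}' | base64 -d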
Finally, the Prometheus instance must be edited to reference the additional configuration using the additionalScrapeConfigs field:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional-job.yaml
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
kubectl apply -f prometheus.yaml
Now the catalogue endpoint should also be listed as a Prometheus target.
Alertmanager
The Prometheus Operator also introduces an Alertmanager resource, which allows users to declaratively describe an Alertmanager cluster. It also adds an AlertmanagerConfig CR, which allows users to declaratively describe Alertmanager configurations.
First, create an alertmanager-config.yaml file to define an AlertmanagerConfig resource that sends notifications to a non-existent wechat receiver, together with its corresponding Secret:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: config-alertmanager
  labels:
    alertmanagerConfig: socks-shop
spec:
  route:
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'wechat-socks-shop'
  receivers:
  - name: 'wechat-socks-shop'
    wechatConfigs:
    - apiURL: 'http://wechatserver:8080/'
      corpID: 'wechat-corpid'
      apiSecret:
        name: 'wechat-config'
        key: 'apiSecret'
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: wechat-config
data:
  apiSecret: cGFzc3dvcmQK
Apply the above manifest:
kubectl apply -f alertmanager-config.yaml
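Verify that both objects were created:
kubectl get alertmanagerconfigs
kubectl get secret wechat-config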
Then create the alertmanager.yaml file to define the Alertmanager cluster:
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: socks-shop
spec:
  replicas: 1
  alertmanagerConfigSelector:
    matchLabels:
      alertmanagerConfig: socks-shop
The alertmanagerConfigSelector field is used to select the correct AlertmanagerConfig; its matchLabels must match the labels set on the AlertmanagerConfig above.
Apply and check that it was created:
kubectl apply -f alertmanager.yaml
alertmanager.monitoring.coreos.com/socks-shop created
An alertmanager-operated service should have been created automatically; use it to access the Alertmanager web UI:
kubectl port-forward svc/alertmanager-operated 9093:9093
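The Alertmanager UI should then be available at http://localhost:9093. While the port-forward is running, the v2 API status endpoint offers a quick way to confirm that the configuration was loaded:
curl -s http://localhost:9093/api/v2/status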
Now we have a fully functional Alertmanager cluster, but without any alerts firing against it. Alert rules can be added using the PrometheusRule custom resource to define the rules to be evaluated.
PrometheusRules
The PrometheusRule CR supports defining one or more RuleGroups. These groups consist of a set of rule objects that can represent either of the two rule types supported by Prometheus: recording or alerting rules.
As an example, create the prometheus-rule.yaml file with the following PrometheusRule that will always trigger an alert:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: socks-shop
    role: alert-rules
  name: prometheus-example-rules
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)
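Apply the rule:
kubectl apply -f prometheus-rule.yaml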
Now that the Alertmanager cluster is running and an alert rule has been created, we need to connect them to Prometheus. To do this, edit the Prometheus instance to specify the Alertmanager cluster to use and to select the alert rules to be mounted into it:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional-job.yaml
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-operated
      port: web
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: socks-shop
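Apply the updated manifest:
kubectl apply -f prometheus.yaml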
We should be able to see the alert firing in the Alertmanager UI and also in the alerts section of the Prometheus instance.
Final thoughts
As we mentioned at the beginning of this post, and saw in the examples above, using the Prometheus Operator helps reduce the overhead of managing the Prometheus and Alertmanager components in a fast, automated, and reliable way.
Despite all the advantages an operator provides, we have to remember that we are adding another layer of abstraction. With this comes a small increase in complexity, which can lead to unnoticed misconfigurations that could be harder to debug than a traditional static configuration.
A final thought: although the operator may be seen as introducing an extra layer of complexity, in our opinion this is a justified trade-off, and the benefits far outweigh the potential downsides.