Using the Prometheus Operator Elegantly

Preface

To make Prometheus easier to use, CoreOS provides an operator for it: the Prometheus Operator. And to give everyone a one-stop monitoring solution there is the kube-prometheus project, a scaffolding project written mainly in jsonnet. What it does is essentially "template + parameters, render a set of YAML files", and its goal is to provide an out-of-the-box monitoring stack for Kubernetes clusters and the applications running on them.
The project bundles the following components:

  • Prometheus Operator
  • Highly available Prometheus
  • Highly available Alertmanager
  • Prometheus node-exporter
  • Prometheus Adapter for Kubernetes Metrics APIs
  • Kube-state-metrics
  • Grafana

Out of the box indeed: all we need to do is clone the repository and kubectl apply ./manifests. The manifests directory holds pre-generated YAML files, which is inconvenient in several ways, for example:

  • The image repositories are hosted on k8s.gcr.io and quay.io, both of which are painful to pull from inside China
  • Prometheus data is not persisted
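
For reference, a minimal sketch of the out-of-the-box install (assuming kubectl already points at your cluster; the URL is the upstream project referenced below):

    # clone the upstream project and apply the pre-generated manifests as-is
    git clone https://github.com/coreos/kube-prometheus.git
    cd kube-prometheus
    kubectl apply -f ./manifests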

Install the build tools

Now let's customize it into what we want. The project renders its templates with jsonnet, so we first need to install jsonnet.

  • macOS

    brew install jsonnet
  • We also need jb, which is very easy to install: a go get is enough.
    You will usually want to set up your proxy first; mine is:

    export http_proxy=http://127.0.0.1:1087
    export https_proxy=http://127.0.0.1:1087
  • Install jb

    go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb
  • To turn the JSON output into YAML files we also need gojsontoyaml, so install it too

    go get github.com/brancz/gojsontoyaml
  • With the preparation done we can initialize the project. Create the project root directory

    mkdir my-kube-prometheus; cd my-kube-prometheus
  • Initialize jb. Anyone who has worked on a Node or Maven project knows they have a dependency manifest file; jb's is called jsonnetfile.json, and jb init creates it.

    jb init
  • Once initialized we can add the kube-prometheus dependency. Depending on your network speed this takes a while, so please wait for it to finish.

    jb install github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@master
  • After a successful install the dependencies are vendored under the vendor directory.

Replace the images with your own private registry

  • Most of the default image addresses are hosted on k8s.gcr.io and quay.io, both of which are painful to pull from inside China, so we replace the default registries here.

sync-to-internal-registry.jsonnet

local kp = import 'kube-prometheus/kube-prometheus.libsonnet';
local l = import 'kube-prometheus/lib/lib.libsonnet';
local config = kp._config;

local makeImages(config) = [
  {
    name: config.imageRepos[image],
    tag: config.versions[image],
  }
  for image in std.objectFields(config.imageRepos)
];

local upstreamImage(image) = '%s:%s' % [image.name, image.tag];
local downstreamImage(registry, image) = '%s/%s:%s' % [registry, l.imageName(image.name), image.tag];

local pullPush(image, newRegistry) = [
  'docker pull %s' % upstreamImage(image),
  'docker tag %s %s' % [upstreamImage(image), downstreamImage(newRegistry, image)],
  'docker push %s' % downstreamImage(newRegistry, image),
];

local images = makeImages(config);

local output(repository) = std.flattenArrays([
  pullPush(image, repository)
  for image in images
]);

function(repository='my-registry.com/repository')
  std.join('\n', output(repository))
  • Generate the image sync script; just fill in repository with your own registry address.
$ jsonnet -J vendor -S --tla-str repository=freemanliu ./sync-to-internal-registry.jsonnet

docker pull k8s.gcr.io/addon-resizer:1.8.4
docker tag k8s.gcr.io/addon-resizer:1.8.4 freemanliu/addon-resizer:1.8.4
docker push freemanliu/addon-resizer:1.8.4
docker pull quay.io/prometheus/alertmanager:v0.18.0
docker tag quay.io/prometheus/alertmanager:v0.18.0 freemanliu/alertmanager:v0.18.0
docker push freemanliu/alertmanager:v0.18.0
docker pull quay.io/coreos/configmap-reload:v0.0.1
docker tag quay.io/coreos/configmap-reload:v0.0.1 freemanliu/configmap-reload:v0.0.1
docker push freemanliu/configmap-reload:v0.0.1
docker pull grafana/grafana:6.2.2
docker tag grafana/grafana:6.2.2 freemanliu/grafana:6.2.2
docker push freemanliu/grafana:6.2.2
docker pull quay.io/coreos/kube-rbac-proxy:v0.4.1
docker tag quay.io/coreos/kube-rbac-proxy:v0.4.1 freemanliu/kube-rbac-proxy:v0.4.1
docker push freemanliu/kube-rbac-proxy:v0.4.1
docker pull quay.io/coreos/kube-state-metrics:v1.7.2
docker tag quay.io/coreos/kube-state-metrics:v1.7.2 freemanliu/kube-state-metrics:v1.7.2
docker push freemanliu/kube-state-metrics:v1.7.2
docker pull quay.io/prometheus/node-exporter:v0.18.1
docker tag quay.io/prometheus/node-exporter:v0.18.1 freemanliu/node-exporter:v0.18.1
docker push freemanliu/node-exporter:v0.18.1
docker pull quay.io/prometheus/prometheus:v2.11.0
docker tag quay.io/prometheus/prometheus:v2.11.0 freemanliu/prometheus:v2.11.0
docker push freemanliu/prometheus:v2.11.0
docker pull quay.io/coreos/k8s-prometheus-adapter-amd64:v0.4.1
docker tag quay.io/coreos/k8s-prometheus-adapter-amd64:v0.4.1 freemanliu/k8s-prometheus-adapter-amd64:v0.4.1
docker push freemanliu/k8s-prometheus-adapter-amd64:v0.4.1
docker pull quay.io/coreos/prometheus-config-reloader:v0.32.0
docker tag quay.io/coreos/prometheus-config-reloader:v0.32.0 freemanliu/prometheus-config-reloader:v0.32.0
docker push freemanliu/prometheus-config-reloader:v0.32.0
docker pull quay.io/coreos/prometheus-operator:v0.32.0
docker tag quay.io/coreos/prometheus-operator:v0.32.0 freemanliu/prometheus-operator:v0.32.0
docker push freemanliu/prometheus-operator:v0.32.0

Sync the images with Katacoda

Once we have the docker commands, we can use the playground machines provided at https://www.katacoda.com/courses/container-runtimes/what-is-a-container to move the images into our own Docker registry.
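
A minimal sketch of this workflow (sync.sh is just an arbitrary file name; the commands are rendered locally, then run on the playground machine after logging in to your own registry):

    # locally: render the pull/tag/push commands into a script
    jsonnet -J vendor -S --tla-str repository=freemanliu ./sync-to-internal-registry.jsonnet > sync.sh
    # on the playground machine: authenticate against your registry and run it
    docker login
    bash ./sync.sh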

Generate the YAML files

example.jsonnet

  • The cluster here was installed with kubeadm, so we simply import the kubeadm lib; it generates two services named kube-controller-manager-prometheus-discovery and kube-scheduler-prometheus-discovery, which make it easy to monitor kube-controller-manager and kube-scheduler.

    local mixin = import 'kube-prometheus/kube-prometheus-config-mixins.libsonnet';
    local kp =
      (import 'kube-prometheus/kube-prometheus.libsonnet') +
      (import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet') +
      (import 'kube-prometheus/kube-prometheus-anti-affinity.libsonnet') +
      {
        _config+:: {
          namespace: 'monitoring',
          prometheus+:: {
            // namespaces that Prometheus is granted access to
            namespaces+: ['default', 'kube-system', 'monitoring'],
          },
        },
      // replace this with your own private registry prefix
      } + mixin.withImageRepository('freemanliu');

    { ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
    { ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
    { ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
    { ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
    { ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
    { ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
    { ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
    { ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
  • We wrap all of this up in a script

    build.sh

    #!/usr/bin/env bash
    set -e
    set -x
    set -o pipefail
    rm -rf manifests
    mkdir manifests
    jsonnet -J vendor -m manifests "${1-example.jsonnet}" | xargs -I{} sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- {}
  • Place the script above in the project root.

  • Compile and generate the YAML files; depending on your machine this takes a little while.

    chmod +x ./build.sh && ./build.sh
  • The generated output is stored in the ./manifests directory.

    $ ls
    00namespace-namespace.yaml node-exporter-daemonset.yaml
    0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml node-exporter-service.yaml
    0prometheus-operator-0podmonitorCustomResourceDefinition.yaml node-exporter-serviceAccount.yaml
    0prometheus-operator-0prometheusCustomResourceDefinition.yaml node-exporter-serviceMonitor.yaml
    0prometheus-operator-0prometheusruleCustomResourceDefinition.yaml prometheus-adapter-apiService.yaml
    0prometheus-operator-0servicemonitorCustomResourceDefinition.yaml prometheus-adapter-clusterRole.yaml
    0prometheus-operator-clusterRole.yaml prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
    0prometheus-operator-clusterRoleBinding.yaml prometheus-adapter-clusterRoleBinding.yaml
    0prometheus-operator-deployment.yaml prometheus-adapter-clusterRoleBindingDelegator.yaml
    0prometheus-operator-service.yaml prometheus-adapter-clusterRoleServerResources.yaml
    0prometheus-operator-serviceAccount.yaml prometheus-adapter-configMap.yaml
    0prometheus-operator-serviceMonitor.yaml prometheus-adapter-deployment.yaml
    alertmanager-alertmanager.yaml prometheus-adapter-roleBindingAuthReader.yaml
    alertmanager-secret.yaml prometheus-adapter-service.yaml
    alertmanager-service.yaml prometheus-adapter-serviceAccount.yaml
    alertmanager-serviceAccount.yaml prometheus-clusterRole.yaml
    alertmanager-serviceMonitor.yaml prometheus-clusterRoleBinding.yaml
    grafana-dashboardDatasources.yaml prometheus-kubeControllerManagerPrometheusDiscoveryService.yaml
    grafana-dashboardDefinitions.yaml prometheus-kubeSchedulerPrometheusDiscoveryService.yaml
    grafana-dashboardSources.yaml prometheus-prometheus.yaml
    grafana-deployment.yaml prometheus-roleBindingConfig.yaml
    grafana-service.yaml prometheus-roleBindingSpecificNamespaces.yaml
    grafana-serviceAccount.yaml prometheus-roleConfig.yaml
    grafana-serviceMonitor.yaml prometheus-roleSpecificNamespaces.yaml
    kube-state-metrics-clusterRole.yaml prometheus-rules.yaml
    kube-state-metrics-clusterRoleBinding.yaml prometheus-service.yaml
    kube-state-metrics-deployment.yaml prometheus-serviceAccount.yaml
    kube-state-metrics-role.yaml prometheus-serviceMonitor.yaml
    kube-state-metrics-roleBinding.yaml prometheus-serviceMonitorApiserver.yaml
    kube-state-metrics-service.yaml prometheus-serviceMonitorCoreDNS.yaml
    kube-state-metrics-serviceAccount.yaml prometheus-serviceMonitorKubeControllerManager.yaml
    kube-state-metrics-serviceMonitor.yaml prometheus-serviceMonitorKubeScheduler.yaml
    node-exporter-clusterRole.yaml prometheus-serviceMonitorKubelet.yaml
    node-exporter-clusterRoleBinding.yaml

Persisting Prometheus data

  • The prometheus-operator supports two kinds of storage: the default is emptyDir, but an external PVC is also supported.

  • The standard configuration looks like the following; it is no different from the PVC template of an ordinary StatefulSet, you only need to fill in your storageClassName.
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: persisted
    spec:
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: local-storage-promethues
            resources:
              requests:
                storage: 10Gi # increase as needed in production

Local PersistentVolume

  • For performance we use local PVs to store the data directly. By default two Prometheus replicas are started, so we create two local PVs (a sketch of the matching StorageClass follows the manifests below).

    # local-pv-promethues.yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: local-pv0
      namespace: monitoring
    spec:
      capacity:
        storage: 10Gi # increase as needed in production
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-storage-promethues
      local:
        path: /promethues-data
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - node2 # pinned to node2
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: local-pv1
      namespace: monitoring
    spec:
      capacity:
        storage: 10Gi # increase as needed in production
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-storage-promethues
      local:
        path: /prome-data
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - node3 # pinned to node3
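
The PVs above reference a StorageClass named local-storage-promethues, which is not part of the generated manifests, so it has to exist in the cluster; the host paths (/promethues-data on node2, /prome-data on node3) must also exist on those nodes. A minimal sketch of such a StorageClass (WaitForFirstConsumer is the usual binding mode for local volumes and is an assumption here):

    # create the StorageClass referenced by the local PVs above
    kubectl apply -f - <<EOF
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-storage-promethues
    provisioner: kubernetes.io/no-provisioner   # local volumes are provisioned statically
    volumeBindingMode: WaitForFirstConsumer     # bind only once a consuming pod is scheduled
    EOF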
  • Create the PVs

    kubectl apply -f local-pv-promethues.yaml
  • Check the PVs. I have already deployed everything here, so the STATUS is Bound and CLAIM is filled in; on a fresh deployment the STATUS will be Available.

    $ kubectl get pv
    NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                            STORAGECLASS               REASON   AGE
    local-pv0   10Gi       RWO            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   local-storage-promethues            23m
    local-pv1   10Gi       RWO            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-1   local-storage-promethues            23m

After the PVs are created we need to edit the file prometheus-prometheus.yaml and append the following configuration at the end. The storageClassName here must match the one used by the PVs created above.

storage:
  volumeClaimTemplate:
    spec:
      storageClassName: local-storage-promethues
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi # increase as needed in production

  • After the change the complete prometheus-prometheus.yaml looks like this
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      labels:
        prometheus: k8s
      name: k8s
      namespace: monitoring
    spec:
      alerting:
        alertmanagers:
        - name: alertmanager-main
          namespace: monitoring
          port: web
      baseImage: freemanliu/prometheus
      nodeSelector:
        kubernetes.io/os: linux
      podMonitorSelector: {}
      replicas: 2
      resources:
        requests:
          memory: 400Mi
      ruleSelector:
        matchLabels:
          prometheus: k8s
          role: alert-rules
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccountName: prometheus-k8s
      serviceMonitorNamespaceSelector: {}
      serviceMonitorSelector: {}
      version: v2.11.0
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: local-storage-promethues
            accessModes: [ "ReadWriteOnce" ]
            resources:
              requests:
                storage: 10Gi # increase as needed in production

Configure the retention period

  • The default retention is 24h; if you want to keep data longer, for example one week, set it to 1w.

spec:
  retention: "24h" # [0-9]+(ms|s|m|h|d|w|y)

  • More related settings are documented at https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#PrometheusSpec
  • With the changes in place we can deploy. You may need to run the apply more than once: the first pass registers the CRDs, and the custom resources that depend on them can only be created after the CRDs are established, so if it throws an error simply repeat the step (see the sketch below).

    kubectl apply -f ./manifests
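
A slightly more deterministic rollout, as a sketch, waits for the CRDs (the names correspond to the CRD manifests generated above) before the second apply:

    # first pass: namespaces, CRDs, operator, ... (custom resources may still fail)
    kubectl apply -f ./manifests || true
    # wait until the CRDs are registered by the API server
    kubectl wait --for condition=Established --timeout=60s \
      crd/alertmanagers.monitoring.coreos.com \
      crd/podmonitors.monitoring.coreos.com \
      crd/prometheuses.monitoring.coreos.com \
      crd/prometheusrules.monitoring.coreos.com \
      crd/servicemonitors.monitoring.coreos.com
    # second pass: the custom resources now apply cleanly
    kubectl apply -f ./manifests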
  • Check the PVCs; you can see they are already Bound.

    $ kubectl get pvc -nmonitoring
    NAME                                 STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS               AGE
    prometheus-k8s-db-prometheus-k8s-0   Bound    local-pv0   10Gi       RWO            local-storage-promethues   26m
    prometheus-k8s-db-prometheus-k8s-1   Bound    local-pv1   10Gi       RWO            local-storage-promethues   26m
  • Check the pods

    $ kubectl get po -nmonitoring
    NAME                                    READY   STATUS    RESTARTS   AGE
    alertmanager-main-0                     2/2     Running   0          29m
    alertmanager-main-1                     2/2     Running   0          29m
    alertmanager-main-2                     2/2     Running   0          29m
    grafana-589f884c47-sqfnq                1/1     Running   0          29m
    kube-state-metrics-6c89574f57-xgggx     4/4     Running   0          27m
    node-exporter-7smvg                     2/2     Running   0          29m
    node-exporter-8lnr2                     2/2     Running   0          29m
    node-exporter-9z6mb                     2/2     Running   0          29m
    node-exporter-c2wlf                     2/2     Running   0          29m
    node-exporter-j5rzf                     2/2     Running   0          29m
    node-exporter-ksdpr                     2/2     Running   0          29m
    node-exporter-sdbqb                     2/2     Running   0          29m
    node-exporter-znlnl                     2/2     Running   0          29m
    prometheus-adapter-56b9677dc5-xgpws     1/1     Running   0          29m
    prometheus-k8s-0                        3/3     Running   0          27m
    prometheus-k8s-1                        3/3     Running   0          27m
    prometheus-operator-558945d695-r9xp6    1/1     Running   0          29m
  • Check the ClusterIP of the prometheus-k8s Service

    $ kubectl get svc -nmonitoring | grep prometheus-k8s
    prometheus-k8s   ClusterIP   10.98.173.194   <none>   9090/TCP   30m

Visit http://10.98.173.194:9090/targets to see the scrape targets.
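
If the ClusterIP is not reachable from your workstation, kubectl port-forward is a handy alternative (a sketch; the local port is arbitrary):

    # forward the Prometheus service to localhost and open http://localhost:9090/targets
    kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090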

  • Check the ClusterIP of the grafana Service
$ kubectl get svc -nmonitoring | grep grafana
grafana   ClusterIP   10.98.120.103   <none>   3000/TCP   39m
  • Visit http://10.98.120.103:3000 for the Grafana UI, where you will find the default monitoring dashboards: kubelet, kube-controller-manager, the API server, and so on; everything is there.

Ingress

For convenience, we can expose everything through domain names.

Grafana Ingress

  • Grafana ships with its own authentication, so we can rely on it directly. After applying the manifest below we can reach our dashboards at https://grafana.qingmu.io
    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: grafana-ingress
      namespace: monitoring
      annotations:
        ingress.kubernetes.io/force-ssl-redirect: "true"
    spec:
      tls:
      - hosts:
        - grafana.qingmu.io
        secretName: qingmu-grafana-certs
      rules:
      - host: grafana.qingmu.io
        http:
          paths:
          - backend:
              serviceName: grafana
              servicePort: 3000

Prometheus Ingress

  • Prometheus has no built-in authentication, so to be on the safe side we add basic auth.
  • Generate the auth file needed for the authentication; root is our username, then enter the password at the prompt.

    htpasswd -c auth root
  • The command above produces a text file named auth. We load this file into the Kubernetes cluster as a secret.

    kubectl -n monitoring create secret generic basic-auth --from-file=auth
  • Inspect the resulting secret

    kubectl -nmonitoring get secret  basic-auth  -oyaml
  • Enable basic auth through the nginx ingress annotations. After applying the manifest below we can reach the dashboard at https://prometheus.qingmu.io (a quick verification sketch follows the manifest).

    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: prometheus-ingress
      namespace: monitoring
      annotations:
        ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/auth-type: basic
        nginx.ingress.kubernetes.io/auth-secret: basic-auth
        nginx.ingress.kubernetes.io/auth-realm: "Authentication Required - root"
    spec:
      tls:
      - hosts:
        - prometheus.qingmu.io
        secretName: qingmu-certs
      rules:
      - host: prometheus.qingmu.io
        http:
          paths:
          - backend:
              serviceName: prometheus-k8s
              servicePort: 9090
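
Once applied, a quick sketch for verifying that the basic auth is in effect (replace <your-password> with whatever you gave htpasswd):

    # without credentials the ingress should answer 401
    curl -k -s -o /dev/null -w '%{http_code}\n' https://prometheus.qingmu.io/graph
    # with credentials it should answer 200
    curl -k -s -o /dev/null -w '%{http_code}\n' -u 'root:<your-password>' https://prometheus.qingmu.io/graph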
