Introduction
kube-prometheus scrapes metrics data from components such as etcd and the Kubernetes API server.
View etcd's metrics output:
# curl --cacert /etc/kubernetes/ssl/ca.pem --cert /etc/etcd/ssl/etcd.pem --key /etc/etcd/ssl/etcd-key.pem https://172.21.17.30:2379/metrics
View the kube-apiserver metrics:
# kubectl get --raw /metrics
Download the official YAML files
# git clone https://github.com/coreos/kube-prometheus
Deployment
A few files need to be modified before deploying.
Create the etcd monitoring secret
Monitoring etcd requires the etcd certificates, and prometheus-prometheus.yaml must be modified as well.
# kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/ssl/ca.pem --from-file=etcd-key.pem --from-file=etcd.pem
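The two relative --from-file arguments assume the command is run from the directory holding the etcd certificates (/etc/etcd/ssl here). Verify that all three files made it into the secret:

# kubectl -n monitoring describe secret etcd-certs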
Modify prometheus-prometheus.yaml
# cd prometheus/
The annotation storageclass.kubernetes.io/is-default-class: true marks the default dynamic StorageClass configured earlier; see the kube-nfs dynamic storage post for how it was set up.
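The concrete edits to prometheus-prometheus.yaml are not shown above. With the prometheus-operator API, referencing the etcd-certs secret (which is what makes the /etc/prometheus/secrets/etcd-certs/ paths used later resolve) and requesting storage from the default StorageClass would look roughly like this sketch; the storage size is an assumption, and the StorageClass name is taken from the grafana-pvc.yaml section below:

# prometheus-prometheus.yaml (excerpt; other fields unchanged)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  # mounts the secret at /etc/prometheus/secrets/etcd-certs/ in the pods
  secrets:
  - etcd-certs
  # persist TSDB data through the default dynamic StorageClass
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: xxlaila-nfs-storage
        resources:
          requests:
            storage: 10Gi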
Deploy the application
Before deploying, rename prometheus-adapter-apiService.yaml in the adapter directory, because metrics-server was installed earlier. Applying it here would overwrite the metrics.k8s.io APIService and cause errors.
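The apply commands themselves were lost from this post. With the upstream kube-prometheus layout, deployment is typically (a sketch, assuming the stock manifests/ directory):

# kubectl apply -f manifests/setup    # namespace, CRDs and the operator first
# kubectl apply -f manifests/         # then Prometheus, Alertmanager, Grafana, exporters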
Check the deployment status:
# kubectl get all -n monitoring
Configure Ingress
# cat > ingress-monitor.yaml <<EOF
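# NOTE: the body of this heredoc was lost from the post. A minimal sketch of
# such an Ingress follows; the hostnames are placeholders, while the Service
# names and ports are the kube-prometheus defaults.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-monitor
  namespace: monitoring
spec:
  rules:
  - host: prometheus.example.com
    http:
      paths:
      - backend:
          serviceName: prometheus-k8s
          servicePort: 9090
  - host: grafana.example.com
    http:
      paths:
      - backend:
          serviceName: grafana
          servicePort: 3000
  - host: alertmanager.example.com
    http:
      paths:
      - backend:
          serviceName: alertmanager-main
          servicePort: 9093
EOF
# kubectl apply -f ingress-monitor.yaml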
Open the domain in a browser to access the UIs.
Monitoring common components
The Kubernetes components commonly monitored are kube-apiserver, kube-scheduler, kube-controller-manager, and etcd; on the nodes, kubelet and kube-proxy. The default files under the serviceMonitor directory only cover kube-apiserver and kubelet; the other components each need separate file changes before they can be monitored.
Everything above assumes the cluster was installed from binaries rather than running these components as pods.
Monitoring kube-scheduler
kube-scheduler has no Service or Endpoints inside the Kubernetes cluster, which you can confirm with kubectl get ep -n kube-system. Modify prometheus-serviceMonitorKubeScheduler.yaml and append the following content, or put it in a new file:
# cat >> prometheus-serviceMonitorKubeScheduler.yaml <<EOF
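# NOTE: the appended body was lost from the post. Following the same pattern
# as the kube-controller-manager section below, it would be a headless
# Service plus static Endpoints for the scheduler. 10251 is the scheduler's
# default plain-HTTP metrics port at the time; adjust it to your binary
# install if it differs.
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
    targetPort: 10251
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 172.21.17.30   # the master IPs used elsewhere in this post
  - ip: 172.21.17.31
  - ip: 172.21.16.110
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
EOF
# kubectl apply -f prometheus-serviceMonitorKubeScheduler.yaml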
Monitoring kube-controller-manager
kube-controller-manager needs changes because this cluster was installed with SSL certificates, while the default kube-controller-manager ServiceMonitor does not use SSL. We therefore have to scrape over HTTPS with the certificates; otherwise monitoring fails with 403, x509, or 400 errors.
Modify prometheus-serviceMonitorKubeControllerManager.yaml:
# cat prometheus-serviceMonitorKubeControllerManager.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
    metricRelabelings:
    - action: drop
      regex: etcd_(debugging|disk|request|server).*
      sourceLabels:
      - __name__
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager

Create kube-controller-manager-service.yaml
# cat > kube-controller-manager-service.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10252
    protocol: TCP
    targetPort: 10252
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
subsets:
- addresses:
  - ip: 172.21.17.30
  - ip: 172.21.17.31
  - ip: 172.21.16.110
  ports:
  - name: https-metrics
    port: 10252
    protocol: TCP
EOF

Apply the files:
# kubectl apply -f prometheus-serviceMonitorKubeControllerManager.yaml
# kubectl apply -f kube-controller-manager-service.yaml
Monitoring etcd
etcd runs outside the Kubernetes cluster, so we need to create Endpoints and a Service for it:
- prometheus-serviceMonitoretcd.yaml
# cat > prometheus-serviceMonitoretcd.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: https-metrics
    port: 2379
    protocol: TCP
    targetPort: 2379
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
subsets:
- addresses:
  - ip: 172.21.17.30
  - ip: 172.21.17.31
  - ip: 172.21.16.110
  ports:
  - name: https-metrics
    port: 2379
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: etcd
  name: etcd
  namespace: monitoring
spec:
  endpoints:
  - interval: 10s
    port: https-metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
      certFile: /etc/prometheus/secrets/etcd-certs/etcd.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/etcd-key.pem
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd
EOF
# kubectl apply -f prometheus-serviceMonitoretcd.yaml
Monitoring kube-proxy
kube-proxy exposes its metrics on port 10249 (see the kube-proxy installation doc). It serves plain HTTP, so no SSL is required.
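You can spot-check the endpoint from any node first (the IP below is one of this cluster's nodes):

# curl -s http://172.21.16.204:10249/metrics | head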
Create kube-proxy.yaml
# cat > kube-proxy.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-proxy
  name: kube-proxy
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10249
    protocol: TCP
    targetPort: 10249
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-proxy
  name: kube-proxy
  namespace: kube-system
subsets:
- addresses:
  - ip: 172.21.16.204
  - ip: 172.21.16.231
  - ip: ……   # list every node running kube-proxy
  ports:
  - name: http-metrics
    port: 10249
    protocol: TCP
EOF

Create prometheus-serviceMonitorProxy.yaml
# cat > prometheus-serviceMonitorProxy.yaml <<EOF
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-proxy
  name: kube-proxy
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-proxy
EOF

Apply the files:
# kubectl apply -f prometheus-serviceMonitorProxy.yaml
# kubectl apply -f kube-proxy.yaml
Monitoring traefik
- Create prometheus-serviceMonitorTraefix.yaml
# cat > prometheus-serviceMonitorTraefix.yaml <<EOF
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: traefik-ingress
  name: traefik-ingress
  namespace: monitoring
spec:
  jobLabel: k8s-app
  endpoints:
  - port: admin   # the name given to traefik's port 8080
    interval: 30s
  selector:
    matchLabels:
      k8s-app: traefik-ingress
  namespaceSelector:
    matchNames:
    - kube-system
EOF
# kubectl apply -f prometheus-serviceMonitorTraefix.yaml
This assumes traefik's metrics page is reachable; if you followed my earlier installation doc, it is enabled by default.
Modifying Grafana
After a default installation, Grafana needs an extra plugin, otherwise pie charts will not render. We also want to import some official dashboard templates. With the default installation everything is lost whenever the pod is recreated, so we create a PVC and persist the data to disk; the data then survives pod recreation unaffected.
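The plugin in question is the pie chart panel (grafana-piechart-panel). One way to install it, assuming the stock Grafana image, which honors the GF_INSTALL_PLUGINS environment variable, is an env entry on the grafana container in grafana-deployment.yaml:

        # grafana-deployment.yaml, container excerpt; requires internet
        # access from the pod, otherwise bake the plugin into a custom image
        env:
        - name: GF_INSTALL_PLUGINS
          value: grafana-piechart-panel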
Create grafana-pvc.yaml
- grafana-pvc.yaml
# cat > grafana-pvc.yaml <<EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: xxlaila-nfs-storage
  resources:
    requests:
      storage: 5Gi
EOF
# kubectl apply -f grafana-pvc.yaml
Modify grafana-deployment.yaml
- grafana-deployment.yaml
# modified:
volumes:
#- emptyDir: {}
- name: grafana-storage
  persistentVolumeClaim:
    claimName: grafana-pvc
- name: grafana-datasources
# added, under the grafana container's volumeMounts:
- mountPath: /grafana-dashboard-definitions/0/grafana-dashboard-k8s-traefik-ingress
  name: grafana-dashboard-k8s-traefik-ingress
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/grafana-dashboard-k8s-etcd-clusters-as-service
  name: grafana-dashboard-k8s-etcd-clusters-as-service
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/grafana-dashboard-k8s-etcd-cluster-as-pod
  name: grafana-dashboard-k8s-etcd-cluster-as-pod
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/grafana-dashboard-k8s-etcd-server
  name: grafana-dashboard-k8s-etcd-server
  readOnly: false
# added, under volumes:
- configMap:
    name: grafana-dashboard-k8s-etcd-clusters-as-service
  name: grafana-dashboard-k8s-etcd-clusters-as-service
- configMap:
    name: grafana-dashboard-k8s-etcd-cluster-as-pod
  name: grafana-dashboard-k8s-etcd-cluster-as-pod
- configMap:
    name: grafana-dashboard-k8s-etcd-server
  name: grafana-dashboard-k8s-etcd-server
- configMap:
    name: grafana-dashboard-k8s-traefik-ingress
  name: grafana-dashboard-k8s-traefik-ingress
For the entries added above, the dashboard templates must be imported into grafana-dashboardDefinitions.yaml, following the format already used in that file. Remember to adjust the data source in each dashboard, otherwise it cannot connect and nothing will display.
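For reference, each dashboard in grafana-dashboardDefinitions.yaml is a ConfigMap whose data key holds the dashboard JSON; a skeleton for one of the names added above looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-k8s-etcd-server
  namespace: monitoring
data:
  # paste the full dashboard JSON as the value below, with its datasource
  # set to the cluster's Prometheus data source
  k8s-etcd-server.json: |-
    {}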
Check the Services and Endpoints:
# kubectl get svc,endpoints -n kube-system
Check the registered monitoring APIs:
# kubectl api-versions | grep monitoring
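If the operator's CRDs registered correctly, the output includes the API group used by all the ServiceMonitor manifests above:

monitoring.coreos.com/v1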