1. Background
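A colleague ran into a customer case on Huawei Cloud CCE: a single pod runs several processes, and each one needs its own Prometheus exporter to collect its metrics. For example, a pod runs nginx, mysql, and php together, and all three need Prometheus monitoring. On an ECS virtual machine this is straightforward: run several exporter processes and add them to the Prometheus configuration. In Kubernetes the setup differs slightly.
There are two ways to implement this:
- Run an extra process or sidecar container that aggregates all the exporters, for example exporter-merger.
- Hard-code the additional endpoints in the pod's Prometheus annotations.
The second method is relatively complex to implement. The first adds some resource overhead but is simpler to set up.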
2. Building the test image
The build directory contains the following files:
[root@ecs-82f5]~/make# ls
create_mysql_user.sql Dockerfile nginx_status.conf prometheus-mysqld-exporter start.sh
The Dockerfile is as follows:
[root@ecs-82f5]~/make# more Dockerfile
FROM ubuntu:latest
RUN apt-get update \
&& apt-get -y install nginx prometheus-nginx-exporter mysql-server prometheus-mysqld-exporter
COPY nginx_status.conf /etc/nginx/sites-enabled/nginx_status.conf
COPY prometheus-mysqld-exporter /etc/default/prometheus-mysqld-exporter
COPY create_mysql_user.sql /tmp/create_mysql_user.sql
COPY start.sh /opt/start.sh
EXPOSE 80 9113 9104
ENTRYPOINT ["/bin/bash","/opt/start.sh"]
#ENTRYPOINT ["/bin/bash"]
The startup script is as follows:
[root@ecs-82f5]~/make# cat start.sh
#!/bin/bash
# -------------------------------
set -e
service nginx start
/etc/init.d/prometheus-nginx-exporter start
#/usr/bin/prometheus-nginx-exporter &
#mkdir -p /nonexistent
service mysql start
# The next line must run in the background; otherwise starting again after a restart or stop fails with an error
mysql < /tmp/create_mysql_user.sql &
export DATA_SOURCE_NAME="prometheus@unix(/run/mysqld/mysqld.sock)/"
/usr/bin/prometheus-mysqld-exporter
The nginx_status.conf file is as follows:
[root@ecs-82f5]~/make# cat nginx_status.conf
server {
    listen 8080;
    server_name localhost;
    location /stub_status {
        stub_status on;
        access_log off;
    }
}
The SQL file that creates the user is as follows:
[root@ecs-82f5]~/make# cat create_mysql_user.sql
CREATE USER prometheus@localhost IDENTIFIED BY 'StrongPassword';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO prometheus@localhost;
FLUSH PRIVILEGES;
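The mysql exporter is configured here to connect through the Unix socket. Username and password authentication also works, but if the password changes later, the configuration file must be updated to match.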
[root@ecs-82f5]~/make# cat prometheus-mysqld-exporter
ARGS=""
### Database authentication
#
# By default the DATABASE connection string will be read from
# the file specified with the -config.my-cnf parameter. For example:
# ARGS='--config.my-cnf /etc/mysql/debian.cnf'
#
# Note that SSL options can only be set using a cnf file.
# To set a connection string from the environment instead, set the
# DATA_SOURCE_NAME variable.
# To use UNIX domain sockets authentication with or without password:
# DATA_SOURCE_NAME="prometheus:nopassword@unix(/run/mysqld/mysqld.sock)/"
DATA_SOURCE_NAME="prometheus@unix(/run/mysqld/mysqld.sock)/"
# To use a TCP connection and password authentication:
# DATA_SOURCE_NAME="prometheus:password@(hostname:port)/dbname"
After the image is built, run the container. The running container looks like this:
[root@ecs-82f5]~/make# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ec003b9ed2df 6c6463dd846d "/bin/bash /opt/star…" 5 hours ago Up 3 seconds 80/tcp, 9104/tcp, 9113/tcp wonderful_diffie
The metrics are available at the container's IP and the exporter ports:
curl http://172.16.0.135:9104/metrics
curl http://172.16.0.135:9113/metrics
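The same check can be scripted. The sketch below fetches a metrics page and reads one unlabelled gauge from it. It assumes the exporters expose `nginx_up` and `mysql_up` (the usual health gauges of these two exporters, but worth confirming against the actual output). The helper names are hypothetical.

```python
import urllib.request


def fetch_metrics(url, timeout=5.0):
    """Download the text exposition from an exporter's /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def read_gauge(metrics_text, name):
    """Return the value of the first unlabelled sample called `name`, or None."""
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE and other comment lines
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None
```

For example, `read_gauge(fetch_metrics("http://172.16.0.135:9113/metrics"), "nginx_up")` should return 1.0 when the nginx exporter can reach the stub_status page, and the same call against port 9104 with `"mysql_up"` checks the MySQL side.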
3. Running in CCE with Prometheus monitoring
Pushing the image to SWR is simple and is skipped here. The workload's deployment YAML is as follows:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: mysqlnginx
  namespace: default
  generation: 4
  labels:
    appgroup: ''
    version: v1
  annotations:
    deployment.kubernetes.io/revision: '4'
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysqlnginx
      version: v1
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: mysqlnginx
        version: v1
      annotations:
        metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"prometheus","path":"","port":"","names":""}]'
        prometheus.io/mypath: /metrics
        prometheus.io/myport: '9104'
        prometheus.io/path: /metrics
        prometheus.io/port: '9113'
        prometheus.io/scrape: 'true'
    spec:
      containers:
        - name: container-0
          image: 'swr.la-north-2.myhuaweicloud.com/app1/nginxmysql:exporter'
          resources:
            limits:
              cpu: '2'
              memory: 2Gi
            requests:
              cpu: 250m
              memory: 512Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      imagePullSecrets:
        - name: default-secret
      affinity: {}
      schedulerName: default-scheduler
      tolerations:
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 300
        - key: node.kubernetes.io/unreachable
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 300
      dnsConfig:
        options:
          - name: timeout
            value: ''
          - name: ndots
            value: '5'
          - name: single-request-reopen
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
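Note the prometheus.io annotations. Two ports are declared here: the standard prometheus.io/port (9113) and a custom prometheus.io/myport (9104). More can be added the same way, and if the metrics paths differ, additional path annotations (such as prometheus.io/mypath) can be added too.
Next, edit the Prometheus ConfigMap. CCE provides a prometheus add-on; once it is installed, append the following job to the end of the scrape configuration in its ConfigMap: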
- job_name: kubernetes-pods1
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: pod
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]
    separator: ;
    regex: mysqlnginx  # must equal the pod's app label, i.e. the workload name used when creating it in CCE
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_mypath]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_myport]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.+)
    target_label: kubernetes_pod_name
    replacement: $1
    action: replace
  metric_relabel_configs:
  - source_labels: [__name__]
    separator: ;
    regex: kube_node_labels
    replacement: $1
    action: drop
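As shown, I added the prometheus_io_myport part. The keep rule uses an exact match on the app label; without an exact match, many more pods would be selected. You can also copy the existing kubernetes-pods job, change only the mypath and myport parts, and then compare the resulting changes on the Prometheus Targets and Configuration pages.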
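To see what the relabel rules do to a target, the following Python sketch mimics two of them outside Prometheus. It is an illustration only: Prometheus joins the source label values with the separator (`;`) and requires the regex to match the whole joined string, which `re.fullmatch` reproduces here. The function names and the sample pod IP are hypothetical.

```python
import re


def relabel_address(address, port_annotation):
    """Mimic the `replace` rule that rewrites __address__.

    address is the discovered __address__ (with or without a port) and
    port_annotation is the value of the prometheus.io/myport annotation.
    """
    joined = address + ";" + port_annotation
    match = re.fullmatch(r"([^:]+)(?::\d+)?;(\d+)", joined)
    if match is None:
        return address  # no match: the label is left unchanged
    return match.group(1) + ":" + match.group(2)


def keep_pod(app_label, pattern):
    """Mimic the `keep` rule: the target survives only on a full match."""
    return re.fullmatch(pattern, app_label) is not None
```

With the Deployment above, `relabel_address("10.0.0.5:9113", "9104")` yields `10.0.0.5:9104`, so this job scrapes the mysqld exporter while the stock job keeps scraping port 9113. `keep_pod("mysqlnginx", "mysqlnginx")` is true, whereas a pod labelled `mysqlnginx-v2` would be dropped, which is the exact-match behaviour described above.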
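Useful information can be read from the figure below:
This part is also discussed on GitHub: https://github.com/prometheus/prometheus/issues/3756
Note that the change did not take effect right after the ConfigMap was edited, which seems odd, because the Huawei prometheus add-on ships with a config reloader by default: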
[root@qqqq-84409 ~]# kubectl exec -it prometheus-0 -n monitoring -- /bin/bash
Defaulted container "prometheus-server-configmap-reload" out of: prometheus-server-configmap-reload, prometheus-server
OCI runtime exec failed: exec failed: container_linux.go:330: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory": unknown
command terminated with exit code 126
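prometheus-server runs here alongside a sidecar. I deliberately did not specify a container, so the output shows that the pod has two containers: prometheus-server-configmap-reload and prometheus-server.
After the ConfigMap was updated, the logs viewed with the following command showed no sign of a reload: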
[root@qqqq-84409 ~]# kubectl logs -f prometheus-0 -c prometheus-server -n monitoring
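That will not stump a seasoned ops engineer, though. See my earlier post, "How to reload the Prometheus configuration".
Either of the following reloads the configuration: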
curl -X POST IP:9090/-/reload
kill -HUP prometheus-server-pid
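The HTTP variant can also be scripted, for instance from a small maintenance tool. The sketch below sends the same POST with Python's standard library. The helper names are hypothetical, and the base URL is whatever address the Prometheus server is reachable at. Note that the `/-/reload` endpoint only works when Prometheus is started with `--web.enable-lifecycle`; otherwise the SIGHUP method is the one to use.

```python
import urllib.request


def build_reload_request(base_url):
    """Build the POST request for Prometheus' /-/reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url, timeout=10.0):
    """Trigger a reload and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, for
    example when the lifecycle API is disabled or the new config is invalid.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

A call such as `reload_prometheus("http://IP:9090")`, with IP replaced by the server address, is the equivalent of the curl command above.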
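After reloading, confirm in the Prometheus web UI under Status > Configuration that the new configuration is in effect, then check under Targets that both ports are being scraped.
4. Wrapping up
The configuration above works for a pod with a single container that runs several processes, and also for a pod with two or more containers. I tested both cases and both work. That said, the custom scrape-configuration approach is relatively complex and demands more expertise, while aggregating the exporters through a merger is simpler to implement.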