Using Custom Metrics for HPA Autoscaling in K8S

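The default autoscaling in k8s relies on metrics-server, which supplies resource metrics such as CPU and memory. Autoscaling on custom metrics (custom HPA) is usually implemented with Prometheus and prometheus-adapter. Prometheus monitors the application load and the cluster's own metrics; Prometheus Adapter takes the metrics collected by Prometheus and exposes them through the APIServer, so that scaling policies can be built on them.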

I. Environment Setup

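As before, this walkthrough uses Huawei CCE, which avoids installing k8s, Prometheus, and prometheus-adapter by hand (not complicated, but a few clicks are less work). Tick the Prometheus add-on during installation and both Prometheus and prometheus-adapter are installed.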

The test images are nginx and nginx-exporter; see the earlier post "K8S中使用Prometheus监控nginx指标" (monitoring nginx metrics with Prometheus in K8S) for details.

Check the setup with the following commands:

# Confirm that the nginx application is running
[root@testcce-68506-l3jp4 nginx]# kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE    IP             NODE            NOMINATED NODE   READINESS GATES
nginx-exporter-5dc4dcd94-6tdp7   2/2     Running   0          37m    172.16.0.135   192.168.0.211   <none>           <none>

# Confirm that the metrics endpoint returns data
[root@testcce-68506-l3jp4 nginx]# curl 172.16.0.135:9113/metrics
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 198    # <- this is the entry we are looking for

The nginx_http_requests_total entry above is the metric the HPA will use later.

Confirm that the cluster supports custom metrics:

# kubectl get apiservices
# kubectl get apiservices v1beta1.custom.metrics.k8s.io
# kubectl api-resources
# kubectl api-resources|grep metrics.k8s.io

II. Custom Rules and HPA Configuration

Configuring a custom prometheus-adapter rule

Export the current adapter-config rules:

kubectl -n monitoring get configmaps adapter-config -o yaml > rule.yaml

Edit the file and add the custom rule under rules:

apiVersion: v1
data:
  config.yaml: |-
    rules:
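    # The metric can also be named exactly, e.g. __name__="nginx_http_requests_total".
    # Prometheus regex matchers are fully anchored, so the pattern below only selects names that start with http_requests_.
    - seriesQuery: '{__name__=~"^http_requests_.*",kubernetes_pod_name!="",kubernetes_namespace!=""}'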
      resources:
        overrides:
          kubernetes_namespace:
            resource: namespace
          kubernetes_pod_name:
            resource: pod
      name:
        matches: ^(.*)_total$
        as: "${1}_per_second"
      metricsQuery: (sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>))
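
To make the name.matches / as pair in the rule above more concrete, here is a minimal Python sketch of the renaming step. It is only an illustration: the function name rename_metric is hypothetical, the adapter itself is written in Go, and returning None for a non-matching name is a choice made for this sketch rather than a description of how the adapter treats such series.

```python
import re


def rename_metric(series_name, matches=r"^(.*)_total$", as_template="${1}_per_second"):
    """Mimic the rule's name.matches / name.as rewrite for a single series name.

    Hypothetical helper for illustration only; not part of prometheus-adapter.
    """
    pattern = re.compile(matches)
    if pattern.search(series_name) is None:
        # Sketch-only behaviour: signal "pattern did not match" with None.
        return None
    # Translate Go-style ${1} group references into Python's \g<1> form.
    py_template = re.sub(
        r"\$\{(\d+)\}",
        lambda m: "\\g<" + m.group(1) + ">",
        as_template,
    )
    return pattern.sub(py_template, series_name)


if __name__ == "__main__":
    print(rename_metric("nginx_http_requests_total"))
    print(rename_metric("nginx_connections_active"))
```

With this rule the counter nginx_http_requests_total is expected to be published as nginx_http_requests_per_second, which is the name to look for in the custom metrics API.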

Before applying the file, delete the following four metadata lines. Otherwise the update fails with: Operation cannot be fulfilled on configmaps "ads-central-configuration": the object has been modified; please apply your changes to the latest version and try again

creationTimestamp:
resourceVersion:
selfLink:
uid:

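List all custom metrics. In theory nginx_http_requests_per_second should appear, because matches/as rewrites the name. Oddly, on CCE the suffix was not rewritten: the metric showed up as nginx_http_requests, as if only the first half of the pattern had been applied: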

kubectl get --raw="/apis/custom.metrics.k8s.io/v1beta1"

The command above prints a lot of output. Query a single metric and pretty-print the result by piping it through jq . or python -m json.tool:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/nginx_http_requests" | python -m json.tool

Configuring the HPA

[root@testcce-68506-l3jp4 nginx]# more hpa.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-custom-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-exporter
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metricName: nginx_http_requests
      targetAverageValue: 10

After applying the HPA, use watch 'kubectl get hpa' to follow it, or kubectl describe hpa nginx-custom-hpa to see the details.

[root@testcce-68506-l3jp4 nginx]# kubectl get hpa
NAME               REFERENCE                   TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-custom-hpa   Deployment/nginx-exporter   88m/10    2         5         2          3h39m

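Note the 88m in the TARGETS column. The m suffix means milli, so 1000m equals 1, the same notation used for CPU quotas. Here 1000m stands for one request per second, so 88m is 0.088 requests per second against a target of 10.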
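
The milli notation can be checked with a small Python sketch. The helper parse_quantity is hypothetical and handles only plain numbers and the m suffix seen in the output of this post; the other Kubernetes quantity suffixes are deliberately left out.

```python
from decimal import Decimal


def parse_quantity(text):
    """Convert a value such as '88m' or '10' into a Decimal.

    Hypothetical helper: only plain numbers and the milli suffix 'm'
    are supported, which is all the kubectl output in this post uses.
    """
    text = text.strip()
    if text.endswith("m"):
        return Decimal(text[:-1]) / 1000
    return Decimal(text)


if __name__ == "__main__":
    for raw in ("88m", "25757m", "10"):
        print(raw, "->", parse_quantity(raw))
```

Read this way, a TARGETS value of 25757m/10 means an average of about 25.8 requests per second per pod against a target of 10.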

Verification

Expose the service with kubectl expose deployment nginx-exporter --type=NodePort --name=nginx-nodeport --port=80, then access it as follows:

[root@testcce-68506-l3jp4 nginx]# kubectl describe svc nginx-nodeport
Name:                     nginx-nodeport
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=nginx-exporter
Type:                     NodePort
IP:                       10.247.52.252
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  31029/TCP
Endpoints:                172.16.0.130:80,172.16.0.131:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
[root@testcce-68506-l3jp4 nginx]# kubectl get nodes
NAME            STATUS   ROLES    AGE     VERSION
192.168.0.211   Ready    <none>   5h11m   v1.19.10-r0-CCE21.11.1.B005-21.11.1.B005
192.168.0.241   Ready    <none>   5h10m   v1.19.10-r0-CCE21.11.1.B005-21.11.1.B005
[root@testcce-68506-l3jp4 nginx]# curl 192.168.0.211:31029
Welcome to nginx!

If you see this page, the nginx web server is successfully installed and working. Further configuration is required. For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com. Thank you for using nginx.

Next, generate load with a simple while loop:

[root@testcce-68506-l3jp4 nginx]# while true;do curl 192.168.0.211:31029;done
# In a second terminal, follow the changes with watch 'kubectl get hpa'
[root@testcce-68506-l3jp4 ~]# kubectl get hpa
NAME               REFERENCE                   TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
nginx-custom-hpa   Deployment/nginx-exporter   25757m/10   2         5         2          3h49m
[root@testcce-68506-l3jp4 ~]# kubectl get hpa
NAME               REFERENCE                   TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
nginx-custom-hpa   Deployment/nginx-exporter   55741m/10   2         5         4          3h49m
[root@testcce-68506-l3jp4 ~]# kubectl get hpa
NAME               REFERENCE                   TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
nginx-custom-hpa   Deployment/nginx-exporter   55741m/10   2         5         4          3h49m
[root@testcce-68506-l3jp4 ~]# kubectl get pods
NAME                             READY   STATUS    RESTARTS   AGE
nginx-exporter-5dc4dcd94-6tdp7   2/2     Running   2          3h49m
nginx-exporter-5dc4dcd94-99vq5   2/2     Running   2          3h47m
nginx-exporter-5dc4dcd94-kzcnx   2/2     Running   0          35s
nginx-exporter-5dc4dcd94-lfwvp   2/2     Running   0          35s
nginx-exporter-5dc4dcd94-mmmsv   2/2     Running   0          20s
web-terminal-6f975b97d7-6qrrf    1/1     Running   1          5h9m

In the end nginx-exporter grows to 5 pods and stops there, because maxReplicas is 5.
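
The replica count follows the formula given in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to minReplicas and maxReplicas. The Python sketch below is a simplified illustration under stated assumptions: the function name is hypothetical, the 0.1 tolerance is the documented default, and pod readiness, missing metrics, and the behavior policies are all ignored.

```python
import math


def desired_replicas(current_replicas, current_avg, target_avg,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA calculation for a Pods metric with an average target.

    Hypothetical helper: ignores readiness, missing metrics and scaling
    behavior, which the real controller also takes into account.
    """
    ratio = current_avg / target_avg
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current size.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured on the HPA object.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Values taken from the output above: 25757m average, target 10, 2 replicas.
    print(desired_replicas(2, 25.757, 10, min_replicas=2, max_replicas=5))
```

With the values observed during the load test, the raw result is ceil(2 * 2.5757) = 6, which the maxReplicas setting caps at 5.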

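Note, however, that scale-in is not as fast: it only happens after 5 minutes (300 seconds). This is controlled by the behavior field (available from autoscaling/v2beta2), whose default values are: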

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    - type: Pods
      value: 4
      periodSeconds: 15
    selectPolicy: Max

To reclaim pods faster, adjust these settings. See the official documentation and the write-up on Zhihu for details.
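
The effect of stabilizationWindowSeconds on scale-in can be sketched as follows. According to the Kubernetes documentation, the controller looks at the desired replica counts computed during the window and uses the highest one, so a short dip in load does not remove pods immediately. The function name and the sample history below are hypothetical and exist only for illustration.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Pick the replica count to use when scaling in.

    recommendations: list of (timestamp_seconds, desired_replicas) pairs.
    Hypothetical helper: returns the highest recommendation seen within
    the stabilization window that ends at `now`.
    """
    recent = [replicas for ts, replicas in recommendations
              if 0 <= now - ts <= window_seconds]
    if not recent:
        raise ValueError("no recommendation inside the stabilization window")
    return max(recent)


if __name__ == "__main__":
    # Made-up history: load stops shortly after t=100s.
    history = [(0, 5), (100, 5), (200, 2), (290, 2), (450, 2)]
    print(stabilized_scale_down(history, now=290))  # still inside the busy period
    print(stabilized_scale_down(history, now=450))  # busy samples have aged out
```

Lowering stabilizationWindowSeconds under scaleDown shortens the period during which old, higher recommendations keep the replica count up.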

Postscript:

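This resolves the problem from the "Configuring a custom prometheus-adapter rule" part, where the name rewrite did not take effect.
After reading a fair amount of material and testing repeatedly, I finally found that every time the rule file is updated, the custom-metrics-apiserver service must be restarted for the change to take effect.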

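1. The rule can be configured here:

[Screenshot: cce-hpa-rule]

Update the rule configuration:

[Screenshot: update-configmap-rule]

Restart the apiserver so the change takes effect. Because it is deployed as a container, deleting the pod and letting k8s recreate it is enough:

[Screenshot: custom-metrics-apiserver]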

Query the metrics again; the result is as follows:

➜  ~ kubectl get --raw="/apis/custom.metrics.k8s.io/v1beta1" | jq |grep nginx|grep per
      "name": "pods/nginx_http_requests_per_second",
      "name": "namespaces/nginx_http_requests_per_second",
