What Are Kubernetes Resource QoS Classes?

This article walks through the concept of Kubernetes Resource QoS Classes: how the classes are defined, how they affect resource guarantees and OOM handling, and where they are implemented in the source code.

Introduction to Kubernetes Resource QoS Classes

Basic concepts

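Kubernetes derives a Pod's QoS class from the request and limit values set on the resources of its containers. A container's request is the minimum amount of a resource that the system guarantees to provide. A container's limit is the maximum amount of that resource the system allows it to use.

Pods that must run stably over the long term need a "guaranteed minimum of resources". Anything a pod uses beyond that minimum, however, is usually not guaranteed.

Kubernetes typically uses the gap between request and limit to overcommit a node and thereby raise resource utilization. K8S scheduling is based on request, not limit. Borg improved resource utilization by roughly 20% by making use of "non-guaranteed" resources.

In an overcommitted system (total limits > machine capacity), containers may be killed when a resource is exhausted. Ideally, the "less important" containers are killed first.

For each resource, containers fall into one of 3 QoS classes: Guaranteed, Burstable, and Best-Effort, in decreasing order of priority. Under the hood, K8S assigns these classes purely from the limit and request values.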

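Guaranteed: if, for every container in the Pod, the limit and request of every resource are equal and non-zero, the Pod's QoS class is Guaranteed.

Note that if a container specifies only a limit and no request, the request defaults to the value of the limit.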

Examples:

Only limits are set, so each request defaults to its limit:

containers:
  - name: foo
    resources:
      limits:
        cpu: 10m
        memory: 1Gi
  - name: bar
    resources:
      limits:
        cpu: 100m
        memory: 100Mi

Requests and limits are both set and equal:

containers:
  - name: foo
    resources:
      limits:
        cpu: 10m
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 1Gi
  - name: bar
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 100Mi

Best-Effort: if no container in the Pod sets a request or limit for any resource, the Pod's QoS class is Best-Effort.

Examples:

containers:
  - name: foo
    resources: {}
  - name: bar
    resources: {}

Burstable: every Pod that is neither Guaranteed nor Best-Effort is Burstable. When a limit is not specified, its effective value is the capacity of the corresponding resource on the Node.

Examples:

Container bar specifies no resources at all.

containers:
  - name: foo
    resources:
      limits:
        cpu: 10m
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 1Gi
  - name: bar

Containers foo and bar specify different resources.

containers:
  - name: foo
    resources:
      limits:
        memory: 1Gi
  - name: bar
    resources:
      limits:
        cpu: 100m

Container foo specifies no limits, and container bar specifies neither requests nor limits.

containers:
  - name: foo
    resources:
      requests:
        cpu: 10m
        memory: 1Gi
  - name: bar
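Compressible vs. incompressible resources

kube-scheduler selects a Node for a Pod based on the Pod's request values. Neither a Pod nor any of its containers may consume more than the effective limit, if one is set.

How request and limit take effect depends on whether the resource is compressible.

Guarantees for compressible resources

Currently only CPU is supported.
Pods are guaranteed the amount of CPU they request, but they may or may not get additional CPU time. Even the request is not fully guaranteed today, because CPU isolation is enforced at the container level. Pod-level cgroups are planned to close this gap.
Excess or contended CPU is distributed in proportion to the CPU requests. One way to picture this is cpu.shares handing out time slices in different proportions: if container A requests 600 milli CPUs and container B requests 300 milli CPUs, and both compete for CPU time, it is divided between them in a 2:1 ratio.
When a Pod reaches its CPU limit, it is throttled rather than killed. If a Pod has no CPU limit, it can use CPU beyond its request whenever spare CPU is available.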

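Guarantees for incompressible resources

Currently only memory is supported.
Pods get the amount of memory they request. A pod that exceeds its memory request may be killed when another pod needs memory. Pods that use less memory than they request are not killed, unless system tasks or daemons need more memory. (Put plainly, it still comes down to how the OOM killer scores every process on the system when it is triggered.)
When a Pod uses more memory than its limit, the process inside one of its containers that is using the most memory is killed by the kernel.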

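Admission and scheduling policy

Pods are admitted by the kubelet and scheduled by the scheduler based on the requests assigned to their containers, which ensures that the sum of requests of all containers stays within the Node's allocatable capacity. https://github.com/fabric8io/jenkinshift/blob/master/vendor/k8s.io/kubernetes/docs/proposals/node-allocatable.md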
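
The admission rule amounts to a sum-and-compare check. The sketch below is a simplified model of it, not kubelet or scheduler code: the Request type and the fits function are hypothetical, and only CPU and memory are considered.

```go
package main

import "fmt"

// Request is the resource request of one container. Hypothetical type.
type Request struct {
	MilliCPU    int64
	MemoryBytes int64
}

// fits reports whether the requests of the incoming pod's containers, added
// to the requests already placed on the node, stay within what the node can
// allocate. Limits play no role in this check.
func fits(allocatable Request, running, incoming []Request) bool {
	var sum Request
	for _, r := range running {
		sum.MilliCPU += r.MilliCPU
		sum.MemoryBytes += r.MemoryBytes
	}
	for _, r := range incoming {
		sum.MilliCPU += r.MilliCPU
		sum.MemoryBytes += r.MemoryBytes
	}
	return sum.MilliCPU <= allocatable.MilliCPU &&
		sum.MemoryBytes <= allocatable.MemoryBytes
}

func main() {
	node := Request{MilliCPU: 4000, MemoryBytes: 8 << 30}
	running := []Request{{MilliCPU: 3000, MemoryBytes: 4 << 30}}

	small := []Request{{MilliCPU: 500, MemoryBytes: 1 << 30}}
	large := []Request{{MilliCPU: 1500, MemoryBytes: 1 << 30}}

	fmt.Println("small pod fits:", fits(node, running, small))
	fmt.Println("large pod fits:", fits(node, running, large))
}
```

Because only requests are summed, a node can accept pods whose combined limits exceed its capacity, which is exactly how overcommitment arises.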

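How resources are reclaimed for each QoS class

CPU: when a pod cannot get its requested CPU, for example because system tasks and daemons are using a lot of CPU, the pod is not killed. It is throttled and runs more slowly.
Memory: memory is an incompressible resource, so from a memory-management point of view the classes are treated as follows:

Best-Effort pods have the lowest priority. If the system runs out of memory, processes in these pods are the first to be killed. These containers may use any amount of free memory on the system.
Guaranteed pods have the highest priority. They are guaranteed not to be killed as long as they stay below their limits. They are evicted only when the system is under memory pressure and no lower-priority containers remain.
Burstable pods have some form of minimum resource guarantee but can use more resources when needed. Under memory pressure, once they exceed their requests and no Best-Effort containers exist, these containers are the first to be killed.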

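OOM score configuration on the Node

Pod OOM score configuration

badness() in mm/oom_kill.c gives every process an OOM score. The higher the score, the more likely the process is to be killed. The score depends on:

Mainly the process's memory consumption, including resident memory, page tables, and swap usage

Generally ten times the percentage of memory the process uses (percent-times-ten)

User privileges: a process started as root, for example, has its score reduced by 30.
OOM score adjustments: /proc/pid/oom_score_adj (additive) and /proc/pid/oom_adj (multiplicative)

oom_adj: a legacy coefficient adjustment in the range -17 to 15, where -17 disables OOM killing for the process
oom_score_adj: this value, in the range -1000 to 1000, is added to oom_score
The final OOM score still lies between 0 and 1000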

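The following script lists the top 10 processes on the system by oom_score, that is, the processes most likely to be killed by the OOM killer:

#!/bin/bash
# oomscore.sh: print the 10 processes with the highest oom_score.
# Columns: oom_score, PID, first 50 characters of the command line.
for proc in $(find /proc -maxdepth 1 -regex '/proc/[0-9]+'); do
        printf "%2d %5d %s\n" \
                "$(cat "$proc/oom_score")" \
                "$(basename "$proc")" \
                "$(tr '\0' ' ' < "$proc/cmdline" | head -c 50)"
done 2>/dev/null | sort -nr | head -n 10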

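The OOM scores for the K8S QoS classes are as follows:

Best-effort

Set OOM_SCORE_ADJ: 1000
So the OOM_SCORE of best-effort containers is 1000

Guaranteed

Set OOM_SCORE_ADJ: -998
So the OOM_SCORE of guaranteed containers is 0 or 1

Burstable

If the total memory request is greater than 99.9% of available memory, OOM_SCORE_ADJ is set to 2. Otherwise, OOM_SCORE_ADJ = 1000 - 10 * (% of memory requested). This ensures that the OOM_SCORE of a burstable pod is greater than 1.
If the memory request is 0, OOM_SCORE_ADJ defaults to 999. So when burstable pods conflict with guaranteed pods, the burstable pods are killed.
If a burstable pod uses less memory than it requested, its OOM_SCORE is below 1000. So when best-effort pods conflict with such burstable pods, the best-effort pods are killed first.
If a process in a burstable pod's container uses more memory than the container requested, its OOM_SCORE is 1000. Otherwise its OOM_SCORE is below 1000.
Among burstable pods, a pod using more memory than it requested is killed before a pod using less than it requested.
If burstable pods whose containers run multiple processes conflict, the formula is only a heuristic and does not ensure the "request & limit" guarantees.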

Pod infra containers or Special Pod init process

OOM_SCORE_ADJ: -998

Kubelet, Docker

OOM_SCORE_ADJ: -999 (won’t be OOM killed)
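These are critical processes on the system. Without this setting they could be killed when they conflict with guaranteed processes. In the future, pods are expected to be placed in a separate cgroup with a memory limit.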

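Known issues and potential improvements

Swap support: the current QoS policy assumes that swap is disabled. If swap is enabled, guaranteed containers that reach their limit can keep allocating memory backed by disk. Only when swap space eventually runs out are processes in the pod killed. In that case the node has to take swap space into account when providing isolation.
User-specified priorities: let users tell the kubelet which tasks may be killed.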

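Source code analysis

The QoS source code lives in pkg/kubelet/qos. It is quite simple and consists mainly of two files, pkg/kubelet/qos/policy.go and pkg/kubelet/qos/qos.go. The OOM_SCORE_ADJ values for the QoS classes discussed above are defined in: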

pkg/kubelet/qos/policy.go:21

const (
        PodInfraOOMAdj        int = -998
        KubeletOOMScoreAdj    int = -999
        DockerOOMScoreAdj     int = -999
        KubeProxyOOMScoreAdj  int = -999
        guaranteedOOMScoreAdj int = -998
        besteffortOOMScoreAdj int = 1000
)

The computation of a container's OOM_SCORE_ADJ is defined in:

pkg/kubelet/qos/policy.go:40

func GetContainerOOMScoreAdjust(pod *v1.Pod, container *v1.Container, memoryCapacity int64) int {
        switch GetPodQOS(pod) {
        case Guaranteed:
                // Guaranteed containers should be the last to get killed.
                return guaranteedOOMScoreAdj
        case BestEffort:
                return besteffortOOMScoreAdj
        }

        // Burstable containers are a middle tier, between Guaranteed and Best-Effort. Ideally,
        // we want to protect Burstable containers that consume less memory than requested.
        // The formula below is a heuristic. A container requesting for 10% of a system's
        // memory will have an OOM score adjust of 900. If a process in container Y
        // uses over 10% of memory, its OOM score will be 1000. The idea is that containers
        // which use more than their request will have an OOM score of 1000 and will be prime
        // targets for OOM kills.
        // Note that this is a heuristic, it won't work if a container has many small processes.
        memoryRequest := container.Resources.Requests.Memory().Value()
        oomScoreAdjust := 1000 - (1000*memoryRequest)/memoryCapacity
        // A guaranteed pod using 100% of memory can have an OOM score of 10. Ensure
        // that burstable pods have a higher OOM score adjustment.
        if int(oomScoreAdjust) < (1000 + guaranteedOOMScoreAdj) {
                return (1000 + guaranteedOOMScoreAdj)
        }
        // Give burstable pods a higher chance of survival over besteffort pods.
        if int(oomScoreAdjust) == besteffortOOMScoreAdj {
                return int(oomScoreAdjust - 1)
        }
        return int(oomScoreAdjust)
}

The function that determines a Pod's QoS class is:

pkg/kubelet/qos/qos.go:50

// GetPodQOS returns the QoS class of a pod.
// A pod is besteffort if none of its containers have specified any requests or limits.
// A pod is guaranteed only when requests and limits are specified for all the containers and they are equal.
// A pod is burstable if limits and requests do not match across all containers.
func GetPodQOS(pod *v1.Pod) QOSClass {
        requests := v1.ResourceList{}
        limits := v1.ResourceList{}
        zeroQuantity := resource.MustParse("0")
        isGuaranteed := true
        for _, container := range pod.Spec.Containers {
                // process requests
                for name, quantity := range container.Resources.Requests {
                        if !supportedQoSComputeResources.Has(string(name)) {
                                continue
                        }
                        if quantity.Cmp(zeroQuantity) == 1 {
                                delta := quantity.Copy()
                                if _, exists := requests[name]; !exists {
                                        requests[name] = *delta
                                } else {
                                        delta.Add(requests[name])
                                        requests[name] = *delta
                                }
                        }
                }
                // process limits
                qosLimitsFound := sets.NewString()
                for name, quantity := range container.Resources.Limits {
                        if !supportedQoSComputeResources.Has(string(name)) {
                                continue
                        }
                        if quantity.Cmp(zeroQuantity) == 1 {
                                qosLimitsFound.Insert(string(name))
                                delta := quantity.Copy()
                                if _, exists := limits[name]; !exists {
                                        limits[name] = *delta
                                } else {
                                        delta.Add(limits[name])
                                        limits[name] = *delta
                                }
                        }
                }

                if len(qosLimitsFound) != len(supportedQoSComputeResources) {
                        isGuaranteed = false
                }
        }
        if len(requests) == 0 && len(limits) == 0 {
                return BestEffort
        }
        // Check is requests match limits for all resources.
        if isGuaranteed {
                for name, req := range requests {
                        if lim, exists := limits[name]; !exists || lim.Cmp(req) != 0 {
                                isGuaranteed = false
                                break
                        }
                }
        }
        if isGuaranteed &&
                len(requests) == len(limits) {
                return Guaranteed
        }
        return Burstable
}

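PodQoS is called by the eviction_manager and in the scheduler's Predicates phase. In other words, k8s uses it both when handling overcommitted nodes and during the pre-selection stage of scheduling.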
