Auto scaling is a service that automatically and economically adjusts service resources based on your service requirements and configured policies.
As more and more applications are developed on Kubernetes, it becomes increasingly important to quickly scale out applications to cope with service peaks and to scale them in during off-peak hours to conserve resources and reduce costs.
In a Kubernetes cluster, auto scaling involves pods and nodes. A pod is an application instance. Each pod contains one or more containers and runs on a node (a VM or bare-metal server). If a cluster does not have enough nodes to run new pods, nodes must be added to the cluster to keep services running.
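To make the pod/node relationship concrete, the following is a minimal sketch of a Deployment manifest whose replicas are the pods described above (the name `nginx-demo` and all resource values are illustrative, not taken from the original). The CPU and memory requests matter for scaling: the scheduler places each pod on a node with enough unreserved capacity, and when no node fits, node scaling adds one.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo          # hypothetical workload name
spec:
  replicas: 2               # each replica is one pod
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        resources:
          requests:         # used for scheduling and scale-up decisions
            cpu: 250m
            memory: 256Mi
```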
In CCE, auto scaling is used for online services, large-scale computing and training, deep learning GPU or shared GPU training and inference, periodic load changes, and many other scenarios.
CCE supports auto scaling for workloads and nodes.
Workload Scaling Types

| Type | Component | Component Description | Reference |
|---|---|---|---|
| HPA | HorizontalPodAutoscaler (built-in Kubernetes component) | HorizontalPodAutoscaler is the built-in Kubernetes component for Horizontal Pod Autoscaling (HPA). CCE adds an application-level cooldown time window and scaling thresholds to Kubernetes HPA. | |
| CronHPA | | CronHPA scales a workload in or out at fixed times. It can work with HPA policies to periodically adjust the HPA scaling range, enabling workload scaling in complex scenarios. | |
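A standard HPA policy of the kind described in the table above can be expressed with the upstream `autoscaling/v2` API. This is a minimal sketch, assuming a Deployment named `nginx-demo` (a hypothetical name, not from the original) and a 70% CPU utilization target; CCE-specific settings such as the cooldown time window are configured through the console or annotations and are not shown here.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-demo-hpa      # hypothetical name
spec:
  scaleTargetRef:           # the workload whose replica count is adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-demo        # hypothetical target workload
  minReplicas: 2            # lower bound of the scaling range
  maxReplicas: 10           # upper bound of the scaling range
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU usage exceeds 70%
```

A CronHPA policy would then periodically rewrite `minReplicas` and `maxReplicas` to shift this range at scheduled times, for example widening it before a known daily peak.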
Node Scaling Types

| Component Name | Component Description | Application Scenario | Reference |
|---|---|---|---|
| | An open source Kubernetes component for horizontal scaling of nodes, optimized by CCE for scheduling, auto scaling, and cost. | Online services, deep learning, and large-scale computing with limited resource budgets | |