Troubleshoot Kubernetes Events in TKE with Tencent Cloud CLS

Kubernetes incidents often start as small state changes: a node reports disk pressure, a Pod is evicted, a scheduler decision fails, or a cluster autoscaler adds capacity. If those signals are only noticed after the workload is affected, the best troubleshooting window has already passed.

Tencent Kubernetes Engine (TKE) events record those cluster state changes. Tencent Cloud Log Service (CLS) turns the event stream into searchable logs, dashboards, and operational views so teams can inspect what changed, when it changed, and which Kubernetes object was involved.

What Kubernetes Events Add To Cluster Operations

TKE is available in several deployment models: standard Tencent Kubernetes Engine clusters, Elastic Kubernetes Service, and TKE Edge. Across those deployment models, Kubernetes Events are a lightweight but high-value signal because they describe state transitions rather than only raw resource metrics.

The common fields are:

Event field	Meaning
`Type`	Usually `Normal` or `Warning`, with custom types possible.
`Involved Object`	The object related to the event, such as Pod, Deployment, or Node.
`Source`	The component that reported the event, such as Scheduler or Kubelet.
`Reason`	A short enum-style description used internally by components.
`Message`	The detailed event message.
`Count`	How many times the event occurred.

You can inspect similar information with `kubectl describe`, but centralizing it in CLS makes the data searchable across clusters and time windows. The useful mental model is: Kubernetes emits a state-change record, CLS stores it as a log event, and operators search by object, component, reason, message, count, and timestamp.
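
As a minimal sketch of that mental model, the snippet below flattens one Kubernetes core/v1 Event object into the fields from the table above. The sample event is invented for illustration, and `summarize_event` is a hypothetical helper, not part of TKE or CLS. It only assumes the standard Event JSON shape (`type`, `reason`, `message`, `count`, `involvedObject`, `source.component`, `lastTimestamp`).

```python
import json

# Illustrative core/v1 Event. All values are made up for this sketch.
SAMPLE_EVENT = json.loads("""
{
  "type": "Warning",
  "reason": "FailedScheduling",
  "message": "0/3 nodes are available: 3 Insufficient cpu.",
  "count": 4,
  "involvedObject": {"kind": "Pod", "namespace": "default", "name": "web-0"},
  "source": {"component": "default-scheduler"},
  "lastTimestamp": "2020-11-25T12:00:00Z"
}
""")


def summarize_event(event: dict) -> dict:
    """Flatten one Event into the fields operators usually search on."""
    obj = event.get("involvedObject", {})
    return {
        "type": event.get("type", ""),
        "object": f'{obj.get("kind", "")}/{obj.get("name", "")}',
        "component": event.get("source", {}).get("component", ""),
        "reason": event.get("reason", ""),
        "message": event.get("message", ""),
        "count": event.get("count", 1),
        "last_seen": event.get("lastTimestamp", ""),
    }


summary = summarize_event(SAMPLE_EVENT)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Each key in the flattened record corresponds to one searchable dimension: who was affected, who reported it, why, and how often.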

Enable One-Stop Event Search

CLS provides collection, storage, search, and analytics for Kubernetes event logs. In the TKE console, operators can open Cluster Operations -> Event Search and use two entry points:

Entry point	Best use
Event overview	Start from warning volume, affected resource objects, and event trends.
Global search	Run field-level queries and reconstruct a detailed event timeline.

The overview is the broad health view. It helps you see whether the cluster is dominated by node warnings, Pod scheduling issues, kubelet actions, or autoscaler decisions before narrowing the search.
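
The kind of breakdown the overview presents can be approximated offline. The sketch below counts warning events by reporting component and by object kind. The records and the `warning_breakdown` helper are hypothetical; the flat record layout is an assumption made for this example, not the CLS export format.

```python
from collections import Counter

# Hypothetical flattened event records, invented for this sketch.
EVENTS = [
    {"type": "Warning", "kind": "Node", "component": "kubelet", "reason": "NodeHasDiskPressure"},
    {"type": "Warning", "kind": "Pod", "component": "kubelet", "reason": "Evicted"},
    {"type": "Warning", "kind": "Pod", "component": "default-scheduler", "reason": "FailedScheduling"},
    {"type": "Normal", "kind": "Pod", "component": "cluster-autoscaler", "reason": "TriggeredScaleUp"},
    {"type": "Warning", "kind": "Pod", "component": "kubelet", "reason": "Evicted"},
]


def warning_breakdown(events):
    """Count Warning events per reporting component and per object kind."""
    warnings = [e for e in events if e["type"] == "Warning"]
    by_component = Counter(e["component"] for e in warnings)
    by_kind = Counter(e["kind"] for e in warnings)
    return by_component, by_kind


by_component, by_kind = warning_breakdown(EVENTS)
print("Warnings by component:", by_component.most_common())
print("Warnings by object kind:", by_kind.most_common())
```

If one component or object kind dominates the counts, that is usually the place to narrow the search first.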

Scenario 1: Find Why A Node Became Abnormal

When one node becomes unhealthy, start from the event overview and filter by the abnormal node name. In this example, the matching record is an event reporting insufficient disk space on the node.

The timeline then shows that on 2020-11-25, node 172.16.18.13 entered an abnormal state because disk space was insufficient. After that, kubelet began evicting Pods from the node to reclaim local disk space. The operational reading is straightforward: disk pressure appears first, kubelet eviction follows, and the next check should focus on node disk usage, eviction thresholds, and workload placement.

This is the useful part of event-based troubleshooting: the event stream connects the visible node condition to the component action that followed it.

Question	Event evidence to inspect
Which node changed state?	`event.involvedObject.name`
What was the immediate reason?	`event.reason` and `event.message`
Which component reported it?	`event.source.component`
Did it repeat?	`event.count` and event trend
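
The cause-then-effect reading in this scenario can be sketched as a simple sort over exported records. In the example below, only the node address and the date come from the scenario. The timestamps, Pod name, and counts are invented, and the reason strings (`NodeHasDiskPressure`, `Evicted`) are ones kubelet commonly emits, assumed here because the article does not show the exact values.

```python
from datetime import datetime

# Hypothetical records: times, Pod name, and counts are invented.
EVENTS = [
    {"time": "2020-11-25T10:02:10Z", "component": "kubelet", "kind": "Pod",
     "name": "app-1", "reason": "Evicted", "count": 1},
    {"time": "2020-11-25T10:01:00Z", "component": "kubelet", "kind": "Node",
     "name": "172.16.18.13", "reason": "NodeHasDiskPressure", "count": 3},
]


def parse_time(value: str) -> datetime:
    """Parse the UTC timestamp format used in the sample records."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def timeline(events):
    """Return events oldest first, so causes are read before effects."""
    return sorted(events, key=lambda e: parse_time(e["time"]))


ordered = timeline(EVENTS)
for e in ordered:
    print(f'{e["time"]}  {e["component"]:<8} {e["kind"]}/{e["name"]}  '
          f'{e["reason"]} (x{e["count"]})')
```

The sort here is ascending on purpose: the console view is convenient newest first, but a root-cause reading is easier when the earliest record sits at the top.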

Scenario 2: Reconstruct Cluster Autoscaler Expansion

For clusters with node pool autoscaling enabled, the cluster autoscaler can add or remove nodes based on workload pressure. If nodes are added automatically, the question becomes: which Pods triggered the expansion and why did it stop?

Use global search with the autoscaler component:

event.source.component:"cluster-autoscaler"

Then display fields such as `event.reason`, `event.message`, and `event.involvedObject.name`, and sort by log time descending. The result should read like an event ledger: each row connects a workload object, autoscaler decision, and message explaining whether a node was added, skipped, or blocked by a limit.

The event stream shows expansion around 2020-11-25 20:35:45. Three nginx Pods triggered the scale-out:

nginx-5dbf784b68-tq8rd
nginx-5dbf784b68-fpvbx
nginx-5dbf784b68-v9jv5

The cluster eventually added three nodes. Later expansion did not continue because the node pool had reached its maximum node count.
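
A rough way to build the same ledger from exported records is to keep only autoscaler events and group the involved objects by reason. In the sketch below, the three Pod names come from the scenario. Everything else is illustrative: the reason strings follow the upstream Cluster Autoscaler (`TriggeredScaleUp`, `NotTriggerScaleUp`) and may differ in TKE, the messages are paraphrased, and `nginx-5dbf784b68-example` is a made-up Pod name.

```python
# Hypothetical flattened records for this sketch.
EVENTS = [
    {"component": "cluster-autoscaler", "reason": "TriggeredScaleUp",
     "name": "nginx-5dbf784b68-tq8rd", "message": "pod triggered scale-up"},
    {"component": "cluster-autoscaler", "reason": "TriggeredScaleUp",
     "name": "nginx-5dbf784b68-fpvbx", "message": "pod triggered scale-up"},
    {"component": "cluster-autoscaler", "reason": "TriggeredScaleUp",
     "name": "nginx-5dbf784b68-v9jv5", "message": "pod triggered scale-up"},
    {"component": "cluster-autoscaler", "reason": "NotTriggerScaleUp",
     "name": "nginx-5dbf784b68-example",
     "message": "pod didn't trigger scale-up: max node group size reached"},
    {"component": "kubelet", "reason": "Pulled",
     "name": "nginx-5dbf784b68-tq8rd", "message": "image pulled"},
]


def autoscaler_ledger(events):
    """Group involved object names by reason, autoscaler events only."""
    ledger = {}
    for e in events:
        if e["component"] != "cluster-autoscaler":
            continue
        ledger.setdefault(e["reason"], []).append(e["name"])
    return ledger


ledger = autoscaler_ledger(EVENTS)
for reason, names in ledger.items():
    print(f"{reason}: {len(names)} object(s)")
    for name in names:
        print(f"  {name}")
```

Reading the groups side by side separates the Pods that caused new capacity from the ones that were left pending once the node pool limit was reached.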

Practical Runbook

1. Open the TKE event search page.
2. Start with the event overview for broad health and warning distribution.
3. Filter by abnormal node, Pod, Deployment, or component name.
4. For autoscaling investigations, query `cluster-autoscaler`.
5. Add `event.reason`, `event.message`, and `event.involvedObject.name` to the visible fields.
6. Sort by log time descending to reconstruct the latest state transitions.
7. Use the event chain to decide whether the next action is node cleanup, Pod rescheduling, node pool limit adjustment, or deeper workload debugging.
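
The filtering steps in the runbook come down to composing search strings. The helper below is a small sketch of that, using the `field:"value"` form shown in the autoscaler query earlier. Joining clauses with `AND` and escaping quotes with a backslash are assumptions about the search syntax, so check them against the CLS documentation before relying on them. `build_query` and `field_clause` are hypothetical names.

```python
def field_clause(field: str, value: str) -> str:
    """Render one field filter as field:"value" (escaping is an assumption)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def build_query(filters: dict) -> str:
    """Join field filters with AND (assumed operator) in insertion order."""
    return " AND ".join(field_clause(k, v) for k, v in filters.items())


autoscaler_query = build_query({"event.source.component": "cluster-autoscaler"})
node_query = build_query({
    "event.involvedObject.name": "172.16.18.13",
    "event.source.component": "kubelet",
})

print(autoscaler_query)
print(node_query)
```

Keeping the filters in a dictionary makes it easy to store one query per incident type alongside the runbook.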

FAQ

Are Kubernetes Events a replacement for metrics?

No. Metrics explain resource levels and trends. Events explain state changes and component decisions. They are strongest when used together.

Why send Events to CLS instead of only using `kubectl describe`?

CLS gives a central searchable history, dashboards, filtering, and field-level analysis across time windows. That is more practical when the problem spans multiple nodes or happened earlier.

Which event fields matter first during an incident?

Start with object name, source component, reason, message, count, and log time.
