Troubleshoot Tencent Cloud CLS Logs with WorkBuddy Natural Language Commands

The slow part of an incident is often not the query itself. It is opening the console, choosing the region, finding the log topic, remembering the right CQL syntax, adjusting the time window, and then repeating the process when the first query is too broad.

The Tencent Cloud CLS assistant for WorkBuddy connects WorkBuddy to the CLS API so operators can use natural-language commands for log search, troubleshooting, resource checks, and alert operations. The point is not to hide the log platform. The point is to reduce the distance between an operational question and the exact CLS API or CQL query needed to answer it.

What the CLS assistant can do

Task	CLS capability involved	Typical natural-language request
Search error logs	`SearchLog` with CQL	"Find error logs in `default-topic` in `ap-guangzhou` at 6 PM on April 15."
Analyze error distribution	SQL-style aggregation	"Group errors by service and show the top categories."
Inspect log context	`DescribeLogContext`	"Expand this `DB_CONNECTION_TIMEOUT` log with two records before and after."
Diagnose collection path	Machine group and config APIs	"Check the machine group and collection configuration bound to this topic."
Complete vague requests	Topic and region discovery	"Check whether there were errors in the Guangzhou topic that has logs."

30-second error log search

Natural-language error search returns structured CLS log results through SearchLog and CQL.

A common incident command contains four pieces of information:

Element	Example
Region	`ap-guangzhou` or Guangzhou
Object	`default-topic` or a service/topic name
Time range	6 PM on April 15, last 30 minutes, today, last 24 hours
Action	Search errors, count failures, show recent logs

Example:

Query error logs in the default-topic topic in ap-guangzhou at 6 PM on April 15.

The assistant can call `SearchLog`, generate a CQL filter such as `level:ERROR`, and return structured results. For more specific investigations, add filters:

Only show timeout errors from payment-service in the last 30 minutes.
Find all requests with statusCode 500 and sort them by time descending.

This is useful for checking overnight errors before a morning standup, triaging a fresh alert, or pulling historical errors during a review.
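To make the mapping from command to request concrete, here is a minimal Python sketch of the request body such a search might resolve to. The field names (`TopicId`, `From`, `To`, `Query`, `SyntaxRule`, `Limit`, `Sort`) follow my reading of the CLS `SearchLog` API reference and should be checked against the current documentation. The topic ID, the year, the UTC+8 time zone, the one-hour window, and the `service` field are placeholders chosen for illustration, not values the assistant is known to use.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "6 PM" is meant in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def build_search_request(topic_id, start, end, query="level:ERROR", limit=100):
    """Build a SearchLog-style request body. Field names are assumptions."""
    if end <= start:
        raise ValueError("end must be later than start")
    return {
        "TopicId": topic_id,
        "From": int(start.timestamp() * 1000),  # epoch milliseconds
        "To": int(end.timestamp() * 1000),
        "Query": query,
        "SyntaxRule": 1,  # assumed to select CQL syntax
        "Limit": limit,
        "Sort": "desc",  # newest records first
    }


# Placeholder year and topic ID; the example command specifies neither.
# "At 6 PM" is read here as the one-hour window starting at 18:00.
start = datetime(2025, 4, 15, 18, 0, tzinfo=CST)
request = build_search_request("your-topic-id", start, start + timedelta(hours=1))

# A narrower follow-up query; the `service` field name is hypothetical.
narrow = build_search_request(
    "your-topic-id",
    start,
    start + timedelta(minutes=30),
    query='level:ERROR AND service:"payment-service" AND timeout',
)
print(request)
```

Keeping the time window and the query string as separate inputs mirrors how a too-broad first query is usually narrowed: the window shrinks or the filter grows, and the rest of the request stays the same.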

Analyze error distribution

Aggregated error analysis groups failures by service, code, time trend, or multiple dimensions.

Seeing a list of errors is only the first step. During an incident, operators also need to know which service fails most, which error type dominates, and whether the trend is rising.

Analysis goal	Example command
Error classification	"Group by error code and show the top 10."
Time trend	"Show the hourly error-rate trend for the last 24 hours."
Cross-dimensional analysis	"Group by service and error level."

The assistant converts those requests into analysis queries, which makes the first pass faster for people who do not remember the exact SQL shape.
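As a rough idea of the SQL shape involved, the sketch below builds a grouped count in the "search condition, pipe, SQL statement" form that CLS analysis queries use. The field names `error_code`, `service`, and `level` are hypothetical; real queries depend on which fields the topic indexes, and the exact statement the assistant generates may differ.

```python
def build_group_query(dimensions, search="level:ERROR", top_n=10):
    """Return a 'search | SQL' analysis query grouped by the given fields."""
    # Only plain field names are accepted so nothing unexpected is spliced in.
    if not dimensions or not all(d.isidentifier() for d in dimensions):
        raise ValueError("dimensions must be non-empty plain field names")
    columns = ", ".join(dimensions)
    return (
        f"{search} | SELECT {columns}, count(*) AS errors "
        f"GROUP BY {columns} ORDER BY errors DESC LIMIT {int(top_n)}"
    )


# "Group by error code and show the top 10."
by_code = build_group_query(["error_code"])

# "Group by service and error level." (search widened to all records)
by_service_level = build_group_query(["service", "level"], search="*")

print(by_code)
print(by_service_level)
```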

Inspect log context around one key record

Log context lookup centers a key error and returns surrounding records through DescribeLogContext.

After locating a key error, the next question is usually what happened immediately before and after it. The traditional path is to adjust the time range, page around timestamps, and compare records manually.

The natural-language path is shorter:

Expand the context of this DB_CONNECTION_TIMEOUT log, two records before and two records after.

The assistant can call `DescribeLogContext` and return a log stream centered on the target record. In the scenario this walkthrough draws on, this made it easier to connect the `order-service` processing delay, the `payment-service` callback timeout, and the `api-gateway` 502 response into one incident chain.
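The "two before, two after" part of the command maps naturally onto a pair of counts in the request. The following sketch shows one plausible request body. The field names (`TopicId`, `BTime`, `PkgId`, `PkgLogId`, `PrevLogs`, `NextLogs`) reflect my understanding of the `DescribeLogContext` API and should be verified against the reference. All values are placeholders; in practice the record identifiers would come from a previous search result.

```python
def build_context_request(topic_id, log_time, pkg_id, pkg_log_id, before=2, after=2):
    """Build a DescribeLogContext-style request body. Field names are assumptions."""
    if before < 0 or after < 0:
        raise ValueError("before and after must not be negative")
    return {
        "TopicId": topic_id,
        "BTime": log_time,       # timestamp of the target record
        "PkgId": pkg_id,         # identifies the package holding the record
        "PkgLogId": pkg_log_id,  # position of the record inside that package
        "PrevLogs": before,      # records to return before the target
        "NextLogs": after,       # records to return after the target
    }


# Placeholder identifiers, not real values.
context_request = build_context_request(
    topic_id="your-topic-id",
    log_time="2025-04-15 18:03:12.000",
    pkg_id="pkg-id-from-search-result",
    pkg_log_id=0,
)
print(context_request)
```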

Diagnose collection-path problems

Collection-path diagnosis checks machine groups, machines, collection configs, and binding relationships.

When logs are missing, the problem may be in the machine group, agent state, collection config, or binding relationship. Instead of SSHing into machines one by one, ask the assistant to inspect the path:

Check the machine group and collection configuration for this log topic.

The assistant can run several CLS API checks in parallel:

API	What it checks
`DescribeMachineGroups`	Machine group list.
`DescribeMachines`	Agent online status for each machine.
`DescribeConfigs`	Collection configuration.
`DescribeMachineGroupConfigs`	Whether machine groups and configs are bound correctly.

This narrows a missing-log incident before deep manual debugging starts.
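The four checks are independent, which is what makes running them in parallel attractive. The sketch below shows that pattern with stub callables standing in for the real API calls. The result shapes (plain lists, dictionaries with `ip` and `online` keys) are simplified inventions for illustration and do not match the actual response schemas.

```python
from concurrent.futures import ThreadPoolExecutor


def run_checks(checks):
    """Run independent check callables in parallel; map each name to its result or exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(checks) or 1) as pool:
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # keep going so one failed call does not hide the rest
                results[name] = exc
    return results


def first_suspect(results):
    """Walk the collection path in order and report the first stage that looks broken."""
    for name, value in results.items():
        if isinstance(value, Exception):
            return f"{name} failed: {value}"
    if not results["DescribeMachineGroups"]:
        return "no machine group"
    offline = [m["ip"] for m in results["DescribeMachines"] if not m["online"]]
    if offline:
        return "agent offline on " + ", ".join(offline)
    if not results["DescribeConfigs"]:
        return "no collection config"
    if not results["DescribeMachineGroupConfigs"]:
        return "machine group has no config bound"
    return "collection path looks healthy"


# Stubs with made-up data in place of real API calls.
checks = {
    "DescribeMachineGroups": lambda: ["group-a"],
    "DescribeMachines": lambda: [
        {"ip": "10.0.0.1", "online": True},
        {"ip": "10.0.0.2", "online": False},
    ],
    "DescribeConfigs": lambda: ["config-a"],
    "DescribeMachineGroupConfigs": lambda: ["config-a"],
}
print(first_suspect(run_checks(checks)))
```

Collecting exceptions instead of raising them means a permission error on one API still leaves the other three results available for triage.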

A four-element prompt pattern

Use this pattern when writing natural-language commands:

Element	Description	Example
Region	Cloud region or region code.	`ap-guangzhou`, Guangzhou, Shanghai, Beijing
Object	Topic name, service name, or log type.	`payment-topic`, payment-service
Time range	Recent window or exact period.	last 1 hour, today, last 7 days
Task	What to do with the logs.	search errors, count failures, inspect context

Complete example:

In ap-guangzhou, check payment-topic errors in the last hour, then count each error type.
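Two of the four elements, region and time range, have to become concrete values before any API call can be made. The sketch below shows one simple way that resolution could work. It is not the assistant's actual parser: the phrase grammar is deliberately narrow, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# City names from the table above mapped to Tencent Cloud region codes.
REGION_CODES = {
    "guangzhou": "ap-guangzhou",
    "shanghai": "ap-shanghai",
    "beijing": "ap-beijing",
}
UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}


def resolve_region(text):
    """Accept either a region code or a known city name."""
    key = text.strip().lower()
    if key.startswith("ap-"):
        return key
    if key not in REGION_CODES:
        raise ValueError(f"unknown region: {text!r}")
    return REGION_CODES[key]


def resolve_time_range(text, now):
    """Turn 'last hour' or 'last 30 minutes' into (from_ms, to_ms) epoch milliseconds."""
    match = re.fullmatch(r"last (?:(\d+) )?(minute|hour|day)s?", text.strip().lower())
    if not match:
        raise ValueError(f"unsupported time range: {text!r}")
    amount = int(match.group(1) or 1)
    delta = timedelta(**{UNITS[match.group(2)]: amount})
    return int((now - delta).timestamp() * 1000), int(now.timestamp() * 1000)


now = datetime.now(timezone.utc)
print(resolve_region("Guangzhou"), resolve_time_range("last hour", now))
```

Passing `now` in explicitly keeps the function deterministic, which matters when the same prompt is replayed during a review.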

Vague commands can also work when discovery is possible:

Check whether the Guangzhou topic with logs has recent errors.

The assistant can first list topics in the region and then inspect recent error logs.
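That discovery flow can be pictured as a small loop: list the topics, skip the empty ones, and check the rest for errors. The sketch below uses three injected callables (`list_topics`, `count_logs`, `count_errors`) that are hypothetical stand-ins for whatever API calls the assistant actually makes; the topic names and counts are made up.

```python
def discover_and_check(region, list_topics, count_logs, count_errors):
    """Return {topic: error_count} for every topic in the region that has logs."""
    report = {}
    for topic in list_topics(region):
        if count_logs(topic) == 0:
            continue  # an empty topic cannot be "the topic that has logs"
        report[topic] = count_errors(topic)
    return report


# Stub data in place of real API responses.
log_counts = {"default-topic": 1200, "empty-topic": 0}
error_counts = {"default-topic": 7, "empty-topic": 0}

report = discover_and_check(
    "ap-guangzhou",
    list_topics=lambda region: list(log_counts),
    count_logs=log_counts.get,
    count_errors=error_counts.get,
)
print(report)
```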

Setup in three steps

The Tencent Cloud CLS assistant is installed from the WorkBuddy skill marketplace.

1. Install the skill from WorkBuddy. Open the WorkBuddy skill marketplace, search for the Tencent Cloud CLS assistant, and install it.
2. Configure Tencent Cloud credentials. The commands below append the variables to ~/.zshrc; if you use a different shell, write to its startup file instead.

echo 'export TENCENTCLOUD_SECRET_ID="YOUR_SECRET_ID"' >> ~/.zshrc
echo 'export TENCENTCLOUD_SECRET_KEY="YOUR_SECRET_KEY"' >> ~/.zshrc
source ~/.zshrc

3. Verify the connection.

Show my log topics in the Guangzhou region.

If the assistant returns a topic list, the basic connection is ready.
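If no topic list comes back, a reasonable first check is whether the two environment variables exported above are actually visible to the current process. The helper below is a small sketch for that check only; it does not validate the credentials against Tencent Cloud.

```python
import os

REQUIRED = ("TENCENTCLOUD_SECRET_ID", "TENCENTCLOUD_SECRET_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Not set in this shell:", ", ".join(missing))
else:
    print("Both credential variables are set.")
```

A variable that shows up as missing here usually means the shell startup file was not re-read in the session that launched the tool.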

When this workflow is strongest

Scenario	Why natural language helps
First response after an alert	The operator can start from intent instead of opening multiple console pages.
Repeated troubleshooting patterns	Common error search, aggregation, and context lookup become reusable prompts.
Less experienced CLS users	The assistant helps translate the operational task into CQL or API calls.
Collection-path triage	Multiple API checks can be grouped into one diagnostic request.

Keep high-risk decisions tied to retrieved evidence. The assistant should accelerate search and summarization, while the underlying logs and API results remain inspectable.

FAQ

Does this replace the CLS console?

No. It gives operators a faster entry point for common log search, aggregation, context lookup, and configuration checks. The console is still useful for deeper manual inspection.

What information should a good command include?

Include region, object, time range, and task. For example: `ap-guangzhou`, `payment-topic`, last hour, and count errors.

Which APIs matter most for missing-log diagnosis?

Start with `DescribeMachineGroups`, `DescribeMachines`, `DescribeConfigs`, and `DescribeMachineGroupConfigs` to check groups, agent state, configs, and bindings.

Why is log context lookup useful?

One error line rarely explains a failure. Context lookup returns nearby records, which helps reconstruct the sequence before and after the key error.
