Real-Time CDN Log Analysis with Tencent Cloud CLS
Use CDN logs in CLS to monitor latency percentiles, detect error spikes, analyze cache performance, and build operational

CDN logs are one of the fastest ways to understand whether a traffic spike is healthy growth, cache pressure, regional imbalance, or an application problem hiding behind edge traffic.
Default CDN monitoring usually covers basic metrics such as request count and bandwidth. That is useful, but it is not enough for interactive troubleshooting. Teams that download raw CDN logs and analyze them offline often pay for it in extra infrastructure, delayed data, and slower response during incidents.
Tencent Cloud CDN can deliver access logs into Tencent Cloud Log Service (CLS). Once the logs are in CLS, teams can use search, SQL analysis, dashboards, and alerting for real-time CDN quality and performance monitoring.
Why Send CDN Logs to CLS?
The source workflow highlights four capabilities:
| Capability | Operational value |
|---|---|
| One-click log delivery | CDN access logs become available in CLS without building a separate offline ingestion pipeline. |
| Second-level analysis at large scale | Teams can run interactive queries instead of waiting for delayed batch processing. |
| Real-time dashboard visualization | CDN quality, cache behavior, bandwidth, and errors can be watched continuously. |
| One-minute alerting | Latency or error conditions can trigger notifications quickly. |
This makes CLS a better fit than offline log analysis for scenarios such as real-time issue localization, fast validation, CDN alerting, and customized performance analysis.
CDN Log Fields Worth Indexing
Index the fields that answer performance, traffic, and error questions.
| Field | CLS type | Meaning |
|---|---|---|
| app_id | long | Tencent Cloud account APPID. |
| client_ip | text | Client IP. |
| file_size | long | File size. |
| hit | text | Cache HIT or MISS; edge-node and parent-node hits are both marked as HIT. |
| host | text | Domain. |
| http_code | long | HTTP status code. |
| isp | text | ISP. |
| method | text | HTTP method. |
| param | text | URL parameters. |
| proto | text | HTTP protocol identifier. |
| prov | text | ISP province. |
| referer | text | HTTP referer. |
| request_range | text | Request range. |
| request_time | long | Response time in milliseconds, from receiving the request to finishing the response to the client. |
| request_port | long | Client-to-CDN-node connection port, or - when unavailable. |
| rsp_size | long | Response size. |
| time | long | Request time as a Unix timestamp in seconds. |
| ua | text | User-Agent. |
| url | text | Request path. |
| uuid | text | Unique request identifier. |
| version | long | CDN real-time log version. |
For operational dashboards, prioritize request_time, http_code, hit, host, url, client_ip, rsp_size, prov, and isp.
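If logs are also processed outside CLS (for example, in a test harness), the typed fields need consistent handling. The following is a minimal sketch that assumes raw records arrive as string-valued dictionaries. The helper names `to_long` and `normalize_record` and the sample record are hypothetical. It converts the fields listed as long in the table and maps the `-` placeholder to `None`.

```python
from typing import Any, Dict, Optional

# Fields the table above lists with CLS type "long".
LONG_FIELDS = ("app_id", "file_size", "http_code", "request_time",
               "request_port", "rsp_size", "time", "version")


def to_long(value: Any) -> Optional[int]:
    """Convert a raw value to int; '-' and empty values become None."""
    if value is None:
        return None
    text = str(value).strip()
    if text in ("", "-"):
        return None
    return int(text)


def normalize_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with long-typed fields as int or None."""
    record = dict(raw)
    for name in LONG_FIELDS:
        if name in record:
            record[name] = to_long(record[name])
    return record


# Hypothetical sample record (values are illustrative, not real log data).
sample = {"host": "www.example.com", "http_code": "200", "request_time": "37",
          "request_port": "-", "rsp_size": "1024", "hit": "hit"}
print(normalize_record(sample))
```

Keeping the placeholder as `None` rather than `0` avoids treating an unavailable port as a real value in later aggregations.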
Monitor High CDN Latency with Percentiles
Averages hide tail latency. The source workflow recommends percentile-based monitoring, especially p99 latency, because a small number of slow requests can be smoothed away by average values.
Use a time-series query to compare average, p50, and p99 latency over a day-level window.
* |
select
  avg(request_time) as l,
  approx_percentile(request_time, 0.5) as p50,
  approx_percentile(request_time, 0.99) as p99,
  time_series(__TIMESTAMP__, '5m', '%Y-%m-%d %H:%i:%s', '0') as time
group by time
order by time desc
limit 1440
For alerting, reduce the query to the signal the rule needs:
* |
select
approx_percentile(request_time, 0.99) as p99
A practical alert condition is p99 latency greater than a threshold such as 100 ms, with the affected host, url, and client_ip included in the notification through multidimensional analysis.
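To see why the average can stay under a threshold while p99 breaches it, the comparison can be reproduced locally. This is a minimal sketch on made-up latencies. It uses an exact nearest-rank percentile, which is an assumption for illustration only and will not match the approximation that approx_percentile performs inside CLS. The `percentile` helper and the sample values are hypothetical.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Exact nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical sample: 98 fast requests and 2 slow ones, request_time in ms.
latencies = [20] * 98 + [2000] * 2
THRESHOLD_MS = 100  # example threshold from the text

average = sum(latencies) / len(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)

print(f"avg={average:.1f} ms, p50={p50} ms, p99={p99} ms")
print("average breaches threshold:", average > THRESHOLD_MS)
print("p99 breaches threshold:", p99 > THRESHOLD_MS)
```

In this sample the average stays below the 100 ms threshold even though two percent of requests take two seconds, which is the case a p99 alert is meant to catch.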
Detect Error Spikes Minute by Minute
When page access errors increase sharply, the cause may be backend failure, overload, or a sudden change in traffic quality. The source workflow monitors the difference between the latest minute and the previous minute.
Latest minute error count:
* |
select *
from (
  select *
  from (
    select *
    from (
      select date_trunc('minute', __TIMESTAMP__) as time,
             count(*) as errct
      where http_code >= 400
      group by time
      order by time desc
      limit 2
    )
  )
  order by time desc
  limit 1
)
Previous minute error count:
* |
select *
from (
  select *
  from (
    select *
    from (
      select date_trunc('minute', __TIMESTAMP__) as time,
             count(*) as errct
      where http_code >= 400
      group by time
      order by time desc
      limit 2
    )
  )
  order by time asc
  limit 1
)
The alert condition is:
latest minute error count - previous minute error count > configured threshold
Route the notification to the operations channel that handles CDN incidents. Include the domain, URL, client IP, and HTTP code distribution so the responder can decide whether the issue is global, domain-specific, or resource-specific.
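The minute-over-minute comparison can be checked offline before it is turned into an alert rule. The sketch below assumes records are dictionaries with an integer `time` (Unix seconds) and `http_code`. The function names and sample data are hypothetical. Like the `limit 2` in the queries above, it compares the two most recent minutes that contain errors, so a minute with zero errors is skipped rather than counted as 0.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def error_counts_by_minute(records: Iterable[Dict[str, int]]) -> Counter:
    """Count records with http_code >= 400, keyed by minute start (Unix seconds)."""
    counts: Counter = Counter()
    for record in records:
        if record["http_code"] >= 400:
            counts[record["time"] - record["time"] % 60] += 1
    return counts


def error_spike(counts: Counter, threshold: int) -> Tuple[bool, int]:
    """Return (alert, delta) for the two most recent minutes that have errors."""
    minutes = sorted(counts, reverse=True)[:2]
    if len(minutes) < 2:
        return False, 0  # not enough data to compare
    delta = counts[minutes[0]] - counts[minutes[1]]
    return delta > threshold, delta


# Hypothetical sample: 2 errors in the previous minute, 7 in the latest one.
previous_minute = 1_699_999_980  # divisible by 60
latest_minute = previous_minute + 60
records = (
    [{"time": previous_minute + 5, "http_code": 200}]
    + [{"time": previous_minute + i, "http_code": 502} for i in range(2)]
    + [{"time": latest_minute + i, "http_code": 502} for i in range(7)]
)

counts = error_counts_by_minute(records)
print(error_spike(counts, threshold=3))
```

If gaps between error minutes matter for a given service, the comparison could instead require that the two minutes be adjacent. That is a design choice the source does not specify.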
Build CDN Dashboards for Performance and Quality
The dashboard layer should turn log data into a small set of recurring questions.
| Dashboard panel | Suggested dimensions | Question it answers |
|---|---|---|
| Health score or overall status | time range, domain | Is the CDN service healthy right now? |
| Cache hit ratio | hit, host | Are requests served from cache as expected? |
| Average downstream bandwidth | rsp_size, time bucket, domain | Is bandwidth pressure rising during the traffic peak? |
| HTTP status distribution | http_code, time bucket | Are 4xx or 5xx responses increasing? |
| Top URLs by traffic or errors | url, rsp_size, http_code | Which resource is driving load or failure? |
| Client and regional distribution | client_ip, prov, isp | Is the issue concentrated by geography or ISP? |
The visual workflow in CLS supports query results, statistic cards, line charts, bar charts, and distribution views. A useful dashboard combines one top-level health view with drill-down panels for cache, bandwidth, status codes, and affected resources.
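The arithmetic behind three of these panels can be sketched on a handful of records. This is an illustration only, under stated assumptions: that `hit` holds the string hit or miss in any letter case, that `rsp_size` is in bytes, and that all records fall inside one known time window. The function names and sample data are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def cache_hit_ratio(records: List[Dict[str, Any]]) -> float:
    """Share of requests whose hit field equals 'hit' (case-insensitive)."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if str(r["hit"]).lower() == "hit")
    return hits / len(records)


def status_classes(records: List[Dict[str, Any]]) -> Counter:
    """Count requests per status class, such as '2xx' or '5xx'."""
    return Counter(f"{r['http_code'] // 100}xx" for r in records)


def average_bandwidth_bps(records: List[Dict[str, Any]], window_seconds: int) -> float:
    """Average downstream bandwidth in bits per second (assumes rsp_size is bytes)."""
    total_bytes = sum(r["rsp_size"] for r in records)
    return total_bytes * 8 / window_seconds


# Hypothetical records from a single 60-second window.
records = [
    {"hit": "hit", "http_code": 200, "rsp_size": 1000},
    {"hit": "hit", "http_code": 200, "rsp_size": 2000},
    {"hit": "miss", "http_code": 404, "rsp_size": 500},
    {"hit": "hit", "http_code": 502, "rsp_size": 250},
]

print("hit ratio:", cache_hit_ratio(records))
print("status classes:", dict(status_classes(records)))
print("avg bandwidth (bit/s):", average_bandwidth_bps(records, window_seconds=60))
```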
Incident Runbook
When a CDN traffic peak starts, use this order:
- Check p99 request_time, not only average latency.
- Compare the current minute error count with the previous minute.
- Split errors by host, url, and http_code.
- Check hit to understand whether cache behavior changed.
- Review rsp_size and traffic volume to separate traffic growth from error growth.
- Add client_ip, prov, and isp when the issue may be regional or network-specific.
- Turn the final query into an alert if it represents a recurring operational risk.
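The drill-down step of the runbook (splitting errors by host, url, and http_code) amounts to a grouped count sorted by frequency. The following is a minimal local sketch of that grouping. The `top_error_groups` helper and the sample records are hypothetical and stand in for whatever query the team runs in CLS.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

ErrorKey = Tuple[str, str, int]


def top_error_groups(records: Iterable[Dict[str, Any]],
                     limit: int = 3) -> List[Tuple[ErrorKey, int]]:
    """Return the most frequent (host, url, http_code) groups among error responses."""
    counts = Counter(
        (r["host"], r["url"], r["http_code"])
        for r in records
        if r["http_code"] >= 400
    )
    return counts.most_common(limit)


# Hypothetical sample: one image URL returns most of the errors.
records = (
    [{"host": "img.example.com", "url": "/a.png", "http_code": 404}] * 3
    + [{"host": "api.example.com", "url": "/v1/items", "http_code": 502}] * 2
    + [{"host": "img.example.com", "url": "/b.png", "http_code": 200}]
)

for (host, url, code), count in top_error_groups(records, limit=2):
    print(host, url, code, count)
```

A single dominant group suggests a resource-specific problem, while errors spread evenly across hosts point toward a broader issue.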
FAQ
Why not rely only on CDN default metrics?
Default metrics are useful for broad visibility, but log analysis is better when the team needs custom dimensions such as URL, client IP, referer, ISP, cache hit state, or request-level latency.
Why use p99 latency for alerting?
p99 preserves tail latency. Averages can look normal while a meaningful subset of users experiences slow responses.
What fields should be included in alert messages?
Use the affected host, url, client_ip, http_code, and latency metric. These fields help responders judge impact and choose the next query.






