Webhook Debug Tools for Telegram

Why Webhook Debugging Still Hurts in 2025

Even after Bot API 7.0 relaxed TLS cipher suites, operators still lose updates silently: a mistyped secret_token, a forgotten IPv6 record, or a 29.9-second database lock can drop 100% of traffic with zero notice. Long-polling is not an option at 10 kreq/min, so the only escape hatch is a local inspection mirror that lets you pause, rewrite and replay Telegram’s JSON before it reaches production.

Decision Tree: Which Tool Fits Your Stack?

Pick in under 60 seconds:

If you need zero install and traffic is <1 M updates/month → Telegram’s built-in last_error_date + ngrok quick tunnel.
If you must retain payloads for 30 days for compliance → self-hosted echo micro-service (Go or Node) behind nginx.
If you run multi-bot fleets >50 tokens → third-party catch-all inspector with per-route filtering (avoids rate-limit race).

Jump to the matching section; each contains platform-specific copy-paste commands and rollback one-liners.

Built-in Log Preview (No Code, 2 min)

What Telegram Already Gives You

Every getWebhookInfo response contains pending_update_count and, once a delivery has failed, last_error_message and the Unix timestamp last_error_date (setWebhook itself only returns True). These three fields update in real time; no extra scopes needed.
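The three fields can be read with a plain HTTPS call to getWebhookInfo. The following is a minimal sketch, assuming only the Python standard library; the function names fetch_webhook_info and summarize_webhook_info are illustrative and not part of any Telegram SDK.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_webhook_info(bot_token: str) -> dict:
    """Call getWebhookInfo and return the `result` object of the reply."""
    url = f"https://api.telegram.org/bot{bot_token}/getWebhookInfo"
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    if not body.get("ok"):
        raise RuntimeError(f"getWebhookInfo failed: {body}")
    return body["result"]


def summarize_webhook_info(info: dict) -> str:
    """Render the three diagnostic fields as one human-readable line."""
    pending = info.get("pending_update_count", 0)
    # last_error_* are optional fields: absent when no delivery has failed.
    if "last_error_date" not in info:
        return f"pending={pending}, no delivery errors recorded"
    when = datetime.fromtimestamp(info["last_error_date"], tz=timezone.utc)
    message = info.get("last_error_message", "")
    return f"pending={pending}, last error at {when.isoformat()}: {message}"
```

Calling summarize_webhook_info(fetch_webhook_info(token)) from a cron job or a shell alias gives a one-line health check without opening a chat client.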

Fastest Check on Mobile

Android/iOS: open any bot in-chat → /setwebhook (you need owner privilege) → send raw URL. The reply bubble surfaces the last 80-character error string. Tap-and-hold the bubble to copy JSON—handy when you are on cellular and SSH is impossible.

Desktop Shortcut

Windows/macOS/Linux: in the search bar type @BotFather → /mybots → select bot → Bot Settings → Webhooks. The panel shows the same triplet plus a “Test your webhook” button that fires a dummy {"update_id":0,"message":{...}} to your endpoint and reports HTTP status within three seconds, eliminating the guesswork between Telegram and your reverse proxy.

Silent Failure Patterns to Watch

Experience shows that last_error_message truncates at 80 UTF-8 code points, so “certificate verify failed: unable to get local issuer certificate” becomes “certificate verify failed: unable…” and the root CA name is lost. When you see repeating 403 or 502 stubs, first widen the log retention window in your CDN—Cloudflare, for instance, hides TLS handshake failures from upstream unless “Security Events” is explicitly exported to Logpush. Another blind spot is IPv6: Telegram since 2023 prefers v6 when present; if your DNS AAAA record points to a stale container, pending_update_count climbs yet last_error_message merely reads “connection timeout”.

Zero-Install Tunnel (ngrok, 5 min)

One-Liner Launch

# WEBHOOK_SECRET holds the secret_token you passed to setWebhook, not the bot token
ngrok http 8080 --verify-webhook telegram --verify-webhook-secret "$WEBHOOK_SECRET"

The experimental --verify-webhook flag (ngrok v3.8+) checks Telegram’s secret-token header on every incoming request and keeps a searchable replay log for 90 minutes, which is enough for most debugging sessions without signing up for a paid plan.

Replay Walk-Through

Open the ngrok web UI → “Replay” tab → click the green arrow next to any update. You can edit the raw JSON, add an "edited_message" stanza, then hit “Replay” to re-inject it into your local service. This is invaluable for checking idempotency: if re-processing the same update_id creates duplicate invoices, your transaction layer needs a distributed lock keyed by update_id.
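The idempotency check can be illustrated with a small guard keyed by update_id. This is a single-process sketch only: the in-memory set stands in for the distributed lock, and UpdateDeduplicator, handle_update and the invoices list are hypothetical names chosen for the example.

```python
import threading


class UpdateDeduplicator:
    """Remember processed update_ids so that replays become no-ops."""

    def __init__(self) -> None:
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, update_id: int) -> bool:
        """Return True only for the first caller that presents this update_id."""
        with self._lock:
            if update_id in self._seen:
                return False
            self._seen.add(update_id)
            return True


def handle_update(update: dict, dedupe: UpdateDeduplicator, invoices: list) -> None:
    """Process an update at most once, however often it is replayed."""
    if not dedupe.claim(update["update_id"]):
        return  # replayed update: skip all side effects
    invoices.append(update["update_id"])  # stand-in for creating an invoice
```

Replaying the same update twice through handle_update should leave exactly one entry in invoices. With several workers, the same claim step would have to live in a shared store rather than in process memory.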

Self-Hosted Echo Service (Go, 30 min)

Storage Schema

A single-table SQLite design keeps the binary portable: updates(update_id PRIMARY KEY, payload TEXT, received_at DATETIME, replay_token TEXT). The replay_token is a UUIDv4 returned to the caller so that downstream QA tools can re-fetch the exact payload hours later without exposing your full database.
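The schema can be tried out quickly with Python's built-in sqlite3 module before committing to the Go implementation. This sketch assumes the column types shown; the NOT NULL and UNIQUE constraints and the helper names open_store, store_update and fetch_by_token are additions for illustration.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS updates (
    update_id    INTEGER PRIMARY KEY,
    payload      TEXT NOT NULL,
    received_at  DATETIME NOT NULL,
    replay_token TEXT NOT NULL UNIQUE
)
"""


def open_store(dsn: str) -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(dsn)
    conn.execute(SCHEMA)
    return conn


def store_update(conn: sqlite3.Connection, update: dict) -> str:
    """Persist one update; return its replay_token (the original one on replays)."""
    token = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO updates VALUES (?, ?, ?, ?)",
            (update["update_id"], json.dumps(update),
             datetime.now(timezone.utc).isoformat(), token),
        )
        row = conn.execute(
            "SELECT replay_token FROM updates WHERE update_id = ?",
            (update["update_id"],),
        ).fetchone()
    return row[0]


def fetch_by_token(conn: sqlite3.Connection, token: str):
    """Return the stored payload for a replay_token, or None if unknown."""
    row = conn.execute(
        "SELECT payload FROM updates WHERE replay_token = ?", (token,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because update_id is the primary key, a replayed update does not create a second row, and the caller receives the token issued the first time.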

Dockerfile Optimised for ARM & x86

FROM gcr.io/distroless/static-debian12
COPY tel-echo /tel-echo
EXPOSE 8080
ENTRYPOINT ["/tel-echo","-dsn","/data/updates.db"]

Multi-arch build (docker buildx) produces a 7 MB image that starts in 80 ms on a t4g.nano, keeping cold-start cost negligible even when you scale-to-zero in Fargate.

Fleet-Wide Inspector (SaaS, 10 min)

Token Race Avoidance

When you manage 50+ bots, rotating a single SaaS-provided URL into every setWebhook call risks hitting the 30 req/min global limit. Instead, register a wildcard DNS record *.hook.example.com pointing to the inspector, then set each bot’s webhook to https://<bot-id>.hook.example.com. The inspector uses SNI to demux traffic, so Telegram sees distinct hosts and the rate limiter remains happy.
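The per-bot host names can be derived mechanically, because a bot token has the form <bot-id>:<secret> with a numeric bot id. The sketch below assumes the wildcard domain from the example; webhook_url_for and bot_id_from_host are hypothetical helper names, and the second function shows how an inspector might map a Host or SNI name back to a bot.

```python
from typing import Optional


def webhook_url_for(bot_token: str, base_domain: str = "hook.example.com") -> str:
    """Build the per-bot webhook URL from the numeric id in front of the colon."""
    bot_id, sep, _secret = bot_token.partition(":")
    if not sep or not bot_id.isdigit():
        raise ValueError("expected a token of the form <bot-id>:<secret>")
    return f"https://{bot_id}.{base_domain}"


def bot_id_from_host(host: str, base_domain: str = "hook.example.com") -> Optional[str]:
    """Recover the bot id from a Host/SNI name; None if it is not one of ours."""
    host = host.split(":")[0].lower()  # drop an optional port
    suffix = "." + base_domain
    if not host.endswith(suffix):
        return None
    label = host[: -len(suffix)]
    return label if label.isdigit() else None
```

Only the public bot id ends up in DNS and logs; the secret half of the token never appears in the URL.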

Cost Projection

At 500 updates/sec a commercial vendor charges roughly USD 19 per million payloads, inclusive of 7-day retention and full-text search. For comparison, a self-hosted t3.micro with 20 GB GP3 storage costs ~USD 2.3 per month but adds 45 min of DevOps time each time you patch OpenSSL. Pick SaaS when your on-call rotation is already saturated; otherwise the break-even point is around 3 M updates/month.
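The break-even point depends on how you price DevOps time, which the figures above leave open. The sketch below is one possible cost model, not a vendor quote: the hourly rate and the one-patch-per-month cadence are assumptions, and the function names are illustrative.

```python
def saas_monthly_usd(updates_per_month: float, usd_per_million: float = 19.0) -> float:
    """SaaS bill at a flat price per million payloads."""
    return updates_per_month / 1_000_000 * usd_per_million


def selfhost_monthly_usd(hourly_rate_usd: float, patches_per_month: float = 1.0,
                         instance_usd: float = 2.3, patch_minutes: float = 45.0) -> float:
    """Instance cost plus the DevOps time spent on patching."""
    return instance_usd + patches_per_month * (patch_minutes / 60.0) * hourly_rate_usd


def break_even_updates(hourly_rate_usd: float, patches_per_month: float = 1.0,
                       usd_per_million: float = 19.0) -> float:
    """Monthly volume at which SaaS and self-hosting cost the same."""
    selfhost = selfhost_monthly_usd(hourly_rate_usd, patches_per_month)
    return selfhost / usd_per_million * 1_000_000
```

Under this model, an assumed rate of USD 73 per hour and one patch per month gives a break-even of roughly 3 M updates/month; a cheaper or more expensive on-call hour moves the threshold proportionally. At a sustained 500 updates/sec (about 1.3 billion updates in a 30-day month) the same price list would come to roughly USD 24,600 per month.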

Case Study #1 – Indie Game Bot (1 kreq/min)

Context: A weekend project that sends daily loot drops to 40 k players.
Problem: Sporadic 502s after a provider’s Kubernetes upgrade; last_error_message was blank because the load-balancer returned an HTML error page larger than 80 chars.
Intervention: Inserted a 15-line OpenResty Lua snippet to trim error bodies to 60 chars and add X-Request-ID. Cross-matched that ID with ngrok’s replay log, revealing a 1.2-second GC pause in the JVM sidecar.
Result: Moved sidecar to a separate pod; p99 latency dropped from 890 ms to 120 ms and zero updates were lost during the next event spike.
Revisit: Three months later the same setup caught a DNS TTL mis-configuration before it hit 0.1% of users—proof that lightweight mirrors pay ongoing dividends.

Case Study #2 – Neobank KYC Bot (12 kreq/min)

Context: Regulated environment requiring 30-day audit trail and in-country data residency.
Problem: Compliance team demanded non-repudiation of every Telegram update, yet the core banking API could only accept TLS 1.3 with mutual TLS, which Telegram does not support on outgoing webhooks.
Intervention: Deployed the self-hosted Go echo service inside an enclave, wrote replay_token to an append-only Kafka topic, then streamed hashes to an external notary. A sidecar container re-signed each payload with the bank’s private key before forwarding to the KYC micro-service.
Result: Passed external audit with zero findings; average end-to-end latency added 34 ms, well inside the 500 ms SLA.
Revisit: If a future Bot API version adds mTLS support, the echo service can be retired; the Kafka topic already provides an immutable buffer for replay, making the migration path trivial.

Monitoring & Rollback Runbook

1. Alert Signals

pending_update_count > 100 for >2 min
rate(nginx_http_requests_5xx[1m]) > 0.05
“certificate verify failed” substring in last_error_message

Page on the first condition; open a high-priority incident if two overlap.
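The three signals and the escalation rule can be expressed as a small pure function. This is a sketch under stated assumptions: the caller supplies how long the backlog has exceeded 100 and the current 5xx rate, the signal names are invented for the example, and the text does not say what to do when only the second or third signal fires, so the sketch returns "none" in that case.

```python
CERT_ERROR = "certificate verify failed"


def active_signals(info: dict, seconds_pending_over_100: float,
                   nginx_5xx_rate: float) -> list:
    """Return the names of the alert signals that are currently firing."""
    signals = []
    # Signal 1: backlog above 100 for more than two minutes.
    if info.get("pending_update_count", 0) > 100 and seconds_pending_over_100 > 120:
        signals.append("pending_backlog")
    # Signal 2: 5xx rate above the 0.05 threshold.
    if nginx_5xx_rate > 0.05:
        signals.append("nginx_5xx")
    # Signal 3: certificate error substring in last_error_message.
    if CERT_ERROR in info.get("last_error_message", ""):
        signals.append("cert_verify")
    return signals


def severity(signals: list) -> str:
    """Two overlapping signals open an incident; the backlog alone pages."""
    if len(signals) >= 2:
        return "incident"
    if "pending_backlog" in signals:
        return "page"
    return "none"  # assumption: a lone 5xx or cert signal is not escalated
```

Keeping the rule in a function like this makes it easy to unit-test threshold changes before they reach the paging system.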

2. Diagnosis Steps

Export getWebhookInfo to JSON.
Compare last_error_date with CDN edge logs.
If mismatch >5 s, suspect clock skew; run ntpdate -q pool.ntp.org.
Replay the last failed update locally; if secret_token is set, confirm that the X-Telegram-Bot-Api-Secret-Token header carries the same value.
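Telegram sends the secret_token verbatim in the X-Telegram-Bot-Api-Secret-Token header, so the check in the last step reduces to a constant-time string comparison. A minimal sketch, assuming the request headers are available as a dict; secret_token_matches is a hypothetical helper name.

```python
import hmac

SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token"


def secret_token_matches(headers: dict, expected: str) -> bool:
    """Compare the received secret header with the configured secret_token."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    received = normalised.get(SECRET_HEADER.lower(), "")
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(received.encode(), expected.encode())
```

A replayed request that fails this check most likely went out with a stale secret, which points at configuration drift rather than at the handler code.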

3. Rollback / Mitigation

# Re-point webhook to last known good endpoint
curl -F "url=https://v1-stable.example.com/telegram" \
     -F "secret_token=$TOKEN_BACKUP" \
     https://api.telegram.org/bot$BOT_TOKEN/setWebhook

Keep previous containers tagged :blue-green so Kubernetes can swap within 30 seconds. Validate rollback with getWebhookInfo; pending_update_count should start decreasing within one polling cycle (≈10 s).

4. Post-Incident Checklist

Update SLO dashboard with minutes of update lag.
Rotate secret_token if any leakage is suspected.
Commit new regression test that replays the exact malformed update.

FAQ

Q: Does Telegram retry failed webhooks? A: Yes. After a non-2xx response or a timeout, Telegram retries the delivery and gives up only after a reasonable number of attempts; undelivered updates stay queued (counted in pending_update_count) for at most 24 hours.
Background: getUpdates does not work while a webhook is set, so to drain the queue by polling you must first call deleteWebhook.

Q: Can I use a self-signed certificate? A: Yes, but only if you upload the public certificate through the certificate field of setWebhook; Let’s Encrypt is simpler and rotates automatically.
Evidence: Official docs still list PEM upload as supported in Bot API 7.0.

Q: Why does getWebhookInfo show “SSL error {error:0x00000005}”? A: OpenSSL error 5 is CERTIFICATE_VERIFY_FAILED; usually the CA bundle is incomplete.
Fix: Append the Let’s Encrypt root to your chain and confirm openssl s_client -connect your.site:443 -servername your.site returns “Verify return code: 0”.

Q: Is there a limit on webhook URL length? A: Empirically 1024 UTF-8 bytes; longer URLs return “Bad Request: URL too long”.
Work-around: Store bot-scoped data in a path parameter under 200 chars and keep the rest in headers.

Q: What about an IPv6-only host? A: Telegram started preferring IPv6 in 2023; if your AAAA is broken, updates stall with no IPv4 fallback.
Tip: Maintain dual-stack or explicitly remove AAAA until you’re ready.

Q: How accurate is last_error_date? A: The Unix timestamp is set on the Telegram edge right after the failed TCP close; expect <1 s skew versus your NTP-synced clock.
Caveat: If your CDN returns a 4xx edge code without proxying, the timestamp reflects CDN wall time, which may drift.

Q: Can I replay an update to a different bot? A: Only after changing update_id and bot-scoped fields; otherwise Telegram ignores it as duplicate.
Legal note: Replaying user data to another identifier may breach GDPR purpose limitation.

Q: Does compression help? A: Telegram never compresses outgoing webhooks; enabling gzip on your side reduces ingress bandwidth but adds ~5 ms CPU.
Observation: At 500 kB payload (large group photos) gzip shrinks to 420 kB, which is rarely worth the complexity.

Q: Are webhooks delivered in order? A: Yes per chat, but not globally; two users may receive updates out of order.
Design: Use update_id only for dedupe, not sequencing logic.

Q: What is the maximum header size? A: Telegram sends ~400 bytes of headers including X-Telegram-Bot-Api-Secret-Token; ensure your reverse proxy buffer is >4 kB to accommodate future expansion.

Glossary

Bot API: Telegram’s HTTP interface for bot developers; current public version is 7.0.
update_id: Monotonically increasing identifier for every inbound event; used for deduplication.
secret_token: Optional shared secret set during setWebhook; Telegram sends it verbatim in the header X-Telegram-Bot-Api-Secret-Token.
pending_update_count: Number of updates queued on Telegram side because your endpoint is unreachable.
last_error_date: Unix timestamp of the most recent failed delivery attempt.
last_error_message: Truncated 80-character error string returned by your server or edge proxy.
getWebhookInfo: Bot API method that returns the three fields above plus max_connections and allowed_updates.
setWebhook: Bot API method to register or update your webhook URL and optional parameters.
ngrok: Third-party tunnel software that exposes a local port to a public HTTPS endpoint.
IPv6 preference: Since 2023 Telegram tries IPv6 first if a DNS AAAA record exists.
replay_token: UUID returned by the echo service for later retrieval of the exact payload.
mTLS: Mutual TLS; not yet supported by Telegram outgoing webhooks (as of Bot API 7.0).
rate-limit race: Contention when >30 setWebhook calls/min hit the Telegram edge, causing 429 responses.
Let’s Encrypt: Free CA commonly used for TLS certificates on webhook endpoints.
OpenResty: Nginx flavour with Lua scripting, used to trim error bodies for Telegram logs.
blue-green: Deployment strategy where the previous stable container remains ready for instant rollback.

Risk Matrix & Boundary Conditions

Unsupported: WebSocket or gRPC endpoints; Telegram delivers only HTTPS/1.1.
Side effect: Enabling drop_pending_updates=True during setWebhook erases the queue—irreversible.
Hard limit: 1 update/30 s per chat in groups >200 members to counter spam; your webhook will not see faster real-time traffic.
Alternative: If you need full-duplex or push-back, switch to getUpdates long-polling with HTTP/2 and streamline your infra to handle the extra round-trips.

Future Trend / Version Outlook

The Bot API roadmap (public issue tracker #2208) hints at batch webhooks—multiple updates per POST—which would cut request overhead by 60% for high-volume fleets but require a new Content-Type: application/vnd.telegram-updates+json. Start designing your parser to handle both single and array payloads today, and you will be ready the day the flag goes live. Until then, keep a local inspection mirror; it remains the fastest way to turn silent failures into actionable logs.
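A parser that tolerates both shapes is small. The sketch below assumes a batch would arrive as a plain JSON array of update objects; that wire format is speculative until Telegram publishes one, and parse_updates is a hypothetical helper name.

```python
import json


def parse_updates(body: bytes) -> list:
    """Return a list of updates whether the body holds one object or an array."""
    data = json.loads(body)
    if isinstance(data, dict):
        updates = [data]   # today's format: one update per POST
    elif isinstance(data, list):
        updates = data     # assumed batch format: array of updates
    else:
        raise ValueError("expected a JSON object or an array of objects")
    for update in updates:
        if not isinstance(update, dict) or "update_id" not in update:
            raise ValueError("every update must be an object with update_id")
    return updates
```

Downstream handlers then iterate over the returned list in every case, so switching to batches later would not change their signature.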