
Telegram Bot API Rate Limits: Queuing & Retry Patterns Explained
1. Why Rate Limits Matter for Compliance
Telegram’s Bot API enforces two hard ceilings: one message per second to any single chat (20 per minute in groups) and a global broadcast threshold of roughly 30 messages per second across all chats. Breaches return 429 (Too Many Requests) errors and—crucially—are not logged by Telegram. If your backend does not retain the failed payload and the HTTP response body, an auditor has no way to prove you did not silently drop user data. Building an internal queue with at-least-once retry is therefore not a performance luxury; it is a retention requirement.
Compliance tip: store the full Telegram response (`ok`, `error_code`, `description`) for 90 days, the same window Telegram keeps message content on its CDN. Aligning retention periods simplifies cross-audit.
2. 2025 Limit Matrix—What Actually Changed
No structural change arrived in 2025, but the edge behaviour tightened:
- Media group messages (albums) now share the same 30/sec bucket as text, ending the old workaround of “upload 10 photos at once to escape text limit”.
- The undocumented burst allowance (≈5× baseline for the first second) is less predictable; bots launched after March 2025 show roughly 15 % tighter burst tolerance in empirical tests.
Because Telegram does not version its backend, the only observable signal is increased retry_after values in 429 replies. Treat any numeric advice older than 2024 as suspect.
3. Metric-Driven Planning: Speed vs. Retention vs. Cost
Before writing code, pick the KPI that hurts you most:
| Objective | Metric | Cost Driver |
|---|---|---|
| Low latency | P95 enqueue-to-sent ≤ 1 s | CPU (parallel workers) |
| Audit ready | 100 % failed-msg retention 90 d | Storage (S3/GCS) |
| No over-spend | Worker hours ≤ N | Concurrency limit |
Map these to a concrete workload. Example: a news bot reaching 200 k subscribers, publishing 300 stories/day, median fan-out 20 chats per story → 6 k outbound msgs/day. At 1 msg/sec global you need only one worker, but you must still queue because spikes (breaking news) can push 200 msgs in 10 s. The queue absorbs the spike while the worker drains at the allowed rate.
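Sketched as arithmetic, the sizing above works out as follows (numbers taken straight from the example):

```python
# Back-of-envelope capacity check for the news-bot example above.
stories_per_day = 300
fanout = 20                                 # median chats per story
daily_msgs = stories_per_day * fanout       # 6_000 outbound messages/day

# A breaking-news spike: 200 messages arrive in 10 s, but the worker
# drains at the 1 msg/sec global rate, so the queue absorbs ~200 s of lag.
global_rate = 1.0                           # msg/sec
spike_msgs = 200
drain_seconds = spike_msgs / global_rate    # 200 s to empty the spike
```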
4. Architecture Blueprint—At-Least-Once with Idempotency
A minimal compliant pipeline contains:
- Acceptor – HTTP endpoint that receives your application event, assigns a `uuid`, and responds 202 in < 100 ms.
- Queue – Postgres (with SKIP LOCKED) or Redis Streams; both give millisecond visibility and offer durable logs (WAL / AOF) for audits.
- Worker – single-threaded event loop that sleeps `retry_after` seconds when asked, else 1÷limit seconds between sends.
- Ledger – append-only table: `msg_id`, `chat_id`, `payload_hash`, `http_status`, `telegram_msg_id`, `created_at`.
Idempotency is achieved by storing the Telegram message_id returned on success; if the same payload_hash is retried later, skip sending and return the cached message_id. This prevents double messages during network retries and satisfies most GDPR “accidental duplication” clauses.
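A minimal sketch of that dedup path, using an in-memory SQLite ledger in place of whatever store you actually run; the table mirrors the columns listed above, while `send_once` and the column of cached IDs are illustrative names, not Telegram API entities:

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ledger (
        msg_id          TEXT PRIMARY KEY,
        chat_id         INTEGER,
        payload_hash    TEXT UNIQUE,
        http_status     INTEGER,
        telegram_msg_id INTEGER,
        created_at      TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def send_once(msg_id, chat_id, payload, send):
    """Send `payload` unless an identical payload was already delivered.

    `send` is your real Telegram call; it returns the Telegram
    message_id on success.
    """
    digest = hashlib.sha256(payload.encode()).hexdigest()
    row = conn.execute(
        "SELECT telegram_msg_id FROM ledger WHERE payload_hash = ?",
        (digest,),
    ).fetchone()
    if row:                           # duplicate: return cached message_id
        return row[0]
    tg_id = send(chat_id, payload)
    conn.execute(
        "INSERT INTO ledger (msg_id, chat_id, payload_hash, http_status,"
        " telegram_msg_id) VALUES (?, ?, ?, 200, ?)",
        (msg_id, chat_id, digest, tg_id),
    )
    return tg_id
```

Retrying the same payload hits the ledger, not Telegram, so network-level retries cannot produce double messages.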
5. Queuing Options—Postgres vs. Redis vs. Cloud Native
Postgres (RDS, Cloud SQL)
Good when you already run relational data; use advisory locks or FOR UPDATE SKIP LOCKED. Write throughput ~5 k inserts/sec on 2 vCPU, enough for 1 M msgs/day. WAL doubles as audit trail; turn on log_statement = 'mod' for SOX-style proof.
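The SKIP LOCKED dequeue can be sketched as follows; the `queue` table and its `id`/`payload`/`status` columns are assumed names for illustration, not anything Postgres or Telegram prescribes:

```python
# Illustrative claim-one-row statement for the Postgres queue option.
# Concurrent workers skip rows another worker has already locked, so no
# message is ever claimed twice.
DEQUEUE_SQL = """
UPDATE queue
   SET status = 'sending'
 WHERE id = (
         SELECT id
           FROM queue
          WHERE status = 'pending'
          ORDER BY id
            FOR UPDATE SKIP LOCKED
          LIMIT 1
       )
RETURNING id, payload;
"""

def dequeue(cur):
    """Claim one pending row via any DB-API cursor; None when empty."""
    cur.execute(DEQUEUE_SQL)
    return cur.fetchone()
```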
Redis Streams
Lower latency (< 1 ms) and dead-simple consumer groups. Persistence requires AOF with fsync on every write; otherwise you lose recent in-memory data on a crash. Cost is memory-bound: 1 k msgs (1 KB each) ≈ 1 MB RAM, negligible but scaling linearly.
Cloud tasks (SQS, Pub/Sub)
Built-in visibility timeout maps neatly to retry_after. Downsides: bounded retention (14 days max on SQS standard, short of a 90-day audit window) and no strict ordering; if you need in-order delivery for live sports scores, prefer Redis or Kafka.
6. Retry & Back-off Patterns That Telegram Accepts
Telegram returns two clues:
- `429` + `retry_after` (seconds) – honour this value exactly; the backend has already calculated the margin.
- `502`/`520` – cloud gateway errors with no `retry_after`; use exponential back-off starting at 1 s, capped at 30 s, with ±20 % jitter.
Never retry more than 5 times; after that, write to a dead-letter table and alert. Persistent failures usually mean banned bot or invalid token—no amount of waiting helps.
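The two retry rules plus the five-attempt cap condense into one small policy function; `send` here is a stand-in for your actual Telegram call, assumed to return `(status, retry_after_or_None)`:

```python
import random
import time

MAX_RETRIES = 5
BASE, CAP, JITTER = 1.0, 30.0, 0.2       # seconds / fraction, per the rules above

def deliver(send, payload, sleep=time.sleep):
    """Honour retry_after exactly on 429, exponential back-off with
    +/-20% jitter on gateway errors, give up after 5 attempts."""
    for attempt in range(MAX_RETRIES):
        status, retry_after = send(payload)
        if status == 200:
            return True
        if status == 429 and retry_after is not None:
            sleep(retry_after)            # exact value, no padding
        elif status in (502, 520):
            delay = min(CAP, BASE * 2 ** attempt)
            delay *= 1 + random.uniform(-JITTER, JITTER)
            sleep(delay)
        else:                             # 400/403 etc.: retrying never helps
            return False
    return False                          # caller writes to dead-letter table
```

A `False` return is the signal to dead-letter and alert, as described above.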
7. Observability—What to Log and for How Long
Minimal telemetry for an audit trail:
| Field | Source | Retention |
|---|---|---|
| `uuid` | your acceptor | 90 d |
| `chat_id` | payload | 90 d (hash if GDPR) |
| `telegram_msg_id` | API response | 90 d |
| `http_status` | worker | 90 d |
| `sent_at` | worker | 90 d |
Store in immutable object storage (e.g., S3 with Object Lock). Query performance is not critical; auditors care about completeness, not millisecond look-ups.
8. Platform Differences—Where You Configure Nothing
Telegram Bot API is platform-agnostic; rate limits are enforced server-side. However, developers sometimes look for client-side clues:
- Desktop/macOS: @BotFather → /mybots → <Bot> → API Token (same on every OS).
- Android/iOS: path identical; no additional switches for rate limits exist.
If you hoped for a “slow mode” toggle, there is none; throttling is your responsibility.
9. A/B Two Plans—Single Worker vs. Multi-Region Fleet
Plan A: Single Worker (Cost < $5/month)
A 256 MB VPS or Cloud Run instance with concurrency=1. Postgres queue lives on the same VM. Throughput ceiling = 30 msg/sec (Telegram’s global broadcast limit), enough for 2.5 M msgs/day. Audit storage: 90 d × 2.5 M × 1 KB ≈ 225 GB, ~$5 on AWS S3 Glacier.
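The Plan A figures check out as straightforward arithmetic:

```python
# Sanity-check the quoted Plan A numbers.
rate = 30                                  # msg/sec global ceiling
per_day = rate * 86_400                    # 2_592_000, i.e. the ~2.5 M/day above

storage_bytes = 90 * 2_500_000 * 1_000     # 90 d retention, ~1 KB per record
storage_gb = storage_bytes / 1e9           # 225 GB, matching the estimate
```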
Plan B: Regional Fleet (P95 latency < 500 ms)
Deploy 3 workers in eu-central, us-east, ap-sing. Each worker owns a Redis stream consumer group; consistent hashing by chat_id guarantees ordering. Add Stackdriver alert if queue depth > 1 000. Cost jumps to ~$90/month (3 × 1 vCPU, Redis 2 GB, multi-region egress), but you survive zone outages and keep latency low for global audiences.
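A minimal sketch of the chat_id-to-worker mapping; note this is plain modulo hashing, which preserves per-chat ordering but, unlike a full consistent-hash ring, remaps many chats whenever the worker list changes:

```python
import hashlib

WORKERS = ["eu-central", "us-east", "ap-sing"]

def worker_for(chat_id: int) -> str:
    """Stable chat_id -> worker mapping so exactly one consumer owns
    each chat, keeping that chat's messages in order."""
    digest = hashlib.sha256(str(chat_id).encode()).digest()
    return WORKERS[int.from_bytes(digest[:8], "big") % len(WORKERS)]
```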
When not to scale out
If your daily volume < 50 k msgs, the single-worker model is cheaper and simpler to audit. Extra shards introduce clock skew across regions, complicating the 90-day immutable log.
10. Validation & Load Test Without Getting Banned
Telegram has no sandbox; every call hits production. Safe procedure:
- Create a second bot (@BotFather → /newbot) used exclusively for load tests.
- Create a private group with only your test accounts; limit to 30 members to avoid user reports.
- Ramp from 1 msg/sec to 35 msg/sec in 5-min steps; stop when `retry_after` appears.
- Record the highest sustained rate; set your worker 20 % below that.
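The ramp procedure can be sketched as a small helper; `probe` is a hypothetical callable that sends at a given rate for one step and reports any `retry_after` it saw (or `None` if the step stayed clean):

```python
def find_safe_rate(probe, start=1, stop=35, step=5):
    """Ramp the send rate until the probe reports a retry_after, then
    return 80% of the last clean rate, per the procedure above."""
    best = None
    for rate in range(start, stop + 1, step):
        if probe(rate) is not None:   # throttled: stop ramping
            break
        best = rate
    return None if best is None else best * 0.8
```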
Never test on your production bot—token resets erase message history from Telegram’s admin interface, breaking audit continuity.
11. Troubleshooting Matrix
| Symptom | Likely Cause | Check | Fix |
|---|---|---|---|
| 429 every second | Worker ignores `retry_after` | Log value vs. actual sleep | Honour exact seconds |
| 400 Bad Request | Malformed markdown/HTML | Parse mode, entities | Escape input or disable preview |
| 403 Forbidden | User blocked bot | `can_send_messages`? | Mark chat inactive, stop retries |
| Success but message missing | Idempotency clash | `payload_hash` dup | Cache Telegram `message_id` |
12. Best-Practice Checklist (Copy into PRD)
- Always return HTTP 202 to upstream caller before enqueueing.
- Store full Telegram response body ≥ 90 days, WORM storage preferred.
- Single-thread per chat to keep order; shard by `chat_id` if multi-worker.
- Cap retries at 5; dead-letter afterwards.
- Respect exact `retry_after`; do not add arbitrary padding.
- Use idempotency key (`payload_hash`) to prevent duplicates.
- Alert on queue depth > 1 000 or DLQ growth > 10 / hour.
- Run annual load test on separate bot; document safe rate.
- Keep worker concurrency ≤ 1 unless you own distributed lock.
- Version your ledger schema; include raw JSON column for future unknown fields.
13. Future Outlook—What May Change in 2026
Based on public Bot API commits (v7.5, Oct 2025), Telegram is experimenting with adaptive rate hints inside successful responses. This optional block is not yet live, but if rolled out, it would let bots apply their own congestion-control algorithm instead of hard-coding 30. Recommendation: keep the worker logic table-driven so you can ingest `rate_hint` without a redeploy.
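One way to keep that logic table-driven; note `rate_hint` is the hypothetical future field discussed above, not part of the live Bot API:

```python
# The current send rate lives in a mutable table the worker re-reads on
# every message, so a future rate_hint can be folded in without a redeploy.
rate_table = {"default": 30.0}   # msg/sec

def current_delay(response: dict) -> float:
    """Seconds to sleep between sends. If a response ever carries a
    rate_hint (assumed field name), adopt it; otherwise keep the
    configured default."""
    hint = response.get("rate_hint")
    if hint:
        rate_table["default"] = float(hint)
    return 1.0 / rate_table["default"]
```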
14. Key Takeaways
Telegram rate limits look forgiving until you scale; a 10 k spike can wipe out your delivery guarantees in minutes. A single-worker queue with Postgres gives you 90-day auditability for pennies. Move to a regional fleet only when latency or availability, not throughput, becomes the bottleneck. Whatever architecture you pick, log the full Telegram response and honour retry_after to the millisecond—your future auditor will open the ledger, not the code.