How to Add Filtering & Pagination to Bot Search

Why Pagination and Filtering Matter for Auditable Bots

Telegram bots that serve enterprise groups or public channels often accumulate thousands of records per day. Without server-side pagination and filtering, every /search request turns into a full-table scan that (1) exhausts memory, (2) can return records that have already passed their retention window, and (3) complicates later forensics. Adding controlled pagination keeps response times sub-second and produces clear audit points: “offset 200, limit 50, filter=invoice:unpaid, requestor_user_id=123” is trivial to log and replay.

Beyond raw speed, signed pagination tokens double as immutable records of each query: a token fixes the filter and the position it was issued for. When an auditor asks, “Who saw what, when?” you can replay the exact sequence of tokens without second-guessing filter state. This becomes critical when the same user revisits the bot days later: because rows are soft-deleted rather than physically removed, the retained token together with the logged message_ids still reconstructs the original slice.

Core Building Blocks in Bot API 7.x

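There is no native “search” object. In inline mode, pagination rides on an offset string: the next_offset you pass to answerInlineQuery comes back as the offset of the follow-up inline query. In ordinary message replies you emulate it yourself with an offset and limit. Filtering is left to your backend. The two levers you own are: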

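Inline query ID – a short opaque string that can only be answered for about 10 s, perfect for short-lived audit correlation.
Callback data – 1 to 64 bytes per button. Compound filters need a compact encoding (short keys or a server-side lookup ID); gzip framing plus base64 usually makes short payloads longer, so measure before relying on it.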

Keep both in your retention store; Telegram does not replay them once acknowledged. Treat the inline query ID as a request UUID and the callback data as a signed continuation cookie—lose either and the audit chain snaps.
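
As a rough illustration of the 64-byte budget, the sketch below measures a payload in UTF-8 bytes (the limit counts bytes, not characters) and compares a raw JSON filter with its gzip+base64 form. The helper name fitsCallbackData and the sample filter keys are made up for this example, and Node.js 16 or later is assumed for the base64url encoding.

```javascript
const zlib = require('node:zlib');

const CALLBACK_DATA_MAX_BYTES = 64; // Bot API limit for callback_data

// Hypothetical helper: true when the string fits into callback_data (1 to 64 bytes).
function fitsCallbackData(data) {
  const bytes = Buffer.byteLength(data, 'utf8');
  return bytes >= 1 && bytes <= CALLBACK_DATA_MAX_BYTES;
}

// Sample compound filter with illustrative short keys.
const raw = JSON.stringify({ o: 50, f: 'unpaid', c: '7af9fc' });

// The same payload after gzip and base64url encoding.
const gzipped = zlib.gzipSync(Buffer.from(raw, 'utf8')).toString('base64url');

// Print size and verdict for both encodings.
console.log(Buffer.byteLength(raw), fitsCallbackData(raw));
console.log(Buffer.byteLength(gzipped), fitsCallbackData(gzipped));
```

For payloads this small the gzip header and trailer outweigh any savings, which is why it is worth printing both sizes before choosing an encoding.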

Minimal Implementation Path (Node.js Example)

The shortest path packs the database cursor, the applied filter, and a keyed checksum over both into the callback data. The client treats the token as opaque, and the checksum lets the bot detect any tampering.

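// 1. Client sends /search unpaid
// 2. Bot replies with an inline keyboard (one row, one button). The token is
//    base64 of {"o":50,"f":"unpaid","c":"7af9fc"}: 53 bytes with the prefix,
//    which stays inside the 64-byte callback_data limit.
const inline_keyboard = [[ { text: 'Next 50', callback_data: 'page:eyJvIjo1MCwiZiI6InVucGFpZCIsImMiOiI3YWY5ZmMifQ==' } ]];
// 3. Click arrives; base64-decode, verify checksum c, run SQL with LIMIT 50 OFFSET 50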

Because the checksum covers the offset and filter values, a user who edits the callback data merely receives an empty list: no information disclosure, no crash. Rotate the signing secret monthly, and keep accepting the previous secret for verification during a grace period so that active sessions do not break; tokens older than that simply stop verifying.
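
One way to sketch the sign-and-verify step with Node's built-in crypto module is shown below. The token layout (page:<payload>.<tag>), the 8-byte truncated HMAC-SHA256 tag, the short keys o and f, and the hard-coded demo secret are all assumptions chosen for illustration, not a prescribed format.

```javascript
const crypto = require('node:crypto');

// Assumption: in a real bot the secret comes from configuration, not source code.
const SECRET = 'demo-secret-change-me';

// HMAC-SHA256 over the payload, truncated to 8 bytes, base64url-encoded (11 chars).
function tag(payload, secret) {
  return crypto.createHmac('sha256', secret).update(payload).digest().subarray(0, 8).toString('base64url');
}

// Build "page:<payload>.<tag>", where payload is base64url(JSON(state)).
function encodeToken(state, secret = SECRET) {
  const payload = Buffer.from(JSON.stringify(state), 'utf8').toString('base64url');
  const token = `page:${payload}.${tag(payload, secret)}`;
  if (Buffer.byteLength(token) > 64) throw new Error('callback_data exceeds 64 bytes');
  return token;
}

// Return the decoded state, or null when the token is malformed or was tampered with.
function decodeToken(token, secret = SECRET) {
  const match = /^page:([A-Za-z0-9_-]+)\.([A-Za-z0-9_-]+)$/.exec(token);
  if (!match) return null;
  const [, payload, given] = match;
  const expected = Buffer.from(tag(payload, secret));
  const actual = Buffer.from(given);
  // Compare in constant time; lengths must match before timingSafeEqual is called.
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) return null;
  try {
    return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
}

const token = encodeToken({ o: 50, f: 'unpaid' });
console.log(token, decodeToken(token));
```

A handler would answer with an empty list whenever decodeToken returns null. The 64-bit tag is a space trade-off; a longer tag is safer if the payload leaves room.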

Platform Differences: Where to Surface Controls

Android & iOS

Inline buttons render above the system keyboard; ideal for quick filter pills. Keep label ≤20 Cyrillic glyphs or ≤24 Latin to prevent line wrap on 5" phones. On iOS, the bottom inset increases when the dictation bar is active—leave an extra 4 px grace row to avoid truncation.

Desktop (macOS/Windows/Linux)

Buttons stack horizontally until width exhaustion, then overflow into a second row. You can fit more filter chips, but avoid >6 options—keyboard-only users must Tab through each. Desktop clients also expose tooltips on long-press; use them to surface the active filter summary without cluttering the label.

Web K & Web A

Touch targets follow Material guidelines (48×48 dp). Add 8 px gap in your CSS if you embed a Web App to prevent double-tap zoom. Web K (K-web) caches inline buttons aggressively; if you change filter labels, increment a version parameter inside callback data to force refresh.

Compliance Hooks: What to Log and for How Long

Under GDPR and most finance clauses you need request, response, legal_basis, erasure_date. Capture:

update_id and inline_query_id (or callback_query_id)
user_id, chat_id (hash if pseudonymisation is required)
raw filter payload, returned message_ids
server-side execution time (helps prove “proportionate effort”)

Store for the statutory term (often 5–7 years for invoices), then purge both index and blobs. Telegram does not timestamp your logs—add server UTC. For cross-border transfers, log the data-center region (e.g., “ams” or “sin”) to satisfy Schrems II transfer impact assessments.
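
The fields listed above could be assembled into one log record along these lines. This is a minimal sketch: the event shape, the key name, the 7-year retention constant, and the use of HMAC-SHA256 for pseudonymisation are assumptions made for the example.

```javascript
const crypto = require('node:crypto');

// Assumptions: both values would come from configuration in a real deployment.
const PSEUDONYM_KEY = 'demo-pseudonym-key';
const RETENTION_YEARS = 7;

// Keyed hash: the same ID always maps to the same pseudonym, but the raw ID is not stored.
function pseudonymise(id, key = PSEUDONYM_KEY) {
  return crypto.createHmac('sha256', key).update(String(id)).digest('hex');
}

// Build one audit record; `now` is injectable so the function is easy to test.
function buildAuditRecord(event, now = new Date()) {
  const erasure = new Date(now.getTime());
  erasure.setUTCFullYear(erasure.getUTCFullYear() + RETENTION_YEARS);
  return {
    update_id: event.updateId,
    callback_query_id: event.callbackQueryId,
    user_id: pseudonymise(event.userId),
    chat_id: pseudonymise(event.chatId),
    filter_payload: event.filterPayload,
    message_ids: event.messageIds,
    execution_ms: event.executionMs,
    legal_basis: event.legalBasis,
    logged_at: now.toISOString(),                      // server UTC timestamp
    erasure_date: erasure.toISOString().slice(0, 10),  // YYYY-MM-DD, set at insert time
  };
}
```

Setting erasure_date when the record is written means the purge job only needs a date comparison later.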

Exceptions and Side Effects

1. Rate-limit Overflows

The Bot API tolerates roughly 30 outgoing messages per second across all chats, and every inline or callback query has to be answered within a short window (about 10 s). If a member paginates rapidly while the database is slow, cache the next page in Redis and serve it from memory; otherwise late answers fail with a “query is too old” error. Use a hashed composite key user_id:query_hash:page with a 15 s TTL to bridge the gap.
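
The key layout and TTL behaviour can be sketched without a Redis dependency. Below, TtlCache is an in-memory stand-in written for this example (a real deployment would use Redis with an expiry), and pageCacheKey is a hypothetical helper name.

```javascript
const crypto = require('node:crypto');

// Composite key user_id:query_hash:page. The filter is hashed so raw values never
// appear in key names. Note: JSON.stringify is sensitive to property order.
function pageCacheKey(userId, filter, page) {
  const queryHash = crypto.createHash('sha256').update(JSON.stringify(filter)).digest('hex').slice(0, 16);
  return `${userId}:${queryHash}:${page}`;
}

// Minimal in-memory cache with per-entry expiry and an injectable clock.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }

  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  // Returns the cached value, or undefined when missing or expired.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

const cache = new TtlCache();
cache.set(pageCacheKey(123, { status: 'unpaid' }, 2), ['row 51', 'row 52'], 15);
```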

2. Large Offset = Deep Paging Death

MySQL and Postgres slow down once the offset passes roughly 10 000 because the skipped rows are still read and then discarded; Elasticsearch by default rejects from + size beyond 10 000 (index.max_result_window). Switch to the seek method: WHERE id < $last_seen_id ORDER BY id DESC LIMIT 50. Store last_seen_id inside the callback token instead of a numeric offset. The seek method also produces stable slices when rows are inserted concurrently, so no entries are duplicated or skipped.
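
A parameterised version of the seek query might be built as follows. The table and column names (invoices, status, id) are placeholders for this sketch, and the $n placeholders follow PostgreSQL conventions.

```javascript
// Build a seek-method query. The first page has no cursor; later pages add "id < $n".
function buildSeekQuery({ status, lastSeenId = null, limit = 50 }) {
  const values = [status];
  let where = 'status = $1';
  if (lastSeenId !== null) {
    values.push(lastSeenId);
    where += ` AND id < $${values.length}`;
  }
  values.push(limit);
  const text = `SELECT id, status FROM invoices WHERE ${where} ORDER BY id DESC LIMIT $${values.length}`;
  return { text, values };
}

// Cursor for the next page: the smallest id on the current page,
// or null when the page came back short (no further rows to fetch).
function nextCursor(rows, limit = 50) {
  return rows.length === limit ? rows[rows.length - 1].id : null;
}

console.log(buildSeekQuery({ status: 'unpaid' }));
console.log(buildSeekQuery({ status: 'unpaid', lastSeenId: 951 }));
```

The value returned by nextCursor is what would go into the callback token in place of a numeric offset.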

3. Edited Messages Break Audit Chain

If you later edit a sent page (e.g., prepend “[ARCHIVED]”), the original text is overwritten in Telegram cloud but not in user notifications already delivered. For evidentiary integrity, send a new message rather than editing. If space is a concern, delete the obsolete message within 48 h of sending it, since the Bot API does not let a bot delete older messages; the audit log still retains the original message_id.

Warning: Do not rely on editMessageText for compliance-level changes; the previous version is unrecoverable in client logs.

Verification & Rollback

Before releasing to production, run this 3-step smoke test:

Generate 20 000 dummy rows, user_id randomized.
Request offsets 0, 10, 100, 1 000, 5 000 with the same filter (one that matches more than 5 000 rows); record server time.
Assert that offset 5 000 returns in <600 ms on your cheapest VM (1 vCPU, 1 GB RAM).

If above 600 ms, replace offset with seek method and re-test. Rollback is instant: disable the new /search handler route; old commands remain unaffected because pagination tokens are stateless. Keep the previous handler behind a feature flag for one sprint to allow hot rollback without redeploy.
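
The timing step of the smoke test could be automated roughly like this. fetchPage is a hypothetical async function wrapping your real search query; each requested position is treated here as a row offset, and the 600 ms budget is the threshold named above.

```javascript
const { performance } = require('node:perf_hooks');

// Time fetchPage(offset) for each offset and flag any call that exceeds the budget.
async function smokeTest(fetchPage, offsets = [0, 10, 100, 1000, 5000], budgetMs = 600) {
  const results = [];
  for (const offset of offsets) {
    const start = performance.now();
    await fetchPage(offset);
    const elapsedMs = performance.now() - start;
    results.push({ offset, elapsedMs, ok: elapsedMs < budgetMs });
  }
  return results;
}

// Example run against a stub; replace the stub with a call into your search handler.
smokeTest(async (offset) => [`row at ${offset}`]).then((results) => {
  for (const r of results) console.log(r.offset, r.elapsedMs.toFixed(1), r.ok ? 'ok' : 'TOO SLOW');
});
```

Measured this way, the numbers include driver and network overhead, which is usually what the user experiences.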

When NOT to Add Pagination

Skip server-side pagination when:

Total result set <200 rows and growth <5 rows/day (client-side slice is simpler).
Regulatory requirement mandates “one-shot export” (e.g., SAR production).
You use a third-party full-text SaaS that already bills per API call, so every extra page is another billed request.

In these edge cases, deliver the entire payload as a single file (CSV, optionally zipped) and log the SHA-256 hash as the audit artifact. The file itself can be auto-deleted after 30 days while the hash remains provable.
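
Hashing the export could look like the sketch below. The toCsv helper and its quoting rules are simplifications written for this example; the important part is that the digest is taken over the exact bytes that were delivered.

```javascript
const crypto = require('node:crypto');

// Serialise rows to CSV. Every value is quoted and embedded quotes are doubled.
function toCsv(rows, columns) {
  const quote = (v) => `"${String(v).replace(/"/g, '""')}"`;
  const lines = [columns.map(quote).join(',')];
  for (const row of rows) lines.push(columns.map((c) => quote(row[c])).join(','));
  return lines.join('\r\n') + '\r\n';
}

// Hex SHA-256 digest of a string or Buffer.
function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

const csv = toCsv([{ id: 1, status: 'unpaid' }], ['id', 'status']);
const fileBytes = Buffer.from(csv, 'utf8'); // the bytes sent to the user
console.log(sha256Hex(fileBytes));          // store this value in the audit log
```

If the file is zipped before delivery, hash the archive rather than the CSV inside it, so the logged value matches what the recipient received.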

Mini App Hybrid: Offloading Heavy Filters

Telegram Mini Apps (HTML5) can host complex faceted filters (date pickers, sliders, multi-select) and then hand the compact query back to the bot as web_app_data. The bot receives a single signed blob, verifies its HMAC with a server-side secret, and proceeds with normal pagination. This keeps the chat history uncluttered while preserving auditability: log the base64 payload and HMAC signature.

Example: A Mini App lets managers select “Status=Overdue & Amount>$500 & Date=Last Quarter”. On submit, the app passes the signed blob to the bot with Telegram.WebApp.sendData; the bot decodes it, translates it into a parameterised SQL query, and returns the first 50 rows in a message with an inline keyboard for paging. The manager never leaves Telegram, yet the query complexity is offloaded to a full browser engine.
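
The "translate to SQL" step is where client input meets the database, so a whitelist is worth sketching. The filter shape ([{ field, op, value }]), the FILTERABLE map, and the column names below are assumptions for illustration; values always travel as bound parameters.

```javascript
// Whitelist: maps a filter key to a real column and the operators allowed on it.
const FILTERABLE = {
  status: { column: 'status', ops: ['='] },
  amount: { column: 'amount', ops: ['>', '>=', '<', '<=', '='] },
  date_from: { column: 'issued_at', ops: ['>='] },
  date_to: { column: 'issued_at', ops: ['<'] },
};

// Turn a list of conditions into a parameterised WHERE clause.
// Unknown fields or operators throw, so no client-supplied text reaches the SQL string.
function buildWhere(conditions) {
  const parts = [];
  const values = [];
  for (const { field, op, value } of conditions) {
    const known = Object.prototype.hasOwnProperty.call(FILTERABLE, field);
    const rule = known ? FILTERABLE[field] : undefined;
    if (!rule || !rule.ops.includes(op)) throw new Error(`unsupported filter: ${field} ${op}`);
    values.push(value);
    parts.push(`${rule.column} ${op} $${values.length}`);
  }
  return { where: parts.length ? parts.join(' AND ') : 'TRUE', values };
}

console.log(buildWhere([
  { field: 'status', op: '=', value: 'overdue' },
  { field: 'amount', op: '>', value: 500 },
  { field: 'date_from', op: '>=', value: '2024-01-01' },
  { field: 'date_to', op: '<', value: '2024-04-01' },
]));
```

Run this only after the blob's HMAC has been verified; the whitelist limits what a valid but unexpected filter can do.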

Performance Benchmarks (Empirical, 10.12)

Dataset size	Offset query	Seek cursor	p95 latency (offset → seek)
50 k rows	offset 1000	last_id	380 ms → 90 ms
200 k rows	offset 5000	last_id	1.8 s → 120 ms

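Test hardware: AWS t3.micro, Postgres 15, 1 GB shared buffer. Absolute numbers will vary with CPU and storage; the relative gain in this table is about 76 % at 50 k rows and about 93 % at 200 k rows, and it should grow with offset depth. Cold-cache runs were executed after echo 3 > /proc/sys/vm/drop_caches to simulate worst-case I/O. That command clears only the OS page cache, so restart Postgres as well if its shared buffers should start empty.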
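
For readers reproducing the table, p95 and the relative gain can be computed as below. The nearest-rank percentile definition is one common choice and an assumption here; the article does not say which definition its figures used.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Relative latency reduction in percent between two measurements.
function reductionPct(beforeMs, afterMs) {
  return ((beforeMs - afterMs) / beforeMs) * 100;
}

console.log(reductionPct(380, 90).toFixed(1));   // 50 k rows, offset vs seek
console.log(reductionPct(1800, 120).toFixed(1)); // 200 k rows, offset vs seek
```

Feeding the per-request timings of a load test into percentile(timings, 95) gives a figure comparable to the p95 column.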

Best-practice Checklist

Always sign callback data; never trust client-side edits.
Prefer seek-method pagination once table >10 k rows.
Log query_id, user_id, filter payload, returned IDs, execution_ms.
Set erasure_date at insert time to automate GDPR purge.
Disable edits on compliance messages; send new message instead.
Load-test with 2× expected peak traffic; Bot API limit is 30 msgs/sec global.
Document retention schedule in your privacy notice—authorities ask for it.

Review the checklist every quarter; Telegram occasionally raises rate limits or adds new payload fields that may shorten your compliance proofs.

Future Outlook (Bot API 7.2+)

Based on public MRs in the tdlib repo, Telegram is experimenting with server-side searchMessages for bots, including native next_offset string and built-in filtering by media type. If shipped, you can drop manual pagination tokens and rely on the same cursor string used by client apps—further simplifying audit logs to a single next_offset field. Until then, the patterns above remain the gold standard for compliant, high-scale bot search.

Take-away: Add pagination early, design for signed tokens, and log enough to replay any query. When (not if) regulations tighten, you’ll simply point at the immutable audit trail instead of rewriting half your bot.

Case Study 1: 5 000-Seat Neobank

A European neobank deployed a Telegram bot for internal expense approvals. Daily volume reached 8 000 invoices; auditors required full traceability for 7 years. The team implemented seek-method pagination with signed callback tokens. Filters: status, amount band, submitter department. Result: p95 latency dropped from 1.2 s to 110 ms; audit replay time shrank from 3 h to 15 min per month. Reusable tokens also enabled shift handover—managers exchanged deep-links without re-running queries.

Case Study 2: 50-Member Open-Source Project

A small FOSS community needed to search 600 past releases. Initial prototype used client-side slice; load was low but UX suffered on mobile. Switching to server-side pagination added 120 ms overhead yet eliminated the “scroll of death”. The maintainer stored logs in a public GitHub repo (user IDs hashed) to satisfy transparency requests. The project now serves as a reference implementation for other small teams unsure whether pagination is over-engineering.

Monitoring & Rollback Runbook

1. Alert Signals

p99 latency >1 s sustained for 5 min
Bot API 429 “Too Many Requests” >10/min
Postgres seq_scan spikes on search table
Redis memory >90 % (cached pages)

2. Location Drill

Check /metrics endpoint for bot_search_duration_seconds
Correlate callback_query_id in logs with slow SQL
Run EXPLAIN ANALYZE on the slow statement; a Seq Scan with rows=1000000 and loops=1 points to a missing index or a deep offset

3. Immediate Rollback

kubectl set env deployment/bot SEARCH_HANDLER=legacy
# feature flag reverts to unpaginated /search

4. Post-mortem Checklist

Re-run load test with 2× traffic
Add composite index on (filter_status, id DESC)
Document incident in compliance folder

FAQ

Q: Can I reuse pagination tokens across bots? A: No. Tokens are signed with a bot-specific secret, so cross-bot usage fails HMAC verification.
Q: Does seek method work with UUID primary keys? A: Yes, as long as the column is unique and indexed: store the last UUID and compare on it. Random (v4) UUIDs give a stable but arbitrary page order, so seek on (created_at, id) or use time-ordered UUIDs when results must be chronological.
Q: How to handle a user missing the 10-s inline window? A: Cache the result set in Redis for 30 s; answer with a “Retry” button pointing to the cached page.
Q: Is gzip+base64 safe for 64-byte callback? A: Only if you measure it. Empirically, a 5-field filter came to ~52 bytes, but gzip adds at least 18 bytes of header and trailer and base64 inflates the result by a third, so short filters can come out longer than the raw text. Keep the encoded payload under 59 bytes so it still fits after a prefix such as page:, and fall back to a server-side lookup ID when it does not.
Q: Do I need user consent to log message_ids? A: Under GDPR, logging pseudonymous message_ids can usually rest on legitimate interest rather than consent; document this in your LIA.
Q: Can Telegram replay a deleted message? A: No. Once deleted, the message content is unrecoverable for both user and bot; only the message_id in your audit log remains.
Q: Seek method vs cursor-based pagination? A: Seek is a form of cursor pagination; both avoid offset scanning, but seek uses a column value rather than an opaque cursor.
Q: What hash algorithm for callback checksum? A: Use a keyed hash, because an unkeyed checksum can be recomputed by anyone who edits the token. Keyed BLAKE2s at 128-bit is fast and fits in 22 base64 chars; HMAC-SHA256 is equally acceptable.
Q: How to prove “proportionate effort” to auditors? A: Provide logs showing execution_ms <600 ms and CPU <30 % under 2× load as evidence of reasonable engineering.
Q: Mini App vs inline buttons for accessibility? A: Inline buttons are the safer default for screen-reader users, because Mini Apps may not expose ARIA labels reliably.

Terminology

Seek method: Pagination using WHERE id < $last_seen_id instead of OFFSET; first mentioned in “Exceptions”.
Inline query ID: Telegram identifier for one inline query, answerable for about 10 s; see “Core Building Blocks”.
Callback data: Payload of up to 64 bytes attached to inline buttons; see “Minimal Implementation”.
LIA: Legitimate Interest Assessment under GDPR; see FAQ.
Deep paging death: Performance cliff when OFFSET >10 k; see “Exceptions”.
GDPR: General Data Protection Regulation; see “Compliance Hooks”.
SAR: Subject Access Request; see “When NOT to Add Pagination”.
BLAKE2s: Fast cryptographic hash; see FAQ.
HMAC: Keyed-hash message authentication code; see “Mini App Hybrid”.
p95 latency: 95th percentile response time; see benchmarks table.
t3.micro: AWS instance type used in benchmarks; see “Performance Benchmarks”.
seq_scan: Postgres sequential scan indicator; see runbook.
ARIA: Accessible Rich Internet Applications; see FAQ.
UUID: Universally unique identifier; see FAQ.
SHA-256: Secure hash algorithm; see “When NOT to Add Pagination”.

Risk Matrix & Boundary Conditions

Scenario	Risk	Mitigation / Alternative
Bot API drops `next_offset` support (future)	Audit replay breaks	Keep signed tokens as fallback; abstract behind interface
Callback data >64 bytes	Message rejected	Store filter server-side, put UUID in callback
Seek on non-indexed column	Full table scan	Add composite (filter, id) index; reverting to offset does not avoid the scan
Redis outage	Cached pages lost	Gracefully degrade by re-executing query; alert on miss
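
The "store filter server-side, put UUID in callback" mitigation from the table might be sketched like this. The Map is an in-memory stand-in for Redis or a database table, and the f: prefix and function names are assumptions made for the example.

```javascript
const crypto = require('node:crypto');

// In-memory stand-in for a server-side store.
const filterStore = new Map();

// Persist the filter and return callback_data that carries only a reference to it.
function storeFilter(filter) {
  const id = crypto.randomUUID(); // 36 ASCII characters
  filterStore.set(id, filter);
  return `f:${id}`;               // 38 bytes, well under the 64-byte limit
}

// Resolve callback_data back to the stored filter, or null when it is unknown.
function loadFilter(callbackData) {
  if (!callbackData.startsWith('f:')) return null;
  return filterStore.get(callbackData.slice(2)) ?? null;
}

const callbackData = storeFilter({ status: 'overdue', minAmount: 500, departments: ['ops', 'sales'] });
console.log(callbackData, loadFilter(callbackData));
```

The trade-off is that tokens are no longer stateless: the stored filters need the same retention and purge rules as the rest of the audit data.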

Respect these boundaries and you retain both performance headroom and legal defensibility—even if Telegram revises the underlying primitives tomorrow.