Queue-Based Architecture
Understanding Bklit's high-performance event processing pipeline
Bklit uses a queue-based architecture for high performance, scalability, and zero data loss.
Overview
Instead of inserting events directly into ClickHouse on every request, events are:
- Queued in Redis (fast, durable)
- Processed in batches by a background worker
- Inserted into ClickHouse in bulk (roughly 100x faster than one insert per event)
Architecture Diagram
Tracker SDK ←→ WebSocket Server → Redis Queue → Background Worker → ClickHouse
               (bklit.ws:8080)    (durable)     (batch 100)         (8-50ms)
                      ↓
            Broadcast to Dashboard
                      ↓
           Dashboard UI (WebSocket)
Components
1. WebSocket Server
Location: packages/websocket
Port: 8080
URL: wss://bklit.ws (production), ws://localhost:8080 (development)
Purpose:
- Maintain persistent connections with SDK (visitor browsers) and dashboards
- Receive events via WebSocket (pageviews, custom events)
- Validate API tokens (with Redis caching)
- Anonymize IP addresses and enrich with geolocation
- Push to Redis queue
- Broadcast events to connected dashboards
- Detect instant session end on WebSocket disconnect
Benefits:
- Ultra-fast response times
- No ClickHouse connection overhead
- Can scale independently
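The ingestion path described above can be sketched roughly as follows. This is a minimal illustration, not Bklit's actual code: the `IncomingEvent` shape, the `TokenCache` class, the 60-second TTL, and zeroing the last IPv4 octet as the anonymization method are all assumptions. A `Map` stands in for the Redis token cache and a plain array stands in for the `analytics:queue` list.

```typescript
// Hypothetical event shape; the real SDK payload is not documented here.
interface IncomingEvent {
  token: string;
  type: "pageview" | "custom";
  url: string;
  ip: string;
}

// Assumed anonymization: zero the last octet of an IPv4 address.
function anonymizeIpv4(ip: string): string {
  const parts = ip.split(".");
  if (parts.length !== 4) return "0.0.0.0"; // not IPv4: drop the address entirely
  return parts.slice(0, 3).concat("0").join(".");
}

// In-memory stand-in for the Redis token cache (the TTL value is an assumption).
class TokenCache {
  private entries = new Map<string, { valid: boolean; expiresAt: number }>();
  private lookup: (token: string) => boolean; // slow path, e.g. a database query
  private ttlMs: number;

  constructor(lookup: (token: string) => boolean, ttlMs: number = 60_000) {
    this.lookup = lookup;
    this.ttlMs = ttlMs;
  }

  isValid(token: string, now: number): boolean {
    const hit = this.entries.get(token);
    if (hit && hit.expiresAt > now) return hit.valid; // cache hit
    const valid = this.lookup(token);
    this.entries.set(token, { valid, expiresAt: now + this.ttlMs });
    return valid;
  }
}

// Validate, anonymize, then enqueue. Returns false when the token is rejected.
function ingest(
  event: IncomingEvent,
  cache: TokenCache,
  queue: string[],
  now: number,
): boolean {
  if (!cache.isValid(event.token, now)) return false;
  const payload = { type: event.type, url: event.url, ip: anonymizeIpv4(event.ip) };
  queue.unshift(JSON.stringify(payload)); // inserts at the head, like LPUSH
  return true;
}
```

Because the handler only touches the cache and the queue, it never waits on ClickHouse, which is what keeps the response path short.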
2. Redis Queue
Key: analytics:queue
Format: Redis list used as a FIFO queue (LPUSH to enqueue, RPOP to dequeue)
Persistence: AOF (append-only file)
Purpose:
- Buffer events for batch processing
- Ensure zero data loss
- Decouple ingestion from processing
Benefits:
- Durable (survives crashes)
- Fast (in-memory)
- Scalable (multiple workers can consume)
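To see why LPUSH plus RPOP gives first-in, first-out ordering, here is a small in-memory model of a Redis list. `ListQueue` is a hypothetical stand-in written for illustration; only the command names (LPUSH, RPOP, LLEN) come from Redis itself, and the event payloads are made up.

```typescript
// In-memory model of a Redis list, limited to the three commands used here.
class ListQueue {
  private items: string[] = []; // index 0 is the head (left) of the list

  // LPUSH: insert at the head and return the new length.
  lpush(value: string): number {
    this.items.unshift(value);
    return this.items.length;
  }

  // RPOP: remove and return the tail element, or null when the list is empty.
  rpop(): string | null {
    return this.items.pop() ?? null;
  }

  // LLEN: current queue depth.
  llen(): number {
    return this.items.length;
  }
}

const queue = new ListQueue();
queue.lpush(JSON.stringify({ type: "pageview", url: "/a" }));
queue.lpush(JSON.stringify({ type: "pageview", url: "/b" }));

// The oldest event ("/a") comes out first: producers write at one end,
// consumers read from the other.
const oldest = queue.rpop();
```

Since each RPOP removes the element it returns, several workers could pop from the same list without receiving the same event twice.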
3. Background Worker
Location: packages/worker
Polling: Every 1 second
Batch Size: Up to 100 events
Processing Steps:
- Pop batch from Redis queue
- Look up EventDefinition UUIDs from Postgres (cached)
- Create/update sessions in ClickHouse
- Batch insert events to ClickHouse
- Publish to Redis pub/sub for real-time
- Verify inserts succeeded
Benefits:
- Batch inserts are 100x faster than single inserts
- Can retry failed inserts
- Monitors queue depth
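The first and fourth processing steps (pop a batch, then bulk insert) might look like the sketch below. It is an illustration under stated assumptions rather than the worker's real code: an array stands in for the Redis list, `insertBatch` is a hypothetical placeholder for the ClickHouse bulk insert, and the EventDefinition lookup, session handling, and pub/sub steps are omitted.

```typescript
const BATCH_SIZE = 100; // "Up to 100 events", as stated above

// Pop up to `max` events from the tail of the queue (the RPOP side).
function popBatch(queue: string[], max: number = BATCH_SIZE): string[] {
  const batch: string[] = [];
  while (batch.length < max && queue.length > 0) {
    batch.push(queue.pop() as string);
  }
  return batch;
}

// One polling cycle: pop a batch and hand it to a bulk insert function.
// Returns the number of events processed in this cycle.
function runCycle(
  queue: string[],
  insertBatch: (rows: unknown[]) => void, // hypothetical ClickHouse bulk insert
): number {
  const batch = popBatch(queue);
  if (batch.length === 0) return 0; // nothing queued this tick
  insertBatch(batch.map((raw) => JSON.parse(raw)));
  return batch.length;
}
```

With 250 queued events, three cycles would insert 100, 100, and 50 rows, so ClickHouse sees three inserts instead of 250. Against a real Redis server, the pop loop could be replaced by a single `RPOP key count` call, which Redis supports from version 6.2.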
4. Real-time Updates
Channel: live-events (Redis pub/sub)
Consumer: WebSocket server
Flow:
- Worker publishes processed events
- WebSocket server broadcasts to connected clients
- Dashboard UI updates instantly
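The flow above is a publish/subscribe fan-out. The sketch below models it in-process: `PubSub` is a hypothetical stand-in for Redis PUBLISH/SUBSCRIBE, `DashboardSocket` stands in for a WebSocket connection, and filtering by `projectId` is an assumption about how events are routed to dashboards.

```typescript
type Listener = (message: string) => void;

// Minimal in-process pub/sub, standing in for Redis PUBLISH/SUBSCRIBE.
class PubSub {
  private channels = new Map<string, Set<Listener>>();

  // Register a listener and return a function that removes it again.
  subscribe(channel: string, listener: Listener): () => void {
    const listeners = this.channels.get(channel) ?? new Set<Listener>();
    listeners.add(listener);
    this.channels.set(channel, listeners);
    return () => {
      listeners.delete(listener);
    };
  }

  // Deliver a message and return the number of listeners that received it.
  publish(channel: string, message: string): number {
    const listeners = this.channels.get(channel);
    if (!listeners) return 0;
    listeners.forEach((listener) => listener(message));
    return listeners.size;
  }
}

// Hypothetical dashboard connection: anything with a send() method.
interface DashboardSocket {
  projectId: string;
  send(data: string): void;
}

// WebSocket server side: forward each live event to that project's dashboards.
function attachBroadcaster(bus: PubSub, dashboards: Set<DashboardSocket>): () => void {
  return bus.subscribe("live-events", (message) => {
    const event = JSON.parse(message) as { projectId: string };
    dashboards.forEach((socket) => {
      if (socket.projectId === event.projectId) socket.send(message);
    });
  });
}
```

The worker only needs to know the channel name, not which dashboards are connected, so the two processes stay decoupled.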
Performance Comparison
| Metric | Old (Direct Insert) | New (Queue-Based) | Improvement |
|---|---|---|---|
| API Response | 300-500ms | 1-3ms | 100x faster |
| ClickHouse Write | 1 insert/event | Batch 100 events | 100x faster |
| Data Loss Risk | Medium (if crash) | Zero (queue persists) | Eliminated |
| Scalability | Limited | Horizontal | Scales by adding workers |
| Observability | Console logs only | /terminal UI | Full visibility |
Local Development
Starting Services
Terminal 1:
pnpm dev:services
Wait for:
[prisma] Prisma dev running...
[websocket] 🌐 WebSocket server ready
[worker] 🔄 Background worker started
Terminal 2:
pnpm dev
Debug UI
Visit the /terminal page to see real-time event flow:
http://localhost:3000/{organizationId}/{projectId}/terminal
Filter by stage, search logs, trace individual events through the entire pipeline.
Monitoring
Queue Depth
Check if events are backing up:
docker exec bklit-redis-local redis-cli LLEN analytics:queue
The result should be 0 or very low, since the worker drains the queue quickly.
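If the LLEN value is not near zero, a rough drain-time estimate helps decide whether the backlog is a problem. The helper below is a back-of-the-envelope sketch that assumes the worker pops exactly one batch per poll and that no new events arrive in the meantime; both are simplifying assumptions, and `estimateDrainSeconds` is a hypothetical name. The defaults mirror the documented batch size (100) and polling interval (1 second).

```typescript
// Estimate how long a backlog takes to drain, assuming ONE batch per poll
// per worker and no new events arriving while it drains.
function estimateDrainSeconds(
  queueDepth: number,
  batchSize: number = 100,
  pollIntervalSeconds: number = 1,
  workers: number = 1,
): number {
  const polls = Math.ceil(queueDepth / (batchSize * workers));
  return polls * pollIntervalSeconds;
}

const smallBacklog = estimateDrainSeconds(250);            // 3 polls -> 3 seconds
const largeBacklog = estimateDrainSeconds(12_000);         // 120 polls -> 120 seconds
const twoWorkers = estimateDrainSeconds(12_000, 100, 1, 2); // 60 polls -> 60 seconds
```

Under these assumptions a single worker handles at most 100 events per second, so a depth that keeps growing suggests the arrival rate exceeds that.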
Worker Performance
The /terminal UI shows:
- Events processed per second
- Average latency per stage
- Batch sizes
- Error rates
ClickHouse Data
Verify events are being saved:
docker exec bklit-clickhouse-local clickhouse-client --database=analytics \
  --query "SELECT count() FROM page_view_event"
Zero Data Loss
The architecture guarantees zero data loss through:
- Redis AOF Persistence - Queue operations are logged to an append-only file on disk (how often the log is flushed depends on the Redis appendfsync setting)
- Worker Retry Logic - Failed inserts are retried with exponential backoff
- Dead Letter Queue - Permanently failed events moved to analytics:queue:failed
- /terminal Monitoring - Immediate visibility into any failures
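The retry and dead letter behavior listed above could be sketched as follows. The base delay (100 ms), cap (5 s), and attempt limit (4) are assumed values, `insert` is a hypothetical stand-in for the ClickHouse bulk insert, and an array stands in for the `analytics:queue:failed` list. To keep the sketch synchronous, it records the delays a real worker would sleep instead of actually waiting.

```typescript
// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped.
function backoffDelayMs(attempt: number, baseMs: number = 100, capMs: number = 5_000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}

interface RetryResult {
  inserted: boolean;
  delaysMs: number[]; // delays a real worker would sleep between attempts
}

// Try to insert a batch; after `maxAttempts` failures, move it to the dead letter queue.
function insertWithRetry(
  batch: string[],
  insert: (rows: string[]) => void, // hypothetical bulk insert that throws on failure
  deadLetter: string[],
  maxAttempts: number = 4,
): RetryResult {
  const delaysMs: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      insert(batch);
      return { inserted: true, delaysMs };
    } catch (error) {
      // Only schedule a delay if another attempt will follow.
      if (attempt < maxAttempts) delaysMs.push(backoffDelayMs(attempt));
    }
  }
  deadLetter.push(...batch); // stand-in for pushing to analytics:queue:failed
  return { inserted: false, delaysMs };
}
```

With these assumed values, a batch that never succeeds is retried after 100, 200, and 400 ms, then parked in the dead letter list where it can be inspected or replayed.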
Production Deployment
Status: Production deployment guide coming soon.
The architecture is production-ready and tested locally. Deployment involves:
- WebSocket server → Hetzner VPS (bklit.ws, port 8080, gray cloud DNS)
- Background worker → Hetzner (systemd service, co-located with ClickHouse)
- Dashboard → Vercel (Next.js app with tRPC API routes)
- Monitor queue depth and WebSocket connections
Backwards Compatibility
The new system is fully backwards compatible:
- Uses same ClickHouse tables
- Uses same EventDefinition UUIDs from Postgres
- Queries work identically
- Gradual migration possible (dual-write during transition)