System Design Lessons From Scaling to 1M Users
What I learned about caching, databases, and distributed systems while scaling a B2B platform.
Gaurav Talesara
AI Systems Engineer · Agentic Systems Architect

The Journey to 1M
Scaling from 1,000 to 1,000,000 users isn't 1,000x harder, but it does require fundamentally different thinking about architecture decisions.
Lesson 1: Cache Everything (But Invalidate Carefully)
Caching gave us the biggest performance wins. We cached:
- Database query results
- API responses
- Computed aggregations
- Session data
The hard part isn't caching — it's invalidation. We learned to be explicit about cache lifecycles and to prefer TTL-based expiration over complex invalidation logic.
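To make the TTL idea concrete, here is a minimal sketch of an in-process cache where every entry simply expires after a fixed lifetime. It is an illustration, not the cache we ran in production: the `TTLCache` class, its `get_or_compute` method, and the injectable `clock` parameter are all hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache where every entry expires after a fixed TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: serve the cached value
        value = compute()  # missing or expired: recompute and store
        self._store[key] = (now + self._ttl, value)
        return value
```

There is no invalidation path at all here. The lifecycle is explicit in one number, the TTL, and the price is that readers may see data up to that many seconds old.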
Lesson 2: Your Database is Lying to You
"The query is fast" means nothing without load testing. We had queries that took 10ms with 100 concurrent users and 10 seconds with 10,000.
Solutions that helped:
- Read replicas for read-heavy workloads
- Connection pooling (seriously, do this first)
- Query optimization based on actual production patterns
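For readers who haven't set up pooling before, the core mechanism fits in a few lines. This is a sketch under stated assumptions: it uses `sqlite3` from the standard library only so the example runs anywhere, and `ConnectionPool` is a hypothetical class written for this post. In practice you would more likely rely on the pooling built into your database driver or a proxy.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Iterator


class ConnectionPool:
    """Fixed-size pool: connections are opened once and reused across requests."""

    def __init__(self, database: str, size: int) -> None:
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a connection be handed between threads;
            # the pool ensures each connection has only one borrower at a time.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[sqlite3.Connection]:
        # Blocks while every connection is borrowed; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection to the pool


# Usage: borrow a connection for the duration of one query.
pool = ConnectionPool(":memory:", size=2)
with pool.connection() as conn:
    row = conn.execute("SELECT 1").fetchone()
```

The important property is the cap: under a traffic spike, requests wait for a free connection instead of opening thousands of new ones against the database.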
Lesson 3: Async Everything
We moved as much as possible out of the request path:
- Email sending → queue
- Analytics tracking → queue
- Webhook delivery → queue
This made our APIs faster and more resilient.
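The pattern looks roughly like this. It is a minimal in-process sketch using a thread and the standard library `queue` module, and `BackgroundQueue`, `enqueue`, and `drain` are hypothetical names for this example. A real deployment would typically use a durable queue so jobs survive a restart; this sketch does not provide that.

```python
import queue
import threading
from typing import Callable

Job = Callable[[], None]


class BackgroundQueue:
    """Runs submitted jobs on a worker thread, off the request path."""

    def __init__(self) -> None:
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, job: Job) -> None:
        self._jobs.put(job)  # returns immediately; the caller does not wait

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            try:
                job()
            except Exception as exc:
                # A failing job must not take the worker down with it.
                print(f"job failed: {exc!r}")
            finally:
                self._jobs.task_done()

    def drain(self) -> None:
        """Block until every queued job has finished (handy in tests/shutdown)."""
        self._jobs.join()
```

A request handler would call something like `tasks.enqueue(lambda: send_welcome_email(user))` and return its response right away, where `send_welcome_email` stands in for whatever slow side effect you are deferring. The resilience gain comes from the `except` branch: a broken email provider fails a job, not the API call.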
Lesson 4: Observability Isn't Optional
At scale, you can't debug with logs alone. We invested in:
- Distributed tracing
- Real-time metrics
- Alerting on business-critical paths
The cost was worth it. We caught issues before customers noticed.
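As a rough illustration of metrics plus alerting on a critical path, the sketch below records latency per operation and checks a percentile against a threshold. Everything here is hypothetical and simplified: `timed`, `percentile`, `should_alert`, and the in-memory `LATENCIES` store are names made up for this example, and a real setup would ship these samples to a metrics system rather than keep them in a dict.

```python
import math
import time
from collections import defaultdict
from functools import wraps
from typing import Callable, Dict, List

# Operation name -> recorded durations in seconds (in-memory, illustration only).
LATENCIES: Dict[str, List[float]] = defaultdict(list)


def timed(name: str) -> Callable:
    """Decorator that records how long each call to the wrapped function takes."""
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even when the call raises, so failures are measured too.
                LATENCIES[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_alert(name: str, threshold_seconds: float, pct: int = 95) -> bool:
    """True when the chosen latency percentile exceeds the threshold."""
    samples = LATENCIES[name]
    return bool(samples) and percentile(samples, pct) > threshold_seconds
```

Alerting on a high percentile rather than the average is the design choice worth copying: a slow tail on a checkout or login path can hurt real customers while the mean still looks healthy.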
The Meta-Lesson
Most scaling problems aren't technical — they're about understanding your system's behavior under load. Invest in understanding before optimizing.