Lakshay Jawa

Engineering Leader | Architecting AI-Driven Execution | Scaling Distributed Systems



About nSkillHub

A passion-driven space for learning — system design, Java, Spring, and software engineering best practices. Written by a software engineer with years of experience, this blog shares insights, deep dives, and interview-prep material. Future topics will expand into movies, photography, travel, and more. Stay tuned!

Contact Lakshay on LinkedIn

Recent

Dropbox / Google Drive — Distributed File Sync at Scale

1. Hook: In 2011, Dropbox engineers discovered that roughly 70% of all uploaded data was already on their servers — users syncing the same PDFs, stock photos, and installer packages. Switching from file-level to block-level deduplication immediately cut bandwidth costs by more than two-thirds. That insight defines the whole discipline of cloud file sync: the hard problems are not storage capacity or even bandwidth, but delta detection, deduplication, conflict resolution, and consistency across an arbitrarily large fleet of devices. Google Drive went further, embedding a collaborative editing layer (Docs, Sheets, Slides) on top of the same blob store. Today both systems handle hundreds of millions of users, billions of files, and near-real-time sync across mobile, desktop, and web clients — often over flaky connections.
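The block-level deduplication described above fits in a few lines. This is a minimal sketch, not Dropbox's actual protocol: the in-memory `seen_blocks` set stands in for the server-side block-hash store, and rolling-hash delta detection for shifted content is ignored.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB fixed-size blocks, the size Dropbox is known to use

def dedup_upload(data: bytes, seen_blocks: set) -> tuple:
    """Split a file into fixed-size blocks and 'upload' only blocks the
    server has never seen. Returns the file's block manifest (hashes that
    let the server reassemble it) and the bytes actually transferred."""
    manifest, bytes_sent = [], 0
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        manifest.append(digest)
        if digest not in seen_blocks:   # in reality: an existence RPC to the server
            seen_blocks.add(digest)     # first copy anywhere: actually transfer it
            bytes_sent += len(block)
    return manifest, bytes_sent
```

Uploading a second file that shares a block with the first transfers only the new block — the 70% figure in the hook is exactly this effect at fleet scale.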

Google Docs — Real-Time Collaborative Editing at Scale

1. Hook: In 2006, Google acquired Writely and within two years turned it into Google Docs — the first mainstream product that let multiple people type in the same document at the same time without locking or “check-out” workflows. The core problem sounds deceptively simple: if Alice deletes character 5 while Bob inserts a character at position 4, whose version wins? The naïve answer (“last write wins”) produces corrupted documents. The real answer — Operational Transformation (OT) — is the algorithm that makes collaborative editing feel like magic, and it is one of the most subtle distributed-systems problems you will encounter in an interview. Every major collaborative editor (Google Docs, Notion, Figma, Microsoft 365) is built on either OT or its younger sibling CRDT (Conflict-free Replicated Data Type). Understanding which to use, and why, separates candidates who have thought deeply about consistency from those who have memorised buzzwords.
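The Alice/Bob scenario can be worked through with the two simplest OT transform functions — single-character insert versus delete. A toy sketch for intuition only: real OT engines transform many more operation pairs and track server-side revision history.

```python
def apply_insert(doc: str, pos: int, ch: str) -> str:
    return doc[:pos] + ch + doc[pos:]

def apply_delete(doc: str, pos: int) -> str:
    return doc[:pos] + doc[pos + 1:]

def xform_ins_vs_del(ins_pos: int, del_pos: int) -> int:
    """A concurrent delete strictly before the insert shifts it one left."""
    return ins_pos - 1 if ins_pos > del_pos else ins_pos

def xform_del_vs_ins(del_pos: int, ins_pos: int) -> int:
    """A concurrent insert at or before the delete shifts it one right."""
    return del_pos + 1 if del_pos >= ins_pos else del_pos

# Alice deletes index 5 ('!') while Bob concurrently inserts 'X' at index 4.
base = "hello!"
alice = apply_delete(base, 5)                              # Alice's replica: "hello"
alice = apply_insert(alice, xform_ins_vs_del(4, 5), "X")   # then Bob's op, transformed
bob = apply_insert(base, 4, "X")                           # Bob's replica: "hellXo!"
bob = apply_delete(bob, xform_del_vs_ins(5, 4))            # then Alice's op, transformed
assert alice == bob == "hellXo"                            # both replicas converge
```

Without the transforms, Bob would naïvely delete index 5 of "hellXo!" — wiping out the 'o' instead of the '!' — which is exactly the corruption "last write wins" produces.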

Search Engine — Google-Scale Crawl, Index, Rank, and Serve

1. Hook: Google processes 8.5 billion searches per day — roughly 99,000 queries per second at peak — and returns results in under 200 ms. Behind that sub-second response is a pipeline that never fully stops: a web crawler perpetually downloading ~20 billion pages, a MapReduce-scale indexing system converting raw HTML into a compressed inverted index, a multi-stage ranking pipeline that scores hundreds of signals in milliseconds, and a serving layer that shards the index across thousands of machines so no single query touches more than a fraction of the corpus. Building a search engine from scratch is perhaps the canonical “design a distributed system” problem because it combines almost every hard problem in the field: distributed crawling, large-scale data processing, near-real-time index updates, low-latency high-throughput query serving, and ranking driven by machine learning (ML). Even a simplified version at 1/1000th of Google’s scale teaches you more about distributed systems than almost any other exercise.
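The inverted index at the heart of that pipeline is a term-to-posting-list map. A toy sketch that omits everything that makes it hard at scale — posting-list compression, positional data, sharding, and ranking:

```python
from collections import defaultdict

def build_inverted_index(docs: dict) -> dict:
    """Map every term to a sorted posting list of the doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():   # real tokenisation is far richer
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index: dict, query: str) -> list:
    """AND query: intersect the posting lists of every query term."""
    lists = [set(index.get(t, [])) for t in query.lower().split()]
    return sorted(set.intersection(*lists)) if lists else []
```

Serving then shards this map by document so each query fans out, intersects within each shard, and merges ranked partial results — the structure above is what each shard holds for its slice of the corpus.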

Tell Me About a Time You Set a Technical Direction That Turned Out to Be Wrong

S1 — What the Interviewer Is Really Probing: The exact scoring dimension here is technical accountability under authority — not whether you’ve been wrong, but whether you can hold the weight of being wrong cleanly. The interviewer wants to see that technical confidence and intellectual honesty coexist in you. Most engineering leaders have made a bad call; very few describe it without hedging, blame-diffusing, or skipping straight to the fix.

Netflix — Video Streaming Platform

1. Hook: At peak, Netflix accounts for 15% of global internet downstream traffic — roughly 700 Gbps flowing to subscribers in 190 countries. What makes this feasible is not raw bandwidth: it is a carefully engineered pipeline that converts every raw title into over 1,200 encoded video files before a single subscriber presses play, then serves those files from ISP-embedded appliances called Open Connect Appliances (OCAs) rather than from a traditional cloud CDN. The streaming experience you see — where the picture quality silently improves while you watch — is ABR (Adaptive Bitrate) streaming dynamically switching between those pre-encoded variants based on your network conditions. Behind the personalised rows on the homepage sits a recommendation engine that runs 45+ algorithms to surface the title you are most likely to start watching in the next 30 seconds. Each of these subsystems operates at a scale where a 0.1% drop in streaming reliability translates to 250,000 subscribers unable to watch at that moment.
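The variant switching can be illustrated with the simplest throughput-based ABR heuristic. The bitrate ladder and 0.8 safety factor here are illustrative only — Netflix computes per-title ladders, and production players also weigh buffer occupancy, not just measured throughput:

```python
# Illustrative ladder of pre-encoded variant bitrates (kbps), lowest to highest.
VARIANT_KBPS = [235, 560, 1050, 1750, 3000, 4300, 5800]

def pick_variant(throughput_kbps: float, safety: float = 0.8) -> int:
    """Throughput-based ABR: pick the highest variant whose bitrate fits
    within a safety fraction of measured throughput, so transient dips do
    not cause a stall; on a very poor link, fall back to the lowest rung."""
    budget = throughput_kbps * safety
    fitting = [v for v in VARIANT_KBPS if v <= budget]
    return max(fitting) if fitting else VARIANT_KBPS[0]
```

The client re-runs this after every downloaded segment, which is why quality "silently improves" a few seconds after your network does.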

Uber / Ride-Sharing System

1. Hook: Every time someone taps “Request Ride” on Uber, the platform must answer a deceptively hard spatial query in under a second: which of the thousands of nearby drivers is the best match for this rider, given their location, heading, vehicle type, and current workload? Uber processes 25 million trips per day across 70+ countries, with peak demand spikes during commute hours, concerts, and bad weather — all of which arrive simultaneously in the same city blocks.
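That spatial query is answered with a spatial index rather than a scan of every driver. A toy fixed-grid sketch of the idea — Uber's production systems use hexagonal H3 cells, and real matching also scores heading, vehicle type, and workload rather than raw distance:

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # grid cell roughly 1 km per side near the equator (illustrative)

def cell_of(lat: float, lng: float) -> tuple:
    """Bucket a coordinate into a fixed grid cell (a stand-in for
    geohash / H3-style spatial indexing)."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))

def index_drivers(driver_pos: dict) -> dict:
    """Group driver IDs by the cell of their last-reported location."""
    by_cell = defaultdict(list)
    for driver_id, (lat, lng) in driver_pos.items():
        by_cell[cell_of(lat, lng)].append(driver_id)
    return by_cell

def best_match(rider: tuple, driver_pos: dict, by_cell: dict):
    """Scan only the rider's cell and its 8 neighbours, then rank the few
    candidates by squared coordinate distance (a proxy, not road distance)."""
    r, c = cell_of(*rider)
    candidates = [d for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                  for d in by_cell.get((r + dr, c + dc), [])]
    return min(candidates,
               key=lambda d: (driver_pos[d][0] - rider[0]) ** 2
                           + (driver_pos[d][1] - rider[1]) ** 2,
               default=None)
```

The cell index turns "nearest of thousands" into "nearest of a handful", which is what makes the sub-second budget achievable even during a city-block demand spike.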