Systems Design: Uber
READ MF!
Requirements
Functional
- Drivers need to regularly notify the service about their current location and their availability to pick passengers.
- Passengers get to see all the nearby available drivers.
- Customer can request a ride; nearby drivers are notified that a customer is ready to be picked up.
- Once a driver and a customer accept a ride, they can constantly see each other’s current location until the trip finishes.
- Upon reaching the destination, the driver marks the journey complete to become available for the next ride.
Non-functional
Estimation
Storage
Let’s say we have 500M customers and 1M drivers. Lets assume 1M customers are active daily and 500k drivers are active daily.
Modern day, there are many database machine types (in AWS or Azure) which can offer storage from 16 TB to 64 TB.
So ideally 1
machine should be enough to support our use case. Nevertheless, we will design our infrastructure to support horizontal scaling.
Write QPS
Assuming it’s 10k QPS
Read QPS
Assuming it’s 50k QPS
API Spec
Get Locations API
Create Locations API
Database/Design
Database
Instead of updating quad tree every 3 secs, we keep the positions of drivers in a hashtable in memory and less frequently update the quad tree.
In-memory Index based on Lucene for geospatial data from Google S2.
How much memory do we need?
DriverId (4 bytes)
old lat, lng (8+8)
new lat lng (8+8)
36bytes per driver and we have 1M drivers. 36MB.
Design
Ring Pop
Think of how distributed applications (services, in the Service-Oriented Architecture sense) are traditionally provisioned and run. Each of these instances are, for the most part, treated equally and get their fair share of traffic. These instances also don’t know about one another. They act as if they are the only ones serving traffic for your application! How dare they! :P Their use is orchestrated by something else, like a load-balancer, in front of them.
On the other hand, the very foundations of Ringpop are based on a membership protocol derived from SWIM Gossip protocol. A membership protocol gives the ability for these independent instances to discover one another and comunicate with each other.
Ringpop has applied consistent hashing on top of its membership protocol as a part of its core offering. Consistent hashing is all about partitioning a keyspace and assigning ownership to each of the partitions. The keyspace is mapped onto a range of your choosing, but are typically a hash of the IDs of the entities/objects in your application. The owners are the individual instances of your application that were discovered through Ringpop’s membership protocol. In short, certain objects in your application are owned by certain instances. Ownership is distributed evenly, but not in the same way a round-robin scheduler might distribute evenly.
What happens when membership changes, for instance, when you want to add capacity to your application or some hosts running your application fail? Ringpop automatically detects those changes and integrates them into its membership list and consistent hash ring. This integration may result in partitions being reassigned or rebalanced (aka new owners for the objects in your application).
Why is any of this good? When you want certain guarantees or deterministic behavior out of your application at tremendous scale often not everything can be homogenous. Not everything can serve every request or store all of the data and even if they could, sometimes it’s extremely hard to do efficiently.
The other nice part about Ringpop is that, for any application embedding it, its clients remain completely ignorant of the underlying sharding scheme. Clients neither know nor have to care about who the right instance is to serve a request. They may send a request anywhere and Ringpop forwards it to the correct owner.
In summary, Ringpop provides scalable, fault-tolerant application-layer sharding. More fundamentally, it provides the instances of a distributed application with the ability to cooperate and coordinate in a variety of ways. Choose your own adventure!
QuadTrees
Spatial indexing is an increasingly important area of geo-spatial application design. The use-cases abound around us. Ride-sharing apps (Uber) need to be able to track the location of cars in near real-time, and provide user updates via extremely fast geo-queries. Facebook wants to let you know when your friends are nearby.
Nearest neighbour (NN) queries are important for apps that provide nearby points of interests. Having a ‘nearby friends’ feature seems to be mainstream in social network applications, which can be easily powered by NN queries.
Geo-distance Range Queries: These queries help retrieve entities within a specified range. Some use-cases include finding all cabs within a 2km radius from a user’s location. Yeah, Uber comes to mind here.
The Quadtree is one of my favorite data structures. As the name implies, it is a variant of the Tree data structure. Trees generally have internal nodes (nodes that have at least one child) and leaf nodes which have no children. These nodes holds data that are ordered in a hierarchical order.
A quadtree is a tree data structure in which each node has zero or four children. Its main peculiarity is its way of recursively dividing a flat 2-D space into four quadrants.
Other services
