Posted on April, 2021
In our ongoing effort to understand how things work, especially those that work very well or come from the leaders in their segment, this time we take a brief look at WhatsApp internals and, in particular, at how an early design decision has had a very significant impact on the present.
If you as a reader came looking for the classic WhatsApp article focused on security or privacy risks, we recommend that you read this article: (https://www.rd.com/article/is-whatsapp-safe/). In this article we will focus on the platform design criteria, particularly addressing performance and scalability.
WhatsApp is a messaging, VoIP and videoconferencing platform founded in 2009. In 2014, when it had only around 50 employees, it was acquired by Facebook for approximately $19 billion, the largest purchase Facebook has made to date. The platform, which was one of the first modern, 100% mobile messaging platforms to emerge, before WeChat, Snapchat or Telegram, is the most used in the world and has 2 billion frequent users. Part of its success is due to having been one of the first, but also to its very small resource footprint (the Android version requires only 33MB). Despite its small size, it is an app that offers a great user experience, being very simple and agile. This has made it dominate mainly emerging regions such as Latin America, Africa and India, but also a large part of Europe.
Clearly this platform was designed, from its inception, with scalability and performance in mind. Several facts attest to this: it handles more than 70 million messages per second; it was developed in Erlang, a language in which the efficient implementation of multiple concurrent connections is very natural; it runs on hundreds of terabytes of RAM and thousands of cores; it sustains more than 2 million simultaneous TCP connections per node; and, above all, the criterion for the success of the product was to reach the mass market and interconnect billions of users. Managing billions of interconnections requires, above all, a very performant and robust backend, which represents a great challenge from the point of view of reliability and high-performance engineering.
For more numbers and statistical details of WhatsApp capacity, we invite you to visit this article: http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html
Therefore, it is logical to assume that the WhatsApp backend was designed adopting many of the standard scalability and performance criteria of its time, most of which are still in force even 12 years later. However, WhatsApp made some relatively innovative and playful design decisions, which later proved to be very successful.
Next, some aspects of scalable architecture will be shown, at a very high level (like someone looking at a city from an airplane) that probably looks a lot like those that make up WhatsApp. This suggested design is the result of investigating the platform as a user, in addition to the scant architecture information that WhatsApp itself has published. Likewise, although it is not 100% accurate, this description represents a very standard architecture for this type of platform.
Each small box represents a client / user accessing WhatsApp with their mobile phone. All mobile phones are connected to and registered with a centralized server which, among other things, forwards messages between cell phones, since it is connected to each of them, has them identified and knows how to reach them. If cell phone 1 wants to send a message to cell phone 2, it does not send it directly. First, because it would not know how to find it: it knows how to find the server, but not how to find the other 2 billion users. Second, for security reasons, each cell phone has incoming connections blocked and only accepts messages from the known WhatsApp server with which it registered.
This works very well as a proof of concept, but if you wanted to scale to thousands of users it would be insufficient. The first problem we would have is that the server, no matter how big it is, would eventually run out of resources to handle all the necessary connections and message forwarding. So one solution is to put more servers and distribute the workload.
In this way, we overcome the limitation of a single server, whose resources, no matter how large it is, will always end up being insufficient for such workloads. With several servers, the workload can be partitioned across an arbitrary number of them, so that each server's resources are sufficient and appropriate for its share of the load.
Consequently, the following question arises: How is this load partitioned? In this type of design, the most effective way to do it will be using a technology called load balancer (LB), which partitions the connection space among the available servers. This architecture offers several advantages, not only the ability to scale the processing capacity of the "server" without limit, but also the possibility of redundancy. If one of the servers fails to function properly, it is simply delisted from the load balancing layer server catalog, replaced, and re-enlisted, all without significant service outage or degradation.
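The idea of a balancing layer with a server catalog can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the capacity field and the "least relative load" policy are hypothetical, and nothing is known about the algorithm WhatsApp actually uses.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical backend server tracked by the balancer."""
    name: str
    capacity: int         # assumed: maximum connections this server accepts
    connections: int = 0  # connections currently assigned to it

    @property
    def load(self) -> float:
        # Relative load, so that servers of different sizes can be compared.
        return self.connections / self.capacity


class LoadBalancer:
    """Toy balancer: keeps a catalog and picks the least-loaded server."""

    def __init__(self):
        self.catalog = {}  # server name -> Backend

    def enlist(self, backend):
        self.catalog[backend.name] = backend

    def delist(self, name):
        # A failed server is simply removed from the catalog.
        self.catalog.pop(name, None)

    def assign(self):
        # Only servers with free capacity are candidates.
        candidates = [b for b in self.catalog.values()
                      if b.connections < b.capacity]
        if not candidates:
            raise RuntimeError("no backend available")
        # Lowest relative load wins; the name breaks ties deterministically.
        chosen = min(candidates, key=lambda b: (b.load, b.name))
        chosen.connections += 1
        return chosen


lb = LoadBalancer()
lb.enlist(Backend("a", capacity=2))
lb.enlist(Backend("b", capacity=4))
picks = [lb.assign().name for _ in range(4)]
```

With these two servers the first four connections alternate according to relative load rather than raw connection count, and delisting a server only changes the set of candidates for new connections.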
We have already achieved a reasonably scalable design. In this design, when a user wants to send a message to another user, he/she connects to the load balancers. These balancers, according to a weighted balancing algorithm which uses several parameters such as the relative loads of the servers, connect that user with any of the individual backend servers, which will finally forward the message to the end user. However, we are lacking a place to maintain the different states of the users. For example, to which server they are connected, if they are active, if there is any message waiting to be received by a receiving user (store and forward). For this purpose, we need some centralized mechanism that stores and represents these states and general information.
Now we do have a fairly complete design, albeit at a very high level, of the WhatsApp architecture. This database is probably a distributed MySQL cluster with mirrors and shards. This is where the states of the users are persisted, along with the messages that temporarily cannot be delivered because, for example, the receiver is not connected to the network.
The 3 most common cases of message communication will be:
1- The user who sends the message (initiator) is connected, but the one who receives it (receiver) is not. In this case, WhatsApp stores the message in its central database and then forwards it to the receiver once it connects. This technology is called store and forward. In this case we will only see the Sent tick (a single gray one).
2- The initiator is not logged in. In this case, the message is saved in a local database of the cell phone, waiting to be sent once the cell phone connects to the WhatsApp network.
3- The initiator and the receiver are both connected. In this case the message is not persisted, it is taken by the backend and immediately forwarded to the receiver.
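The three cases above can be sketched as a small simulation. This is only a toy model under stated assumptions: the Relay and Phone classes, their method names and the in-memory queues are hypothetical stand-ins for the backend, its central database and the phone's local database.

```python
from collections import defaultdict, deque


class Relay:
    """Toy stand-in for the backend plus its central state database."""

    def __init__(self):
        self.sessions = {}                 # user -> connected Phone
        self.pending = defaultdict(deque)  # user -> messages stored for later

    def register(self, phone):
        self.sessions[phone.user] = phone
        # Case 1, second half: forward whatever was stored while offline.
        queue = self.pending.pop(phone.user, deque())
        while queue:
            phone.received.append(queue.popleft())

    def unregister(self, phone):
        self.sessions.pop(phone.user, None)

    def route(self, sender, receiver, text):
        target = self.sessions.get(receiver)
        if target is not None:
            # Case 3: both connected, forward without persisting.
            target.received.append((sender, text))
            return "forwarded"
        # Case 1: receiver offline, store and forward later.
        self.pending[receiver].append((sender, text))
        return "stored"


class Phone:
    def __init__(self, user, relay):
        self.user = user
        self.relay = relay
        self.online = False
        self.outbox = deque()  # local database of messages not yet sent
        self.received = []

    def connect(self):
        self.online = True
        self.relay.register(self)
        # Case 2, second half: flush messages written while offline.
        while self.outbox:
            receiver, text = self.outbox.popleft()
            self.relay.route(self.user, receiver, text)

    def disconnect(self):
        self.online = False
        self.relay.unregister(self)

    def send(self, receiver, text):
        if not self.online:
            # Case 2: initiator offline, keep the message on the phone.
            self.outbox.append((receiver, text))
            return "queued locally"
        return self.relay.route(self.user, receiver, text)


relay = Relay()
alice, bob = Phone("alice", relay), Phone("bob", relay)
alice.connect()
first = alice.send("bob", "hello")   # bob is offline at this point
bob.connect()
second = alice.send("bob", "again")  # both are online
```

In this sketch the backend only holds a message while the receiver is unreachable; once it is delivered, nothing remains on the server side.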
But WhatsApp not only delivers messages, it also sends and receives multimedia content, sound, images, documents, video, etc. Is it also stored in this centralized database? The answer is no. In order to efficiently process the messages and states of billions of users, this database is extremely optimized to be as performant as possible. This means, among other things, messages of a certain maximum length and a certain structured metadata.
On the other hand, each individual server is capable of managing up to 10 million simultaneous connections, even if it seems impossible. This is due to the development team's almost absolute knowledge of the hardware, operating system (FreeBSD), network library, Erlang, etc., which allows them an extraordinary optimization of resources to achieve this level of performance. But performance almost always requires a concession in flexibility; in other words, it grows with the number of assumptions the design is allowed to make. And in this case the main assumption is that the system handles relatively short and structured texts.
Therefore, for the management of multimedia content, a CDN (Content Delivery Network) is used. CDNs are networks optimized for storing and making content available, which have an architecture similar to the WhatsApp backend: they have a load balancing layer connected to a server farm, nodes, which store content in a type of storage optimized for large files, called Object Storage.
Every time an image is sent to another WhatsApp user, the initiating cell phone connects to the CDN via HTTP and uploads the file. In return, the CDN returns a hash (an identifier, which is a function of the content), which is then forwarded, through the WhatsApp backend, to the receiver. The receiver does the reverse process, connects to the CDN and requests the content associated with that hash, the CDN delivers it and the receiver downloads it via HTTP.
Now we know why after sending content to a contact, the next time we forward it to other contacts, the content is sent instantly, at least for a certain time.
The reason is that we only upload the content the first time, the following times we only send the hash, at least until the content expires on the CDN. The function of this CDN is not to persist the content forever, its main function is simply to provide the content to the receiver.
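The upload-once, forward-the-hash behavior can be sketched with a content-addressed store. This is an illustration under assumptions: SHA-256 as the hash, the MediaStore class, its has/expire methods and the share helper are all hypothetical, and the real CDN protocol is not described in any detail here.

```python
import hashlib


class MediaStore:
    """Toy content-addressed store standing in for the CDN."""

    def __init__(self):
        self.objects = {}  # hash -> content
        self.uploads = 0   # number of real uploads performed

    def has(self, key):
        return key in self.objects

    def upload(self, data):
        # The identifier is a function of the content itself.
        key = hashlib.sha256(data).hexdigest()
        self.objects[key] = data
        self.uploads += 1
        return key

    def download(self, key):
        return self.objects.get(key)

    def expire(self, key):
        # The store is not meant to keep content forever.
        self.objects.pop(key, None)


def share(store, data):
    """Return (hash, uploaded); upload only if the store lacks the content."""
    key = hashlib.sha256(data).hexdigest()
    if store.has(key):
        return key, False  # only the hash needs to travel this time
    return store.upload(data), True


store = MediaStore()
photo = b"holiday photo bytes"
key1, uploaded1 = share(store, photo)  # first send: real upload
key2, uploaded2 = share(store, photo)  # forward: hash only
```

Because the key depends only on the content, forwarding the same file yields the same key, and a new upload is needed only after the content has expired from the store.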
So far all very traditional, we are describing a reasonably modern standard scalable client / server platform. So what was the advantage that saved WhatsApp so much money? To see it, let’s go back to February 2009 and compare WhatsApp with, for example, Facebook Messenger.
Back then Messenger was embedded within Facebook and it was called Facebook Chat. It was one of the functionalities of Facebook and it allowed you to chat with other users of the system. Later, in 2011, Facebook launched it as an independent app from Facebook and finally, in 2014, the chat was separated from the main app and Messenger could only be used from its own app. Today, the platform has approximately 1.2 billion active users.
Messenger was born on the web, as one more feature of the social network, which stored all its information in its data cloud: our social graphs, all our photos and those of all the users, the posts, the walls, etc. Therefore, it also stored all the Messenger conversations. This was the most logical approach for several reasons. One is that Facebook is a platform where users consume our content regardless of whether we are connected or not. Another is that Facebook is pre-smartphone: it was designed to be consumed from a web browser, conceived as a virtually disposable thin client.
On the other hand, WhatsApp was born native to the cell phone from day 0. In fact, its founders built it as a proof of concept of a 100% native messaging application for the new smartphone platform. So distributed storage was a viable option to consider. In particular, the idea was: "what if we distribute all the storage of messages and content in general among the cell phones of all our users?". I have no confirmation that this question was actually asked, but I can imagine it perfectly :)
This was a very fortunate design idea, since compared to its competitors, such as Messenger, it represents a significant improvement in cost efficiency and access to information, given that all access and historical storage of all conversations, audios, videos, images and content in general, is distributed among more than 2 billion cell phones, instead of being centralized in a handful of Data Centers (DC).
When Messenger accesses a message or historical multimedia content, if our cell phone does not already have it cached, or if we access it from the web APP, it must first connect to the backend of servers to search for the content. This content can be in a "hot" storage in RAM, or it can be in a "cold" storage, in a hard disk of a DC. In the case that it is a cold content (for example: a photo that has not been accessed for weeks), access to this content will be very slow, the user will have to be patient and wait watching the cell phone activity wheel spin until the information finally appears. This certainly represents a bad user experience. Not to mention if you need to access some historical content and you are in an area with no coverage. It will be impossible.
On the other hand, for Facebook, this design also has some competitive disadvantages, with respect to the WhatsApp design, since it has additional associated costs:
• Network costs. The historical content not cached or accessed through the web APP must go to Facebook, which generates data transfer costs in their DC
• Processing costs. Searching for content and the logic of processing and transferring it has significant associated costs
• Server and storage costs. A hot / cold storage network on the scale of Facebook is very expensive in terms of hardware
• Software maintenance costs. The software that manages all this centralized storage is very expensive to develop and maintain
(It could be argued that in the case of Facebook, not Skype, the same central storage system that is used to persist the social network is probably used, which would help to dilute these costs.)
When WhatsApp accesses historical information, either access to messages or multimedia content, this access is very agile, because the storage is local, all the information from our chats is on our cell phone and nowhere else. The search is being done on the device itself. This represents a better user experience for WhatsApp and enormous cost savings, since it does not need to maintain a huge central storage network or its associated software. The idea is actually brilliant: distribute all your storage among your billions of users. Everyone wins, users gain a better access experience and WhatsApp only focuses on what it really is: a very efficient, scalable and resilient network that only forwards messages between its users.
So what is the advantage of centralized storage design, like Skype or Messenger? There are two main advantages:
1- With centralized storage, the platform keeps our history for us. In WhatsApp the responsibility for backing up our history is ours, which means either backing it up to a computer, which is so inconvenient that we never do it, or buying a cloud storage subscription, which has a recurring cost. Additionally, because all the storage is local, WhatsApp ends up occupying a significant percentage of the cell phone's storage, especially on models with 64GB of storage.
2- Another advantage of centralized storage is that no matter where I connect from, be it tablet, desktop or the hotel lobby computer, I will always have access to all my content, even if my cell phone is turned off or out of coverage. This is because my history is in the cloud, not on the device.
Therefore, only one question remains: what does a user value the most? The experience of using an extremely agile system, with immediate interaction and equally immediate access to history? Or a system that takes up less storage on the cell phone and leaves me the freedom to connect from another device, regardless of whether my cell phone is turned off or out of coverage?
I think that the success of WhatsApp validates the first option, all the more so because the disadvantages of WhatsApp's design are less and less relevant: cell phones come with more and more storage, so this problem tends to become irrelevant over time. And with regard to access from other devices or the web, we now know how WhatsApp Web works: the web app connects to our cell phone and allows us to use the cell phone app remotely. This requires the cell phone to have charge and connectivity, but this is almost always the case. For more and more people, the cell phone is already the key to their car and home.
Will these increasingly irrelevant conveniences offered by Messenger or Skype be enough to justify a billion-dollar operating cost for centralized storage?
The answer is probably not. But now it is too late. Once a functionality is provided, even if it is not highly valued in the long term, if it is an intrinsic part of its operation, it is very difficult to stop providing it. It would seem that Messenger will be long tied to having to keep its centralized storage.
On the other hand, for WhatsApp it must be very gratifying to look back and acknowledge having made the design decisions they made at the beginning of the project. They could have gone for the most common design shared by all the messaging solutions of their time. However, a wise design decision, at the beginning, generated a huge competitive advantage, which among other things, consolidates them today as global market leaders.