Issue #110
Scaling legacy DB, Patterns of distributed systems, Compartmentalize MultiPaxos, Observability and Monitoring, Mastering Chaos, On call culture . . .
"Programming is not about typing, it's about thinking."
— Rich Hickey
Posts
How does Dream11 serve millions of Personalised Contests in a match
Different types of contests allow users to compete and win big, thereby increasing their engagement. Creating customised or personalised contests is a prime component of the best user experience. With hundreds of highly personalised contests running on the platform simultaneously, meeting every user’s requirements, it is inevitable to encounter several challenges during the entire process. To keep these contests running smoothly, we play an entirely different ball game at the backend. - #medium #D11Engg
It’s the Startup Life for Us: Scaling in a High-Growth Environment
We knew this was going to be an issue that we had to face but made the conscious decision to delay investing in a solution until the issue became more urgent. The issue became urgent when we only had enough runway to increase the number of mailboxes being protected by approximately 25% with a projected mailbox growth of 100% in the two months that followed. -#medium #abnormal-security-engineering-blog
Patterns of Distributed Systems
Cluster nodes need exclusive access to certain resources. But nodes can crash; they can be temporarily disconnected or experiencing a process pause. Under these error scenarios, they should not keep the access to a resource indefinitely. -#martinfowler
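A common answer to the problem in the excerpt above is a time-bound lease: access is granted for a fixed duration and lapses unless renewed, so a crashed or paused node cannot hold a resource forever. The following is a minimal single-process sketch of that idea, not code from the linked article. The names `Lease`, `LeaseTable`, `acquire`, and `holder_of` are hypothetical, and a real cluster would also need a replicated store and clock handling that this sketch ignores.

```python
import time


class Lease:
    """A time-bound claim on a resource (hypothetical helper for illustration)."""

    def __init__(self, holder, ttl_seconds, now):
        self.holder = holder
        self.expires_at = now + ttl_seconds

    def is_expired(self, now):
        return now >= self.expires_at


class LeaseTable:
    """Tracks which holder currently owns each resource (in-memory only)."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so that expiry can be simulated in tests.
        self._clock = clock
        self._leases = {}

    def acquire(self, resource, holder, ttl_seconds):
        """Grant or renew a lease; refuse if another holder has a live one."""
        now = self._clock()
        current = self._leases.get(resource)
        if (
            current is not None
            and not current.is_expired(now)
            and current.holder != holder
        ):
            return False
        # Either the resource is free, the old lease expired, or the same
        # holder is renewing: record a fresh expiry time.
        self._leases[resource] = Lease(holder, ttl_seconds, now)
        return True

    def holder_of(self, resource):
        """Return the current holder, or None if no live lease exists."""
        current = self._leases.get(resource)
        if current is None or current.is_expired(self._clock()):
            return None
        return current.holder
```

A node that stops renewing, for whatever reason, loses the resource once the TTL passes, and another node can then acquire it.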
The many lies about reducing complexity part 2: Cloud
The suggestion is this: as we move away from our own IT ‘on-premises’ (which includes whatever co-hosting data center you use, in this context it just means you own your own IT hardware) to more and more in the cloud, we are outsourcing more and more, we are responsible for less, our life simplifies. More cloud is cheaper, simpler and more flexible. What is there not to like? What there is not to like is that this suggestion is for a large part a lie. And a nasty one. -#ea #rna
Stealing Your Private YouTube Videos, One Frame at a Time
The takeaway from this bug is that situations where two different products interact with each other under the hood are always a good area to focus on, since both product teams probably only know their own systems best, and might miss important details when working with a different product’s resources. -#bugs #xdavidhu
Your legacy database is outgrowing itself
We’ve got over 4M unique daily users and over 7B queries hitting all our MySQL databases combined. Went from under 1M unique users a day a year ago to 1.3M in March, to over 4M at the moment, with over 8M games played each day. I know it’s nowhere near the biggest players on the market, but our experience could help you “fix” your monolith database, and scale it to new heights. -#unstructed #tech
Building On-Call Culture at GitHub
Insights on how GitHub created an on-call culture given the size of their monolith codebase. It's pretty daunting to be on call when you're unfamiliar with the majority of the codebase. They include some nice tips on how they overcame it. -#github [1]
Uber’s Real-Time Push Platform
Over the next year and a half, the push platform saw tremendous adoption across the company. At peak, this system pushed over 70,000 push messages per second to three different types of apps by maintaining up to 600,000 concurrent streaming connections. This system quickly became the most integral part of the server client API infrastructure. -#eng #uber [1]
Grakn 2.0 Alpha: best practices of Distributed Systems and Computer Science
And so we took the best of our design, engineering, mathematics, and computer science, from our work over the past five years — just the parts that we know our community loved — and we re-wrote Grakn from scratch. We rebuilt Grakn with an entirely new architecture of best practices in Distributed Systems and Computer Science. -#blog #grakn
How to Avoid Coupling in Microservices Design
In this article, I am going to focus on the importance of loose coupling as a design principle for microservices. I will give examples of poor design decisions that violate loose coupling and lead to distributed monoliths. Life is already difficult. Why would you make it even harder? That’s why I am going to point out some common design mistakes; avoiding them will make for a much smoother transition to a microservices architecture. -#medium #capital-one-tech [1]
Papers
Scaling Replicated State Machines with Compartmentalization
In this paper, we demonstrate how to compartmentalize MultiPaxos to increase its throughput by 6× on a write-only workload and 16× on a mixed read-write workload. Unlike other approaches, we achieve this performance without the need for specialized hardware. Second, compartmentalization is a technique, not a protocol. Industry practitioners can apply compartmentalization to their protocols incrementally without having to adopt a completely new protocol. -#mwhittaker #github
CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data
We have developed CRUSH, a scalable pseudorandom data distribution function designed for distributed object-based storage systems that efficiently maps data objects to storage devices without relying on a central directory. Because large systems are inherently dynamic, CRUSH is designed to facilitate the addition and removal of storage while minimizing unnecessary data movement. The algorithm accommodates a wide variety of data replication and reliability mechanisms and distributes data in terms of user-defined policies that enforce separation of replicas across failure domains. -#ceph
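To make the idea of directory-free placement concrete, here is a small sketch using rendezvous (highest-random-weight) hashing. This is a different and much simpler technique than CRUSH: it has no device hierarchy, weights, or failure-domain rules. It does share the two properties the abstract highlights: any client can compute an object's location from the device list alone, and adding or removing a device moves only the objects that involve that device. The function names and the `osd-N` device labels are assumptions made for illustration.

```python
import hashlib


def score(obj_id, device):
    """Pseudorandom but deterministic score for an (object, device) pair."""
    digest = hashlib.sha256(f"{obj_id}:{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj_id, devices, replicas=2):
    """Pick the `replicas` devices with the highest scores for this object.

    No central directory is consulted: every client that knows the device
    list computes the same answer independently.
    """
    ranked = sorted(devices, key=lambda d: score(obj_id, d), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    devices = ["osd-0", "osd-1", "osd-2", "osd-3"]
    print(place("object-42", devices))
    # Adding a device only changes placements that now include the new device.
    print(place("object-42", devices + ["osd-4"]))
```

Because each device's score for an object is independent of the other devices, removing a device that does not hold a given object leaves that object's placement untouched.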
Books
The InfoQ eMag: Managing Observability, Resilience, and Complexity within Distributed Systems
Managing Observability, Resilience, and Complexity within Distributed Systems -#infoq
Algorithms for Decision Making
Accounting for these sources of uncertainty and carefully balancing the multiple objectives of the system can be very challenging. We will discuss these challenges from a computational perspective, aiming to provide the theory behind decision-making models and computational approaches. -#algorithmsbook
Videos
Mastering Chaos - A Netflix Guide to Microservices
Moving beyond request-reply: how smart APIs are different - Bernd Rücker (EN) | JCON 2020
Notes
[1] Curated and submitted by @sitaramshelke