Issue #89
Curated list of blogs, videos, papers, podcasts on programming and distributed systems.
“Yesterday I was clever, so I wanted to change the world. Today I am wise, so I am changing myself.”
― Rumi
99th Percentile Latency at Scale with Apache Kafka
In Kafka terms, data delivery time is defined by end-to-end latency—the time it takes for a record produced to Kafka to be fetched by the consumer. Latency objectives are expressed as both target latency and the importance of meeting this target. For instance, your latency objective could be expressed as: “I would like to get 99th percentile end-to-end latency of 50 ms from Kafka.” - #confluent #blog
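The "99th percentile" objective above can be made concrete with a small, self-contained sketch: given produce/consume timestamp pairs (made-up sample data here, not from a real Kafka deployment, and not Confluent's measurement code), compute the p99 of the end-to-end latencies using the nearest-rank method.

```python
# Illustrative sketch: 99th percentile end-to-end latency from
# (produce_ts, consume_ts) pairs recorded in milliseconds.

def p99_latency_ms(produce_ts, consume_ts):
    """Return the 99th percentile of end-to-end latencies (nearest rank)."""
    latencies = sorted(c - p for p, c in zip(produce_ts, consume_ts))
    # Nearest-rank: the element at ceil(0.99 * n), as a 0-based index.
    idx = max(0, int(len(latencies) * 0.99) - 1)
    return latencies[idx]

# 1000 records: most arrive within 10-40 ms, 15 stragglers take 120 ms.
produce = list(range(1000))
consume = [t + 10 + (i % 30) for i, t in enumerate(produce)]
consume[-15:] = [t + 120 for t in produce[-15:]]

print(p99_latency_ms(produce, consume))  # -> 120: the slow tail dominates p99
```

Note how 15 slow records out of 1000 (1.5%) are enough to push the p99 to 120 ms even though the vast majority of records arrive well under 50 ms; this is why latency objectives are stated at a percentile rather than as an average.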
Multi-Runtime Microservices Architecture
Creating good distributed applications is not an easy task: such systems often follow the 12-factor app and microservices principles. They have to be stateless, scalable, configurable, independently released, containerized, automatable, and sometimes event-driven and serverless. Once created, they should be easy to upgrade and affordable to maintain in the long term. Finding a good balance among these competing requirements with today’s technology is still a difficult endeavor. - #infoq
When Bloom filters don't bloom
During this journey we learned about random memory access latency, and the power of cache friendly data structures. Fancy data structures are exciting, but in practice reducing random memory loads often brings better results. - #cloudflare #blog
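For context, a minimal Bloom filter sketch (not Cloudflare's implementation; sizes and hashing scheme are arbitrary choices for illustration) shows where the random memory accesses the post measures come from: each membership check probes k scattered bit positions in the array.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array.
    May report false positives, never false negatives."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # Derive k positions by seeding a single hash; each probe lands
        # at an effectively random offset -- the scattered memory loads
        # the Cloudflare post profiles.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
for word in ["kafka", "druid", "bloom"]:
    bf.add(word)

print(bf.might_contain("kafka"))   # True: was added
print(bf.might_contain("absent"))  # almost surely False with only 9 bits set
```

With large bit arrays the k probes per lookup miss cache, which is the effect the post's measurements ultimately trace back to.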
How Netflix uses Druid for Real-time Insights to Ensure a High-Quality Experience
With this data arriving at over 2 million events per second, getting it into a database that can be queried quickly is formidable. We need sufficient dimensionality for the data to be useful in isolating issues and as such we generate over 115 billion rows per day. At Netflix we leverage Apache Druid to help tackle this challenge at our scale. - #netflixtechblog #blog
Colin McCabe talks about the ongoing effort to replace the use of Zookeeper in Kafka: why they want to do it and how it will work. He discusses the limitations they have found and how Kafka benefits both in terms of stability and scalability by bringing consensus in-house. He talks about their progress, what work is remaining, and how contributors can help. - #infoq
The core memory inside a Saturn V rocket's computer
The Launch Vehicle Digital Computer (LVDC) had a key role in the Apollo Moon mission, guiding and controlling the Saturn V rocket. Like most computers of the era, it used core memory, storing data in tiny magnetic cores. - #righto
A successful Git branching model
In this post I present the development model that I’ve introduced for some of my projects (both at work and private) about a year ago, and which has turned out to be very successful. - #nvie
How to Make Yourself Into a Learning Machine
So he built an elaborate system to read, retain, and apply the lessons in hundreds of books. And he didn’t just read about infrastructure — he read literature, and scientific history; he read about politics, and philosophy. - #superorganizers #substack
A new Go API for Protocol Buffers
The google.golang.org/protobuf module is a major overhaul of Go's support for protocol buffers, providing first-class support for reflection, custom message implementations, and a cleaned up API surface. We intend to maintain the previous API indefinitely as a wrapper of the new one, allowing users to adopt the new API incrementally at their own pace. - #golang #blog
Paper
Fuzzing Error Handling Code using Context-Sensitive Software Fault Injection
In this paper, we propose a new fuzzing framework named FIFUZZ, to effectively test error handling code and detect bugs. The core of FIFUZZ is a context-sensitive software fault injection (SFI) approach, which can effectively cover error handling code in different calling contexts to find deep bugs hidden in error handling code with complicated contexts. We have implemented FIFUZZ and evaluated it on 9 widely-used C programs. It reports 317 alerts which are caused by 50 unique bugs in terms of the root causes. 32 of these bugs have been confirmed by related developers. We also compare FIFUZZ to existing fuzzing tools (including AFL, AFLFast, AFLSmart and FairFuzz), and find that FIFUZZ finds many bugs missed by these tools. We believe that FIFUZZ can effectively augment existing fuzzing approaches to find many real bugs that have been otherwise missed. - #umn #kjlu #edu
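The core idea of context-sensitive fault injection can be sketched in a few lines (this is an illustration of the concept, not the FIFUZZ implementation, and the function names are made up): the same fallible call is failed only when reached through a chosen calling context, identified here by inspecting the call stack, so the error-handling path of each caller can be exercised separately.

```python
import traceback

inject_in_context = None  # name of the caller whose error path we want to test

def fallible_read():
    """Stands in for a call that can fail (allocation, I/O, ...)."""
    callers = [frame.name for frame in traceback.extract_stack()]
    if inject_in_context in callers:
        raise IOError("injected fault")
    return "data"

def load_config():
    try:
        return fallible_read()
    except IOError:
        return "default-config"   # error handling path under test

def load_cache():
    try:
        return fallible_read()
    except IOError:
        return None               # a different error handling path

# Fail the call only when reached via load_config; load_cache is untouched.
inject_in_context = "load_config"
print(load_config())  # -> "default-config": the injected fault was handled
print(load_cache())   # -> "data": no fault in this calling context
```

A fuzzer built on this idea enumerates calling contexts and injects a fault into each in turn, reaching error-handling code that ordinary input mutation almost never triggers.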
How Did Software Get So Reliable Without Proof?
By surveying current software engineering practice, this paper reveals that the techniques employed to achieve reliability are little different from those which have proved effective in all other branches of modern engineering: rigorous management of procedures for design inspection and review; quality assurance based on a wide range of targeted tests; continuous evolution by removal of errors from products already in widespread use; and defensive programming, among other forms of deliberate over-engineering. Formal methods and proof play a small direct role in large scale programming; but they do provide a conceptual framework and basic understanding to promote the best of current practice, and point directions for future improvement. - #gwern #hoare