Issue #107
Readings on time, why time is evil, monolithic data lakes, abstraction, CRDTs, tracing and debugging in distributed systems, and evidence-based software engineering.
“Institutions will try to preserve the problem to which they are the solution.”
— Clay Shirky
Posts
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product. - #martinfowler
This article is about a certain aspect missing in today’s mainstream programming languages and systems. I bumped on this idea while reading Alan Kay’s writing about making the difference between mutable and immutable data “moot” in the context of FP vs. OOP by bringing in the concept of managed time. - #prabros
Thoughts on why CRDT didn't work out as well for collaborative editing (xi-editor)
This reply is going to be scoped to the CRDT only. I have been meaning to write a larger retrospective of xi-editor; consider this the CRDT portion of that. - #github
For Distributed Teams, Code Craft is Critical
Just as distributed systems amplify every design flaw, turning what would be a headache in a monolith into a major outbreak in a service-oriented architecture, distributed working amplifies team dysfunctions as the communication pathways take on extra weight. - #codemanship
Distributed Web and the InterPlanetary File System
The world wide web is roughly 25 years old, and while much has evolved, the core mechanism has stayed the same. Some say the InterPlanetary File System (IPFS), a form of distributed web begun in 2014 by Protocol Labs, may be the answer. - #medium
A Distributed Tracing Adventure in Apache Beam
They say that a picture is worth a thousand words, but in the world of distributed systems, a picture can easily be worth a thousand hours. While I can't promise you that this post will in any way save you a thousand hours, I hope that you find value in the thought process that I explored when introducing tracing and visibility into an Apache Beam pipeline. - #rion
Byte Down: Making Netflix’s Data Infrastructure Cost-Effective
Our efficiency approach, therefore, is to provide cost transparency and place the efficiency context as close to the decision-makers as possible. Our highest leverage tool is a custom dashboard that serves as a feedback loop to data producers and consumers — it is the single holistic source of truth for cost and usage trends for Netflix’s data users. This post details our approach and lessons learned in creating our data efficiency dashboard. - #netflixtechblog
When the abstraction is wrong, the fastest way forward is back. This is not retreat, it's advance in a better direction. Do it. You'll improve your own life, and the lives of all who follow. - #sandimetz
Book
Evidence-based Software Engineering
This book discusses what is currently known about software engineering, based on an analysis of all the publicly available data. This aim is not as ambitious as it sounds, because there is not a great deal of data publicly available.
The intent is to provide material that is useful to professional developers working in industry; until recently researchers in software engineering have been more interested in vanity work, promoted by ego and bluster.
The material is organized in two parts, the first covering software engineering and the second the statistics likely to be needed for the analysis of software engineering data.
Videos
Peter Van Roy - Keynote: Why time is evil in distributed systems