3rd European Chaos Engineering Day:
Wednesday 4 December 2019, 9:00 – 16:30
KTH Main Campus
Sponsored by the CASTOR Software Research Centre
Programme
- 8:30 – 9:00 Welcome coffee
- 9:00 – 9:15 Opening
- 9:15 – 10:00 Keynote “From Being Wrong(TM) to A Superpower in One Step” (Russ Miles)
- 10:00 – 10:30 Coffee break
- 10:30 – 11:30
  - Decomposing for Chaos: resilience through flow-based architecture (Barry O’Reilly)
  - Chaos engineering with a service mesh (Julien Bisconti)
- 11:30 – 13:30 Lunch break
- 13:30 – 14:30 Demo session
  - Introduction to Royal Chaos (Long Zhang) [slides]
  - Introduction to Chaos Toolkit (Russ Miles)
- 14:30 – 15:00 Coffee break
- 15:00 – 16:00
  - Analyzing memory errors in production (Markus Weninger) [slides]
  - Reliable Stream Processing at Scale with Apache Flink (Paris Carbone) [slides]
- 16:00 – 16:15 Closing
Registration website: https://www.kth.se/form/chaosengineeringday2019
Speakers
[Keynote] Russ Miles, CEO of ChaosIQ, discusses the tools and techniques he uses to turn inevitably being wrong into being successful at being wrong. Being wrong can be turned to your advantage, and Russ shares stories of how this has happened, as well as the challenges to look out for. Being wrong is often seen as the worst thing that can happen™, especially when you architect, build, and run business-critical applications and services. But the growing velocity of modern software development, plus the growing need for systems to be resilient, reliable, and right, has increased the pressure on teams, and in particular on architects, exponentially. Never before have software owners had such an opportunity, or the power, to be wrong. We need to get better at being wrong.
In his talk “Analyzing memory errors in production”, Markus Weninger presents the AntTracks Analyzer, a memory analysis tool developed at the Johannes Kepler University Linz. Specifically, he will show how the tool guides users in analyzing memory leaks and high memory churn. The basic idea is that the tool automatically detects and highlights the most important information on the screen, explains why it is important, and suggests which next steps are appropriate based on these findings. This way, the user is guided through the whole analysis process, enabling them to explore the root cause of a problem even without prior experience.
Paris Carbone is an open source committer at the Apache Software Foundation and a senior computer scientist, currently serving as the leader of the “Continuous Deep Analytics” group at RISE. Paris will talk about data stream processing pipelines that involve tens to hundreds of compute instances, which exchange messages and leave side effects on internal state as well as on external systems (databases, logs, etc.). Anything from a single partial failure (e.g., a process or network channel failure) to a complete datacenter disaster can produce incorrect side effects. To avoid this, Apache Flink has an underlying snapshotting mechanism that captures state changes correctly and transparently. This talk offers a rigorous overview of Flink’s state-of-the-art error avoidance approach, which has been serving hundreds of production pipelines in recent years. Paris further covers how the same mechanism can be exploited for many uses beyond fault tolerance, such as provenance, reconfiguration, debugging, pipeline migration, and external access isolation on top of stream pipelines.
Topics
- Chaos engineering principles and tools
- Chaos Monkey, monkey testing in the field
- DevOps tools and approaches
- Site reliability engineering
- Automated recovery and remediation
- Software antifragility
- Production support for monitoring distributed systems
- Automatic software repair
- Error and anomaly detection
- Chaos & cloud elasticity / scalability
- Continuous integration, testing and deployment
Practical information
Language of the workshop: English
Meals: lunch & coffee for registered participants, wireless included 🙂
Organizing committee: Long Zhang, Martin Monperrus, Maria Berthelius