Relevant news
- 2019 Edition announced, see https://www.chaos.conf.kth.se/
- [CASTOR] Success for the 2nd Chaos Engineering Workshop
- [IDG.se] Chaos and disruptions yield more stable systems: how the Netflix favorite Chaos Engineering works (in Swedish)
- [KTH] Netflix comes to KTH (in Swedish)
We are pleased to invite you to the 2nd European Chaos Engineering Day:
Wednesday 5 December 2018, 9:00 – 17:00
KTH Main Campus, Room V1
Overview of the program (long version below):
- 8h30 Coffee is served
- 9h00 Opening (Martin Monperrus)
- 9h30-10h30 Lorin Hochstein, Senior Software Engineer at Netflix, Distinguished keynote speaker, What I’ve learned doing chaos at Netflix (slides)
- 10h30-11h Break
- 11h-11h30 Philipp Leitner, Assistant Professor at Chalmers University, Gothenburg, AWS Lambda and #serverless. What’s all the fuzz about? (slides)
- 11h45-13h30 Lunch buffet
- 13h30-14h00 Long Zhang, PhD student in computer science at KTH Royal Institute of Technology, A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM (slides)
- 14h00-14h30 Nazareno V. Feito Matias, Principal engineer at Oracle, Chaos Engineering at Oracle/NSGBU (slides)
- 14h30-15h Break
- 15h-15h30 Benoit Baudry, Professor at KTH Royal Institute of Technology and Director of the Castor Software Research Center, Software Diversity: Building Resilient Software Communities (slides)
- 15h30-16h30 Breakout / discussion
- 16h30-17h Closing
Participation is free, but if the lecture hall is full, only registered people will have a seat.
Program:
- Lorin Hochstein, Senior Software Engineer at Netflix, Distinguished keynote speaker
- Title: What I’ve learned doing chaos at Netflix
- Abstract: Fred Brooks once observed that “the programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination.” And yet, software systems fail in ways that simultaneously defy the imaginations of their creators and are strikingly similar to failures in physical systems. In this talk, I’ll share what I’ve learned about failures in distributed systems based on my experiences working at Netflix. I’ll discuss the strategies we employed for building and applying chaos engineering tools to find vulnerabilities, which strategies worked well, and which ones did not go as well as we hoped.
- Bio: Lorin Hochstein is a Senior Software Engineer on the CORE (Cloud Operations & Reliability Engineering) Team at Netflix, where he works on ensuring that Netflix remains available. He was previously Senior Software Engineer at SendGrid Labs, Lead Architect for Cloud Services at Nimbis Services, Computer Scientist at the University of Southern California’s Information Sciences Institute, and Assistant Professor in the Department of Computer Science and Engineering at the University of Nebraska–Lincoln. He has a PhD in computer science from the University of Maryland.
- slides
- Philipp Leitner, Assistant Professor at Chalmers University, Gothenburg
- AWS Lambda and #serverless. What’s all the fuzz about? To some, AWS Lambda and other “serverless” technologies embody the future of cloud computing. Yet non-trivial industrial success stories are, at least today, few and far between. In this talk we will explore the idea of serverless computing and discuss, based on recent research results, the promises and challenges of industrial adoption. We’ll introduce the “serverless mindset” and discuss what kinds of applications lend themselves well to being built on top of AWS Lambda. Finally, we will glimpse into the future of serverless and discuss some secret (and some not-so-secret) plans that cloud providers have for serverless.
- slides
- Benoit Baudry, Professor at KTH Royal Institute of Technology and Director of the Castor Software Research Center
- Software Diversity: Building Resilient Software Communities. This talk introduces the concept of software diversity. I illustrate it with works spanning various fields of computer science, from operating systems to build systems. All these techniques share one goal: to reduce the risks related to massive quantities of software clones (that all have the same performance, size, bugs, vulnerabilities). While chaos engineering acts as a form of vaccine to check and strengthen the health of a complex software system, diversification acts as an evolutionary phenomenon to strengthen a community of applications.
- slides
- Nazareno V. Feito Matias, Principal engineer at Oracle
- Chaos Engineering at Oracle/NSGBU. The talk is not centered on a tool that people may not use, but on how to run chaos engineering experiments in a company that has many environments, acquired products and a plethora of different teams, and on how difficult it currently is to get management buy-in for chaos engineering. It also covers things to consider when designing a chaos engineering tool or system: growing pains, basically. Another point is that a chaos engineering tool does not only shut down servers; it can have a certain intelligence, with a backend of mathematical models, applied statistics and some machine learning. I might also demonstrate our chaos engineering tool called MadBull (similar to Chaos Monkey but written in Python, 100% terminal/CLI and targeting Oracle Cloud).
- slides
- Long Zhang, PhD student in computer science at KTH Royal Institute of Technology
- A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM. I will introduce a novel design and implementation of a chaos engineering system in Java called ChaosMachine. It provides a unique and actionable analysis of exception-handling capabilities in production, at the level of try-catch blocks. I will also share some interesting evaluation results for our approach, which reveal both strengths and weaknesses of the resilience code of a software system at the level of exception handling.
- slides
The goal of the workshop is to gather the European chaos engineering community. You will see great talks, learn about chaos engineering and advanced DevOps technology, and meet cool people.
Giving a talk or presenting a poster
If you would like to give a talk at the 2nd European Chaos Engineering Day or to present a poster during the poster session, please send an email to european.chaos.day@gmail.com before Nov 22 2018 with:
- your talk/poster title
- a one-paragraph description
- your preferred format: poster session / 20-minute talk / 30-minute talk
Practical information:
Where: KTH Main Campus, Room V1
When: 9:00 – 17:00
Language of the workshop: English
Meals: lunch & coffee for registered participants, wireless included 🙂
Organizing committee: Long Zhang, Martin Monperrus, Maria Berthelius
Sponsor: CASTOR Software Research Centre in Stockholm
Topics:
- Chaos engineering principles and tools
- Chaos monkey, monkey testing in the field
- DevOps tools and approaches
- Site reliability engineering
- Software antifragility
- Production support for monitoring distributed systems
- Automatic software repair
- Chaos & cloud elasticity / scalability
- Continuous integration, testing and deployment
Organization:
- Chair: Martin Monperrus
- Program Committee: