The InfoQ Podcast

Failure As a Means to Build Resilient Software Systems: A Conversation with Lorin Hochstein

Information:

Synopsis

In this podcast, Michael Stiefel spoke to Lorin Hochstein about how real-world failures provide insight into how software systems actually work. The first topic was that while automated fault injection tools can introduce basic robustness into a system, they cannot replicate the understanding that comes from mitigating complicated software failures in the real world. We then considered how to get this information to software architects so that they can learn from failure. Ironically, in reliable systems, adding more reliability often adds complexity, which can lead to new failures. We often focus on making our systems robust against known failure patterns, but we have not learned how to make software systems resilient to unknown failure modes, or to failures caused by changes in the external world or in the evolving system design.

Read a transcript of this interview: https://bit.ly/3NVhtf3

Subscribe to the Software Architects’ Newsletter for your monthly guide to the essential news and experience