
root = true

[*]
charset = utf-8
end_of_line = lf
indent_size = 2
indent_style = space
insert_final_newline = true
trim_trailing_whitespace = true

[*.md]
trim_trailing_whitespace = false

---
layout: default
---

<style type="text/css" media="screen">
  .container {
    margin: 10px auto;
    max-width: 600px;
    text-align: center;
  }
  h1 {
    margin: 30px 0;
    font-size: 4em;
    line-height: 1;
    letter-spacing: -1px;
  }
</style>

<div class="container">

  <p><strong>Page not found :(</strong></p>
  <p>The requested page could not be found.</p>
  <p>Tweet me <a href="">@electron0zero</a> if you think this page should be here :-)</p>
</div>
# failure-modes
A list of failure modes, stories, and postmortems from software systems.
## failure-modes - [](
A curated list of failures in software systems, along with other literature on the topic.

### Want to learn more or contribute? See the [about](/ page
### For the current list, see the [index](/ page

# Failure Stories

## PostgreSQL
- [Transaction ID wraparound outage at Mandrill](
- [Transaction ID wraparound outage at Sentry](

## Kafka
- [Kafkapocalypse: a postmortem on our service outage](
- [Stories from the Front: Lessons Learned from Supporting Apache Kafka](
- [How to Lose Messages on a Kafka Cluster - Part 1](

## Kubernetes
- [Compilation of public failure/horror stories related to Kubernetes](
- [10 Ways to Shoot Yourself in the Foot with Kubernetes, #9 Will Surprise You - Laurent Bernaille](

# Notable Resources

## Postmortems
- [Postmortem of the global Cloudflare outage](
- [Postmortem of a major GitHub outage](

## Talks
- [Debugging Under Fire: Keep your Head when Systems have Lost their Mind • Bryan Cantrill](
- [Bryan Cantrill - Docker in Production: Tales From the Engine Room](
- [Keynote: High Reliability Infrastructure Migrations - Julia Evans, Software Engineer, Stripe](
- ["I See What You Mean" by Peter Alvaro](
- [Orchestrated Chaos: Applying Failure Testing Research at Scale](
- [Orchestrating Chaos Applying Database Research in the Wild - Peter Alvaro](

## Research
- [Lineage-driven Fault Injection - the morning paper](
- [The Network is Reliable - the morning paper](
- [Gray failure: the Achilles’ heel of cloud-scale systems - the morning paper](

## Blog posts
- [Chaos Engineering — Review Lineage Driven Failure Injection(LDFI)](
- [I test in prod - Charity Majors](

## Projects
- [Chaos Toolkit](
- [Chaos Monkey - Netflix](

#### Running locally
This site is built with Jekyll, so you need to install Jekyll to run it locally.
0. [Install Jekyll](
1. `git clone` this repository
2. Install the gems: `bundle install`
3. Start the local server: `bundle exec jekyll serve`
{%- if page.comments != false and jekyll.environment == "production" -%}

  <div id="disqus_thread"></div>
  <script>
    // Tell Disqus which page this comment thread belongs to
    var disqus_config = function () {
      this.page.url = '{{ page.url | absolute_url }}';
      this.page.identifier = '{{ page.url | absolute_url }}';
    };

    // Load the Disqus embed script asynchronously
    (function() {
      var d = document, s = d.createElement('script');

      s.src = 'https://{{ site.disqus.shortname }}';

      s.setAttribute('data-timestamp', +new Date());
      (d.head || d.body).appendChild(s);
    })();
  </script>
  <noscript>Please enable JavaScript to view the <a href="" rel="nofollow">comments powered by Disqus.</a></noscript>
{%- endif -%}
