forked from broadinstitute/cromwell
Document a general-purpose recovery process (broadinstitute#4991)
1 parent 765d4e9, commit 937cb05
Showing 14 changed files with 100 additions and 31 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# Documented Processes | ||
|
||
This directory contains a selection of processes which are: | ||
|
||
* Best expressed as a `.dot` chart | ||
* Still manual for now | ||
* Under source control to make edits easy yet reviewable (just like code!) | ||
|
||
## How to update these processes | ||
|
||
Do you have a better idea about how any of these processes should work? | ||
Make a PR and it'll be reviewed, just like a code change! | ||
|
||
* Modify the appropriate `.dot` file(s) | ||
* Navigate to the `processes` directory | ||
* Run `refresh.sh` to regenerate the `.png` files | ||
* Add and commit the changed `.dot` and `.png` files to git | ||
* Submit a PR for the change to be reviewed - and hopefully adopted! |
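Because the steps above ask for both the `.dot` source and its rendered `.png` to be committed together, a quick check before `git add` can catch a chart that was edited but not re-rendered. The following is a minimal sketch only: `stale_pngs` is a hypothetical helper that is not part of this commit, and it assumes the `<name>.dot` to `<name>.dot.png` naming that `refresh.sh` produces.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the commit): print every .dot file whose
# rendered .png is missing or older than the .dot source.
stale_pngs() {
  find . -name '*.dot' -exec sh -c '
    for file in "$@"; do
      # Stale when the PNG does not exist or the source was modified after it.
      if [ ! -e "${file}.png" ] || [ "${file}" -nt "${file}.png" ]; then
        printf "%s\n" "${file}"
      fi
    done
  ' sh {} +
}
```

Run `stale_pngs` from the `processes` directory after editing; empty output suggests every chart has been re-rendered. The comparison relies on file modification times, so treat it as a heuristic: a fresh checkout may not preserve the original ordering of timestamps.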
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
#!/usr/bin/env bash | ||
|
||
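# Render every .dot file under the current directory to a PNG next to it. | ||
# NUL-delimited names (-print0 / read -d '') keep paths with spaces intact. | ||
while IFS= read -r -d '' file | ||
do | ||
  echo "Rendering graph ${file} into ${file}.png" | ||
  dot -Tpng -o "${file}.png" "${file}" | ||
done < <(find . -name "*.dot" -print0) |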
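The script above depends on the Graphviz `dot` command being on the `PATH`. If it is missing, the loop prints a "command not found" error for every chart. A small guard can fail once, with a clearer message. This is a sketch under that assumption; `require_cmd` is a hypothetical helper and not part of the committed script.

```bash
#!/usr/bin/env bash

# Hypothetical guard (not part of the committed refresh.sh): succeed if the
# named command is available, otherwise print an error and return 1.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    printf 'error: required command not found on PATH: %s\n' "$1" >&2
    return 1
  fi
}

# Possible use near the top of refresh.sh:
#   require_cmd dot || exit 1
```

Returning a status rather than exiting inside the function leaves the caller free to decide whether a missing tool is fatal.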
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Release Processes | ||
|
||
## How to update these processes | ||
|
||
Have a better idea about how the deployment processes should work? | ||
See our "updating the process" [process](../README.MD)! | ||
|
||
## How to Release Cromwell | ||
|
||
[Image: release-cromwell-version] | ||
|
||
## How to Deploy Cromwell Releases in FireCloud | ||
|
||
[Image: firecloud-develop] | ||
|
||
|
||
## How to Deploy Cromwell in CAAS prod | ||
|
||
[Image: caas-prod] | ||
|
File renamed without changes.
File renamed without changes
File renamed without changes.
Binary file renamed
BIN
+170 KB
...lease_processes/firecloud-develop.dot.png → ...lease_processes/firecloud-develop.dot.png
File renamed without changes.
File renamed without changes
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Production Troubleshooting Processes | ||
|
||
**Note:** These processes contain shorthand descriptions for various tasks. | ||
If you aren't sure how to achieve any of these steps, look for the details in | ||
the [Cromwell playbook](https://docs.google.com/document/d/1_iRESDzuCgPTOPJnTYxTncIqJU8B1IFWarypDe3gbCY). | ||
|
||
## General Purpose Fallback Process | ||
|
||
* Have you run through the end of the playbook suggestions and not found anything which fixes the issue? | ||
* Do you just want the problem to go away so that you can get back to sleep as quickly as possible? | ||
|
||
This is a (near-)foolproof series of steps for bringing Cromwell back into a good state as quickly as | ||
possible when it is misbehaving and you don't know why. It also leaves the workflows from the | ||
problem-causing submission in the database, on hold, so that they can be recovered once the issue is resolved. | ||
|
||
[Image: all-purpose-mess-remover] | ||
|
||
## How to update these processes | ||
|
||
Have a better idea about how the troubleshooting processes should work? | ||
See our "updating the process" [process](../README.MD)! |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
digraph { | ||
|
||
# Nodes | ||
|
||
something_wrong [shape=oval label="Something went wrong with Cromwell and I can't 'fix' it.\nI just want it to go away and restore service."]; | ||
|
||
# Always start with a restart: | ||
restart_cromwell_instance [shape=oval label="Restart Cromwell's 'writer' instance"]; | ||
|
||
determine_time [shape=oval label="Determine what time things started going wrong"]; | ||
determine_submissions_of_interest [shape=oval label="Determine a submission of interest from around that time"]; | ||
|
||
place_submissions_on_hold [shape=oval label="Place all workflows from that submission on hold in the database"]; | ||
|
||
|
||
go_to_sleep [shape=oval label="Great!\nYour work here is done."]; | ||
|
||
{ rank=max go_to_sleep } | ||
|
||
|
||
# Edges | ||
|
||
something_wrong -> restart_cromwell_instance | ||
|
||
restart_cromwell_instance -> go_to_sleep [label="That worked!"] | ||
|
||
restart_cromwell_instance -> determine_time [label="The problem persists"] | ||
determine_time -> determine_submissions_of_interest | ||
determine_submissions_of_interest -> place_submissions_on_hold | ||
|
||
place_submissions_on_hold -> restart_cromwell_instance | ||
|
||
|
||
} |
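The chart above describes a loop: restart, and if the problem persists, put one suspect submission on hold and restart again. The control flow can be sketched as follows. The commit documents these steps as manual, so this is illustrative only. Every helper named here (`restart_writer`, `problem_persists`, `hold_next_submission`) is a hypothetical placeholder for a playbook step, not an existing Cromwell command, and the round limit is an added assumption that does not appear in the chart.

```bash
#!/usr/bin/env bash

# Sketch of the fallback loop. The caller must define three hypothetical hooks:
#   restart_writer        restart Cromwell's 'writer' instance
#   problem_persists      return 0 while the problem is still visible
#   hold_next_submission  determine when things went wrong, pick a submission
#                         from around that time, and place all of its
#                         workflows on hold in the database
run_fallback() {
  max_rounds="${1:-5}"   # assumed safety limit, not part of the chart
  round=0

  # Always start with a restart.
  restart_writer

  while problem_persists; do
    round=$((round + 1))
    if [ "${round}" -gt "${max_rounds}" ]; then
      echo "Problem persists after ${max_rounds} rounds; escalate." >&2
      return 1
    fi
    hold_next_submission
    restart_writer
  done

  echo "Great! Your work here is done."
}
```

Each pass through the loop removes one more submission from play before restarting, which mirrors the edge from `place_submissions_on_hold` back to `restart_cromwell_instance`.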
This file was deleted.
This file was deleted.