
Run db:migrate in pre- and post-install/upgrade (#18, #26) #37

Open · angdraug wants to merge 3 commits into main
Conversation

angdraug

No description provided.

angdraug and others added 3 commits January 15, 2023 15:30
As recommended in Mastodon release notes, run db:migrate with
SKIP_POST_DEPLOYMENT_MIGRATIONS=true in pre-install and pre-upgrade
hooks, and again without the flag in post-install and post-upgrade.

Co-authored-by: Sheogorath <[email protected]>
Pre-install and pre-upgrade hooks run before the persistent ConfigMap
resources are installed. As suggested in helm/helm#8694, create a hook
ConfigMap with a lower hook-weight and resource-policy=keep to make the
same ConfigMap available in pre- hooks.
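For illustration only, a minimal sketch of the pre-install/pre-upgrade migration Job described in the first commit; the names, image tag, weights, and delete policy here are assumptions, not the chart's actual template. The post- job would look the same, minus the flag.

```yaml
# Sketch of a pre-install/pre-upgrade migration Job (hypothetical names/values).
apiVersion: batch/v1
kind: Job
metadata:
  name: mastodon-db-migrate-pre            # hypothetical name
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"             # runs after the hook ConfigMap (which has a lower weight)
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-migrate
          image: ghcr.io/mastodon/mastodon:v4.2.0   # placeholder tag
          command: ["bundle", "exec", "rake", "db:migrate"]
          env:
            - name: SKIP_POST_DEPLOYMENT_MIGRATIONS  # pre- hooks skip post-deployment migrations
              value: "true"
          envFrom:
            - configMapRef:
                name: mastodon-env                   # Mastodon env settings (see the ConfigMap discussion below)
```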
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "mastodon.fullname" . }}-env
Contributor

Why do we need a second ConfigMap with the same name and data?

Author

Because Helm doesn't create the other ConfigMap until after the pre-install hooks, and it deletes this one after the pre-install hooks have run. It's a catch-22: for the mastodon-env ConfigMap to be available when the db-migrate job runs, it has to be created as a hook with a lower weight, but since it's created as a hook it gets cleaned up after db-migrate is done, so we also have to keep the non-hook version of the same ConfigMap. See also the commit message in a749654.
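For illustration, a minimal sketch of the hook-created copy with a lower weight and resource-policy=keep; the annotation values and data keys are assumptions, not the chart's actual template.

```yaml
# Sketch of the hook-created ConfigMap copy (hypothetical values).
apiVersion: v1
kind: ConfigMap
metadata:
  name: mastodon-env                       # same name and data as the regular ConfigMap
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-10"           # lower than the migration Job, so it exists first
    "helm.sh/resource-policy": keep        # don't delete it once the hooks are done
data:
  DB_HOST: mastodon-postgresql             # placeholder values
  REDIS_HOST: mastodon-redis
```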

Contributor

Oh, I didn't realize that! Thanks for the clarification.

Contributor

For some additional clarification: this only matters the first time the chart gets installed; after that, the ConfigMap will already be there.

@angdraug
Author

Not sure if it shows up for you, but GitHub tells me:

> First-time contributors need a maintainer to approve running workflows.

I don't think this PR is going to land until someone does the needful.

@paolomainardi
Contributor

paolomainardi commented Jan 27, 2023

Can you please merge and release this PR? It looks OK. Otherwise this chart cannot be used in a highly automated environment with Terraform + Helm: the Helm installation never ends, the migration job is never triggered by Helm, and a fresh install becomes impossible.

cc @dunn

@dunn
Contributor

dunn commented Jan 27, 2023

I'm not actually a maintainer of this repo, so I can't merge.

@paolomainardi
Contributor

> I'm not actually a maintainer of this repo, so I can't merge.

Oops, so sorry. I saw your comments and just assumed you were a maintainer too.

@paolomainardi
Contributor

paolomainardi commented Jan 29, 2023

Just tried this PR, and it doesn't work: the migrate job requires the PVC to already exist, otherwise the job cannot run.

The question is: does the migration job require the PVC?

@paolomainardi
Contributor

I tried again using a bucket instead of a PVC, and now the problem is the required Redis instance, which has to be up and running for the job to finish.

@paolomainardi
Contributor

I tried running the migration job along with the other deployments, and it worked fine; it is the same approach the GitLab chart takes. The concept is to let the scheduler keep restarting the services until the migration job has finished initializing things; once it has, the pods come up and work fine.

GitLab migration job used as a reference: https://docs.gitlab.com/charts/charts/gitlab/migrations/

@renchap
Member

renchap commented Feb 17, 2023

Thanks for your work on this @paolomainardi!

> I tried running the migration job along with the other deployments, and it worked fine; it is the same approach the GitLab chart takes. The concept is to let the scheduler keep restarting the services until the migration job has finished initializing things; once it has, the pods come up and work fine.

How would this work for version upgrades? The pre-upgrade migrations need to run before any of the new-version application pods start, otherwise those pods can generate errors on some requests (when trying to access a table that has not been migrated yet) while their /health endpoint returns OK (it does not check the schema version).

I worry this will create user-facing errors during the migration, or even make the server unavailable if the migration does not happen and all pods are upgraded to the new version.

@paolomainardi
Contributor

@renchap yes, you're right; the issue with this approach is that users can face problems while migrations are running.

This issue can't really be overcome with Helm alone; the best choice is to avoid running migrations the way this chart currently does and to move most of the complexity to the application side.

Looking again at how GitLab does it, they have open-sourced the database migration types they support: https://docs.gitlab.com/ee/development/migration_style_guide.html

The relevant case for Mastodon is "Regular migrations", which according to their documentation must always take under 3 minutes; anything longer must be moved to post-deployment or background migrations.

It's not entirely clear what actually happens during that 3-minute window; maybe the migrations are always written so that the previous and next releases stay compatible.

This is the migrations Helm chart documentation: https://docs.gitlab.com/charts/charts/gitlab/migrations and, from my direct experience, they run alongside the other deployments.

@jessebot
Contributor

jessebot commented Jul 4, 2023

Is there any chance this could be moved forward?

@timetinytim
Contributor

Apologies for the long delay in getting to this...

Generally I like how this is done, including the generalization of the deploy job into pre & post runs. Though I do want to clarify something first.

> I tried running the migration job along with the other deployments, and it worked fine; it is the same approach the GitLab chart takes. The concept is to let the scheduler keep restarting the services until the migration job has finished initializing things; once it has, the pods come up and work fine.

@paolomainardi This job is using pre/post-upgrade/install helm hooks, which if I'm not mistaken, means that the pre- job is run first before anything is installed/updated, THEN resources are added/changed, THEN the post- job is run. The scheduler shouldn't be restarting anything while the pre-migration is running. Is this something you saw while trying this yourself?

@paolomainardi
Contributor

paolomainardi commented Feb 10, 2025

@timetinytim, thanks for your reply; no worries about the long delay.

It should work like that: while the migrations are running, a pod created by the Helm hooks runs them; once it finishes, the rest gets applied, such as updating the Kubernetes Deployment with the new Mastodon container image, which is what triggers the scheduler to act.

I have not been managing a Mastodon K8S instance lately, but I can give you some tips.

@timetinytim
Contributor

@paolomainardi Alright cool, I just wanted to confirm. That's the behavior we expect. As long as pre-migrations run before the new pods come up, that will hopefully prevent any user-facing errors during an update.

Let me do some testing on my own side on this, and if everything looks good, I'll get this approved/merged in the next day or two.

@timetinytim
Contributor

I've been doing some testing, and I've come to a conclusion:

A pre/post-hook ConfigMap isn't a bad idea, but at the end of the day the job only needs a couple of pieces of information: the DB connection info and the Redis connection info. There are also situations where the former would be different for the migration job (e.g. when using a connection pooler, which wouldn't work with migrations).

What I'm suggesting is that, rather than creating the ConfigMap twice, we just add the env vars the job needs directly. That solves the case mentioned above and simplifies things a little bit.
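As an illustration of the suggestion above, a rough sketch of the job container env; the variable names follow Mastodon's usual settings, but the values, image tag, and secret name are placeholders, not the chart's actual template.

```yaml
# Sketch: give the migration Job its env directly instead of a second ConfigMap.
containers:
  - name: db-migrate
    image: ghcr.io/mastodon/mastodon:v4.2.0   # placeholder tag
    command: ["bundle", "exec", "rake", "db:migrate"]
    env:
      - name: DB_HOST
        value: mastodon-postgresql            # could point at the DB directly, bypassing a pooler
      - name: DB_PORT
        value: "5432"
      - name: DB_NAME
        value: mastodon_production
      - name: DB_USER
        value: mastodon
      - name: DB_PASS
        valueFrom:
          secretKeyRef:
            name: mastodon-postgresql         # placeholder secret name
            key: password
      - name: REDIS_HOST
        value: mastodon-redis
      - name: REDIS_PORT
        value: "6379"
```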

@angdraug, let me know what you think. Given how long it's taken me to look at this I totally understand if you've moved on from this, so no worries if that's the case. I'll just pick this up myself.

@WyriHaximus
Contributor

> What I'm suggesting is that, rather than creating the ConfigMap twice, we just add the env vars the job needs directly. That solves the case mentioned above and simplifies things a little bit.

Speaking from experience, this will work perfectly fine, and it's the simplest solution, tbh.

@angdraug
Author

I have indeed moved on from this, please feel free to take over, thanks!
