Note: this proposal is also visible in:
https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-20-bullsey...
Summary: bullseye upgrades will roll out starting the first weeks of April and May, and should complete before the end of August 2022. Let us know if your service requires special handling.
# Background
Debian 11 [bullseye][] was [released on August 14 2021][]. Tor started the upgrade to bullseye shortly after and hopes to complete the process before the [buster][] EOL, [one year after the stable release][], so normally around August 2022.
In other words, we have until this summer to upgrade *all* of TPA's machines to the new release.
New machines that were set up recently have already been installed with bullseye, as the installers were changed shortly after the release. A few machines were upgraded manually without any ill effects, and we do not consider this upgrade to be risky or dangerous in general.
This work is part of the [%Debian 11 bullseye upgrade milestone][], itself part of the [OKR 2022 Q1/Q2 plan][].
# Proposal
The proposal, broadly speaking, is to upgrade all servers in three batches. The first two are somewhat equally sized and spread over April and May, and the rest will happen at some time that will be announced later, individually, per server.
## Affected users
All service admins are affected by this change. If you have shell access on any TPA server, you want to read this announcement.
## Upgrade schedule
The upgrade is split in multiple batches:
* low complexity (mostly TPA): April
* moderate complexity (service admins): May
* high complexity (hard stuff): to be announced separately
* to be retired or rebuilt servers: not upgraded
* already completed upgrades

The free time between the first two will also allow us to cover for unplanned contingencies: upgrades that could drag on and other work that will inevitably need to be performed.
The objective is to do the batches in collective "upgrade parties" that should be "fun" for the team (and work parties *have* generally been fun in the past).
### Low complexity, batch 1: April
A first batch of servers will be upgraded in the first week of April.
Those machines are considered somewhat trivial to upgrade, either because they are mostly managed by TPA or because we expect the upgrade to have minimal impact on the service's users.
```
archive-01
build-x86-05
build-x86-06
chi-node-12
chi-node-13
chives
ci-runner-01
ci-runner-arm64-02
dangerzone-01
hetzner-hel1-02
hetzner-hel1-03
hetzner-nbg1-01
hetzner-nbg1-02
loghost01
media-01
metrics-store-01
perdulce
static-master-fsn
submit-01
tb-build-01
tb-build-03
tb-tester-01
tbb-nightlies-master
web-chi-03
web-cymru-01
web-fsn-01
web-fsn-02
```
27 machines. At a worst case of 45 minutes per machine, that is about 20 hours of work. With three people, this might be doable in a day.
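The estimate above can be checked with a short calculation. This is only a sketch: the function name is made up for illustration, and it assumes the work splits evenly across people with no coordination overhead, which the proposal does not claim.

```python
def worst_case_effort(machines: int, minutes_per_machine: int, people: int) -> tuple:
    """Return (total hours, hours per person) for one upgrade batch.

    Assumes an even split of machines across people (an illustrative
    simplification, not part of the proposal).
    """
    total_hours = machines * minutes_per_machine / 60
    return total_hours, total_hours / people


# Batch 1 figures from the text: 27 machines, 45 minutes each, three people.
total, per_person = worst_case_effort(machines=27, minutes_per_machine=45, people=3)
print(f"{total:.2f} hours total, {per_person:.2f} hours per person")
# prints "20.25 hours total, 6.75 hours per person"
```

Under those assumptions, each person's share fits in a single working day, which matches the "doable in a day" estimate.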
Feedback and coordination of this batch happens in issue [tpo/tpa/team#40690][].
### Moderate complexity, batch 2: May
The second batch of "moderate complexity" servers happens in the first week of May. The main difference from the first batch is that the second batch groups services mostly managed by service admins, who are given a longer heads-up before the upgrades are done.
```
bacula-director-01
bungei
carinatum
check-01
crm-ext-01
crm-int-01
fallax
gettor-01
gitlab-02
henryi
majus
mandos-01
materculae
meronense
neriniflorum
nevii
onionbalance-01
onionbalance-02
onionoo-backend-01
onionoo-backend-02
onionoo-frontend-01
onionoo-frontend-02
polyanthum
rude
staticiforme
subnotabile
```
26 machines. If the worst-case scenario holds, this is another day of work with three people.
Not mentioned here is the `gnt-fsn` Ganeti cluster upgrade, which is covered by ticket [tpo/tpa/team#40689][]. That alone could be a few person-days of work.
Feedback and coordination of this batch happens in issue [tpo/tpa/team#40692][].
### High complexity, individually done
Those machines are harder to upgrade because of major upgrades to their core components. They will require individual attention, if not major work.
```
alberti
eugeni
hetzner-hel1-01
pauli
```
Each machine could take a week or two to upgrade, depending on the situation and severity. To detail each server:
* `alberti`: `userdir-ldap` is, in general, risky and needs special attention, but should be moderately safe to upgrade, see ticket [tpo/tpa/team#40693][]
* `eugeni`: messy server with lots of moving parts (e.g. Schleuder, Mailman). Mailman 2 is EOL, so we need to decide whether to migrate to Mailman 3 or replace it with Discourse (and self-host), see [tpo/tpa/team#40471][], followup in [tpo/tpa/team#40694][]
* `hetzner-hel1-01`: Nagios AKA Icinga 1 is end-of-life and needs to be migrated to Icinga 2, which involves fixing our git hooks to generate Icinga 2 configuration (unlikely), rebuilding an Icinga 2 server, or replacing it with Prometheus (see [tpo/tpa/team#29864][]), followup in [tpo/tpa/team#40695][]
* `pauli`: Puppet packages are severely out of date in Debian, and Puppet 5 is EOL (with Puppet 6 soon to be). This doesn't necessarily block the upgrade, but we should deal with this problem sooner rather than later, see [tpo/tpa/team#33588][], followup in [tpo/tpa/team#40696][]

All of those require individual decision and design, and specific announcements will be made for upgrades once a decision has been made for each service.
### To retire
Those servers are possibly scheduled for removal and may not be upgraded to bullseye at all. If we miss the summer deadline, they might be upgraded as a last resort.
```
cupani
gayi
moly
peninsulare
vineale
```
Specifically:
* cupani/vineale is covered by [tpo/tpa/team#40472][]
* gayi is [TPA-RFC-11: SVN retirement][], [tpo/tpa/team#17202][]
* moly/peninsulare is [tpo/tpa/team#29974][]
### To rebuild
Those machines are planned to be rebuilt and should therefore not be upgraded either:
```
cdn-backend-sunet-01
colchicifolium
corsicum
nutans
```
Some of those machines are hosted at Sunet and need to be migrated elsewhere, see [tpo/tpa/team#40684][] for details. `colchicifolium` is planned to be rebuilt in the `gnt-chi` cluster, no ticket created yet.
They will be rebuilt as new bullseye machines, which should allow for a safer transition that shouldn't require specific coordination or planning.
### Completed upgrades
Those machines have already been upgraded to (or installed as) Debian 11 bullseye:
```
btcpayserver-02
chi-node-01
chi-node-02
chi-node-03
chi-node-04
chi-node-05
chi-node-06
chi-node-07
chi-node-08
chi-node-09
chi-node-10
chi-node-11
chi-node-14
ci-runner-x86-05
palmeri
relay-01
static-gitlab-shim
tb-pkgstage-01
```
### Other related work
There is other work related to the bullseye upgrade that is mentioned in the [%Debian 11 bullseye upgrade milestone][].
# Alternatives considered
We have not set aside time to automate the upgrade procedure any further at this stage, as this is considered too risky a development project, and the current procedure is fast enough for now.
We could also move to the cloud, Kubernetes, serverless, and Ethereum and pretend none of those things exist, but so far we stay in the real world of operating systems.
Also note that this doesn't cover Docker container image upgrades. Each team is responsible for upgrading their image tags in GitLab CI appropriately and is *strongly* encouraged to keep a close eye on those in general. We may eventually consider enforcing stricter control over container images if this proves to be too chaotic to self-manage.
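For teams that want to check their own image tags, one possible approach is to scan a `.gitlab-ci.yml` file for `image:` lines whose tag still mentions `buster`. The following is a minimal sketch, not a TPA tool: the function name and the sample configuration are hypothetical, and it only handles the simple `image: name:tag` string form (not the nested `image:` / `name:` form or images without a tag).

```python
import re

# Matches simple "image: <reference>" lines, with optional quotes.
IMAGE_LINE = re.compile(r"^\s*image:\s*['\"]?([^'\"\s#]+)")


def images_on_release(ci_config: str, codename: str = "buster") -> list:
    """Return image references whose tag mentions the given Debian codename."""
    found = []
    for line in ci_config.splitlines():
        match = IMAGE_LINE.match(line)
        if not match:
            continue
        image = match.group(1)
        # Treat whatever follows the last colon as the tag, e.g. "buster-slim".
        _name, sep, tag = image.rpartition(":")
        if sep and codename in tag:
            found.append(image)
    return found


# Hypothetical CI configuration, for illustration only.
SAMPLE = """\
build:
  image: debian:buster
  script: make
test:
  image: "python:3.9-slim-buster"
  script: make test
deploy:
  image: debian:bullseye
  script: make deploy
"""

print(images_on_release(SAMPLE))
# prints ['debian:buster', 'python:3.9-slim-buster']
```

A plain text scan like this avoids a YAML dependency, at the cost of missing images set through variables or included templates.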
# Costs
It is estimated that this will take one or two person-months to complete, full time.
# Approvals required
This proposal needs approval from TPA team members, but service admins can request additional delay if they are worried about their service being affected by the upgrade.
Comments or feedback can be provided in issues linked above.
# Deadline
Upgrades will start in the first week of April 2022 (2022-04-04) unless an objection is raised.
This proposal will be considered adopted by then unless an objection is raised within TPA.
# Status
This proposal is currently in the `proposed` state.
# References
* [TPA bullseye upgrade procedure][]
* [%Debian 11 bullseye upgrade milestone][]

[TPA bullseye upgrade procedure]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/upgrades/bullseye/
[%Debian 11 bullseye upgrade milestone]: https://gitlab.torproject.org/groups/tpo/tpa/-/milestones/5
[bullseye]: https://wiki.debian.org/DebianBullseye
[released on August 14 2021]: https://www.debian.org/News/2021/20210814
[buster]: howto/upgrades/buster
[one year after the stable release]: https://www.debian.org/security/faq#lifespan
[OKR 2022 Q1/Q2 plan]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/roadmap/2022
[tpo/tpa/team#40690]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40690
[tpo/tpa/team#40692]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40692
[tpo/tpa/team#40693]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40693
[tpo/tpa/team#40471]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40471
[tpo/tpa/team#29864]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/29864
[tpo/tpa/team#33588]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/33588
[tpo/tpa/team#40684]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40684
[tpo/tpa/team#40694]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40694
[tpo/tpa/team#40695]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40695
[tpo/tpa/team#40696]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40696
[tpo/tpa/team#40472]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40472
[tpo/tpa/team#17202]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/17202
[TPA-RFC-11: SVN retirement]: policy/tpa-rfc-11-svn-retirement
[tpo/tpa/team#29974]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/29974
[tpo/tpa/team#40689]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40689