Summary: I deployed a new GitLab CI runner backed by Podman instead of Docker. We hope it will improve stability and our capacity to build images, but I need help testing it.
# Background
We've been having [stability issues][] with the Docker runners for a while now. We also started looking again at container image builds, which are currently failing without Kaniko.
[stability issues]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41295
[ships instructions on how to build containers inside Podman]: https://docs.gitlab.com/runner/executors/docker.html#use-podman-to-build-con...
# Proposal
## Testers needed
I need help testing the new runner. Right now it's marked as not running "untagged jobs", so it's unlikely to pick up your CI jobs unless you explicitly tag them.
See the [GitLab tag documentation][] for how to add tags to your configuration. It's basically done by adding a `tags` field to the `.gitlab-ci.yml` file.
Note that in TPA's [ci-test gitlab-ci.yaml file][], we use a `TPA_TAG_VALUE` variable to pass arbitrary tags down into the jobs without constantly changing the `.gitlab-ci.yml` file, which might be a useful addition to your workflow.
The tag to use is `podman`.
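For illustration, a minimal job pinned to the new runner might look like the following. The job names, image, and script are placeholders; only the `podman` tag and the `TPA_TAG_VALUE` variable name come from this proposal. The second job assumes GitLab expands CI/CD variables in `tags`; the actual ci-test file may do this differently.

```yaml
# Hypothetical job: everything except the `podman` tag is a placeholder.
test-on-podman:
  image: debian:stable
  tags:
    - podman
  script:
    - echo "hello from the podman runner"

# Hypothetical variant: the tag comes from a variable, so it can be
# overridden when triggering a pipeline without editing this file.
variables:
  TPA_TAG_VALUE: podman

test-with-variable-tag:
  image: debian:stable
  tags:
    - $TPA_TAG_VALUE
  script:
    - echo "running on a runner tagged $TPA_TAG_VALUE"
```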
You can send any job you want to the `podman` runner. We'd like to test a broad variety of workloads before we put it in production, especially image builds. Upstream even has a [set of instructions to build packages inside podman][].
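As a starting point for testing image builds, a job along these lines could be sent to the new runner. This is only a sketch: the job name, the `quay.io/podman/stable` image, and the image tag are assumptions, not taken from the TPA configuration, and whether `podman build` works inside a job depends on how the runner is configured.

```yaml
# Hypothetical image-build job; expects a Dockerfile or Containerfile
# at the root of the repository.
build-image-test:
  image: quay.io/podman/stable
  tags:
    - podman
  script:
    # Build the image locally without pushing it anywhere.
    - podman build --tag ci-build-test .
```

If the build fails, the job log is useful feedback for the discussion issue.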
[ci-test gitlab-ci.yaml file]: https://gitlab.torproject.org/tpo/tpa/ci-test/-/blob/main/.gitlab-ci.yml
[GitLab tag documentation]: https://docs.gitlab.com/ee/ci/yaml/#tags
[set of instructions to build packages inside podman]: https://docs.gitlab.com/runner/executors/docker.html#use-podman-to-build-con...
## Long term plan
If this goes well, we'd like to converge towards using `podman` for all workloads. It's better packaged in Debian, and better designed, than Docker. It also allows us to run containers as non-root.
That, however, is not part of this proposal. We're already running Podman for another service (MinIO) but we're not proposing to *convert* all existing services to `podman`. If things work well enough for a long enough period (say 30 days), we might turn off the older Docker runner instead.
# Alternatives considered
To fix the stability issues in Docker, it might be possible to upgrade to the latest upstream package and abandon the packages from Debian.org. We're hoping that will not be necessary thanks to Podman.
To build images, we could create a "privileged" runner. For now, we're hoping Podman will make building container images easier. If we do create a privileged runner, it needs to take into account the long term [tiered runner approach](https://gitlab.torproject.org/tpo/tpa/team/-/issues/41044).
# Deadline
The service is already available, and will be running untagged jobs in two weeks unless an objection is raised.
# Status
This proposal is currently in the `proposed` state.
# References
Feedback can be provided in the [discussion issue][].
[discussion issue]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41296
On 2023-08-16 13:32:17, Antoine Beaupré wrote:
[...]
Update on this: I added the `tpa` tag earlier this week so that the runner would pick up our nightly test jobs. Today I've also added the `amd64` tag to unblock a test pipeline nickm graciously sent our way.
I'm happy to announce that both tests are doing well and we're on track to enabling the runner to run all jobs normally this coming Wednesday.
Also note that I did an extensive amount of work on the GitLab CI dashboard, which now also features queue wait times:
https://grafana.torproject.org/d/fd0b2fb2-88d0-4f85-bc86-16164c083b51/gitlab...
user: tor-guest, no password, as usual.
That should allow you to answer the question "is it just me, or is CI taking forever to pick up my job?". In the two days we've had those stats, all jobs were picked up within one minute of being queued.
Feedback is, as usual, welcome, either here or:
https://gitlab.torproject.org/tpo/tpa/team/-/issues/41296
Thank you for your attention!
a.
On 2023-08-16 13:32:17, Antoine Beaupré wrote:
[...]
Reminder: if there are no objections, the podman runner will be online for all jobs *tomorrow*.
As an aside, GitLab will soon run its 100 *thousandth* pipeline, so I guess it's a good way to celebrate, with a fresh new runner! :)
a.
On 2023-08-29 10:35:42, Antoine Beaupré wrote:
[...]
The podman runner (with the cute name of `ci-runner-x86-02`) is now live and accepts "untagged" jobs or jobs tagged with one of: kvm, linux, debug-terminal, x86_64, x86-64, 16 CPU, 94.30 GiB, amd64, podman, tpa.
Please do notify us if any (unusual) problem occurs. We'll also be monitoring this through the Grafana dashboard:
https://grafana.torproject.org/d/fd0b2fb2-88d0-4f85-bc86-16164c083b51/gitlab...
user: tor-guest, no password
Also, according to that dashboard, we're grossly over capacity now, which means one of two things:
1. we need to retire a runner
2. YOU need to RUN MORE CI! :)
I tend towards the latter...
Also remember that you can bring your own runners, any computer can be repurposed into a GitLab runner to pick up your more exotic jobs in all ways imaginable. See our instructions for that at:
https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/ci#registering-yo...
Have a good day and thanks for flying TPA!
A.
tor-project@lists.torproject.org