Improve general docker interoperability #2078
I'd definitely welcome any updates or additions to the Docker vignette at https://rstudio.github.io/renv/articles/docker.html; that's where I want to collect this sort of advice. I can brain-dump some of my thoughts...
I think the general answer is "it depends", but I think you usually want the R library in the Docker container, with the cache potentially mounted externally from somewhere.
Can you elaborate for these points? Some suggestions are enumerated in the vignette, but they could surely be improved.
Can you elaborate here as well? You can use
> .libPaths()
[1] "/home/kevin/scratch/example/renv/library/linux-ubuntu-noble/R-4.3/aarch64-unknown-linux-gnu"
[2] "/home/kevin/.cache/R/renv/sandbox/linux-ubuntu-noble/R-4.3/aarch64-unknown-linux-gnu/9a444a72"
and this is also the default behavior for R 4.4 and newer.
It would help to know what kinds of failures you're seeing; examples would be very helpful.
Library in the container, I agree, this feels cleaner. (As opposed to being mounted from the project directory.) Though not ruling out special cases where this may not be the case.
Here are some problems:
$ R
# Bootstrapping renv 1.1.0 ---------------------------------------------------
- Downloading renv ... OK
- Installing renv ... OK
- Project '~/somewhere' loaded. [renv 1.1.0]
This happens every time you run the image, and it is annoying enough that it should be prevented. To avoid it, I need to know what the full path of the library will be once renv is activated, and then install renv in that location. Currently, the best way I see is to have something like this in my Dockerfile:
I run R with
EDIT: See related discussion here: #1668
A viable alternative is to keep the renv cache in the Docker build context (i.e. in the same project) and then COPY it during docker build. Unless the renv cache runs into tens of GBs (it won't), it's a decent solution. This also means you need a way to export the cache from the image back to that folder in the build context if you restore packages during docker build without a cache.
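The export step mentioned above could look roughly like the following. This is a minimal sketch, not something taken from the thread: the function name `export_renv_cache`, the in-image cache path `/renv/cache`, and the destination folder `./renv-cache` are all hypothetical. It relies only on the standard `docker create`, `docker cp`, and `docker rm` commands.

```shell
# Name of the docker CLI; override (e.g. DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Copy the renv cache out of an image into a folder in the build context.
# Arguments (defaults are hypothetical):
#   $1 image name
#   $2 cache path inside the image
#   $3 destination folder on the host
export_renv_cache() {
  image="$1"
  src="${2:-/renv/cache}"
  dest="${3:-./renv-cache}"

  # Create a stopped container so files can be copied without running R.
  cid=$("$DOCKER" create "$image") || return 1

  # "SRC/." copies the contents of the directory, not the directory itself.
  mkdir -p "$dest"
  "$DOCKER" cp "$cid:$src/." "$dest"
  status=$?

  # Always remove the temporary container, then report the copy status.
  "$DOCKER" rm "$cid" >/dev/null
  return "$status"
}
```

Calling `export_renv_cache my-image` after a build would refresh `./renv-cache`, which a later build could then COPY to whatever path `RENV_PATHS_CACHE` points at inside the image.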
The problem is the cache, not the library. To elaborate - I just now ran
It does not contain my distro's version. That path will be the same whether I run Ubuntu 24, 22, or 20, and likely Debian as well. Each of those distros ships different versions of the C libraries used by R packages. If I install a package on one of them, it links against those local libraries. That package then gets copied to a cache path which, as shown, is the same for many distros and versions. If I then restore on another distro from that same cache, things will fail either during install or later when I use the package. This problem is solvable with the env variable
This one is easier, as it's already discussed here: #1893
R packages that depend on system packages (e.g. Ubuntu packages) become uninstallable (and unrestorable) once those packages are updated. Often you don't really enforce the R version either, and it tends to receive rolling updates over the lifespan of a distro.
I'm working on a GitHub project with reference implementations that will highlight these problems and more or less hacky solutions to them. It may make discussion easier.
EDIT:
I re-ran this example, and my initial claim was wrong. The cache path does contain an OS-specific component; in my case, the Ubuntu distro name. This issue can probably be removed from the list. (I had RENV_PATHS_PREFIX set to an empty string when I tried it.)
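To see which distro-specific component a given base image would yield, something like the following can be run inside the container. It is only an illustration: it mimics the `linux-ubuntu-noble` segment visible in the library path quoted earlier in the thread and is not renv's actual implementation. The function name `os_prefix` is hypothetical.

```shell
# Print a distro-specific prefix such as "linux-ubuntu-noble".
#   $1 path to an os-release file (defaults to /etc/os-release)
# Illustration only: this is NOT how renv computes its prefix.
os_prefix() {
  file="${1:-/etc/os-release}"
  (
    # os-release files consist of shell-compatible variable assignments,
    # so they can be sourced; the subshell keeps the variables local.
    . "$file"
    printf 'linux-%s-%s\n' "${ID:-unknown}" "${VERSION_CODENAME:-unknown}"
  )
}
```

Comparing the output of `os_prefix` across two base images is a quick way to check whether they would end up sharing a cache directory when no distro-specific prefix is in effect.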
I wrote up some examples of making cache work in docker here: https://github.com/torbjorn/renv-docker There are essentially two variants there:
Operations can be tucked away in scripts, yes, but I kept them in the Dockerfile for clarity.
@kevinushey, in your vignette you mention multistage builds. How exactly does that help? Surely the user will barely notice how layers are composed before restore is run? Whether or not renv::restore() is re-executed seems to be the main issue (to me at least), and multistage builds don't really change that.
The way I preinstall renv in the project library feels like it could have been done more elegantly, perhaps by a function in renv itself? (So e.g.
I mentioned problems with rebuilding images "a year later", and linked to a discussion. That was mainly about different R versions, so not really renv/docker related. As long as you stick to the same source image, rebuilding should probably work well enough, so I may have to backpedal on that point too.
This is what I was trying to get at in https://rstudio.github.io/renv/articles/docker.html#dynamically-provisioning-r-libraries-with-renv; in that the "best" solution in these scenarios is to have the renv cache on a mounted drive that is used and updated by containers when they are started.
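A minimal sketch of that setup, assuming a hypothetical image name, a host cache directory of your choosing, and `/renv/cache` as the mount point inside the container. `RENV_PATHS_CACHE` and `renv::restore()` are the real renv names; the function name `restore_with_mounted_cache` and the default host path are made up for illustration.

```shell
# Name of the docker CLI; override (e.g. DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Start a container with the host cache mounted and restore on startup.
#   $1 image name (hypothetical)
#   $2 host directory holding the shared renv cache (hypothetical default)
restore_with_mounted_cache() {
  image="$1"
  host_cache="${2:-$HOME/.cache/renv-docker}"
  container_cache="/renv/cache"   # assumed mount point inside the container

  # Point renv at the mounted directory, then restore the project library.
  "$DOCKER" run --rm \
    -e "RENV_PATHS_CACHE=$container_cache" \
    -v "$host_cache:$container_cache" \
    "$image" \
    R -e 'renv::restore()'
}
```

In practice the restore would be followed by the actual workload in the same container, since `--rm` discards the container's library on exit; only the mounted cache persists between runs.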
Wouldn't it also suffice to run
IMHO the most straightforward solution is indeed to ensure you're calling
The simpler solution is to set:
RENV_PATHS_PREFIX_AUTO = TRUE
This will force
The bit on multi-stage builds was contributed by other users, so I'm less well-equipped to comment on it. e82f56c
I think the right solution here is (or should be)
Your
Those are the only two solutions I have meant to outline so far: run renv::restore() either during the build, in a RUN statement towards the end, or as the first thing done when you run the container, using a mounted cache.
I agree with this. It seems to be fixed in 4.4, and I demonstrated above that it works, or at least I meant to!
I agree 100% that the two main strategies for doing restore with functional cache are either:
Point The section on multistage builds should, I believe, be removed or moved to a section that deals with other optimizations not related to efficient handling of the renv cache for renv::restore(). The point related to
When setting up R in containers, I repeatedly end up implementing elaborate hacks to make renv integrate seamlessly and efficiently. These are typically related to:
The renv project has been very forthcoming when I have asked for specific changes that make a big difference to running renv in a container, but it is in a way a never-ending story.
Perhaps a "docker task force" could be useful: a group that would maintain recommended reference implementations of renv functionality in Docker-based projects. They would stay informed by (or come from) core renv development and ideally advise on future renv development from a Docker perspective. Things they'd maintain would typically include:
EDIT:
There is already this article, but in many ways it only scratches the surface of these issues:
https://rstudio.github.io/renv/articles/docker.html