Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-15963: add r tweedie package to our test images #15964

Merged
merged 50 commits into from
Feb 20, 2024

Conversation

tomasfryda
Copy link
Contributor

@tomasfryda tomasfryda commented Dec 11, 2023

@tomasfryda tomasfryda requested a review from sebhrusen December 11, 2023 14:49
@tomasfryda tomasfryda self-assigned this Dec 11, 2023
@tomasfryda tomasfryda added this to the 3.44.0.3 milestone Dec 11, 2023
sebhrusen
sebhrusen previously approved these changes Dec 11, 2023
Copy link
Contributor

@sebhrusen sebhrusen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @tomasfryda

sebhrusen
sebhrusen previously approved these changes Dec 11, 2023
sebhrusen
sebhrusen previously approved these changes Dec 11, 2023
Copy link
Contributor

@sebhrusen sebhrusen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, we really overlooked that

sebhrusen
sebhrusen previously approved these changes Dec 12, 2023
@tomasfryda tomasfryda force-pushed the 15963-add-r-tweedie-package-to-our-test-images branch 3 times, most recently from 2891293 to 8d71e58 Compare December 13, 2023 14:53
DEBIAN_FRONTEND=noninteractive apt-get install -y \
texlive \
texlive-fonts-extra \
texlive-htmlxml \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1,17 +1,20 @@
ARG FROM_VERSION
FROM harbor.h2o.ai/opsh2oai/h2o-3/dev-r-3.4.1:${FROM_VERSION}

RUN \
curl -sL https://deb.nodesource.com/setup_16.x | bash - && \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way of installing Nodejs is deprecated and the script will get removed. So I updated the way to install nodejs==16.

texinfo \
texlive-bibtex-extra \
texlive-formats-extra \
texlive-generic-extra && \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

texlive-generic-extra doesn't exist anymore. It used to be a transitional package that installed texlive-plain-generic which I added in the previous step when removing texlive-htmlxml.

See https://answers.launchpad.net/ubuntu/bionic/amd64/texlive-generic-extra/2017.20180305-1

@mn-mikke mn-mikke removed this from the 3.44.0.3 milestone Dec 20, 2023
@tomasfryda tomasfryda force-pushed the 15963-add-r-tweedie-package-to-our-test-images branch 3 times, most recently from 969b6a5 to 3b36142 Compare January 8, 2024 13:30
apt-get clean && \
rm -rf /var/cache/apt/*
rm -rf /var/cache/apt/* && \
wget "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F1244635%2Fchrome-linux.zip?alt=media" && \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Ubuntu 22.04, the chromium-browser deb package installs just a script that tells you to install chromium using snap, however, snap requires running systemd which is not ideal in docker images.

So we need to install chromium using some other way.

@tomasfryda tomasfryda force-pushed the 15963-add-r-tweedie-package-to-our-test-images branch 3 times, most recently from 3107348 to 1a809bd Compare January 16, 2024 12:03
RUN \
add-apt-repository -y ppa:deadsnakes && \
curl -sL https://deb.nodesource.com/setup_16.x | bash - && \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script is deprecated and doesn't work anymore

@@ -8,7 +8,7 @@ if [[ ! ${H2O_BRANCH} ]]; then
fi
if [[ ! -n ${PYTHON_VERSION} ]]; then
echo "Using default Python version"
PYTHON_VERSION='3.6'
PYTHON_VERSION='3.7'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3.6 is not supported anymore by the ubuntu 22.04 and is installed (built from source) separately for the py-3.6 tests.

Comment on lines +118 to +173
# How to get dependencies?
#=========================
# Since we can't rely on `versions` package to get older packages (Microsoft's CRAN (MRAN) is discontinued), we have to
# install the packages in some other way. What I did was opened the dev-r-3.3.3-jdk-8 and got the installed packages by
# using the following:
#
########################################################################################################################
# # Get packages and freeze them
# write.csv(installed.packages(), "pkgs.csv")
########################################################################################################################
#
# Now that I got the packages, their versions and their dependencies, I am able to topologically sort them to ensure
# I won't try to install a package that has is missing some dependencies. Then I install it from CRAN using the above
# defined shell function `getAndInstallFromCRAN`.
#
# To make it repeatable I created the following script in R that uses `igraph` package for creating the graph representing
# packages and its dependencies that are then topologically sorted and printed out so it can be easily copy-pasted here.
# Some packages may never reach CRAN and those should be added to the exclude_pkgs vector (like the `isofor` package)
# and then installed separately.
#
########################################################################################################################
# exclude_pkgs <- c("isofor")
#
# # Load them, sort them topologically and generate the code to install them
# pkgs <- read.csv("/Users/tomasfryda/tmp/jenkins_logs/pkgs.csv")
# pkgs <- pkgs[pkgs$Priority != "base" | is.na(pkgs$Priority),]
# rownames(pkgs) <- pkgs$Package
#
# library(igraph)
#
# i2n <- setNames(pkgs$Package, seq_len(nrow(pkgs)))
# n2i <- setNames(names(i2n), i2n)
#
# g <- make_empty_graph(directed = TRUE)
# g <- add_vertices(g, nrow(pkgs), label = i2n)
#
# for (n in i2n) {
# deps <- paste0(pkgs[n, "Depends"], ", ", pkgs[n, "Imports"])
# cat(n, "->", deps, "\n")
# if (is.na(deps))
# next
# deps <- strsplit(deps, ",\\s*", perl = TRUE)[[1]]
# for (d in deps) {
# print(d)
# d <- strsplit(d, "\\s+", perl = TRUE)[[1]][[1]]
# print(d)
# if (d %in% i2n) {
# g <- add_edges(g, c(n2i[[n]], n2i[[d]]))
# }
# }
# }
#
# ts <- topo_sort(g, mode = "in")
# ts <- ts[!ts %in% n2i[exclude_pkgs]]
# cat(paste("getAndInstallFromCRAN", i2n[ts], pkgs[i2n[ts], "Version"]), sep = "\n")
########################################################################################################################
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MRAN doesn't work anymore so I needed some other way of installing old R packages. Even installing devtools is very hard for old R versions as some packages are not available for the old versions so I decided to do equivalent of pip freeze and manually resolve the dependencies and order the packages in such a way that they are installed in correct order. The little R script I have in the comment is responsible for sorting the packages in the correct order.

Copy link
Collaborator

@mn-mikke mn-mikke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tomasfryda for the change! I prefer to merge this change after the major release. The release pipeline has been moved from Jenkins master to regular workers. I would like to reduce a number of things that could break the release.

pygments==2.7.4; python_version <= '3.7'
pygments==2.15.1; python_version > '3.7'
sphinx==3.0.4; python_version <= '3.8'
pygments==2.7.4; python_version < '3.5'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line can be deleted, Python 3.5 is not currently supported

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I wasn't sure if we don't use python 2.7 for building the documentation (pygments does syntax highlighting) so I kept it here. I think we don't but could you confirm that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sphinx==3.0.4; python_version <= '3.8'
pygments==2.7.4; python_version < '3.5'
pygments==2.15.1; python_version >= '3.7'
sphinx==3.0.4; python_version < '3.5'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line can be deleted, Python 3.5 is not currently supported

@tomasfryda
Copy link
Contributor Author

I prefer to merge this change after the major release. The release pipeline has been moved from Jenkins master to regular workers. I would like to reduce a number of things that could break the release.

@mn-mikke I have mixed feelings about this. I completely agree that these changes increase the chance of breaking something during the release but this PR also contains fixes that were discovered thanks to adding the necessary packages.

There is fix for Tweedie dispersion estimation which is non-reproducible (on jenkins, locally I was not able to reproduce it) in current master and rel-3.44. And it also disabled the usage of polars and datatable by default as it can in some cases change types.

I can backport the fixes but they'd be untested in the current master due to the missing packages.

@mn-mikke
Copy link
Collaborator

mn-mikke commented Feb 1, 2024

There is fix for Tweedie dispersion estimation which is non-reproducible (on jenkins, locally I was not able to reproduce it) in current master and rel-3.44. And it also disabled the usage of polars and datatable by default as it can in some cases change types.

@tomasfryda ok, let's merge it then.

mn-mikke
mn-mikke previously approved these changes Feb 1, 2024
Copy link
Collaborator

@mn-mikke mn-mikke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

syzonyuliia
syzonyuliia previously approved these changes Feb 1, 2024
wendycwong
wendycwong previously approved these changes Feb 1, 2024
Basically, some files are not used in the final package probably due to
some non-deterministic issue in roxygen2 which fills our NAMESPACE file
and creates our R man pages.
@tomasfryda tomasfryda dismissed stale reviews from wendycwong, syzonyuliia, and mn-mikke via 73bb536 February 5, 2024 16:47
@tomasfryda tomasfryda force-pushed the 15963-add-r-tweedie-package-to-our-test-images branch from 89791f8 to 80a6a23 Compare February 7, 2024 15:16
wendycwong
wendycwong previously approved these changes Feb 12, 2024
Copy link
Contributor

@wendycwong wendycwong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tons of effort put into it. Good job and thank you!

The R medium stages didn't run tests that contained missing package so
it's understandable that the runtime there increased. Java JUnit tests
also are aborted very often in master so this PR likely doesn't cause
the issue.
@tomasfryda tomasfryda changed the base branch from rel-3.44.0 to master February 19, 2024 13:01
@tomasfryda tomasfryda merged commit 897c43f into master Feb 20, 2024
63 of 100 checks passed
@tomasfryda tomasfryda deleted the 15963-add-r-tweedie-package-to-our-test-images branch February 20, 2024 08:42
@tomasfryda tomasfryda linked an issue Feb 20, 2024 that may be closed by this pull request
poornaSavindi added a commit that referenced this pull request Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add R tweedie package to our test images
6 participants