Interactive docker image #709
base: main
Commits: d2d621a, 13b4283, f805a97, 53c4c54, 3e2de92, 7f85c97, 873fa34, d00509b, afd854e, 33dc9f4, dbef269, 7fb707f, 3e434b6
```diff
@@ -2,31 +2,24 @@ FROM jupyter/base-notebook:ubuntu-18.04
 
 ARG DOTNET_CORE_VERSION=3.1
 
-ENV DOTNET_CORE_VERSION=$DOTNET_CORE_VERSION
-ENV PATH="${PATH}:${HOME}/.dotnet/tools"
-
-ENV DOTNET_RUNNING_IN_CONTAINER=true \
+ENV DOTNET_CORE_VERSION=$DOTNET_CORE_VERSION \
+    PATH="${PATH}:${HOME}/.dotnet/tools" \
+    DOTNET_RUNNING_IN_CONTAINER=true \
     DOTNET_USE_POLLING_FILE_WATCHER=true \
-    NUGET_XMLDOC_MODE=skip \
-    DOTNET_TRY_CLI_TELEMETRY_OPTOUT=true
+    NUGET_XMLDOC_MODE=skip
 
 USER root
 
 RUN apt-get update \
     && apt-get install -y --no-install-recommends \
        apt-utils \
```
> **Reviewer** (on the package list): What is requiring all of these native dependencies? Several are already provided by the base image, so they don't seem necessary to declare.
>
> **Author:** This should be cleaned up now. Java obviously is required by Spark.
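One way to answer the reviewer's question is to compare the candidate package list against what the base image already ships. A minimal sketch, assuming GNU coreutils and small illustrative inline lists; a real check would obtain the base list with something like `docker run --rm jupyter/base-notebook:ubuntu-18.04 dpkg-query -W -f '${Package}\n'`:

```shell
#!/bin/sh
# Illustrative lists only: a few packages the base image already provides,
# and the packages this Dockerfile wants to install.
printf '%s\n' bash libc6 libssl1.1 libstdc++6 zlib1g | sort > /tmp/base_pkgs
printf '%s\n' bash dialog libc6 openjdk-8-jdk unzip | sort > /tmp/wanted_pkgs
# comm -13: suppress lines unique to file 1 and lines common to both,
# leaving only the packages that actually need to be declared here.
comm -13 /tmp/base_pkgs /tmp/wanted_pkgs
```

Anything already in the base list drops out, which is exactly the cleanup the reviewer suggested.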
```diff
        bash \
        dialog \
        libc6 \
        libgcc1 \
        libgssapi-krb5-2 \
        libicu60 \
        libssl1.1 \
        libstdc++6 zlib1g \
        openjdk-8-jdk \
        software-properties-common \
        unzip \
-    && wget -q --show-progress --progress=bar:force:noscroll https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb \
+    && wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb \
     && dpkg -i packages-microsoft-prod.deb \
     && add-apt-repository universe \
     && apt-get install -y apt-transport-https \
```
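The consolidated `ENV` in the hunk above appends the .NET global-tools directory to `PATH`. The same expansion in plain shell, using an assumed home directory (`jovyan` is the default user in the `jupyter/base-notebook` images):

```shell
#!/bin/sh
# Illustrative only: mimic ENV PATH="${PATH}:${HOME}/.dotnet/tools"
# with a stand-in home directory for the notebook user.
HOME=/home/jovyan
PATH="${PATH}:${HOME}/.dotnet/tools"
# A dotnet global tool installed under ~/.dotnet/tools would now resolve.
case ":$PATH:" in
  *":${HOME}/.dotnet/tools:"*) echo "tools dir on PATH" ;;
  *)                           echo "tools dir missing" ;;
esac
```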
```diff
@@ -4,28 +4,31 @@ FROM dotnet-spark-base-interactive:$DOTNET_SPARK_VERSION
 
 ARG SPARK_VERSION=3.0.1
```
> **Reviewer** (on `ARG SPARK_VERSION=3.0.1`): Having version numbers hard-coded like this gives me pause. Is this done so that the Dockerfile, as checked in, is buildable without having to specify any args? The problem that introduces is the maintenance burden of keeping it up to date.
>
> **Author:** This is related to my earlier point about the purpose of the Dockerfile(s). The intention was to have a buildable Dockerfile even if the build script is not used. I agree with your observation about maintenance. Maybe @rapoth has a view on that.
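The trade-off the reviewer raises is the usual `ARG` pattern: a pinned default keeps the Dockerfile buildable on its own, while a build script can override it with `docker build --build-arg SPARK_VERSION=...`. The equivalent default-or-override behavior in plain shell:

```shell
#!/bin/sh
# Mirror of ARG SPARK_VERSION=3.0.1: use the caller's value if supplied,
# otherwise fall back to the pinned default.
unset SPARK_VERSION                      # no override supplied
SPARK_VERSION="${SPARK_VERSION:-3.0.1}"  # pinned default wins
echo "default:  $SPARK_VERSION"

SPARK_VERSION=3.1.1                      # caller-supplied override
SPARK_VERSION="${SPARK_VERSION:-3.0.1}"  # override wins
echo "override: $SPARK_VERSION"
```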
```diff
 
 ARG DOTNET_SPARK_JAR="microsoft-spark-3-0_2.12-$DOTNET_SPARK_VERSION"
 
-ENV DAEMON_RUN=true
-ENV SPARK_VERSION=$SPARK_VERSION
-ENV SPARK_HOME=/spark
-ENV HADOOP_VERSION=2.7
-ENV PATH="${SPARK_HOME}/bin:${DOTNET_WORKER_DIR}:${PATH}"
-ENV DOTNETBACKEND_PORT=5567
-ENV JUPYTER_ENABLE_LAB=true
+ENV DAEMON_RUN=true \
+    DOTNETBACKEND_PORT=5567 \
+    HADOOP_VERSION=2.7 \
+    JUPYTER_ENABLE_LAB=true \
+    SPARK_VERSION=$SPARK_VERSION \
+    SPARK_HOME=/spark \
+    PATH="${SPARK_HOME}/bin:${DOTNET_WORKER_DIR}:${PATH}"
 
 USER root
 
 COPY bin/* /usr/local/bin/
 
 COPY *.ipynb ${HOME}/dotnet.spark/examples/
 
-RUN cd / \
-    && wget -q --show-progress --progress=bar:force:noscroll https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
+COPY HelloSpark /dotnet/HelloSpark
+
+RUN cd /dotnet/HelloSpark \
+    && dotnet build \
+    && cp /dotnet/HelloSpark/bin/Debug/netcoreapp${DOTNET_CORE_VERSION}/microsoft-spark-*.jar ${HOME}/ \
+    && rm -rf /dotnet/HelloSpark \
```
> **Reviewer** (on `COPY HelloSpark /dotnet/HelloSpark`): The unfortunate consequence of this pattern is that HelloSpark remains in the image as a result of obtaining it via COPY, which is not desirable. Is there a way it can be generated during the Docker build, or published as a tarball, so that it can be copied and deleted within a single Dockerfile instruction?
>
> **Author:** Thanks again @MichaelSimons for your great feedback! I am just creating a dummy project during the build process now.
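The author's follow-up (generating a dummy project during the build) works because everything happens inside one `RUN` instruction, so the scratch input never lands in a committed layer. A shell simulation of that create/harvest/delete pattern, with stand-in paths and a fake artifact in place of a real `dotnet build`:

```shell
#!/bin/sh
set -e
# Stand-in for the generated dummy project (in the Dockerfile this whole
# block would be a single RUN instruction, i.e. a single layer).
scratch=$(mktemp -d)
mkdir -p "$scratch/bin"
printf 'fake jar payload\n' > "$scratch/bin/microsoft-spark-3-0.jar"  # pretend build output
# Keep only the artifact we care about...
cp "$scratch"/bin/microsoft-spark-*.jar /tmp/harvested.jar
# ...and remove the scratch input before the layer would be committed.
rm -rf "$scratch"
echo "artifact kept: /tmp/harvested.jar; scratch removed"
```

Because create, build, copy, and delete share one instruction, nothing but the harvested jar survives into the image.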
```diff
+    && cd / \
+    && echo "\nDownloading spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz ..." \
+    && wget -q https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
     && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
```
> **Reviewer** (on the `tar -xvzf` line): You can extract to the spark directory with a single instruction, which would eliminate the need for the subsequent `mv`.
>
> **Author:** I assume you mean to use tar with `--directory`. But wouldn't that require that the directory exist already? In that case I'd have to add a `mkdir` first.
>
> **Reviewer:** You're correct, I missed what was happening here. Please ignore my comment.
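The thread above can be reproduced with a toy tarball: the PR's extract-then-`mv` style, and the `mkdir` + `tar --directory` variant the participants discuss (here with `--strip-components=1`, a GNU tar option not mentioned in the thread, to drop the versioned top-level directory):

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
# Build a tiny tarball shaped like the Spark release archive.
mkdir spark-3.0.1-bin-hadoop2.7
echo hi > spark-3.0.1-bin-hadoop2.7/README
tar -czf spark.tgz spark-3.0.1-bin-hadoop2.7

# Style 1 (as in the PR): extract, then rename the versioned directory.
tar -xzf spark.tgz
mv spark-3.0.1-bin-hadoop2.7 spark

# Style 2 (reviewer's suggestion): mkdir first, then extract straight into
# the target, stripping the versioned top-level directory.
mkdir spark2
tar -xzf spark.tgz --directory spark2 --strip-components=1

cat spark/README spark2/README
```

Both styles end up with the same tree, which is why the reviewer withdrew the comment.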
```diff
     && mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
     && rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
     && chmod 755 /usr/local/bin/start-spark-debug.sh \
-    && cp /dotnet/Debug/netcoreapp${DOTNET_CORE_VERSION}/${DOTNET_SPARK_JAR} ${HOME}/ \
     && chown -R ${NB_UID} ${HOME}
 
 USER ${NB_USER}
```
> **Reviewer:** Per the Dockerfile Best Practices, sort multi-line instructions to improve readability where possible (e.g. cross dependencies).
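The practice the reviewer cites pays off because a sorted list makes duplicates and merge conflicts easy to spot. A small illustration, using a hypothetical package list with an accidental repeat:

```shell
#!/bin/sh
# With the list sorted, a duplicated entry ("unzip" here) lands on adjacent
# lines and uniq -d surfaces it immediately.
printf '%s\n' unzip openjdk-8-jdk software-properties-common unzip \
  | sort | uniq -d
```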