[CI] Fix connection timeout in docker workflow (#1656)

* fix: connection timeout error

* fix: try some ideas

* fix: try some ideas 2/?

* fix: try some ideas 3/?

* fix: try some ideas 4/?

* fix: add debugger

* fix: add debugger 2/?

* fix: add debugger 3/?

* fix: add debugger 4/?

* fix: try some idea 5/?

* fix: add debugger

* fix: add debugger 2/?

* fix: add debugger 3/?

* fix: try some ideas 6/?

* fix: try some ideas 7/?

* fix: try some ideas 8/?

* fix: it should work now.

* fix: remove the debugger

* cleaning up

* introduce global environment

* Update .github/workflows/python.yml

Co-authored-by: Jia Yu <[email protected]>

* fix: docker timeout issues

---------

Co-authored-by: Jia Yu <[email protected]>
furqaankhan and jiayuasu authored Oct 29, 2024
1 parent 61bb000 commit 439657f
Showing 2 changed files with 5 additions and 14 deletions.
docker/sedona-spark-jupyterlab/sedona-jupyterlab.dockerfile (6 changes: 2 additions & 4 deletions)
@@ -19,7 +19,6 @@ FROM ubuntu:22.04
 
 ARG shared_workspace=/opt/workspace
 ARG spark_version=3.4.1
-ARG hadoop_version=3
 ARG hadoop_s3_version=3.3.4
 ARG aws_sdk_version=1.12.402
 ARG spark_xml_version=0.16.0
@@ -29,8 +28,7 @@ ARG spark_extension_version=2.11.0
 
 # Set up envs
 ENV SHARED_WORKSPACE=${shared_workspace}
-ENV SPARK_HOME /opt/spark
-RUN mkdir ${SPARK_HOME}
+ENV SPARK_HOME /usr/local/lib/python3.10/dist-packages/pyspark
 ENV SEDONA_HOME /opt/sedona
 RUN mkdir ${SEDONA_HOME}
@@ -44,7 +42,7 @@ COPY ./ ${SEDONA_HOME}/
 
 RUN chmod +x ${SEDONA_HOME}/docker/spark.sh
 RUN chmod +x ${SEDONA_HOME}/docker/sedona.sh
-RUN ${SEDONA_HOME}/docker/spark.sh ${spark_version} ${hadoop_version} ${hadoop_s3_version} ${aws_sdk_version} ${spark_xml_version}
+RUN ${SEDONA_HOME}/docker/spark.sh ${spark_version} ${hadoop_s3_version} ${aws_sdk_version} ${spark_xml_version}
 
 # Install Python dependencies
 COPY docker/sedona-spark-jupyterlab/requirements.txt /opt/requirements.txt
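Note: SPARK_HOME now points at the PySpark package that pip installs instead of a standalone Spark distribution unpacked into /opt/spark, so the Dockerfile no longer creates that directory. A minimal way to check that the hard-coded path matches the actual install location inside the container (this one-liner is illustrative, not part of the commit):

    # Print the directory of the pip-installed pyspark package; it should
    # match the SPARK_HOME value baked into the image.
    python3 -c "import pyspark, os; print(os.path.dirname(pyspark.__file__))"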
docker/spark.sh (13 changes: 3 additions & 10 deletions)
@@ -19,20 +19,16 @@ set -e
 
 # Define variables
 spark_version=$1
-hadoop_version=$2
-hadoop_s3_version=$3
-aws_sdk_version=$4
-spark_xml_version=$5
+hadoop_s3_version=$2
+aws_sdk_version=$3
+spark_xml_version=$4
 
 # Set up OS libraries
 apt-get update
 apt-get install -y openjdk-19-jdk-headless curl python3-pip maven
 pip3 install --upgrade pip && pip3 install pipenv
 
 # Download Spark jar and set up PySpark
-curl https://archive.apache.org/dist/spark/spark-"${spark_version}"/spark-"${spark_version}"-bin-hadoop"${hadoop_version}".tgz -o spark.tgz
-tar -xf spark.tgz && mv spark-"${spark_version}"-bin-hadoop"${hadoop_version}"/* "${SPARK_HOME}"/
-rm spark.tgz && rm -rf spark-"${spark_version}"-bin-hadoop"${hadoop_version}"
 pip3 install pyspark=="${spark_version}"
 
 # Add S3 jars
@@ -42,9 +38,6 @@ curl https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/"${aws_sdk
 
 # Add spark-xml jar
 curl https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/"${spark_xml_version}"/spark-xml_2.12-"${spark_xml_version}".jar -o "${SPARK_HOME}"/jars/spark-xml_2.12-"${spark_xml_version}".jar
 
-# Set up master IP address and executor memory
-cp "${SPARK_HOME}"/conf/spark-defaults.conf.template "${SPARK_HOME}"/conf/spark-defaults.conf
-
 # Install required libraries for GeoPandas on Apple chip mac
 apt-get install -y gdal-bin libgdal-dev
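With hadoop_version dropped, spark.sh now takes four positional arguments, and Spark itself comes from PyPI rather than the archive.apache.org tarball whose download appears to be what was timing out in CI. An illustrative manual invocation using the default ARG values from the Dockerfile above (the script is normally run by the Dockerfile, and SPARK_HOME must already point at the pip-installed PySpark):

    # SPARK_HOME must match the pip install location (see the Dockerfile ENV).
    export SPARK_HOME=/usr/local/lib/python3.10/dist-packages/pyspark
    # spark.sh <spark_version> <hadoop_s3_version> <aws_sdk_version> <spark_xml_version>
    ./docker/spark.sh 3.4.1 3.3.4 1.12.402 0.16.0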
