Deployment Guide

English | 简体中文

At present, BitSail only supports Flink deployment on Yarn.
Support for other platforms, such as native Kubernetes, is planned for an upcoming release.

Below is a step-by-step guide to deploying BitSail on Yarn.


Pre configuration

Configure Hadoop Environment

To support Yarn deployment, HADOOP_CLASSPATH must be set as a system environment variable. There are two ways to set it:

  1. Set HADOOP_CLASSPATH directly.

  2. Set HADOOP_HOME to point to the Hadoop directory in the deployment environment. The BitSail scripts then use the following command to generate HADOOP_CLASSPATH.

if [ -n "$HADOOP_HOME" ]; then
  export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
fi
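As a pre-flight check before submitting, the two options above can be sketched as a small helper that reports where HADOOP_CLASSPATH would come from. The function name `hadoop_classpath_source` is hypothetical and is not part of the BitSail scripts; the sketch assumes that HADOOP_HOME takes precedence, as the snippet above suggests.

```shell
# Hypothetical helper (not part of BitSail): report how HADOOP_CLASSPATH
# would be resolved in the current shell environment.
hadoop_classpath_source() {
  if [ -n "${HADOOP_HOME:-}" ]; then
    # The BitSail snippet above regenerates the classpath from HADOOP_HOME.
    echo "derived from HADOOP_HOME"
  elif [ -n "${HADOOP_CLASSPATH:-}" ]; then
    echo "set directly"
  else
    echo "missing"
    return 1
  fi
}

# Print the result for the current environment; "|| true" keeps the
# script going when neither variable is set.
hadoop_classpath_source || true
```

A non-zero return value means neither variable is set, so a Yarn submission would likely fail to find the Hadoop classes.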

Configure Flink Cluster

After packaging, the build output contains a file conf/bitsail.conf. This file describes the system configuration of the deployment environment, including the Flink path and some other default parameters.

Here are some frequently-used options in the configuration file:

All parameters below share the prefix sys.flink.

  • flink_home: The root directory of Flink.
    Example: ${BITSAIL_HOME}/embedded/flink
  • checkpoint_dir: The path that stores the metadata file and data files of checkpoints. Reference: Flink Checkpoints.
    Example: "hdfs://opensource/bitsail/flink-1.11/checkpoints/"
  • flink_default_properties: General Flink runtime options, configured by "-D".
    Example:
    {
      classloader.resolve-order: "child-first"
      akka.framesize: "838860800b"
      rest.client.max-content-length: 838860800
      rest.server.max-content-len
    }
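To illustrate what "configured by -D" means for flink_default_properties, the sketch below turns key=value pairs into `-D` style arguments. It is only an illustration of the idea; the function name `to_dynamic_props` is hypothetical, and how BitSail itself builds these arguments is not shown in this guide.

```shell
# Illustrative only: convert key=value pairs into Flink-style -D arguments.
# Pairs without "=" are skipped with a warning on stderr.
to_dynamic_props() {
  for kv in "$@"; do
    case "$kv" in
      *=*) printf '%s\n' "-D$kv" ;;
      *)   echo "skipping malformed property: $kv" >&2 ;;
    esac
  done
}

# Two of the properties from the example configuration above.
to_dynamic_props "classloader.resolve-order=child-first" "akka.framesize=838860800b"
```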

Submit to Yarn

Currently, BitSail supports only the yarn-per-job mode of the Yarn resource provider. Support for others, such as native Kubernetes, is planned for an upcoming release.

You can use the startup script bin/bitsail to submit flink jobs to yarn.

The specific commands are as follows:

bash ./bin/bitsail run --engine flink --conf [job_conf_path] --execution-mode run --queue [queue_name] --deployment-mode yarn-per-job [--priority [yarn_priority] -p/--props [name=value]] 

Parameter description

  • Required parameters
    • queue_name: Target yarn queue
    • job_conf_path: Path of job configuration file
  • Optional parameters
    • yarn_priority: Job priority on yarn
    • name=value: Flink properties, for example classloader.resolve-order=child-first
      • name: Property key. Any configurable Flink parameter; it is passed through unchanged to the Flink job.
      • value: Property value.
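The parameter list above can be sketched as a small wrapper that checks the required parameters and prints the resulting command without running it. The function name `build_submit_cmd` and the sample priority value are hypothetical; only flags that appear in the command above are used, and a single name=value pair is assumed because this guide does not say how several pairs are passed.

```shell
# Hypothetical dry-run helper: print (do not run) a BitSail submit command.
# Usage: build_submit_cmd <job_conf_path> <queue_name> [yarn_priority] [name=value]
build_submit_cmd() {
  conf_path=${1:-}
  queue_name=${2:-}
  yarn_priority=${3:-}
  flink_prop=${4:-}

  # Both required parameters must be present.
  if [ -z "$conf_path" ] || [ -z "$queue_name" ]; then
    echo "usage: build_submit_cmd <job_conf_path> <queue_name> [yarn_priority] [name=value]" >&2
    return 1
  fi

  cmd="bash ./bin/bitsail run --engine flink --conf $conf_path --execution-mode run --queue $queue_name --deployment-mode yarn-per-job"

  # Append the optional parameters only when they were given.
  if [ -n "$yarn_priority" ]; then
    cmd="$cmd --priority $yarn_priority"
  fi
  if [ -n "$flink_prop" ]; then
    cmd="$cmd -p $flink_prop"
  fi

  echo "$cmd"
}

# Example: required parameters plus one Flink property (priority value is made up).
build_submit_cmd examples/Fake_Print_Example.json default 5 classloader.resolve-order=child-first
```

Printing the command first makes it easy to review the final flags before submitting to a shared queue.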

Submit an example job

Submit a test job that reads from a fake source and writes to a print sink on Yarn.

bash ./bin/bitsail run --engine flink --conf ~/bitsail-archive-0.1.0-SNAPSHOT/examples/Fake_Print_Example.json --execution-mode run -p 1=1  --deployment-mode yarn-per-job  --queue default

Log for Debugging

Client side log file

Check the ${FLINK_HOME}/log/ folder for the log file of the BitSail client.

Yarn task log file

Please go to Yarn WebUI to check the logs of Flink JobManager and TaskManager.
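For the client-side log, a minimal sketch for locating the most recently written log file is shown below. The function name `latest_log` is hypothetical, and the sketch assumes that the log files in the folder end in `.log`.

```shell
# Hypothetical helper: print the most recently modified *.log file in a folder.
latest_log() {
  log_dir=$1
  # ls -t sorts by modification time, newest first.
  ls -t "$log_dir"/*.log 2>/dev/null | head -n 1
}

# Usage, assuming FLINK_HOME is set in your environment:
#   tail -n 50 "$(latest_log "${FLINK_HOME}/log")"
```

If log aggregation is enabled on the cluster, the standard YARN CLI command `yarn logs -applicationId <application_id>` is an alternative to the WebUI for the JobManager and TaskManager logs.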


Submit to Local Flink Session

Suppose that BitSail is installed at ${BITSAIL_HOME}.

After building BitSail, we can enter the following path and find runnable jars and example job configuration files:

cd ${BITSAIL_HOME}/bitsail-dist/target/bitsail-dist-0.1.0-SNAPSHOT-bin/bitsail-archive-0.1.0-SNAPSHOT/

Run Fake_to_Print example

Use examples/Fake_Print_Example.json as an example to start a BitSail job:

  • <job-manager-address>: the address of job manager, should be host:port, e.g. localhost:8081.
bash bin/bitsail run \
  --engine flink \
  --execution-mode run \
  --deployment-mode local \
  --conf examples/Fake_Print_Example.json \
  --jm-address <job-manager-address>

Then you can visit the Flink WebUI to see the running job. The output of the Fake_to_Print job appears in the stdout of the task manager.
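Because the job manager address must have the form host:port, a quick format check before submitting can catch typos early. The sketch below is one way to do this; the function name `is_host_port` is hypothetical, and it does not handle bracketed IPv6 addresses or test whether the session is actually reachable.

```shell
# Hypothetical helper: succeed only if the argument looks like host:port
# with a numeric port between 1 and 65535.
is_host_port() {
  case "$1" in
    *:*) ;;
    *) return 1 ;;
  esac
  host=${1%:*}    # everything before the last colon
  port=${1##*:}   # everything after the last colon
  [ -n "$host" ] || return 1
  case "$port" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}

if is_host_port "localhost:8081"; then
  echo "address format looks valid"
fi
```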

Run Fake_to_Hive example

Use examples/Fake_Hive_Example.json as an example:

  • Before running the command, fill in the job configuration with an available Hive instance:
    • job.writer.db_name: the hive database to write.
    • job.writer.table_name: the hive table to write.
    • job.writer.metastore_properties: add hive metastore address to it, like:
       {
          "job": {
            "writer": {
              "metastore_properties": "{\"hive.metastore.uris\":\"thrift://localhost:9083\"}"
            }
          }
       }
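Note that metastore_properties is a JSON object embedded as a string, so its inner quotes must be escaped. A minimal sketch for producing the escaped form from plain JSON is shown below; the function name `escape_json_string` is hypothetical, and it only handles backslashes and double quotes, not control characters.

```shell
# Hypothetical helper: escape backslashes and double quotes so that a JSON
# object can be embedded as a string value inside another JSON document.
escape_json_string() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Produces the escaped value used for metastore_properties above.
escape_json_string '{"hive.metastore.uris":"thrift://localhost:9083"}'
echo
```

The printed value can be pasted between the outer quotes of the metastore_properties field.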

Then you can use a similar command to submit a BitSail job to the specified Flink session:

bash bin/bitsail run \
  --engine flink \
  --execution-mode run \
  --deployment-mode local \
  --conf examples/Fake_Hive_Example.json \
  --jm-address <job-manager-address>