NOTE: The main repository for spark-ec2 development is https://github.com/mesos/spark-ec2

spark-ec2

This repository contains the set of scripts used to set up a Spark cluster on EC2. These scripts are intended to be used with the default Spark AMI and are not expected to work on other AMIs. If you wish to start a Spark cluster, please refer to http://spark-project.org/docs/latest/ec2-scripts.html

Details

The Spark cluster setup is guided by the values set in ec2-variables.sh. setup.sh first performs basic operations like enabling ssh across machines and mounting ephemeral drives, and also creates the files /root/spark-ec2/masters and /root/spark-ec2/slaves. Following that, every module listed in MODULES is initialized.
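For illustration, ec2-variables.sh might contain values like the following. Only MODULES is described in this README; the other names and values are hypothetical stand-ins for the kind of settings the templates consume:

  # Sketch of ec2-variables.sh -- only MODULES is documented in this
  # README; the remaining names are illustrative assumptions.
  export MODULES="spark"                # modules to initialize, in order
  export SPARK_LOCAL_DIRS="/mnt/spark"  # illustrative; see {{spark_local_dirs}}
  export HDFS_DATA_DIRS="/mnt/hdfs"     # illustrative; see {{hdfs_data_dirs}}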

To add a new module, you will need to do the following:

a. Create a directory with the module's name.

b. Optionally add a file named init.sh. This is called before templates are configured and can be used to install any prerequisites.
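A minimal sketch of such an init.sh, assuming a yum-based AMI and with the package name as a placeholder:

  #!/bin/bash
  # init.sh -- runs before templates are configured.
  # Install anything the module needs; 'some-package' is a placeholder.
  yum install -y some-package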

c. Add any files that need to be configured based on the cluster setup to templates/. The path of the file within templates/ determines where the configured file will be copied to. Right now, the set of variables that can be used in a template is:

  {{master_list}}
  {{active_master}}
  {{slave_list}}
  {{zoo_list}}
  {{cluster_url}}
  {{hdfs_data_dirs}}
  {{mapred_local_dirs}}
  {{spark_local_dirs}}
  {{default_spark_mem}}

You can add new variables by modifying deploy_templates.py.
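For example, a hypothetical template at templates/root/spark/conf/spark-env.sh (the path is illustrative; it mirrors where the configured file will be copied) could use the variables above like this:

  # Illustrative template -- the {{...}} placeholders are substituted
  # by deploy_templates.py before the file is copied into place.
  export MASTER={{cluster_url}}
  export SPARK_LOCAL_DIRS={{spark_local_dirs}}
  export SPARK_MEM={{default_spark_mem}}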

d. Add a file named setup.sh to launch any services on the master/slaves. This is called after the templates have been configured. You can use the environment variable $SLAVES to get a list of slave hostnames, and /root/spark-ec2/copy-dir to sync a directory across machines.
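A minimal sketch of a module's setup.sh; the module directory and daemon script names are placeholders:

  #!/bin/bash
  # setup.sh -- runs after the templates have been configured.
  # Sync this module's directory to the slaves, then start a daemon on each.
  /root/spark-ec2/copy-dir /root/spark-ec2/my-module   # 'my-module' is a placeholder
  for slave in $SLAVES; do
    ssh "$slave" /root/spark-ec2/my-module/start-daemon.sh  # placeholder script
  done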

e. Modify https://github.com/mesos/spark/blob/master/ec2/spark_ec2.py to add your module to the list of enabled modules.