This project looks at scaling up the Whiskey project using Apache Wayang.
Apache Wayang (incubating) is an API for big data cross-platform processing. It provides an abstraction over other platforms like Apache Spark and Apache Flink as well as a default built-in stream-based "platform". Some keys goals of Wayang are:
- a processing platform independent developer experience when writing applications,
- a processing platform independent approach to executing applications,
- and support for optimizing execution across processing platforms.
The code developers write is intended to be the same regardless of whether a light-weight or highly-scalable processing platform may eventually be used to execute it. Execution of the application is specified in a platform-agnostic logical plan. Wayang transforms the logical plan into a set of physical operators to be executed by one or more nominated underlying processing platforms.
K-Means is the most common form of centroid clustering and is described further in the main Whiskey project. This example uses K-Means centroids to look at grouping together the Whiskey drinks in our case study. We have our own hand-crafted K-Means algorithm which will be executed across the selected processing platform.
Groovy code examples can be found in the src/main/groovy directory.
You have several options for running the programs (see more details from the main README in the root project):
-
If you have opened the repo in IntelliJ (or your favourite IDE) you should be able to execute the examples directly in the IDE.
-
You can run the Java stream backed example online using a Jupyter/Beakerx notebook (slightly different code since it uses Groovy 2.5.6 and JDK8):
-
From the command line, invoke the application using gradlew (use
./gradlew
on unix-like systems) with the run command.
gradlew :WhiskeyWayang:run
-
If the example has @Grab statements commented out at the top, you can cut and paste the examples into the groovyConsole and uncomment the grab statements. Make sure to cut and paste any helper classes too if appropriate.
This example has been tested on JDK8, JDK11, and JDK17. The application nominates possible Java and Spark platforms for processing. Apache Wayang automatically selects between these platforms and for this application will always select Java. Comment out the Java plugin to force execution on the Spark platform.