Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flink #42

Open
jloveland opened this issue Nov 17, 2015 · 1 comment
Open

Flink #42

jloveland opened this issue Nov 17, 2015 · 1 comment

Comments

@jloveland
Copy link

Apache Flink is quickly gaining momentum as an alternative to Spark Streaming, Storm, etc.

Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

What are your thoughts on developing a plugin for Flink Streaming in StreamFlow? The rationale is that Flink provides a Storm compatible API:

Flink provides a Storm compatible API (org.apache.flink.storm.api) that offers replacements for the following classes:

TopologyBuilder replaced by FlinkTopologyBuilder
StormSubmitter replaced by FlinkSubmitter
NimbusClient and Client replaced by FlinkClient
LocalCluster replaced by FlinkLocalCluster

In order to submit a Storm topology to Flink, it is sufficient to replace the used Storm classes with their Flink replacements in the Storm client code that assembles the topology. The actual runtime code, ie, Spouts and Bolts, can be uses unmodified. If a topology is executed in a remote cluster, parameters nimbus.host and nimbus.thrift.port are used as jobmanger.rpc.address and jobmanger.rpc.port, respectively. If a parameter is not specified, the value is taken from flink-conf.yaml.

@christopherlakey
Copy link
Contributor

StreamFlow now uses an external process for deploying a StreamFlow topology to a Storm cluster. It should be relatively straight forward to implement an alternate submitter. More changes will likely be required to provide the hooks for topology status and metrics.

What's the primary motivation for Flink integration? Performance?

I saw the word-count performance comparison, but it was comparing a Storm topology to a Flink DSL approach. Are there any performance comparisons of running an unmodified Storm topology in Flink vs natively in a Storm cluster?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants