Flink interpreter for Apache Zeppelin

Overview

Apache Flink is an open source platform for distributed stream and batch data processing. Flink's core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization.

Apache Flink is supported in Zeppelin through the Flink interpreter group, which consists of the five interpreters below.

| Name | Class | Description |
| ---- | ----- | ----------- |
| %flink | FlinkInterpreter | Creates ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment and provides a Scala environment |
| %flink.pyflink | PyFlinkInterpreter | Provides a Python environment |
| %flink.ipyflink | IPyFlinkInterpreter | Provides an IPython environment |
| %flink.ssql | FlinkStreamSqlInterpreter | Provides a streaming SQL environment |
| %flink.bsql | FlinkBatchSqlInterpreter | Provides a batch SQL environment |
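As a sketch of how an interpreter is selected, each Zeppelin paragraph starts with one of the bindings above. The code below is illustrative and assumes a working Flink setup (FLINK_HOME configured):

```scala
%flink
// Scala environment: a simple batch word count using the pre-created env
val counts = env.fromElements("hello world", "hello flink")
  .flatMap(_.split("\\s"))   // split each line into words
  .map((_, 1))               // pair each word with a count of 1
  .groupBy(0)                // group by the word
  .sum(1)                    // sum the counts
counts.print()
```

A `%flink.bsql` paragraph would instead contain plain SQL against tables registered in the batch table environment.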

Configuration

The Flink interpreter can be configured with the properties provided by Zeppelin. You can also set other Flink properties that are not listed in the table. For a list of additional properties, refer to Flink Available Properties.

| Property | Default | Description |
| -------- | ------- | ----------- |
| FLINK_HOME | | Location of the Flink installation. It must be specified, otherwise you cannot use Flink in Zeppelin |
| flink.execution.mode | local | Execution mode of Flink, e.g. local/yarn/remote |
| flink.execution.remote.host | | JobManager hostname in remote mode |
| flink.execution.remote.port | | JobManager port in remote mode |
| flink.jm.memory | 1024 | Total memory (MB) of the JobManager |
| flink.tm.memory | 1024 | Total memory (MB) of each TaskManager |
| flink.tm.num | 2 | Number of TaskManagers |
| flink.tm.slot | 1 | Number of slots per TaskManager |
| flink.yarn.appName | Zeppelin Flink Session | Yarn app name |
| flink.yarn.queue | | Queue name of the Yarn app |
| flink.yarn.jars | | Additional user jars (comma separated) |
| zeppelin.flink.scala.color | true | Whether to display Scala shell output in colorful format |
| zeppelin.flink.enableHive | false | Whether to enable Hive |
| zeppelin.flink.printREPLOutput | true | Print REPL output |
| zeppelin.flink.maxResult | 1000 | Max number of rows returned by the SQL interpreter |
| zeppelin.flink.planner | blink | Planner: blink (Blink planner) or flink (Flink Table API) |
| zeppelin.pyflink.python | python | Python executable for PyFlink |
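For example, a yarn-mode setup might configure the interpreter setting roughly as follows. This is a sketch; the paths, queue name, and memory values are illustrative, not defaults:

```
FLINK_HOME              /opt/flink
flink.execution.mode    yarn
flink.jm.memory         2048
flink.tm.memory         4096
flink.tm.num            4
flink.tm.slot           2
flink.yarn.queue        default
flink.yarn.appName      Zeppelin Flink Session
```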

StreamExecutionEnvironment, ExecutionEnvironment, StreamTableEnvironment, BatchTableEnvironment

Zeppelin will create 4 variables to represent Flink's entry points:

* senv (StreamExecutionEnvironment)
* env (ExecutionEnvironment)
* stenv (StreamTableEnvironment)
* btenv (BatchTableEnvironment)
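A minimal sketch of using one of these pre-created variables in a %flink paragraph (the data and job name are illustrative; this assumes a running Flink environment):

```scala
%flink
// Streaming example using the pre-created senv (StreamExecutionEnvironment)
senv.fromElements("hello", "flink")
  .map(_.toUpperCase)   // transform each element
  .print()              // emit results to stdout
senv.execute("uppercase-example")
```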

ZeppelinContext

Zeppelin automatically injects ZeppelinContext as variable z in your Scala/Python environment. ZeppelinContext provides some additional functions and utilities. See Zeppelin-Context for more details.
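For instance, z can be used to share values between paragraphs. A small sketch (the key and value are illustrative):

```scala
%flink
// ZeppelinContext is injected automatically as z
z.put("engine", "flink")      // store a value in the resource pool
val engine = z.get("engine")  // read it back (also works from other paragraphs)
```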

IPython support

By default, Zeppelin uses IPython in %flink.pyflink when IPython is available; otherwise it falls back to the vanilla PyFlink implementation. If you don't want to use IPython, set zeppelin.pyflink.useIPython to false in the interpreter setting. For IPython features, refer to the Python Interpreter documentation.