Livy Interpreter for Apache Zeppelin
Overview
Livy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in YARN.
- Interactive Scala, Python and R shells
- Batch submissions in Scala, Java, Python
- Multiple users can share the same server (impersonation support)
- Can be used for submitting jobs from anywhere with REST
- Does not require any code change to your programs
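To make the REST usage above more concrete, here is a minimal sketch that builds, but does not send, two HTTP requests: one to create an interactive session and one to run a statement in it. The `/sessions` and `/sessions/{id}/statements` paths and the `kind` and `code` fields follow Livy's REST API as commonly documented. The helper name `build_post`, the session id `0`, and the server address are assumptions made for illustration.

```python
import json
import urllib.request

# Assumed address of the Livy server; adjust to your deployment.
LIVY_URL = "http://localhost:8998"


def build_post(path, payload, base_url=LIVY_URL):
    """Build (but do not send) a JSON POST request for a Livy server."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Request that would create an interactive Scala session.
create_session = build_post("/sessions", {"kind": "spark"})

# Request that would run a snippet in session 0 (hypothetical session id).
run_statement = build_post("/sessions/0/statements", {"code": "sc.version"})
```

Passing either object to `urllib.request.urlopen` would send it; that step is left out so the sketch can be read without a running server.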
Requirements
Additional requirements for the Livy interpreter are:
- Spark 1.3 or above.
- Livy server.
Configuration
We added some common Spark configurations, and you can set any other configuration you want.
All available properties are listed in the Spark configuration documentation.
When setting a Spark property for this interpreter, replace the spark. prefix with livy.spark. (for example, spark.driver.memory becomes livy.spark.driver.memory).
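The renaming rule can be sketched in a few lines of Python. The function name `to_livy_property` is hypothetical and is not part of Zeppelin or Livy; it only illustrates the prefix change described above.

```python
def to_livy_property(name):
    """Map a Spark property name to the name used by the Livy interpreter."""
    if name.startswith("livy.spark."):
        # Already in the interpreter's naming scheme.
        return name
    if not name.startswith("spark."):
        raise ValueError("not a Spark property: " + name)
    return "livy." + name


print(to_livy_property("spark.driver.memory"))  # prints livy.spark.driver.memory
```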
Property | Default | Description |
---|---|---|
zeppelin.livy.url | http://localhost:8998 | URL where livy server is running |
zeppelin.livy.spark.maxResult | 1000 | Maximum number of Spark SQL results to display. |
zeppelin.livy.session.create_timeout | 120 | Timeout in seconds for session creation |
zeppelin.livy.displayAppInfo | true | Whether to display app info |
zeppelin.livy.pull_status.interval.millis | 1000 | The interval for checking paragraph execution status |
livy.spark.driver.cores | | Driver cores, e.g. 1, 2. |
livy.spark.driver.memory | | Driver memory, e.g. 512m, 32g. |
livy.spark.executor.instances | | Executor instances, e.g. 1, 4. |
livy.spark.executor.cores | | Number of cores per executor, e.g. 1, 4. |
livy.spark.executor.memory | | Executor memory per worker instance, e.g. 512m, 32g. |
livy.spark.dynamicAllocation.enabled | | Use dynamic resource allocation, e.g. True, False. |
livy.spark.dynamicAllocation.cachedExecutorIdleTimeout | | Remove an executor which has cached data blocks. |
livy.spark.dynamicAllocation.minExecutors | | Lower bound for the number of executors. |
livy.spark.dynamicAllocation.initialExecutors | | Initial number of executors to run. |
livy.spark.dynamicAllocation.maxExecutors | | Upper bound for the number of executors. |
livy.spark.jars.packages | | Adds extra libraries to the Livy interpreter. |
zeppelin.livy.ssl.trustStore | | Client trustStore file. Used when Livy SSL is enabled. |
zeppelin.livy.ssl.trustStorePassword | | Password for the trustStore file. Used when Livy SSL is enabled. |
We removed livy.spark.master in Zeppelin 0.7, because we suggest using Livy 0.3 with Zeppelin 0.7, and Livy 0.3 does not allow livy.spark.master to be specified; it enforces yarn-cluster mode.
Adding External libraries
You can load additional libraries into the Livy interpreter by setting the livy.spark.jars.packages property to a comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. The format for the coordinates should be groupId:artifactId:version.
Example
Property | Example | Description |
---|---|---|
livy.spark.jars.packages | io.spray:spray-json_2.10:1.3.1 | Adding extra libraries to livy interpreter |
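When several libraries are needed, the property value is simply the coordinates joined with commas. The sketch below checks the groupId:artifactId:version shape before joining. The helper name `join_packages` and the second coordinate (`com.example:demo-lib:0.1.0`) are made up for illustration.

```python
def join_packages(coordinates):
    """Validate Maven coordinates and join them for livy.spark.jars.packages."""
    for coordinate in coordinates:
        parts = coordinate.split(":")
        # Expect exactly groupId, artifactId and version, none of them empty.
        if len(parts) != 3 or not all(parts):
            raise ValueError("expected groupId:artifactId:version, got " + coordinate)
    return ",".join(coordinates)


value = join_packages([
    "io.spray:spray-json_2.10:1.3.1",
    "com.example:demo-lib:0.1.0",  # hypothetical coordinate
])
print(value)
```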
How to use
Basically, you can use the following interpreters.
spark
%livy.spark
sc.version
pyspark
%livy.pyspark
print("1")
sparkR
%livy.sparkr
hello <- function( name ) {
sprintf( "Hello, %s", name );
}
hello("livy")
Impersonation
When the Zeppelin server is running with authentication enabled, this interpreter uses Livy's user impersonation feature, i.e. it sends an extra parameter when creating and running a session ("proxyUser": "${loggedInUser}"). This is particularly useful when multiple users share a Notebook server.
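As an illustration of the extra parameter, the following sketch builds the JSON body of a session-creation request with and without a logged-in user. The function name `session_payload` and the user name `alice` are hypothetical. The `kind` field is assumed from Livy's REST API, and how Zeppelin actually assembles the request internally may differ.

```python
import json


def session_payload(kind, logged_in_user=None):
    """Return the JSON body for creating a session, optionally impersonating a user."""
    payload = {"kind": kind}
    if logged_in_user:
        # Extra parameter sent when authentication is enabled.
        payload["proxyUser"] = logged_in_user
    return json.dumps(payload, sort_keys=True)


print(session_payload("pyspark", "alice"))
# prints {"kind": "pyspark", "proxyUser": "alice"}
```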
Apply Zeppelin Dynamic Forms
You can leverage Zeppelin Dynamic Forms. You can use both the text input and select form parameterization features.
%livy.pyspark
print("${group_by=product_id,product_id|product_name|customer_id|store_id}")
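To show how the select form in the example above breaks down, here is a rough Python sketch that splits a `${name=default,option1|option2}` placeholder into its parts and substitutes a value. It is not Zeppelin's parser: the names `parse_form` and `render` are hypothetical, and richer option syntax (such as display names) is ignored.

```python
import re

# ${name}, ${name=default} or ${name=default,opt1|opt2|...}
FORM = re.compile(r"\$\{([^=}]+)(?:=([^,}]*)(?:,([^}]*))?)?\}")


def parse_form(template):
    """Return name, default and options of the first form placeholder, or None."""
    match = FORM.search(template)
    if match is None:
        return None
    name, default, options = match.groups()
    return {
        "name": name,
        "default": default or "",
        "options": options.split("|") if options else [],
    }


def render(template, value=None):
    """Replace each placeholder with value, or with its default when value is None."""
    return FORM.sub(
        lambda m: value if value is not None else (m.group(2) or ""),
        template,
    )


code = 'print("${group_by=product_id,product_id|product_name|customer_id|store_id}")'
print(parse_form(code)["options"])
print(render(code, "store_id"))  # prints print("store_id")
```

A placeholder without options, such as `${name=default}`, corresponds to the text input form and yields an empty options list in this sketch.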
FAQ
Livy debugging: if you see any of the following in the error console:
Connect to livyhost:8998 [livyhost/127.0.0.1, livyhost/0:0:0:0:0:0:0:1] failed: Connection refused
It looks like the Livy server is not up yet or the configuration is wrong.
Exception: Session not found, Livy server would have restarted, or lost session.
The session may have timed out; you may need to restart the interpreter.
Blacklisted configuration values in session config: spark.master
Edit the conf/spark-blacklist.conf file in the Livy server and comment out the #spark.master line.
If you choose to work on Livy in the apps/spark/java directory of https://github.com/cloudera/hue, copy spark-user-configurable-options.template to spark-user-configurable-options.conf in the Livy server and comment out the #spark.master line.
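For the "Connection refused" case, a quick way to tell a stopped server from a wrong URL is to test whether the configured host and port accept TCP connections at all. The sketch below does that with the Python standard library. The function names `host_port` and `livy_reachable` are hypothetical, and the default URL is assumed to match the zeppelin.livy.url setting.

```python
import socket
from urllib.parse import urlparse


def host_port(url):
    """Extract (host, port) from a URL, falling back to the scheme's default port."""
    parsed = urlparse(url)
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default_port


def livy_reachable(url="http://localhost:8998", timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False
```

A `False` result only says that nothing is accepting connections at that address; it does not distinguish a server that is still starting from a mistyped host or port.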