Livy Interpreter for Apache Zeppelin
Overview
Livy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in YARN.
- Interactive Scala, Python and R shells
- Batch submissions in Scala, Java, Python
- Multiple users can share the same server (impersonation support)
- Can be used for submitting jobs from anywhere with REST
- Does not require any code change to your programs
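Because Livy exposes everything over REST, a client starts a session and runs code in it with plain HTTP calls. The sketch below builds, but does not send, the two requests involved: POST /sessions to start an interactive session and POST /sessions/{id}/statements to run a snippet in it. The endpoint paths and the kind and code fields come from Livy's REST API. The helper names are hypothetical, and the URL assumes the default zeppelin.livy.url of http://localhost:8998.

```python
import json
import urllib.request

# Assumption: Livy listens on the default zeppelin.livy.url address.
LIVY_URL = "http://localhost:8998"


def create_session_request(kind, livy_url=LIVY_URL):
    """Build (but do not send) a POST /sessions request for an interactive session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        f"{livy_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_statement_request(session_id, code, livy_url=LIVY_URL):
    """Build (but do not send) a POST /sessions/{id}/statements request running `code`."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{livy_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Actually sending a request needs a running Livy server, e.g.:
    # urllib.request.urlopen(create_session_request("pyspark"))
    req = run_statement_request(0, "1 + 1")
    print(req.get_method(), req.full_url)  # POST http://localhost:8998/sessions/0/statements
```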
Requirements
Additional requirements for the Livy interpreter are:
- Spark 1.3 or above.
- Livy server.
Configuration
Some common Spark configurations are predefined as interpreter properties, and you can set any other Spark configuration as well.
The full list of Spark properties is available at http://spark.apache.org/docs/latest/configuration.html#available-properties.
To set one, prefix the Spark property name with livy. so that it starts with livy.spark. instead of spark.
Example: spark.master becomes livy.spark.master.
Property | Default | Description |
---|---|---|
livy.spark.master | local[*] | Spark master URI, e.g. spark://masterhost:7077 |
zeppelin.livy.url | http://localhost:8998 | URL where the Livy server is running |
zeppelin.livy.spark.maxResult | 1000 | Maximum number of Spark SQL results to display |
livy.spark.driver.cores | | Driver cores, e.g. 1, 2 |
livy.spark.driver.memory | | Driver memory, e.g. 512m, 32g |
livy.spark.executor.instances | | Number of executor instances, e.g. 1, 4 |
livy.spark.executor.cores | | Number of cores per executor, e.g. 1, 4 |
livy.spark.executor.memory | | Memory per executor process, e.g. 512m, 32g |
livy.spark.dynamicAllocation.enabled | | Whether to use dynamic resource allocation, e.g. true, false |
livy.spark.dynamicAllocation.cachedExecutorIdleTimeout | | Idle time after which an executor holding cached data blocks is removed |
livy.spark.dynamicAllocation.minExecutors | | Lower bound for the number of executors |
livy.spark.dynamicAllocation.initialExecutors | | Initial number of executors to run |
livy.spark.dynamicAllocation.maxExecutors | | Upper bound for the number of executors |
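The livy. prefix rule can be illustrated with a small translation step. This is a sketch under assumptions: it strips livy. from every livy.spark.* property, skips the zeppelin.livy.* settings (which configure the interpreter itself), and places the result in the conf field of Livy's POST /sessions body. The conf field exists in Livy's REST API, but whether the interpreter forwards properties exactly this way is an assumption, and both helper names are hypothetical.

```python
import json


def to_spark_conf(interpreter_props):
    """Translate livy.spark.* interpreter properties into plain Spark property names.

    Keys without the livy.spark. prefix (such as zeppelin.livy.url) are skipped.
    """
    conf = {}
    for key, value in interpreter_props.items():
        if key.startswith("livy.spark."):
            # Drop the leading "livy." so "livy.spark.master" becomes "spark.master".
            conf[key[len("livy."):]] = value
    return conf


def session_body_with_conf(kind, interpreter_props):
    """Hypothetical: pass the translated properties via the "conf" field of POST /sessions."""
    return json.dumps({"kind": kind, "conf": to_spark_conf(interpreter_props)})


props = {
    "livy.spark.master": "local[*]",
    "livy.spark.executor.memory": "512m",
    "zeppelin.livy.url": "http://localhost:8998",
}
print(to_spark_conf(props))
# {'spark.master': 'local[*]', 'spark.executor.memory': '512m'}
```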
How to use
Start a paragraph with the interpreter prefix for the language you want to use:
spark
%livy.spark
sc.version
pyspark
%livy.pyspark
print("1")
sparkR
%livy.sparkr
hello <- function( name ) {
sprintf( "Hello, %s", name );
}
hello("livy")
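Conceptually, the prefix on the first line tells the interpreter which kind of Livy session should run the rest of the paragraph. The sketch below is a hypothetical illustration of that split, not Zeppelin's actual code. The session kinds spark, pyspark and sparkr are Livy's, while the PREFIX_TO_KIND table and the split_paragraph helper are assumptions.

```python
# Hypothetical lookup table: Zeppelin interpreter prefix -> Livy session kind.
PREFIX_TO_KIND = {
    "%livy.spark": "spark",
    "%livy.pyspark": "pyspark",
    "%livy.sparkr": "sparkr",
}


def split_paragraph(paragraph):
    """Split a paragraph into (Livy session kind, code to send to that session)."""
    first_line, _, code = paragraph.partition("\n")
    prefix = first_line.strip()
    if prefix not in PREFIX_TO_KIND:
        raise ValueError(f"unknown interpreter prefix: {prefix!r}")
    return PREFIX_TO_KIND[prefix], code


print(split_paragraph("%livy.spark\nsc.version"))  # ('spark', 'sc.version')
```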
Impersonation
When the Zeppelin server runs with authentication enabled, this interpreter uses Livy’s user impersonation feature: it sends an extra parameter, "proxyUser": "${loggedInUser}", when creating and running a session. This is particularly useful when multiple users share a notebook server.
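The impersonation behaviour amounts to one optional field in the session creation body. The sketch below assumes the logged-in user name is available (None when authentication is disabled) and adds proxyUser, a field of Livy's POST /sessions request, only in that case. The session_body helper is hypothetical.

```python
import json
from typing import Optional


def session_body(kind: str, logged_in_user: Optional[str]) -> str:
    """Build a JSON body for Livy's POST /sessions request.

    proxyUser is included only when a logged-in user is known, which is
    assumed to be the case only when Zeppelin authentication is enabled.
    """
    body = {"kind": kind}
    if logged_in_user:
        body["proxyUser"] = logged_in_user
    return json.dumps(body)


print(session_body("pyspark", "alice"))  # {"kind": "pyspark", "proxyUser": "alice"}
print(session_body("pyspark", None))     # {"kind": "pyspark"}
```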
Apply Zeppelin Dynamic Forms
You can use Zeppelin Dynamic Forms with this interpreter, including both the text input and select form parameterization features.
%livy.pyspark
print("${group_by=product_id,product_id|product_name|customer_id|store_id}")
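In the select form above, group_by is the form name, product_id is the default value, and the |-separated list holds the allowed options. The sketch below is a simplified, hypothetical substitution of such placeholders before the code reaches Livy. It is not Zeppelin's own parser: it handles only ${name}, ${name=default} and ${name=default,opt1|opt2}, and ignores features such as option display labels and checkboxes.

```python
import re

# Matches ${name}, ${name=default} and ${name=default,opt1|opt2|...} (simplified).
FORM_PATTERN = re.compile(r"\$\{([^=}]+)(?:=([^,}]*)(?:,([^}]*))?)?\}")


def render(template, values):
    """Replace each form placeholder with the user's value, or its default."""

    def substitute(match):
        name, default, options = match.group(1), match.group(2) or "", match.group(3)
        value = values.get(name, default)
        # A select form only accepts one of its listed options.
        if options is not None and value not in options.split("|"):
            raise ValueError(f"{value!r} is not an option for {name!r}")
        return value

    return FORM_PATTERN.sub(substitute, template)


code = 'print("${group_by=product_id,product_id|product_name|customer_id|store_id}")'
print(render(code, {}))                         # print("product_id")
print(render(code, {"group_by": "store_id"}))  # print("store_id")
```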
FAQ
Livy debugging: if you see any of the following messages in the error console, try the suggested fix.
Connect to livyhost:8998 [livyhost/127.0.0.1, livyhost/0:0:0:0:0:0:0:1] failed: Connection refused
The Livy server is probably not running yet, or the configuration (for example zeppelin.livy.url) is wrong.
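A quick way to tell these two cases apart is to check whether anything answers at the configured address. The sketch below sends GET /sessions, a Livy endpoint that lists sessions, and treats any connection or HTTP error as unreachable. The helper name and the default URL are assumptions.

```python
import urllib.request


def livy_is_reachable(livy_url="http://localhost:8998", timeout=5.0):
    """Return True if GET {livy_url}/sessions answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{livy_url}/sessions", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # urllib.error.URLError (e.g. connection refused, unknown host),
        # HTTPError and socket timeouts are all subclasses of OSError.
        return False
```

Calling livy_is_reachable("http://livyhost:8998") with your zeppelin.livy.url value returns False in the "Connection refused" case; the equivalent shell check is curl http://livyhost:8998/sessions.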
Exception: Session not found, Livy server would have restarted, or lost session.
The session has probably timed out; you may need to restart the interpreter.
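The decision to start over can be expressed as a small rule. This sketch assumes the client has polled GET /sessions/{id}, where Livy answers 404 for an unknown session and otherwise reports a state field. The set of terminal states below is an assumption based on Livy's session state names, and needs_new_session is a hypothetical helper.

```python
# Assumed Livy session states after which the session can no longer run statements.
TERMINAL_STATES = {"shutting_down", "error", "dead", "killed", "success"}


def needs_new_session(http_status, state=None):
    """Decide whether a fresh Livy session should be created.

    http_status: status code of GET /sessions/{id}; 404 means Livy no longer
    knows the session (for example after a server restart or a timeout).
    state: the "state" field of the response body when the status is 200.
    """
    if http_status == 404:
        return True
    return state in TERMINAL_STATES


print(needs_new_session(404))           # True
print(needs_new_session(200, "idle"))   # False
print(needs_new_session(200, "dead"))   # True
```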
Blacklisted configuration values in session config: spark.master
Edit the conf/spark-blacklist.conf file on the Livy server and comment out the spark.master line so that it reads #spark.master.
If you work with Livy from the apps/spark/java directory of https://github.com/cloudera/hue, copy spark-user-configurable-options.template to spark-user-configurable-options.conf on the Livy server and comment out the spark.master line there as well (#spark.master).
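Commenting out the line can also be scripted. The sketch below assumes each blacklisted property sits alone on its own line, as spark.master does in the file described above; lines that are already commented out are left unchanged. Both helper names are hypothetical.

```python
from pathlib import Path


def comment_out(conf_text, key):
    """Prefix every active line of conf_text that consists of `key` with '#'."""
    lines = []
    for line in conf_text.splitlines():
        # "#spark.master" strips to "#spark.master", so it is not matched again.
        if line.strip() == key:
            line = "#" + line
        lines.append(line)
    return "\n".join(lines) + "\n"


def comment_out_in_file(path, key):
    """Rewrite the file at `path` in place with `key` commented out."""
    conf_file = Path(path)
    conf_file.write_text(comment_out(conf_file.read_text(), key))


print(comment_out("spark.master\nspark.submit.deployMode\n", "spark.master"), end="")
# #spark.master
# spark.submit.deployMode
```

For example, comment_out_in_file("conf/spark-blacklist.conf", "spark.master") run from the Livy installation directory; restart the Livy server afterwards so it rereads the file.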