Livy Interpreter for Apache Zeppelin

Overview

Livy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in YARN.

  • Interactive Scala, Python and R shells
  • Batch submissions in Scala, Java, Python
  • Multiple users can share the same server (impersonation support)
  • Can be used for submitting jobs from anywhere with REST
  • Does not require any code change to your programs

Requirements

Additional requirements for the Livy interpreter are:

  • Spark 1.3 or above.
  • Livy server.

Configuration

Some common Spark configurations are predefined, and you can set any other Spark configuration you want. All Spark configurations are listed at http://spark.apache.org/docs/latest/configuration.html#available-properties. Instead of the spark. prefix, each property name must start with the livy.spark. prefix. For example, spark.master becomes livy.spark.master.
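The renaming rule can be sketched as a small helper. This is only an illustration: the function name to_livy_properties is hypothetical and is not part of Zeppelin or Livy, and the property values are sample values.

```python
def to_livy_properties(spark_conf):
    """Prefix every spark.* key with "livy." (hypothetical helper, for illustration)."""
    livy_conf = {}
    for key, value in spark_conf.items():
        if key.startswith("spark."):
            # spark.master -> livy.spark.master
            livy_conf["livy." + key] = value
        else:
            # Keys that are not Spark properties (e.g. zeppelin.livy.url) stay unchanged.
            livy_conf[key] = value
    return livy_conf


example = to_livy_properties({
    "spark.master": "spark://masterhost:7077",
    "spark.executor.memory": "512m",
    "zeppelin.livy.url": "http://localhost:8998",
})
print(example)
```

The resulting keys are the names you would enter in the interpreter settings.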

| Property | Default | Description |
|----------|---------|-------------|
| livy.spark.master | local[*] | Spark master URI, e.g. spark://masterhost:7077 |
| zeppelin.livy.url | http://localhost:8998 | URL where the Livy server is running |
| zeppelin.livy.spark.maxResult | 1000 | Maximum number of Spark SQL results to display |
| livy.spark.driver.cores | | Driver cores, e.g. 1, 2 |
| livy.spark.driver.memory | | Driver memory, e.g. 512m, 32g |
| livy.spark.executor.instances | | Executor instances, e.g. 1, 4 |
| livy.spark.executor.cores | | Number of cores per executor, e.g. 1, 4 |
| livy.spark.executor.memory | | Executor memory per worker instance, e.g. 512m, 32g |
| livy.spark.dynamicAllocation.enabled | | Use dynamic resource allocation, e.g. true, false |
| livy.spark.dynamicAllocation.cachedExecutorIdleTimeout | | Idle timeout after which an executor that holds cached data blocks is removed |
| livy.spark.dynamicAllocation.minExecutors | | Lower bound for the number of executors |
| livy.spark.dynamicAllocation.initialExecutors | | Initial number of executors to run |
| livy.spark.dynamicAllocation.maxExecutors | | Upper bound for the number of executors |

How to use

You can use the following interpreters:

spark

%livy.spark
sc.version

pyspark

%livy.pyspark
print("1")

sparkR

%livy.sparkr
hello <- function( name ) {
    sprintf( "Hello, %s", name );
}

hello("livy")

Impersonation

When the Zeppelin server runs with authentication enabled, this interpreter uses Livy’s user impersonation feature, i.e. it sends an extra parameter ("proxyUser": "${loggedInUser}") when creating and running a session. This is particularly useful when multiple users share a notebook server.
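As a rough sketch, the body of a session-creation request with impersonation could be built as follows. The "kind" and "proxyUser" fields belong to Livy's POST /sessions request body; the helper name session_request_body and the user name "alice" are hypothetical, and the request Zeppelin actually sends may carry additional fields.

```python
import json


def session_request_body(logged_in_user, kind="spark"):
    """Build a JSON body for Livy's POST /sessions (illustrative helper only)."""
    # "proxyUser" asks Livy to run the session on behalf of the logged-in user.
    return json.dumps({"kind": kind, "proxyUser": logged_in_user})


body = session_request_body("alice", kind="pyspark")
print(body)
```

The session then runs as the named user rather than as the account that runs the Zeppelin server, provided impersonation is enabled on the Livy side.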

Apply Zeppelin Dynamic Forms

You can use Zeppelin Dynamic Forms with this interpreter. Both the text input and the select form parameterization features are supported.

%livy.pyspark
print("${group_by=product_id,product_id|product_name|customer_id|store_id}")
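To show how the select form placeholder breaks down into a name, a default value, and a list of options, here is a simplified parser. It is an illustration only, not Zeppelin's implementation: the names FORM_PATTERN and parse_form are hypothetical, and the sketch ignores features such as display names for options.

```python
import re

# Matches ${name}, ${name=default} and ${name=default,opt1|opt2|...}
FORM_PATTERN = re.compile(r"\$\{(?P<name>[^=}]+)(?:=(?P<rest>[^}]*))?\}")


def parse_form(template):
    """Split the first dynamic form placeholder into name, default and options."""
    match = FORM_PATTERN.search(template)
    if match is None:
        return None
    rest = match.group("rest") or ""
    # Everything before the first comma is the default; the remainder lists the options.
    default, _, options = rest.partition(",")
    return {
        "name": match.group("name"),
        "default": default,
        "options": options.split("|") if options else [],
    }


form = parse_form("${group_by=product_id,product_id|product_name|customer_id|store_id}")
print(form)
```

A placeholder without a comma, such as ${group_by=product_id}, yields an empty option list, which corresponds to a text input form.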

FAQ

Livy debugging: if you see any of the following messages in the error console, try the suggested fix.

Connect to livyhost:8998 [livyhost/127.0.0.1, livyhost/0:0:0:0:0:0:0:1] failed: Connection refused

The Livy server is probably not up yet, or the configured URL (zeppelin.livy.url) is wrong.
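One way to tell these two causes apart is to query the server directly from the Zeppelin host. The sketch below uses only the Python standard library and Livy's GET /sessions endpoint; the helper name livy_reachable and the timeout value are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def livy_reachable(base_url="http://localhost:8998", timeout=5.0):
    """Return True if anything answers HTTP at base_url (illustrative check only)."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/sessions", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even though it returned an error status.
        return True
    except OSError:
        # Connection refused, DNS failure, timeout and similar network errors.
        return False
```

Calling livy_reachable("http://livyhost:8998") with the value of zeppelin.livy.url returns False while the connection is still refused, which points to the server being down or the URL being wrong.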

Exception: Session not found, Livy server would have restarted, or lost session.

The session has probably timed out; you may need to restart the interpreter.

Blacklisted configuration values in session config: spark.master

Edit the conf/spark-blacklist.conf file on the Livy server and comment out the spark.master line, so that it reads #spark.master.

If you work with the Livy version in the apps/spark/java directory of https://github.com/cloudera/hue, copy spark-user-configurable-options.template to spark-user-configurable-options.conf on the Livy server and comment out #spark.master.