Quick Start

Welcome to Apache Zeppelin! This page will help you get started. It covers installing Apache Zeppelin, starting and stopping it, and configuring it.

Installation

Apache Zeppelin officially supports, and is tested on, the following environments:

Name	Value
Oracle JDK	1.7 (set `JAVA_HOME`)
OS	Mac OSX, Ubuntu 14.X, CentOS 6.X, Windows 7 Pro SP1

There are two ways to install Apache Zeppelin on your machine. One is to download a pre-built binary package from the archive; you can download not only the latest stable version but also older versions if you need them. The other is to build from source. Although the development version may be unstable, building from source lets you explore newly added features and change them as you want.

Downloading Binary Package

If you want to install Apache Zeppelin from a stable binary package, visit the Apache Zeppelin download page.

If you have downloaded the netinst binary, install the additional interpreters you need before you start Zeppelin, or simply run `./bin/install-interpreter.sh --all` to install all of them.

After unpacking, jump to the Starting Apache Zeppelin with Command Line section.

Building from Source

To build from source, the following software must be installed on your system.

Name	Value
Git
Maven	3.1.x or higher

If any of it is not installed yet, see the Before Build section and follow the step-by-step instructions there.
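Before building, you can check these prerequisites with a small script. This is a hypothetical sketch, not part of Zeppelin: `version_ge` and `check_build_prereqs` are illustrative names, the script assumes a `sort` that supports `-V` (version sort, as in GNU coreutils), and it assumes the first line of `mvn -v` has the form `Apache Maven X.Y.Z (...)`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Zeppelin) for checking build prerequisites.

# Succeeds if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), as provided by GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Checks that git is on PATH and that Maven is at least 3.1.
check_build_prereqs() {
  command -v git >/dev/null 2>&1 || { echo "git not found" >&2; return 1; }
  command -v mvn >/dev/null 2>&1 || { echo "mvn not found" >&2; return 1; }
  local mvn_version
  # Assumes the first line of `mvn -v` looks like "Apache Maven 3.3.9 (...)".
  mvn_version="$(mvn -v | head -n 1 | awk '{print $3}')"
  if ! version_ge "$mvn_version" "3.1"; then
    echo "Maven $mvn_version is older than 3.1" >&2
    return 1
  fi
  echo "OK: git and Maven $mvn_version found"
}
```

After sourcing the script, run `check_build_prereqs`; it prints an OK message, or reports the first missing or outdated prerequisite and returns a non-zero status.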

1. Clone the Apache Zeppelin repository

git clone https://github.com/apache/zeppelin.git

2. Build the source with options

Each interpreter requires different build options. For more information about the options, see the Build section.

mvn clean package -DskipTests [Options]

Here are some examples with several options:

# basic build
mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark

# spark-cassandra integration
mvn clean package -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests

# with CDH
mvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pvendor-repo -DskipTests

# with MapR
mvn clean package -Pspark-1.5 -Pmapr50 -DskipTests
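If you build often against different Spark and Hadoop versions, a small wrapper can assemble the profile flags for you. This is a hypothetical sketch (`zeppelin_build_cmd` is not a Zeppelin script): it only prints the command, following the pattern of the basic build example, and which `spark-*` and `hadoop-*` profiles exist depends on the Zeppelin version you checked out.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print a Zeppelin build command for the given
# Spark and Hadoop profile versions, plus any extra profile names.
zeppelin_build_cmd() {
  if [ $# -lt 2 ]; then
    echo "usage: zeppelin_build_cmd SPARK_PROFILE HADOOP_PROFILE [EXTRA_PROFILE...]" >&2
    return 1
  fi
  local spark="$1" hadoop="$2"
  shift 2
  local cmd="mvn clean package -DskipTests -Pspark-${spark} -Phadoop-${hadoop}"
  local p
  # Each remaining argument becomes its own -P<profile> flag.
  for p in "$@"; do
    cmd="$cmd -P$p"
  done
  echo "$cmd"
}
```

For example, `zeppelin_build_cmd 1.6 2.4 yarn pyspark` prints `mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark`. Review the printed command, then run it, for instance with `eval "$(zeppelin_build_cmd 1.6 2.4 yarn pyspark)"`.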

For more information about building from source, see README.md in the Zeppelin repository.

Starting Apache Zeppelin with Command Line

Start Zeppelin

bin/zeppelin-daemon.sh start

If you are using Windows

bin\zeppelin.cmd

After Zeppelin has started successfully, open http://localhost:8080 in your web browser.
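If you have changed the server port or the context path (`ZEPPELIN_PORT` and `ZEPPELIN_SERVER_CONTEXT_PATH` in the Apache Zeppelin Configuration section), the address changes accordingly. The hypothetical helper below derives the URL from those environment variables and falls back to the documented defaults (port 8080, context path `/`). It does not read conf/zeppelin-site.xml, so it only reflects settings made through environment variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the local Zeppelin web UI URL based on the
# ZEPPELIN_PORT and ZEPPELIN_SERVER_CONTEXT_PATH environment variables,
# using the documented defaults when they are unset.
zeppelin_url() {
  local port="${ZEPPELIN_PORT:-8080}"
  local ctx="${ZEPPELIN_SERVER_CONTEXT_PATH:-/}"
  # Make sure the context path starts with a slash.
  case "$ctx" in
    /*) ;;
    *) ctx="/$ctx" ;;
  esac
  echo "http://localhost:${port}${ctx}"
}
```

For a quick check that the server answers, `curl -s -o /dev/null -w '%{http_code}\n' "$(zeppelin_url)"` prints the HTTP status code returned by the web UI.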

Stop Zeppelin

bin/zeppelin-daemon.sh stop

(Optional) Start Apache Zeppelin with a service manager

Note: The description below is based on Ubuntu Linux.

Apache Zeppelin can be started automatically as a service with an init script, such as one managed by upstart.

The following is an example of an upstart script to be saved as /etc/init/zeppelin.conf. It also allows the service to be managed with commands such as:

sudo service zeppelin start  
sudo service zeppelin stop  
sudo service zeppelin restart

Other service managers can use a similar approach by passing the upstart argument to the zeppelin-daemon.sh script:

bin/zeppelin-daemon.sh upstart

zeppelin.conf

description "zeppelin"

start on (local-filesystems and net-device-up IFACE!=lo)
stop on shutdown

# Respawn the process on unexpected termination
respawn

# respawn the job up to 7 times within a 5 second period.
# If the job exceeds these values, it will be stopped and marked as failed.
respawn limit 7 5

# zeppelin was installed in /usr/share/zeppelin in this example
chdir /usr/share/zeppelin
exec bin/zeppelin-daemon.sh upstart
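As an example of another service manager, the sketch below writes a systemd unit that follows the same approach. It is a hypothetical helper, not part of Zeppelin: `write_zeppelin_unit` is an illustrative name, the default service user `zeppelin` is an assumption (that user must exist and be able to write to the Zeppelin directory), and it assumes, as the upstart job above does, that `zeppelin-daemon.sh upstart` keeps Zeppelin in the foreground rather than forking.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd unit that runs Zeppelin via
# `zeppelin-daemon.sh upstart`, analogous to the upstart job above.
# Usage: write_zeppelin_unit DEST_FILE ZEPPELIN_HOME [RUN_AS_USER]
write_zeppelin_unit() {
  local dest="$1" home="$2" user="${3:-zeppelin}"
  cat > "$dest" <<EOF
[Unit]
Description=zeppelin
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=${user}
WorkingDirectory=${home}
ExecStart=${home}/bin/zeppelin-daemon.sh upstart
# Restart on unexpected termination, similar to upstart's "respawn"
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

As root, write the unit to /etc/systemd/system/zeppelin.service (for example `write_zeppelin_unit /etc/systemd/system/zeppelin.service /usr/share/zeppelin`), then run `sudo systemctl daemon-reload` and `sudo systemctl enable --now zeppelin`. After that, `sudo systemctl start zeppelin`, `stop`, and `restart` manage the service.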

What's next?

Congratulations on your successful Apache Zeppelin installation! Here are a few next steps you might take.

If you are new to Apache Zeppelin

For an in-depth overview of the Apache Zeppelin UI, head to Explore Apache Zeppelin UI.
After getting familiar with the Apache Zeppelin UI, try the short walk-through Tutorial, which uses the Apache Spark backend.
If you need more configuration settings for Apache Zeppelin, jump to the next section: Apache Zeppelin Configuration.

If you need more information about Spark or JDBC interpreter settings

Apache Zeppelin provides deep integration with Apache Spark. For more information, see Spark Interpreter for Apache Zeppelin.
You can also use generic JDBC connections in Apache Zeppelin; see Generic JDBC Interpreter for Apache Zeppelin.

If you are in a multi-user environment

In a multi-user environment, you can set permissions for your notebooks and secure data resources. See the More -> Security section.

Apache Zeppelin Configuration

You can configure Apache Zeppelin with environment variables in conf/zeppelin-env.sh (conf\zeppelin-env.cmd on Windows), with Java properties in conf/zeppelin-site.xml, or with both. If a setting is defined in both places, the environment variable takes priority.
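As an illustration, the excerpt below sets a few of the variables from the following table in conf/zeppelin-env.sh. The values are placeholders, not recommendations. `resolve_setting` is a hypothetical helper that models the precedence rule in simplified form (environment variable, then zeppelin-site.xml, then default); it is not Zeppelin's own configuration code.

```shell
# Example conf/zeppelin-env.sh excerpt (values are placeholders).
# Variables set here take priority over the matching properties in conf/zeppelin-site.xml.
export ZEPPELIN_PORT=8090
export ZEPPELIN_MEM="-Xmx2048m -XX:MaxPermSize=512m"
export ZEPPELIN_NOTEBOOK_DIR="/var/lib/zeppelin/notebook"

# Hypothetical helper modelling the precedence rule:
# environment variable first, then the zeppelin-site.xml value, then the default.
resolve_setting() {
  local env_value="$1" site_value="$2" default_value="$3"
  if [ -n "$env_value" ]; then
    echo "$env_value"
  elif [ -n "$site_value" ]; then
    echo "$site_value"
  else
    echo "$default_value"
  fi
}

# With ZEPPELIN_PORT exported above, a zeppelin.server.port of 9090 in
# zeppelin-site.xml would be ignored: this prints 8090.
resolve_setting "$ZEPPELIN_PORT" 9090 8080
```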

zeppelin-env.sh	zeppelin-site.xml	Default value	Description
ZEPPELIN_PORT	zeppelin.server.port	8080	Zeppelin server port
ZEPPELIN_MEM	N/A	-Xmx1024m -XX:MaxPermSize=512m	JVM memory options
ZEPPELIN_INTP_MEM	N/A	ZEPPELIN_MEM	JVM memory options for the interpreter process
ZEPPELIN_JAVA_OPTS	N/A		JVM options
ZEPPELIN_ALLOWED_ORIGINS	zeppelin.server.allowed.origins	*	Comma-separated list of allowed origins for REST and WebSocket requests, e.g. http://localhost:8080
N/A	zeppelin.anonymous.allowed	true	Anonymous user is allowed by default.
ZEPPELIN_SERVER_CONTEXT_PATH	zeppelin.server.context.path	/	A context path of the web application
ZEPPELIN_SSL	zeppelin.ssl	false	Enables SSL
ZEPPELIN_SSL_CLIENT_AUTH	zeppelin.ssl.client.auth	false	Enables SSL client authentication
ZEPPELIN_SSL_KEYSTORE_PATH	zeppelin.ssl.keystore.path	keystore	Path to the keystore
ZEPPELIN_SSL_KEYSTORE_TYPE	zeppelin.ssl.keystore.type	JKS	Keystore type
ZEPPELIN_SSL_KEYSTORE_PASSWORD	zeppelin.ssl.keystore.password		Keystore password
ZEPPELIN_SSL_KEY_MANAGER_PASSWORD	zeppelin.ssl.key.manager.password		Key manager password
ZEPPELIN_SSL_TRUSTSTORE_PATH	zeppelin.ssl.truststore.path		Path to the truststore
ZEPPELIN_SSL_TRUSTSTORE_TYPE	zeppelin.ssl.truststore.type		Truststore type
ZEPPELIN_SSL_TRUSTSTORE_PASSWORD	zeppelin.ssl.truststore.password		Truststore password
ZEPPELIN_NOTEBOOK_HOMESCREEN	zeppelin.notebook.homescreen		ID of the notebook to display on the Apache Zeppelin home screen, e.g. 2A94M5J1Z
ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE	zeppelin.notebook.homescreen.hide	false	Set to "true" to hide the notebook set by `ZEPPELIN_NOTEBOOK_HOMESCREEN` from the notebook list. For more information, see Customize your Zeppelin homepage.
ZEPPELIN_WAR_TEMPDIR	zeppelin.war.tempdir	webapps	Location of the Jetty temporary directory
ZEPPELIN_NOTEBOOK_DIR	zeppelin.notebook.dir	notebook	The root directory where notebook directories are saved
ZEPPELIN_NOTEBOOK_S3_BUCKET	zeppelin.notebook.s3.bucket	zeppelin	S3 bucket where notebook files are saved
ZEPPELIN_NOTEBOOK_S3_USER	zeppelin.notebook.s3.user	user	User name used in the S3 object path, e.g. `bucket/user/notebook/2A94M5J1Z/note.json`
ZEPPELIN_NOTEBOOK_S3_ENDPOINT	zeppelin.notebook.s3.endpoint	s3.amazonaws.com	Endpoint for the bucket
ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID	zeppelin.notebook.s3.kmsKeyID		AWS KMS Key ID to use for encrypting data in S3 (optional)
ZEPPELIN_NOTEBOOK_S3_EMP	zeppelin.notebook.s3.encryptionMaterialsProvider		Class name of a custom S3 encryption materials provider implementation to use for encrypting data in S3 (optional)
ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING	zeppelin.notebook.azure.connectionString		The Azure storage account connection string, e.g. `DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey>`
ZEPPELIN_NOTEBOOK_AZURE_SHARE	zeppelin.notebook.azure.share	zeppelin	File share where notebook files are saved
ZEPPELIN_NOTEBOOK_AZURE_USER	zeppelin.notebook.azure.user	user	Optional user name used in the Azure file share path, e.g. `share/user/notebook/2A94M5J1Z/note.json`
ZEPPELIN_NOTEBOOK_STORAGE	zeppelin.notebook.storage	org.apache.zeppelin.notebook.repo.VFSNotebookRepo	Comma-separated list of notebook storage implementations
ZEPPELIN_INTERPRETERS	zeppelin.interpreters	org.apache.zeppelin.spark.SparkInterpreter, org.apache.zeppelin.spark.PySparkInterpreter, org.apache.zeppelin.spark.SparkSqlInterpreter, org.apache.zeppelin.spark.DepInterpreter, org.apache.zeppelin.markdown.Markdown, org.apache.zeppelin.shell.ShellInterpreter, ...	Comma-separated list of interpreter classes. The first interpreter in the list is the default: it is the only one that can be used in a notebook paragraph without a `%interpreter_name` annotation.
ZEPPELIN_INTERPRETER_DIR	zeppelin.interpreter.dir	interpreter	Interpreter directory
ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE	zeppelin.websocket.max.text.message.size	1024000	Maximum size, in characters, of a text message received over the WebSocket connection
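For example, to store notebooks in S3 instead of the local notebook directory, you might set the storage and S3 variables in conf/zeppelin-env.sh as sketched below. This assumes the S3 storage class is named `org.apache.zeppelin.notebook.repo.S3NotebookRepo` (check the name against your Zeppelin version) and that AWS credentials are already available to the Zeppelin process; the bucket and user names are placeholders. `s3_note_path` is a hypothetical helper that only shows where a note ends up under the `bucket/user/notebook/<note id>/note.json` layout from the table.

```shell
# Example conf/zeppelin-env.sh excerpt for storing notebooks in S3.
# "my-zeppelin-bucket" and "alice" are placeholders.
export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.S3NotebookRepo"
export ZEPPELIN_NOTEBOOK_S3_BUCKET="my-zeppelin-bucket"
export ZEPPELIN_NOTEBOOK_S3_USER="alice"

# Hypothetical helper: print the location of a note with the given ID,
# falling back to the documented defaults ("zeppelin" and "user").
s3_note_path() {
  local note_id="$1"
  echo "${ZEPPELIN_NOTEBOOK_S3_BUCKET:-zeppelin}/${ZEPPELIN_NOTEBOOK_S3_USER:-user}/notebook/${note_id}/note.json"
}

# Prints my-zeppelin-bucket/alice/notebook/2A94M5J1Z/note.json
s3_note_path 2A94M5J1Z
```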