Quick Start

Welcome to your first trial of Apache Zeppelin! This page will help you get started.

Installation

Apache Zeppelin officially supports, and is tested on, the following environments.

| Name | Value |
| --- | --- |
| Oracle JDK | 1.7 (set JAVA_HOME) |
| OS | Mac OSX, Ubuntu 14.X, CentOS 6.X, Windows 7 Pro SP1 |
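Because Zeppelin relies on JAVA_HOME, it can help to verify that variable before the first start. The sketch below is a hypothetical pre-flight helper, not part of Zeppelin: check_java_home only confirms that JAVA_HOME is set and that $JAVA_HOME/bin/java is executable. It does not check the Java version, and the JDK path in the example comment is illustrative.

```sh
#!/bin/sh
# Hypothetical pre-flight helper: verify that JAVA_HOME is set and points at a
# directory containing an executable bin/java. It does not check the version.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "No executable java found under $JAVA_HOME/bin" >&2
    return 1
  fi
  echo "Using JDK at $JAVA_HOME"
}

# Example (the JDK path is only an illustration):
#   export JAVA_HOME=/usr/lib/jvm/java-7-oracle
#   check_java_home
```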

There are two ways to install Apache Zeppelin on your machine. One is to download a pre-built binary package from the archive; you can download not only the latest stable version but also older versions if you need them. The other is to build from source. Although the source can be less stable because it is under active development, you can explore newly added features and change them as you want.

Downloading Binary Package

If you want to install Apache Zeppelin from a stable binary package, please visit the Apache Zeppelin download page.

If you downloaded the netinst binary, install the additional interpreters you need before you start Zeppelin, or simply run ./bin/install-interpreter.sh --all to install all of them.

After unpacking, jump to the Starting Apache Zeppelin with Command Line section.
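The unpacking step can also be scripted. This is a minimal sketch with a hypothetical helper, unpack_zeppelin, that assumes two things about the download: the archive is a .tgz whose top-level directory has the same name as the file minus .tgz, and a file name ending in -netinst marks the netinst package. Check both assumptions against the file you actually downloaded.

```sh
#!/bin/sh
# Hypothetical helper: unpack a Zeppelin binary archive into the current
# directory and, for the netinst package, install all interpreters.
# Assumes the archive unpacks into a directory named like the file minus ".tgz".
unpack_zeppelin() {
  local archive="$1" dir
  dir=$(basename "$archive" .tgz)
  tar -xzf "$archive" || return 1
  case "$dir" in
    *-netinst)
      # The netinst package needs its additional interpreters installed;
      # send the installer's output to stderr so stdout holds only the name
      (cd "$dir" && ./bin/install-interpreter.sh --all) >&2 || return 1
      ;;
  esac
  echo "$dir"
}

# Example (substitute the file you downloaded):
#   cd "$(unpack_zeppelin zeppelin-<version>-bin-netinst.tgz)"
```

The helper prints only the directory name on stdout, which is why the example can pass its output straight to cd.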

Building from Source

To build from source, the following software must be installed on your system.

| Name | Value |
| --- | --- |
| Git | |
| Maven | 3.1.x or higher |

If you don't have them installed yet, see the Before Build section and follow the step-by-step instructions there.
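To confirm the prerequisites before cloning, you can check that git and mvn are on your PATH and that Maven is at least 3.1. This is a minimal sketch with hypothetical helpers, version_at_least and check_build_tools. It assumes the first line of mvn -v looks like "Apache Maven 3.3.9 (...)" and compares only the major and minor version numbers.

```sh
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 is at least $2.
# Only major and minor numbers are compared (3.3.9 vs 3.1 -> 3.3 vs 3.1).
version_at_least() {
  local have="$1" need="$2" have_major need_major have_minor need_minor n
  have_major=${have%%.*}
  need_major=${need%%.*}
  case "$have" in *.*) have_minor=${have#*.}; have_minor=${have_minor%%.*} ;; *) have_minor=0 ;; esac
  case "$need" in *.*) need_minor=${need#*.}; need_minor=${need_minor%%.*} ;; *) need_minor=0 ;; esac
  # Anything that is not a plain number (e.g. an empty string) fails the check
  for n in "$have_major" "$have_minor" "$need_major" "$need_minor"; do
    case "$n" in ''|*[!0-9]*) return 1 ;; esac
  done
  [ "$have_major" -gt "$need_major" ] ||
    { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

# Hypothetical helper: check that git and mvn are on PATH and Maven >= 3.1.
check_build_tools() {
  local tool mvn_version
  for tool in git mvn; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool is not installed or not on PATH" >&2
      return 1
    fi
  done
  # Assumes "mvn -v" starts with e.g. "Apache Maven 3.3.9 (...)"; take field 3
  mvn_version=$(mvn -v 2>/dev/null | awk 'NR == 1 { print $3 }')
  if ! version_at_least "$mvn_version" 3.1; then
    echo "Found Maven ${mvn_version:-unknown}, but 3.1.x or higher is required" >&2
    return 1
  fi
  echo "git and Maven $mvn_version look OK"
}
```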

1. Clone Apache Zeppelin repository

git clone https://github.com/apache/zeppelin.git

2. Build source with options

Each interpreter requires different build options. For further information about the options, see the Build section.

mvn clean package -DskipTests [Options]

Here are some examples with several options:

# basic build
mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark

# spark-cassandra integration
mvn clean package -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests

# with CDH
mvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pvendor-repo -DskipTests

# with MapR
mvn clean package -Pspark-1.5 -Pmapr50 -DskipTests
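If you rebuild often with different profiles, a small wrapper can keep the command consistent. The function below is a hypothetical helper, not part of Zeppelin: bare names become Maven -P profiles, arguments starting with -D are passed through unchanged, and -DskipTests is always added as in the generic build command above. Setting DRY_RUN=1 prints the command instead of running it.

```sh
#!/bin/sh
# Hypothetical wrapper around the Zeppelin Maven build (not part of Zeppelin).
#   - arguments starting with -D are passed to Maven unchanged
#   - any other argument is treated as a profile name and prefixed with -P
#   - -DskipTests is always added, as in the generic build command
# Set DRY_RUN=1 to print the command instead of running it.
zeppelin_build() {
  local remaining=$# arg
  # Append the fixed goals after the caller's arguments, then rotate each
  # caller argument from the front to the back in its translated form.
  set -- "$@" clean package -DskipTests
  while [ "$remaining" -gt 0 ]; do
    arg=$1
    shift
    case "$arg" in
      -D*) set -- "$@" "$arg" ;;
      *)   set -- "$@" "-P$arg" ;;
    esac
    remaining=$((remaining - 1))
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo mvn "$@"
  else
    mvn "$@"
  fi
}
```

For instance, DRY_RUN=1 zeppelin_build spark-1.5 hadoop-2.6 vendor-repo -Dhadoop.version=2.6.0-cdh5.5.0 prints an mvn command with the same flags as the CDH example, only in a different order. The POSIX set -- rotation is used instead of a bash array so the helper also runs under plain sh.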

For further information about building from source, please see README.md in the Zeppelin repository.

Starting Apache Zeppelin with Command Line

Start Zeppelin

bin/zeppelin-daemon.sh start

If you are using Windows

bin\zeppelin.cmd

After a successful start, visit http://localhost:8080 in your web browser.

Stop Zeppelin

bin/zeppelin-daemon.sh stop
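If you script the startup, the web UI may not answer the moment zeppelin-daemon.sh returns. A polling loop like the one below can wait for http://localhost:8080 before you continue. This is a hypothetical helper that assumes curl is installed; the timeout counts one-second sleeps, so it is approximate.

```sh
#!/bin/sh
# Hypothetical helper: poll the Zeppelin web UI until it responds successfully
# or roughly `timeout` seconds of waiting have passed. Requires curl.
wait_for_zeppelin() {
  local url="${1:-http://localhost:8080}" timeout="${2:-60}" waited=0
  # -s: silent, -f: treat HTTP errors (>= 400) as failures,
  # --max-time: give up on a single request after 5 seconds, -o: discard body
  until curl -sf --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Zeppelin did not respond at $url within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Zeppelin is up at $url"
}

# Example:
#   bin/zeppelin-daemon.sh start && wait_for_zeppelin http://localhost:8080 120
```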

(Optional) Start Apache Zeppelin with a service manager

Note: The description below is based on Ubuntu Linux.

Apache Zeppelin can be started automatically as a service with an init script, such as one managed by upstart.

The following is an example upstart script to be saved as /etc/init/zeppelin.conf. It also allows the service to be managed with commands such as:

sudo service zeppelin start
sudo service zeppelin stop
sudo service zeppelin restart

Other service managers can use a similar approach by passing the upstart argument to the zeppelin-daemon.sh script:

bin/zeppelin-daemon.sh upstart

zeppelin.conf

description "zeppelin"

start on (local-filesystems and net-device-up IFACE!=lo)
stop on shutdown

# Respawn the process on unexpected termination
respawn

# respawn the job up to 7 times within a 5 second period.
# If the job exceeds these values, it will be stopped and marked as failed.
respawn limit 7 5

# zeppelin was installed in /usr/share/zeppelin in this example
chdir /usr/share/zeppelin
exec bin/zeppelin-daemon.sh upstart
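If Zeppelin is installed somewhere other than /usr/share/zeppelin, only the chdir line of the job above needs to change. The hypothetical helper upstart_conf below prints the same job with a configurable installation directory, which you could then write to /etc/init/zeppelin.conf; the /opt/zeppelin path in the example is illustrative.

```sh
#!/bin/sh
# Hypothetical helper: print the upstart job with a chosen installation
# directory (defaults to /usr/share/zeppelin, as in the example job).
upstart_conf() {
  local zeppelin_home="${1:-/usr/share/zeppelin}"
  cat <<EOF
description "zeppelin"

start on (local-filesystems and net-device-up IFACE!=lo)
stop on shutdown

# Respawn the process on unexpected termination
respawn

# respawn the job up to 7 times within a 5 second period.
# If the job exceeds these values, it will be stopped and marked as failed.
respawn limit 7 5

chdir $zeppelin_home
exec bin/zeppelin-daemon.sh upstart
EOF
}

# Example (installation path is illustrative):
#   upstart_conf /opt/zeppelin | sudo tee /etc/init/zeppelin.conf >/dev/null
```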

What's next?

Congratulations on your successful Apache Zeppelin installation! Here are some next steps you might take.

If you are new to Apache Zeppelin

  • For an in-depth overview of the Apache Zeppelin UI, head to Explore Apache Zeppelin UI.
  • After getting familiar with the Apache Zeppelin UI, have fun with a short walk-through Tutorial that uses the Apache Spark backend.
  • If you need more configuration settings for Apache Zeppelin, jump to the next section: Apache Zeppelin Configuration.

If you need more information about Spark or JDBC interpreter settings

If you are in a multi-user environment

  • You can set permissions for your notebooks and secure data resources in a multi-user environment. Go to the More -> Security section.

Apache Zeppelin Configuration

You can configure Apache Zeppelin with environment variables in conf/zeppelin-env.sh (conf\zeppelin-env.cmd for Windows) and with Java properties in conf/zeppelin-site.xml. If a setting is defined in both places, the environment variable takes priority.
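As an illustration, a conf/zeppelin-env.sh that changes the port, gives the server and interpreters more memory, and moves the notebook directory might look like the sketch below. The values are examples only, not recommendations; each variable appears in the configuration table. Because environment variables take priority, these settings would win over matching entries in conf/zeppelin-site.xml. The file is read when Zeppelin starts, so restart the server after editing it.

```sh
# conf/zeppelin-env.sh (example values only)
# Serve the web UI on port 8090 instead of the default 8080
export ZEPPELIN_PORT=8090
# Raise the server heap; the default is -Xmx1024m -XX:MaxPermSize=512m
export ZEPPELIN_MEM="-Xmx2048m -XX:MaxPermSize=512m"
# Give interpreter processes their own memory settings (defaults to ZEPPELIN_MEM)
export ZEPPELIN_INTP_MEM="-Xmx4096m -XX:MaxPermSize=512m"
# Keep notebooks outside the installation directory (path is illustrative)
export ZEPPELIN_NOTEBOOK_DIR=/var/lib/zeppelin/notebook
```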

| zeppelin-env.sh | zeppelin-site.xml | Default value | Description |
| --- | --- | --- | --- |
| ZEPPELIN_PORT | zeppelin.server.port | 8080 | Zeppelin server port |
| ZEPPELIN_MEM | N/A | -Xmx1024m -XX:MaxPermSize=512m | JVM memory options |
| ZEPPELIN_INTP_MEM | N/A | ZEPPELIN_MEM | JVM memory options for the interpreter process |
| ZEPPELIN_JAVA_OPTS | N/A | | JVM options |
| ZEPPELIN_ALLOWED_ORIGINS | zeppelin.server.allowed.origins | * | Comma-separated list of allowed origins for REST and WebSocket requests, e.g. http://localhost:8080 |
| N/A | zeppelin.anonymous.allowed | true | Anonymous users are allowed by default |
| ZEPPELIN_SERVER_CONTEXT_PATH | zeppelin.server.context.path | / | Context path of the web application |
| ZEPPELIN_SSL | zeppelin.ssl | false | |
| ZEPPELIN_SSL_CLIENT_AUTH | zeppelin.ssl.client.auth | false | |
| ZEPPELIN_SSL_KEYSTORE_PATH | zeppelin.ssl.keystore.path | keystore | |
| ZEPPELIN_SSL_KEYSTORE_TYPE | zeppelin.ssl.keystore.type | JKS | |
| ZEPPELIN_SSL_KEYSTORE_PASSWORD | zeppelin.ssl.keystore.password | | |
| ZEPPELIN_SSL_KEY_MANAGER_PASSWORD | zeppelin.ssl.key.manager.password | | |
| ZEPPELIN_SSL_TRUSTSTORE_PATH | zeppelin.ssl.truststore.path | | |
| ZEPPELIN_SSL_TRUSTSTORE_TYPE | zeppelin.ssl.truststore.type | | |
| ZEPPELIN_SSL_TRUSTSTORE_PASSWORD | zeppelin.ssl.truststore.password | | |
| ZEPPELIN_NOTEBOOK_HOMESCREEN | zeppelin.notebook.homescreen | | ID of the notebook displayed on the Apache Zeppelin home screen, e.g. 2A94M5J1Z |
| ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE | zeppelin.notebook.homescreen.hide | false | Set to "true" to hide the notebook set by ZEPPELIN_NOTEBOOK_HOMESCREEN on the Apache Zeppelin home screen. For further information, see Customize your Zeppelin homepage. |
| ZEPPELIN_WAR_TEMPDIR | zeppelin.war.tempdir | webapps | Location of the Jetty temporary directory |
| ZEPPELIN_NOTEBOOK_DIR | zeppelin.notebook.dir | notebook | Root directory where notebook directories are saved |
| ZEPPELIN_NOTEBOOK_S3_BUCKET | zeppelin.notebook.s3.bucket | zeppelin | S3 bucket where notebook files are saved |
| ZEPPELIN_NOTEBOOK_S3_USER | zeppelin.notebook.s3.user | user | User name within the S3 bucket, e.g. bucket/user/notebook/2A94M5J1Z/note.json |
| ZEPPELIN_NOTEBOOK_S3_ENDPOINT | zeppelin.notebook.s3.endpoint | s3.amazonaws.com | Endpoint for the bucket |
| ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID | zeppelin.notebook.s3.kmsKeyID | | AWS KMS key ID used to encrypt data in S3 (optional) |
| ZEPPELIN_NOTEBOOK_S3_EMP | zeppelin.notebook.s3.encryptionMaterialsProvider | | Class name of a custom S3 encryption materials provider implementation used to encrypt data in S3 (optional) |
| ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING | zeppelin.notebook.azure.connectionString | | Azure storage account connection string, e.g. DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey> |
| ZEPPELIN_NOTEBOOK_AZURE_SHARE | zeppelin.notebook.azure.share | zeppelin | Share where notebook files are saved |
| ZEPPELIN_NOTEBOOK_AZURE_USER | zeppelin.notebook.azure.user | user | Optional user name within the Azure file share, e.g. share/user/notebook/2A94M5J1Z/note.json |
| ZEPPELIN_NOTEBOOK_STORAGE | zeppelin.notebook.storage | org.apache.zeppelin.notebook.repo.VFSNotebookRepo | Comma-separated list of notebook storage classes |
| ZEPPELIN_INTERPRETERS | zeppelin.interpreters | org.apache.zeppelin.spark.SparkInterpreter, org.apache.zeppelin.spark.PySparkInterpreter, org.apache.zeppelin.spark.SparkSqlInterpreter, org.apache.zeppelin.spark.DepInterpreter, org.apache.zeppelin.markdown.Markdown, org.apache.zeppelin.shell.ShellInterpreter, ... | Comma-separated list of interpreter classes. The first interpreter in the list is the default: only it can be used without a %interpreter_name annotation in a notebook paragraph. |
| ZEPPELIN_INTERPRETER_DIR | zeppelin.interpreter.dir | interpreter | Interpreter directory |
| ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE | zeppelin.websocket.max.text.message.size | 1024000 | Maximum size, in characters, of a text message received over the WebSocket |
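For example, to keep notebooks in S3 instead of the local notebook directory, the storage-related variables can be combined in conf/zeppelin-env.sh. This sketch assumes the S3 storage class is org.apache.zeppelin.notebook.repo.S3NotebookRepo (check the notebook storage documentation for your version), uses illustrative bucket and user names, and leaves out AWS credentials. The helper s3_note_key is hypothetical; it only spells out the bucket/user/notebook/<note id>/note.json layout given in the table.

```sh
#!/bin/sh
# conf/zeppelin-env.sh fragment (bucket and user names are illustrative).
# Assumption: S3NotebookRepo is the S3 storage implementation in your version.
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.S3NotebookRepo
export ZEPPELIN_NOTEBOOK_S3_BUCKET=my-zeppelin-bucket
export ZEPPELIN_NOTEBOOK_S3_USER=alice

# Hypothetical helper: the S3 location of a note, following the layout
# bucket/user/notebook/<note id>/note.json; falls back to the table's
# defaults "zeppelin" and "user" when the variables are unset.
s3_note_key() {
  printf '%s/%s/notebook/%s/note.json\n' \
    "${ZEPPELIN_NOTEBOOK_S3_BUCKET:-zeppelin}" \
    "${ZEPPELIN_NOTEBOOK_S3_USER:-user}" \
    "$1"
}

# Example:
#   s3_note_key 2A94M5J1Z   # my-zeppelin-bucket/alice/notebook/2A94M5J1Z/note.json
```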