Quick Start
Welcome to your first steps with Apache Zeppelin! This page helps you get started and covers installation, starting Zeppelin from the command line, and configuration.
Installation
Apache Zeppelin officially supports and is tested on the following environments.
Name | Value |
---|---|
Oracle JDK | 1.7 (set JAVA_HOME) |
OS | Mac OSX, Ubuntu 14.X, CentOS 6.X, Windows 7 Pro SP1 |
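Before installing, it can help to confirm that JAVA_HOME points at a JDK. The following is a minimal sketch and not part of Zeppelin: check_java_home is a hypothetical helper name, and it only checks that an executable exists at $JAVA_HOME/bin/java, not which Java version it is.

```shell
# Hypothetical helper (not shipped with Zeppelin): report whether a
# directory looks like a usable Java home. Uses the first argument if
# given, otherwise the JAVA_HOME environment variable.
check_java_home() {
  local java_home="${1:-${JAVA_HOME:-}}"
  if [ -z "$java_home" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ ! -x "$java_home/bin/java" ]; then
    echo "No executable found at $java_home/bin/java"
    return 1
  fi
  echo "JAVA_HOME looks usable: $java_home"
}

# Check the current environment; "|| true" keeps a failed check from
# aborting a script that runs with "set -e".
check_java_home || true
```

If the check fails, set JAVA_HOME to your JDK installation directory before starting Zeppelin.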
There are two options for installing Apache Zeppelin on your machine. One is to download a pre-built binary package from the archive. You can download not only the latest stable version but also older ones if you need them. The other option is to build from source. Although a source build can be unstable because it is under active development, it lets you explore newly added features and change them as you want.
Downloading Binary Package
If you want to install Apache Zeppelin with a stable binary package, please visit the Apache Zeppelin download page.
If you have downloaded the netinst binary, install the additional interpreters you need before you start Zeppelin, or simply run ./bin/install-interpreter.sh --all to install all of them.
After unpacking, jump to the Starting Apache Zeppelin with Command Line section.
Building from Source
If you want to build from the source, the software below needs to be installed on your system.
Name | Value |
---|---|
Git | |
Maven | 3.1.x or higher |
If you don't have them installed yet, please check the Before Build section and follow the step-by-step instructions there.
1. Clone Apache Zeppelin repository
git clone https://github.com/apache/zeppelin.git
2. Build source with options
Each interpreter requires different build options. For further information about the options, please see the Build section.
mvn clean package -DskipTests [Options]
Here are some examples with several options:
# basic build
mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark
# spark-cassandra integration
mvn clean package -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
# with CDH
mvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pvendor-repo -DskipTests
# with MapR
mvn clean package -Pspark-1.5 -Pmapr50 -DskipTests
For further information about building from source, please see README.md in the Zeppelin repository.
Starting Apache Zeppelin with Command Line
Start Zeppelin
bin/zeppelin-daemon.sh start
If you are using Windows
bin\zeppelin.cmd
After Zeppelin has started successfully, visit http://localhost:8080 in your web browser.
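When starting Zeppelin from a script, you may want to wait until the web server answers before continuing. The sketch below is one way to do that with curl. wait_for_url is a hypothetical helper, not part of Zeppelin, and it assumes that curl is installed and that the server is listening on the default port 8080.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failures, -o: discard the body
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "up after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no response from $url after $attempts attempt(s)" >&2
  return 1
}

# Example, to be run after bin/zeppelin-daemon.sh start:
#   wait_for_url http://localhost:8080
```

The function returns a non-zero status when the server never responds, so a calling script can stop early instead of continuing against a server that did not come up.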
Stop Zeppelin
bin/zeppelin-daemon.sh stop
(Optional) Start Apache Zeppelin with a service manager
Note: The description below is based on Ubuntu Linux.
Apache Zeppelin can be started automatically as a service with an init script, for example as a service managed by upstart.
The following is an example upstart script, to be saved as /etc/init/zeppelin.conf.
This also allows the service to be managed with commands such as
sudo service zeppelin start
sudo service zeppelin stop
sudo service zeppelin restart
Other service managers could use a similar approach with the upstart argument passed to the zeppelin-daemon.sh script.
bin/zeppelin-daemon.sh upstart
zeppelin.conf
description "zeppelin"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on shutdown
# Respawn the process on unexpected termination
respawn
# respawn the job up to 7 times within a 5 second period.
# If the job exceeds these values, it will be stopped and marked as failed.
respawn limit 7 5
# zeppelin was installed in /usr/share/zeppelin in this example
chdir /usr/share/zeppelin
exec bin/zeppelin-daemon.sh upstart
What is next?
Congratulations on your successful Apache Zeppelin installation! Here are some next steps you might find useful.
If you are new to Apache Zeppelin
- For an in-depth overview of the Apache Zeppelin UI, head to Explore Apache Zeppelin UI.
- After getting familiar with the Apache Zeppelin UI, have fun with a short walk-through Tutorial that uses the Apache Spark backend.
- If you need more configuration settings for Apache Zeppelin, jump to the next section: Apache Zeppelin Configuration.
If you need more information about Spark or JDBC interpreter settings
- Apache Zeppelin provides deep integration with Apache Spark. For further information, see Spark Interpreter for Apache Zeppelin.
- Also, you can use generic JDBC connections in Apache Zeppelin. Go to Generic JDBC Interpreter for Apache Zeppelin.
If you are in a multi-user environment
- You can set permissions for your notebooks and secure data resources in a multi-user environment. Go to the More -> Security section.
Apache Zeppelin Configuration
You can configure Apache Zeppelin with both environment variables in conf/zeppelin-env.sh (conf\zeppelin-env.cmd for Windows) and Java properties in conf/zeppelin-site.xml. If both are defined, then the environment variables will take priority.
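As an illustration, a conf/zeppelin-env.sh that overrides two settings could look like the sketch below. The variable names come from the configuration table on this page. The values are assumptions chosen for the example.

```shell
# conf/zeppelin-env.sh (illustrative values, adjust for your setup)

# Takes priority over zeppelin.server.port in conf/zeppelin-site.xml
# when both are defined.
export ZEPPELIN_PORT=8180

# JVM memory options for the Zeppelin server process.
export ZEPPELIN_MEM="-Xmx2048m -XX:MaxPermSize=512m"
```

With these lines in place, Zeppelin should be reachable at http://localhost:8180 after a restart, even if zeppelin-site.xml still lists port 8080.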
zeppelin-env.sh | zeppelin-site.xml | Default value | Description |
---|---|---|---|
ZEPPELIN_PORT | zeppelin.server.port | 8080 | Zeppelin server port |
ZEPPELIN_MEM | N/A | -Xmx1024m -XX:MaxPermSize=512m | JVM mem options |
ZEPPELIN_INTP_MEM | N/A | ZEPPELIN_MEM | JVM mem options for interpreter process |
ZEPPELIN_JAVA_OPTS | N/A | | JVM options |
ZEPPELIN_ALLOWED_ORIGINS | zeppelin.server.allowed.origins | * | Enables a way to specify a ',' separated list of allowed origins for REST and websockets, e.g. http://localhost:8080 |
N/A | zeppelin.anonymous.allowed | true | Anonymous user is allowed by default. |
ZEPPELIN_SERVER_CONTEXT_PATH | zeppelin.server.context.path | / | A context path of the web application |
ZEPPELIN_SSL | zeppelin.ssl | false | |
ZEPPELIN_SSL_CLIENT_AUTH | zeppelin.ssl.client.auth | false | |
ZEPPELIN_SSL_KEYSTORE_PATH | zeppelin.ssl.keystore.path | keystore | |
ZEPPELIN_SSL_KEYSTORE_TYPE | zeppelin.ssl.keystore.type | JKS | |
ZEPPELIN_SSL_KEYSTORE_PASSWORD | zeppelin.ssl.keystore.password | | |
ZEPPELIN_SSL_KEY_MANAGER_PASSWORD | zeppelin.ssl.key.manager.password | | |
ZEPPELIN_SSL_TRUSTSTORE_PATH | zeppelin.ssl.truststore.path | | |
ZEPPELIN_SSL_TRUSTSTORE_TYPE | zeppelin.ssl.truststore.type | | |
ZEPPELIN_SSL_TRUSTSTORE_PASSWORD | zeppelin.ssl.truststore.password | | |
ZEPPELIN_NOTEBOOK_HOMESCREEN | zeppelin.notebook.homescreen | | A notebook id displayed in the Apache Zeppelin homescreen, e.g. 2A94M5J1Z |
ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE | zeppelin.notebook.homescreen.hide | false | Set this value to "true" to hide the notebook id set by ZEPPELIN_NOTEBOOK_HOMESCREEN on the Apache Zeppelin homescreen. For further information, please read Customize your Zeppelin homepage. |
ZEPPELIN_WAR_TEMPDIR | zeppelin.war.tempdir | webapps | A location of jetty temporary directory |
ZEPPELIN_NOTEBOOK_DIR | zeppelin.notebook.dir | notebook | The root directory where notebook directories are saved |
ZEPPELIN_NOTEBOOK_S3_BUCKET | zeppelin.notebook.s3.bucket | zeppelin | S3 Bucket where notebook files will be saved |
ZEPPELIN_NOTEBOOK_S3_USER | zeppelin.notebook.s3.user | user | A user name of S3 bucket, e.g. bucket/user/notebook/2A94M5J1Z/note.json |
ZEPPELIN_NOTEBOOK_S3_ENDPOINT | zeppelin.notebook.s3.endpoint | s3.amazonaws.com | Endpoint for the bucket |
ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID | zeppelin.notebook.s3.kmsKeyID | | AWS KMS Key ID to use for encrypting data in S3 (optional) |
ZEPPELIN_NOTEBOOK_S3_EMP | zeppelin.notebook.s3.encryptionMaterialsProvider | | Class name of a custom S3 encryption materials provider implementation to use for encrypting data in S3 (optional) |
ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING | zeppelin.notebook.azure.connectionString | | The Azure storage account connection string, e.g. DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey> |
ZEPPELIN_NOTEBOOK_AZURE_SHARE | zeppelin.notebook.azure.share | zeppelin | Share where the notebook files will be saved |
ZEPPELIN_NOTEBOOK_AZURE_USER | zeppelin.notebook.azure.user | user | An optional user name of Azure file share, e.g. share/user/notebook/2A94M5J1Z/note.json |
ZEPPELIN_NOTEBOOK_STORAGE | zeppelin.notebook.storage | org.apache.zeppelin.notebook.repo.VFSNotebookRepo | Comma separated list of notebook storage |
ZEPPELIN_INTERPRETERS | zeppelin.interpreters | org.apache.zeppelin.spark.SparkInterpreter, org.apache.zeppelin.spark.PySparkInterpreter, org.apache.zeppelin.spark.SparkSqlInterpreter, org.apache.zeppelin.spark.DepInterpreter, org.apache.zeppelin.markdown.Markdown, org.apache.zeppelin.shell.ShellInterpreter, ... | Comma separated interpreter configurations [Class]. The first interpreter is the default, which means only the first interpreter in this list is available without the %interpreter_name annotation in a notebook paragraph. |
ZEPPELIN_INTERPRETER_DIR | zeppelin.interpreter.dir | interpreter | Interpreter directory |
ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE | zeppelin.websocket.max.text.message.size | 1024000 | Size in characters of the maximum text message to be received by websocket. |
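The SSL rows in the table above have no descriptions, so a small example may help. The following sketch shows how the SSL-related variables could be set in conf/zeppelin-env.sh. The variable names are taken from the table. The keystore path and passwords are placeholders, and it is an assumption that setting ZEPPELIN_SSL to true is what turns SSL on.

```shell
# conf/zeppelin-env.sh (sketch; the path and passwords are placeholders)

# Assumed to enable SSL for the Zeppelin server (default: false).
export ZEPPELIN_SSL=true

# Keystore location and type; JKS is the default type listed in the table.
export ZEPPELIN_SSL_KEYSTORE_PATH="/path/to/keystore.jks"
export ZEPPELIN_SSL_KEYSTORE_TYPE=JKS

# Replace these placeholders with your real keystore credentials.
export ZEPPELIN_SSL_KEYSTORE_PASSWORD="change-me"
export ZEPPELIN_SSL_KEY_MANAGER_PASSWORD="change-me"
```

Because the file contains passwords, it is worth restricting its read permissions to the user that runs Zeppelin.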