Apache Zeppelin Configuration

Zeppelin Properties

Zeppelin can be configured via several sources.

Sources, in descending order of priority:
- environment variables, which can be defined in conf/zeppelin-env.sh (conf\zeppelin-env.cmd for Windows)
- system properties
- configuration file, which can be defined in conf/zeppelin-site.xml
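
As a rough illustration of the highest-priority source, a conf/zeppelin-env.sh could export variables such as the ones below. The values are placeholders chosen for this sketch, not recommended settings.

```shell
# Sketch of conf/zeppelin-env.sh (values are illustrative assumptions).
# Environment variables rank above system properties and zeppelin-site.xml.
export ZEPPELIN_ADDR="0.0.0.0"   # bind on all interfaces instead of the default 127.0.0.1
export ZEPPELIN_PORT="8081"      # override the default port 8080
```

With these exports in place, a zeppelin.server.port value set in conf/zeppelin-site.xml would be expected to lose to ZEPPELIN_PORT.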

Hover over a property and click it to get a link to that property.

zeppelin-env.sh zeppelin-site.xml Default value Description
ZEPPELIN_ADDR
zeppelin.server.addr
127.0.0.1 Zeppelin server binding address
ZEPPELIN_PORT
zeppelin.server.port
8080 Zeppelin server port
Note: Please make sure you're not using the same port as the Zeppelin web application development port (default: 9000).
ZEPPELIN_SSL_PORT
zeppelin.server.ssl.port
8443 Zeppelin Server ssl port (used when ssl environment/property is set to true)
ZEPPELIN_JMX_ENABLE
zeppelin.jmx.enable
false Set to "true" to enable JMX
ZEPPELIN_JMX_PORT
zeppelin.jmx.port
9996 Port number which JMX uses
ZEPPELIN_MEM
N/A -Xmx1024m -XX:MaxMetaspaceSize=512m JVM mem options
ZEPPELIN_INTP_MEM
N/A ZEPPELIN_MEM JVM mem options for interpreter process
ZEPPELIN_JAVA_OPTS
N/A JVM options
ZEPPELIN_ALLOWED_ORIGINS
zeppelin.server.allowed.origins
* Comma-separated list of allowed origins for REST and WebSocket requests.
e.g. http://localhost:8080
ZEPPELIN_CREDENTIALS_PERSIST
zeppelin.credentials.persist
true Persist credentials on a JSON file (credentials.json)
ZEPPELIN_CREDENTIALS_ENCRYPT_KEY
zeppelin.credentials.encryptKey
If provided, encrypt passwords in the credentials.json file (passwords will be stored as plain text otherwise)
ZEPPELIN_SERVER_CONTEXT_PATH
zeppelin.server.context.path
/ Context path of the web application
ZEPPELIN_NOTEBOOK_COLLABORATIVE_MODE_ENABLE
zeppelin.notebook.collaborative.mode.enable
true Enable basic collaborative editing. Has no effect on behavior when a note is used by only one person.
ZEPPELIN_SSL
zeppelin.ssl
false
ZEPPELIN_SSL_CLIENT_AUTH
zeppelin.ssl.client.auth
false
ZEPPELIN_SSL_KEYSTORE_PATH
zeppelin.ssl.keystore.path
keystore
ZEPPELIN_SSL_KEYSTORE_TYPE
zeppelin.ssl.keystore.type
JKS
ZEPPELIN_SSL_KEYSTORE_PASSWORD
zeppelin.ssl.keystore.password
ZEPPELIN_SSL_KEY_MANAGER_PASSWORD
zeppelin.ssl.key.manager.password
ZEPPELIN_SSL_TRUSTSTORE_PATH
zeppelin.ssl.truststore.path
ZEPPELIN_SSL_TRUSTSTORE_TYPE
zeppelin.ssl.truststore.type
ZEPPELIN_SSL_TRUSTSTORE_PASSWORD
zeppelin.ssl.truststore.password
ZEPPELIN_SSL_PEM_KEY
zeppelin.ssl.pem.key
This directive points to the PEM-encoded private key file for the server.
ZEPPELIN_SSL_PEM_KEY_PASSWORD
zeppelin.ssl.pem.key.password
Password of the PEM-encoded private key.
ZEPPELIN_SSL_PEM_CERT
zeppelin.ssl.pem.cert
This directive points to a file with certificate data in PEM format.
ZEPPELIN_SSL_PEM_CA
zeppelin.ssl.pem.ca
This directive sets the all-in-one file where you can assemble the Certificates of Certification Authorities (CA) whose clients you deal with. These are used for Client Authentication. Such a file is simply the concatenation of the various PEM-encoded Certificate files.
ZEPPELIN_NOTEBOOK_HOMESCREEN
zeppelin.notebook.homescreen
Display note IDs on the Apache Zeppelin homescreen
e.g. 2A94M5J1Z
ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE
zeppelin.notebook.homescreen.hide
false Hide the note ID set by ZEPPELIN_NOTEBOOK_HOMESCREEN on the Apache Zeppelin homescreen.
For further information, please read Customize your Zeppelin homepage.
ZEPPELIN_WAR_TEMPDIR
zeppelin.war.tempdir
webapps Location of the jetty temporary directory
ZEPPELIN_NOTEBOOK_DIR
zeppelin.notebook.dir
notebook The root directory where notebook directories are saved
ZEPPELIN_NOTEBOOK_S3_BUCKET
zeppelin.notebook.s3.bucket
zeppelin S3 Bucket where notebook files will be saved
ZEPPELIN_NOTEBOOK_S3_USER
zeppelin.notebook.s3.user
user User name of an S3 bucket
e.g. bucket/user/notebook/2A94M5J1Z/note.json
ZEPPELIN_NOTEBOOK_S3_ENDPOINT
zeppelin.notebook.s3.endpoint
s3.amazonaws.com Endpoint for the bucket
N/A
zeppelin.notebook.s3.timeout
120000 Bucket endpoint request timeout in msec
ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID
zeppelin.notebook.s3.kmsKeyID
AWS KMS Key ID to use for encrypting data in S3 (optional)
ZEPPELIN_NOTEBOOK_S3_EMP
zeppelin.notebook.s3.encryptionMaterialsProvider
Class name of a custom S3 encryption materials provider implementation to use for encrypting data in S3 (optional)
ZEPPELIN_NOTEBOOK_S3_SSE
zeppelin.notebook.s3.sse
false Save notebooks to S3 with server-side encryption enabled
ZEPPELIN_NOTEBOOK_S3_CANNED_ACL
zeppelin.notebook.s3.cannedAcl
Save notebooks to S3 with the given [Canned ACL](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/CannedAccessControlList.html) which determines the S3 permissions.
ZEPPELIN_NOTEBOOK_S3_PATH_STYLE_ACCESS
zeppelin.notebook.s3.pathStyleAccess
false Access S3 bucket using path style
ZEPPELIN_NOTEBOOK_S3_SIGNEROVERRIDE
zeppelin.notebook.s3.signerOverride
Optional override to control which signature algorithm should be used to sign AWS requests
ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING
zeppelin.notebook.azure.connectionString
The Azure storage account connection string
e.g.
DefaultEndpointsProtocol=https;
AccountName=<accountName>;
AccountKey=<accountKey>
ZEPPELIN_NOTEBOOK_AZURE_SHARE
zeppelin.notebook.azure.share
zeppelin Azure Share where the notebook files will be saved
ZEPPELIN_NOTEBOOK_AZURE_USER
zeppelin.notebook.azure.user
user Optional user name of an Azure file share
e.g. share/user/notebook/2A94M5J1Z/note.json
ZEPPELIN_NOTEBOOK_STORAGE
zeppelin.notebook.storage
org.apache.zeppelin.notebook.repo.GitNotebookRepo Comma separated list of notebook storage locations
ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC
zeppelin.notebook.one.way.sync
false If there are multiple notebook storage locations, should we treat the first one as the only source of truth?
ZEPPELIN_NOTEBOOK_PUBLIC
zeppelin.notebook.public
true Make notebooks public (only owners are set) by default when created or imported. If set to false, the user is also added to readers and writers, making the notebook private and invisible to other users unless permissions are granted.
ZEPPELIN_INTERPRETER_DIR
zeppelin.interpreter.dir
interpreter Interpreter directory
ZEPPELIN_INTERPRETER_DEP_MVNREPO
zeppelin.interpreter.dep.mvnRepo
https://repo1.maven.org/maven2/,https://repo2.maven.org/maven2/ Remote principal repository for interpreter's additional dependency loading
ZEPPELIN_INTERPRETER_OUTPUT_LIMIT
zeppelin.interpreter.output.limit
102400 Output message from interpreter exceeding the limit will be truncated
ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT
zeppelin.interpreter.connect.timeout
600s Interpreter process connect timeout. If no time unit is given, the value is interpreted in msec
ZEPPELIN_DEP_LOCALREPO
zeppelin.dep.localrepo
local-repo Local repository for the dependency loader.
e.g. visualization modules from npm.
ZEPPELIN_HELIUM_NODE_INSTALLER_URL
zeppelin.helium.node.installer.url
https://nodejs.org/dist/ Remote Node installer url for Helium dependency loader
ZEPPELIN_HELIUM_NPM_INSTALLER_URL
zeppelin.helium.npm.installer.url
http://registry.npmjs.org/ Remote Npm installer url for Helium dependency loader
ZEPPELIN_HELIUM_YARNPKG_INSTALLER_URL
zeppelin.helium.yarnpkg.installer.url
https://github.com/yarnpkg/yarn/releases/download/ Remote Yarn package installer url for Helium dependency loader
ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE
zeppelin.websocket.max.text.message.size
1024000 Size (in characters) of the maximum text message that can be received by websocket.
ZEPPELIN_SERVER_DEFAULT_DIR_ALLOWED
zeppelin.server.default.dir.allowed
false Enable directory listings on server.
ZEPPELIN_NOTEBOOK_GIT_REMOTE_URL
zeppelin.notebook.git.remote.url
GitHub's repository URL. It could be either the HTTP URL or the SSH URL. For example git@github.com:apache/zeppelin.git
ZEPPELIN_NOTEBOOK_GIT_REMOTE_USERNAME
zeppelin.notebook.git.remote.username
token GitHub username. By default it is `token` to use GitHub's API
ZEPPELIN_NOTEBOOK_GIT_REMOTE_ACCESS_TOKEN
zeppelin.notebook.git.remote.access-token
token GitHub access token to use GitHub's API. If username/password combination is used and not GitHub API, then this value is the password
ZEPPELIN_NOTEBOOK_GIT_REMOTE_ORIGIN
zeppelin.notebook.git.remote.origin
origin GitHub remote name. Default is `origin`
ZEPPELIN_RUN_MODE
zeppelin.run.mode
auto Run mode. 'auto|local|k8s'. 'auto' autodetect environment. 'local' runs interpreter as a local process. k8s runs interpreter on Kubernetes cluster
ZEPPELIN_K8S_PORTFORWARD
zeppelin.k8s.portforward
false Port forward to the interpreter RPC port. Set to 'true' only for local development when zeppelin.k8s.mode is 'on'. Do not use 'true' in a production environment
ZEPPELIN_K8S_CONTAINER_IMAGE
zeppelin.k8s.container.image
apache/zeppelin:0.11.0 Docker image for interpreters
ZEPPELIN_K8S_SPARK_CONTAINER_IMAGE
zeppelin.k8s.spark.container.image
apache/spark:latest Docker image for Spark executors
ZEPPELIN_K8S_TEMPLATE_DIR
zeppelin.k8s.template.dir
k8s Kubernetes yaml spec files
ZEPPELIN_K8S_SERVICE_NAME
zeppelin.k8s.service.name
zeppelin-server Name of the Zeppelin server service resources
ZEPPELIN_K8S_TIMEOUT_DURING_PENDING
zeppelin.k8s.timeout.during.pending
true Value to enable/disable timeout handling when starting Interpreter Pods. Caution: This can lead to an infinite loop
ZEPPELIN_METRIC_ENABLE_PROMETHEUS
zeppelin.metric.enable.prometheus
false Value to enable/disable Prometheus metric endpoint on /metric
ZEPPELIN_NOTEBOOK_CRON_ENABLE
zeppelin.notebook.cron.enable
false Value to enable/disable Cron support in Notes
ZEPPELIN_NOTEBOOK_CRON_FOLDERS
zeppelin.notebook.cron.folders
Comma-separated list of folders where cron is allowed
ZEPPELIN_NOTE_CACHE_THRESHOLD
zeppelin.note.cache.threshold
50 Threshold for the number of notes in the cache before an eviction occurs.
ZEPPELIN_NOTEBOOK_VERSIONED_MODE_ENABLE
zeppelin.notebook.versioned.mode.enable
true Value to enable/disable version control support in Notes.
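
The S3 entries in the table above describe a bucket/user/notebook/<note ID>/note.json layout. A minimal sketch of how the default values combine, using the example note ID from the table; NOTE_ID and NOTE_PATH are names introduced only for this illustration and are not read by Zeppelin:

```shell
# Defaults taken from the properties table.
export ZEPPELIN_NOTEBOOK_S3_BUCKET="zeppelin"
export ZEPPELIN_NOTEBOOK_S3_USER="user"

# NOTE_ID and NOTE_PATH are hypothetical helper variables for this sketch.
NOTE_ID="2A94M5J1Z"
NOTE_PATH="${ZEPPELIN_NOTEBOOK_S3_BUCKET}/${ZEPPELIN_NOTEBOOK_S3_USER}/notebook/${NOTE_ID}/note.json"
echo "$NOTE_PATH"
```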

SSL Configuration

Enabling SSL requires a few configuration changes. First, you need to create certificates and then update necessary configurations to enable server side SSL and/or client side certificate authentication.

Creating and configuring the Certificates

Information about how to generate certificates and a keystore can be found here.

A condensed example can be found in the top answer to this StackOverflow post.

The keystore holds the private key and certificate on the server end. The truststore holds the trusted client certificates. Be sure that the path and password for these two stores are correctly configured in the password fields below. The passwords can be obfuscated using the Jetty password tool. After Maven pulls in all the dependencies to build Zeppelin, one of the Jetty jars contains the Password tool. Invoke this command from the Zeppelin home build directory with the appropriate version, user, and password.

java -cp ./zeppelin-server/target/lib/jetty-all-server-<version>.jar \
org.eclipse.jetty.util.security.Password <user> <password>

If you are using a self-signed certificate, a certificate signed by an untrusted CA, or if client authentication is enabled, then the client must have a browser create exceptions for both the normal HTTPS port and the WebSocket port. This can be done by trying to establish an HTTPS connection to both ports in a browser (e.g. if the ports are 443 and 8443, then visit https://127.0.0.1:443 and https://127.0.0.1:8443). This step can be skipped if the server certificate is signed by a trusted CA and client auth is disabled.

Configuring server side SSL

The following properties need to be updated in zeppelin-site.xml in order to enable server side SSL.

<property>
  <name>zeppelin.server.ssl.port</name>
  <value>8443</value>
  <description>Server ssl port. (used when ssl property is set to true)</description>
</property>

<property>
  <name>zeppelin.ssl</name>
  <value>true</value>
  <description>Should SSL be used by the servers?</description>
</property>

<property>
  <name>zeppelin.ssl.keystore.path</name>
  <value>keystore</value>
  <description>Path to keystore relative to Zeppelin configuration directory</description>
</property>

<property>
  <name>zeppelin.ssl.keystore.type</name>
  <value>JKS</value>
  <description>The format of the given keystore (e.g. JKS or PKCS12)</description>
</property>

<property>
  <name>zeppelin.ssl.keystore.password</name>
  <value>change me</value>
  <description>Keystore password. Can be obfuscated by the Jetty Password tool</description>
</property>

<property>
  <name>zeppelin.ssl.key.manager.password</name>
  <value>change me</value>
  <description>Key Manager password. Defaults to keystore password. Can be obfuscated.</description>
</property>
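
The same settings can presumably be supplied through conf/zeppelin-env.sh instead, using the environment variable names listed in the properties table. A minimal sketch with placeholder passwords:

```shell
# Sketch: zeppelin-env.sh counterparts of the zeppelin-site.xml properties above.
export ZEPPELIN_SSL="true"
export ZEPPELIN_SSL_PORT="8443"
export ZEPPELIN_SSL_KEYSTORE_PATH="keystore"
export ZEPPELIN_SSL_KEYSTORE_TYPE="JKS"
# Placeholders: replace with your real (optionally Jetty-obfuscated) passwords.
export ZEPPELIN_SSL_KEYSTORE_PASSWORD="change me"
export ZEPPELIN_SSL_KEY_MANAGER_PASSWORD="change me"
```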

Enabling client side certificate authentication

The following properties need to be updated in zeppelin-site.xml in order to enable client side certificate authentication.

<property>
  <name>zeppelin.server.ssl.port</name>
  <value>8443</value>
  <description>Server ssl port. (used when ssl property is set to true)</description>
</property>

<property>
  <name>zeppelin.ssl.client.auth</name>
  <value>true</value>
  <description>Should client authentication be used for SSL connections?</description>
</property>

<property>
  <name>zeppelin.ssl.truststore.path</name>
  <value>truststore</value>
  <description>Path to truststore relative to Zeppelin configuration directory. Defaults to the keystore path</description>
</property>

<property>
  <name>zeppelin.ssl.truststore.type</name>
  <value>JKS</value>
  <description>The format of the given truststore (e.g. JKS or PKCS12). Defaults to the same type as the keystore type</description>
</property>

<property>
  <name>zeppelin.ssl.truststore.password</name>
  <value>change me</value>
  <description>Truststore password. Can be obfuscated by the Jetty Password tool. Defaults to the keystore password</description>
</property>
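
The property descriptions above state that unset truststore settings default to the corresponding keystore settings. That fallback can be sketched with shell parameter expansion. This is an illustration of the documented behavior, not Zeppelin's own code, and the EFFECTIVE_* names are hypothetical:

```shell
# Client authentication enabled, keystore configured, truststore left unset.
export ZEPPELIN_SSL_CLIENT_AUTH="true"
export ZEPPELIN_SSL_KEYSTORE_PATH="keystore"
export ZEPPELIN_SSL_KEYSTORE_TYPE="JKS"
unset ZEPPELIN_SSL_TRUSTSTORE_PATH ZEPPELIN_SSL_TRUSTSTORE_TYPE

# EFFECTIVE_* are hypothetical names: use the truststore value if set, else the keystore value.
EFFECTIVE_TRUSTSTORE_PATH="${ZEPPELIN_SSL_TRUSTSTORE_PATH:-$ZEPPELIN_SSL_KEYSTORE_PATH}"
EFFECTIVE_TRUSTSTORE_TYPE="${ZEPPELIN_SSL_TRUSTSTORE_TYPE:-$ZEPPELIN_SSL_KEYSTORE_TYPE}"
echo "truststore: ${EFFECTIVE_TRUSTSTORE_PATH} (${EFFECTIVE_TRUSTSTORE_TYPE})"
```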

Storing user credentials

In order to avoid having to re-enter credentials every time you restart/redeploy Zeppelin, you can store the user credentials. Zeppelin supports this via the ZEPPELIN_CREDENTIALS_PERSIST configuration.

Please note that passwords will be stored in plain text by default. To encrypt the passwords, use the ZEPPELIN_CREDENTIALS_ENCRYPT_KEY config variable. This will encrypt passwords using the AES-128 algorithm.

You can generate an appropriate encryption key any way you'd like - for instance, by using the openssl tool:

openssl enc -aes-128-cbc -k secret -P -md sha1

Important: storing your encryption key in a configuration file is not advised. Depending on your environment security needs, you may want to consider utilizing a credentials server, storing the ZEPPELIN_CREDENTIALS_ENCRYPT_KEY as an OS env variable, or any other approach that would not colocate the encryption key and the encrypted content (the credentials.json file).
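
One way to follow the OS environment variable approach is to have the process environment supply the key and to check for it before startup. The sketch below assumes this setup; check_encrypt_key is a hypothetical helper and not part of Zeppelin:

```shell
# Sketch: detect a missing encryption key before Zeppelin is started.
# check_encrypt_key is a hypothetical helper, not part of Zeppelin.
check_encrypt_key() {
  if [ -z "${ZEPPELIN_CREDENTIALS_ENCRYPT_KEY:-}" ]; then
    echo "ZEPPELIN_CREDENTIALS_ENCRYPT_KEY is not set; passwords would be stored as plain text" >&2
    return 1
  fi
  return 0
}

# Persisting credentials is the documented default; set here only to be explicit.
export ZEPPELIN_CREDENTIALS_PERSIST="true"
```

A start script could call `check_encrypt_key || exit 1` so that the server never starts without a key.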

Obfuscating Passwords using the Jetty Password Tool

Security best practices advise against using plain text passwords. Jetty provides a password tool to help obfuscate the passwords used to access the KeyStore and TrustStore.

The Password tool documentation can be found here.

After using the tool:

java -cp $ZEPPELIN_HOME/zeppelin-server/target/lib/jetty-util-9.2.15.v20160210.jar \
         org.eclipse.jetty.util.security.Password  \
         password

2016-12-15 10:46:47.931:INFO::main: Logging initialized @101ms
password
OBF:1v2j1uum1xtv1zej1zer1xtn1uvk1v1v
MD5:5f4dcc3b5aa765d61d8327deb882cf99

update your configuration with the obfuscated password:

<property>
  <name>zeppelin.ssl.keystore.password</name>
  <value>OBF:1v2j1uum1xtv1zej1zer1xtn1uvk1v1v</value>
  <description>Keystore password. Can be obfuscated by the Jetty Password tool</description>
</property>

Create GitHub Access Token

When using GitHub to track notebooks, one can use GitHub's API for authentication. To create an access token, please use the following link: https://github.com/settings/tokens. Set the value of the generated access token in the zeppelin.notebook.git.remote.access-token property.
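
For reference, the Git remote settings from the properties table could also be set together in conf/zeppelin-env.sh. This is a sketch only: the repository URL and the token value are placeholders to be replaced with your own.

```shell
# Sketch: Git remote settings as environment variables (placeholder values).
export ZEPPELIN_NOTEBOOK_GIT_REMOTE_URL="https://github.com/<user>/<repo>.git"
export ZEPPELIN_NOTEBOOK_GIT_REMOTE_USERNAME="token"          # documented default
export ZEPPELIN_NOTEBOOK_GIT_REMOTE_ACCESS_TOKEN="<your-access-token>"
export ZEPPELIN_NOTEBOOK_GIT_REMOTE_ORIGIN="origin"           # remote name
```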

Note: After updating these configurations, the Zeppelin server needs to be restarted.