Generic JDBC Interpreter for Apache Zeppelin
Overview
This interpreter lets you create a JDBC connection to any data source, by now it has been tested with:
- Postgres
- MySql
- MariaDB
- Redshift
- Apache Hive
- Apache Phoenix
- Apache Drill (Details on using Drill JDBC Driver)
- Apache Tajo
If someone else used another database please report how it works to improve functionality.
Create Interpreter
When you create a interpreter by default use PostgreSQL with the next properties:
name | value |
---|---|
common.max_count | 1000 |
default.driver | org.postgresql.Driver |
default.password | ******** |
default.url | jdbc:postgresql://localhost:5432/ |
default.user | gpadmin |
It is not necessary to add driver jar to the classpath for PostgreSQL as it is included in Zeppelin.
Simple connection
Prior to creating the interpreter it is necessary to add maven coordinate or path of the JDBC driver to the Zeppelin classpath. To do this you must edit dependencies artifact(ex. mysql:mysql-connector-java:5.1.38
) in interpreter menu as shown:
To create the interpreter you need to specify connection parameters as shown in the table.
name | value |
---|---|
common.max_count | 1000 |
default.driver | driver name |
default.password | ******** |
default.url | jdbc url |
default.user | user name |
Multiple connections
JDBC interpreter also allows connections to multiple data sources. It is necessary to set a prefix for each connection to reference it in the paragraph in the form of %jdbc(prefix)
. Before you create the interpreter it is necessary to add each driver's maven coordinates or JDBC driver's jar file path to the Zeppelin classpath. To do this you must edit the dependencies of JDBC interpreter in interpreter menu as following:
You can add all the jars you need to make multiple connections into the same JDBC interpreter. To create the interpreter you must specify the parameters. For example we will create two connections to MySQL and Redshift, the respective prefixes are default
and redshift
:
name | value |
---|---|
common.max_count | 1000 |
default.driver | com.mysql.jdbc.Driver |
default.password | ******** |
default.url | jdbc:mysql://localhost:3306/ |
default.user | mysql-user |
redshift.driver | com.amazon.redshift.jdbc4.Driver |
redshift.password | ******** |
redshift.url | jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439 |
redshift.user | redshift-user |
Bind to Notebook
In the Notebook
click on the settings
icon at the top-right corner. Use select/deselect to specify the interpreters to be used in the Notebook
.
More Properties
You can modify the interpreter configuration in the Interpreter
section. The most common properties are as follows, but you can specify other properties that need to be connected.
Property Name | Description |
---|---|
{prefix}.url | JDBC URL to connect, the URL must include the name of the database |
{prefix}.user | JDBC user name |
{prefix}.password | JDBC password |
{prefix}.driver | JDBC driver name. |
common.max_result | Max number of SQL result to display to prevent the browser overload. This is common properties for all connections |
To develop this functionality use this method. For example if a connection needs a schema parameter, it would have to add the property as follows:
name | value |
---|---|
{prefix}.schema | schema_name |
Examples
Hive
Properties
Name | Value |
---|---|
hive.driver | org.apache.hive.jdbc.HiveDriver |
hive.url | jdbc:hive2://localhost:10000 |
hive.user | hiveuser |
hive.password | hivepassword |
Dependencies
Artifact | Excludes |
---|---|
org.apache.hive:hive-jdbc:0.14.0 | |
org.apache.hadoop:hadoop-common:2.6.0 |
Phoenix
Phoenix supports thick
and thin
connection types:
- Thick client is faster, but must connect directly to ZooKeeper and HBase RegionServers.
- Thin client has fewer dependencies and connects through a Phoenix Query Server instance.
Use the appropriate phoenix.driver
and phoenix.url
for your connection type.
Properties:
Name | Value | Description |
---|---|---|
phoenix.driver | org.apache.phoenix.jdbc.PhoenixDriver | 'Thick Client', connects directly to Phoenix |
phoenix.driver | org.apache.phoenix.queryserver.client.Driver | 'Thin Client', connects via Phoenix Query Server |
phoenix.url | jdbc:phoenix:localhost:2181:/hbase-unsecure | 'Thick Client', connects directly to Phoenix |
phoenix.url | jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF | 'Thin Client', connects via Phoenix Query Server |
phoenix.user | phoenixuser | |
phoenix.password | phoenixpassword |
Dependencies:
Include the dependency for your connection type (it should be only one of the following).
Artifact | Excludes | Description |
---|---|---|
org.apache.phoenix:phoenix-core:4.4.0-HBase-1.0 | 'Thick Client', connects directly to Phoenix | |
org.apache.phoenix:phoenix-server-client:4.7.0-HBase-1.1 | 'Thin Client' for Phoenix 4.7, connects via Phoenix Query Server | |
org.apache.phoenix:phoenix-queryserver-client:4.8.0-HBase-1.2 | 'Thin Client' for Phoenix 4.8+, connects via Phoenix Query Server |
Tajo
Properties
Name | Value |
---|---|
tajo.driver | org.apache.tajo.jdbc.TajoDriver |
tajo.url | jdbc:tajo://localhost:26002/default |
Dependencies
Artifact | Excludes |
---|---|
org.apache.tajo:tajo-jdbc:0.11.0 |
How to use
Reference in paragraph
Start the paragraphs with the %jdbc
, this will use the default
prefix for connection. If you want to use other connection you should specify the prefix of it as follows %jdbc(prefix)
:
%jdbc
SELECT * FROM db_name;
or
%jdbc(prefix)
SELECT * FROM db_name;
Apply Zeppelin Dynamic Forms
You can leverage Zeppelin Dynamic Form inside your queries. You can use both the text input
and select form
parametrization features
%jdbc(prefix)
SELECT name, country, performer
FROM demo.performers
WHERE name=''
Bugs & Reporting
If you find a bug for this interpreter, please create a JIRA ticket.