Elasticsearch Interpreter for Apache Zeppelin

1. Configuration


Property                    Default        Description
elasticsearch.cluster.name  elasticsearch  Cluster name
elasticsearch.host          localhost      Host of a node in the cluster
elasticsearch.port          9300           Connection port (important: this is not the HTTP port, but the transport port)
elasticsearch.result.size   10             The size of the result set of a search query

Interpreter configuration

Note #1: you can add more properties to configure the Elasticsearch client.

Note #2: if you use Shield, you can add a property named shield.user whose value contains the user name and the password (format: username:password). For more details about Shield configuration, consult the Shield reference guide. Do not forget to copy the Shield client jar into the interpreter directory (ZEPPELIN_HOME/interpreters/elasticsearch).


2. Enabling the Elasticsearch Interpreter

In a notebook, to enable the Elasticsearch interpreter, click the Gear icon and select Elasticsearch.


3. Using the Elasticsearch Interpreter

In a paragraph, use %elasticsearch to select the Elasticsearch interpreter and then input all commands. To get the list of available commands, use help.

| %elasticsearch
| help
Elasticsearch interpreter:
General format: <command> /<indices>/<types>/<id> <option> <JSON>
  - indices: list of indices separated by commas (depends on the command)
  - types: list of document types separated by commas (depends on the command)
Commands:
  - search /indices/types <query>
    . indices and types can be omitted (at least, you have to provide '/')
    . a query is either a JSON-formatted query or a Lucene query
  - size <value>
    . defines the size of the result set (default value is in the config)
    . if used, this command must be declared before a search command
  - count /indices/types <query>
    . same comments as for the search
  - get /index/type/id
  - delete /index/type/id
  - index /index/type/id <json-formatted document>
    . the id can be omitted, elasticsearch will generate one

Tip: use (CTRL + .) for completion
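
To make the general format concrete, here is a minimal Python sketch of how a command line could be split into its parts. It is an illustration only, not the interpreter's actual parser; the function name parse_command and the keys of the returned dictionary are hypothetical.

```python
def parse_command(line):
    """Split '<command> /<indices>/<types>/<id> <rest>' into its parts.

    Hypothetical helper for illustration; not part of Zeppelin.
    """
    command, _, remainder = line.strip().partition(" ")
    remainder = remainder.strip()

    # Only a token starting with '/' is treated as the path;
    # 'size 50', for example, has no path at all.
    path, rest = "", remainder
    if remainder.startswith("/"):
        path, _, rest = remainder.partition(" ")
        rest = rest.strip()

    trimmed = path.strip("/")
    segments = trimmed.split("/") if trimmed else []

    def comma_list(position):
        # Indices and types are comma-separated lists and may be omitted.
        if len(segments) > position and segments[position]:
            return segments[position].split(",")
        return []

    return {
        "command": command,
        "indices": comma_list(0),
        "types": comma_list(1),
        "id": segments[2] if len(segments) > 2 else None,
        "body": rest,
    }


print(parse_command("get /index/type/id"))
print(parse_command("search /logs request.method:GET AND status:200"))
```

In this sketch, everything after the path is kept as one string, so the body can be either a JSON document or a query string.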

get

With the get command, you can find a document by id. The result is a JSON document.

| %elasticsearch
| get /index/type/id

Example: Elasticsearch - Get

search

With the search command, you can send a search query to Elasticsearch. There are two formats of query:

  • You can provide a JSON-formatted query, which is exactly what you provide when you use the REST API of Elasticsearch.
  • You can also provide the content of a query_string.
    • This is a shortcut for a query like this: { "query": { "query_string": { "query": "__HERE YOUR QUERY__", "analyze_wildcard": true } } }
    • See Elasticsearch query string syntax for more details about the content of such a query.

| %elasticsearch
| search /index1,index2,.../type1,type2,...  <JSON document containing the query or query_string elements>
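
The two query formats can be sketched in Python as follows. This is a minimal illustration under one assumption: text that parses as a JSON object is used as-is, and anything else is wrapped in the query_string shortcut shown above. How the interpreter actually tells the two apart is not stated here, and the name build_search_body is hypothetical.

```python
import json


def build_search_body(query):
    """Return the search body for either query format (illustrative only)."""
    try:
        parsed = json.loads(query)
    except ValueError:
        parsed = None

    # Assumption: only a JSON object counts as a JSON-formatted query.
    if isinstance(parsed, dict):
        return parsed

    # Otherwise treat the text as the content of a query_string.
    return {
        "query": {
            "query_string": {
                "query": query.strip(),
                "analyze_wildcard": True,
            }
        }
    }


print(json.dumps(build_search_body("request.method:GET AND status:200")))
print(json.dumps(build_search_body('{ "query": { "match_all": {} } }')))
```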

If you want to modify the size of the result set, add a line that sets the size before your search command.

| %elasticsearch
| size 50
| search /index1,index2,.../type1,type2,...  <JSON document containing the query or query_string elements>

A search query can also contain aggregations. If there is at least one aggregation, the result of the first aggregation is shown; otherwise, you get the search hits.
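
The rule above can be sketched in Python. The sketch assumes a response shaped like the JSON returned by the Elasticsearch REST search API (a "hits" object and an optional "aggregations" object); the interpreter's internal representation may differ, and select_display is a hypothetical name.

```python
def select_display(response):
    """Pick what to show: the first aggregation if any, else the hits."""
    aggregations = response.get("aggregations") or {}
    if aggregations:
        # "First" here means the first key in the response object.
        name = next(iter(aggregations))
        return ("aggregation", name, aggregations[name])

    hits = response.get("hits", {}).get("hits", [])
    return ("hits", None, [hit.get("_source", {}) for hit in hits])


example = {
    "hits": {"hits": [{"_source": {"status": "200"}}]},
    "aggregations": {"content_length_stats": {"count": 2, "avg": 10.0}},
}
kind, name, data = select_display(example)
print(kind, name, data)
```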

Examples:

  • With a JSON query:

    | %elasticsearch
    | search / { "query": { "match_all": {} } }
    |
    | %elasticsearch
    | search /logs { "query": { "query_string": { "query": "request.method:GET AND status:200" } } }
    |
    | %elasticsearch
    | search /logs { "aggs": {
    |   "content_length_stats": {
    |     "extended_stats": {
    |       "field": "content_length"
    |     }
    |   }
    | } } 
    
  • With query_string elements:

    | %elasticsearch
    | search /logs request.method:GET AND status:200
    |
    | %elasticsearch
    | search /logs (404 AND (POST OR DELETE))
    

Important: a document in Elasticsearch is a JSON document, so it is hierarchical, not flat like a row in a SQL table. The Elasticsearch interpreter flattens the result of a search query.

Suppose we have a JSON document:

{
  "date": "2015-12-08T21:03:13.588Z",
  "request": {
    "method": "GET",
    "url": "/zeppelin/4cd001cd-c517-4fa9-b8e5-a06b8f4056c4",
    "headers": [ "Accept: *.*", "Host: apache.org"]
  },
  "status": "403",
  "content_length": 1234
}

The data will be flattened like this:

content_length  date                      request.headers[0]  request.headers[1]  request.method  request.url                                     status
1234            2015-12-08T21:03:13.588Z  Accept: *.*         Host: apache.org    GET             /zeppelin/4cd001cd-c517-4fa9-b8e5-a06b8f4056c4  403
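
The flattening can be reproduced with a short Python sketch. It is not the interpreter's code; it only follows the naming visible in the table above (dots for nested fields, [n] for array elements). The function name flatten is hypothetical, and sorting the columns is an assumption made to match the column order shown.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single-level dict."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, name))
    elif isinstance(value, list):
        for position, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{position}]"))
    else:
        # A scalar becomes one column named after its full path.
        flat[prefix] = value
    return flat


doc = {
    "date": "2015-12-08T21:03:13.588Z",
    "request": {
        "method": "GET",
        "url": "/zeppelin/4cd001cd-c517-4fa9-b8e5-a06b8f4056c4",
        "headers": ["Accept: *.*", "Host: apache.org"],
    },
    "status": "403",
    "content_length": 1234,
}

row = flatten(doc)
for column in sorted(row):
    print(column, "=", row[column])
```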

Examples:

  • With a table containing the results: Elasticsearch - Search - table

  • You can also use a predefined diagram: Elasticsearch - Search - diagram

  • With a JSON query: Elasticsearch - Search with query

  • With a query string: Elasticsearch - Search with query string

  • With a query containing a multi-value metric aggregation: Elasticsearch - Search with aggregation (multi-value metric)

  • With a query containing a multi-bucket aggregation: Elasticsearch - Search with aggregation (multi-bucket)

count

With the count command, you can count documents available in some indices and types. You can also provide a query.

| %elasticsearch
| count /index1,index2,.../type1,type2,... <JSON document containing the query OR a query string>

Examples:

  • Without query: Elasticsearch - Count

  • With a query: Elasticsearch - Count with query

index

With the index command, you can insert or update a document in Elasticsearch. If you omit the id, Elasticsearch generates one.

| %elasticsearch
| index /index/type/id <JSON document>
|
| %elasticsearch
| index /index/type <JSON document>

delete

With the delete command, you can delete a document.

| %elasticsearch
| delete /index/type/id

Apply Zeppelin Dynamic Forms

You can use Zeppelin Dynamic Forms inside your queries. Both the text input and the select form parameterization features are supported.

| %elasticsearch
| size ${limit=10}
| search /index/type { "query": { "match_all": {} } }
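
To illustrate what the text input form does to the paragraph above, here is a minimal Python sketch of the substitution. Zeppelin performs this step itself before the interpreter sees the text; the sketch handles only the ${name=default} text input form, and apply_forms is a hypothetical name.

```python
import re

# Matches a text input form such as ${limit=10}.
FORM_PATTERN = re.compile(r"\$\{(\w+)=([^}]*)\}")


def apply_forms(text, values):
    """Replace each ${name=default} with the entered value or its default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        return str(values.get(name, default))

    return FORM_PATTERN.sub(substitute, text)


paragraph = 'size ${limit=10}\nsearch /index/type { "query": { "match_all": {} } }'
print(apply_forms(paragraph, {}))             # uses the default, 10
print(apply_forms(paragraph, {"limit": 50}))  # uses the entered value, 50
```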