HDFS File System Interpreter for Apache Zeppelin
Overview
The Hadoop Distributed File System (HDFS) is a distributed, fault-tolerant file system that is part of the Apache Hadoop project. It is commonly used as storage for distributed processing engines such as Hadoop MapReduce and Apache Spark, or as the underlying store for file systems like Alluxio.
Configuration
Property | Default | Description |
---|---|---|
hdfs.url | http://localhost:50070/webhdfs/v1/ | The URL for WebHDFS |
hdfs.user | hdfs | The WebHDFS user |
hdfs.maxlength | 1000 | Maximum number of lines of results fetched |
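As a sketch, pointing the interpreter at a remote namenode could look like the following property values; the host name is a placeholder, and the WebHDFS HTTP port is typically 50070 on Hadoop 2.x and 9870 on Hadoop 3.x:

```
hdfs.url = http://namenode.example.com:50070/webhdfs/v1/
hdfs.user = hdfs
hdfs.maxlength = 1000
```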
This interpreter connects to HDFS over the HTTP WebHDFS interface. It supports the basic shell file commands applied to HDFS, but currently only browsing is supported (see the example paragraph after the list below).
- You can use ls [PATH] and ls -l [PATH] to list a directory. If the path is omitted, the current directory is listed. ls also supports a -h flag for human-readable file sizes.
- You can use cd [PATH] to change your current directory by giving a relative or an absolute path.
- You can invoke pwd to see your current directory.
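For example, once the interpreter has been added to a notebook (see Create Interpreter below), a paragraph using it might look like this; the %file prefix is the binding used in the standard Zeppelin distribution, and /user is just an example path:

```
%file
ls -l /user
```

The output is a listing of the directory's contents, limited to the number of lines set by hdfs.maxlength.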
Tip: Use (Ctrl + .) for autocompletion.
Create Interpreter
To enable the HDFS interpreter in a notebook, click the Gear icon and select HDFS.
WebHDFS REST API
You can confirm that you are able to reach the WebHDFS API by running a curl command against the WebHDFS endpoint configured for the interpreter (the hdfs.url property).
Here is an example:

```
$> curl "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS"
```
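When simple (non-Kerberos) authentication is in use, WebHDFS identifies the caller through the user.name query parameter; this is presumably how the hdfs.user property comes into play. As a further sanity check, you can request the status of a single path; /tmp and the user hdfs below are only examples:

```
curl "http://localhost:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hdfs"
```

Both operations return JSON; LISTSTATUS returns a FileStatuses object with one FileStatus entry per child, which is the data the interpreter's ls listing is built from.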