Accessing Hadoop HDFS Data Using Node.js and the WebHDFS REST API

Posted by Socrates on January 31, 2019

Through the WebHDFS REST API, Apache Hadoop exposes services for accessing and manipulating HDFS content. For details, see the official WebHDFS documentation.

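Available Services

The following services are available:

1) File and directory operations

1.1 Create and write to a file: CREATE (HTTP PUT)
1.2 Append to a file: APPEND (HTTP POST)
1.3 Open and read a file: OPEN (HTTP GET)
1.4 Make a directory: MKDIRS (HTTP PUT)
1.5 Rename a file/directory: RENAME (HTTP PUT)
1.6 Delete a file/directory: DELETE (HTTP DELETE)
1.7 Status of a file/directory: GETFILESTATUS (HTTP GET)
1.8 List a directory: LISTSTATUS (HTTP GET)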

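2) Other file system operations

2.1 Get the content summary of a directory: GETCONTENTSUMMARY (HTTP GET)
2.2 Get a file checksum: GETFILECHECKSUM (HTTP GET)
2.3 Get the home directory: GETHOMEDIRECTORY (HTTP GET)
2.4 Set permission: SETPERMISSION (HTTP PUT)
2.5 Set owner: SETOWNER (HTTP PUT)
2.6 Set replication factor: SETREPLICATION (HTTP PUT)
2.7 Set access or modification time: SETTIMES (HTTP PUT)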
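Each operation name in the lists above is what goes into the op query parameter of a request, and each one is tied to a single HTTP method. A minimal lookup sketch of that pairing follows; the WEBHDFS_OPS table and the httpMethodFor helper are illustrative names, not part of any library.

```javascript
// Illustrative lookup table: WebHDFS operation name -> HTTP method,
// copied from the service lists above.
const WEBHDFS_OPS = {
  CREATE: "PUT",
  APPEND: "POST",
  OPEN: "GET",
  MKDIRS: "PUT",
  RENAME: "PUT",
  DELETE: "DELETE",
  GETFILESTATUS: "GET",
  LISTSTATUS: "GET",
  GETCONTENTSUMMARY: "GET",
  GETFILECHECKSUM: "GET",
  GETHOMEDIRECTORY: "GET",
  SETPERMISSION: "PUT",
  SETOWNER: "PUT",
  SETREPLICATION: "PUT",
  SETTIMES: "PUT",
};

// Returns the HTTP method for an operation name (case-insensitive),
// or throws if the name is not in the table.
function httpMethodFor(op) {
  const method = WEBHDFS_OPS[String(op).toUpperCase()];
  if (!method) {
    throw new Error("Unknown WebHDFS operation: " + op);
  }
  return method;
}

console.log(httpMethodFor("liststatus"));
```

Keeping the pairing in one table makes it harder to send, say, an APPEND as a PUT by mistake.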

Enabling the WebHDFS API

Make sure the configuration parameter dfs.webhdfs.enabled is set to true in the hdfs-site.xml file (this configuration file can be found in {your_hadoop_home_dir}/etc/hadoop).

<configuration>
    <property>
        .....
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

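Connecting to WebHDFS from Node.js

I hope you are familiar with Node.js and package installation. If you are not, please read up on them first. There is an npm module, "node-webhdfs", with a wrapper that lets you access the Hadoop WebHDFS API. You can install the node-webhdfs package using npm: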

npm install webhdfs

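Once the above step is complete, you can write a Node.js program to access this API. Below are a few steps to help you.

Import the dependent modules

The following external modules need to be imported: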

const WebHDFS = require("webhdfs");
const request = require("request");

Prepare the connection URL

Let's prepare the connection URL:

let url = "http://<<your hdfs host name here>>";
let port = 50070; // change here if you are using a different port
let dir_path = "<<path to hdfs folder>>";
let path = "/webhdfs/v1/" + dir_path + "?op=LISTSTATUS&user.name=hdfs";
let full_url = url + ":" + port + path;
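Plain string concatenation like this yields a double slash when dir_path already starts with "/", and it leaves spaces and other special characters unescaped. A small sketch of a more defensive builder follows. The buildWebHdfsUrl helper and the host name in the demo call are made up for illustration, and the sketch assumes a Node.js version in which URL and URLSearchParams are available as globals.

```javascript
// Hypothetical helper: builds a WebHDFS URL for a given path and operation.
// URL and URLSearchParams are Node.js globals, so nothing needs installing.
function buildWebHdfsUrl(host, port, hdfsPath, op, userName) {
  // Drop leading slashes so "/user/x" and "user/x" give the same URL.
  const cleanPath = String(hdfsPath).replace(/^\/+/, "");
  // Encode each path segment separately so the "/" separators survive.
  const encodedPath = cleanPath.split("/").map(encodeURIComponent).join("/");
  const url = new URL("http://" + host + ":" + port + "/webhdfs/v1/" + encodedPath);
  url.search = new URLSearchParams({ op: op, "user.name": userName }).toString();
  return url.toString();
}

// Demo call with a made-up host name and path.
console.log(
  buildWebHdfsUrl("namenode.example.com", 50070, "/user/hdfs/my dir", "LISTSTATUS", "hdfs")
);
```

The returned string can be passed to request in place of full_url.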

List a directory

Call the API and get the results:

request(full_url, function (error, response, body) {
    if (!error && response.statusCode == 200) {
        console.log(".. response body..", body);
        let jsonStr = JSON.parse(body);
        // LISTSTATUS returns its entries under FileStatuses.FileStatus
        let myObj = jsonStr.FileStatuses.FileStatus;
        let objLength = Object.entries(myObj).length;
        console.log("..Number of files in the folder: ", objLength);
    } else {
        console.log("..error occurred!..");
    }
});
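The count printed by the callback above includes subdirectories as well as files, because LISTSTATUS returns one entry per child of the directory. A minimal sketch of separating the two follows; summarizeListing is a hypothetical helper, and the sample body is made up and contains only the fields the sketch reads (pathSuffix, type, length).

```javascript
// Hypothetical helper: turns a LISTSTATUS response body (a JSON string)
// into a small summary of the directory.
function summarizeListing(body) {
  const parsed = JSON.parse(body);
  // LISTSTATUS wraps its entries as { FileStatuses: { FileStatus: [...] } }.
  const wrapper = parsed.FileStatuses;
  const entries = wrapper && Array.isArray(wrapper.FileStatus) ? wrapper.FileStatus : [];
  return {
    total: entries.length,
    files: entries.filter((e) => e.type === "FILE").map((e) => e.pathSuffix),
    directories: entries.filter((e) => e.type === "DIRECTORY").map((e) => e.pathSuffix),
  };
}

// Made-up sample body with only the fields this sketch reads.
const sampleBody = JSON.stringify({
  FileStatuses: {
    FileStatus: [
      { pathSuffix: "a.txt", type: "FILE", length: 24 },
      { pathSuffix: "logs", type: "DIRECTORY", length: 0 },
    ],
  },
});

console.log(summarizeListing(sampleBody));
```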

Here are the sample request and response for the LISTSTATUS API:

https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#LISTSTATUS

Get and display the content of an HDFS file

Assign the HDFS file path to a variable:

let hdfs_file_name = '<<HDFS file path>>' ;

The code below connects to HDFS using the WebHDFS client rather than the request module used in the section above:

let hdfs = WebHDFS.createClient({
    user: "<<user>>",
    host: "<<host/IP>>",
    port: 50070, // change here if you are using a different port
    path: "webhdfs/v1/"
});

With the client in place, the HDFS file can be read and its content displayed.
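One way to do the read is to collect the chunks of a read stream and join them when the stream ends. The sketch below is not the original listing from this post. readHdfsFile is a hypothetical helper, and it assumes the client exposes createReadStream(path) and that the returned stream emits data chunks followed by either a finish or an end event. Both completion events are handled because it is not certain here which one a given client version emits.

```javascript
// Hypothetical helper: reads a whole HDFS file through a client that exposes
// createReadStream(path), as the WebHDFS client above is assumed to.
// Calls callback(err) on failure or callback(null, text) on success.
function readHdfsFile(client, filePath, callback) {
  const chunks = [];
  let finished = false;

  // Make sure the callback fires once even if several end-like events arrive.
  function done(err) {
    if (finished) return;
    finished = true;
    if (err) return callback(err);
    callback(null, Buffer.concat(chunks).toString("utf8"));
  }

  const stream = client.createReadStream(filePath);
  stream.on("error", done);
  stream.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
  // Assumption: the stream signals completion with "finish" or "end".
  stream.on("finish", () => done());
  stream.on("end", () => done());
  return stream;
}

// Usage with the client and path defined above (needs a reachable cluster):
// readHdfsFile(hdfs, hdfs_file_name, (err, text) => {
//   if (err) return console.log("...error: ", err);
//   console.log("..file data..", text);
// });
```

Collecting everything in memory is fine for small files; for large ones it is better to process each chunk as it arrives.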

Here are the sample request and response for the OPEN API:

https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#OPEN

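How to read all files in a directory

This is not straightforward, because there is no direct method for it, but it can be achieved by combining the two operations above: list the directory, then read the files in it one by one.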
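The combination can be sketched as follows. readAllFiles is a hypothetical helper, and listDir and readFile are injected stand-ins for whichever listing and reading calls you use: for example, a LISTSTATUS request that yields the FileStatus array, and a stream-based read. The sketch assumes each listing entry carries pathSuffix and type fields, as in a LISTSTATUS response.

```javascript
// Hypothetical helper: reads every regular file in one directory, one at a time.
// listDir(dirPath, cb) must yield an array of LISTSTATUS entries
// ({ pathSuffix, type, ... }); readFile(filePath, cb) must yield the content.
// Both are injected so the sketch does not depend on a particular client.
function readAllFiles(dirPath, listDir, readFile, callback) {
  listDir(dirPath, (listErr, entries) => {
    if (listErr) return callback(listErr);
    // Strip trailing slashes so joined paths never contain "//".
    const base = dirPath.replace(/\/+$/, "");
    const paths = entries
      .filter((entry) => entry.type === "FILE") // skip subdirectories
      .map((entry) => base + "/" + entry.pathSuffix);
    const contents = {};

    // Read sequentially: start the next file only when the previous one is done.
    function next(index) {
      if (index >= paths.length) return callback(null, contents);
      readFile(paths[index], (readErr, text) => {
        if (readErr) return callback(readErr);
        contents[paths[index]] = text;
        next(index + 1);
      });
    }
    next(0);
  });
}
```

Reading sequentially keeps the load on the cluster predictable. Subdirectories are skipped, so a recursive walk would need an extra step.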

Conclusion

I hope this has given you some idea of how to connect to HDFS and perform basic operations using Node.js and the webhdfs module. All the best!
