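Alibaba Cloud OSS-HDFS Service (JindoFS Service): Metadata Export Guide
(Supported since version 4.6.0)
Introduction
The metadata export feature writes an inventory of the file metadata in the current OSS-HDFS bucket to the /.sysinfo/inventory directory. The inventory is stored as JSON files with one JSON object per line, so you can run statistics and analysis on the metadata.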
- Configure the JindoFS command-line tool with the access keys for the target OSS-HDFS bucket. For details, see the JindoFS command-line tool usage guide.
- Run the export command:
./jindofs admin -dumpInventory oss://<hdfs_bucket>/
The output shows the job ID and the location of the exported data:
=============Dump Inventory=============
Job Id: 0177388834774116055076952082867238
Data Location: /.sysinfo/inventory/1773888347741.0177388834774116055076952082867238
..........
FINISHED.
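The command blocks until the export completes, which takes roughly 10 seconds to 10 minutes depending on the amount of metadata. The export has succeeded once FINISHED is printed at the end.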
- Download the result file:
./jindofs fs -get oss://<hdfs_bucket>/.sysinfo/inventory/1773888347741.0177388834774116055076952082867238
After the result is downloaded to your local machine, open it with vi/vim or any other text editor.
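Because the export is a blocking command that reports its result as text, a script that automates the steps above has to read the Job Id and Data Location from that report and check for the final FINISHED line. The sketch below does this in Python. `parse_dump_output`, `inventory_uri` and `run_dump` are hypothetical helpers, not part of JindoFS. `run_dump` assumes the report is printed on stdout and that `./jindofs` is the tool's path.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class DumpResult:
    job_id: Optional[str]
    data_location: Optional[str]
    finished: bool


def parse_dump_output(text: str) -> DumpResult:
    """Pull the Job Id and Data Location out of the -dumpInventory report."""
    job = re.search(r"^Job Id:\s*(\S+)", text, re.MULTILINE)
    loc = re.search(r"^Data Location:\s*(\S+)", text, re.MULTILINE)
    # The export only succeeded if a FINISHED line was printed.
    finished = any(line.strip().startswith("FINISHED") for line in text.splitlines())
    return DumpResult(
        job_id=job.group(1) if job else None,
        data_location=loc.group(1) if loc else None,
        finished=finished,
    )


def inventory_uri(bucket: str, data_location: str) -> str:
    """Build the oss:// URI to pass to `jindofs fs -get`."""
    return f"oss://{bucket}{data_location}"


def run_dump(bucket: str, jindofs: str = "./jindofs") -> DumpResult:
    """Run the blocking export (assumption: the report goes to stdout)."""
    proc = subprocess.run(
        [jindofs, "admin", "-dumpInventory", f"oss://{bucket}/"],
        capture_output=True, text=True, check=True,
    )
    return parse_dump_output(proc.stdout)
```

Keeping the parser separate from the subprocess call means it can be checked against a saved report, such as the sample output shown above, without access to a bucket.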
Sample output:
{"id":16385,"path":"/","type":"directory","size":0,"user":"admin","group":"supergroup","atime":0,"mtime":1666581702933,"permission":511,"state":1}
{"id":6246684106789500068,"path":"/dls-1000326249","type":"directory","size":0,"user":"hadoop","group":"supergroup","atime":0,"mtime":1660889124590,"permission":511,"state":0}
{"id":6246684106789500069,"path":"/dls-1000326249/benchmark","type":"directory","size":0,"user":"hadoop","group":"supergroup","atime":0,"mtime":1660889124590,"permission":511,"state":0}
{"id":6246684106789500070,"path":"/dls-1000326249/benchmark/n1","type":"directory","size":0,"user":"hadoop","group":"supergroup","atime":0,"mtime":1660889124590,"permission":511,"state":0}
{"id":6246684106789500071,"path":"/dls-1000326249/benchmark/n1/490747449","type":"directory","size":0,"user":"hadoop","group":"supergroup","atime":0,"mtime":1660895613953,"permission":511,"state":0}
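Output fields
| Field | Description |
|---|---|
| id | Unique identifier of the file or directory |
| path | Absolute path of the file or directory |
| type | Entry type: directory or file |
| size | File size in bytes; 0 for directories |
| user | Owning user of the file or directory |
| group | Owning group of the file or directory |
| ctime | Creation time, as a Unix timestamp in milliseconds |
| atime | Last access time, as a Unix timestamp in milliseconds |
| mtime | Last modification time, as a Unix timestamp in milliseconds |
| storagePolicy | Storage policy. Possible values: UNSPECIFIED (default, equivalent to Standard), CLOUD_STD (Standard), CLOUD_IA (Infrequent Access), CLOUD_AR (Archive), CLOUD_COLD_AR (Cold Archive), CLOUD_DEEP_COLD_AR (Deep Cold Archive), CLOUD_AR_RESTORED (Archive, restored), CLOUD_COLD_AR_RESTORED (Cold Archive, restored), CLOUD_DEEP_COLD_AR_RESTORED (Deep Cold Archive, restored) |
| permission | Permission bits as a decimal number (for example, 511 is octal 777) |
| state | Internal field |
| storageConvertTime | Internal field |
| storageState | Internal field |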
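Several fields in the table are encoded for machines: timestamps are in milliseconds, `permission` is decimal, and `storagePolicy` is an enum name. The minimal sketch below converts one record into a readable form. The helper names and display labels are illustrative, and the labels are taken from the table. In the sample output `atime` is 0. This sketch renders it literally as 1970-01-01, which you may prefer to treat as "not recorded".

```python
from datetime import datetime, timezone

# Display names taken from the storagePolicy row of the field table.
STORAGE_POLICY_LABELS = {
    "UNSPECIFIED": "Standard (default)",
    "CLOUD_STD": "Standard",
    "CLOUD_IA": "Infrequent Access",
    "CLOUD_AR": "Archive",
    "CLOUD_COLD_AR": "Cold Archive",
    "CLOUD_DEEP_COLD_AR": "Deep Cold Archive",
    "CLOUD_AR_RESTORED": "Archive (restored)",
    "CLOUD_COLD_AR_RESTORED": "Cold Archive (restored)",
    "CLOUD_DEEP_COLD_AR_RESTORED": "Deep Cold Archive (restored)",
}


def ms_to_utc(ms: int) -> datetime:
    """Convert a millisecond Unix timestamp (ctime/atime/mtime) to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def permission_symbolic(perm: int) -> str:
    """Render the low 9 permission bits as rwx triples, e.g. 511 (octal 777) -> rwxrwxrwx."""
    letters = "rwxrwxrwx"
    return "".join(c if perm & (1 << (8 - i)) else "-" for i, c in enumerate(letters))


def describe(rec: dict) -> dict:
    """Return a human-readable view of one inventory record."""
    out = {"path": rec["path"]}
    if "permission" in rec:
        perm = rec["permission"]
        # Decimal 511 is printed as octal "777".
        out["permission"] = f"{perm:o} ({permission_symbolic(perm)})"
    for key in ("ctime", "atime", "mtime"):
        if key in rec:
            out[key] = ms_to_utc(rec[key]).isoformat()
    if "storagePolicy" in rec:
        policy = rec["storagePolicy"]
        # Unknown values are passed through unchanged.
        out["storagePolicy"] = STORAGE_POLICY_LABELS.get(policy, policy)
    return out
```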
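Advanced usage
1. Select output fields
(Supported since version 6.9.1)
This option limits the export to the metadata fields you need. By default, all fields are exported.
Usage: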
## -field <field> : a metadata field to output (repeat -field for each field)
## path is required, plus at least one other field
## Available fields: id type size user group ctime atime mtime permission state storagePolicy storageConvertTime storageState
./jindofs admin -dumpInventory oss://<hdfs_bucket>/ -field path -field mtime
Sample output:
{"path":"/","mtime":1666581702933}
{"path":"/dls-1000326249","mtime":1660889124590}
{"path":"/dls-1000326249/benchmark","mtime":1660889124590}
{"path":"/dls-1000326249/benchmark/n1","mtime":1660889124590}
{"path":"/dls-1000326249/benchmark/n1/490747449","mtime":1660895613953}
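2. Specify the analysis path
(Supported since version 6.10.0)
This option limits the inventory to a given path. By default, the inventory covers the root path.
Usage:
## -path <path> : the path whose metadata is analyzed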
./jindofs admin -dumpInventory oss://<hdfs_bucket>/ -path oss://<hdfs_bucket>/dls-1000326249/benchmark
Sample output:
{"id":6246684106789500069,"path":"/dls-1000326249/benchmark","type":"directory","size":0,"user":"hadoop","group":"supergroup","atime":0,"mtime":1660889124590,"permission":511,"state":0}
{"id":6246684106789500070,"path":"/dls-1000326249/benchmark/n1","type":"directory","size":0,"user":"hadoop","group":"supergroup","atime":0,"mtime":1660889124590,"permission":511,"state":0}
{"id":6246684106789500071,"path":"/dls-1000326249/benchmark/n1/490747449","type":"directory","size":0,"user":"hadoop","group":"supergroup","atime":0,"mtime":1660895613953,"permission":511,"state":0}