Hadoop File System Forensics Toolkit (HDFS FTK).
View and extract files from an offline image of a Hadoop file system.
Supports:
+ Multiple datanodes
+ fsimage XML format
+ Searching filenames and filtering by file type
+ File recovery while preserving metadata

Motivation
The Hadoop Distributed File System (HDFS) is one of the most widely used distributed file systems in the world. However, forensic techniques for analyzing and auditing it remain limited.
In HDFS, metadata is separated from the actual data blocks. The namenode holds the metadata (file names, timestamps, permissions), while the actual data is stored in blocks on the datanodes. Although HDFS has command-line client tools for extracting files, they only work with a running HDFS cluster. This tool aims to give investigators the ability to perform forensic analysis on offline evidence captures of Hadoop File System images.
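To make the metadata/data split concrete, here is a minimal sketch of reading file names, modification times, and block IDs out of an fsimage XML dump. It is not the toolkit's own parser. It assumes the layout commonly produced by `hdfs oiv -p XML` (an `INodeSection` containing `inode` elements with `type`, `name`, `mtime`, and nested `blocks/block/id`); the function name `list_files` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed fsimage XML; real dumps carry many more fields.
SAMPLE_FSIMAGE = """<fsimage><INodeSection>
<inode><id>16385</id><type>DIRECTORY</type><name></name><mtime>1500000000000</mtime></inode>
<inode><id>16386</id><type>FILE</type><name>tartans.txt</name><mtime>1500000001000</mtime>
<blocks><block><id>1073741825</id><genstamp>1001</genstamp><numBytes>12</numBytes></block></blocks>
</inode>
</INodeSection></fsimage>"""


def list_files(fsimage_xml):
    """Yield (name, mtime_ms, [block ids]) for every FILE inode in the XML text."""
    root = ET.fromstring(fsimage_xml)
    for inode in root.iterfind("INodeSection/inode"):
        if inode.findtext("type") != "FILE":
            continue  # directories and symlinks own no data blocks
        block_ids = [int(b.findtext("id")) for b in inode.iterfind("blocks/block")]
        yield inode.findtext("name"), int(inode.findtext("mtime")), block_ids


for name, mtime_ms, block_ids in list_files(SAMPLE_FSIMAGE):
    print(name, mtime_ms, block_ids)
```

The namenode side yields only names, timestamps, and block IDs; the bytes themselves have to be looked up by block ID in the datanode captures.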
Prerequisites
+ Python 3 or later
+ PrettyTable: pip install prettytable
Evidence Acquisition Procedure
* Obtain metadata from namenode
$namenode: hdfs dfsadmin -safemode enter
$namenode: hdfs dfsadmin -saveNamespace
$namenode: hdfs oiv -i <PATH_TO_FSIMAGE> -o <FILE> -p XML
* Archive data from datanodes
$datanodes: tar czf datanodex.tar.gz $HADOOP_HOME/Hadoop_data
* SCP files to a local forensic workstation and untar the datanodes’ data directory.
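Once the datanode archives are unpacked, a block can be located by its ID. The sketch below is one possible way to do that and is not taken from the toolkit. It assumes the usual datanode convention that a block is stored as a file named `blk_<id>` next to a `blk_<id>_<genstamp>.meta` checksum file; the function name `find_block_files` and the directory names in the usage note are hypothetical.

```python
from pathlib import Path


def find_block_files(block_id, datanode_dirs):
    """Return every replica of blk_<block_id> found under the given directories."""
    name = f"blk_{block_id}"  # exact match, so the .meta checksum files are skipped
    hits = []
    for root in datanode_dirs:
        hits.extend(p for p in Path(root).rglob(name) if p.is_file())
    return sorted(hits)
```

For example, `find_block_files(1073741825, ["datanode1", "datanode2", "datanode3"])` would search three unpacked datanode directories. With replication enabled, the same block may turn up under more than one datanode.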
Use and Download:
git clone https://github.com/edisonljh/hadoop_ftk && cd hadoop_ftk
pip3 install prettytable
Example commands:
To view contents of HDFS File System:
python hdfs_ftk.py -f fsimage.xml
Display fsimage:
python hdfs_ftk.py -f test/fsimage.xml -displayfsimage
Filtering by name:
python hdfs_ftk.py -f test/fsimage.xml -displayfsimage -filterByName tartans
Extract block id 16386 from HDFS with three datanodes:
python hdfs_ftk.py -f test/fsimage.xml -v -r 16386 -o /output -d 3
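For readers curious what recovery with preserved metadata can look like in principle, the following is a rough sketch and not the toolkit's actual implementation. It assumes a file is rebuilt by concatenating its block files in the order listed in the fsimage, and that the fsimage `mtime` is in milliseconds since the Unix epoch; the function name `reassemble` and its parameters are hypothetical.

```python
import os
import shutil


def reassemble(block_paths, out_path, mtime_ms=None):
    """Concatenate block files in order, then optionally restore the mtime."""
    with open(out_path, "wb") as out:
        for path in block_paths:
            with open(path, "rb") as blk:
                shutil.copyfileobj(blk, out)  # streamed copy, safe for large blocks
    if mtime_ms is not None:
        ts = mtime_ms / 1000  # assumed: fsimage stores milliseconds, os.utime wants seconds
        os.utime(out_path, (ts, ts))
```

Restoring the timestamp on the recovered copy keeps the namenode's record of the last modification visible on the forensic workstation.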
Source: https://github.com/edisonljh