Download MASTIFF source: http://sourceforge.net/projects/mastiff/
Unpack.
$ tar -xzf mastiff-0.6.0.tar.gz
Install prerequisites for MASTIFF:
$ sudo aptitude install python-setuptools
$ sudo aptitude install python-magic
Install (for development testing):
$ sudo make dev
(To test only the OfficeDissector plugins, you do not need to install the requirements of the other plugins. However, any plugin that is present without its required modules causes errors. If you do not install all plugin requirements, remove all Generic plugins from: plugins/GEN/.)
To uninstall:
$ sudo make dev-clean
The mastiff.conf file controls all settings and paths for MASTIFF. MASTIFF (apparently) searches for it in /etc/mastiff/mastiff.conf, ~/.mastiff.conf, and ./mastiff.conf. You can also force MASTIFF to use a specific path to mastiff.conf through the -c command line option.
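To see which of these configuration files are present on a given machine, a small helper along the following lines may be useful. It is a minimal sketch: the candidate paths are the ones listed above, the function name existing_configs is hypothetical, and the order and precedence MASTIFF actually applies when several files exist are not confirmed here.

```python
import os

# Candidate locations taken from the text above. The order in which MASTIFF
# really reads them, and which one wins, is an assumption.
DEFAULT_CANDIDATES = [
    "/etc/mastiff/mastiff.conf",
    "~/.mastiff.conf",
    "./mastiff.conf",
]


def existing_configs(candidates=DEFAULT_CANDIDATES):
    """Return the candidate config paths that exist, in the order given."""
    found = []
    for path in candidates:
        # Expand "~" and make the path absolute so the output is unambiguous.
        expanded = os.path.abspath(os.path.expanduser(path))
        if os.path.isfile(expanded):
            found.append(expanded)
    return found


if __name__ == "__main__":
    for path in existing_configs():
        print(path)
```

If the list is empty, or the file you want is not in it, passing the path explicitly with -c avoids any ambiguity.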
Copy plugins to the MASTIFF source file directory (created when unpacking the source tarball):
$ cp path/to/officedissector/mastiff-plugins/* path/to/mastiff/plugins/Office
(MASTIFF’s plugins directory is specified in the mastiff.conf file.)
MASTIFF uses the magic file to detect OOXML documents. If running:
$ file test/unit_test/testdocs/test.docx
does not report Microsoft Word 2007+, extend the magic file so that it identifies OOXML files (the first command backs up the current magic file):
$ cp /etc/magic /tmp/magic.backup
$ sudo sh -c 'cat mastiff-plugins/magic-ooxml >> /etc/magic'
For more information, see: http://serverfault.com/questions/338087/making-libmagic-file-detect-docx-files
Alternatively, installing TrID ( http://mark0.net/soft-trid-e.html ) and editing mastiff.conf to provide the path to TrID should fix this problem.
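The detection check described above can also be scripted, for example before running MASTIFF over many documents. The following is a rough sketch that assumes the file utility is on the PATH. The helper names file_description and is_detected_as_docx are hypothetical, and the only string tested for is the Microsoft Word 2007+ description mentioned above.

```python
import subprocess


def file_description(path):
    """Return the description printed by `file -b` (brief mode) for path."""
    out = subprocess.check_output(["file", "-b", path])
    return out.decode("utf-8", "replace").strip()


def is_detected_as_docx(description):
    """True if the description contains the string expected for .docx files."""
    return "Microsoft Word 2007+" in description


if __name__ == "__main__":
    desc = file_description("test/unit_test/testdocs/test.docx")
    if is_detected_as_docx(desc):
        print("OOXML detection works:", desc)
    else:
        print("magic file needs the OOXML entries; got:", desc)
```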
$ mas.py ooxml_file(s) ...
Output can be found in the /path/to/mastiff-source-files/work/log directory.
If MASTIFF is not being run from the MASTIFF source directory, you must specify the location of mastiff.conf using the -c flag. For example:
$ mas.py -c /path/to/mastiff/mastiff.conf ooxml_file(s) ...
Also note that with the default settings in mastiff.conf, MASTIFF runs only if the current working directory is the MASTIFF source directory. Otherwise, specify the location of the plugins directory in mastiff.conf.
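When MASTIFF is driven from a script, the command line can be assembled programmatically. The sketch below is illustrative only: build_mas_command is a hypothetical helper, it uses just the -c and -p options shown in this document, and it assumes that the two options can be combined in a single invocation.

```python
import subprocess


def build_mas_command(files, conf=None, plugin=None):
    """Build an argument list for mas.py from the options used in this document."""
    cmd = ["mas.py"]
    if conf:
        # -c: explicit path to mastiff.conf
        cmd += ["-c", conf]
    if plugin:
        # -p: run only the named plugin
        cmd += ["-p", plugin]
    cmd += list(files)
    return cmd


if __name__ == "__main__":
    command = build_mas_command(
        ["/path/to/document.docx"],
        conf="/path/to/mastiff/mastiff.conf",
    )
    # Passing a list (not a shell string) avoids quoting problems with
    # plugin names and file names that contain spaces.
    subprocess.call(command)
```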
The skeleton for creating OfficeDissector plugins is located at: mastiff-plugins/Office-officedissector-skel.py. See the beginning of that file for instructions on creating OfficeDissector plugins.
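There are three sample plugins: OfficeDissector Extract Multimedia, OfficeDissector Extract URLs, and OfficeDissector Extract Embedded Code.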
To see the OfficeDissector Extract Multimedia plugin run, use:
$ mas.py -p 'OfficeDissector Extract Multimedia' /path/to/officedissector/test/unit_test/testdocs/037027.pptx
MASTIFF stores results in a folder with path work/log/[md5sum of file]/.
To find the multimedia parts, use:
$ ls work/log/8aeb72b3751238a37aff319585434327/parts
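Because the results folder is named after the MD5 sum of the analyzed file, its path can be computed instead of looked up by hand. The following is a minimal sketch: md5_of_file and log_dir_for are hypothetical helper names, and the work/log root is the location used in this document, which may differ if mastiff.conf sets another log directory.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_dir_for(path, log_root="work/log"):
    """Return the folder where results for `path` are expected to be stored."""
    return os.path.join(log_root, md5_of_file(path))


if __name__ == "__main__":
    print(log_dir_for("/path/to/officedissector/test/unit_test/testdocs/037027.pptx"))
```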
To see the OfficeDissector Extract URLs plugin run, use:
$ mas.py -p 'OfficeDissector Extract URLs' /path/to/officedissector/test/unit_test/testdocs/037027.pptx
To find the URLs, use:
$ cat work/log/8aeb72b3751238a37aff319585434327/urls.txt
To see the OfficeDissector Extract Embedded Code plugin run, use:
$ mas.py -p 'OfficeDissector Extract Embedded Code' /path/to/officedissector/test/unit_test/testdocs/macros.xlsm
To find the embedded code parts, use:
$ ls work/log/39c7ca586fefb547b8d7474130ec0fe5/parts
To test, first install the plugins as mentioned above. Then:
# Functional tests
$ cd test/unit_test
$ python test_plugins.py PATH_TO_MASTIFF_SOURCE_DIR
# Performance tests
$ cd test
$ python test_plugin_performance.py PATH_TO_MASTIFF_SOURCE_DIR
Note that the tests assume that MASTIFF will output results in PATH_TO_MASTIFF_SOURCE_DIR. If you have configured MASTIFF to output results to the current directory, you may need to specify that instead:
$ cd test/unit_test
$ python test_plugins.py .  # If MASTIFF is configured to output results to the current directory
For accurate performance results: