Analyze Office files

Basics

There are two generations of Office file format:

  • the legacy binary formats (DOC, XLS, PPT), stored as OLE2 compound files, plus RTF, a text format that can embed OLE objects,

  • the "Office Open XML" formats (file extensions that include DOCX, XLSX, PPTX).

Both generations support Object Linking and Embedding (OLE), so either can carry linked or embedded objects; only the legacy binary formats are compound files themselves.
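The two generations can also be told apart by their first bytes, which helps when a file's extension is missing or misleading. This is a minimal sketch, assuming a POSIX-like shell where head -c and od are available; office_format is a hypothetical helper name. It checks the OLE2 compound file header (D0 CF 11 E0 A1 B1 1A E1), the ZIP local file header used by OOXML packages ("PK" followed by bytes 03 04) and the "{\rtf" opening of RTF documents.

```shell
# office_format FILE: guess the container type from the file's leading bytes
# (hypothetical helper; a rough triage aid, not a full parser)
office_format() {
  # First 8 bytes as one lowercase hex string, e.g. d0cf11e0a1b11ae1
  magic=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    d0cf11e0a1b11ae1) echo "ole2" ;;    # legacy DOC/XLS/PPT compound file
    504b0304*)        echo "zip" ;;     # OOXML package (or any other ZIP archive)
    7b5c727466*)      echo "rtf" ;;     # "{\rtf"
    *)                echo "unknown" ;;
  esac
}
```

Usage: office_format suspicious.bin prints ole2, zip, rtf or unknown. A "zip" result only says the file is a ZIP archive; unzipping it shows whether it is really an OOXML document.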

OOXML files are ZIP containers, so one of the easiest ways to check for hidden data is to unzip the document:

unzip file.docx -d file_docx
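Once the document is unpacked, a few parts deserve a first look: vbaProject.bin (the VBA project of macro-enabled files such as DOCM or XLSM, itself an OLE file), anything under an embeddings/ folder (embedded objects), and .rels relationship files whose targets point outside the package with TargetMode="External" (for example a remote template). The helper below is a hypothetical sketch using only find, grep and sed; it takes the directory the document was extracted into, and its list of locations is an assumption, not an exhaustive check.

```shell
# ooxml_triage DIR: list parts of an extracted OOXML document that often
# deserve a closer look (hypothetical helper, non-exhaustive checks)
ooxml_triage() {
  dir=$1
  # VBA project of macro-enabled documents (e.g. word/vbaProject.bin)
  find "$dir" -type f -name 'vbaProject.bin' | sed 's/^/macros: /'
  # Embedded objects (e.g. word/embeddings/oleObject1.bin)
  find "$dir" -type f -path '*/embeddings/*' | sed 's/^/embedded: /'
  # Relationship files that reference resources outside the package
  find "$dir" -type f -name '*.rels' -exec grep -l 'TargetMode="External"' {} + |
    sed 's/^/external-rel: /'
}
```

Ordinary hyperlinks also use TargetMode="External", so an external-rel hit only means the Target values in that file are worth reading.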

Are they really malicious?

# install oletools (provides oleid and olevba, plus the olefile module that oledump.py needs)
sudo pip3 install -U oletools

# oleid: analyze OLE files to detect characteristics often found in malicious files (e.g. VBA macros, encryption)
oleid file.xls

# upload the file to VirusTotal and review the detection results
# https://www.virustotal.com/gui/home/upload
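Submitted files are shared with VirusTotal's security partners, which may matter for confidential documents. Looking the file up by its SHA-256 first shows whether the sample is already known without sending it. A minimal sketch: vt_lookup_url is a hypothetical helper, and it assumes either sha256sum (GNU coreutils) or shasum (common on macOS) is installed. The https://www.virustotal.com/gui/file/<sha256> pattern is the address of a file report in the VirusTotal web interface.

```shell
# vt_lookup_url FILE: print the VirusTotal report URL for FILE's SHA-256
# (hypothetical helper; the file itself is never uploaded)
vt_lookup_url() {
  if command -v sha256sum >/dev/null 2>&1; then
    hash=$(sha256sum "$1" | cut -d' ' -f1)
  else
    hash=$(shasum -a 256 "$1" | cut -d' ' -f1)
  fi
  printf 'https://www.virustotal.com/gui/file/%s\n' "$hash"
}
```

Usage: open the URL printed by vt_lookup_url file.xls; if VirusTotal has no report for that hash, uploading is still an option.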

Macros

#
# oledump
#
wget https://raw.githubusercontent.com/DidierStevens/DidierStevensSuite/master/oledump.py
python3 oledump.py -h

# List all OLE2 streams present in file.xls (index, indicator, size, name)
python3 oledump.py file.xls

# Decompress and print the VBA source code of stream 3 (use the index of a stream flagged M in the listing)
python3 oledump.py file.xls -s 3 -v

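# Find obfuscated URLs in file.xls macros (the plugin is a separate file in the same repository)
wget https://raw.githubusercontent.com/DidierStevens/DidierStevensSuite/master/plugin_http_heuristics.py
python3 oledump.py file.xls -p plugin_http_heuristics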
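In oledump.py's stream listing, streams that contain VBA code are flagged with M in the second column (a lowercase m marks a module that appears to hold only Attribute lines). The sketch below assumes that listing layout ("index: indicator size 'name'"); it collects the indexes of the M streams and decompresses each one with -s and -v. macro_streams and dump_macros are hypothetical helper names, and dump_macros expects oledump.py in the current directory.

```shell
# macro_streams: read an oledump.py stream listing on stdin and print the
# index of every stream flagged "M" (assumes the "N: M size 'name'" layout)
macro_streams() {
  awk '$2 == "M" { sub(/:$/, "", $1); print $1 }'
}

# dump_macros FILE: decompress and print the VBA source of every "M" stream
# (hypothetical helper; expects oledump.py in the current directory)
dump_macros() {
  for s in $(python3 oledump.py "$1" | macro_streams); do
    echo "=== stream $s ==="
    python3 oledump.py "$1" -s "$s" -v
  done
}
```

Usage: dump_macros file.xls prints every macro module one after the other, which saves picking stream numbers by hand when a file has many modules.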

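#
# olevba
#
# Extract VBA macros in clear text and summarize auto-exec triggers, suspicious keywords and IOCs
olevba file.doc

# Also show decoded obfuscated strings (hex, Base64, StrReverse, ...)
olevba --decode file.doc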
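olevba's analysis table flags auto-execution triggers (such as AutoOpen or Workbook_Open) and suspicious calls (such as Shell or CreateObject). When only a raw VBA dump is at hand, for example saved from oledump.py -s 3 -v, a crude keyword grep gives a first impression. This is a simplified stand-in with a small, hand-picked keyword list, not olevba's actual detection logic; vba_keywords is a hypothetical helper name.

```shell
# vba_keywords FILE: print the auto-exec triggers and commonly abused calls
# found in a VBA source dump, one per line (hand-picked, non-exhaustive list)
vba_keywords() {
  # -o: print only the match, -i: ignore case, -w: whole words, -E: alternation
  grep -oiwE 'AutoOpen|Auto_Open|AutoExec|Document_Open|Workbook_Open|Shell|CreateObject|URLDownloadToFile|Environ' "$1" |
    sort -fu
}
```

Usage: python3 oledump.py file.xls -s 3 -v > stream3.vba, then vba_keywords stream3.vba. An empty result does not mean the macro is harmless, since obfuscation easily hides these names.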
