# Analyze office files

<figure><img src="/files/5HBl1l5xh4vO2mVq1Fw1" alt="" width="249"><figcaption></figcaption></figure>

## Basics

{% hint style="info" %}
There are **two generations** of **Office file formats**:

* the **OLE** formats (file extensions like RTF, DOC, XLS, PPT),
* the "**Office Open XML**" (OOXML) formats (file extensions that include DOCX, XLSX, PPTX).

Both generations are **structured** container formats that support linked or embedded content (objects).

OOXML files are actually **zip** containers, so one of the easiest ways to check for hidden data is to simply **`unzip`** the document:

```
unzip file.docx
```

{% endhint %}
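
Since the file extension can lie, a quick check of the leading bytes helps decide which generation a sample belongs to. The following is a minimal sketch, not part of the original tooling: `office_kind` is a hypothetical helper name, and it assumes `head -c` and `od` are available. It relies on the OLE compound file signature (`D0 CF 11 E0 A1 B1 1A E1`), the zip local header signature (`PK\x03\x04`) and the `{\rtf` prefix of RTF files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a file by its leading magic bytes.
office_kind() {
  local magic
  # Read the first 8 bytes and render them as one lowercase hex string.
  magic=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    d0cf11e0a1b11ae1) echo "ole" ;;            # OLE compound file (DOC, XLS, PPT)
    504b0304*)        echo "ooxml-or-zip" ;;   # zip container (DOCX, XLSX, PPTX, or any zip)
    7b5c727466*)      echo "rtf" ;;            # starts with {\rtf
    *)                echo "unknown" ;;
  esac
}

# Usage: office_kind file.docx
```

A zip signature only says the file is a zip archive; whether it is really an OOXML document still has to be confirmed by looking at its contents.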

## Are they really malicious?

```bash
# install basic tools
sudo pip3 install -U oletools

# oleid : analyze OLE files to detect specific characteristics usually found in malicious files
oleid file.xls

# upload the file to virustotal and see...
# https://www.virustotal.com/gui/home/upload
```
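
When uploading a sample is not desirable, looking it up by hash is a possible alternative. The sketch below is an illustration under assumptions: `vt_url` is a hypothetical helper name, it assumes GNU coreutils `sha256sum` is installed, and it assumes VirusTotal serves reports under `https://www.virustotal.com/gui/file/<sha256>`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the VirusTotal report URL for a file's SHA-256,
# so the sample can be looked up by hash instead of being uploaded.
# Assumption: reports live under https://www.virustotal.com/gui/file/<sha256>.
vt_url() {
  local hash
  # sha256sum prints "<hash>  <filename>"; keep only the hash.
  hash=$(sha256sum "$1" | cut -d ' ' -f 1)
  echo "https://www.virustotal.com/gui/file/${hash}"
}

# Usage: vt_url file.xls
```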

### Macros

<pre class="language-bash"><code class="lang-bash">#
# oledump
#
wget https://raw.githubusercontent.com/DidierStevens/DidierStevensSuite/master/oledump.py
python3 oledump.py -h

# List all OLE2 streams present in file.xls
python3 oledump.py file.xls -i

# Extract VBA source code from stream 3 in file.xls
python3 oledump.py file.xls -s 3 -v

# Find obfuscated URLs in file.xls macros
# (requires plugin_http_heuristics.py from the same DidierStevensSuite repository, next to oledump.py)
python3 oledump.py file.xls -p plugin_http_heuristics

# 
# olevba
#
# Extract VBA macros in clear text with deobfuscation and analysis
olevba file.doc
</code></pre>
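
Picking the stream number by hand does not scale when a document holds several macro modules. The following sketch filters an oledump listing for streams that contain VBA code. `macro_streams` is a hypothetical helper name, and the sketch assumes the listing uses the layout `index: indicator size 'name'`, where an `M` or `m` indicator in the second column marks a macro stream.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read an oledump stream listing on stdin and print
# the index of every stream flagged M or m (VBA macro indicator).
# Assumption: listing lines look like " 3: M    1234 'Macros/VBA/Module1'".
macro_streams() {
  awk '$2 == "M" || $2 == "m" { sub(/:$/, "", $1); print $1 }'
}

# Usage: dump the VBA source of every macro stream in file.xls
# for s in $(python3 oledump.py file.xls | macro_streams); do
#   python3 oledump.py file.xls -s "$s" -v
# done
```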

