Detecting Embedded Content in OOXML Documents

published on 2021-08-18 15:30:00 UTC by Aaron Stephens

On Advanced Practices, we are always looking for new ways to find malicious activity and track adversaries over time. Today we’re sharing a technique we use to detect and cluster Microsoft Office documents—specifically those in the Office Open XML (OOXML) file format. Additionally, we’re releasing a tool so analysts and defenders can automatically generate YARA rules using this technique.

OOXML File Format

Beginning with Microsoft Office 2007, the default file format for Excel, PowerPoint, and Word documents switched from an Object Linking and Embedding (OLE) based format to OOXML. For now, the only part of this that’s important to understand is that OOXML documents are just a bunch of folders and files packaged into a ZIP archive. For example, let’s look at the Word document this blog post is being written in (Figure 1):

➜ file example.docx
example.docx: Microsoft Word 2007+

➜ unzip -v example.docx
Archive:  example.docx
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
    1445  Defl:S      358  75% 01-01-1980 00:00 576f9132  [Content_Types].xml
     590  Defl:S      239  60% 01-01-1980 00:00 b71a911e  _rels/.rels
    1559  Defl:S      407  74% 01-01-1980 00:00 33ce17ac  word/_rels/document.xml.rels
   10861  Defl:S     2480  77% 01-01-1980 00:00 f0af2147  word/document.xml
    8393  Defl:S     1746  79% 01-01-1980 00:00 9867f4b6  word/theme/theme1.xml
    4725  Defl:S     1416  70% 01-01-1980 00:00 718205c5  word/settings.xml
     655  Defl:S      295  55% 01-01-1980 00:00 bf8dd4bd  word/webSettings.xml
     755  Defl:S      367  51% 01-01-1980 00:00 5bf1cf49  docProps/core.xml
     991  Defl:S      476  52% 01-01-1980 00:00 bad67489  docProps/app.xml
   30308  Defl:S     3104  90% 01-01-1980 00:00 ce0f21cd  word/styles.xml
    7781  Defl:S      952  88% 01-01-1980 00:00 9f45bf02  word/numbering.xml
    2230  Defl:S      559  75% 01-01-1980 00:00 63baaf8c  word/fontTable.xml
--------          -------  ---                            -------
   70293            12399  82%                            12 files

Figure 1: unzip -v output for example.docx

Now, even though we used the unzip command, we didn’t actually unzip the archive. The output provided by the -v option is derived from the archive’s ZIP headers (unzip reads the central directory, which repeats the per-entry fields normally found in the local file headers), and these headers contain a wealth of information on the compressed files. Of particular interest is the CRC-32 value.
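The same listing can be reproduced programmatically. Here is a minimal sketch using Python's standard zipfile module, which reads the archive's central directory and exposes each entry's name, CRC-32, and uncompressed size without decompressing anything. The function names and the tiny in-memory archive are hypothetical stand-ins for a real document.

```python
import io
import zipfile


def list_entries(source):
    """Return (name, crc32_hex, uncompressed_size) for each ZIP entry.

    Only the archive's headers are read; no entry is decompressed.
    """
    with zipfile.ZipFile(source) as archive:
        return [
            (info.filename, f"{info.CRC:08x}", info.file_size)
            for info in archive.infolist()
        ]


def build_demo_archive():
    """Build a tiny in-memory ZIP standing in for a real .docx (made-up content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<w:document/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, crc, size in list_entries(build_demo_archive()):
        print(f"{size:>8} {crc} {name}")
```

Passing a path such as example.docx to list_entries instead of the in-memory buffer should work the same way.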

A cyclic redundancy check (CRC) is an algorithm designed to detect errors or unintended changes to data. The idea is that a system can calculate a CRC value before and after a transfer or transformation of data as a simple way to ensure its integrity. For ZIP archives, the CRC-32 values confirm the decompressed files are the same as they were prior to compression. That’s great and all, but these values can serve other use cases too.
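To make the integrity check concrete, the sketch below compares the CRC-32 stored in a ZIP entry with one recomputed from the decompressed data using zlib.crc32, which implements the same CRC-32 variant ZIP uses. The helper names are hypothetical, and the archive is built in memory purely for illustration.

```python
import io
import zipfile
import zlib


def make_archive(entry_name, payload):
    """Create a one-entry ZIP archive in memory (illustrative data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(entry_name, payload)
    return buffer.getvalue()


def crc_matches(archive_bytes, entry_name):
    """Compare the CRC-32 stored in the ZIP with one computed from the data."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        stored = archive.getinfo(entry_name).CRC
        # archive.read() decompresses the entry; zipfile also verifies the
        # CRC itself, so the explicit comparison here is only for show.
        computed = zlib.crc32(archive.read(entry_name)) & 0xFFFFFFFF
    return stored == computed


if __name__ == "__main__":
    data = make_archive("word/document.xml", b"<w:document/>")
    print(crc_matches(data, "word/document.xml"))
```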

Detection

Forget about error detection. A ZIP CRC-32 value is essentially a small hash of the uncompressed file, and what better way to identify a file than by its hash? While the chance of a collision for CRC-32 is significantly higher than for other algorithms such as SHA-256 or even MD5, it can be paired with additional metadata like the file name (or extension) and size to reduce false positives.
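As a rough illustration of pairing the CRC-32 with extra metadata, the sketch below fingerprints every entry as a (CRC-32, uncompressed size, extension) tuple and reports the fingerprints shared by two or more documents. This is not how the post's tooling works. It is only an assumed way to explore the idea offline, and all function names are hypothetical.

```python
import io
import posixpath
import zipfile
from collections import defaultdict


def demo_archive(entries):
    """Build an in-memory ZIP from a dict of name -> bytes (made-up data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in entries.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def fingerprints(archive_bytes):
    """Return a set of (crc32, size, extension) tuples, one per ZIP entry."""
    result = set()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            extension = posixpath.splitext(info.filename)[1].lower()
            result.add((info.CRC, info.file_size, extension))
    return result


def shared_entries(documents):
    """Map each fingerprint to the labels of the documents containing it.

    `documents` is a dict of label -> raw archive bytes. Only fingerprints
    seen in two or more documents are returned.
    """
    seen = defaultdict(set)
    for label, data in documents.items():
        for fingerprint in fingerprints(data):
            seen[fingerprint].add(label)
    return {fp: labels for fp, labels in seen.items() if len(labels) > 1}
```

Two documents embedding the same image under different names would still share a fingerprint, because the extension is compared rather than the full path.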

Here’s a hex dump of the first local file header from the previous example (Figure 2):

Figure 2: Hex dump of the first local file header for example.docx
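For readers without the figure at hand, the layout of a local file header can be sketched with Python's struct module. The fixed part is 30 bytes long: the CRC-32 sits at offset 14, the uncompressed size at offset 22 (CRC + 8), the file name length at offset 26 (CRC + 12), and the file name itself at offset 30 (CRC + 16). The function name and the demo archive are hypothetical, and the sketch assumes the writer stored the CRC and sizes in the local header rather than in a trailing data descriptor.

```python
import io
import struct
import zipfile
import zlib

# signature, version, flags, method, mod time, mod date,
# crc32, compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # 30 bytes, little-endian
LOCAL_SIGNATURE = 0x04034B50  # the bytes "PK\x03\x04"


def parse_first_local_header(data):
    """Decode the local file header at the start of a ZIP archive."""
    (signature, _version, _flags, _method, _time, _date,
     crc32, _compressed, uncompressed, name_len, _extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if signature != LOCAL_SIGNATURE:
        raise ValueError("not a ZIP local file header")
    name_start = LOCAL_HEADER.size
    name = data[name_start:name_start + name_len].decode("utf-8", "replace")
    return {"crc32": crc32, "size": uncompressed, "name": name}


def demo_bytes():
    """One-entry archive with made-up content, for illustration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", b"<Types/>")
    return buffer.getvalue()


if __name__ == "__main__":
    header = parse_first_local_header(demo_bytes())
    print(f"{header['crc32']:08x} {header['size']} {header['name']}")
```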

Using the CRC-32, uncompressed file size, and file name fields, a YARA rule for this entry can be written as follows:

rule content_types {
    meta:
        author = "Aaron Stephens <aaron.stephens@mandiant.com>"
        description = "Example OOXML rule."

    strings:
        $crc = { 32 91 6f 57 }
        $name = "[Content_Types].xml"
        $size = { a5 05 00 00 }

    condition:
        $size at @crc[1] + 8 and $name at @crc[1] + 16
}

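NOTE: The numeric fields are stored in little-endian byte order, so the CRC-32 value 576f9132 appears in the file as 32 91 6f 57 and the size 1445 (0x05a5) as a5 05 00 00.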
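The byte-order conversion is easy to get wrong by hand, so here is a small sketch that turns a CRC-32 and an uncompressed size into YARA hex strings and assembles a rule shaped like the one above. This is an illustrative assumption of how such a rule could be generated, not the implementation of any released tool. The function names are hypothetical, and entry names containing quotes or backslashes would need escaping that this sketch omits.

```python
import struct


def le_hex(value):
    """Render a 32-bit integer as little-endian bytes for a YARA hex string."""
    return " ".join(f"{byte:02x}" for byte in struct.pack("<I", value))


def build_rule(rule_name, entry_name, crc32, size):
    """Assemble a YARA rule matching one entry by CRC-32, size, and name."""
    return "\n".join([
        f"rule {rule_name} {{",
        "    strings:",
        f"        $crc = {{ {le_hex(crc32)} }}",
        f'        $name = "{entry_name}"',
        f"        $size = {{ {le_hex(size)} }}",
        "",
        "    condition:",
        "        $size at @crc[1] + 8 and $name at @crc[1] + 16",
        "}",
    ])


if __name__ == "__main__":
    print(build_rule("content_types", "[Content_Types].xml", 0x576F9132, 1445))
```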

Examples

Advanced Practices uses this technique to find similar documents that contain the same embedded file over time. Here are a couple of real-world examples:

Document: 397ba1d0601558dfe34cd5aafaedd18e
File: 0dc39af4899f6aa0a8d29426aba59314 (word\media\image1.png)
Groups: UNC1130, UNC1837, UNC1965

rule png_397ba1d0601558dfe34cd5aafaedd18e {
    meta:
        author = "Aaron Stephens <aaron.stephens@mandiant.com>"
        description = "PNG in OOXML document."

    strings:
        $crc = {f8158b40}
        $ext = ".png"
        $ufs = {b42c0000}

    condition:
        $ufs at @crc[1] + 8 and $ext at @crc[1] + uint16(@crc[1] + 12) + 16 - 4
}

This rule detects OOXML documents that contain the specific PNG image shown in Figure 3.
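The offset arithmetic in the condition is worth spelling out: the file name starts 16 bytes after the CRC-32, its length is the 16-bit value 12 bytes after the CRC-32, and the four-character extension therefore begins at the end of the name minus 4. The sketch below emulates that logic over raw bytes in Python as a reading aid. It is not a replacement for YARA, and the function names and demo data are hypothetical.

```python
import io
import struct
import zipfile
import zlib


def matches_extension_rule(data, crc_bytes, size_bytes, ext=b".png"):
    """Emulate: $ufs at @crc[1] + 8 and
    $ext at @crc[1] + uint16(@crc[1] + 12) + 16 - 4"""
    crc_at = data.find(crc_bytes)  # @crc[1]: first occurrence of the CRC bytes
    if crc_at < 0:
        return False
    if data[crc_at + 8:crc_at + 12] != size_bytes:
        return False
    if crc_at + 14 > len(data):
        return False
    (name_len,) = struct.unpack_from("<H", data, crc_at + 12)
    ext_at = crc_at + 16 + name_len - len(ext)  # end of the name minus 4 for ".png"
    return data[ext_at:ext_at + len(ext)] == ext


def demo_document(entry_name, payload):
    """One-entry in-memory archive with made-up content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(entry_name, payload)
    return buffer.getvalue()


if __name__ == "__main__":
    payload = b"not really a PNG"
    crc = struct.pack("<I", zlib.crc32(payload))
    size = struct.pack("<I", len(payload))
    doc = demo_document("word/media/image1.png", payload)
    print(matches_extension_rule(doc, crc, size))
```

Matching on the extension rather than the full name lets the rule follow the same image even when it is stored under a different file name.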

Figure 3: PNG embedded in phishing documents

The image in Figure 3 is found in several documents dropping LATEOP and has been attributed to groups such as UNC1130, a North Korean state-sponsored threat actor.

Document: 252227b8701d45deb0cc6b0edad98836
File: 3bdfaf98d820a1d8536625b9efd3bb14 ([Content_Types].xml)
Groups: FIN7

rule xml_252227b8701d45deb0cc6b0edad98836 {
    meta:
        author = "Aaron Stephens <aaron.stephens@mandiant.com>"
        description = "[Content_Types].xml in OOXML document."

    strings:
        $crc = {8cf0d220}
        $name = "[Content_Types].xml"
        $ufs = {9b060000}

    condition:
        $ufs at @crc[1] + 8 and $name at @crc[1] + 16
}

This rule detects a specific [Content_Types].xml file, which is shown (formatted) in Figure 4.

Figure 4: Formatted [Content_Types].xml file

This file maps different parts of the OOXML package to their content type. Given a unique enough combination of parts and types, the [Content_Types].xml file can be a great way to find similar OOXML documents. This particular example is found in multiple FIN7 GRIFFON samples.
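To see what that mapping looks like in practice, the sketch below parses a [Content_Types].xml document with Python's standard XML parser and returns the extension defaults and per-part overrides. The sample XML is a trimmed, made-up example rather than the file from Figure 4, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the OOXML package content-types part.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

SAMPLE = """\
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
"""


def content_type_map(xml_text):
    """Return (defaults, overrides) dictionaries from [Content_Types].xml."""
    root = ET.fromstring(xml_text)
    defaults = {
        element.get("Extension"): element.get("ContentType")
        for element in root.iter(CT_NS + "Default")
    }
    overrides = {
        element.get("PartName"): element.get("ContentType")
        for element in root.iter(CT_NS + "Override")
    }
    return defaults, overrides


if __name__ == "__main__":
    defaults, overrides = content_type_map(SAMPLE)
    for part, content_type in overrides.items():
        print(part, "->", content_type)
```

An unusual set of parts or content types changes the bytes of this file and therefore its CRC-32 and size, which is why it can serve as a clustering feature.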

Tooling

Last but not least, it’s time to introduce apooxml, a Python tool that can be used to quickly and easily generate YARA rules just like these. Here’s how it works:

➜ python3 apooxml.py -h
usage: apooxml.py [-h] [-a AUTHOR] [-n NAME] [-o OUT] sample

Generate YARA rules for OOXML documents.

positional arguments:
  sample                OOXML document to generate YARA rule from.

optional arguments:
  -h, --help            show this help message and exit
  -a AUTHOR, --author AUTHOR
                        YARA rule author.
  -n NAME, --name NAME  YARA rule name.
  -o OUT, --out OUT     YARA rule file name.

➜ python3 apooxml.py -o 'example.yara' 397ba1d0601558dfe34cd5aafaedd18e
1. [Content_Types].xml             1980-01-01 00:00:00 14506c9d 1613
2. _rels/.rels                     1980-01-01 00:00:00 b71a911e 590
3. word/_rels/document.xml.rels    1980-01-01 00:00:00 ab5e83b7 1207
4. word/document.xml               1980-01-01 00:00:00 44c9bf93 2692
5. word/_rels/vbaProject.bin.rels 1980-01-01 00:00:00 ef601408 277
6. word/vbaProject.bin             1980-01-01 00:00:00 ab54dacf 10752
7. word/media/image1.png           1980-01-01 00:00:00 408b15f8 11444
8. word/theme/theme1.xml           1980-01-01 00:00:00 4276c88b 7088
9. word/settings.xml               1980-01-01 00:00:00 17044d98 2750
10. word/vbaData.xml                1980-01-01 00:00:00 9209afe1 1292
11. word/fontTable.xml              1980-01-01 00:00:00 37e3715b 960
12. word/stylesWithEffects.xml      1980-01-01 00:00:00 c883d0b1 16755
13. docProps/app.xml                1980-01-01 00:00:00 3cc6382c 982
14. word/webSettings.xml            1980-01-01 00:00:00 4e16a017 428
15. docProps/core.xml               1980-01-01 00:00:00 8cef183c 643
16. word/styles.xml                 1980-01-01 00:00:00 1f9b9145 16002

Enter a number corresponding to the desired entry: 7

Wrote YARA rule to example.yara.

➜ cat example.yara
rule ooxml_png_crc_397ba1d0601558dfe34cd5aafaedd18e {
meta:
author = "apooxml"
description = "Generated by apooxml."
reference_md5 = "397ba1d0601558dfe34cd5aafaedd18e"