Ghetto Forensics: 2014

11 November 2014

DJ Forensics: Analysis of Sound Mixer Artifacts

In many forensics examinations, including those of civil and criminal nature, there is an art to finding remnants of previously installed applications. Fearing detection, or assuming that an examination is forthcoming, many suspects attempt to remove unauthorized or suspicious applications from a system. Such attempts are usually unsuccessful and result only in additional hours of processing for forensics. But even with a clean uninstall there are traces left within the Windows registry that note such a program was installed.

The most popular of these is the Windows Shim Cache (a/k/a Application Compatibility Database, a/k/a AppCompatCache), a resource that can be used to catalog applications not natively compiled for newer Windows. It's also a resource that works great for finding APT-related malware running on a system, but not so much legitimate applications.

For a few months I've been playing with another repository of applications: the Windows Sound Mixer. Whenever an application requests the use of the Windows audio drivers, Windows will automatically register this application in the registry. This information is stored so that Windows can create per-application sound settings:

This was a resource I dismissed for a year. It existed only in Windows Vista and newer, it didn't catch any of the malware I threw at it, and wasn't relevant to any of the Incident Response work I do**. Its importance came to me when working some cases that came mixed in with many of my intrusion cases where I had to examine the systems owned by various hackers. One in particular involved tracking the use of alternative web browsers and discovering that the Sound Mixer had catalogued the use, and location, of Tor Browser launched from a TrueCrypt volume. Clear as day, the path even noted that it was a TrueCrypt volume based upon the Windows device name:

\Device\TrueCryptVolumeP\Tor\App\Firefox\firefox.exe

I learned that the registry keys were useful for such cases, but there has been no prior public discussion of the forensic use of this data.

A Walkthrough for FLARE RE Challenges

The FireEye Labs Advanced Reverse Engineering (FLARE) challenge was causing a bit of a buzz when it was announced and launched in early July. It read like a recruitment campaign for a new division within FireEye, but still a fun challenge to partake in. The challenge started ... and I was on-site at a client site for the week and forgot all about it.

Busy under the pressure of releasing the new Dissecting the Hack book, the challenge went to the back of my mind until the 24th of July. I was facing a pretty hard-hitting bit of writer's block and frustration. I agreed to let myself have a break to do the challenge for one week before getting back to my commitments.

After the challenge finished, and answers and methods started floating around (notably CodeAndSec's), I realized that many of my tactics were completely different from others. I'm sure that's to be expected from this field; many of my methods were from working in an environment without Internet access and without an expert on-call for answers. I'm also fully from a forensics background (hex editors and data structures and file parsers, etc), and work in commercial incident response with RSA, so I'd be sure to tackle problems different from someone who had a pentesting or vuln discovery background.

Although the challenge started in early July, it ran up until 15 September. There was an unspoken moratorium on answers/solutions while the challenge ran, but now that the samples are all freely available from their website, some are coming forth with how they completed them.

This is my story.

Challenge 1

The first challenge started out with downloading an executable from the flare-on.com website. I immediately threw it into IDA and started poking around, only to discover that it was just a dropper for the real challenge :) I let it do its thing, agreed to a EULA without reading it, and received the first challenge file:

File Name       : Challenge1.exe
File Size       : 120,832 bytes
MD5             : 66692c39aab3f8e7979b43f2a31c104f
SHA1            : 5f7d1552383dc9de18758aa29c6b7e21ca172634
Fuzzy           : 3072:vaL7nzo5UC2ShGACS3XzXl/ZPYHLy7argeZX:uUUC2SHjpurG
Import Hash     : f34d5f2d4577ed6d9ceec516c1f5a744
Compiled Time   : Wed Jul 02 19:01:33 2014 UTC
PE Sections (3) : Name       Size       MD5
                  .text      118,272    e4c64d5b55603ecef3562099317cad76
                  .rsrc      1,536      6adbd3818087d9be58766dccc3f5f2bd
                  .reloc     512        34db3eafce34815286f13c2ea0e2da70
Magic           : PE32 executable for MS Windows (GUI) Intel 80386 32-bit Mono/.Net assembly
.NET Version    : 2.0.0.0
SignSrch        : offset   num  description [bits.endian.size]
                  0040df85 2875 libavcodec ff_mjpeg_val_ac_luminance [..162]
                  0040e05d 2876 libavcodec ff_mjpeg_val_ac_chrominance [..162]

Based on this output from the file, I can see first of all that it was built from .NET 2.0 (based on the mscorlib import). That'll be important later to determine what tool to analyze the file in; IDA Pro is not the most conducive tool for .NET applications.

At this point, let's run it and see what happens:

The image shows some happy little trees and a great, big "DECODE" button. Clicking this button changes the image thusly:

Without any clue of what to do with the challenge, or what I'm actually looking for, I open it up for analysis with Red Gate .NET Reflector.

Malware with No Strings Attached Part 2 - Static Analysis

In the previous post I showed some dynamic analysis procedures for a variant of a trojan known to Symantec as Coreflood. Based on the dynamic analysis, we discovered that the analyzed sample contained very few strings of use. It decrypted an embedded executable, which was injected into memory for execution. It dropped an encrypted file to the hard drive, then downloaded a second-stage executable from the Internet. This downloaded file was then encrypted in what appeared to be a similar routine as the dropped file.

However, in the end, we still had many questions that couldn't be answered:

What is the encryption routine used for thr1.chm and mmc109.exe?
Why does the malware rename mmc109.exe to mmc61753109.exe?
Why does the malware first make a network connection to Adobe.com? What does that POST value mean?
Why does the initial loader contain so few strings?

These questions are best answered through static analysis. However, static analysis can be an extremely time-consuming activity. A full "deep dive" static analysis could take days, or even weeks, to fully document all functionality within a malware sample. Instead, this post will go through the process of a targeted static analysis. With the questions laid out here, we'll focus our static analysis solely on answering those questions, which will mean leaving many aspects of the malware undocumented.

Therefore, we need to focus on what questions can be answered within an hour or two of static analysis.

These questions mirror some of those performing Incident Response work. During incident response work a malware analyst works in conjunction with forensics and the responders to help answer the really difficult questions on why certain indicators are seen, what they mean, and what other indicators should be searched for that may have been missed.

This also answers the concerns of inaccurate preconceptions from those outside the field. When I tell a client that a sample has encoded data and will take a bit longer, I immediately get push back on the expectation that a simple routine may add 40+ hours of billable time. Likewise, if I say that a sample has a custom encryption routine, I'd often get pinged every hour on why it's not decoded yet.

This post will show some of my personal workflow and mentality when trying to analyze malware, while trying to extract indicators to assist in an overall forensics examination or incident response. As I'm new to much in the RE world, having only learned through self-training and on-the-job work, I'd love any feedback for ways in which to speed up or better hone topics that I covered here.

Malware with No Strings Attached Part 1 - Dynamic Analysis

I had the honor of lecturing for Champlain College's graduate level Malware Analysis course this week. One of the aspects of the lecture was showing off dynamic analysis with my Noriben script and some of the indicators I would look for when running malware.

While every malware site under the sun can tell you how to do malware dynamic analysis, I wanted to write a post on how I, personally, perform dynamic analysis. Some of the things I look for, some things I've learned to ignore, and how to go a little bit above and beyond to answer unusual questions. And, if the questions can't be answered, how to obtain good clues that could help you or another analyst understand the data down the road.

Additionally. I've been meaning to write up a malware analysis post for awhile, but haven't really found any malware that's been really interesting enough. Most were overly complex, many overly simple, and most just too boring to write on. Going back through prior incidents, I remembered a large scale response we worked involving a CoreFlood compromise. While this post won't be on the same malware, it's from a similar variant:

MD5: 4f45df18209b840a2cf4de91501847d1
SSDEEP: 768:ofATWbDPImK/fJQTR5WSgRlo5naTKczgYtWc5bCQHg:uk6chnWESgRKcnWc5uF
Size: 48640 bytes

Note: I cannot host the file here, but it can be obtained through VirusTotal (for those with privileges) or directly from Malwr with a free registered account.

This is not a ground-breaking malware sample. The techniques here are not new. I want to simply show a typical workflow of analyzing malware and overcoming the challenges that appear in doing so.

There are multiple levels of complexity to this sample, too much for a single post, including ways in which it encrypts embedded data and strings. Therefore, this post will focus on the dynamic artifacts of running the malware and examining the files left behind. On the next post, we'll use IDA Pro to dig deeper into reversing the logic used by the malware.

Is Google Scanning Malware Email Attachments Between Researchers

Disclaimer: This post is based upon experiences I found when sending malware via GMail (Google Mail). I'm documenting them here for others to: disprove, debate, confirm, or to downplay its importance.

Update:

In the comments below, a member of Google's AntiVirus Infrastructure team provided insight into this issue. A third-party AV engine used by GMail was designed by the third-party to automatically open ZIP files with a password of 'infected'. I want to thank Google for their attention to the matter as it shows that there was no ill-intent or deliberate scanning.

As a professional malware analyst and a security researcher, a sizable portion of my work is spent collaborating with other researchers over attack trends and tactics. If I hit a hurdle in my analysis, it's common to just send the binary sample to another researcher with an offset location and say "What does this mean to you?"

That was the case on Valentine's Day, 14 Feb 2014. While working on a malware static analysis blog post, to accompany my dynamic analysis blog post on the same sample, I reached out to a colleague to see if he had any advice on an easy way to write an IDAPython script (for IDA Pro) to decrypt a set of encrypted strings.

There is a simple, yet standard, practice for doing this type of exchange. Compress the malware sample within a ZIP file and give it a password of 'infected'. We know we're sending malware samples, but need to do it in a way that:

a. an ordinary person cannot obtain the file and accidentally run it;
b. an automated antivirus system cannot detect the malware and prevent it from being sent.

However, on that fateful day, the process stopped. Upon compressing a malware sample, password protecting it, and attaching it to an email I was stopped. GMail registered a Virus Alert on the attachment.

Stunned, I try again to see the same results. My first thought was that I forgot to password-protect the file. I erased the ZIP, recreated it, and received the same results. I tried with a different password - same results. I used a 24-character password... still flagged as malicious.

The instant implications of this initial test were staggering; was Google password cracking each and every ZIP file it received, and had the capability to do 24-character passwords?! No, but close.

Google already opens any standard ZIP file that is attached to emails. The ZIP contents are extracted and scanned for malware, which will prevent its attachment. This is why we password protect them. However, Google is now attempting to guess the password to ZIP files, using the password of 'infected'. If it succeeds, it extracts the contents and scans them for malware.

Google is attempting to unzip password-protected archives by guessing at the passwords. To what extent? We don't know. But we can try to find out.

I obtained the list of the 25 most common passwords and integrated them (with 'infected') into a simple Python script below:

import subprocess
pws = ["123456","password","12345678","qwerty","abc123","123456789","111111","1234567","iloveyou","adobe123","123123","sunshine","1234567890","letmein","photoshop","1234","monkey","shadow","sunshine","12345","password1","princess","azerty","trustno1","000000","infected"]
for pw in pws:
    cmdline = "7z a -p%s %s.zip malware.livebin" % (pw, pw)
    subprocess.call(cmdline)

This script simply compressed a known malware sample (malware.livebin) into a ZIP of the same password name. I then repeated these steps to create respective 7zip archives.

I then created a new email and simply attached all of the files:

Of all the files created, all password protected, and each containing the exact same malware, only the ZIP file with a password of 'infected' was scanned. This suggests that Google likely isn't using a sizable word list, but it's known that they are targeting the password of 'infected'.

To compensate, researchers should now move to a new password scheme, or the use of 7zip archives instead of ZIP.

Further testing was performed to determine why subsequent files were flagged as malicious, even with a complex password. As soon as Google detects a malicious attachment, it will flag that attachment File Name and prevent your account from attaching any file with the same name for what appears to be five minutes. Therefore, even if I recreated infected.zip with a 50-char password, it would still be flagged. Even if I created infected.zip as an ASCII text document full of spaces, it would still be flagged.

In my layman experience, this is a very scary grey area for active monitoring of malware. In the realm of spear phishing it is common to password protect an email attachment (DOC/PDF/ZIP/EXE) and provide the password in the body to bypass AV scanners. However, I've never seen any attack foolish enough to use a red flag word like "infected", which would scare any common computer user away (... unless someone made a new game called Infected? ... or a malicious leaked album set from Infected Mushroom?)

Regardless of the email contents, if they are sent from one consenting adult to another, in a password-protected container, there is an expectation of privacy between the two that is violated upon attempting to guess passwords en masse.

And why is such activity targeted towards the malware community, who uses this process to help build defenses against such attacks?

Notes:

Emails were sent from my Google Apps (GAFYD) account.
Tests were also made using non-descript filenames (e.g. a.txt).
Additional tests were made to alter the CRC32 hash within the ZIPs (appending random bytes to the end of each file), and any other metadata that could be targeted.
The password "infected" was not contained in the subject nor body during the process.

Updates:
There was earlier speculation that the samples may have been automatically sent to VirusTotal for scanning. As shown in the comments below, Bernardo Quintero from VirusTotal has denied that this is occurring. I've removed the content from this post to avoid any future confusion.

Others have come forth to say that they've seen this behavior for some time. However, I've been able to happily send around files until late last week. This suggests that the feature is not evenly deployed to all GMail users.

A member of Google's team replied below noting that this activity was due to a third-party antivirus engine used by Google.

The owner of VirusShare.com, inspired by this exchange, attempted to locate what engine this could be by uploading choice samples to VirusTotal. His uploads showed one commonality, NANO-Antivirus:

My own tests also showed positive hits from NANO-Antivirus.

At the very least, this shows how one minor, well-meaning feature in an obscure antivirus engine can cause waves of doubt and frustration to anyone who decides to use it without thorough testing.

03 January 2014

A GhettoForensics Look Back on 2013

This site, Ghetto Forensics, was started this year as the beginning of an effort to better document some of the side work that I do that I thought would be appealing, or humorous, to the overall industry. This content was originally posted to my personal web site, thebaskins.com, but really needed a site of its own.

My first public project this year was reversing, documenting, and writing a parser for Java IDX files, cached files that accompany any file downloaded via Java. It was a bit of a painful project, mainly due to the bad documentation provided by Oracle, not to mention the horrendous style in which they designed it. I immediately released the code to the public and have received great feedback for improvements, as well as quite a few examiners touting how much they used it in their examinations. Thank you!

However, my greatest project this year was the release of Noriben. I first designed Noriben as a simple script for me to use at home for really quick malware dynamic analysis. I lacked many of the tools and sandboxes that I use at my day job, and needed a quick triage tool for research. After a few months, I realized that many commercial groups were in the exact same situation as I was at home: a severe lack of funding to purchase software to help. So, I cleaned up the code, gave it a silly name, and released it into the world. I've received numerous feedback and suggestions from all over, all of which were incorporated into the code. While its usage is widely unknown, for practical reasons, I did learn of quite a few Defense organizations, as well as a handful of Fortune organizations that incorporated it into their workflow. Awesome!

Research-wise, I released a comparison of various Java disassembly and decompilation tools, having found the standard JD-GUI to be extremely lacking for modern Java malware. The positive side of this is introducing tools to security professionals that were previously unknown to them. The research itself changed the tools that I use on a regular basis and allowed me to create a better product, faster, for reversing Java applications.

For community projects, I wrote a small malware configuration dumper template for Volatility, based on some time-reducing work I've been practicing. Whenever I do a full reversal of malware, I now try to write a memory configuration dumper. That way, in a few months when they change the encryption routine, I can still retrieve the same configuration and getting the report out instantly, then go back and figure out the encryption.

Ghetto Forensics