22 September 2014

A Walkthrough for FLARE RE Challenges



The FireEye Labs Advanced Reverse Engineering (FLARE) challenge caused a bit of a buzz when it was announced and launched in early July. It read like a recruitment campaign for a new division within FireEye, but it was still a fun challenge to partake in. The challenge started ... and I was on-site with a client for the week and forgot all about it.

Busy under the pressure of releasing the new Dissecting the Hack book, I pushed the challenge to the back of my mind until the 24th of July. I was facing a pretty hard-hitting bit of writer's block and frustration, so I agreed to let myself have a one-week break to do the challenge before getting back to my commitments.

After the challenge finished, and answers and methods started floating around (notably CodeAndSec's), I realized that many of my tactics were completely different from others'. I'm sure that's to be expected in this field; many of my methods came from working in an environment without Internet access and without an expert on call for answers. I'm also fully from a forensics background (hex editors, data structures, file parsers, etc.), and work in commercial incident response with RSA, so I'd be sure to tackle problems differently from someone with a pentesting or vuln-discovery background.

Although the challenge started in early July, it ran up until 15 September. There was an unspoken moratorium on answers/solutions while the challenge ran, but now that the samples are all freely available from their website, some are coming forth with how they completed them.

This is my story.


Challenge 1


The first challenge started out with downloading an executable from the flare-on.com website. I immediately threw it into IDA and started poking around, only to discover that it was just a dropper for the real challenge :)  I let it do its thing, agreed to a EULA without reading it, and received the first challenge file:

File Name       : Challenge1.exe
File Size       : 120,832 bytes
MD5             : 66692c39aab3f8e7979b43f2a31c104f
SHA1            : 5f7d1552383dc9de18758aa29c6b7e21ca172634
Fuzzy           : 3072:vaL7nzo5UC2ShGACS3XzXl/ZPYHLy7argeZX:uUUC2SHjpurG
Import Hash     : f34d5f2d4577ed6d9ceec516c1f5a744
Compiled Time   : Wed Jul 02 19:01:33 2014 UTC
PE Sections (3) : Name       Size       MD5
                  .text      118,272    e4c64d5b55603ecef3562099317cad76
                  .rsrc      1,536      6adbd3818087d9be58766dccc3f5f2bd
                  .reloc     512        34db3eafce34815286f13c2ea0e2da70
Magic           : PE32 executable for MS Windows (GUI) Intel 80386 32-bit Mono/.Net assembly
.NET Version    : 2.0.0.0
SignSrch        : offset   num  description [bits.endian.size]
                  0040df85 2875 libavcodec ff_mjpeg_val_ac_luminance [..162]
                  0040e05d 2876 libavcodec ff_mjpeg_val_ac_chrominance [..162]

Based on this output, I can see first of all that the file was built on .NET 2.0 (based on the mscorlib import). That'll be important later in determining which tool to analyze the file with; IDA Pro is not the most conducive tool for .NET applications.
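The triage listing above came from my own tooling, but most of it can be reproduced with Python's hashlib and the third-party pefile module. A minimal sketch (the function names here are mine, not from any released tool):

```python
import hashlib

def file_hashes(data: bytes) -> dict:
    """MD5/SHA1 of the raw file bytes, as in the listing above."""
    return {"md5": hashlib.md5(data).hexdigest(),
            "sha1": hashlib.sha1(data).hexdigest()}

def pe_triage(path: str) -> dict:
    """Compile time, import hash, and per-section sizes/MD5s via pefile
    (third-party: pip install pefile)."""
    import pefile
    pe = pefile.PE(path)
    return {
        "imphash": pe.get_imphash(),
        "compile_time": pe.FILE_HEADER.TimeDateStamp,  # epoch seconds, UTC
        "sections": {
            s.Name.rstrip(b"\x00").decode(): (s.SizeOfRawData,
                                              hashlib.md5(s.get_data()).hexdigest())
            for s in pe.sections},
    }
```

The Fuzzy (ssdeep) and SignSrch lines come from separate tools and aren't reproduced here.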

At this point, let's run it and see what happens:



The image shows some happy little trees and a great, big "DECODE" button. Clicking this button changes the image thusly:



Without any clue of what to do with the challenge, or what I'm actually looking for, I open it up for analysis with Red Gate .NET Reflector.

23 April 2014

Moving On to New Career Opportunities

In the next few days I will be moving on from my current work and into a new and exciting opportunity. As I worked through this transition, while writing a book and preparing con talks, I started to think of the practical and emotional tasks needed to ensure that my current employer and clients are taken care of while I prepare for the future.

In this effort, I wanted to pass on a few ideas that may help others.

Personal Side


To begin with, let's discuss the personal side of the move. I've been working with the Defense Cyber Crime Center (DC3) for almost 14 years. I've been with them since before they were even named DC3, when they were just the Defense Computer Forensics Lab (DCFL) and the Defense Cyber Investigations Training Program (DCITP a.k.a. The 'TIP). Also, since we've had "Cyber" in our agency name since the late '90s, I'm fully allowed to use it in regular conversation without drinks.

I've said goodbye to DC3 once before, temporarily, as I moved on from being the Deputy Technical Lead of the Training Academy. I left with the weight of a serious case of burnout and needed a break. Getting into the down and dirty of regular forensic work was the fix.

I joined my good friends Eoghan Casey, Terrance Maguire, and Chris Daywalt in their venture, cmdLabs. We worked awesome incident response cases together and delved into research projects and code development. At about this time, cmdLabs was acquired by Newberry Group, run by a CEO and VP that I had known for over a decade. Life was good and, after a cool-down period, I went back into DC3 on a separate contract to work on their Intrusions team.

17 February 2014

Malware with No Strings Attached Part 2 - Static Analysis

In the previous post I showed some dynamic analysis procedures for a variant of a trojan known to Symantec as Coreflood. Based on the dynamic analysis, we discovered that the analyzed sample contained very few strings of use. It decrypted an embedded executable, which was injected into memory for execution. It dropped an encrypted file to the hard drive, then downloaded a second-stage executable from the Internet. This downloaded file was then encrypted in what appeared to be a similar routine as the dropped file.

However, in the end, we still had many questions that couldn't be answered:

  • What is the encryption routine used for thr1.chm and mmc109.exe?
  • Why does the malware rename mmc109.exe to mmc61753109.exe?
  • Why does the malware first make a network connection to Adobe.com? What does that POST value mean?
  • Why does the initial loader contain so few strings?
These questions are best answered through static analysis. However, static analysis can be an extremely time-consuming activity. A full "deep dive" static analysis could take days, or even weeks, to fully document all functionality within a malware sample. Instead, this post will go through the process of a targeted static analysis. With the questions laid out here, we'll focus our static analysis solely on answering those questions, which will mean leaving many aspects of the malware undocumented.

Therefore, we need to focus on what questions can be answered within an hour or two of static analysis.

These questions mirror some of those asked during Incident Response work. In that setting, a malware analyst works in conjunction with forensic examiners and responders to help answer the really difficult questions: why certain indicators are seen, what they mean, and what other indicators should be searched for that may have been missed.

This also addresses the inaccurate preconceptions held by those outside the field. When I tell a client that a sample has encoded data and will take a bit longer, I immediately get pushback on the expectation that a simple routine may add 40+ hours of billable time. Likewise, if I say that a sample has a custom encryption routine, I often get pinged every hour on why it's not decoded yet.

This post will show some of my personal workflow and mentality when trying to analyze malware, while trying to extract indicators to assist in an overall forensics examination or incident response. As I'm new to much in the RE world, having only learned through self-training and on-the-job work, I'd love any feedback for ways in which to speed up or better hone topics that I covered here.

16 February 2014

Is Google Scanning Malware Email Attachments Between Researchers?

Disclaimer: This post is based upon experiences I had when sending malware via GMail (Google Mail). I'm documenting them here for others to disprove, debate, confirm, or downplay in importance.

Update:

In the comments below, a member of Google's AntiVirus Infrastructure team provided insight into this issue. A third-party AV engine used by GMail was designed by the third-party to automatically open ZIP files with a password of 'infected'. I want to thank Google for their attention to the matter as it shows that there was no ill-intent or deliberate scanning.


As a professional malware analyst and a security researcher, a sizable portion of my work is spent collaborating with other researchers over attack trends and tactics. If I hit a hurdle in my analysis, it's common to just send the binary sample to another researcher with an offset location and say "What does this mean to you?"

That was the case on Valentine's Day, 14 Feb 2014. While working on a malware static analysis blog post, to accompany my dynamic analysis blog post on the same sample, I reached out to a colleague to see if he had any advice on an easy way to write an IDAPython script (for IDA Pro) to decrypt a set of encrypted strings.

There is a simple, yet standard, practice for doing this type of exchange. Compress the malware sample within a ZIP file and give it a password of 'infected'. We know we're sending malware samples, but need to do it in a way that:

          a. an ordinary person cannot obtain the file and accidentally run it;
          b. an automated antivirus system cannot detect the malware and prevent it from being sent.

However, on that fateful day, the process stopped. Upon compressing a malware sample, password protecting it, and attaching it to an email, I was stopped: GMail registered a Virus Alert on the attachment.

Stunned, I tried again, only to see the same results. My first thought was that I had forgotten to password-protect the file. I erased the ZIP, recreated it, and received the same results. I tried with a different password - same results. I used a 24-character password... still flagged as malicious.

The instant implications of this initial test were staggering; was Google password cracking each and every ZIP file it received, and had the capability to do 24-character passwords?! No, but close.

Google already opens any standard ZIP file that is attached to emails. The ZIP contents are extracted and scanned for malware, which will prevent its attachment. This is why we password protect them. However, Google is now attempting to guess the password to ZIP files, using the password of 'infected'. If it succeeds, it extracts the contents and scans them for malware.
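The scanner's guess-and-extract behavior can be reproduced locally. Python's zipfile module can read (though not write) ZipCrypto-protected archives, so a password check looks roughly like this (a sketch; the function name is mine):

```python
import zipfile

def opens_with_password(zip_path: str, password: bytes) -> bool:
    """Return True if every member of the archive can be extracted
    with the guessed password (e.g. b'infected')."""
    try:
        with zipfile.ZipFile(zip_path) as zf:
            for name in zf.namelist():
                with zf.open(name, pwd=password) as member:
                    member.read(16)  # a short read is enough to verify
        return True
    except (RuntimeError, zipfile.BadZipFile):
        # zipfile raises RuntimeError on a wrong ZipCrypto password
        return False
```

Note that an unencrypted archive will "open" with any guess, since the pwd argument is ignored for unprotected members.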

Google is attempting to unzip password-protected archives by guessing at the passwords. To what extent? We don't know. But we can try to find out.

I obtained the list of the 25 most common passwords and integrated them (with 'infected') into a simple Python script below:


import subprocess

# The year's most common passwords, plus the research standard 'infected'
pws = ["123456", "password", "12345678", "qwerty", "abc123", "123456789",
       "111111", "1234567", "iloveyou", "adobe123", "123123", "sunshine",
       "1234567890", "letmein", "photoshop", "1234", "monkey", "shadow",
       "12345", "password1", "princess", "azerty", "trustno1", "000000",
       "infected"]
for pw in pws:
    # Create <password>.zip containing the sample, encrypted with that password
    subprocess.call(["7z", "a", "-p" + pw, pw + ".zip", "malware.livebin"])

This script simply compressed a known malware sample (malware.livebin) into a series of ZIP files, each named after its password. I then repeated these steps to create the respective 7zip archives.

I then created a new email and simply attached all of the files:


Of all the files created, all password protected, and each containing the exact same malware, only the ZIP file with the password 'infected' was scanned. This suggests that Google isn't using a sizable word list, but it is clearly targeting the password 'infected'.

To compensate, researchers should now move to a new password scheme, or the use of 7zip archives instead of ZIP.

Further testing was performed to determine why subsequent files were flagged as malicious, even with a complex password. As soon as Google detects a malicious attachment, it flags that attachment's file name and prevents your account from attaching any file with the same name for what appears to be five minutes. Therefore, even if I recreated infected.zip with a 50-character password, it would still be flagged. Even if I created infected.zip as an ASCII text document full of spaces, it would still be flagged.

In my layman experience, this is a very scary grey area for active monitoring of malware. In the realm of spear phishing it is common to password protect an email attachment (DOC/PDF/ZIP/EXE) and provide the password in the body to bypass AV scanners. However, I've never seen any attack foolish enough to use a red flag word like "infected", which would scare any common computer user away (... unless someone made a new game called Infected? ... or a malicious leaked album set from Infected Mushroom?)

Regardless of the email contents, if they are sent from one consenting adult to another, in a password-protected container, there is an expectation of privacy between the two that is violated upon attempting to guess passwords en masse.

And why is such activity targeted towards the malware community, who uses this process to help build defenses against such attacks?

Notes:
  • Emails were sent from my Google Apps (GAFYD) account.
  • Tests were also made using nondescript filenames (e.g. a.txt).
  • Additional tests were made to alter the CRC32 hash within the ZIPs (appending random bytes to the end of each file), and any other metadata that could be targeted.
  • The password "infected" was not contained in the subject nor body during the process.

Updates:
There was earlier speculation that the samples may have been automatically sent to VirusTotal for scanning. As shown in the comments below, Bernardo Quintero from VirusTotal has denied that this is occurring. I've removed the content from this post to avoid any future confusion.

Others have come forth to say that they've seen this behavior for some time. However, I had been able to happily send files around until late last week. This suggests that the feature is not evenly deployed to all GMail users.

A member of Google's team replied below noting that this activity was due to a third-party antivirus engine used by Google.

The owner of VirusShare.com, inspired by this exchange, attempted to locate what engine this could be by uploading choice samples to VirusTotal. His uploads showed one commonality, NANO-Antivirus:


My own tests also showed positive hits from NANO-Antivirus.

At the very least, this shows how one minor, well-meaning feature in an obscure antivirus engine can cause waves of doubt and frustration to anyone who decides to use it without thorough testing.

08 February 2014

Malware with No Strings Attached - Dynamic Analysis

I had the honor of lecturing for Champlain College's graduate level Malware Analysis course this week. One of the aspects of the lecture was showing off dynamic analysis with my Noriben script and some of the indicators I would look for when running malware.

While every malware site under the sun can tell you how to do malware dynamic analysis, I wanted to write a post on how I, personally, perform dynamic analysis. Some of the things I look for, some things I've learned to ignore, and how to go a little bit above and beyond to answer unusual questions. And, if the questions can't be answered, how to obtain good clues that could help you or another analyst understand the data down the road.

Additionally, I've been meaning to write up a malware analysis post for a while, but hadn't really found any malware interesting enough. Some samples were overly complex, many overly simple, and most just too boring to write about. Going back through prior incidents, I remembered a large-scale response we worked involving a CoreFlood compromise. While this post won't be on the same malware, it's from a similar variant:

MD5: 4f45df18209b840a2cf4de91501847d1
SSDEEP: 768:ofATWbDPImK/fJQTR5WSgRlo5naTKczgYtWc5bCQHg:uk6chnWESgRKcnWc5uF
Size: 48640 bytes

This is not a ground-breaking malware sample. The techniques here are not new. I want to simply show a typical workflow of analyzing malware and overcoming the challenges that appear in doing so.

There are multiple levels of complexity to this sample, too much for a single post, including the ways in which it encrypts embedded data and strings. Therefore, this post will focus on the dynamic artifacts of running the malware and examining the files left behind. In the next post, we'll use IDA Pro to dig deeper into reversing the logic used by the malware.


03 January 2014

A GhettoForensics Look Back on 2013

This site, Ghetto Forensics, was started this year as the beginning of an effort to better document some of the side work I do that I thought would be appealing, or humorous, to the overall industry. This content was originally posted to my personal web site, thebaskins.com, but really needed a site of its own.

My first public project this year was reversing, documenting, and writing a parser for Java IDX files, cached files that accompany any file downloaded via Java. It was a bit of a painful project, mainly due to the bad documentation provided by Oracle, not to mention the horrendous style in which they designed it. I immediately released the code to the public and have received great feedback for improvements, as well as quite a few examiners touting how much they used it in their examinations. Thank you!

However, my greatest project this year was the release of Noriben. I first designed Noriben as a simple script for me to use at home for really quick malware dynamic analysis. I lacked many of the tools and sandboxes that I use at my day job, and needed a quick triage tool for research. After a few months, I realized that many commercial groups were in the exact same situation as I was at home: a severe lack of funding to purchase software to help. So, I cleaned up the code, gave it a silly name, and released it into the world. I've received numerous suggestions and much feedback from all over, all of which were incorporated into the code. While I have no practical way of knowing how widely it's used, I did learn of quite a few Defense organizations, as well as a handful of Fortune-listed organizations, that incorporated it into their workflow. Awesome!

Research-wise, I released a comparison of various Java disassembly and decompilation tools, having found the standard JD-GUI to be extremely lacking for modern Java malware. The positive side of this is introducing tools to security professionals that were previously unknown to them. The research itself changed the tools that I use on a regular basis and allowed me to create a better product, faster, for reversing Java applications.

For community projects, I wrote a small malware configuration dumper template for Volatility, based on some time-reducing work I've been practicing. Whenever I do a full reversal of malware, I now try to write a memory configuration dumper. That way, in a few months when the authors change the encryption routine, I can still retrieve the same configuration and get the report out instantly, then go back and figure out the new encryption.


11 October 2013

Dumping Malware Configuration Data from Memory with Volatility



When I first started delving into memory forensics, years ago, we relied upon controlled operating system crashes (to create memory crash dumps) or the old FireWire exploit with a special laptop. Later, software-based tools like dd and win32dd made the job much easier (and more entertaining, as we watched the feuds between mdd and win32dd).

In the early days, our analysis was basically performed with a hex editor. By collecting volatile data from an infected system, we'd attempt to map memory locations manually to known processes, an extremely frustrating and error-prone procedure. Even with the advent of graphical tools such as HBGary Responder Pro, which comes with a hefty price tag, I've found most of my time spent viewing raw memory dumps in WinHex.

The industry has slowly changed as tools like Volatility have gained maturity and become more feature-rich. Volatility is a free and open-source memory analysis tool that takes the hard work out of mapping and correlating raw data to actual processes. At first I shunned Volatility for its sheer amount of command-line memorization, where each query required a specialized command line. Over the years, I've come to appreciate this aspect and the flexibility it provides to an examiner.

It's on Volatility that I focus this blog post: using it to dump malware configurations from memory.

For those unfamiliar with the concept, it's rare to find static malware. That is, malware that has a plain-text URL in its .rdata section mixed in with other strings, and other data laid bare in plain sight. Modern malware tends to be more dynamic, allowing for configurations to be downloaded upon infection, or be strategically injected into the executable by its author. Crimeware malware (Carberp, Zeus) tend to favor the former, connecting to a hardcoded IP address or domain to download a detailed configuration profile (often in XML) that is used to determine how the malware is to operate. What domains does it beacon to, on which ports, and with what campaign IDs - these are the items we determine from malware configurations.

Other malware rely upon a known block of configuration data within the executable, sometimes found within .rdata or simply in the overlay (the data after the end of the actual executable). Sometimes this data is in plain text, often it's encoded or encrypted. A notable example of this is in Mandiant's APT1 report on TARSIP-MOON, where a block of encrypted data is stored in the overlay. The point of this implementation is that an author can compile their malware, and then add in the appropriate configuration data after the fact.

As a method of improving the timeliness of malware analysis, I've been advocating for greater research into, and implementation of, configuration dumpers. By identifying where data is stored within the file, and by knowing its encryption routine, one can simply write a script to extract the data, decrypt it, and print it out. Without even running the malware, we know its intended C2 communications and have immediate signatures that we can implement into our network defenses.
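As an illustration only, assume a sample that stores its configuration in the overlay under a repeating-key XOR. The key and function names below are hypothetical, and every real family (such as the TARSIP-MOON example) needs its own routine, but the dumper pattern stays this small:

```python
def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR, standing in for a family-specific routine."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def dump_overlay_config(path: str, key: bytes) -> bytes:
    """Carve the overlay (data past the last PE section) and decode it.
    Uses the third-party pefile module to find where the image ends."""
    import pefile
    pe = pefile.PE(path)
    start = pe.get_overlay_data_start_offset()  # None if there is no overlay
    if start is None:
        return b""
    with open(path, "rb") as f:
        f.seek(start)
        return xor_decrypt(f.read(), key)
```

Once a routine like this is identified, the same script runs against every new sample of the family, no execution required.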

While this data may appear as a simple plaintext structure in one sample, it's often encoded or encrypted via a myriad of techniques. Sometimes it's a form of encryption that we, or our team, deemed too difficult to decrypt in a reasonable time. This is pretty common; advanced encryption or compression can take weeks to completely unravel, a task often left for when there's downtime in operations.

What do we do, then? Easy, go for the memory.

We know that the malware has a decryption routine that takes in this data and produces decrypted output. By simply running the malware and analyzing its memory footprint, we will often find the decrypted results in plaintext, as the data has already been decrypted and is in use by the malware.

Why break the encryption when we can let the malware just decrypt it for us?
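The crudest version of "going for the memory" is carving plaintext indicators, such as C2 URLs, straight out of a raw memory image. Volatility's strings and yarascan plugins do this far more surgically, but a stdlib sketch (the regex and names here are mine) shows the idea:

```python
import re

# Printable ASCII after the scheme; tune the length bounds to taste
URL_RE = re.compile(rb"https?://[\x21-\x7e]{4,100}")

def carve_urls(dump_path: str, chunk_size: int = 4 * 1024 * 1024):
    """Scan a raw memory image for URLs the malware has already
    decrypted for its own use."""
    found = set()
    tail = b""
    with open(dump_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            found.update(URL_RE.findall(buf))
            tail = buf[-128:]  # overlap so matches spanning chunks aren't lost
    return sorted(found)
```

Anything this turns up feeds straight back into network signatures, which is the whole point of letting the malware do the decryption work.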