23 April 2014

Moving On to New Career Opportunities

In the next few days I will be moving on from my current work and into a new and exciting opportunity. As I work through this transition, while also writing a book and preparing con talks, I've started to think about the practical and emotional tasks needed to ensure that my current employer and clients are taken care of while I prepare for the future.

With that in mind, I wanted to pass on a few ideas that may help others.

Personal Side


To begin with, let's discuss the personal side of the move. I've been working with the Defense Cyber Crime Center (DC3) for almost 14 years. I've been with them since before they were even named DC3, back when they were just the Defense Computer Forensics Lab (DCFL) and the Defense Cyber Investigations Training Program (DCITP a.k.a. The 'TIP). And since we've had "Cyber" in our agency name since the late 90's, I'm fully allowed to use it in regular conversation without drinks.

I've said goodbye to DC3 once before, temporarily, when I moved on from being the Deputy Technical Lead of the Training Academy. I left under the weight of a serious case of burnout (one requiring medical intervention) after helping lead development of 20+ technical courses, managing my own technical team, and serving as a technical lead and presenter on a large contract re-compete... all while helping write a book. I needed a break, and getting into the down and dirty of regular forensic cases was the fix.

I joined my good friends Eoghan Casey, Terrance Maguire, and Chris Daywalt in their venture, cmdLabs, as employee #1. We worked awesome incident response cases together and delved into research projects and code development. Around this time, cmdLabs was acquired by the Newberry Group, run by a CEO and VP whom I had known for over a decade. Life was good and, after a cool-down period, I went back into DC3 on a separate contract to work on their Intrusions team.

17 February 2014

Malware with No Strings Attached Part 2 - Static Analysis

In the previous post I showed some dynamic analysis procedures for a variant of a trojan known to Symantec as Coreflood. Based on the dynamic analysis, we discovered that the analyzed sample contained very few strings of use. It decrypted an embedded executable, which was injected into memory for execution. It dropped an encrypted file to the hard drive, then downloaded a second-stage executable from the Internet. This downloaded file was then encrypted with what appeared to be a routine similar to the one used for the dropped file.

However, in the end, we still had many questions that couldn't be answered:

  • What is the encryption routine used for thr1.chm and mmc109.exe?
  • Why does the malware rename mmc109.exe to mmc61753109.exe?
  • Why does the malware first make a network connection to Adobe.com? What does that POST value mean?
  • Why does the initial loader contain so few strings?
These questions are best answered through static analysis. However, static analysis can be an extremely time-consuming activity. A full "deep dive" static analysis could take days, or even weeks, to fully document all functionality within a malware sample. Instead, this post will go through the process of a targeted static analysis. With the questions laid out here, we'll focus our static analysis solely on answering those questions, which will mean leaving many aspects of the malware undocumented.

Therefore, we need to focus on what questions can be answered within an hour or two of static analysis.

These questions mirror those asked during incident response work, where a malware analyst works in conjunction with forensic examiners and responders to help answer the really difficult questions: why certain indicators are seen, what they mean, and what other indicators should be searched for that may have been missed.

It also helps address inaccurate preconceptions from those outside the field. When I tell a client that a sample has encoded data and will take a bit longer, I immediately get pushback from those expecting that a simple routine will add 40+ hours of billable time. Likewise, if I say that a sample has a custom encryption routine, I often get pinged every hour asking why it's not decoded yet.

This post will show some of my personal workflow and mindset when analyzing malware, while extracting indicators to assist in an overall forensic examination or incident response. As I'm relatively new to much of the RE world, having learned only through self-training and on-the-job work, I'd love any feedback on ways to speed up or better hone the topics covered here.

16 February 2014

Is Google Scanning Malware Email Attachments Between Researchers?

Disclaimer: This post is based upon my experiences when sending malware via GMail (Google Mail). I'm documenting them here for others to disprove, debate, confirm, or downplay.

Update:

In the comments below, a member of Google's AntiVirus Infrastructure team provided insight into this issue. A third-party AV engine used by GMail was designed by its vendor to automatically open ZIP files protected with the password 'infected'. I want to thank Google for their attention to the matter, as it shows that there was no ill intent or deliberate scanning.


As a professional malware analyst and a security researcher, a sizable portion of my work is spent collaborating with other researchers over attack trends and tactics. If I hit a hurdle in my analysis, it's common to just send the binary sample to another researcher with an offset location and say "What does this mean to you?"

That was the case on Valentine's Day, 14 Feb 2014. While working on a malware static analysis blog post, to accompany my dynamic analysis blog post on the same sample, I reached out to a colleague to see if he had any advice on an easy way to write an IDAPython script (for IDA Pro) to decrypt a set of encrypted strings.

There is a simple, yet standard, practice for doing this type of exchange. Compress the malware sample within a ZIP file and give it a password of 'infected'. We know we're sending malware samples, but need to do it in a way that:

          a. an ordinary person cannot obtain the file and accidentally run it;
          b. an automated antivirus system cannot detect the malware and prevent it from being sent.

However, on that fateful day, the process broke down. Upon compressing a malware sample, password protecting it, and attaching it to an email, I was stopped: GMail registered a Virus Alert on the attachment.

Stunned, I tried again and saw the same results. My first thought was that I had forgotten to password-protect the file. I erased the ZIP, recreated it, and received the same results. I tried with a different password - same results. I used a 24-character password... still flagged as malicious.

The immediate implications of this initial test were staggering; was Google password cracking each and every ZIP file it received, with the capability to crack 24-character passwords?! No, but close.

Google already opens any standard ZIP file that is attached to an email. The ZIP contents are extracted and scanned for malware; if malware is found, the attachment is blocked. This is why we password protect them. However, Google is now also attempting to guess the passwords of protected ZIP files, trying the password 'infected'. If it succeeds, it extracts the contents and scans them for malware.

Google is attempting to unzip password-protected archives by guessing at the passwords. To what extent? We don't know. But we can try to find out.

I obtained a list of the 25 most common passwords and combined them with 'infected' in the simple Python script below:


import subprocess

# 25 common passwords plus 'infected'; each becomes its own archive password.
pws = ["123456","password","12345678","qwerty","abc123","123456789","111111","1234567","iloveyou","adobe123","123123","sunshine","1234567890","letmein","photoshop","1234","monkey","shadow","sunshine","12345","password1","princess","azerty","trustno1","000000","infected"]

for pw in pws:
    # Create one ZIP per password, named after the password used to protect it.
    subprocess.call(["7z", "a", "-p%s" % pw, "%s.zip" % pw, "malware.livebin"])

This script simply compresses a known malware sample (malware.livebin) into a series of ZIP files, each named after the password used to protect it. I then repeated these steps to create the respective 7zip archives.

I then created a new email and simply attached all of the files:


Of all the files created, all password protected, and each containing the exact same malware, only the ZIP file with a password of 'infected' was flagged. This suggests that Google isn't running a sizable word list against attachments, but it is clearly targeting the password 'infected'.

To compensate, researchers should move to a new password scheme or use 7zip archives instead of ZIP.

Further testing was performed to determine why subsequent files were flagged as malicious, even with a complex password. As soon as Google detects a malicious attachment, it will flag that attachment File Name and prevent your account from attaching any file with the same name for what appears to be five minutes. Therefore, even if I recreated infected.zip with a 50-char password, it would still be flagged. Even if I created infected.zip as an ASCII text document full of spaces, it would still be flagged.

In my layman's experience, this is a very scary grey area for active monitoring of malware. In the realm of spear phishing it is common to password protect an email attachment (DOC/PDF/ZIP/EXE) and provide the password in the body to bypass AV scanners. However, I've never seen an attacker foolish enough to use a red flag word like "infected", which would scare any common computer user away (... unless someone made a new game called Infected? ... or a malicious leaked album set from Infected Mushroom?)

Regardless of the email contents, if they are sent from one consenting adult to another, in a password-protected container, there is an expectation of privacy between the two that is violated upon attempting to guess passwords en masse.

And why is such activity targeted at the malware research community, which uses this process to help build defenses against such attacks?

Notes:
  • Emails were sent from my Google Apps (GAFYD) account.
  • Tests were also made using nondescript filenames (e.g. a.txt).
  • Additional tests were made to alter the CRC32 hashes within the ZIPs (by appending random bytes to the end of each file), as well as any other metadata that could be targeted.
  • The password "infected" was contained in neither the subject nor the body during the process.

Updates:
There was earlier speculation that the samples may have been automatically sent to VirusTotal for scanning. As shown in the comments below, Bernardo Quintero from VirusTotal has denied that this is occurring. I've removed the content from this post to avoid any future confusion.

Others have come forward to say that they've seen this behavior for some time. However, I was able to happily send such files around until late last week, which suggests that the feature is not evenly deployed to all GMail users.

A member of Google's team replied below noting that this activity was due to a third-party antivirus engine used by Google.

The owner of VirusShare.com, inspired by this exchange, attempted to identify which engine this could be by uploading choice samples to VirusTotal. His uploads showed one commonality, NANO-Antivirus:


My own tests also showed positive hits from NANO-Antivirus.

At the very least, this shows how one minor, well-meaning feature in an obscure antivirus engine can cause waves of doubt and frustration when deployed without thorough testing.

08 February 2014

Malware with No Strings Attached - Dynamic Analysis

I had the honor of lecturing for Champlain College's graduate level Malware Analysis course this week. One of the aspects of the lecture was showing off dynamic analysis with my Noriben script and some of the indicators I would look for when running malware.

While every malware site under the sun can tell you how to do malware dynamic analysis, I wanted to write a post on how I, personally, perform dynamic analysis: some of the things I look for, some things I've learned to ignore, and how to go a little bit above and beyond to answer unusual questions. And, if the questions can't be answered, how to obtain good clues that could help you or another analyst understand the data down the road.

Additionally, I've been meaning to write up a malware analysis post for a while, but hadn't found any malware interesting enough: some samples were overly complex, many overly simple, and most just too boring to write about. Going back through prior incidents, I remembered a large-scale response we worked involving a Coreflood compromise. While this post won't be on the same malware, it's from a similar variant:

MD5: 4f45df18209b840a2cf4de91501847d1
SSDEEP: 768:ofATWbDPImK/fJQTR5WSgRlo5naTKczgYtWc5bCQHg:uk6chnWESgRKcnWc5uF
Size: 48640 bytes

This is not a ground-breaking malware sample. The techniques here are not new. I want to simply show a typical workflow of analyzing malware and overcoming the challenges that appear in doing so.

There are multiple levels of complexity to this sample, too much for a single post, including ways in which it encrypts embedded data and strings. Therefore, this post will focus on the dynamic artifacts of running the malware and examining the files left behind. On the next post, we'll use IDA Pro to dig deeper into reversing the logic used by the malware.


03 January 2014

A GhettoForensics Look Back on 2013

This site, Ghetto Forensics, was started this year as an effort to better document some of the side work I do that I thought would be appealing, or humorous, to the overall industry. This content was originally posted to my personal web site, thebaskins.com, but really needed a site of its own.

My first public project this year was reversing, documenting, and writing a parser for Java IDX files, cached files that accompany any file downloaded via Java. It was a bit of a painful project, mainly due to the bad documentation provided by Oracle, not to mention the horrendous style in which they designed it. I immediately released the code to the public and have received great feedback for improvements, as well as quite a few examiners touting how much they used it in their examinations. Thank you!

However, my greatest project this year was the release of Noriben. I first designed Noriben as a simple script to use at home for quick malware dynamic analysis. I lacked many of the tools and sandboxes that I use at my day job, and needed a quick triage tool for research. After a few months, I realized that many commercial groups were in the exact same situation as I was at home: a severe lack of funding to purchase software to help. So, I cleaned up the code, gave it a silly name, and released it into the world. I've received numerous suggestions and pieces of feedback from all over, all of which were incorporated into the code. While I can't know the full extent of its usage, I did learn of quite a few Defense organizations, as well as a handful of Fortune organizations, that have incorporated it into their workflow. Awesome!

Research-wise, I released a comparison of various Java disassembly and decompilation tools, having found the standard JD-GUI to be extremely lacking for modern Java malware. The positive side of this was introducing security professionals to tools that were previously unknown to them. The research itself changed the tools I use on a regular basis and allowed me to create a better product, faster, when reversing Java applications.

For community projects, I wrote a small malware configuration dumper template for Volatility, based on some time-saving work I've been practicing. Whenever I do a full reversal of malware, I now try to write a memory configuration dumper. That way, in a few months when the authors change the encryption routine, I can still retrieve the same configuration and get a report out instantly, then go back and figure out the new encryption.


11 October 2013

Dumping Malware Configuration Data from Memory with Volatility



When I first started delving into memory forensics, years ago, we relied upon controlled operating system crashes (to create memory crash dumps) or the old FireWire exploit with a special laptop. Later, software-based tools like regular dd and win32dd made the job much easier (and more entertaining, as we watched the feuds between mdd and win32dd).

In the early days, our analysis was basically performed with a hex editor. By collecting volatile data from an infected system, we'd attempt to map memory locations manually to known processes, an extremely frustrating and error-prone procedure. Even with the advent of graphical tools such as HBGary Responder Pro, which comes with a hefty price tag, I've found most of my time spent viewing raw memory dumps in WinHex.

The industry has slowly changed as tools like Volatility have gained maturity and become more feature-rich. Volatility is a free and open-source memory analysis tool that takes the hard work out of mapping and correlating raw data to actual processes. At first I shunned Volatility for its sheer amount of memorization, where each query required recalling a specialized command line. Over the years, I've come to appreciate this aspect and the flexibility it provides to an examiner.

It's Volatility that I'll focus on in this blog post, using it to dump malware configurations from memory.

For those unfamiliar with the concept, it's rare to find static malware. That is, malware that has a plain-text URL in its .rdata section mixed in with other strings. Modern malware tends to be more dynamic, allowing configurations to be downloaded upon infection, or to be strategically injected into the executable by its author. Crimeware families (Carberp, Zeus) tend to favor the former, connecting to a hardcoded IP address or domain to download a detailed configuration profile (often in XML) that determines how the malware is to operate. What domains it beacons to, on which ports, and with what campaign IDs - these are the items we determine from malware configurations.
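To make that concrete, here is a tiny, purely hypothetical example of pulling those items out of an XML configuration profile; the element and attribute names are invented for illustration, as each real family (Carberp, Zeus, and so on) uses its own format:

import xml.etree.ElementTree as ET

# Hypothetical XML configuration; real families each use their own layouts.
config = ET.fromstring("""
<config campaign="spring2013">
  <c2 host="evil.example.com" port="443"/>
  <c2 host="203.0.113.10" port="8080"/>
</config>
""")

print("Campaign ID:", config.get("campaign"))
for c2 in config.findall("c2"):
    print("Beacon to %s on port %s" % (c2.get("host"), c2.get("port")))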

Other malware rely upon a known block of configuration data within the executable, sometimes found within .rdata or simply in the overlay (the data after the end of the actual executable). Sometimes this data is in plain text, often it's encoded or encrypted. A notable example of this is in Mandiant's APT1 report on TARSIP-MOON, where a block of encrypted data is stored in the overlay. The point of this implementation is that an author can compile their malware, and then add in the appropriate configuration data after the fact.

As a way to improve the timeliness of malware analysis, I've been advocating for greater research into, and implementation of, configuration dumpers. By identifying where data is stored within the file, and by knowing its encryption routine, one can simply write a script to extract the data, decrypt it, and print it out. Without even running the malware, we know its intended C2 communications and have immediate signatures that we can then implement into our network defenses.
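As a minimal sketch of such a dumper, assume a sample that stores its configuration in the overlay and protects it with a single-byte XOR; the file name, key, and layout below are assumptions for illustration only:

import re
import pefile

SAMPLE = "sample.exe"   # hypothetical sample name
XOR_KEY = 0x7A          # hypothetical key recovered during reversing

pe = pefile.PE(SAMPLE)

# The overlay begins where the last section's raw data ends.
overlay_start = max(s.PointerToRawData + s.SizeOfRawData for s in pe.sections)
with open(SAMPLE, "rb") as f:
    overlay = f.read()[overlay_start:]
if not overlay:
    raise SystemExit("No overlay data found")

decoded = bytes(b ^ XOR_KEY for b in overlay)

# Print any readable strings (e.g. C2 domains) from the decoded blob.
for match in re.finditer(rb"[ -~]{6,}", decoded):
    print(match.group().decode("ascii"))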

While this data may appear as a simple plaintext structure in one sample, it is often encoded or encrypted via a myriad of techniques. Sometimes it's a form of encryption that we, or our team, deemed too difficult to decrypt in a reasonable time. This is pretty common; advanced encryption or compression can take weeks to completely unravel and is often left for downtime between operations.

What do we do, then? Easy, go for the memory.

We know that the malware has a decryption routine that takes in this data and produces decrypted output. By simply running the malware and analyzing its memory footprint, we will often find the results in plaintext, as the configuration has already been decrypted and is in use by the malware.

Why break the encryption when we can let the malware just decrypt it for us?
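As a rough sketch of that workflow, one could dump the suspect process's memory and then scan the dump for plaintext indicators; the image name, profile, PID, and URL pattern below are placeholders:

import re

# Assumes the dump was produced beforehand with Volatility, e.g.:
#   vol.py -f memory.img --profile=WinXPSP2x86 memdump -p 1234 -D dumps/
DUMP_FILE = "dumps/1234.dmp"

with open(DUMP_FILE, "rb") as f:
    data = f.read()

# Look for URLs that the malware decrypted at runtime and left in memory.
url_pattern = re.compile(rb"https?://[\x21-\x7e]{4,128}")
for match in sorted(set(url_pattern.findall(data))):
    print(match.decode("ascii", errors="replace"))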

16 September 2013

Noriben version 1.4 released

It's been a few months since the last official release of Noriben. The interim time has been filled with a few ninja-edits of updated filters, and wondering what to put in next.

Noriben started out as a simple wrapper to Sysinternals procmon to automatically gather all of the runtime details for malware analysis within a VM. It then filters out all the unnecessary system details until what's left is a fairly concise view of what the malware did to the system while running. It is a great alternative to a dedicated sandbox environment. More details on Noriben can be found here.
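The core of that filtering is simple enough to illustrate. The sketch below is not Noriben's actual code; the CSV column names and whitelist entries are assumptions, and the real filter lists are far more extensive:

import csv

# Processes whose events are almost always benign background noise.
NOISE_PROCESSES = {"procmon.exe", "svchost.exe", "wuauclt.exe", "searchindexer.exe"}

# Read a Process Monitor CSV export and drop events from whitelisted processes.
with open("procmon_export.csv", newline="", encoding="utf-8-sig") as f:
    for row in csv.DictReader(f):
        process = row.get("Process Name", "").lower()
        if process in NOISE_PROCESSES:
            continue   # known system noise
        print(process, row.get("Operation", ""), row.get("Path", ""))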

Over the months I was ecstatic to hear of organizations using Noriben in various roles. Many had modified the script to use it as an automated sandbox to run alongside their commercial products, which was exactly one of my goals for the script. However, the current requirement of manual interaction was an issue and I saw many ugly hacks of how people bypassed it. The new version should take care of that issue.
This was originally a release for version 1.3, which I pushed up on Friday. However, I quickly received quite a bit of feedback requesting other new features, and so I pushed up version 1.4.

In the new version 1.4, I've introduced a few new features:

  • A non-interactive mode that runs for a specified number of seconds against a malware sample given on the command line
  • The ability to generalize strings, using Windows environment variables
  • The ability to specify an output directory
Non-Interactive Mode
The non-interactive mode has been needed for a long time, and I apologize that it took so long to implement, as it was a very easy addition. It can be set in one of two ways.
First, the beginning of the source has a new line:

timeout_seconds = 0

By setting this to a value other than zero, Noriben will automatically monitor the system for that number of seconds and then stop. This can be hardcoded for automated implementations, such as in a sandbox environment.

This value can also be overridden with the command line option --timeout (-t). When using this argument, Noriben will enable the timeout mode and use the specified number of seconds. This is useful for samples with a particularly long runtime. Even if Noriben.py was modified to have a 120-second timeout, you can override it on the command line with a much greater value (3600 seconds, for example).

Noriben now also lets you specify the malware from the command line, making it completely non-interactive:

Noriben.py --cmd "C:\malware\bad.exe www.badhost.com 80" --timeout 300

This command line will launch bad.exe, with its given arguments, and monitor it for a period of 5 minutes. At that point, Noriben will stop monitoring, but the malware will continue to run.

Output Directory
An alternate output directory can be specified on the command line with --output. If this folder does not exist, it will be created. If Noriben is unable to create the directory, such as when it doesn't have access (e.g. C:\Windows\System32\), then it will give an error and quit.


String Generalization
One requested feature was to replace file system paths with Windows environment variables, to make them generic. Many people copy and paste their Noriben results, which may show system-specific values such as "C:\Documents and Settings\Bob\malware.exe". This string will be generalized to "%UserProfile%\malware.exe".

This feature is turned off by default, but can be enabled by changing a setting in the file:

generalize_paths = False

Or by setting --generalize on the command line.
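Under the hood, the idea is an ordered substitution of known directories with their environment-variable equivalents. A minimal sketch of the concept (not Noriben's actual implementation; the mapping and example path are assumptions) might look like:

# Replace system-specific path prefixes with Windows environment variable names.
# In practice the user name and paths would be derived from the environment.
GENERALIZATIONS = [
    (r"C:\Documents and Settings\Bob", "%UserProfile%"),
    (r"C:\Users\Bob", "%UserProfile%"),
    (r"C:\Windows\System32", "%SystemRoot%\\System32"),
    (r"C:\Windows", "%SystemRoot%"),
]

def generalize_path(path):
    for prefix, variable in GENERALIZATIONS:
        if path.lower().startswith(prefix.lower()):
            return variable + path[len(prefix):]
    return path

print(generalize_path(r"C:\Documents and Settings\Bob\malware.exe"))
# -> %UserProfile%\malware.exe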


All in all, these features could be summed up with:

Noriben.py --output C:\Logs\Malware --timeout 300 --generalize --cmd "C:\Malware\evil.exe"

Download Noriben