12 January 2013

Java Malware - Identification and Analysis

DIY Java Malware Analysis

Parts Required: AndroChef ($) or JD-GUI (free), My Java IDX Parser (in Python), Malware Samples
Skill Level: Beginner to Intermediate
Time Required: Beginner (90 minutes), Intermediate (45 minutes), Advanced (15 minutes)

Java has once again been thrown into the limelight with another insurgence of Java-based drive-by malware attacks reminiscent of the large-scale BlackHole exploit kits seen in early 2012. Through our cmdLabs commercial incident response and forensics team at Newberry Group, I've had the opportunity to perform numerous investigations into data breaches and financial losses due to such malware being installed.
Based on my own experience in Java-related infections, and seeing some very lackluster reports produced by others, I've decided to write a simple How-To blog post on basic Java malware analysis from a forensic standpoint. Everyone has their own process, this is basically mine, and it takes the approach of examining the initial downloaded files, seen as Java cached JAR and IDX files, examining the first-stage Java malware to determine its capabilities, and then looking for the second-stage infection.

Java Cached Files

One critical step in any Java infection is to check for files within the Java cache folder. This folder stores a copy of each and every Java applet (JAR) downloaded as well as a metadata file, the IDX file, that denotes when the file was downloaded and from where. These files are stored in the following standard locations:
  • Windows XP: %AppData%\Sun\Java\Deployment\Cache
  • Windows Vista/7/8: %AppData%\LocalLow\Sun\Java\Deployment\Cache
This folder contains numerous subdirectories, each corresponding to an instance of a downloaded file. By sorting the directory recursively by date and time, one can easily find the relevant files to examine. These files will be found auto-renamed to a random series of hexadecimal values, so don't expect to find "express.jar", or whatever file name the JAR was initially downloaded as.

Java IDX Files

In my many investigations, I've always relied upon the Java IDX files to backup my assertions and provide critical indicators for the analysis. While I may know from the browser history that the user was browsing to a malicious landing page on XYZ.com, it doesn't mean that the malware came from the same site. And, as Java malware is downloaded by a Java applet, there will likely be no corresponding history for the download in any web browser logs. Instead, we look to the IDX files to provide us this information.

The Java IDX file is a binary-structured file, but one that is reasonably readable with a basic text editor. Nearly all of my analysis is from simply opening this file in Notepad++ and mentally parsing out the results. For an example of this in action, I would recommend Corey Harrell's excellent blog post: "(Almost) Cooked Up Some Java". This textual data is of great interest to an examiner, as it notes when the file was downloaded from the remote site, what URL the file originated from, and what IP address the domain name resolved to at the time of the download.

I was always able to retrieve the basic text information from the file, but the large blocks of binary data always bugged me. What data was I missing? Were there any other critical indicators in the file left undiscovered?

Java IDX Parser

At the time, no one had written a parser for the IDX file. Harrell's blog post above provided the basic text structure of the file for visual analysis, and a search led me to a Perl-based tool written by Sploit that parsed the IDX for indicators to output into a forensic timeline file at: Java Forensics using TLN Timelines. However, none delved into the binary analysis. Facing a new lifestyle change, and a drastic home move, I found myself with a lot of extra time on my hands for January/February so I decided to sit down and unravel this file. I made my mark by writing my initial IDX file parser, which only carved the known text-based data, and placed it up on Github to judge interest. At the time I wrote my parser, there were no other parsers on the market.