DIY Java Malware Analysis
Parts Required: AndroChef ($) or JD-GUI (free), My Java IDX Parser (in Python), Malware Samples
Skill Level: Beginner to Intermediate
Time Required: Beginner (90 minutes), Intermediate (45 minutes), Advanced (15 minutes)
Java has once again been thrown into the limelight by a resurgence of Java-based drive-by malware attacks, reminiscent of the large-scale BlackHole exploit kit campaigns seen in early 2012. Through our cmdLabs commercial incident response and forensics team at Newberry Group, I've had the opportunity to perform numerous investigations into data breaches and financial losses caused by such malware being installed.
Based on my own experience with Java-related infections, and after seeing some very lackluster reports produced by others, I've decided to write a simple how-to blog post on basic Java malware analysis from a forensic standpoint. Everyone has their own process; this is basically mine. It takes the approach of examining the initially downloaded files, which appear as Java cached JAR and IDX files, then examining the first-stage Java malware to determine its capabilities, and finally looking for the second-stage infection.
Java Cached Files
One critical step in any Java infection is to check for files within the Java cache folder. This folder stores a copy of every Java applet (JAR) downloaded, along with a metadata file, the IDX file, that records when the file was downloaded and from where. These files are stored in the following standard locations:
- Windows XP: %AppData%\Sun\Java\Deployment\Cache
- Windows Vista/7/8: %UserProfile%\AppData\LocalLow\Sun\Java\Deployment\Cache
This folder contains numerous subdirectories, each corresponding to an instance of a downloaded file. By sorting the directory recursively by date and time, one can easily find the relevant files to examine. The cached files are automatically renamed to a random series of hexadecimal characters, so don't expect to find "express.jar", or whatever file name the JAR was initially downloaded as.
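The recursive date sort described above can be scripted when the cache is examined from a mounted image or an exported copy that preserves file timestamps. The following is a minimal Python sketch, not part of any tool mentioned in this post; the function names list_cache_files and format_timestamp are made up for illustration, and it assumes the modification time is the timestamp of interest.

```python
import os
import time


def format_timestamp(epoch_seconds):
    """Render an epoch timestamp as a UTC string for reporting."""
    return time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(epoch_seconds))


def list_cache_files(cache_root):
    """Walk cache_root recursively and return (mtime, size, path) tuples,
    sorted newest first.

    Assumption: the modification time is the value worth sorting on.
    Files that cannot be stat'ed are skipped rather than aborting the walk.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue
            entries.append((info.st_mtime, info.st_size, path))
    entries.sort(reverse=True)
    return entries


def print_cache_listing(cache_root):
    """Print one line per cached file: timestamp, size in bytes, path."""
    for mtime, size, path in list_cache_files(cache_root):
        print("{}  {:>10}  {}".format(format_timestamp(mtime), size, path))
```

Calling print_cache_listing() on the exported Cache folder prints the most recently modified files first, which makes it easier to spot the JAR and IDX pair written around the time of the suspected infection.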
Java IDX Files
The Java IDX file is a binary-structured file, but one that is reasonably readable with a basic text editor. Nearly all of my analysis is from simply opening this file in Notepad++ and mentally parsing out the results. For an example of this in action, I would recommend Corey Harrell's excellent blog post: "(Almost) Cooked Up Some Java". This textual data is of great interest to an examiner, as it notes when the file was downloaded from the remote site, what URL the file originated from, and what IP address the domain name resolved to at the time of the download.
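For a rough scripted equivalent of reading the file in a text editor, the sketch below pulls runs of printable ASCII out of the raw bytes and flags the ones that look like URLs or IPv4 addresses. It is only an illustration under stated assumptions: it does not parse the binary structure of the IDX file at all, the function names extract_strings and flag_indicators are hypothetical, and the minimum string length of 4 is an arbitrary choice.

```python
import re

# Runs of printable ASCII characters (space through tilde).
PRINTABLE_RUN = rb"[\x20-\x7e]{%d,}"
URL_PATTERN = re.compile(r"^https?://", re.IGNORECASE)
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_strings(data, min_len=4):
    """Return every printable ASCII run of at least min_len characters."""
    pattern = re.compile(PRINTABLE_RUN % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def flag_indicators(strings):
    """Sort extracted strings into likely URLs and likely IPv4 addresses.

    This is a loose textual match, not validation: an "IP" here is any
    dotted quad of digits, so results still need to be reviewed by hand.
    """
    found = {"urls": [], "ips": []}
    for text in strings:
        if URL_PATTERN.match(text):
            found["urls"].append(text)
        elif IPV4_PATTERN.match(text):
            found["ips"].append(text)
    return found


def read_idx_strings(path, min_len=4):
    """Read a file from disk and return its printable ASCII strings."""
    with open(path, "rb") as handle:
        return extract_strings(handle.read(), min_len)
```

Passing the output of read_idx_strings() to flag_indicators() gives a quick list of candidate URLs and addresses to compare against what is visible in the editor. Anything stored in the binary portions of the file is ignored by this approach.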
I was always able to retrieve the basic text information from the file, but the large blocks of binary data always bugged me. What data was I missing? Were there any other critical indicators in the file left undiscovered?