11 October 2013

Dumping Malware Configuration Data from Memory with Volatility



When I first started delving into memory forensics, years ago, we relied upon controlled operating system crashes (to create memory crash dumps) or the old FireWire exploit with a special laptop. Later, software-based tools like regular dd and win32dd made the job much easier (and more entertaining as we watched the feuds between mdd and win32dd).

In the early days, our analysis was basically performed with a hex editor. By collecting volatile data from an infected system, we'd attempt to map memory locations manually to known processes, an extremely frustrating and error-prone procedure. Even with the advent of graphical tools such as HBGary Responder Pro, which comes with a hefty price tag, I've found most of my time spent viewing raw memory dumps in WinHex.

The industry has slowly changed as tools like Volatility have gained maturity and become more feature-rich. Volatility is a free and open-source memory analysis tool that takes the hard work out of mapping and correlating raw data to actual processes. At first I shunned Volatility for its sheer amount of command line memorization, where each query required memorizing a specialized command line. Over the years, I've come to appreciate this aspect and the flexibility it provides to an examiner.

This post focuses on using Volatility to dump malware configurations from memory.

For those unfamiliar with the concept, it's rare to find static malware, that is, malware that has a plain-text URL in its .rdata section mixed in with other strings. Modern malware tends to be more dynamic, allowing for configurations to be downloaded upon infection, or to be strategically injected into the executable by its author. Crimeware (Carberp, Zeus) tends to favor the former, connecting to a hardcoded IP address or domain to download a detailed configuration profile (often in XML) that determines how the malware is to operate. Which domains does it beacon to, on which ports, and with what campaign IDs? These are the items we determine from malware configurations.

Other malware relies upon a known block of configuration data within the executable, sometimes found within .rdata or simply in the overlay (the data after the end of the actual executable). Sometimes this data is in plain text; often it's encoded or encrypted. A notable example of this is in Mandiant's APT1 report on TARSIP-MOON, where a block of encrypted data is stored in the overlay. The point of this implementation is that an author can compile their malware, and then add in the appropriate configuration data after the fact.
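
To make the overlay idea concrete, here is a minimal sketch of locating it in a PE file: walk the section table and take everything past the end of the last section's raw data. The function names are my own for illustration, and the sketch assumes a well-formed PE; it makes no attempt to separate out an appended Authenticode certificate, which also lives past the last section.

```python
import struct

def overlay_offset(pe_bytes):
    """Return the file offset where the overlay starts (end of the last section's raw data)."""
    if pe_bytes[:2] != b'MZ':
        raise ValueError('not an MZ executable')
    # e_lfanew, the offset of the PE header, is a DWORD at 0x3C in the DOS header
    (pe_off,) = struct.unpack_from('<I', pe_bytes, 0x3C)
    if pe_bytes[pe_off:pe_off + 4] != b'PE\x00\x00':
        raise ValueError('PE signature not found')
    # The COFF file header follows the 4-byte signature
    (num_sections,) = struct.unpack_from('<H', pe_bytes, pe_off + 6)
    (opt_header_size,) = struct.unpack_from('<H', pe_bytes, pe_off + 20)
    section_table = pe_off + 24 + opt_header_size
    end = 0
    for index in range(num_sections):
        entry = section_table + index * 40  # each section header is 40 bytes
        # SizeOfRawData and PointerToRawData sit at +16 and +20 in the header
        raw_size, raw_ptr = struct.unpack_from('<II', pe_bytes, entry + 16)
        end = max(end, raw_ptr + raw_size)
    return end

def extract_overlay(pe_bytes):
    """Return the bytes stored after the last section (empty if there is no overlay)."""
    return pe_bytes[overlay_offset(pe_bytes):]
```

Reading a sample with open(path, 'rb').read() and passing the bytes to extract_overlay() should hand back whatever was appended after compilation, still encoded or encrypted.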

As a way to improve the timeliness of malware analysis, I've been advocating for greater research into, and implementation of, configuration dumpers. By identifying where data is stored within the file, and by knowing its encryption routine, one could simply write a script to extract the data, decrypt it, and print it out. Without even running the malware, we know its intended C2 communications and have immediate signatures that we can then implement into our network defenses.

While this data may appear as a simple plaintext structure in a sample, it is often encoded or encrypted via a myriad of techniques. Sometimes the encryption is something that we, or our team, deem too difficult to break in a reasonable time. This is pretty common: advanced encryption or compression can take weeks to completely unravel and is often left for when there's downtime in operations.

What do we do, then? Easy, go for the memory.

We know that the malware has a decryption routine that takes in this data and produces decrypted output. By simply running the malware and analyzing its memory footprint, we will often find the decrypted results in plaintext, as the data has already been decrypted and is in use by the malware.

Why break the encryption when we can let the malware just decrypt it for us?



For example, the awesome people at Malware.lu released a static configuration dumper for a known Java-based RAT. This dumper, available here on their GitHub repo, extracts the encryption key and configuration data from the malware's Java ZIP and decrypts it. The RAT used Triple DES (TDEA), but once that routine became public knowledge, the author quickly switched to a new routine. The author has since continued switching encryption routines regularly to avoid easy decryption. Based on earlier analysis, we know that the data is decrypted as:

Offset      0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15

00000000   70 6F 72 74 3D 33 31 33  33 37 53 50 4C 49 54 01   port=31337SPLIT.
00000016   6F 73 3D 77 69 6E 20 6D  61 63 53 50 4C 49 54 01   os=win macSPLIT.
00000032   6D 70 6F 72 74 3D 2D 31  53 50 4C 49 54 03 03 03   mport=-1SPLIT...
00000048   70 65 72 6D 73 3D 2D 31  53 50 4C 49 54 03 03 03   perms=-1SPLIT...
00000064   65 72 72 6F 72 3D 74 72  75 65 53 50 4C 49 54 01   error=trueSPLIT.
00000080   72 65 63 6F 6E 73 65 63  3D 31 30 53 50 4C 49 54   reconsec=10SPLIT
00000096   10 10 10 10 10 10 10 10  10 10 10 10 10 10 10 10   ................
00000112   74 69 3D 66 61 6C 73 65  53 50 4C 49 54 03 03 03   ti=falseSPLIT...
00000128   69 70 3D 77 77 77 2E 6D  61 6C 77 61 72 65 2E 63   ip=www.malware.c
00000144   6F 6D 53 50 4C 49 54 09  09 09 09 09 09 09 09 09   omSPLIT.........
00000160   70 61 73 73 3D 70 61 73  73 77 6F 72 64 53 50 4C   pass=passwordSPL
00000176   49 54 0E 0E 0E 0E 0E 0E  0E 0E 0E 0E 0E 0E 0E 0E   IT..............
00000192   69 64 3D 43 41 4D 50 41  49 47 4E 53 50 4C 49 54   id=CAMPAIGNSPLIT
00000208   10 10 10 10 10 10 10 10  10 10 10 10 10 10 10 10   ................
00000224   6D 75 74 65 78 3D 66 61  6C 73 65 53 50 4C 49 54   mutex=falseSPLIT
00000240   10 10 10 10 10 10 10 10  10 10 10 10 10 10 10 10   ................
00000256   74 6F 6D 73 3D 2D 31 53  50 4C 49 54 04 04 04 04   toms=-1SPLIT....
00000272   70 65 72 3D 66 61 6C 73  65 53 50 4C 49 54 02 02   per=falseSPLIT..
00000288   6E 61 6D 65 3D 53 50 4C  49 54 06 06 06 06 06 06   name=SPLIT......
00000304   74 69 6D 65 6F 75 74 3D  66 61 6C 73 65 53 50 4C   timeout=falseSPL
00000320   49 54 0E 0E 0E 0E 0E 0E  0E 0E 0E 0E 0E 0E 0E 0E   IT..............
00000336   64 65 62 75 67 6D 73 67  3D 74 72 75 65 53 50 4C   debugmsg=trueSPL
00000352   49 54 0E 0E 0E 0E 0E 0E  0E 0E 0E 0E 0E 0E 0E 0E   IT..............

Even if we couldn't decrypt this, we know that it's beaconing to a unique domain name and port, both of which can be searched for. Either way, we now have a sample whose decrypted configuration we can't easily reach. So, let's solve that.

By running the malware within a VM, we should have a logical file for the memory space. In VMware, this is a .VMEM file (or .VMSS for snapshot memory). In VirtualBox, it's a .SAV file. After running our malware, we suspend the guest operating system and then focus our attention on the memory file.

The best way to start is to simply grep the file (from the command line or a hex editor) for the unique C2 domains or artifacts. This should get us into the general vicinity of the configuration and show us the structure of it:

E:\VMs\WinXP_Malware>grep "www.malware.com" *
Binary file WinXP_Malware.vmem matches
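
If grep isn't handy, the same search can be sketched in a few lines of Python. This is an illustrative helper of my own (find_all is not part of any tool mentioned here): it memory-maps the file so a multi-gigabyte memory image isn't read into RAM at once, and returns each hit's offset along with some surrounding bytes so the structure of the configuration is visible.

```python
import mmap

def find_all(path, needle, context=32):
    """Yield (offset, surrounding_bytes) for every occurrence of needle in the file."""
    with open(path, 'rb') as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mem:
            pos = mem.find(needle)
            while pos != -1:
                start = max(0, pos - context)
                # Slice out the hit plus `context` bytes on either side
                yield pos, mem[start:pos + len(needle) + context]
                pos = mem.find(needle, pos + 1)

# Hypothetical usage against the memory file from this post:
# for offset, window in find_all(r'E:\VMs\WinXP_Malware\WinXP_Malware.vmem', b'www.malware.com'):
#     print(hex(offset), window)
```

Strings in memory are sometimes stored as UTF-16, so if the plain ASCII search comes up empty it may be worth repeating it with 'www.malware.com'.encode('utf-16-le') as the needle.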

With this known, we open the VMEM file and see a configuration that matches what we've previously seen. This tells us that the encryption routine changed but the configuration format did not, which is common. This is where we bring out Volatility.

Searching Memory with Volatility


We know that the configuration data begins with the text of "port=<number>SPLIT", where "SPLIT" is used to delimit each field. This can then be used to create a YARA rule of:

rule javarat_conf {
    strings: $a = /port=[0-9]{1,5}SPLIT/ 
    condition: $a
}

This YARA rule uses a regular expression (delimited by forward slashes) to search for "port=" followed by a number that is one to five digits long and then the "SPLIT" delimiter. This rule will be used to get us to the beginning of the configuration data. If there is no good way to key on the beginning, but only on something later in the data, that's fine. Just note the offset variance between where the data should start and where the YARA rule puts us.
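
Before handing a pattern to YARA, it can be sanity-checked against a known-good block with Python's re module. YARA's regular expression engine is not identical to Python's, but a pattern this simple behaves the same in both. The sketch below also shows the offset variance: the second pattern (one I made up for illustration) keys on the "mport" field, which sits 32 bytes into the block, so the start of the configuration is at the hit address minus 32.

```python
import re

def first_hit(pattern, buf):
    """Return the offset of the first match of a bytes regex in buf, or None."""
    match = re.search(pattern, buf)
    return match.start() if match else None

# Four junk bytes followed by the first three rows of the decrypted dump
sample = (b'\xAA' * 4 +
          b'port=31337SPLIT\x01' +
          b'os=win macSPLIT\x01' +
          b'mport=-1SPLIT\x03\x03\x03')

start_hit = first_hit(rb'port=[0-9]{1,5}SPLIT', sample)   # keys on the very first field
later_hit = first_hit(rb'mport=-?[0-9]+SPLIT', sample)    # keys on a field further in
yara_offset = start_hit - later_hit                        # value to add to the later hit
print(start_hit, later_hit, yara_offset)
```

Note that the first pattern does not match inside "mport=-1SPLIT", because the minus sign is not a digit.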

Let's test this rule with Volatility first, to ensure that it works:

E:\Development\volatility>vol.py -f E:\VMs\WinXP_Malware\WinXP_Malware.vmem yarascan -Y "/port=[0-9]{1,5}SPLIT/"
Volatile Systems Volatility Framework 2.3_beta
Rule: r1
Owner: Process VMwareUser.exe Pid 1668
0x017b239b  70 6f 72 74 3d 33 31 33 33 37 53 50 4c 49 54 2e   port=31337SPLIT.
0x017b23ab  0a 30 30 30 30 30 30 31 36 20 20 20 36 46 20 37   .00000016...6F.7
0x017b23bb  33 20 33 44 20 37 37 20 36 39 20 36 45 20 32 30   3.3D.77.69.6E.20
0x017b23cb  20 36 44 20 20 36 31 20 36 33 20 35 33 20 35 30   .6D..61.63.53.50
Rule: r1
Owner: Process javaw.exe Pid 572
0x2ab9a7f4  70 6f 72 74 3d 33 31 33 33 37 53 50 4c 49 54 01   port=31337SPLIT.
0x2ab9a804  6f 73 3d 77 69 6e 20 6d 61 63 53 50 4c 49 54 01   os=win.macSPLIT.
0x2ab9a814  6d 70 6f 72 74 3d 2d 31 53 50 4c 49 54 03 03 03   mport=-1SPLIT...
0x2ab9a824  70 65 72 6d 73 3d 2d 31 53 50 4c 49 54 03 03 03   perms=-1SPLIT...

One interesting side effect of working within a VM is that some data may appear under the space of VMwareUser.exe. In that hit, the match is followed by the text of a hex dump rather than the configuration itself, so the data is showing up somewhere outside of the context of our configuration. We could try to change our rule, but the simpler solution within the plugin is to just rule out hits from VMwareUser.exe and only allow hits from executables that contain "java".

Now that we have a rule, how do we automate this? By writing a quick and dirty plugin for Volatility.

Creating a Plugin


A quick plugin that I'm demonstrating is composed of two primary components: a YARA rule, and a configuration dumper. The configuration dumper scans memory for the YARA rule, reads memory, and displays the parsed results. An entire post could be written on just this file format, so instead I'll post a very generic plugin and highlight what should be modified. I wrote this based on the two existing malware dumpers already released with Volatility: Zeus and Poison Ivy.

Jamie Levy and Michael Ligh, both core developers on Volatility, provided some critical input on ways to improve and clean up the code.


# JavaRAT detection and analysis for Volatility - v 1.0
# This version is limited to JavaRAT's clients 3.0 and 3.1, and maybe others 
# Author: Brian Baskin <brian@thebaskins.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or (at
# your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details. 
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 

import volatility.plugins.taskmods as taskmods
import volatility.win32.tasks as tasks
import volatility.utils as utils
import volatility.debug as debug
import volatility.plugins.malware.malfind as malfind
import volatility.conf as conf
import string

try:
    import yara
    has_yara = True
except ImportError:
    has_yara = False


signatures = {
    'javarat_conf' : 'rule javarat_conf {strings: $a = /port=[0-9]{1,5}SPLIT/ condition: $a}'
}

config = conf.ConfObject()
config.add_option('CONFSIZE', short_option = 'C', default = 256,
                           help = 'Config data size',
                           action = 'store', type = 'int')
config.add_option('YARAOFFSET', short_option = 'Y', default = 0,
                           help = 'YARA start offset',
                           action = 'store', type = 'int')

class JavaRATScan(taskmods.PSList):
    """ Extract JavaRAT Configuration from Java processes """

    def get_vad_base(self, task, address):
        for vad in task.VadRoot.traverse():
            if address >= vad.Start and address < vad.End:
                return vad.Start
        return None

    def calculate(self):
        """ Required: Runs YARA search to find hits """ 
        if not has_yara:
            debug.error('Yara must be installed for this plugin')

        addr_space = utils.load_as(self._config)
        rules = yara.compile(sources = signatures)
        for task in self.filter_tasks(tasks.pslist(addr_space)):
            if 'vmwareuser.exe' == task.ImageFileName.lower():
                continue
            if not 'java' in task.ImageFileName.lower():
                continue
            scanner = malfind.VadYaraScanner(task = task, rules = rules)
            for hit, address in scanner.scan():
                vad_base_addr = self.get_vad_base(task, address)
                yield task, address

    def make_printable(self, input):
        """ Optional: Remove non-printable chars from a string """
        input = input.replace('\x09', '')  # string.printable includes tab (0x09), so strip it separately
        return ''.join(filter(lambda x: x in string.printable, input))

    def parse_structure(self, data):
        """ Optional: Parses the data into a list of values """
        struct = []
        items = data.split('SPLIT')
        for i in range(len(items) - 1):  # Iterate this way to ignore any slack data behind last 'SPLIT'
            item = self.make_printable(items[i])
            field, value = item.split('=')
            struct.append('%s: %s' % (field, value))
        return struct
    
    def render_text(self, outfd, data):
        """ Required: Parse data and display """
        delim = '-=' * 39 + '-'
        rules = yara.compile(sources = signatures)
        outfd.write('YARA rule: {0}\n'.format(signatures))
        outfd.write('YARA offset: {0}\n'.format(self._config.YARAOFFSET))
        outfd.write('Configuration size: {0}\n'.format(self._config.CONFSIZE))
        for task, address in data:  # iterate the yield values from calculate()
            outfd.write('{0}\n'.format(delim))
            outfd.write('Process: {0} ({1})\n\n'.format(task.ImageFileName, task.UniqueProcessId))
            proc_addr_space = task.get_process_address_space()
            conf_data = proc_addr_space.read(address + self._config.YARAOFFSET, self._config.CONFSIZE)
            config = self.parse_structure(conf_data)
            for i in config:
                outfd.write('\t{0}\n'.format(i))

This code is also available on my GitHub.

In a nutshell, you first have a signature to key on for the configuration data. This is a fully qualified YARA signature, seen as:

signatures = {
    'javarat_conf' : 'rule javarat_conf {strings: $a = /port=[0-9]{1,5}SPLIT/ condition: $a}'
}

This rule is stored in a Python dictionary format of 'rule_name' : 'rule contents'.

The plugin allows a command line argument (-Y) to set the YARA offset. If your YARA signature hits 80 bytes past the beginning of the structure, then set this value to -80; if it hits 80 bytes before the beginning, set it to 80. This can also be hardcoded by changing the default value.

There is a second command line argument (-C) to set the size of data to read for parsing. This can also be hardcoded. This will vary based upon the malware; I've seen some configurations that are multiple kilobytes in size.
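
The arithmetic behind these two options is simple enough to show on its own. This sketch mirrors the read in render_text() (address plus YARAOFFSET, for CONFSIZE bytes); the helper names are mine and are not part of the plugin.

```python
def config_window(hit_address, yara_offset=0, conf_size=256):
    """Return (start, end) of the region to read, given where the YARA rule hit."""
    start = hit_address + yara_offset
    if start < 0:
        raise ValueError('offset moves the window before the start of the buffer')
    return start, start + conf_size

def read_window(buf, hit_address, yara_offset=0, conf_size=256):
    """Slice the configuration window out of an in-memory buffer."""
    start, end = config_window(hit_address, yara_offset, conf_size)
    return buf[start:end]

# A rule that hits 80 bytes into the structure needs an offset of -80
hit = 0x2ab9a7f4 + 80
print([hex(x) for x in config_window(hit, yara_offset=-80, conf_size=256)])
```

Keep the size in mind: the decrypted sample dump earlier in this post runs to 368 bytes, so the default of 256 would cut it short.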

Rename the Class value, seen here as JavaRATScan, to whatever fits for your malware. It has to be a unique name. Additionally, the """ """ comment block below the class name contains the description which will be displayed on the command line.

I do have an optional rule to limit the search to a certain subset of processes, in this case only processes whose names contain the word "java" (this is a Java-based RAT, after all). It also skips any process named "VMwareUser.exe".

The plugin contains a parse_structure routine that is fed a block of data. It parses the block into a list of items that are returned and printed to the screen (or file, or whatever output is desired). This routine will ultimately be unique to each malware family. The optional make_printable() function is one I made to clean up the non-printable characters from the output, allowing me to extend the blocked keyspace.
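
For experimenting with the parsing logic outside of Volatility, here is a standalone sketch written for Python 3 and bytes. It is not the plugin's code: instead of filtering on string.printable, it keeps only bytes in the range 0x20 to 0x7E, which drops every padding value seen in the sample dump (0x01 through 0x10, including the tab and newline range) in one step. It also splits on the first "=" only, so a value that itself contains "=" survives.

```python
def parse_config(blob, delimiter=b'SPLIT'):
    """Split a decrypted config block into a dict of field -> value."""
    fields = {}
    pieces = blob.split(delimiter)
    for piece in pieces[:-1]:  # ignore any slack data behind the last delimiter
        # Keep printable ASCII only; this strips the padding bytes between fields
        text = bytes(b for b in piece if 0x20 <= b <= 0x7E).decode('ascii')
        if '=' not in text:
            continue  # not a field=value pair, skip it
        key, value = text.split('=', 1)
        fields[key] = value
    return fields

sample = (b'port=31337SPLIT\x01'
          b'os=win macSPLIT\x01'
          b'mport=-1SPLIT\x03\x03\x03'
          b'reconsec=10SPLIT' + b'\x10' * 16 +
          b'ip=www.malware.comSPLIT' + b'\x09' * 9 +
          b'name=SPLIT' + b'\x06' * 6 + b'slack')
print(parse_config(sample))
```

Returning a dict rather than a list of formatted strings is a design choice for this sketch; it makes it easy to pull out a single field such as the C2 domain.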

Running the Plugin


As a rule, I place all of my Volatility plugins into their own unique directory. I then reference this upon runtime, so that my files are cleanly segregated. This is performed via the --plugins option in Volatility:

E:\Development\volatility>vol.py --plugins=..\Volatility_Plugins

After specifying a valid plugins folder, run vol.py with the -h option to ensure that your new scanner appears in the listing:

E:\Development\volatility>vol.py --plugins=..\Volatility_Plugins -h
Volatile Systems Volatility Framework 2.3_beta
Usage: Volatility - A memory forensics analysis platform.

Options:
...

        Supported Plugin Commands:

                apihooks        Detect API hooks in process and kernel memory
...
                javaratscan  Extract JavaRAT Configuration from Java processes
...

The names are automatically populated based upon your class names. The text description is automatically pulled from the "docstring", which is the comment that directly follows the class name in the plugin. 

With these in place, run your scanner and cross your fingers:



For future use, I'd recommend prepending your plugin name with a unique identifier to make it stand out, like "SOC_JavaRATScan". Prepending with a "zz_" would make the new plugins appear at the bottom of Volatility's help screen. Regardless, it'll help group the built-in plugins apart from your custom ones.

The Next Challenge: Data Structures


The greater challenge is when data is read from within the executable into a data structure in memory. While the data may have a concise and structured form when stored in the file, it may be transformed into a more complex and unwieldy format once read into memory by the malware. Some samples may decrypt the data in-place, then load it into a structure. Others decrypt it on-the-fly so that it is only visible after loading into a structure.

For example, take the following fictitious C2 data stored in the overlay of an executable:

Offset      0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15

00000000   08 A2 A0 AC B1 A0 A8 A6  AF 17 89 95 95 91 DB CE   .¢ ¬± ¨¦¯.‰••‘ÛÎ
00000016   CE 96 96 96 CF 84 97 88  8D 92 88 95 84 CF 82 8E   Î–––Ï„—ˆ.’ˆ•„Ï‚Ž
00000032   8C 03 D5 D5 D2 08 B1 A0  B2 B2 B6 AE B3 A5 05 84   Œ.ÕÕÒ.± ²²¶®³¥.„
00000048   99 95 93 80                                        ™•“€

By reversing the malware, we determine that this is composed of Pascal strings XOR-encoded with 0xE1. Pascal strings are length-prefixed (here the length byte itself is left unencoded), so applying the correct decoding would result in:

Offset      0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15

00000000   08 43 41 4D 50 41 49 47  4E 17 68 74 74 70 3A 2F   .CAMPAIGN.http:/
00000016   2F 77 77 77 2E 65 76 69  6C 73 69 74 65 2E 63 6F   /www.evilsite.co
00000032   6D 03 34 34 33 08 50 41  53 53 57 4F 52 44 05 65   m.443.PASSWORD.e
00000048   78 74 72 61                                        xtra

This is a very simple encoding routine, which I made with just:

items = ['CAMPAIGN', 'http://www.evilsite.com', '443', 'PASSWORD', 'extra']
data = ''
for i in items:
    data += chr(len(i))
    for x in i: data += chr(ord(x) ^ 0xE1)
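
Going the other way is just as short. The following is a minimal Python 3 sketch of a decoder for this fictitious format, assuming (as in the encoder above) that each length byte is stored in the clear and only the string bytes are XORed. The function name is my own.

```python
def decode_pascal_xor(blob, key=0xE1):
    """Decode a run of length-prefixed strings whose bodies are XORed with key."""
    items = []
    pos = 0
    while pos < len(blob):
        length = blob[pos]            # length prefix is not encoded
        pos += 1
        chunk = blob[pos:pos + length]
        if len(chunk) < length:
            break                     # truncated record, stop parsing
        items.append(bytes(b ^ key for b in chunk).decode('latin-1'))
        pos += length
    return items

# The encoded overlay bytes from the hex dump above
encoded = bytes.fromhex(
    '08 A2 A0 AC B1 A0 A8 A6 AF 17 89 95 95 91 DB CE'
    'CE 96 96 96 CF 84 97 88 8D 92 88 95 84 CF 82 8E'
    '8C 03 D5 D5 D2 08 B1 A0 B2 B2 B6 AE B3 A5 05 84'
    '99 95 93 80')
print(decode_pascal_xor(encoded))
# ['CAMPAIGN', 'http://www.evilsite.com', '443', 'PASSWORD', 'extra']
```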


Data structures are a subtle and difficult component of reverse engineering, and vary in complexity with the skill of the malware author. Unfortunately, data structures are some of the least shared indicators in the industry.

Once completed, a sample structure could appear similar to the following:

struct Configuration
{
    CHAR campaign_id[12];
    CHAR password[16];
    DWORD heartbeat_interval;
    CHAR C2_domain[48];
    DWORD C2_port;
}

With this structure, and the data shown above, the malware reads each variable in and applies it to the structure. But we can already see some discrepancies: the items are in a different order, and some are of a different type. While the C2 port is seen as a string, '443', in the file, it appears as a DWORD once read into memory. That means that we'll be searching for 0x01BB (or 0xBB01, depending on endianness) instead of '443'. Additionally, there are other values to contend with, such as heartbeat_interval, that did not exist statically within the file.

An additional challenge is that, depending on how the memory was allocated, there could be slack data found within the structure. This could be seen if the malware sample allocates memory with malloc() and never clears it with memset(), instead of using calloc().
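
To see exactly which bytes to search for, Python's struct module can pack the port both ways. This is a small sketch with helper names of my own; it simply searches a buffer for a 32-bit value in either byte order.

```python
import struct

def dword_patterns(value):
    """Return the 4-byte encodings of value in both byte orders."""
    return {
        'little': struct.pack('<I', value),  # 443 -> BB 01 00 00
        'big': struct.pack('>I', value),     # 443 -> 00 00 01 BB
    }

def find_dword(buf, value):
    """Return sorted (offset, byte_order) pairs for every occurrence of value in buf."""
    hits = []
    for order, pattern in dword_patterns(value).items():
        pos = buf.find(pattern)
        while pos != -1:
            hits.append((pos, order))
            pos = buf.find(pattern, pos + 1)
    return sorted(hits)

print(dword_patterns(443))
```

A four-byte pattern is short, so expect false positives when scanning a whole process; it works best for confirming a field inside a region that has already been located by other means.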

When read and applied to the structure, this data may appear as the following:

Offset      0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15

00000000   43 41 4D 50 41 49 47 4E  00 0C 0C 00 50 41 53 53   CAMPAIGN....PASS
00000016   57 4F 52 44 00 00 00 00  00 00 00 00 00 00 17 70   WORD...........p
00000032   68 74 74 70 3A 2F 2F 77  77 77 2E 65 76 69 6C 73   http://www.evils
00000048   69 74 65 2E 63 6F 6D 00  00 00 00 00 00 00 00 00   ite.com.........
00000064   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
00000080   00 00 01 BB                                        ...»

We can see from this that our strategy changes considerably when writing a configuration dumper. The dumper won't be written based upon the structure in the file, but instead upon the data structure in memory, after it has been converted and formatted. We'll have to change our parser slightly to account for this. For example, if you know that the Campaign ID is 12 bytes, then read 12 bytes of data and find the null terminator to pull the actual string.
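
As a rough illustration of that kind of parser, the sketch below unpacks the fictitious Configuration structure with Python's struct module and trims each fixed-width string at its null terminator, which also discards any slack bytes behind it. It assumes the fields are laid out back to back with no compiler padding, and it takes the byte order as a parameter since that depends on the sample. The names and the test buffer are my own.

```python
import struct
from collections import namedtuple

Configuration = namedtuple(
    'Configuration',
    'campaign_id password heartbeat_interval c2_domain c2_port')

def c_string(raw):
    """Return the text up to the first null byte of a fixed-width field."""
    return raw.split(b'\x00', 1)[0].decode('ascii', errors='replace')

def parse_configuration(blob, byte_order='<'):
    """Unpack CHAR[12], CHAR[16], DWORD, CHAR[48], DWORD from the start of blob."""
    layout = struct.Struct(byte_order + '12s16sI48sI')  # 84 bytes in total
    campaign, password, heartbeat, domain, port = layout.unpack_from(blob)
    return Configuration(c_string(campaign), c_string(password),
                         heartbeat, c_string(domain), port)

# Build a test buffer with slack bytes (0C 0C) after the campaign ID's terminator
blob = struct.pack('>12s16sI48sI',
                   b'CAMPAIGN\x00\x0c\x0c\x00', b'PASSWORD',
                   6000, b'http://www.evilsite.com', 443)
print(parse_configuration(blob, byte_order='>'))
```

In a Volatility plugin, the same function could be fed the bytes returned by the process address space read, in place of a SPLIT-based parser.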

This just scratches the surface of what you can do with encrypted data in memory, but I hope it can inspire others to use this template code to make quick and easy configuration dumpers to improve their malware analysis.

6 comments:

  1. Great post! I myself use similar techniques for analyzing encrypted code segments. But a different question. How are you doing your color coding theme? It looks very nice!

  2. Thanks Sander. The code segment was done through http://hilite.me and the monokai color scheme. I then modified the CSS to do the yellow backgrounds.

  3. No shortage of poison Ivy that's for sure. We've been running something similar to pull out the interesting bits for PI but without Volatility.

    Fodder:

    http://totalhash.com/network/av:*poison*

    We take the fodder and re-run them through a more rigorous process to pull out the config.

    Replies
    1. @totalhash: Have you tried using the PoisonIvy plugins for Volatility?

      http://code.google.com/p/volatility/source/browse/trunk/contrib/plugins/malware/poisonivy.py

      You can move it from the "contrib" folder to the volatility/plugins folder or run with the --plugins= option

      You can see an example of the config file being pulled out here:

      http://volatility-labs.blogspot.com/2012/12/what-do-upclicker-poison-ivy-cuckoo-and.html

  4. Very cool post. I'm starting to use volatility so i found it very helpful. One question though. You mention that "earlier analysis" gave you the info that you used to create your yara rule, which looked for the string /port=[0-9]{5}/. Could you shed some light on that process? Specifically, how did you get the hex/ascii output, was it from pcap?

    Replies
    1. That earlier analysis was from doing a static analysis of the sample, finding its decryption routine, and applying that manually to the data. However, within a few days, new samples changed to a different decryption routine and we had to get a report out quickly. So, this helped us get the results and produce a report without having to go back in and determine the new decryption. We still did it, but afterward when there was downtime.

      Had I to do it again, static analysis showed that variables were being parsed and split by the string "SPLIT". I would've just run it, attached a debugger (or used volatility), and searched for "SPLIT" to find the configuration block. That's the most straightforward way. A quicker way would be to capture the C2 network DNS, and search memory for that domain name to find the appropriate block of data.
