14 September 2015

Solving the 2015 FLARE On Challenges

The second annual FLARE On is a reverse engineering challenge put forth by the FireEye Labs Advanced Reverse Engineering (FLARE) team. While widely regarded as an advanced, tactical recruiting method, it also resonates with those who love CTF challenges.

In 2014 the inaugural FLARE On presented seven challenges. I was a finisher, and you can read my write-up here. Each participant has a different take on the challenges, with different methods, skills, and strengths. Mine are forged by years of forensics, log analysis, and working a mission where results are required regardless of ability, training, or excuses. At the end of this post I've linked to other write-ups that I've seen.

Let's begin by setting a level of expectation. You are reading a blog named GhettoForensics. The ultimate goal of Ghetto Forensics is to get by with whatever tools and knowledge you have to complete a mission. You will not find first-rate techniques and solutions here. In fact, when presented with multiple options, I often went out of my way to choose the worst, most cringe-worthy option available. For the lulz, and to show that you don't need advanced reverse engineering training and experience to survive the industry. I hope you enjoy.

For simplicity's sake, unless necessary, all IDA output will be shown as decompiled.

Without further ado.

Challenge #1

Let's roll up our sleeves and ... oh, nevermind, there's the routine.

The routine takes a given email address through ReadFile(), XOR's it by 0x7D, and compares it to an embedded value. So, just find that value in the executable with WinHex (one of my favorite tools) and XOR it there to get the answer. WinHex lets you just highlight text and do basic on-the-fly modification (rotate, addition, subtraction, XOR, etc).
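
If you would rather script the XOR than do it in a hex editor, here is a minimal sketch of the same idea. The key 0x7D comes from the routine above; the `embedded` buffer is a made-up stand-in, since the real comparison bytes are not reproduced in this post.

```python
KEY = 0x7D  # single-byte XOR key named in the write-up

def xor_bytes(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with the same one-byte key (the operation is its own inverse)."""
    return bytes(b ^ key for b in data)

# Hypothetical stand-in for the embedded comparison buffer: built by encoding
# a made-up address, because the real bytes are not shown here.
embedded = xor_bytes(b"someone@example.com")

print(xor_bytes(embedded).decode("ascii"))  # someone@example.com
```

Because XOR undoes itself, the same function both encodes and decodes.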


Challenge #2

The difficulty jump to challenge 2 was slightly higher than many people expected, and it's unfortunate that most dropped out here. #2 was best done in a debugger, and is actually best demonstrated with IDA graph view:

This is a routine where I would re-implement the instructions, step by step. Load the values into a python script, mimic the values, and after each step make sure my script produces the same result as the debugger, until all done. The challenge takes an encoded value stored in-line with the code and decodes it. This value is best seen referenced in a debugger, but is seen here statically:

We see it load a WORD value of 0x1C7 into AX, but it actually only uses the lower half, 0xC7. From there, it's just basic register operations. I used the ROL function found in the comments of a Didier Stevens post.

def rol(byte, count):
    byte = (byte << count | byte >> (8 - count)) & 0xFF
    return byte

email = '\xAF\xAA\xAD\xEB\xAE\xAA\xEC\xA4\xBA\xAF\xAE\xAA\x8A\xC0\xA7\xB0\xBC\x9A\xBA\xA5\xA5\xBA\xAF\xB8\x9D\xB8\xF9\xAE\x9D\xAB\xB4\xBC\xB6\xB3\x90\x9A\xA8'
email = email[::-1]
AH = AL = AX = BX = DX = 0
result = ''

for i in range(0, len(email)):
    AH = rol(1, DX)
    AL = (ord(email[i]) - AH - 1) ^ 0xC7
    BX = BX + ord(email[i])
    DX = BX & 3
    result += chr(AL)
print result

When executed, this script prints the email address of:


Challenge #3

I loved #3, mostly because I love goats. Who doesn't?

When you look at the executable, it has the tell-tale icon for a Python executable. This makes things a bit easier:

I've worked a lot with Python executables and knew where to go. You would eventually find it through static analysis: the executable looks for a "PYZ" overlay, decompresses it, and runs the resulting compiled Python code:

Everyone has their favorite tools for dealing with such instances. My go-to is pyinstextractor, hosted on SourceForge. Run this against the original executable and it'll dump the results in your current directory. Now, the issue with this, which had me confused for honestly 30 minutes, is that it will overwrite anything in your directory. As it dumped the Python code to a file named 'elfie', overwriting the executable of 'elfie', I scrambled trying to find the original source. I didn't think to look again at the original file to realize it was overwritten. After a herp-derp moment, I opened the file and saw legitimate Python code, though obfuscated:

OOO0000O0OO0OOOOO000O00O0OO0O00O += 'Rbk51WXI4dmRaOXlwV3NvME0ySGp'
##removed for brevity##
import base64
exec(base64.b64decode(OOO0OOOOOOOO0000O000O00O0OOOO00O + O0O00OO0OO00OO00OO00O000OOO0O000 + O00OO0000OO0OO0OOO00O00000OO0OO0 + O00OO00000O0OOO0OO0O0O0OO0OOO0O0 + ...

In this 56,694 line script there are thousands of variables holding what is obviously Base64 encoded data. While you could manually rename these and rebuild them, you could also just replace 'exec' with 'print' :)
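
The exec-to-print swap can be sketched on a toy scale. Everything below is hypothetical (the variable names, the payload, and the `reveal` helper); it only illustrates joining the chunks, Base64-decoding them, and printing the source instead of running it.

```python
import base64

# Hypothetical miniature of the obfuscated script: a payload split across
# junk-named variables, then joined, Base64-decoded, and handed to exec().
payload = base64.b64encode(b"print('hello from the hidden stage')").decode("ascii")
O0O0 = payload[:10]
OO00 = payload[10:25]
O00O = payload[25:]

def reveal(*chunks: str) -> str:
    """Decode the joined chunks and return the source instead of running it."""
    return base64.b64decode("".join(chunks)).decode("utf-8")

# The original line would be: exec(base64.b64decode(O0O0 + OO00 + O00O))
print(reveal(O0O0, OO00, O00O))
```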

The result is another massive Python script. But, in this case, it's only 48 lines and the email is pretty apparent, though in reverse:

Reverse it out to show:


Challenge #4

This challenge was a UPX-packed executable (youPecks. You-P-Ecks. UPX. Hah!) that, when unpacked, showed some unusual results:

A view through the unpacked main() shows that it takes an integer command line argument and performs an MD5 hash of it. Tracing this MD5 data we see that it is used to proceed to the second part, but has no other purpose. So, unnecessary and can be patched away.

UPX is very easy to work with, if you've never done it before. Open the unpacked version in IDA to find the entry point. Open in a debugger and scroll down until you see a JMP followed by a lot of DBs. Follow that jump, then go to the appropriate entry point and set a breakpoint. Done.

As we debug it, we see that 2 + 2 does, indeed, equal 4. This is a good sign.

The code does a few health checks. If 2 + 2 = 5, it would quit. If there wasn't an argument, it would quit. 

In this case, I know that a successful MD5 check will jump to a new location in the same function. So, before it even does that check, I'll just manually enter a jump to that new location:

After this, I follow down until I see the pretty apparent decoding routine:

I don't even bother at this point. I just trace the results in memory as this loops and out shoots the email address:


#ProTip: If you think you went too far in any program and missed what you're looking for, just search memory for "flare-on.com". In Olly/Imm open Memory map, go to top, and Ctrl-L / Ctrl-B down.
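
The same search works outside the debugger on a saved memory region. This is a rough sketch, assuming you have dumped the region to bytes; the `dump` contents and the `find_marker` name are invented for illustration, and it checks both ASCII and UTF-16LE forms of the string.

```python
def find_marker(dump: bytes, marker: str = "flare-on.com"):
    """Yield (offset, encoding) for each ASCII or UTF-16LE hit in a memory dump."""
    for name, needle in (("ascii", marker.encode("ascii")),
                         ("utf-16le", marker.encode("utf-16-le"))):
        start = dump.find(needle)
        while start != -1:
            yield start, name
            start = dump.find(needle, start + 1)

# Hypothetical dump contents, standing in for a region saved from the debugger.
dump = b"\x00" * 16 + b"user@flare-on.com\x00" + "x@flare-on.com".encode("utf-16-le")
for offset, encoding in find_marker(dump):
    print(hex(offset), encoding)
```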

Challenge #5

This was an awesome challenge, and a good change-up from what we're used to. In it we have an application that takes information from a local file, key.txt, and transmits it to a remote server. Given in the challenge is this application and a PCAP of the traffic, from which we need to recreate the original key.txt.

An analysis of the PCAP shows multiple HTTP POST sessions, each containing four bytes of ASCII. The final session contains the text "ZW==", a strong hint that this is Base64 data.

Instead of ripping them out piece by piece, I just dump and reformat with a script:

Yes, I could've used scapy or dpkt, but where's the ghetto in that? :)
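
For a sense of what a dump-and-reformat pass can look like without scapy or dpkt, here is a rough sketch. It assumes the HTTP requests have already been exported as one raw byte stream (for example from a stream-follow view); the sample `raw` capture and the `post_bodies` helper are hypothetical and not the script used for the challenge.

```python
def post_bodies(raw: bytes):
    """Return the body of each HTTP POST found in a raw stream dump."""
    bodies = []
    for request in raw.split(b"POST ")[1:]:
        header, _, body = request.partition(b"\r\n\r\n")
        length = 0
        for line in header.split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip().decode("ascii"))
        bodies.append(body[:length])
    return bodies

# Hypothetical two-request capture; the real sessions are in the challenge PCAP.
raw = (b"POST / HTTP/1.1\r\nContent-Length: 4\r\n\r\nQUJD"
       b"POST / HTTP/1.1\r\nContent-Length: 4\r\n\r\nRA==")
print(b"".join(post_bodies(raw)).decode("ascii"))  # QUJDRA==
```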

We'll come back to that string later. Let's take a look at the application now. The sender is extremely basic and can be summarized in a very small main():

The contents of key.txt are read in, passed into encode_flarebearstare(), chunked into 3-byte segments, each Base64 encoded and transmitted by HTTP. What we really care about is the encoding routine, which is also pretty basic:

Each byte of the key is increased by the value of the corresponding byte of the string 'flarebearstare'.

That's all.
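
As a minimal sketch of that addition, assuming byte-sized arithmetic that wraps at 0xFF (the wrap is my assumption; the function body is a re-creation from the description, not the decompiled routine):

```python
KEY = b"flarebearstare"

def encode_flarebearstare(data: bytes) -> bytes:
    """Add the matching key byte to each input byte.

    The & 0xFF wrap is an assumption about byte-sized arithmetic.
    """
    return bytes((b + KEY[i % len(KEY)]) & 0xFF for i, b in enumerate(data))

print(encode_flarebearstare(b"A").hex())  # 'A' (0x41) + 'f' (0x66) = a7
```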

Can I just take a moment to say how awesome I think 'flarebearstare' is? I think they named their team FLARE solely to use that phrase, and I would've done the same!

To decode, then, we just need to Base64 decode the transmitted text and then take each byte and _subtract_ its respective 'flarebearstare' value. Easy peasy.

But, not so.

A first pass gave exceptions of negative numbers. Huh, that's weird. OK, we'll just make sure the result is positive, and ... Nope. WTF?

A closer look at the application eventually shows the issue. The Base64 alphabet is wrong. The case is swapped!
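
A quick way to convince yourself what a case-swapped alphabet does is to build one and compare. This sketch assumes the custom alphabet is the standard one with upper and lower case exchanged and nothing else changed; the helper names are made up.

```python
import base64
import string

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# Assumed custom alphabet: identical to the standard one with letter case flipped.
SWAPPED = string.ascii_lowercase + string.ascii_uppercase + string.digits + "+/"

def b64encode_swapped(data: bytes) -> str:
    """Encode with the case-swapped alphabet by translating standard output."""
    standard = base64.b64encode(data).decode("ascii")
    return standard.translate(str.maketrans(STANDARD, SWAPPED))

def b64decode_swapped(text: str) -> bytes:
    """Map back to the standard alphabet, then decode normally."""
    return base64.b64decode(text.translate(str.maketrans(SWAPPED, STANDARD)))

encoded = b64encode_swapped(b"hello")
print(encoded, encoded == base64.b64encode(b"hello").decode("ascii").swapcase())
```

Under that assumption, translating between the alphabets is the same as calling `swapcase()` on the string.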

After a few side tests, the only output difference is swapped case in the output string. With that, I take the transmitted Base64 string, swap the case, and it decodes perfectly with this script:

import base64
key = 'flarebearstare'
data_base64 = 'UDYs1D7bNmdE1o3g5ms1V6RrYCVvODJF1DpxKTxAJ9xuZW=='.swapcase()
data = base64.b64decode(data_base64)

result = ''
for i in range(0, len(data)):
    result += chr(ord(data[i]) - ord(key[i%14]))
print result


Challenge #6

By this point, I was feeling good. There were no big hurdles, the challenges were fun, and I was getting to exercise some brain cells that had gone dormant from drinking. Until I got to challenge 6.

Then it was all like.

This challenge was an Android APK that, when executed, displays a screen to input an email address. I'll cut to the chase on this one; there's really only one function of note in this library, Java_com_flareon_flare_ValidateActivity_validate. There are some basic math operations here, but I'll let the other write-ups speak to those.

The algorithm checks to see if the passed input is 46 bytes. It will then take two bytes at a time, perform magic math on those two bytes, and then compare the results to a respective output array. With 23 arrays, the results seem simple. Do the math on each two bytes, if those bytes match the array, then they are correct.

Beyond that, I have no clue what this function is doing. I know what I've been told it's doing, I've read other people's explanations of it, and even had someone afterward sit down and walk me through it. Nope. Still no clue. I do believe that the brain is sometimes 'color blind' to things it shouldn't be, and this challenge fell within that for me.

After spending a month poking at this on almost a daily basis, I had mentally given up. The answer eventually came to me and, upon completion on 28 Aug, I even made a public joke about this based on the time durations of my challenges :)

Dat Gap Doe :D

After a week of trying to reimplement the routine in Python, I gave up. There were just too many unknowns to deal with, given Python's limited type casting, when you don't know what the intent of the code is. I needed to know what the expected outputs should look like. Therefore, I attempted to debug it using various local Android virtual machines. I first tried to use GenyMotion, which failed as they removed all ARM support. I then switched to BlueStacks. However, that has a 'broken' NAT implementation that only allowed outgoing traffic. And AndyVM kept crashing on a regular basis when making connections.

From there, I installed the IDA server on my own HTC One M7, which worked, but I then ran up against IDA Pro issues:

At this point, I was greatly urged to recreate the routine in C++, which I'm very weak at. I spent a few days trying to adapt to GCC, then gave up again.  It wasn't until someone noted that they had the code completely reimplemented that I learned you could just use Visual Studio, include 'windows.h', and have functional IDA decompiler code. I quickly installed VS2015, then worked to reimplement the routine, with a simple brute force wrapper that I stole from the Internet. I tested it out, running a set of two bytes and writing the block to the disk, comparing to the check tables. The structures checked out. More debugging helped show what was going on, to an extent.

For each run, I would copy one of the 23 check tables into the code, brute force it, and add that to my output email. This was made easy with the HxD hex editor as you can simply highlight a block of text and "Copy As C#", automatically formatting it for source code.

// FLARE6.cpp : Defines the entry point for the console application.

#include "stdafx.h"
#include "stdio.h"
#include "windows.h"

static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_-.=!#$%+@";
static const int alphabetSize = sizeof(alphabet) - 1;

const unsigned char rawData[92] = {
 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

unsigned char table[6952] = {
 0x02, 0x00, 0x03, 0x00, 0x05, 0x00, 0x07, 0x00, 0x0B, 0x00, 0x0D, 0x00,
 0x11, 0x00, 0x13, 0x00, 0x17, 0x00, 0x1D, 0x00, 0x1F, 0x00, 0x25, 0x00,
 0x29, 0x00, 0x2B, 0x00, 0x2F, 0x00, 0x35, 0x00, 0x3B, 0x00, 0x3D, 0x00,
// Truncated for brevity
 0x2B, 0x7E, 0x2F, 0x7E, 0x35, 0x7E, 0x41, 0x7E, 0x43, 0x7E, 0x47, 0x7E,
 0x55, 0x7E, 0x61, 0x7E, 0x67, 0x7E, 0x6B, 0x7E, 0x71, 0x7E, 0x73, 0x7E,
 0x79, 0x7E, 0x7D, 0x7E
};

void validate(const char* email)
{
 char a7E7E[1]; // I dunno?
 char s[6952];
 int byte1 = 0;
 int i = 0;
 int table_pos = 0;
 int table_value = 0;

 memset(s, 0, 3476);
 memset(a7E7E, 0, 1);

 // Pack the two input characters into one 16-bit value
 if (email[i])
 {
  byte1 = email[i];
  if (email[i + 1])
  {
   byte1 = (unsigned int)&a7E7E >= ((email[i] << 8) | email[i + 1]) ? (email[i] << 8) | email[i + 1] : 0;
  }
 }

 // Walk the table, counting how many times each entry divides the value
 do
 {
  table_value = *(WORD *)((char *)&table + table_pos);

  while (!(byte1 % table_value & 0xFFFF))
  {
   ++*(WORD *)&s[table_pos];
   byte1 = byte1 / table_value & 0xFFFF;
   if (byte1 <= 1) { goto LABEL_10; }
  }
  table_pos += 2;
 } while (table_pos != 6952);

LABEL_10:
 if (!memcmp(&rawData, s, 92)) { printf("%c%c\n", email[i], email[i + 1]); }
}

void bruteImpl(char* str, int index, int maxDepth)
{
 for (int i = 0; i < alphabetSize; ++i)
 {
  str[index] = alphabet[i];
  if (index == maxDepth - 1) { validate(str); }
  else { bruteImpl(str, index + 1, maxDepth); }
 }
}

int main()
{
 char a[2];
 bruteImpl(a, 0, 2);
 return 0;
}

After running through each set of characters I obtained the email address:
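
As a cross-check on the brute force, the one check table shown in the listing can also be read directly. This is a sketch under two assumptions that go beyond the text: that `table` is the ascending list of primes stored as 16-bit words (its visible start, 2, 3, 5, 7, 11, 13, is consistent with that), and that each WORD in the check table counts how many times the matching prime divides the two-character value. Only the bytes visible in the listing are used.

```python
import struct

def first_primes(count: int):
    """Return the first `count` primes by simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def decode_check_table(raw: bytes) -> str:
    """Multiply out prime**exponent and split the 16-bit product into two chars."""
    exponents = struct.unpack("<%dH" % (len(raw) // 2), raw)
    value = 1
    for prime, exponent in zip(first_primes(len(exponents)), exponents):
        value *= prime ** exponent
    return chr(value >> 8) + chr(value & 0xFF)

# The rawData block from the listing, read as 46 little-endian WORDs: non-zero
# counts at index 2 (value 2), index 3 (value 1), and index 37 (value 1).
raw_data = bytearray(92)
raw_data[4], raw_data[6], raw_data[74] = 2, 1, 1
print(decode_check_table(bytes(raw_data)))  # om
```

Here 5 * 5 * 7 * 163 = 28525 = 0x6F6D, which is the character pair "om".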


Challenge #7

With #6 done, I first took a break to cry tears of relief into a bottle of bourbon. With two weeks left I had low expectations for finishing and so decided to have fun with the rest of the challenges.

Challenge 7 was another console application where you were to enter in a valid password (clue there, not an email). The trick here is that it is a .NET application.

Loading it into ILSpy/Reflector shows initially that there is obfuscation of some sort: the function names are all junk Unicode names.

Running the file through de4dot produces output that is much more usable for analysis.

Like some of the later challenges, we see a lot of excessive junk here. There are five namespaces, each containing multiple attributes and classes. For now, ignore the attributes and focus on the handful of class files. Of those, one stands out as relevant:

namespace ns2
{
    using ns1;
    using System;
    using System.IO;
    using System.Reflection;
    using System.Security.Cryptography;
    using System.Text;

    internal class Class3
    {
        /* private scope */ static void Main(string[] args)
        {
            Class1 class2 = new Class1();
            byte[] buffer = new byte[] { 0xec, 0x35, 0xdd, 0x8f, 0xb3, 0xd9, 0xcb, 0x17, 0x57, 0x7e, 40, 0x41, 0x42, 230, 0x98, 180 };
            byte[] buffer2 = new byte[] {
                0x1f, 100, 0x74, 0x61, 0, 0x54, 0x45, 0x15, 0x73, 0x61, 0x6d, 0x1d, 0x4f, 0x44, 0x15, 0x68,
                0x73, 0x68, 0x15, 0x54, 0x4e
            };
            byte[] bytes = new byte[] { "Warning! This program is 100% tamper-proof!" };
            byte[] buffer4 = new byte[] { "Please enter the correct password:" };
            byte[] buffer5 = new byte[] { "Y U tamper with me?" };
            byte[] buffer6 = new byte[] { "Thank you for providing the correct password." };
            byte[] buffer7 = new byte[] { "Use the following email address to proceed to the next challenge:" };
            string str = Console.ReadLine().Trim();
            string str2 = smethod_0(class2, buffer2) + '_' + smethod_3();
            if (str == str2)
                Console.WriteLine(smethod_1(str, buffer));
        }

        /* private scope */ static string smethod_0(Class1 class1_0, byte[] byte_0)
        {
            byte[] buffer = smethod_2();
            string str = "";
            for (int i = 0; i < byte_0.Length; i++)
                str = str + ((char) (byte_0[i] ^ buffer[i % buffer.Length]));
            return str;
        }

        /* private scope */ static string smethod_1(string string_0, byte[] byte_0)
        {
            RijndaelManaged managed = (RijndaelManaged) Rijndael.Create();
            byte[] buffer = new byte[] {
                0x1a, 0xcb, 20, 0x9c, 0xc4, 15, 0x38, 0x5e, 0x77, 0xe3, 0x31, 0x42, 0x24, 0xfc, 0x92, 0xc3,
                0x77, 80, 0xdf, 0x67, 0xfb, 240, 0x3d, 0x27, 10, 0x16, 150, 0x8e, 0xa2, 0xa7, 100, 0x99
            };
            byte[] bytes = new Rfc2898DeriveBytes(string_0, byte_0.Length) { Salt = byte_0 }.GetBytes(0x20);
            managed.IV = new byte[0x10];
            managed.Key = bytes;
            managed.Mode = CipherMode.CBC;
            managed.Padding = PaddingMode.ANSIX923;
            RijndaelManagedTransform transform = (RijndaelManagedTransform) managed.CreateDecryptor(managed.Key, managed.IV);
            MemoryStream stream = new MemoryStream(buffer);
            CryptoStream stream2 = new CryptoStream(stream, transform, CryptoStreamMode.Read);
            StreamReader reader = new StreamReader(stream2);
            string str = reader.ReadToEnd();
            return str;
        }

        /* private scope */ static byte[] smethod_2()
        {
            return Assembly.GetExecutingAssembly().ManifestModule.ResolveMethod(0x6000001).GetMethodBody().GetILAsByteArray();
        }

        /* private scope */ static string smethod_3()
        {
            StringBuilder builder = new StringBuilder();
            MD5 md = MD5.Create();
            foreach (CustomAttributeData data in CustomAttributeData.GetCustomAttributes(Assembly.GetExecutingAssembly()))
            byte[] bytes = Encoding.Unicode.GetBytes(builder.ToString());
            return BitConverter.ToString(md.ComputeHash(bytes)).Replace("-", "");
        }
    }
}

From this routine a few things pop out. A call to Console.ReadLine().Trim() takes in the password from the user. Immediately after, calls to smethod_0() and smethod_3() are performed with the results separated by an "_". If these match the input, you get the email. These functions also all take place within the same class, so we can ignore the remaining files.
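
The shape of that password can be sketched in Python. The two helpers mirror what the listing shows (a repeating-key XOR, and an MD5 over UTF-16LE bytes rendered as upper-case hex with the dashes removed); the inputs are hypothetical, because the real key is the IL body of a method and the real hashed text comes from the assembly's attributes, neither of which appears here.

```python
import hashlib

def xor_with_repeating_key(data: bytes, key: bytes) -> str:
    """Mirror of smethod_0: XOR each byte with key[i % len(key)]."""
    return "".join(chr(b ^ key[i % len(key)]) for i, b in enumerate(data))

def md5_of_utf16(text: str) -> str:
    """Mirror of smethod_3's tail: MD5 over UTF-16LE bytes, upper-case hex, no dashes."""
    return hashlib.md5(text.encode("utf-16-le")).hexdigest().upper()

# Hypothetical inputs: the real key is the IL body of method 0x6000001 and the
# real text comes from the assembly's custom attributes, neither shown here.
il_bytes = bytes([0x10, 0x20, 0x30])
encoded = bytes(ord(c) ^ il_bytes[i % 3] for i, c in enumerate("part1"))
attribute_text = "hypothetical attribute dump"

password = xor_with_repeating_key(encoded, il_bytes) + "_" + md5_of_utf16(attribute_text)
print(password)
```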

One problem here is that part of the answer relies upon the metadata of the executable, a big block of metadata stored elsewhere in the file. Here's the original view of this data:

De-obfuscating the executable changes that block, so the resultant values will be completely different.

You can only work off the original. And that's not easy to do statically, nor with ILSpy.

Instead, we'll use dnSpy, which makes the solution almost effortless. In it we can simply look for the string builder with the underscore and the comparison immediately afterward:

Now, just debug. Step through the program until you get to this comparison, mouseover text2, and get your password.



Re-run the program, type that in, and get your email!


Challenge #8

Challenge 8 was steganography, something that eluded many early in the challenge. The easy part of stego is having a wide selection of tools available. The hard part is knowing when to use them or not. I cannot even express the anguish over Robert Hanssen's actions and certain sectors of the forensic community having to use AnaDisk on every. single. floppy. disk. they processed. (To my knowledge, there were no positive results from trying it on every single investigation.)

The challenge started with a plain executable, gdssagh. While it prints a single message to the screen, it almost entirely contains a single stream of Base64 encoded data. Extracting this data, removing the carriage returns, and decoding results in a pretty picture.
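
The extract, strip, and decode step might look like the sketch below. It assumes the Base64 stream is the longest run of Base64 characters in the file; `carve_base64` and the stand-in `blob` are invented for illustration.

```python
import base64
import re

def carve_base64(blob: bytes, min_len: int = 64) -> bytes:
    """Decode the longest Base64-looking run in a binary, ignoring CR/LF breaks."""
    runs = re.findall(rb"[A-Za-z0-9+/\r\n]+={0,2}", blob)
    longest = max(runs, key=len)
    cleaned = longest.replace(b"\r", b"").replace(b"\n", b"")
    if len(cleaned) < min_len:
        raise ValueError("no Base64 run of the expected size")
    return base64.b64decode(cleaned)

# Hypothetical stand-in for the executable: stray bytes around a wrapped stream.
payload = bytes(range(108))
blob = b"\x00\x01\xff" + base64.encodebytes(payload) + b"\x00\x90"
print(carve_base64(blob) == payload)  # True
```

On a real binary, stray printable bytes next to the stream could get pulled into the run, so expect to trim the boundaries by hand.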

From that point, I just throw tools at it :)  Honestly, if you're looking at what could be steg, your first stops should be StegSolve and ZSteg. StegSolve allows you to manually manipulate the data until you see what could be hidden data. It acts as a good first pass of the data, especially when viewing color planes.

A color plane is a sliced view of an image solely off of the bit of a single color. What would an image look like if you only saw the Most Significant Bit of Green? This:

The story comes out when we look at the Least Significant Bits (shown here of each color):

At a basic level, this tells us where the data "is" for a certain bit plane. In the LSB, we see a significant black area at the top. These black areas show "no value" (null) bytes. From knowing an executable structure, you can make a fairly good guess that one is in there. At a close enough view, you could imagine picking out the MZ, This Program Cannot Be ..., and PE headers. With that, I play with StegSolve's Data Analyzer and focus on the LSB planes for red, green, and blue (since all had the same data structure):

Close. Let's try MSB first (basically switching the bit order) ... I get the same results? I try again, and again with different files. It's a bug in the tool.

I hop over to ZSteg which, when executed, immediately finds our executable :) It detected it as data on RGB planes, MSB first, on an X>Y orientation.
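
To make the "RGB planes, MSB first" idea concrete, here is a small sketch of pulling one bit plane out of pixel data and packing it into bytes. It assumes the pixels are already available as (R, G, B) tuples in row-major order; loading the image is left out, and the demo pixels are fabricated to carry the two bytes "MZ".

```python
def extract_plane(pixels, bit=0, msb_first=True):
    """Collect one bit per R, G, B channel across pixels and pack into bytes."""
    bits = [(channel >> bit) & 1 for pixel in pixels for channel in pixel[:3]]
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        chunk = bits[i:i + 8] if msb_first else bits[i:i + 8][::-1]
        value = 0
        for b in chunk:
            value = (value << 1) | b
        out.append(value)
    return bytes(out)

# Hypothetical image data: hide b"MZ" in the least significant bits.
secret = b"MZ"
bits = [(byte >> (7 - k)) & 1 for byte in secret for k in range(8)]
bits += [0] * (-len(bits) % 3)          # pad to a whole number of pixels
pixels = [tuple(0xFE | b for b in bits[i:i + 3]) for i in range(0, len(bits), 3)]
print(extract_plane(pixels))  # b'MZ'
```

Passing `msb_first=False` packs each byte in the opposite bit order, which is the difference between the two tool outputs described here.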

But, that StegSolve bug really got on my nerves. I dumped the output and then just changed the bit pattern myself, with more stolen internet code:

def reverse(x):
    result = 0
    for i in xrange(8):
        if (x >> i) & 1: result |= 1 << (8 - 1 - i)
    return result

data = open('stegsolver.out.dat', 'rb').read()
output = ''

for i in data:
    output += chr(reverse(ord(i)))
print output[0:4]
open('correct.exe', 'wb').write(output)

And, that worked! I received a file with the same MD5 hash as ZSteg produced. Stupid bug... When executed, this program spat out the email:


Challenge #9

Now, we get to the harder challenges. This is where I can show my true ghetto analysis attitude! And where I start taking studious notes on everything. I have a week left to get three more challenges done, so the pressure is on.

And let's start off with a backhanded compliment of a program.

Followed by a look at some instructions and then a big sea of data.


I really dislike the IDA debugger (I'm heavily reliant on Right Click>Follow in Dump) but it's best for this challenge. There's a lot of code to get through, most of it is useless, and, for me, IDA does a better job of recognizing and assembling this code as you step along.

The first goal is to focus on the actual input portion in all of that. So, let's run it in the debugger, then step through until we get to the input. Set a breakpoint after that part, type in some unique junk ('ABCD_1234_ABCD_1234@flare-on.com'). Then start a debugger trace with Instruction Tracing. Then, hit F9, and relax.

This trace output contained 9,600 instructions. Not bad. Not easily readable either. Let's channel our inner Unix admin. I'm at an advantage: I work from home, I've already started growing out my neck beard.

Wait, what? Where am I going with all this ... We're looking for loops. We're looking for the same instructions to be called with varying registers. We've seeded the registers with somewhat unique values. I'm hoping to find a mov, xor, cmp, or something usable.

A first pass shows that there are no EAX = 00000031 or 00000065. After digging a little deeper, I see it:

I know that at 0x401A9C each respective byte is loaded into AL. Let's then poke around for any single-byte XOR's with 'grep'  (Are you cringing at this process yet? I know you are. And I like that.)

Boom! So at 0x12FDF8 are calls regarding single-byte XOR. This may not even be relevant, but I like to just log this stuff as I see it. While we're at it, let's hunt for any other math routines:

We know from our input breakpoint that the program picks up around 0x40173B. I can see that also as the top of a loop. Based on that, I can search through the trace to find the bottom of the loop that causes a jz/jnz back to there. I see that at 0x401BC8. So now we have a fairly confined boundary to focus on.

Since we see the routine looping, we can sort-of conclude that it's not exiting if a byte is wrong. Based on this, can we determine the overall email length? Let's try.

Run a new trace with a unique and long "email". For this test, I'll use:


Because we know each character is unique, and we know the location, we can run a simple:

At 41 bytes it stops checking bytes, so we have a pretty high fidelity guess to the email length. The only reason I do a sort | uniq here is that the results are repeated twice, for some reason. So they show up as 82 bytes (two checks of 41 bytes each).
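
The same filter-and-dedupe pass can be written in a few lines of Python. The trace line format below is hypothetical (real trace exports differ in layout), as are the sample lines; only the address 00401A9C comes from the analysis above. Unlike sort | uniq, this keeps first-seen order.

```python
import re

def bytes_loaded_at(trace_lines, address="00401A9C"):
    """Return the distinct low bytes of EAX logged at one address, in first-seen order."""
    seen = []
    for line in trace_lines:
        if address not in line:
            continue
        match = re.search(r"EAX=([0-9A-Fa-f]{8})", line)
        if match:
            value = int(match.group(1), 16) & 0xFF
            if value not in seen:
                seen.append(value)
    return seen

# Hypothetical trace excerpt in an assumed "address instruction registers" layout.
trace = [
    ".text:00401A9C mov al, [eax+ecx] EAX=00000041",
    ".text:00401B14 rol al, cl EAX=00000082",
    ".text:00401A9C mov al, [eax+ecx] EAX=00000042",
    ".text:00401A9C mov al, [eax+ecx] EAX=00000041",  # second pass repeats the byte
]
print(len(bytes_loaded_at(trace)), "distinct input bytes checked")
```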

At this point, I'll follow the code from AL all the way down to see what happens to it.

.text:00401A9C mov     al, [eax+ecx]
Stack[000007B0]:0012FDF4 mov     ah, [esp+ebx+0B4h]    ; XOR key as AH
.text:00401B14 rol     al, cl                          ; ROL key as CL
.text:00401B16 mov     ebx, [esp+ebx+2Ch]              ; Load cmpxchg value into EBX
Stack[000007B0]:0012FDF8 cmpxchg bl, dl

That last exchange, cmpxchg, was elusive to discover. When debugging, IDA would never display this opcode properly, nor the hex bytes around it, shown here at address 0x12FDF8:

I knew something was happening here, but could not determine exactly what. So, I switched to Immunity and saw the operation jump out:

At the very end, the respective input byte, performed with these operations, would be compared to a static table using cmpxchg. Knowing this, I think of all the possible ways to collect these values and map them out. Then I thought of the worst way possible... spreadsheets!

Yes. I loaded an Excel spreadsheet and, for each byte, marked the XOR byte, ROL byte, and ultimate CMPX value. Is that a look of disgust I see? Oh yeaaahh
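
For anyone who prefers a script to a spreadsheet, the reversal can be sketched as follows. It assumes the forward transform is XOR with the per-byte key followed by ROL by the per-byte count, with the result compared against the table value; that ordering is my reading of the instruction listing, and the rows are fabricated by running the forward transform on a made-up string.

```python
def rol8(value, count):
    """Rotate an 8-bit value left."""
    count &= 7
    return ((value << count) | (value >> (8 - count))) & 0xFF

def ror8(value, count):
    """Rotate an 8-bit value right (the inverse of rol8)."""
    count &= 7
    return ((value >> count) | (value << (8 - count))) & 0xFF

def recover(rows):
    """rows holds (xor_key, rol_count, expected) per position, as in the spreadsheet."""
    return "".join(chr(ror8(expected, rol) ^ xor) for xor, rol, expected in rows)

# Hypothetical rows, generated by running the forward transform on a made-up string.
keys = [(0x35, 1), (0xA7, 4), (0x10, 7)]
rows = [(x, r, rol8(ord(c) ^ x, r)) for (x, r), c in zip(keys, "abc")]
print(recover(rows))  # abc
```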

Once the routine was discovered, that was about 5 minutes to collect, reverse, and decode the email of:


Challenge #10

Challenge 10 had a lot of different things going on but, at the end, it came down to a few small gimmick hurdles. Let's get to them one at a time. You're given an executable, loader. When executed it does quite a few things as I'll show in my awesome tool that's on Github and you should contribute to and I totally gave a demo on it at BlackHat 2015 Arsenal, Noriben.

-=] Sandbox Analysis Report generated by Noriben v1.6.2
-=] Developed by Brian Baskin: brian @@ thebaskins.com  @bbaskin
-=] The latest release can be found at https://github.com/Rurik/Noriben

-=] Execution time: 28.18 seconds
-=] Processing time: 0.20 seconds
-=] Analysis time: 4.90 seconds

Processes Created:
[CreateProcess] Explorer.EXE:1824 > "C:\FLARE\loader.exe "  [Child PID: 2700]
[CreateProcess] loader.exe:2700 > "%WinDir%\system32\ioctl.exe 22E0DC"  [Child PID: 3412]

File Activity:
[CreateFile] loader.exe:2700 > %UserProfile%\Local Settings\Temp\aut1.tmp  [File no longer exists]
[CreateFile] loader.exe:2700 > %WinDir%\system32\challenge.sys  [MD5: 399a3eeb0a8a2748ec760f8f666a87d0]  [VT: 0/57]
[DeleteFile] loader.exe:2700 > %UserProfile%\Local Settings\Temp\aut1.tmp
[CreateFile] loader.exe:2700 > %UserProfile%\Local Settings\Temp\aut2.tmp  [File no longer exists]
[CreateFile] loader.exe:2700 > %WinDir%\system32\ioctl.exe  [MD5: 205af3831459df9b7fb8d7f66e60884e]  [VT: 0/57]
[DeleteFile] loader.exe:2700 > %UserProfile%\Local Settings\Temp\aut2.tmp

Registry Activity:
[RegSetValue] services.exe:720 > HKLM\System\CurrentControlSet\Services\challenge\Type  =  1
[RegSetValue] services.exe:720 > HKLM\System\CurrentControlSet\Services\challenge\Start  =  3
[RegSetValue] services.exe:720 > HKLM\System\CurrentControlSet\Services\challenge\ErrorControl  =  1
[RegSetValue] services.exe:720 > HKLM\System\CurrentControlSet\Services\challenge\ImagePath  =  \??\C:\WINDOWS\system32\challenge.sys
[RegSetValue] services.exe:720 > HKLM\System\CurrentControlSet\Services\challenge\DisplayName  =  challenge
[RegSetValue] services.exe:720 > HKLM\System\CurrentControlSet\Services\challenge\Security\Security  =  01 00 14 80 90 00 00 00 9C 00 00 00 14 00 00 00
[RegSetValue] services.exe:720 > HKLM\System\CurrentControlSet\Services\challenge\Enum\0  =  Root\LEGACY_CHALLENGE\0000
[RegSetValue] services.exe:720 > HKLM\System\CurrentControlSet\Services\challenge\Enum\Count  =  1
[RegSetValue] services.exe:720 > HKLM\System\CurrentControlSet\Services\challenge\Enum\NextInstance  =  1
[RegSetValue] System:4 > HKLM\System\CurrentControlSet\Control\Class\{DDEEAAFF-1337-BEEF-8877-665511223344}\Class  =  challenge
[RegSetValue] System:4 > HKLM\System\CurrentControlSet\Control\Class\{DDEEAAFF-1337-BEEF-8877-665511223344}\NoDisplayClass  =  1
[RegSetValue] System:4 > HKLM\System\CurrentControlSet\Control\Class\{DDEEAAFF-1337-BEEF-8877-665511223344}\NoUseClass  =  1
[RegSetValue] System:4 > HKLM\System\CurrentControlSet\Control\Class\{DDEEAAFF-1337-BEEF-8877-665511223344}\Properties\Security  =  01 00 0C 90 00 00 00 00 00 00 00 00 00 00 00 00

At a high level, loader.exe is run as PID 2700. It drops aut1.tmp and aut2.tmp to %Temp%. After each, a file is immediately created in C:\Windows\System32: challenge.sys and ioctl.exe, respectively. Then a service named "challenge" is created (with services.exe:720 shown as the source) and pointed at that challenge.sys. We also then see a new Class created for that service. Finally, loader runs "ioctl.exe" with the argument of 22E0DC.

And those [VT: 0/57] ratings? Come on people, you upload your challenges to VirusTotal? That should be an automatic disqualification.

Upon loading loader into IDA, we quickly see that it's the wrong way to go about this:

It's an AutoIt executable, for which there will be an encoded, embedded script. These are automatically extracted with Exe2Aut, which will produce a script that begins with a few hundred lines of code for service management. Discard these; they're generic and copy-pasted from elsewhere. Focus below that:

If @OSArch <> "X86" Then
    MsgBox(0, "Unsupported architecture", "Must be run on x86 architecture")
EndIf
If @OSVersion = "WIN_7" Then
    FileInstall("challenge-7.sys", @SystemDir & "\challenge.sys")
ElseIf @OSVersion = "WIN_XP" Then
    FileInstall("challenge-xp.sys", @SystemDir & "\challenge.sys")
Else
    MsgBox(0, "Unsupported OS", "Must be run on Windows XP or Windows 7")
EndIf
FileInstall("ioctl.exe", @SystemDir & "\ioctl.exe")
$nret = dothis("0x96c581bc009905e76931875a583f97a738b764eb67f35c802194bf86123b943d1907619488a31a26cf29ba5f5e57ed5c5a37cb5d67dc2020a7e6d55cadefba32aba3ed77f0e18e41a571e74a8a7614a895d7c8827c46028761994543bf449138c65a6e7b5039792c85be5b4998c9950d2497f73cd88d186a6bffe3634bd250ec59e2", "flarebearstare")
If $nret Then
    If dothis("0x96d587b8139933d17e3598505e729da736bb66aa6cfa5180289fb6845530", "flarebearstare") Then
        dothis("0x9aee96b50da818d16f368556131aecfc69ef21a440f24fcc6bd1f3bd1e76db69574a6c8d81ed53688a7eaa364e53fd0700", "flarebearstare")
    EndIf
EndIf

Func decrypt($data, $key)
    Local $opcode = "0xC81001006A006A005356578B551031C989C84989D7F2AE484829C88945F085C00F84DC000000B90001000088C82C0188840DEFFEFFFFE2F38365F4008365FC00817DFC000100007D478B45FC31D2F775F0920345100FB6008B4DFC0FB68C0DF0FEFFFF01C80345F425FF0000008945F48B75FC8A8435F0FEFFFF8B7DF486843DF0FEFFFF888435F0FEFFFFFF45FCEBB08D9DF0FEFFFF31FF89FA39550C76638B85ECFEFFFF4025FF0000008985ECFEFFFF89D80385ECFEFFFF0FB6000385E8FEFFFF25FF0000008985E8FEFFFF89DE03B5ECFEFFFF8A0689DF03BDE8FEFFFF860788060FB60E0FB60701C181E1FF0000008A840DF0FEFFFF8B750801D6300642EB985F5E5BC9C21000"
    Local $codebuffer = DllStructCreate("byte[" & BinaryLen($opcode) & "]")
    DllStructSetData($codebuffer, 1, $opcode)
    Local $buffer = DllStructCreate("byte[" & BinaryLen($data) & "]")
    DllStructSetData($buffer, 1, $data)
    DllCall("user32.dll", "none", "CallWindowProc", "ptr", DllStructGetPtr($codebuffer), "ptr", DllStructGetPtr($buffer), "int", BinaryLen($data), "str", $key, "int", 0)
    Local $ret = DllStructGetData($buffer, 1)
    $buffer = 0
    $codebuffer = 0
    Return $ret
EndFunc

Func dothis($data, $key)
    $exe = decrypt($data, $key)
    $exe = BinaryToString($exe)
    Return Execute($exe)
EndFunc

This is pretty straightforward. If Win7, drop this, if XP, drop that, otherwise do nothing. Beyond the dropping we see calls to "dothis()" with hex strings and a second argument of "flarebearstare". dothis() simply passes these along to decrypt() and executes the result. decrypt() is the oddball, taking a big string of shellcode, throwing it up into memory, and running it over the data through CallWindowProc.

For now, extract the shellcode, convert it from a hex string to raw bytes, save that to a file, and open it in IDA (which is like three key presses with WinHex, just saying).
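If WinHex isn't handy, the same hex-to-bytes step is a couple of lines of Python. This is only a sketch: hex_to_bytes and the output file name are made up for illustration, and just the first nine bytes of $opcode are shown, so paste in the full string from the script before using it.

```python
def hex_to_bytes(hex_string):
    """Convert an AutoIt-style "0x..." hex string into raw bytes."""
    if hex_string[:2].lower() == "0x":
        hex_string = hex_string[2:]
    return bytes.fromhex(hex_string)

# Only the first nine bytes of $opcode, for illustration.
shellcode = hex_to_bytes("0xC81001006A006A0053")
print(shellcode.hex())

# To produce a file that IDA can open (file name is arbitrary):
# with open("shellcode.bin", "wb") as f:
#     f.write(hex_to_bytes(full_opcode_string))
```

Load the resulting file into IDA as a 32-bit binary, since the loader insists on x86.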

A 256 count loop to build an array with byte swapping, followed by a whole other loop that XOR's based on that array? My money's on RC4. Let's whip up a quick Python script with the encoded values and check:

from Crypto.Cipher import ARC4 as cipher
strings = ('96D587B8139933D17E3598505E729DA736BB66AA6CFA5180289FB6845530', '9aee96b50da818d16f368556131aecfc69ef21a440f24fcc6bd1f3bd1e76db69574a6c8d81ed53688a7eaa364e53fd0700', '96c581bc009905e76931875a583f97a738b764eb67f35c802194bf86123b943d1907619488a31a26cf29ba5f5e57ed5c5a37cb5d67dc2020a7e6d55cadefba32aba3ed77f0e18e41a571e74a8a7614a895d7c8827c46028761994543bf449138c65a6e7b5039792c85be5b4998c9950d2497f73cd88d186a6bffe3634bd250ec59e2')
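for s in strings:
    dec = cipher.new(b"flarebearstare")  # RC4 is stateful, so build a fresh cipher for each string
    print(dec.decrypt(bytes.fromhex(s)).decode("latin-1"))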

This results in the output of:

_StartService("", "challenge")
ShellExecute(@SystemDir & "\ioctl.exe", "22E0DC")
_CreateService("", "challenge", "challenge", @SystemDir & "\challenge.sys", "", "", $SERVICE_KERNEL_DRIVER, $SERVICE_DEMAND_START)
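For anyone without PyCrypto installed, the two loops spotted in the shellcode map directly onto textbook RC4, which fits in a few lines of pure Python. Treat this as a sketch of standard RC4, not a line-by-line translation of the shellcode; the rc4 function name is mine.

```python
def rc4(key, data):
    """Textbook RC4: key-scheduling loop, then keystream XOR."""
    # Key scheduling: fill S with 0..255, then swap bytes based on the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]

    # Keystream: keep swapping, and XOR one keystream byte into each data byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) & 0xFF])
    return bytes(out)

encoded = bytes.fromhex("96D587B8139933D17E3598505E729DA736BB66AA6CFA5180289FB6845530")
print(rc4(b"flarebearstare", encoded).decode("latin-1"))
```

If the RC4 guess holds, this prints the same first line as the PyCrypto version. Because RC4 is symmetric, the same function encrypts and decrypts.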

Nice!  Fill back into our original script to get:

If @OSArch <> "X86" Then
    MsgBox(0, "Unsupported architecture", "Must be run on x86 architecture")
EndIf
If @OSVersion = "WIN_7" Then
    FileInstall("challenge-7.sys", @SystemDir & "\challenge.sys")
ElseIf @OSVersion = "WIN_XP" Then
    FileInstall("challenge-xp.sys", @SystemDir & "\challenge.sys")
Else
    MsgBox(0, "Unsupported OS", "Must be run on Windows XP or Windows 7")
EndIf
FileInstall("ioctl.exe", @SystemDir & "\ioctl.exe")
$nret = Exec(_CreateService("", "challenge", "challenge", @SystemDir & "\challenge.sys", "", "", $SERVICE_KERNEL_DRIVER, $SERVICE_DEMAND_START))
If $nret Then
    If Exec(_StartService("", "challenge")) Then
        Exec(ShellExecute(@SystemDir & "\ioctl.exe", "22E0DC"))
    EndIf
EndIf

Yup, that was quite a bit of work for such anticlimactic results. I'm bored. Let's go look at ioctl.exe.

Welp, that was equally boring. It takes a hex value as arg1 and passes it along to DeviceIoControl as dwIoControlCode, where the hDevice (v7) is a handle opened on the "FileName" of \\.\challenge. So, take an arg and pass it to a driver already loaded in memory. Check.
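As a side note, an IOCTL code like 22E0DC is not an arbitrary number. Windows packs four fields into it through the CTL_CODE macro, and pulling them apart takes only a few shifts. A small sketch (decode_ioctl is a name I made up):

```python
def decode_ioctl(code):
    """Split a 32-bit IOCTL code into the fields of the CTL_CODE macro."""
    return {
        "device_type": (code >> 16) & 0xFFFF,  # bits 16-31
        "access": (code >> 14) & 0x3,          # bits 14-15
        "function": (code >> 2) & 0xFFF,       # bits 2-13
        "method": code & 0x3,                  # bits 0-1
    }

fields = decode_ioctl(0x22E0DC)
for name, value in fields.items():
    print("%-12s 0x%X" % (name, value))
```

For 0x22E0DC this gives device type 0x22 (FILE_DEVICE_UNKNOWN), access 3 (read and write), function 0x837, and method 0 (METHOD_BUFFERED).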

Because I'm not a glutton for punishment on non-Fridays, I would typically focus on the XP driver for the rest. However, there's a glitch with that. The dwIoControlCodes in the XP driver are shown as WORD values, while the Windows 7 driver shows them as proper DWORDs:

They both have the same functionality so for static analysis the Win7 driver may be more appropriate to use. There are a few things you should see with these drivers. There are 199 referenced functions. Typically, then, I'd sort functions by size and look at the smallest, then the largest. The largest are more fun here...

It's ... so beautiful. m0n0sapiens put it most succinctly:

Or, in a more disco groove:

If you follow the big three functions you'll see that all three end with data pushed into the same function, that feeds into this:

As with any unusual math routine that may be encoding, look for seed values and Google them. In this case, you'll see it referenced as XTEA (eXtended Tiny Encryption Algorithm), a well-known routine. At the end of each of those three routines is a buffer passed into this decryptor. But how is each one called?

In this case, there is a single subroutine with a switch statement of 101 cases, each a DWORD value. If we find the one used by the dropper we see it pointing to the large "Triangle" routine. I'll point it out below along with the other three large ones (which I'll name Parse1, Parse2, and Parse3). I've modified this image to remove cruft:

Here we see the code sent from the dropper: 22E0DC, which points to that massive triangle function. Others have written up details of this function and how it works. I skipped it. It had no meaningful calls from it and wasn't related to the XTEA decryption routine, so I put it on the backburner.
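Since everything funnels into that XTEA routine, here is roughly what textbook XTEA looks like in Python. Big assumptions apply: this is the standard 32-round algorithm with the usual 0x9E3779B9 constant and little-endian DWORDs, the driver's copy may differ in details, and the key and sample data below are invented purely to show a round trip.

```python
import struct

DELTA = 0x9E3779B9   # the standard TEA/XTEA constant
MASK = 0xFFFFFFFF

def xtea_encrypt_block(v0, v1, key, rounds=32):
    s = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & MASK
        s = (s + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & MASK
    return v0, v1

def xtea_decrypt_block(v0, v1, key, rounds=32):
    s = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & MASK
        s = (s - DELTA) & MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & MASK
    return v0, v1

def xtea_decrypt(buf, key):
    """Decrypt a buffer whose length is a multiple of 8 bytes."""
    out = b""
    for off in range(0, len(buf), 8):
        v0, v1 = struct.unpack_from("<2I", buf, off)
        out += struct.pack("<2I", *xtea_decrypt_block(v0, v1, key))
    return out

key = (1, 2, 3, 4)   # hypothetical key, NOT the one in the driver
enc = xtea_encrypt_block(0x01234567, 0x89ABCDEF, key)
print(xtea_decrypt_block(enc[0], enc[1], key) == (0x01234567, 0x89ABCDEF))
```

XTEA works on 8-byte blocks, so any buffer handed to xtea_decrypt needs a length that is a multiple of 8.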

I focus on the XTEA and work back. For each Parse routine this decryptor is called with a buffer of data and a buffer size. That size is slightly obfuscated just because it is set at the very beginning in a mess of other values. I'll do some magic photoshopping to demonstrate these.

Parse1() calls the decryptor with a 40 (0x28) byte buffer while the other two call it with an 80 (0x50) byte buffer. Each buffer is made up of individual global bytes that are created from subroutines underneath each Parse() routine. The obvious and professional route is clear from a static perspective. Follow the xref's back from each byte, grab the value, and populate it into the binary.

That's what others did. That's not how I roll. Let's do this live in a debugger. Our hurdle here is to attach to a device driver in memory. That would typically involve using WinDbg at a kernel level, which I do not know how to do (it's on my bucket list, trust me, right below base jumping in South America). I don't need to run it properly, I just need to throw it in memory for me to mess with.

So, I use CFF Explorer to modify the PE header, change the Subsystem to a DLL, and save it. I then debug rundll32.exe with an argument calling this new "DLL". It works!
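For the curious, this kind of header tweak can also be scripted. The sketch below is my guess at the equivalent byte-level patch, not a record of what CFF Explorer writes: it sets the IMAGE_FILE_DLL bit in the COFF Characteristics field and swaps the Subsystem for a user-mode one. The function name and the file names in the usage comment are hypothetical.

```python
import struct

IMAGE_FILE_DLL = 0x2000
IMAGE_SUBSYSTEM_WINDOWS_GUI = 2

def make_loadable_as_dll(pe_bytes):
    """Return a copy of a PE image with the DLL flag set and a user-mode subsystem."""
    data = bytearray(pe_bytes)
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]   # e_lfanew
    if data[:2] != b"MZ" or data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("not a PE file")

    # The COFF header follows the 4-byte signature; Characteristics is its last field.
    char_off = pe_off + 4 + 18
    chars = struct.unpack_from("<H", data, char_off)[0]
    struct.pack_into("<H", data, char_off, chars | IMAGE_FILE_DLL)

    # The optional header starts after the 20-byte COFF header; Subsystem sits at +68.
    subsys_off = pe_off + 24 + 68
    struct.pack_into("<H", data, subsys_off, IMAGE_SUBSYSTEM_WINDOWS_GUI)
    return bytes(data)

# Hypothetical usage:
# with open("driver.sys", "rb") as f:
#     patched = make_loadable_as_dll(f.read())
# with open("driver_as.dll", "wb") as f:
#     f.write(patched)
```

The PE checksum is left stale on purpose; that should only matter if something validates it at load time.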

I take the entry point as it appears in the debugger (0x9C0000) and rebase IDA. Now I can directly see where changes and calls are made. However, as I quickly learn, I have many errors in actually running this. The memory segments that it is loaded in are Executable only. So, in Immunity, switch to the memory map view and just set them all as Full Access.  (Didn't I warn you about how ghetto I was going to make this? You haven't seen anything yet!)

I throw calls to the three Parse() routines and notice that Parse1() ends with a blank buffer. Passing it into XTEA fills it with garbage. I try to place data into the buffer, and different junk comes back. This must be an INOUT buffer. But it's not populated at all. I trace the calls to populate these bytes back, set a few breakpoints, and see that they're never called. There are 40 conditions that are never met. From a debugger POV I can now try to change those conditions, or BP at each and change the Z flag. Or I can make ghetto calls (my personal favorite).

While in ntoskrnl space, just because I was arbitrarily sitting there, I pull the xref from each subroutine in IDA and just ... call them. One at a time. And watch the buffer fill. You can ghetto call because there are no arguments to pass in and no results back. It doesn't break the stack ... much.

I then call Parse1(), track it to the end, make the call, and get my email address:


You cringe at how I did that, but I got it done in just a few hours, so phooey on you.

Challenge #11

This is the final challenge and it shows. With the exception of the issues with #6, this sample took me the longest out of all challenges from this year and last. There is a lot going on. And, being honest, the other write-ups will explain this challenge much better than I and will provide more professional answers. Read at your own risk.

The executable, CryptoGraph, contains fairly customized encryption that is seeded by a command line argument to decrypt an embedded resource into, ultimately, a JPG. For one, I'm glad they used JPG so that we could avoid the whole GIF vs JIF debate.

Part One of this challenge is processing the command line argument directly against embedded data to produce a new set of data. This data will vary based on the argument passed and how many times it had to verify the data contents.

Part Two takes the results of Part One to seed an RC5 decryption of another embedded resource to the disk.

This seems fairly straightforward. We can brute force the command line options until we get a JPG. This is quite similar to the final challenge last year. However...

  1. The runtime duration of this application is approximately 15-20 hours.
  2. Even with the correct command line argument, the correct number of data loops needs to be determined. Running to the end will produce a garbage JPG.

Knowing that, I can see where people can write debugger scripts to fuzz registers or values at certain points. But, I have my limitations. I'm going straight in through the front door. That begins, however, with understanding what's going on. Therefore I spend a few days doing nothing but debugging, following traces, and keeping notes. A LOT of notes.

Based on such notes, I'm proud to share one of the worst ways possible of finishing this challenge successfully.

For one, now that I've read other write-ups, I feel foolish for missing one of the very first checks for a null value at 0x401714. Instead, I focused far past that. The issue here is that there are three distinct ways to view code in IDA: text view, graph view, and decompiler view. Due to the sheer size of many functions I remained in text view and decompiler view. However, as others learned during this challenge, graph view made it very easy to track unusual jumps past certain areas that should be reached. There's a lesson learned.

When checking for the first argument there is an early loop where the correct argument will match a value from the embedded resource and then skip to the rest. If it doesn't match, a global integer (which I've named Data_Checks) is incremented, and the process continues.

Past this is the main loop of the program, shown below, that repeats 32 times. Each time, the speed becomes slower and slower, based on the v16 value passed into Core_Decoding_Loops(), which often numbers in the millions.

  do                                            // Main Loop
    v16 = *(v15 - 44) + v31[398] * (result >> 4);
    v31[398] = v16;
    MD5_Chunks(&v32, (v15 - 40), 8);
    MD5_chunk_and_byteswap(&v32, &v36);
    Core_Decoding_Loops(&v34, &v36, v17, (v15 - 40 + 8), v17, v16, 1);
    memcpy_s(&Dst, 16u, &v34, 16u);
    v18 = v30;
    v25 = v30 + v31[5];
    v26 = 2 * v25 + 2;
    v24 = Malloc(4 * v26 | -(v26 >> 30 != 0));
    if ( v24 )
      EQUATION_RC6(&v24, &Dst, 16);
    RoundNum_of32 = v18 + 1;
    loop48_round = v27 + 1;
    v29 = v18 + 1;
      Loop1_48(&v24, v15, v15, 48, &RoundNum_of32);
    while ( loop48_round );
    if ( *v15 == v18 )
      MD5_Chunks(&v32, v15, 32);
      MD5_chunk_and_byteswap(&v32, &v35);
      v20 = (v15 + 32);
      v21 = 12;
      v22 = &v35;
      while ( *v22 == *v20 )
        v22 += 4;
        v13 = v21 < 4;
        v21 -= 4;
        if ( v13 )
          goto LABEL_24;
    if ( v24 )
    v15 += 48;
    result = RoundNum_of32;
    v23 = __ROL4__(v27, 1);
    v30 = RoundNum_of32;
    v27 = v23;
  while ( RoundNum_of32 < 32 );

There are a few references to incrementing Data_Checks and I tried my hardest to make sure the flow got to that value. After every loop that number incremented, which I took to be a good thing. (Spoiler Alert: It wasn't).

For example, in this flow graph, I continually tried to follow the cyan (blue) lines leading to Data_Checks.

After following all of the logic at this point, things started to make sense. The continual iterations were due to data not being found at certain offsets of the resource during each round of modifications. There appeared to be at least one exit condition on the loops that would prevent continuous processing at certain points. A proper command line argument should make the data shift correctly to break out of such loops and speed up code execution. But, how do we test that theory?

There are many proper ways of doing it. Instead, here was mine: find the slowest computing procedure and patch the program to quit as soon as it completes. Then brute force and see which number makes it end the soonest. For this, I chose to end immediately after that Core_Decoding_Loops(). Through standard execution, getting from the beginning and past that loop with an arbitrary argument would take two minutes. That sounded like a good spread. I went to the instruction after that call, used Immunity to change the code to "call _cexit" and patched the resulting bytes into the executable.

I wrote a quick Python script to brute force the numbers, timing out any process longer than 60 seconds, and waited.

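import subprocess
import time
seconds = 60

for i in range(0, 255):  # Honestly, I broke this up into 6 simultaneous scripts to run faster
    cmd = 'breakme.exe %d' % i
    start = time.time()
    try:
        stdout = subprocess.check_output(cmd, stderr=subprocess.STDOUT, timeout=seconds)
        print('%d: exited normally after %.1f seconds' % (i, time.time() - start))
    except subprocess.TimeoutExpired:
        print('%d: timed out' % i)
    except subprocess.CalledProcessError as e:  # non-zero exit code, e.g. a crash
        print('%d: crashed (code %d) after %.1f seconds' % (i, e.returncode, time.time() - start))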

Now, first, this is not the proper way of doing that. Second, that patch doesn't make the program actually exit, it just crashes it with an unknown software exception (0xc0000417). So I'd have a ton of numbers do nothing and a small handful that crashed.

Of the three command line arguments that crashed in under 60 seconds (205, 238, and 240), 205 was unique in reaching that point in literally less than a second. That seemed odd enough to investigate further.

Using 205 as an argument changed the entire outlook of the program. Now, early checks that would increase the Data_Checks global value were skipped. On the very first pass, at 0x4016D4, a routine to ROR and XOR data was tested to ensure that the first DWORD was all nulls. Without a proper command line argument, it would appear similar to this:

However, once given 205, it produced:

Every additional check would also produce expected results, skipping large amounts of number crunching. Additionally, the Data_Checks value was never incremented. This value counts the number of loops in which the data validation failed, suggesting that this value should always stay null.

The second part of this challenge was determining that after every large round of computation, shown in pseudocode earlier, the data is re-encoded. As this data is integral to the second part, it needs to be correct before sending it back.  From letting the program run with '205' on a second computer overnight (12 hours to run), I discovered that it would produce a garbage JPG by default. Therefore, we need to break out of this loop before it reaches 32 rounds. But, how many rounds do we let it run?

Others found the clean answer to this problem by examining comparisons on the back end. Me? I had a jug of sangria and time to kill on a Saturday afternoon. So, I manually brute forced it while catching up on my Black Butler episodes. It turns out that it didn't take that long.

At the end of each round of checks I set a breakpoint and disabled all the earlier ones. I would run to this CMP EAX, 20 (that's hex, so 32 rounds) and then, at the following JB, just change the C flag to make it break out of the loop.

Each round produced junk JPGs until I hit round 10, opened the JPG expecting another round of garbage, and screamed like a teenage girl at a Justin Bieber concert. There I saw some sort of SportsBall player with an email!


After sending off the email I tried to figure out who this was and why he was there. TinEye reports him as Lionel Messi who is apparently a good SportsBall player. Or, is he?

There you have it. This was an amazingly fun challenge (except #6) and I learned much along the way. I am now prepared to go back and re-do the challenges using the methods detailed by others. My methods tend to be very brute-force-ish, very 'mess with things in memory until they work', CTF-speed hacks. But I am slowly forcing myself to learn the proper methods: WinDbg/GDB scripts, PIN tracing, more IDAPython, debugger memory fuzzing.

How often in life we complete a task that was beyond the capability of the person we were when we started it.
—  Robert Brault

The Prize

Last year FLARE presented each winner with a serialized challenge coin (RMO) for their completion order. I received coin 0x83 (#131). This year they changed it up a bit and introduced ... a FLARE belt buckle!

Jokes aside, it's an awesome design and is self-supporting.

Additional Write-Ups

FireEye's Official Solutions
Topher Timzen's A Successful Yet Failed Flare ON Challenge - The Write-up
AcidShout's 2015 FLARE-ON challenges writeup
Reno Robert's v0ids3curity writeup
Mohamed Shetta's FLARE On 2015 Walkthrough
z3r0zh0u's XLOYE Write Ups
Julien Perrot's Flare On 2 write-up
A Disturbing Lack of Taste Challenges #7 and #8
0x0A Tang Solving for Hashes in Flare-On #5

Did you find benefit or enjoyment from this post? Was it a waste of your time? Please, leave feedback! I'm open to critiques, criticisms, and attaboys.  If you like it, I'll keep creating them. Though, next time, a more Forensics related one.


  1. Oh, so you added a link to my writeup, sweet! (I'm AcidShout)

    I added you too. Nice writeup.

    I just got two things to say:
    1) On #6, did you finally understand the algorithm? As in, can you make a solution that's not brute-force?
    2) On #11, did you see the function that had a hint on how many rounds the algorithm should do?


    1. Thanks!

      #6: I think I finally got an understanding in the last week, and that was actually thanks to your write-up. Even when writing this up I tried from scratch to go at it, and got lost again. Your quote definitely helped, though: "I got the number X, and I divided it N times by the prime P"

      #11: I saw afterward, but not during my analysis. I knew it was doing bit counting in that routine with some 'extras'. However, when I fully went through that function it was before I knew what that DWORD of data meant. I assumed it was just part of the passed data. After finishing the cmdline arg I should've gone back to the second half and re-analyzed these functions dynamically to see what each variable did with the correct argument. I probably would have gotten to that if brute forcing didn't work as quick, but it did :)


    2. Heh, no problem!

      As for #6:
      what it does is loop over all the primes and check if the current group of 2 characters is divisible by that prime, then log it to the table. Using that table, we see that prime 0 is divided 3 times, and primes 6 and 36 are divided once (so if the algorithm is dividing, we do the opposite, multiplying, to reverse the process). prime 0 = 2, prime 6 = 17, prime 36 = 157. Therefore: (1 * 2 * 2 * 2) * (1 * 17) * (1 * 157) == 0x5368. We now have that, so we split it like: 0x53, 0x68, and converting it to ASCII gives us "Sh", which is the first two characters from "Shold_have_[...]@flare-on.com".

      You don't really understand something till you're able to teach it to your grandma, so please tell me if my "quick teach" is good enough :)

      P.S. Have you received your prize yet?

    3. That does make sense. It helped to work backwards from the final tables. If debugged it would be quite obvious, but being ARM made that the challenge :)

      I did receive my prize, updated the post to include that at the end.