/* CAT(1) */

By Jeff White (karttoon)

After I published a blog at $dayjob on how I came to realize that what I thought was a sample of Agent Tesla turned out to actually be a new malware called OriginLogger, a fellow threat researcher botlabsDev reached out some months later. They had noticed that the two GitHub repositories I referenced for the profile "0xFD3" each had a commit from a different account which exposed some new e-mail addresses to pivot on. Holy shit. I was unaware this was even a thing so looking into this further showed that each repository received an update from a different account on the same days the respective code was initially committed. This crucial piece of info took me down a fun little rabbit hole that I wanted to share wherein I was able to identify an individual who may be the developer behind one of the most prominent keylogger malware families - OriginLogger and Agent Tesla.

Starting with the two GitHub repositories, I wanted to show both of the commit logs and the connections that resulted from them.

For the first one, Chrome-Password-Recovery, we see a commit by "Omer Demir" with an e-mail address of "omer.demir-@hotmail.com" on March 11th, 2020. This was six months before the builder I found in the original blog, which was compiled in 2020 as well and that authenticated against the domain which led me to this GitHub profile originally.

Chrome-Password-Recovery $ git log commit 5d0c09a9c3e23004a08017dfc916196ac8971983 (HEAD -> master, origin/master, origin/HEAD) Author: Omer Demir <33671489+0Fdemir@users.noreply.github.com> Date: Wed Mar 11 04:23:11 2020 +0300 Update README.md commit 305ef43b4b138660bdfd1cdee638cce47e487769 Author: Omer Demir <omer.demir-@hotmail.com> Date: Wed Mar 11 04:14:39 2020 +0300 first commit commit ffbe0189c30804844e767dcfc2fa38aef1813b1d Author: Omer Demir <33671489+0Fdemir@users.noreply.github.com> Date: Wed Mar 11 03:50:52 2020 +0300 Initial commit

Checking the e-mail against the usual database leak sites revealed that the e-mail has been observed in breaches for "ledger.com", "leet.cc", and Tumblr. This also provided a bit more information about the account - specifically a middle name, address, and phone number located in Turkey. This is an important piece of information as will be discussed shortly.

Name Omer Faruk Demir Email omer.demir-@hotmail.com Address Miralay Rafet Sokak 34360 Istanbul Turkey Phone 5464200269

For the other repository, OutlookPasswordRecovery, there is a commit by account "0Fdemir" with the e-mail "ssfenks@windowslive.com" on the 15th of November, 2017 - three years prior to the above. At this point, seeing the "0Fdemir" moniker and recalling the "0xfd3" one, I realized the hex code used is representative of "Omer Faruk Daemir" or "0xFD". Definitely a cool moniker in my book but more importantly it links both accounts.

OutlookPasswordRecovery $ git log commit c6816ce933dd42d81048e658a58720f9f1f75cb3 (HEAD -> master, origin/master, origin/HEAD) Author: 0Fdemir <33671489+0Fdemir@users.noreply.github.com> Date: Wed Nov 15 01:00:06 2017 +0300 Update README.md commit 0f5208d79a3178c1e45ebbcf8afac19298374741 Author: 0Fdemir <33671489+0Fdemir@users.noreply.github.com> Date: Wed Nov 15 00:44:03 2017 +0300 Update README.md commit b459361e74b28584ef487d4929f5707663055265 Author: 0Fdemir <33671489+0Fdemir@users.noreply.github.com> Date: Wed Nov 15 00:43:39 2017 +0300 Update REEDME.md commit 67b9632d6bf147eb5ccec3e4f7fb8a8a0bee7d3d Author: 0Fdemir <ssfenks@windowslive.com> Date: Wed Nov 15 00:42:38 2017 +0300 nocommit commit f431ace4378c2db84e14f59cc3086b6eee4dd09d Author: 0Fdemir <33671489+0Fdemir@users.noreply.github.com> Date: Wed Nov 15 00:37:19 2017 +0300 Initial commit

This e-mail address is likewise observed in database leaks and was seen using the username of "agenttesla" from a forum dump.

In June 2015, two years prior to the aforementioned GitHub commit, I observed a post by user "sifenks", which is associated to the e-mail address "ssfenks@windowslive.com", to the "nulled.cr" forum titled "[FREE] [FUD] Agent Tesla Keylogger [Beta]". The post is an advertisement for an early version of Agent Telsa keylogger that could be downloaded at "http://www.agenttesla.com/en/download/free/". In the post, it also stated the following:

P.S.: Agent Tesla that my last project beta version with you! Please leave a message for requests , needs , bugs and errors. Program is tested. Each function working flawlessly. If you like and If you want to continuousness for Agent Tesla, please donate... Enjoy! P.S.2: Please close all AV! P.S.3: Please dont use Virustotal, jotti etc...

Focusing in on the wording here, Sifenks stated "my last project" and requests for users to message their account for "requests , needs , bugs and errors". This implies to me that the Sifenks account is a developer.

Going back a little further, this user was also observed in 2014 posting an earlier version of Agent Tesla on the hackforums.net website wherein they quickly responded to users and fixed bugs in the code that they found. One consumer of Agent Tesla posted "Before i talk with my error, i have to say sifenks is the most active person that responds to your problems! All in the other keyloggers, the keylogger creator hasn't spoke once!".

In 2018, a few months before Agent Tesla announced they were closing up shop and to instead use OriginLogger, Brian Krebs released an article titled "Who is Agent Tesla?" in which he details that the earlier version of Agent Tesla, in 2014, was made available on a Turkish-language WordPress site ("agenttesla.wordpress.com") before they eventually migrated to "agenttesla.com". Brian reported that the subsequent domain was registered in 2014 by a person named "Mustafa can Ozaydin" in Antalya, Turkey who used the e-mail address "mcanozaydin@gmail.com" before they were hidden behind WHOIS privacy services in 2016.

I'll be honest and say that it was a bit of an emotional roller coaster researching this and then finding out Krebs had written an article years ago about the very same topic. D'oh! As I started to read it though, I realized it's an entirely different person...What the hell? Brian's research here was solid, I recreated his steps and it all seemingly lined up so I was left wondering who the heck this other guy was and how he related to mine. It turned from a simple OSINT attribution exercise into a proper mystery.

In Brian's research he linked the Gmail address that registered the domain to a YouTube account by a Turkish individual with the same name who uploaded tutorials on using the Agent Telsa web panel. Brian went on to state that the administrator of the 24x7 live support channel for Agent Tesla had the same profile picture as a Twitter account "MCanOZAYDIN". This information was used to eventually identify Mustafa's LinkedIn profile. This profile listed Mustafa as a "systems support expert" for a hospital in Istanbul, Turkey at the time of his writing.

I decided to see if I could find any social media profiles for Omer and started by looking at Mustafa's Twitter account. One of the things I like to do when researching Twitter accounts is to look at the followers and following, which are displayed in the order in which they followed. It was a win in this scenario and one of the early accounts followed by Mustafa is for an Omer Demir.

Pivoting off of this account name "omerfademir" led me to a Facebook profile with the same profile picture (uncropped) and name.

A couple of interesting points to note, even if the profile is otherwise light on information. He listed self-employment in 2014, the year that Agent Tesla came out, and he studied at Anadolu University. Mustafa's LinkedIn also showed that he attended Anadolu University and both men started the same year in 2013. Taking it a step further with that connection, in one of the previously mentioned forum posts by the "sifenks" account, which was advertising an early version of Agent Tesla in 2015, a user reported a bug that stated they were receiving the following message - "The remote name could not be resloved: '48982689868.home.anadolu.edu.tr". It's unknown if they knew each other before college, but it stands to reason they knew each other while at university.

The profile also provides a clearer picture of the individual with additional photos and stated they were from Samsun, narrowing down the geography.

This led me back to LinkedIn to see if I could find a profile there with all of this new information. Filtering by "Omer Faruk Demir" resulted in 182 hits, adding location to 51 hits, and then alumni from Anadolu University to 18 hits. With the remaining hits, visual inspection and relative age comparison I was able to find his account. It provides a bit more background information, such as skills and knowledge, but further solidified the link to the Facebook profile and closed the loop.

Specifically, the timing they both attended Anadolu University, hailing from Samsun, studying in the languages used by Agent Tesla and OriginLogger, and finally the way the career experience is listed on both sites.

Comparing the LinkedIn for each individual, I'd say Mustafa's work experience shows him in more of a support role and less focused on development or coding whereas Omer's shows he has the skills and knowledge necessary to write the code for Agent Tesla and OriginLogger.

Based on this, I would surmise that Mustafa may have acted more in a support capacity for Agent Tesla, maybe keeping the business side running smoothly while Ömer worked on developing the malware itself. Once the Brian Krebs article came out and Agent Tesla suddenly closed down a few months later, it would appear Omer used the code to continue development under a new brand - OriginLogger.

Not too long after my blog came out, OriginLogger added a new exfiltration method for Discord...and then...silence. Updates stopped, the marketplaces to purchase it vanished, good and bad guys alike were asking "does anyone know where to find this??", and then the builder got pirated.

Fast-forward to late September 2023, it re-emerges on my radar at a new marketplace site - http://originpro.nl.

Note the giant "Join Telegram" button they've added to the new marketplace site? Of course fam.

Everyone will be shocked to see the admin is none other than the one and only 0xFD aka Omer Faruk Demir. :surprised_pikachu: But, if nothing else, at least they are done hiding now which helps connect all of the other research together.

I spent a few days reading through the chat history and found a couple of nuggets I'll share. The first one is that 0xFD states there have been no updates to OriginLogger because its feature complete.

So what's 0xFD been up to if not working on OriginLogger then? Apparently a new product called "OriginLoader" which is touted as an HTTP based botnet that comes complete with keylogging and all the usual bells and whistles.

0xFD doesn't like when people don't understand the difference between these two products and has to break it down repeatedly for potential customers. They are also very fond of crowdsourcing ideas for new features to add to either product. At this point, it's a hallmark of their character - always placing the customer first.

OriginLoader will be one to keep an eye on and see if it continues to evolve and be as successful as the other products.

Finally, in one post they share a video on installing the panel for OriginLoader and within the video, every instance where they open the file explorer is blurred out...except the very first frame of the video :facepalm:. It shows the file names for OriginLoader and further strengthens the idea that the developer is someone of Turkish origin due to the language pack.

Annnnnd that's a wrap! It was a fun OSINT rabbit hole to go down...*gets on soapbox* now please stop calling OriginLogger Agent Tesla.

By Jeff White (karttoon)

I suppose I've been living under a rock for the past 15 years but I had never heard of a "Rich Header" until fairly recently. The Rich Header (RH) is a structure of data found in PE files generated by Microsoft linkers since around 1998. My interest was spurred a few months back when I saw the following Tweet which linked to an excellent article about this undocumented artifact.

I highly recommend reading it but the TL;DR is that at some point Microsoft introduced a function into their linker which embeds a "signature" in the DOS Stub, right after the DOS executable, but before the NT Header. You've probably seen it a thousand times when looking at files and never realized it existed. Back in the early 2000's, when the existence of the header was known for a while, everyone originally assumed it included unique data to identify systems or people, such as with a GUID, and it spawned numerous conspiracy theories - they even nick named it "the Devil's Mark". Eventually someone got around to actually reverse engineering (RE) the linker and figured out how the structure of information was being generated and what it actually reflected. Turns out the tin-foils were half-right. The blog post linked above in the Tweet shows what is actually in them and, while not truly unique to a system or person, it can still serve for some identifying purposes to a certain degree. My interest was peaked!

Effectively, this structured data contains information about the tools used during the compilation of the program. So, for example, one entry in the array may indicate Visual Studio 2008 SP1 build 301729 was used to create 20 C objects. For each entry, a tool, an object type, and a "count" are represented. None of this is really super important at the moment but just know that on some level, there is a lightweight profile of the build environment that is encoded and stuffed into the PE file.

When I began reading more about this topic, I came across an article from Kaspersky stating they used Rich Header information to identify two separate pieces of malware with matching headers. This is exactly what I was hoping for so I read on. Back when the Olympic Destroyer campaign occurred, every vendor under the sun rushed to attribution - from Chinese APT to Russian APT and finally North Korean APTs. After the article throws some shade at Cisco Talos, they state the Rich Header found in one sample of the Olympic Destroyer wiper malware matched exactly with a previous sample used by Lazarus group, along with the fact that there was zero overlap across other known good or bad files. Sounded promising! Then they dive into the actual contents of the Rich Header though, an analyst at Kaspersky noted a major discrepancy. Specifically, a reference to a DLL within the wiper malware that didn't yet exist if the Rich Header tools were to be believed. The (older) Rich Header in the Lazarus sample showed it was compiled with VB6, but this DLL was introduced in later versions of VB. They concluded the Rich Header was planted as part of a false-flag operation.

In this case the Rich Header was used to disprove actor attribution - the reverse of what I expected but fascinating none the less. It's an interesting read but the main point I want to make is that an adversary had the foresight to use this lightweight profile as part of multiple false-flag plants to throw off researchers and put them on the path of inaccurate attribution. That, to me, implied that on some level it's a viable attribution mechanism if an adversary is going out of their way to plant false ones. There is some merit to the technique and, as such, what prompted me to write this blog. What I plan to do then is take you along as I look at Rich Headers and try to determine there place in the analysis phase of hunting. To do this, I'll try to answer the following questions:

Alright, without further ado then...

Rich Header Overview

It makes sense to start out by giving a quick overview of how the Rich Header is created. I won't be doing a deep-technical dive here because there are tons of great resources on it already, which I highly recommend reading.

So what does the Rich Header look like in the file?

Like I said above, you've probably seen it a thousand times and never realized it. Before diving in, here are a handful of key characteristics you should be aware of.

To parse this data out, I created a Python script (yararich.py) that will print out the structure while also generating YARA rules based off said structure. Below is an example of the parsed output.

[+] Parsed Rich Header Offset | Data // Meaning -------------------------------------------------- 0x0090 | 0x000C1C7B // entry 1 0x0094 | 0x00000001 // id12=7291, uses=1 0x0098 | 0x000A1F6F // entry 2 0x009C | 0x0000000B // id10=8047, uses=11 0x00A0 | 0x000E1C83 // entry 3 0x00A4 | 0x00000005 // id14=7299, uses=5 0x00A8 | 0x00041F6F // entry 4 0x00AC | 0x00000004 // id4=8047, uses=4 0x00B0 | 0x005D0FC3 // entry 5 0x00B4 | 0x00000007 // id93=4035, uses=7 0x00B8 | 0x00010000 // entry 6 0x00BC | 0x0000004D // id1=0, uses=77 0x00C0 | 0x000B2636 // entry 7 0x00C4 | 0x00000003 // id11=9782, uses=3 [+] Details File: ae9a4e244a9b3c77d489dee8aeaf35a7c3ba31b210e76d81ef2e91790f052c85.bin Clear Data: 44616E530000000000000000000000007B1C0C00010000006F1F0A000B000000831C0E00050000006F1F040004000000C30F5D0007000000000001004D00000036260B0003000000 Clear Data Hash: 0895C7ECCC58DECF2B8ED537BA6DCEEEF555B2229700894E986418F703F5E8E6 XOR Key: 0x2A497F97 Checksum Match: True

In this example, which is from the Olympic Destroyer wiper malware looked at in Kasperskys blog, you'll see the last entry in the array ("id11=9782, uses=3") displays three values. The ID of 9782 has been mapped to "Visual Basic 6.0 SP6", the object type of 11 indicates it will be C++ OBJ file created by the compiler, and then the "uses" value of 3 is how many of those object files were created.

I haven't been able to find any kind of truly comprehensive listing for the Tool ID/value mappings, but across the websites linked in this post you'll find various tables for them that cover most cases I've seen. These have been reverse engineered from the compilers/linkers over the years and also enumerated through extensive testing by diligent individuals by compiling a program, changing things, re-compiling and noting the differences.

YARA and Rich Headers

Alright, now that you have an idea of what information is available, I can talk about using YARA to hunt Rich Headers and some current limitations of this method. There are really four key parts to each Rich Header: the array itself and associated values, the XOR key, the integrity of the XOR key, and the full byte string which make up the entire Rich Header structure. These are effectively what you can utilize in YARA, minus verifying the integrity of the XOR key.

Throughout my analysis, I ran across quite a few YARA related gotcha's that I wanted to detail up front.

1) YARA's implementation of the Python PE module only parses out the Tool ID and the associated value (type of object). It does not take into account the "uses" count. This value, I feel, can be very important to the overall goal of using the Rich Header data as a profiling signature but there are cons to it as well. For example. there could be a major difference between a PE that used 4 source C files and 20 imports versus 200 source C files and 500 imports. Alternatively, if you used the count value you could miss related samples because the counts changed during the development of the malware, causing things like imports to change. I'll get into a concept of "complexity" for the Rich Headers later, but know that a higher complexity ensures a consistency across samples that means the coupling of only Tool ID/Object type is fine, whereas with a "low" complexity Rich Header would be easier to hunt on if counts were included.

2) In YARA there is not a concise way to establish the order of entries within the array when using the PE module as they are defined under the "conditions" section in YARA, thus not providing a mechanism to refer to the "location" (first match > second match). When you hunt with YARA, since the "uses" data is missing, you are limited to using the array entries in the PE module and will run into two problems. First, the order matters. The order in which the linker inserts these entries into the array, which is based on the input, should weigh into the idea of trying to match the same environment between samples and, if they are out of order, then even if the values were the same, it could be entirely different source code and layout.

Since Kasperky illustrated that the Rich Header is already on shaky ground in terms of how much you can trust them, making sure our rule is as precise as possible to match a specific environment takes a top priority.

Below is the output from a Hancitor malware sample.

Offset | Data // Meaning -------------------------------------------------- 0x0090 | 0x007BC627 // entry 1 0x0094 | 0x00000002 // id123=50727, uses=2 0x0098 | 0x00937809 // entry 2 0x009C | 0x0000000D // id147=30729, uses=13 0x00A0 | 0x00010000 // entry 3 0x00A4 | 0x00000044 // id1=0, uses=68 0x00A8 | 0x00E19EB5 // entry 4 0x00AC | 0x00000007 // id225=40629, uses=7 0x00B0 | 0x00DE9EB5 // entry 5 0x00B4 | 0x00000001 // id222=40629, uses=1

The way this is represented in YARA follows.

rule UNORDERED_ARRAY { condition: pe.rich_signature.toolid(123,50727) and pe.rich_signature.toolid(147,30729) and pe.rich_signature.toolid(1,0) and pe.rich_signature.toolid(225,40629) and pe.rich_signature.toolid(222,40629) }

Searching for this via VirusTotal will result in 527 hits (all of which are likely unrelated to Hancitor or the group behind it), but we can further refine this then by using the XOR encoded Tool ID/value entries, skipping every other 4-bytes (the "uses" value DWORD), and set the order via string match position in YARA.

FD 65 B2 38 B9 04 DC 6B B9 04 DC 6B B9 04 DC 6B 9E C2 A7 6B BB 04 DC 6B B0 7C 4F 6B B4 04 DC 6B B9 04 DD 6B FD 04 DC 6B 0C 9A 3D 6B BE 04 DC 6B 0C 9A 02 6B B8 04 DC 6B 52 69 63 68 B9 04 DC 6B

This can be turned into the below YARA.

rule ORDERED_XOR { strings: $entry1 = { 9E C2 A7 6B } $entry2 = { B0 7C 4F 6B } $entry3 = { B9 04 DD 6B } $entry4 = { 0C 9A 3D 6B } $entry5 = { 0C 9A 02 6B } condition: @entry1[1] < @entry2[1] and @entry2[1] < @entry3[1] and @entry3[1] < @entry4[1] and @entry4[1] < @entry5[1] }

This is effectively the same data but will return 5 hits, which is more accurate, and they are all actually related to Hancitor group; however, one major caveat to this method, and one that really makes it unfeasible in the end, is that the encoded values are of course tied to the XOR key. Womp womp. You'll recall that the XOR key is derived, in part, by the array and the DOS stub, thus if the the DOS stub is different or the Rich Header array is different, then the XOR key will follow suit and these won't match at all. At that point, you're better off just searching for the XOR key or raw/encoded data. It would be nice if YARA was patched to allow for ordering of the decoded array entries.

3) Next, the example above with Hancitor falls apart because of over-matching. Specifically, the above examples contain 5 entries. The last entry is always the linker and the one above that is usually the compiler entry, so we have 3 other entries related to the build environment. That's not a lot to go on and you'll frequently run into situations where samples with high entry counts can trigger a match simply due to having the same entries included. In general, I'd say the larger the array, the more likely it will be accurate and easier to search on but I'll prod this angle further during analysis.

I haven't extensively tested this next piece but a lot of the over matching seemed to stem from DLL's. The following is the parsed array from one of the matches pulled from the 500+ samples returned via the first Hancitor YARA example.

Offset | Data // Meaning -------------------------------------------------- 0x0090 | 0x00C7A09E // entry 1 0x0094 | 0x00000002 // id199=41118, uses=2 0x0098 | 0x00DF520D // entry 2 0x009C | 0x00000004 // id223=21005, uses=4 0x00A0 | 0x00E0520D // entry 3 0x00A4 | 0x0000000C // id224=21005, uses=12 0x00A8 | 0x00E1520D // entry 4 0x00AC | 0x00000004 // id225=21005, uses=4 0x00B0 | 0x00DD520D // entry 5 0x00B4 | 0x00000004 // id221=21005, uses=4 0x00B8 | 0x00937809 // entry 6 0x00BC | 0x00000006 // id147=30729, uses=6 0x00C0 | 0x007BC627 // entry 7 0x00C4 | 0x00000003 // id123=50727, uses=3 0x00C8 | 0x00010000 // entry 8 0x00CC | 0x000000A8 // id1=0, uses=168 0x00D0 | 0x00E19EB5 // entry 9 0x00D4 | 0x0000000E // id225=40629, uses=14 0x00D8 | 0x00DC9EB5 // entry 10 0x00DC | 0x00000001 // id220=40629, uses=1 0x00E0 | 0x00DB520D // entry 11 0x00E4 | 0x00000001 // id219=21005, uses=1 0x00E8 | 0x00DE9EB5 // entry 12 0x00EC | 0x00000001 // id222=40629, uses=1

You'll note that the entries are also out-of-order from the Hancitor sample but I digress. In addition to patching YARA to allow for specifying ordered array entries, another useful feature to address over-matching would be a way to specify bounds for the rule entries, something like Sample A matches if it only contains these Y entries.

4) The last item I'll touch on is searching by XOR key and the raw data. I clump these together because they, for the most part, represent the same thing even though the XOR key figures in the PE DOS header/stub, which can differ between samples, while the array remains the same.

Below is an example of using the XOR key and the raw data (hashed to avoid conflicting characters which may be in the raw data, such as quotes) in YARA.

pe.rich_signature.key == 0x6BDC04B9 hash.sha256(pe.rich_signature.clear_data) == "0b5d6dfcaba9d377a7f23fd219d921438862d9a2c8009f363767c778f936acef"

Note that the hash must be lowercase in YARA. I haven't looked into the underlying code to validate but it appears this function is doing a string comparison and if you use uppercase in the hash it will fail to match. Either way, both of those will return the same 4 hashes, which isn't unexpected.

When doing any serious hunting you'll want to attack Rich Headers from multiple angles. A full YARA rule generated from my script, using the above Hancitor sample, is below.

import "pe" import "hash" // File: hancitor/291fc089688d7b6dff031022298d1b939746f3d6dafbb5a419f2a2bbc29f614d_S1.exe rule UNORDERED_ARRAY { condition: pe.rich_signature.toolid(123,50727) and pe.rich_signature.toolid(147,30729) and pe.rich_signature.toolid(1,0) and pe.rich_signature.toolid(225,40629) and pe.rich_signature.toolid(222,40629) } rule ORDERED_XOR { strings: $entry1 = { 9E C2 A7 6B } $entry2 = { B0 7C 4F 6B } $entry3 = { B9 04 DD 6B } $entry4 = { 0C 9A 3D 6B } $entry5 = { 0C 9A 02 6B } condition: @entry1[1] < @entry2[1] and @entry2[1] < @entry3[1] and @entry3[1] < @entry4[1] and @entry4[1] < @entry5[1] } rule XOR_KEY { condition: pe.rich_signature.key == 0x6BDC04B9 } rule CLEAR_DATA { condition: hash.sha256(pe.rich_signature.clear_data) == "0b5d6dfcaba9d377a7f23fd219d921438862d9a2c8009f363767c778f936acef" }

Case Studies

I didn't have an easy way to directly access Rich Headers so I created a corpus of samples to test against. To start, I downloaded 21,996 samples which covered a wide range of malware families, actors, or campaigns - 962 different ones to be exact. These were identified by iterating over identifiers (eg "APT1", "Operation DustySky", "Hancitor") and, after removing duplicate files, I was left with a total of 17,932 unique samples. Out of that, 14,075 included Rich Header values, which was 78.4% of the total and seemed like a good baseline to begin testing against. This also highlights just how prevalent Rich Headers are in the wild.

I've parsed out the Rich Header from every one of these files and will be using that data for my analysis. As there is far too much to go through individually, I approached the problem with various analytical lenses and then explored interesting aspects that reared up during this exercise.

Rich Header Overlaps

The first thing I was really interested in was whether the Rich Headers overlapped between those identifiers as it may be a strong indicator for profiling. A lot of this analysis will be referencing the XOR keys as I feel these are really the defining feature of the Rich Header when it comes to hunting, since it encompasses everything.

Below are the top 10 XOR keys by unique sample count that shared more than one identifier.

1009 || 0xB6CCD171 || APT33,DropShot,TurnedUp 0547 || 0x89A99A19 || Grabsir,HideProc,POSFight,PoisonIvy,TinyNuke,TrickBot,V0lk 0215 || 0xD180F4F9 || 7ev3n,Paskod,PoisonIvy,TrickBot 0112 || 0xFA28276F || Gh0st,Zegost 0107 || 0x2F3719A6 || LazarusGroup,OpBlockbuster 0101 || 0x7DFEF65C || KillDisk,ModifyMBR,Petya 0090 || 0x69EB1175 || Alina,BigBong,Carbanak,CryptInfinite,CryptoBit,GodzillaLoader,Goopic,GovRAT,Hancitor,ModifyDNSSearchOrder,NitlovePOS,NitlovePOSDownloader,Odinaff,Oztratz,PClock,PandaBanker,PhiladelphiaRansom,RanserKD,ResolveParkedDomain,SevenPointedDagger,SundownEKMalwarePayload,TinyNuke,Toshliph,V0lk,ZCrypt,Zeprox 0073 || 0x89A56EF9 || CoreBot,KHRAT,OpSMNPoisonIvy,PoisonIvy 0072 || 0x886973F3 || AldiBot,Backoff,BlackShades,Blohi,Carbanak,CrypAura,CryptoDevil,DHSpyware,HideProc,Jorik,Maktub,NewPOSThings,Packrat,PoisonIvy,PonyForxx,PowerShellCaretObfuscation,Punkey,ShortenedURLRequest,TrickBot,Troldesh,UsesDynamicDNS,V0lk,Vilsel,iRAT 0066 || 0xC1FC1252 || Carbanak,CryptoJoker,CryptoWire,Downeks,GazaCyberGang,Ishtar,NICIndiaAttack,PClock,Patchwork,PhiladelphiaRansom,PowerShellEncodedCommand,PowershellEmpireLauncher,PowershellLoadDotNet,PsExec,TinyNuke,V0lk,WordCallsWinHTTPDll,njw0rm

0xB6CCD171 - APT33,DropShot,TurnedUp

Looking at the first entry, 1,009 unique hashes across three identifiers. TurnedUp was a piece of malware used in the DropShot campaign by APT33 so starting off, we have a known relationship between the samples. Furthermore, they have a fairly sizable Rich Header with 10 entries so it's less likely to be "generic". In total, I have 1,014 APT33 samples and 999 of them shared this XOR key. Looking at one of the samples which doesn't share that key highlights a previous point I made regarding the "counts". Check out the below parsed Rich Header output for 0xB6CCD171 and the output of 0xD78F6BA1 right next to it; I've highlighted the differences which exist only in the "uses" field.

Offset | Data // Meaning Offset | Data // Meaning -------------------------------------------------- -------------------------------------------------- 0x0090 | 0x00984E93 // entry 1 0x0090 | 0x00984E93 // entry 1 0x0094 | 0x00000002 // id152=20115, uses=2 0x0094 | 0x00000002 // id152=20115, uses=2 0x0098 | 0x00AB9D1B // entry 2 0x0098 | 0x00AB9D1B // entry 2 0x009C | 0x0000003E // id171=40219, uses=62 0x009C | 0x0000003D // id171=40219, uses=61 0x00A0 | 0x009E9D1B // entry 3 0x00A0 | 0x009E9D1B // entry 3 0x00A4 | 0x0000001A // id158=40219, uses=26 0x00A4 | 0x0000001A // id158=40219, uses=26 0x00A8 | 0x00AA9D1B // entry 4 0x00A8 | 0x00AA9D1B // entry 4 0x00AC | 0x000000AB // id170=40219, uses=171 0x00AC | 0x000000AB // id170=40219, uses=171 0x00B0 | 0x00837809 // entry 5 0x00B0 | 0x00837809 // entry 5 0x00B4 | 0x00000001 // id131=30729, uses=1 0x00B4 | 0x00000001 // id131=30729, uses=1 0x00B8 | 0x00010000 // entry 6 0x00B8 | 0x00010000 // entry 6 0x00BC | 0x000000B0 // id1=0, uses=176 0x00BC | 0x000000CE // id1=0, uses=206 0x00C0 | 0x00937809 // entry 7 0x00C0 | 0x00937809 // entry 7 0x00C4 | 0x00000013 // id147=30729, uses=19 0x00C4 | 0x00000015 // id147=30729, uses=21 0x00C8 | 0x00AF9D1B // entry 8 0x00C8 | 0x00AF9D1B // entry 8 0x00CC | 0x0000000E // id175=40219, uses=14 0x00CC | 0x0000000E // id175=40219, uses=14 0x00D0 | 0x009A9D1B // entry 9 0x00D0 | 0x009A9D1B // entry 9 0x00D4 | 0x00000001 // id154=40219, uses=1 0x00D4 | 0x00000001 // id154=40219, uses=1 0x00D8 | 0x009D9D1B // entry 10 0x00D8 | 0x009D9D1B // entry 10 0x00DC | 0x00000001 // id157=40219, uses=1 0x00DC | 0x00000001 // id157=40219, uses=1

So what does this mean? The entry "id171=40219" corresponds to VS2010 SP1 build 40219 and indicates 62 C++ files were linked on the left, 61 on the right. Similarly for the next highlighted entry, this is a count of the imported symbols and you can see on the right it's grown by quite a bit. Finally, with entry "id147=30729", which corresponds to VS2008 SP1 build 30729, it indicates on the left that it linked 19 DLL's while on the right is 21. This helps establish a pattern of consistency between the samples in that the structure of the array remained the same but, with increasing counts, was likely modified over time.

Outside of the Rich Header, we can take a look at some other artifacts and see if the samples remain consistent in any other ways.

PDB: 0xB6CCD171 File Name,c:\users\xman_1365_x\desktop\homework\13930308\bot_70_fix header_fix_longurl 73_stableandnewprotocol - login all\release\bot.pdb 0xD78F6BA1 File Name,c:\users\xman_1365_x\desktop\new folder (2)\2015_4-2\bot_70_fix header_fix_longurl 73_stableandnewprotocol - login all\release\bot.pdb Compile Times: 0xB6CCD171 TimeDateStamp,0x538B07F4 (Sun Jun 01 07:01:08 2014) 0xD78F6BA1 TimeDateStamp,0x55012B51 (Thu Mar 12 01:59:45 2015) Shared Strings: deskcapture.jpg User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0 StikyNote.exe \msdtcvtr.bat

Based on the above information, it would appear the original author (as indicated by the PDB path retained in the debug information when it was compiled) later updated this particular malware source. The compile times show 2014 and 2015 respectively and the PDB path on the newer one reflects the 2015 information. Outside of that, the user path is the same and they share a number of strings for dropped file names, User-Agent, so on and so forth. Either way, they seem to confirm the samples are related which means that the Rich Header in this case is likely accurate and, given that it's not particularly generic, should be suitable for hunting.

0x69EB1175 - Alina,BigBong,Carbanak,CryptInfinite,CryptoBit, ...

Alright, so let's take a look at a key with groups that seems to be unrelated. For XOR Key 0x69EB1175 there are 26 different identifiers and they appear to be all over the place - cryptominers, droppers, trojans, ransomware, etc. I've taken the parsed Rich Header and annotated it with what I could find regarding the Tool Id values.

Offset | Data // Meaning -------------------------------------------------- 0x0090 | 0x005F0FC3 // entry 1 0x0094 | 0x00000002 // id95=4035, uses=2 * VS2003 (.NET) build 4035 / C objects 0x0098 | 0x00010000 // entry 2 0x009C | 0x0000009E // id1=0, uses=158 * Total imports 0x00A0 | 0x005D0FC3 // entry 3 0x00A4 | 0x00000011 // id93=4035, uses=17 * VS2003 (.NET) build 4035 / Imports 0x00A8 | 0x00302354 // entry 4 0x00AC | 0x0000000A // id48=9044, uses=10 * "Utc12_2_C" 0x00B0 | 0x000606C7 // entry 5 0x00B4 | 0x00000001 // id6=1735, uses=1 * VS98 (6.0) SP6 cvtres build 1736

If we assume VS98 and "Utc12_2_C" are the compiler/linker, then there is really just one additional Tool (but two object types) in use for this Rich Header. That doesn't build a lot of confidence in being unique so for now we'll just define this as low complexity/generic. Given that, I pulled a couple of samples into an analysis box to see what similarities, if any, exist between them. Almost immediately it's apparent what the commonality here is - everybody's favorite friend...Nullsoft's NSIS Installer! This makes sense why it spans so many disparate malware families and campaigns. NSIS Installer will compile a brand new executable so the entire "development" environment is tied to this version of NSIS. This also means the Rich Header is effectively useless for defining anything but NSIS, which does have value in its own right but not the purpose of this exploration. Different versions of NSIS most likely have different Rich Headers that are embedded when it compiles the scripts into the executable, but I haven't tested this.

0xD180F4F9 - 7ev3n,Paskod,PoisonIvy,TrickBot

Picking another key to look at, 0xD180F4F9 has samples matching identifiers for 7ev3n, Paskod, PoisonIvy, and TrickBot malware families; however, it has even less entries than the previous example.

Offset | Data // Meaning -------------------------------------------------- 0x0090 | 0x000E1C83 // entry 1 0x0094 | 0x00000001 // id14=7299, uses=1 0x0098 | 0x00091F69 // entry 2 0x009C | 0x0000000B // id9=8041, uses=11 0x00A0 | 0x000D1FE9 // entry 3 0x00A4 | 0x00000001 // id13=8169, uses=1

While not high in entry volume, all of these ToolID and types are not documented across any of the sources that I frequent to look them up. That alone is interesting to me - possibly newer or more obscure tools. Either way, assuming one is the compiler and one is the linker than we have but one additional "Tool" so it's extremely generic.

Taking a look at a PoisonIvy and TrickBot sample, they both match signatures for Microsoft Visual Basic v5.0-6.0 and each have one imported library "msvbvm60.dll". A general review of the samples doesn't show anything that looks related but these strings stand out as interesting.

C:\Program Files\Microsoft Visual Studio\VB98\VB6.OLB C:\Program Files (x86)\Microsoft Visual Studio\VB98\VB6.OLB

The "VB6.OLB" file existed in two different paths at compilation time. If these strings are to believed, this would indicate a separate build environment. There does appear to be some similarity in the Visual Basic project files being somewhat randomized. Not a strong link but it's something. The language packs used in the binaries, along with the username listed in the one VBP, would make me lean towards it still being unrelated.

A*\AD:\ZmbSsZ1q6c4yn7ph\pyzjwmbyocjn.vbp A*\AC:\Users\;5:A0=4@\Tesktop\PNYINp6J1\PNYINp6J1\CDsGT9YqpuFCQu\EqZt8h.vbp

I believe this is likely another instance where the Rich Header is primarily useful for product identification and not much else unfortunately.

Group Overlap

Below is a table of the aforementioned XOR keys and a short blurb about each. I've highlighted the ones in RED which seemed to support plausible attribution

0xB6CCD171 - Known cluster between APT33/DropShot/TurnedUp. RH has 10 entries. 0x89A99A19 - VB5 with varied VBP paths. Noted one reference to a school project. RH has 3 entries. 0xD180F4F9 - Nullsoft NSIS as discussed before. RH has 5 entries. 0xFA28276F - Samples were packed with either UPX or Themida. Share icons, multiple file paths, API's, and similar domains. RH has 13 entries. 0x2F3719A6 - Known cluster between Lazarus/Operation Blockbuster. RH has 8 entries 0x7DFEF65C - UPX packed and all are the same file. Original "Petya" identifier is inaccurate I believe. RH has 10 entries. 0x69EB1175 - Nullsoft NSIS packer. RH has 5 entries. 0x89A56EF9 - Only contains linker/compiler, low entry count. RH has 2 entries. 0x886973F3 - VB6 but otherwise samples are all over the place, RH all but useless in this one. RH has 1 entry. 0xC1FC1252 - AutoIT based malware. RH has 13 entries.

There are a couple of interesting observations I made while going through these. First, it seems clear that the Rich Header can be useful for identifying software that generates new executables, eg NSIS and AutoIT. Overall I'd say anything with a large disparate set of identifiers is going to be created through some automated tool or have a low complexity array which is likely to just be picked up by unrelated samples. All of the "good" hits included more complex Rich Headers, when not built by automated tools, and that seems like a decent value-gauge. I felt around 8-10 was a sweet spot for "complexity".

Second, traditional packers like UPX and Themida do not modify the actual DOS Header/Stub so the Rich Header, in the case of 0xFA28276F, remains tied to the packed executable. As such, it was able to identify the underlying malware through the packer even though multiple layers of obfuscation were employed. This is a solid match in my opinion and also a pretty cool potential window into embedded malware.

Finally, there is some value when you look at dynamic analysis. As the Rich Header is a static feature, it allows another opportunity to identify samples that may otherwise fail to detonate in a sandbox. I was able to identify 15 unknown Brambul malware samples associated to Lazarus group by Rich Header alone as the samples had failed to actually detonate/do anything of note within a sandboxed environment. That's pretty handy.

Raw XOR Keys

The next phase of analysis I'll look at is from the angle of just raw XOR key volume. The below are the Top 20 XOR key counts and the identifiers for each key. For the keys previously analyzed I've just marked with a double-dash and will skip talking about here.

1009 0xB6CCD171 // -- 0547 0x89A99A19 // -- 0233 0x00A3CC80 // GandCrab (RH 11 entries) 0215 0xD180F4F9 // -- 0183 0xD0A79B13 // GandCrab (RH 11 entries) 0165 0xF5740263 // GandCrab (RH 11 entries) 0124 0x9D2100DB // GandCrab (RH 7 entries) 0123 0xD97E80D5 // Qakbot (RH 5 entries) 0115 0x7BD6FBAB // GandCrab (RH 8 entries) 0112 0xFA28276F // -- 0110 0x00A3CCE0 // GandCrab (RH 11 entries) 0107 0x2F3719A6 // -- 0101 0x7DFEF65C // -- 0090 0x69EB1175 // -- 0079 0x7497C648 // GandCrab (RH 7 entries) 0073 0x89A56EF9 // -- 0072 0xD5FE80D5 // Qakbot (RH 5 entries) 0072 0x886973F3 // -- 0066 0xC1FC1252 // -- 0066 0x7563E767 // GandCrab (RH 8 entries)

Looking at the list, the obvious repetition of GandCrab and Qakbot really stands out so I'll dive into each of these.

GandCrab

Out of the 1,075 GandCrab listed in the Top 20, every single one of them triggered a signature mismatch in my script. This means that the extracted XOR key was found to be different than what the script generated when it rebuilt the Rich Header XOR key. This is a clear sign of tampering and manipulation.

Looking at the set with the extracted XOR key of 0x00A3CC80 shows that every single hash has a different generated key with no overlap across the set. Upon further inspection, this is not caused by the actual Rich Header information, which is consistent between each sample, but due to the DOS Stub, which is factored into the algorithm that generates the Rich Header XOR key. Below shows the DOS Stub from two samples.

Samples 1:

Sample 2:

The DOS Stub is an actual DOS executable that the linker embeds in your program which simply prints the familiar "This program cannot be run in DOS mode" message before exiting. It appears that the bytes which account for the beginning of the message "This progra" are being overwritten after the Rich Header is inserted.

31 30 32 30 00 E9 03 17 63 2B 74 1020.é..c+t 31 30 33 34 00 1D CE 92 0A 1A 79 1034..Î’..y

Four ASCII numerals followed by a null byte followed by six bytes. For this particular XOR key, the initial ASCII of the four bytes are listed below.

1018 1035 1020 1036 1023 1037 1024 1038 1026 1043 1027 1045 1030 1047 1034 1049

The six bytes after the null didn't stand out as having any observable pattern. I read through a handful of GandCrab blogs from the usual suspects and didn't see any mention of this particular artifact. Additionally, I checked the other XOR keys in this list to see if they had similar behaviors, which they all did. Some of the keys didn't modify the initial four bytes but all included the null followed by the six bytes.

As this artifact seems to be unknown, I was curious if it's possibly campaign related or maybe some sort of decryption key. I ended up loading one of these samples into OllyDbg with a hardware breakpoint on access to any of the bytes in question. At some point the sample ends up overwriting the entire DOS Header and DOS Stub with a different value which triggered the breakpoint. The sample proceeds to unpack an embedded payload which contained its own Rich Header entry.

Searching my sample set for the new Rich Header (0xEC5AB2D9) led me to five other GandCrab samples. A quick pivot on the temporal aspect of these 5 samples, along with the XOR key I've been looking at, shows the 233 0x00A3CC80 samples starting to appear in the wild on 19APR2018 with a huge ramp up between 23APR2018-25APR2018, while the 5 samples which shared the newly extracted XOR key first appeared on 22APR2018, one day before the campaign kicked into high gear. I did a quick check on the rest of the GandCrab in this list and they are all from that time frame, but samples as recent as yesterday (months after April), show the same pattern, with the most recent sample displaying "1018" for the first four bytes. Interesting stuff to be sure and anytime I see an observable pattern I tend to think it's significant, but that would be a tangent for a separate blog. Either way, this was a good example of signature mismatch and why it's important to check Rich Headers on embedded/dropped payloads. These particular modifications add credibility to the idea that these samples were produced by the same environment and so, in this case, the Rich Headers led to artifacts that can be coupled together for profiling.

Qakbot

Shifting over to the two Qakbot entries, both XOR keys were from samples delivered during the same couple of days. One characteristic both XOR keys showed was that they, within their respective key group, were all the same file size except a small cropping of outliers in the XOR key 0xD5FE80D5. Specifically, 61 of the samples with that specific key are 548,864 bytes but there is 10 samples at 532,480 bytes and one sample with 520,192 byte. It's always interesting when you see outliers like this when looking at a large sample set. I continued by analyzing a sample from each file size and they all share the same dynamic behaviors, along with a few static ones, but otherwise they appear quite different. For example, they each import 3 libraries and 7 symbols but none of them are the same - this also just looks wrong and out of place based on my experience.

Upon closer inspection, Qakbot is using some sort of packer so, after the success of embedded payloads from the GandCrab work, I manually unpacked one sample from each of the three file sizes to see if anything stood out. We know the files are the same in terms of family already due to my collection method, but the unpacked payloads all matched 225 functions and over a thousand basic blocks, so that adds a lot of weight to the idea of the malware being related. When I looked at the unpacked samples, they each had their own respective Rich Headers so I've dug in a bit more here. Keep in the mind that the question I'm interested in is whether we can use these to determine if they were created by the same actor or in the same environment.

Below are the parsed Rich Headers for the 3 samples and I've sorted them from left to right based on their listed compile times, which incidentally aligns perfectly from smallest to largest of the original files. That is to say the 520,192 byte files embedded payload was compiled first and the 548,864 byte files embedded payload was compiled last.

0xDCE3EAC8 4223... 19DEC2017 04:51:12 0x6D28F629 8e0f... 29JAN2018 09:10:34 0xAE08C3FD e4de... 16APR2018 10:24:55 Offset | Data // Meaning Offset | Data // Meaning Offset | Data // Meaning -------------------------------------------- -------------------------------------------- ------------------------------------------ 0x0090 | 0x00837809 // entry 1 0x0090 | 0x00937809 // entry 1 0x0090 | 0x00937809 // entry 1 0x0094 | 0x00000003 // id131=30729, uses=3 0x0094 | 0x00000013 // id147=30729, uses=19 0x0094 | 0x00000015 // id147=30729, uses=21 0x0098 | 0x00AB9D1B // entry 2 0x0098 | 0x00010000 // entry 2 0x0098 | 0x00010000 // entry 2 0x009C | 0x00000001 // id171=40219, uses=1 0x009C | 0x0000007E // id1=0, uses=126 0x009C | 0x00000083 // id1=0, uses=131 0x00A0 | 0x00937809 // entry 3 0x00A0 | 0x00837809 // entry 3 0x00A0 | 0x00837809 // entry 3 0x00A4 | 0x00000013 // id147=30729, uses=19 0x00A4 | 0x00000004 // id131=30729, uses=4 0x00A4 | 0x00000004 // id131=30729, uses=4 0x00A8 | 0x00010000 // entry 4 0x00A8 | 0x000F0C05 // entry 4 0x00A8 | 0x000F0C05 // entry 4 0x00AC | 0x0000007C // id1=0, uses=124 0x00AC | 0x00000001 // id15=3077, uses=1 0x00AC | 0x00000001 // id15=3077, uses=1 0x00B0 | 0x000F0C05 // entry 5 0x00B0 | 0x00AA9D1B // entry 5 0x00B0 | 0x00AA9D1B // entry 5 0x00B4 | 0x00000001 // id15=3077, uses=1 0x00B4 | 0x00000038 // id170=40219, uses=56 0x00B4 | 0x00000038 // id170=40219, uses=56 0x00B8 | 0x00AA9D1B // entry 6 0x00B8 | 0x009D9D1B // entry 6 0x00B8 | 0x009D9D1B // entry 6 0x00BC | 0x00000037 // id170=40219, uses=55 0x00BC | 0x00000001 // id157=40219, uses=1 0x00BC | 0x00000001 // id157=40219, uses=1 0x00C0 | 0x009D9D1B // entry 7 0x00C4 | 0x00000001 // id157=40219, uses=1

In addition, I've listed out what each entry correlates to; however, the 0xDCE3EAC8 embedded payload includes one additional CompID (171).

171 = Utc1600_CPP || [C++] VS2010 SP1 build 40219 --------------------------------------------------- 147 = Implib900 || [Imp] VS2008 SP1 build 30729 001 = Import0 || Total imports 131 = Utc1500_C || [ C ] VS2008 SP1 build 30729 015 = Masm710 || [ASM] VS2003 (.NET) build 3077 170 = Utc1600_C || [ C ] VS2010 SP1 build 40219 157 = Linker1000 || [LNK] VS2010 SP1 build 40219

Alright, first off if you look at the tools listed in the Rich Header above, you can see the payloads were built with components from VS2K8/10 SP1 with a specific build number that remains consistent across the 120 day period. Furthermore, you can see the "count" values increasing for some of the entries too. Now, this isn't rocket science but the idea that as a developer continues working on their program, the code base will usually increase as new functionality is introduced. This could directly translate into more imports/objects and thus increases in these counts when compiled. The specific increases can be viewed here.

Imports for VS2K8 [147] 019 -> 019 -> 021 C Objects for VS2K8 [131] 003 -> 004 -> 004 -------------------------------------------- C Objects for VS2K10 [170] 055 -> 056 -> 056 -------------------------------------------- Total imports [001] 124 -> 126 -> 131

Another thing that stands out is that the earliest file included an entry for C++ object files created by the compiler that were later not present, along with the order of CompID 131 and 147 being flipped. All of this evidence seems to indicate a reordering of code that again may be related to development. Finally, the most interesting artifact from earliest embedded payload is that it was compiled with the `/DEBUG` flag on and contains a path to the PDB.

j:\projects\qbot3\dllinject\release\win32\qbot_main_exe.pdb

Increasing counts for objects, a temporal and file size alignment, and a solid consistency of entries across the set shows, to me, a clear pattern consistency across the embedded payloads that establish they were all created in the same environment by the same actor.

When talking about OPSEC, the more the bad guys have to worry about the more likely it is they'll screw up. Using the Rich Header from embedded payloads seems to be a very successful technique for painting a fairly convincing picture that makes contextual sense for attribution.

Conclusions

Wrapping things up then, I think the most important takeaway is essentially what Kaspersky showed - Rich Headers are NOT to be trusted. They are simply static bytes in a file and can be manipulated like anything else, along with the fact that adversaries are actively already doing this. Rich Headers should be treated as a supporting artifact that can provide a lot of contextual value to your analysis and can help in attribution. In addition to that, the most "valuable" Rich Headers seem to be the ones with a high complexity and large entry count. These offer a more granular picture of the environment within which the program was constructed.

What I hope I've shown through this analysis though is that Rich Headers shouldn't be written off entirely. They are like more nuanced then I originally thought they would be and they can provide context by establishing patterns in environmental details. This context is enhanced when you start peeling away layers of packed/embedded payloads. Finally, the meta data contained within Rich Headers can provide extremely useful context to other expected artifacts that should be found in a binary and vice versa.

Hopefully this proves to be helpful to other analysts out there getting started with Rich Headers and provide some insight into the pro's and con's of what Rich Headers can offer.

*EDIT - 12AUG2018*
Received a few DM's and e-mails containing additional links and asking for more info so I decided to just add a short resources section for others interested in learning more.

Resources

Mappings for ToolId/Obj Types

Blogs about Rich Headers from a RE perspective

Blogs about Rich Headers from a Threat perspective

White paper/Con preso on the subject

By Jeff White (karttoon)

In this blog post I want to share a Python script that I wrote a few years back which generates YARA rules by using a byte->opcode abstraction method to match similar parts of code across files. At a high level, it's pretty straight-forward. It takes a cluster of files (ideally PE files) and attempts to find the longest common sequence (LCS) of x86/x64 opcodes with the idea that once you abstract back to just opcodes, you can find similar functions in other files and possibly unknown malware. There are a few other matching techniques employed that I'll dive into later but that is the gist. Hopefully the script can compliment hunters and analysts in searching for similar malware samples through automation. There are a lot of other similar tools already out there doing this, but it never hurts to have another option in the proverbial toolbox.

This script was written in Python and heavily utilizes Capstone and YARA Python modules. What initially started as a small script for CTF challenges, and an excuse to play with Capstone <3, has matured over time into more of an operational tool to leverage VirusTotal's Retrohunt capability. I've had a few wins with it but I can't say it's ever been terribly useful in my current capacity and thus it was more of an exercise in programming logic and interacting with files at a low-level.

The reality of this approach is that due to different versions of compilers, different compile time flags, architectures, variable values, etc, all can cause similar code in logic and structures that, at a byte level, is quite different. This is the gap that BinSequencer is trying to bridge.

Overview

The core matching technique in BinSequencer attempts to first determine where executable code may reside within a PE by looking at whether the "code" or "executable" bit is set within each section of the Windows PE file. Once it's identified the potential locations for code, it will disassemble the bytes into assembly instructions and strip out the opcode mnemonics. These opcode mnemonics will be strung together as a sequence in which the script attempts to find the longest match that exists across the sample set and then reverses the process of disassembling it back into bytes for YARA rules.

Below is an image I cobbled together to try and illustrate this particular process.

One of the pro's of this method is that it makes logic matching easier. The mnemonic strings are essentially byte-agnostic, meaning that going from byte to opcode is easy. Looking at a sequence of just mnemonic opcodes is also much faster than byte-by-byte comparison so it helps when it comes to scaling this across large sample sets as well. Unfortunately, we can't have a pro without an equal con. Converting from mnemonic opcode back into a byte can be extremely problematic and this additional processing can slow things down significantly. You end up having to test various byte-variations and variable lengths of the operands used by the instructions to properly generate an accurate YARA rule, not to mention the problems with following wrong branches when building up the YARA rules, but I'll touch on this more later.

Assembly Primer

Before getting into the actual script, it behooves me to attempt and give a mini-x86 assembly overview in order to help illustrate some of the aforementioned pros and cons. The x86 instruction layout is below.

[prefix][opcode][mod-r/m][simb][operands]

When we go from low (bytes) to high (opcode) we will convert a byte like 0x75 to the opcode "JNZ" (jump if not zero). The bytes 0xF85 also use this same "JNZ" opcode so, regardless of the underlying bytes (0x75 vs 0xF85), we end up with the same opcode "JNZ" and the logic of the code remains intact. This is the major benefit of this abstraction technique.

Most instructions are fairly straight forward, "PUSHAD" will always be 0x60 and "CALL" will always be 0xE8 or 0xFF; however, the variations in the underlying bytes dictate things like operand length and change the overall size of the bytes in use.

; e80ef2ffff call 0xfffff213 ; ff1548304000 call dword ptr [0x403048]

One of the above "CALL" instructions is 5 bytes in total while the other is 6 bytes. This is one area of variation in which it begins to complicate things when we end up disassembling "CALL" back into bytes for matching in a YARA rule. There are also built-in optimizations that x86 uses, such as 0x5, which is "ADD EAX, ???" where a value is added to the EAX register. These optimizations create additional variations in potential bytes because the initial bytes change a lot. The "XOR" opcode is a good example of one with baked in optimizations creating a lot of variation which must be accounted for within our YARA rule.

0x30 XOR r/m8 r8 0x31 XOR r/m16/32 r16/32 0x32 XOR r8 r/m8 0x33 XOR r16/32 r/m16/32 0x34 XOR AL imm8 0x35 XOR EAX imm16/32 0x80 XOR r/m8 imm8 0x81 XOR r/m16/32 imm16/32 0x82 XOR r/m8 imm8 0x83 XOR r/m16/32 imm8

This is 10 different byte-representations of the same opcode, so going from byte to opcode is easy, but reversing it can start to get tricky. As if that wasn't bad enough, it gets complicated further by things like the mod-r/m byte which changes the actual opcode meaning. The general 8-bit breakdown for the mod-r/m byte follows:

MOD (7,6) | Reg/Opcode (5,4,3) | R/M (2,1,0)

Using the 0x80 byte from the previous "XOR" example, take a look at the following two instructions.

xor byte ptr [0x418e58] = 8035 588e410037 sub byte ptr [0x418e58] = 802D 588e410037

The mod-r/m byte for the first instruction is 0x35 and 0x2D for the second. The 3 bits that dictate the opcode are 5,4, and 3. These correspond to a table of potential opcodess for that particular byte, so we can't just assume 0x80 is "XOR" when we do our conversion. For the 0x35 mod-r/m byte, the bits in question are 110, which equals 0x6 - the bits for 0x2D are 101, which equals 0x5. The table for that specific 0x80 byte is below.

0x0 = ADD 0x1 = OR 0x2 = ADC 0x3 = SBB 0x4 = AND 0x5 = SUB 0x6 = XOR 0x7 = CMP

I've probably re-written this script 3 times over the past 2-3 years tackling problems as they've developed from these types of variations that would pop-up. The "MOV" opcode has over 20 different variations by itself so sometimes things got quite messy!

BinSequencer

Alright, so back to the script. The Python "pefile" module is used for the extraction of code from the files, assuming it is a PE, and then the Capstone module is used for the actual disassembly engine. For each file, it will convert the extracted area of data and pull each instruction out, followed by each opcode mnemonic, which get strung into a "blob" which is used as the base for matching.

There are various ways to tune the functionality of this script from the command line which can be seen below.

usage: binsequencer.py [-h] [-c <integer_percent>] [-m <integer>] [-l <integer>] [-v] [-a {x86,x64}] [-g <file>] [-d] [-Q] [-n] [-o] [-s] ... Sequence a set of binaries to identify commonalities in code structure. positional arguments: file optional arguments: -h, --help show this help message and exit -c <integer_percent>, --commonality <integer_percent> Commonality percentage the sets criteria for matches, default 100 -m <integer>, --matches <integer> Set the minimum number of matches to find, default 1 -l <integer>, --length <integer> Set the minimum length of the instruction set, default 25 -v, --verbose Prints data while processing, use only for debugging -a {x86,x64}, --arch {x86,x64} Select code architecture of samples, default x86 -g <file>, --gold <file> Override gold selection -d, --default Accept default prompt values -Q, --quiet Disable output except for YARA rule -n, --nonpe Process non-PE files (eg PCAP/JAR/PDF/DOC) -o, --opcode Use only the opcode matching technique -s, --strings Include strings in YARA for matched hashes

The `-c` flag lets you specify the percentage of samples the sequence needs to exist in and the `-l` flag sets the minimum length of linear opcodes required. I've found 25 to be around the sweet spot for bare minimum accuracy as it usually covers a couple of functional blocks but, in general, the higher the better. When you go too low, it leads to non-unique common sequences of instructions, such as with prologue and epilogues or basic code re-use. The "blobs" will look like "jnz|jmp|add|mov|xor|call" with hundreds to a few hundred thousand opcodes.

Since we're doing comparisons, the script tries to identify the best initial file to use for analysis as the "gold" file. If it's doing a 100% match, it will search for the file with the lowest volume of instructions since it has to exist in every other file; it does the reverse if it dips below 100%. As an aside, it's also worth noting a limitation of YARA at this point. YARA has a hard-coded limit of 10,000 hex bytes that can used for a rule so I artificially limit the amount of opcodes that can be used to 4,000 as a safeguard.

To do the actual comparison, it uses a simple sliding window technique between the low match limit (25) and 4,000 opcodes, starting at the highest size and subtracting one opcode after each iteration until it empties and moves the window up by one offset. From there, it utilizes a number of tricks to speed things up, like black listing known bad sequences, so that it can zero in on the optimal matching length and reduce the number of iterations required.

Once it has a match of sequenced opcodes, it moves into the actual YARA generation which is where things get fun. One nice feature of YARA is the ability to do a boolean OR in the hex match. A "JMP" opcode, which can be the 0xE8 or 0xFF byte, would be "(E8|FF)". You can also do wildcards in YARA so "PUSH", which is 0x50 through 0x57 and 0x6A, can be represented as "(5?|6A)". Finally, you can account for overall instruction length with a byte jump in YARA like "[4-5]", which skips between 4 to 5 bytes before the next match.

As I intended to use this on primarily on VirusTotal, I began running into a few undocumented limitations VirusTotal imposed on the rules that can be used for retrohunting. If you have too many jumps, boolean OR's, and even in some scenarios just the length of the jump could cause your rule to fail. I suspect these limitations are primarily for efficiency/performance management but it's something that I had to account for throughout the script - your mileage may vary and I have not kept up with any changes.

Output

It's probably easier to just show some examples of the expected output to highlight what the script does. I've truncated parts of it and will interject commentary between the sections to explain each one. For this first example, I'll take a look at my favorite lame Malware - Hancitor!

python binsequencer.py Hancitor_Malware/ [+] Extracting instructions and generating sets [-]Hancitor_Malware/e7b3ef04c211fafa36772da62ab1d250970d27745182d0f3736896cf7673dc3a_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted [-]Hancitor_Malware/6e73879ca49b40974cce575626e31541b49c07daa12ec2e9765c432bfac07a20_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted [-]Hancitor_Malware/2b3c920dca2fd71ecadd0ae500b2be354d138841de649c89bacb9dee81e89fd4_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted ... [-]Hancitor_Malware/ab90ed6cb461f17ce1f901097a045aba7c984898a0425767f01454689698f2e9_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted [-]Hancitor_Malware/594ab467454aafa64fc6bbf2b4aa92f7628d5861560eee1155805bd0987dbac3_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted [-]Hancitor_Malware/643951eee2dac8c3677f5ef7e9cb07444f12d165f6e401c1cd7afa27d7552367_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted

The script takes a path to where the files reside and will iterate over each one to extract instructions. If you specify the `-n` flag, it will treat the files as non-PE, this was done for analyzing files of raw shellcode but the opcode technique holds up over other files as well (PCAP/JAR/whatever), but I wouldn't recommend it and you may run into bugs.

[+] Golden hash (3736 instructions) - Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe

Since I ran the script with it's default settings, it attempts a 100% match across all samples and thus finds the sample with the lowest instruction count since any match *must* exist in this file.

[+] Zeroing in longest mnemonic instruction set in .text [-] Matches - 0 Block Size - 3259 Time - 0.00 seconds [-] Matches - 0 Block Size - 1630 Time - 0.13 seconds [-] Matches - 0 Block Size - 816 Time - 0.13 seconds [-] Matches - 0 Block Size - 409 Time - 0.14 seconds [-] Matches - 0 Block Size - 206 Time - 0.12 seconds [-] Matches - 0 Block Size - 105 Time - 0.10 seconds [-] Matches - 0 Block Size - 55 Time - 0.08 seconds [-] Matches - 0 Block Size - 30 Time - 0.07 seconds [+] Zeroing in longest mnemonic instruction set in .edata [-] Moving 1 instruction sets to review with a length of 477

The script should iterate over each section in the golden hash that it identified as potentially having code and check these against the other samples. In this case, it does 8 iterations of the sliding window for sizes above the minimum of 25 and finds no matches in ".text". Then it moves on to the ".edata" section, which has 477 instructions, and finds that this match exists in all of the samples.

[*] Do you want to display matched instruction set? [Y/N] y push|mov|sub|push|push|push|call|mov|add|test|je|push|mov|lea|add|push|push|push|call|cmp|jne|test|je|mov|xor|test|je|xor|inc|cmp|jb|pop|mov|pop|mov|pop|ret|pop|xor|pop|mov|pop|ret|mov|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|push|mov|push|push|call|dec|add|neg|sbb|inc|pop|ret|int3|int3|int3|int3|int3|push|mov|sub|mov|push|push|push|mov|xor|add|movzx|lea|movzx|add|mov|mov|test|je|mov|mov|mov|mov|movzx|or|sub|mov|mov|mov|mov|movzx|or|sub|jne|mov|test|je|mov|inc|movzx|or|movzx|or|sub|je|mov|mov|test|js|jg|je|inc|add|add|mov|cmp|jae|mov|jmp|mov|lea|pop|pop|pop|lea|mov|pop|ret|pop|pop|xor|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|mov|mov|mov|and|sub|xor|sub|cmp|jne|cmp|je|inc|cmp|jl|xor|pop|ret|int3|int3|int3|int3|push|mov|sub|push|mov|xor|push|push|xor|mov|mov|add|mov|mov|add|mov|add|mov|add|mov|mov|mov|mov|mov|mov|test|je|movsx|mov|cmp|jae|cmp|jae|mov|mov|mov|mov|movzx|add|mov|or|movzx|or|sub|jne|sub|test|je|mov|inc|movzx|or|movzx|or|sub|je|test|js|jg|je|mov|inc|cmp|jae|mov|mov|jmp|mov|mov|mov|add|pop|pop|mov|pop|mov|pop|ret|pop|xor|pop|mov|pop|mov|pop|ret|pop|pop|xor|pop|mov|pop|ret|int3|int3|int3|push|xor|mov|js|mov|mov|lodsd|mov|jmp|mov|lea|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|mov|sub|mov|push|push|push|mov|add|call|mov|push|push|call|mov|push|push|mov|call|push|push|mov|call|mov|add|mov|test|jne|test|je|test|je|mov|mov|add|mov|cmp|je|mov|mov|add|add|cmp|je|mov|mov|add|add|mov|test|je|push|call|jmp|test|je|push|push|push|call|mov|test|je|mov|test|movzx|js|lea|push|push|call|cmp|je|mov|mov|add|mov|add|mov|cmp|jne|mov|add|mov|cmp|jne|pop|pop|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|push|call|mov|push|call|push|call|add|pop|test|jne|ret|call|int3|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add

Again, designing this with the intent of being used during analysis, it prompts the user multiple times to display the results in various ways. The above lets you see the opcode sequence and gives a quick impression of whether or not the code construct seems to make logical sense. For example, you can see near the beginning 3 "PUSH" and then a "CALL", which is common behavior for pushing values to the stack before calling a function. This tells me what we're looking at is most likely code as opposed to data that just happened to be intermingled within a code section and disassembled.

As was stated before, it doesn't have to be code to work but the abstraction method will be less useful if it's not.

[*] Do you want to disassemble the underlying bytes? [Y/N] y 0x10003000: push ebp | 55 0x10003001: mov ebp, esp | 8BEC 0x10003003: sub esp, 0x1c | 83EC1C 0x10003006: push edi | 57 0x10003007: push dword ptr [ebp + 0xc] | FF750C 0x1000300a: push dword ptr [ebp + 8] | FF7508 0x1000300d: call 0x10003090 | E87E000000 0x10003012: mov edi, eax | 8BF8 0x10003014: add esp, 8 | 83C408 0x10003017: test edi, edi | 85FF ... 0x10003360: push esi | 56 0x10003361: push 0x403360 | 6860334000 0x10003366: call 0x10003140 | E8D5FDFFFF 0x1000336b: mov esi, eax | 8BF0 0x1000336d: push esi | 56 0x1000336e: call 0x10003270 | E8FDFEFFFF 0x10003373: push esi | 56 0x10003374: call 0x10003070 | E8F7FCFFFF 0x10003379: add esp, 0xc | 83C40C 0x1000337c: pop esi | 5E 0x1000337d: test eax, eax | 85C0 0x1000337f: jne 0x10003382 | 7501 0x10003381: ret | C3 0x10003382: call 0x10002010 | E889ECFFFF 0x10003387: int3 | CC ...

Similar to the previous display but with more details so you can see exactly what the assembly looks like.

[*] Do you want to display the raw byte blob? [Y/N] y 558BEC83EC1C57FF750CFF7508E87E0000008BF883C40885FF744456 ... [*] Do you want to keep this set? [Y/N] y [+] Keeping 1 mnemonic set using 100 % commonality out of 29 hashes [-] Length - 477 Section - .edata

If you did not want to keep the match, possibly due to the code being part of a known library, not being unique, the match being just data and not an actual code construct you're after, then it's possible to decline it and restart the matching process.

[+] Printing offsets of type: longest [-] Gold matches ----------v SET rule0 v---------- push|mov|sub|push|push|push|call|mov|add|test|je|push|mov|lea|add|push|push|push|call|cmp|jne|test|je|mov|xor|test|je|xor|inc|cmp|jb|pop|mov|pop|mov|pop|ret|pop|xor|pop|mov|pop|ret|mov|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|push|mov|push|push|call|dec|add|neg|sbb|inc|pop|ret|int3|int3|int3|int3|int3|push|mov|sub|mov|push|push|push|mov|xor|add|movzx|lea|movzx|add|mov|mov|test|je|mov|mov|mov|mov|movzx|or|sub|mov|mov|mov|mov|movzx|or|sub|jne|mov|test|je|mov|inc|movzx|or|movzx|or|sub|je|mov|mov|test|js|jg|je|inc|add|add|mov|cmp|jae|mov|jmp|mov|lea|pop|pop|pop|lea|mov|pop|ret|pop|pop|xor|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|mov|mov|mov|and|sub|xor|sub|cmp|jne|cmp|je|inc|cmp|jl|xor|pop|ret|int3|int3|int3|int3|push|mov|sub|push|mov|xor|push|push|xor|mov|mov|add|mov|mov|add|mov|add|mov|add|mov|mov|mov|mov|mov|mov|test|je|movsx|mov|cmp|jae|cmp|jae|mov|mov|mov|mov|movzx|add|mov|or|movzx|or|sub|jne|sub|test|je|mov|inc|movzx|or|movzx|or|sub|je|test|js|jg|je|mov|inc|cmp|jae|mov|mov|jmp|mov|mov|mov|add|pop|pop|mov|pop|mov|pop|ret|pop|xor|pop|mov|pop|mov|pop|ret|pop|pop|xor|pop|mov|pop|ret|int3|int3|int3|push|xor|mov|js|mov|mov|lodsd|mov|jmp|mov|lea|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|mov|sub|mov|push|push|push|mov|add|call|mov|push|push|call|mov|push|push|mov|call|push|push|mov|call|mov|add|mov|test|jne|test|je|test|je|mov|mov|add|mov|cmp|je|mov|mov|add|add|cmp|je|mov|mov|add|add|mov|test|je|push|call|jmp|test|je|push|push|push|call|mov|test|je|mov|test|movzx|js|lea|push|push|call|cmp|je|mov|mov|add|mov|add|mov|cmp|jne|mov|add|mov|cmp|jne|pop|pop|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|push|call|mov|push|call|push|call|add|pop|test|jne|ret|call|int3|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add ----------^ SET rule0 ^----------- Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe 0x10003000 - 0x100033fe in .edata [-] Remaining matches ----------v SET rule0 v---------- Hancitor_Malware/e7b3ef04c211fafa36772da62ab1d250970d27745182d0f3736896cf7673dc3a_S1.exe 0x10003000 - 0x100033fe in .edata Hancitor_Malware/6e73879ca49b40974cce575626e31541b49c07daa12ec2e9765c432bfac07a20_S1.exe 0x10003000 - 0x100033fe in .edata Hancitor_Malware/2b3c920dca2fd71ecadd0ae500b2be354d138841de649c89bacb9dee81e89fd4_S1.exe 0x10003000 - 0x100033fe in .edata ... Hancitor_Malware/ab90ed6cb461f17ce1f901097a045aba7c984898a0425767f01454689698f2e9_S1.exe 0x10003000 - 0x100033fe in .edata Hancitor_Malware/594ab467454aafa64fc6bbf2b4aa92f7628d5861560eee1155805bd0987dbac3_S1.exe 0x10003000 - 0x100033fe in .edata Hancitor_Malware/643951eee2dac8c3677f5ef7e9cb07444f12d165f6e401c1cd7afa27d7552367_S1.exe 0x10003000 - 0x100033fe in .edata ----------^ SET rule0 ^-----------

Once it has the match, it will show the offset within each file so that you can look at it further if needed. In this instance, you can see all of the matches happened in the ".edata" section at offset 0x10003000 through 0x100033FE. A quick look in IDA looks promising.

You can see the matched sequence extends across a number of function blocks and the code looks interesting with multiple instances of Win32 API calls commonly found in malware.

[+] Generating YARA rule for matches off of bytes from gold - Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe [*] Do you want to try and morph rule0 for accuracy and attempt to make it VT Retro friendly [Y/N] y [+] Check 01 - Checking for exact byte match

Before it actually does the YARA generation, the script can perform up to 3 various matching techniques in order to improve performance and accuracy. Each one builds upon the last and I'm going to spend a second detailing the three methods as they aren't all on display here.

You can skip the first two checks with the `-o` flag (although the third check will still utilize their techniques during morphing to some degree).

[+] Check 02 - Checking for optimal opcode match [*] Found optimal opcode match across all samples [*] Do you want to include matched sample names in rule meta? [Y/N] y [*] Do you want to include matched byte sequence in rule comments? [Y/N] y

Once the generation is complete and it validates the YARA rule, it will dump the rule to your console.

[+] Completed YARA rules /* SAMPLES: Hancitor_Malware/18046a720cd23c57981fdfed59e3df775476b0f189b7f52e2fe5f50e1e6003e7_S1.exe Hancitor_Malware/f4f026fbe3df5ee8ed848bd844fffb72b63006cfa8d1f053a9f3ee4c271e9188_S1.exe ... Hancitor_Malware/40a8bb6e3eed57ed7bc802cc29b4e57360aa10c2de01d755f9577f07e10b848b_S1.exe Hancitor_Malware/fed9cc2c7cfb97741470cb79c189a203545af88bdd67bc99e2d7499d343de653_S1.exe BYTES: 558BEC83EC1C57FF750 ... INFO: binsequencer.py Hancitor_Malware/ Match SUCCESS for morphing */ rule rule0 { meta: description = "Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe" author = "" date = "2018-06-05" strings: $rule0_bytes = { 558B??83????57FF????FF????E8????????8B??83????85??74??568B????8D????03????6A??5056FF??????????83????75??F6??????74??8B????33??85??74??80??????403B??72??5EB8????????5F8B??5DC35E33??5F8B??5DC3B8????????5F8B??5DC3CCCCCCCCCCCCCC558B??68????????FF????E8????????4883????F7??1B??405DC3CCCCCCCCCC558B??83????8B????5356578B????33??03??0FB7????8D????0FB7????03??89????89????85??74??8B????8B??8A??88????0FB6??83????2B??89????8B??89????8B??0FB6??83????2B????75??8A????84??74??8A????420FB6??83????0FB6????83????2B??74??8B????8B????85??78??7F??74??4783????83????89????3B??73??8B????EB??8B????8D????5F5E5B8D????8B??5DC35F5E33??5B8B??5DC3CCCCCCCCCCCCCCCCCC558B??8B????8B??81??????????2B??33??2D????????80????75??80??????74??4183????7C??33??5DC3CCCCCCCC558B??83????538B????33??565733??8B????8B??????03??8B????8B????03??89????03??8B????03??89????89????8B????8B????89????89????85??74??0FBF????89????3B??73??3B????73??8B????8B????8B??8B????0FB6????03??8A??83????0FB6??83????2B??75??2B??84??74??8A????420FB6??83????0FB6????83????2B??74??85??78??7F??74??8B????463B??73??8B????8B????EB??8B????8B????8B????03????5F5E8B??5B8B??5DC35F33??5E8B??5B8B??5DC35F5E33??5B8B??5DC3CCCCCC5633??64??????????78??8B????8B????AD8B????EB??8B????8D????8B????5EC3CCCCCCCCCCCCCCCCCCCCCCCCCCCC558B??83????8B????5356578B????03??E8????????8B??68????????56E8????????8B??68????????5689????E8????????68????????5689????E8????????8B????83????89????85??75??85??0F84????????85??0F84????????8B??????????8B????03??89????83??????74??8B????8B??03??03??83????74??8B??8B????03??03??8B????85??74??50FF??EB??85??74??6A??6A??50FF??8B??85??74??8B??85??0FB7??78??8D????5051FF????39??74??89??8B????83????8B????83????8B????83????75??8B????83????89????83??????75??5F5E5B8B??5DC3CCCCCCCCCCCCCCCCCC5668????????E8????????8B??56E8????????56E8????????83????5E85??75??C3E8????????CC000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 } condition: all of them }

You can manually validate the rule matches with YARA as expected before retrohunting.

rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//e7b3ef04c211fafa36772da62ab1d250970d27745182d0f3736896cf7673dc3a_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ... rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//6e73879ca49b40974cce575626e31541b49c07daa12ec2e9765c432bfac07a20_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ... rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//87a10cc169f9ffd0c75bb9846a99fb477fc4329840964b02349ae44a672729c2_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ... ... rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//643951eee2dac8c3677f5ef7e9cb07444f12d165f6e401c1cd7afa27d7552367_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ... rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//1c72f575d0c9574afcfcaab7e0b89fe0083dbe8ac20c0132a978eb1f6be59641_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ... rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//ab90ed6cb461f17ce1f901097a045aba7c984898a0425767f01454689698f2e9_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ...

Below is another examples where I utilize some of the tuning features of the script. Specifically, the match must exist in at least 75% of the samples, it will include strings that exist across all 75% of them, it will skip the first two matching techniques, and it will set Capstone to x64 architecture.

$ python binsequencer.py -c 75 -s -o -a x64 DarkComet_x64/ [+] Extracting instructions and generating sets [-]DarkComet_x64/d4e28eebd4f485496590e55e245a22b17d58414ae22f5186da90b0cc8d4d2006 .text - 16007 instructions extracted [-]DarkComet_x64/8841880e35b0936d7768e489f386e259970c98489a7ddd00e693797a394b4e39 .text - 16340 instructions extracted [-]DarkComet_x64/c8dcafd948f8ae3b60c5d89365ad2fe328ceb74820fc1334e9d4853a3ad4c3e8 .text - 16340 instructions extracted ... [-]DarkComet_x64/24a09b9dadb7471f29e1d2af22219c8fb94a2643bf481228ac60f48f55fa4e5f .text - 16340 instructions extracted [-]DarkComet_x64/1593b9482d71a52917eb1d3104eba822b93798eb812010afbbb50ab857fcbfab .text - 16340 instructions extracted [-]DarkComet_x64/2022ec23240d89da591ec57db5507c41fb3335253f6e1eb9306598d4a8255ef9 .text - 16340 instructions extracted [+] Golden hash (16340 instructions) - DarkComet_x64/8841880e35b0936d7768e489f386e259970c98489a7ddd00e693797a394b4e39 [+] Zeroing in longest mnemonic instruction set in .text [-] Matches - 0 Block Size - 4000 Time - 2.94 seconds [-] Matches - 0 Block Size - 2000 Time - 2.44 seconds [-] Matches - 0 Block Size - 1000 Time - 1.85 seconds [-] Matches - 333 Block Size - 500 Time - 2.50 seconds [-] Matches - 39 Block Size - 750 Time - 1.86 seconds [-] Matches - 0 Block Size - 875 Time - 1.79 seconds [-] Matches - 0 Block Size - 813 Time - 1.76 seconds [-] Matches - 7 Block Size - 782 Time - 1.78 seconds [-] Matches - 0 Block Size - 797 Time - 1.75 seconds [-] Matches - 0 Block Size - 790 Time - 1.76 seconds [-] Matches - 2 Block Size - 787 Time - 1.75 seconds [-] Matches - 2 Block Size - 787 Time - 1.75 seconds [-] Matches - 1 Block Size - 788 Time - 1.75 seconds [-] Matches - 0 Block Size - 789 Time - 1.74 seconds [-] Moving 1 instruction sets to review with a length of 788 [*] Do you want to display matched instruction set? [Y/N] y add|add|add|add|add|add|add|add|outsb|push|jb|.byte|jae|push|outsd|imul|add|je|outsd|imul|outsb|outsw|jb|.byte|je|outsd|outsb|add|push|push|je|.byte|insb|jne|js|add|.byte|add|jne|insb|push|imul|push|jne|jb|push|.byte|insb|jne|js|add|xchg|add|imul|jb|jbe|insb|push|.byte|insb|jne|add|add|jb|.byte|je|jns|js|add|push|jo|outsb|jns|js|add|push|push|jne|jb|outsb|outsw|jns|add|add|insb|je|push|.byte|insb|jne|add|add|insb|outsd|.byte|.byte|je|outsb|outsb|imul|push|imul|push|imul|jne|je|outsd|imul|jb|jbe|insb|jae|xor|push|insb|outsd|jae|jns|push|push|xor|insb|insb|add|push|add|.byte|insd|jo|add|.byte|add|insb|outsb|add|add|imul|add|imul|mov|je|jne|jb|outsb|je|jb|.byte|jae|add|add|outsd|.byte|.byte|insb|outsd|.byte|imul|add|outsd|jae|add|add|jo|outsb|outsb|jbe|jb|outsb|insd|outsb|je|je|imul|add|je|imul|imul|jns|add|mov|.byte|.byte|insb|insb|insb|outsd|.byte|add|je|jb|jbe|je|push|jb|imul|add|add|je|imul|.byte|jne|jae|add|add|jae|push|.byte|.byte|.byte|jns|add|jbe|je|jns|je|insd|imul|jns|add|add|outsd|.byte|.byte|insb|push|outsb|insb|outsd|.byte|imul|add|je|push|.byte|je|.byte|.byte|insd|add|jb|.byte|je|imul|jns|add|cmp|imul|je|imul|je|.byte|jae|jb|outsd|jb|add|je|jb|.byte|jb|jae|add|add|insd|outsd|jbe|imul|jns|add|insb|add|je|imul|.byte|jne|jae|add|ret|insb|outsd|.byte|.byte|insb|jb|add|xor|imul|add|je|jb|jbe|je|push|jb|imul|imul|add|imul|add|add|.byte|insb|insb|insb|outsd|.byte|add|cmp|push|jb|jbe|je|push|jb|imul|imul|add|je|outsd|jne|imul|add|add|outsb|js|imul|outsd|insd|jo|jb|push|je|imul|add|outsb|add|push|add|outsd|jae|.byte|.byte|outsb|insb|add|outsd|.byte|.byte|insb|jb|add|add|je|imul|js|je|jb|.byte|jae|add|add|.byte|.byte|je|push|imul|insb|push|imul|jb|.byte|je|imul|imul|jne|.byte|add|je|imul|jb|add|jb|push|jae|jne|.byte|add|add|push|jae|jne|.byte|add|or|outsd|jb|imul|push|.byte|je|add|je|jbe|outsb|je|add|add|je|outsd|jne|.byte|.byte|.byte|outsb|insb|push|add|.byte|add|jb|.byte|je|jae|.byte|add|js|push|je|imul|xor|push|jb|je|imul|je|jb|jbe|push|jns|add|push|outsd|insb|jne|outsb|outsw|jb|.byte|je|outsd|outsb|add|add|jb|imul|jb|.byte|add|outsd|push|jae|jne|.byte|add|.byte|add|.byte|.byte|je|jbe|outsb|je|add|out|je|js|je|outsd|push|jb|.byte|jae|add|movsb|add|.byte|.byte|je|push|jb|.byte|jae|add|ret|add|.byte|imul|je|jne|jb|outsb|je|imul|jns|add|mov|je|insd|jo|imul|add|adc|jae|je|jbe|outsb|je|add|outsd|.byte|imul|outsd|jne|.byte|add|jp|je|jns|je|insd|outsb|outsw|add|add|imul|js|add|wait|add|.byte|.byte|je|jne|js|add|.byte|add|je|jne|jb|outsb|je|imul|jns|add|stosd|add|je|jb|imul|stosb|add|je|jb|imul|je|insd|jo|.byte|je|add|mov|jb|.byte|je|push|push|add|add|.byte|insb|imul|outsd|imul|rol|jo|push|xor|insb|insb|add|retf|add|je|jbe|.byte|.byte|.byte|.byte|jo|add|xor|insb|insb|add|insb|je|insd|push|js|add|jae|je|jo|imul|fiadd|outsb|imul|add|.byte|jb|jb|jbe|add|add|imul|jae|js|.byte|add|.byte|jb|js|add|add|.byte|jb|jo|jb|add|adc|jae|.byte|jo|std|add|push|je|imul|je|insb|je|insd|push|js|add|insb|outsd|outsd|js|outsb|imul|jb|insd|add|insb|push|imul|outsd|.byte|add|add|.byte|.byte|insb|push|imul|rol|push|je|imul|js|add|scasb|add|jo|je|push|.byte|add|retf|je|imul|outsb|push|je|add|add|outsd|ja|imul|.byte|add|je|imul|jae|add|add|je|insb|je|insd|add|imul|.byte|jae|add|add|imul|jae|add|adc|jae|.byte|outsd|js|add|add|je|add|push|outsb|jae|.byte|add|cdq|add|je|outsd|jb|jb|jne|push|imul|add|push|.byte|imul|je|jo|.byte|.byte|.byte|push|.byte|je|add|push|outsb|insb|je|insd|jae|.byte|add|push|imul|outsb|push|je|add|push|imul|.byte|je|push|push|push|xor|insb|insb|add|push|add|jae|jo|imul|add [*] Do you want to disassemble the underlying bytes? [Y/N] y 0x1000dbbb: add byte ptr [rax], al | 0000 0x1000dbbd: add byte ptr [rax], al | 0000 0x1000dbbf: add byte ptr [rax], al | 0000 0x1000dbc1: add byte ptr [rax], al | 0000 0x1000dbc3: add byte ptr [rax], al | 0000 0x1000dbc5: add byte ptr [rax], al | 0000 0x1000dbc7: add bh, dh | 00F7 0x1000dbc9: add dword ptr [rdi + 0x70], ecx | 014F70 0x1000dbcc: outsb dx, byte ptr gs:[rsi] | 656E ... 0x1000e3b4: insb byte ptr [rdi], dx | 2E646C 0x1000e3b7: insb byte ptr [rdi], dx | 6C 0x1000e3b8: add byte ptr [rax], al | 0000 0x1000e3ba: push rdx | 52 0x1000e3bb: add ebx, dword ptr [rdi + 0x76] | 035F76 0x1000e3be: jae 0x1000e42e | 736E 0x1000e3c0: jo 0x1000e434 | 7072 0x1000e3c2: imul ebp, dword ptr [rsi + 0x74], 0x71000066 | 696E7466000071 0x1000e3c9: add byte ptr [rdi + 0x5f], bl | 005F5F [*] Do you want to display the raw byte blob? [Y/N] y 00000000000000000000000000F7014F70656E50726F63 ... [*] Do you want to keep this set? [Y/N] y [+] Keeping 1 mnemonic set using 75 % commonality out of 50 hashes [-] Length - 788 Section - .text [+] Printing offsets of type: longest [-] Gold matches ----------v SET rule0 v---------- add|add|add|add|add|add|add|add|outsb|push|jb|.byte|jae|push|outsd|imul|add|je|outsd|imul|outsb|outsw|jb|.byte|je|outsd|outsb|add|push|push|je|.byte|insb|jne|js|add|.byte|add|jne|insb|push|imul|push|jne|jb|push|.byte|insb|jne|js|add|xchg|add|imul|jb|jbe|insb|push|.byte|insb|jne|add|add|jb|.byte|je|jns|js|add|push|jo|outsb|jns|js|add|push|push|jne|jb|outsb|outsw|jns|add|add|insb|je|push|.byte|insb|jne|add|add|insb|outsd|.byte|.byte|je|outsb|outsb|imul|push|imul|push|imul|jne|je|outsd|imul|jb|jbe|insb|jae|xor|push|insb|outsd|jae|jns|push|push|xor|insb|insb|add|push|add|.byte|insd|jo|add|.byte|add|insb|outsb|add|add|imul|add|imul|mov|je|jne|jb|outsb|je|jb|.byte|jae|add|add|outsd|.byte|.byte|insb|outsd|.byte|imul|add|outsd|jae|add|add|jo|outsb|outsb|jbe|jb|outsb|insd|outsb|je|je|imul|add|je|imul|imul|jns|add|mov|.byte|.byte|insb|insb|insb|outsd|.byte|add|je|jb|jbe|je|push|jb|imul|add|add|je|imul|.byte|jne|jae|add|add|jae|push|.byte|.byte|.byte|jns|add|jbe|je|jns|je|insd|imul|jns|add|add|outsd|.byte|.byte|insb|push|outsb|insb|outsd|.byte|imul|add|je|push|.byte|je|.byte|.byte|insd|add|jb|.byte|je|imul|jns|add|cmp|imul|je|imul|je|.byte|jae|jb|outsd|jb|add|je|jb|.byte|jb|jae|add|add|insd|outsd|jbe|imul|jns|add|insb|add|je|imul|.byte|jne|jae|add|ret|insb|outsd|.byte|.byte|insb|jb|add|xor|imul|add|je|jb|jbe|je|push|jb|imul|imul|add|imul|add|add|.byte|insb|insb|insb|outsd|.byte|add|cmp|push|jb|jbe|je|push|jb|imul|imul|add|je|outsd|jne|imul|add|add|outsb|js|imul|outsd|insd|jo|jb|push|je|imul|add|outsb|add|push|add|outsd|jae|.byte|.byte|outsb|insb|add|outsd|.byte|.byte|insb|jb|add|add|je|imul|js|je|jb|.byte|jae|add|add|.byte|.byte|je|push|imul|insb|push|imul|jb|.byte|je|imul|imul|jne|.byte|add|je|imul|jb|add|jb|push|jae|jne|.byte|add|add|push|jae|jne|.byte|add|or|outsd|jb|imul|push|.byte|je|add|je|jbe|outsb|je|add|add|je|outsd|jne|.byte|.byte|.byte|outsb|insb|push|add|.byte|add|jb|.byte|je|jae|.byte|add|js|push|je|imul|xor|push|jb|je|imul|je|jb|jbe|push|jns|add|push|outsd|insb|jne|outsb|outsw|jb|.byte|je|outsd|outsb|add|add|jb|imul|jb|.byte|add|outsd|push|jae|jne|.byte|add|.byte|add|.byte|.byte|je|jbe|outsb|je|add|out|je|js|je|outsd|push|jb|.byte|jae|add|movsb|add|.byte|.byte|je|push|jb|.byte|jae|add|ret|add|.byte|imul|je|jne|jb|outsb|je|imul|jns|add|mov|je|insd|jo|imul|add|adc|jae|je|jbe|outsb|je|add|outsd|.byte|imul|outsd|jne|.byte|add|jp|je|jns|je|insd|outsb|outsw|add|add|imul|js|add|wait|add|.byte|.byte|je|jne|js|add|.byte|add|je|jne|jb|outsb|je|imul|jns|add|stosd|add|je|jb|imul|stosb|add|je|jb|imul|je|insd|jo|.byte|je|add|mov|jb|.byte|je|push|push|add|add|.byte|insb|imul|outsd|imul|rol|jo|push|xor|insb|insb|add|retf|add|je|jbe|.byte|.byte|.byte|.byte|jo|add|xor|insb|insb|add|insb|je|insd|push|js|add|jae|je|jo|imul|fiadd|outsb|imul|add|.byte|jb|jb|jbe|add|add|imul|jae|js|.byte|add|.byte|jb|js|add|add|.byte|jb|jo|jb|add|adc|jae|.byte|jo|std|add|push|je|imul|je|insb|je|insd|push|js|add|insb|outsd|outsd|js|outsb|imul|jb|insd|add|insb|push|imul|outsd|.byte|add|add|.byte|.byte|insb|push|imul|rol|push|je|imul|js|add|scasb|add|jo|je|push|.byte|add|retf|je|imul|outsb|push|je|add|add|outsd|ja|imul|.byte|add|je|imul|jae|add|add|je|insb|je|insd|add|imul|.byte|jae|add|add|imul|jae|add|adc|jae|.byte|outsd|js|add|add|je|add|push|outsb|jae|.byte|add|cdq|add|je|outsd|jb|jb|jne|push|imul|add|push|.byte|imul|je|jo|.byte|.byte|.byte|push|.byte|je|add|push|outsb|insb|je|insd|jae|.byte|add|push|imul|outsb|push|je|add|push|imul|.byte|je|push|push|push|xor|insb|insb|add|push|add|jae|jo|imul|add ----------^ SET rule0 ^----------- DarkComet_x64/8841880e35b0936d7768e489f386e259970c98489a7ddd00e693797a394b4e39 0x1000dbbb - 0x1000e3c9 in .text [-] Remaining matches ----------v SET rule0 v---------- DarkComet_x64/d4e28eebd4f485496590e55e245a22b17d58414ae22f5186da90b0cc8d4d2006 0x1000d707 - 0x1000df19 in .text DarkComet_x64/c8dcafd948f8ae3b60c5d89365ad2fe328ceb74820fc1334e9d4853a3ad4c3e8 0x1000dbbb - 0x1000e3c9 in .text DarkComet_x64/3faaac0cc5a64fa2a14e2d733cd10faf93fcaa05f7a5bf60bbf4540b95c37bf9 0x1000dbbb - 0x1000e3c9 in .text DarkComet_x64/d5134afb4b67af145384a5ca68dac7345e375c4c97a5d1bdd9d0419a20c72c15 0x1000d707 - 0x1000df19 in .text ... DarkComet_x64/4c70caa63cce7fcc4e66d2696ffb1844536b353775b874458549957b19cedbdf 0x1000d707 - 0x1000df19 in .text DarkComet_x64/95216ff79bd77eada3004203ccf8ba3eee2cfbe3800b3227c3fe0b2f12f32362 0x1000dbbb - 0x1000e3c9 in .text DarkComet_x64/24a09b9dadb7471f29e1d2af22219c8fb94a2643bf481228ac60f48f55fa4e5f 0x1000dbbb - 0x1000e3c9 in .text DarkComet_x64/1593b9482d71a52917eb1d3104eba822b93798eb812010afbbb50ab857fcbfab 0x1000dbbb - 0x1000e3c9 in .text DarkComet_x64/2022ec23240d89da591ec57db5507c41fb3335253f6e1eb9306598d4a8255ef9 0x1000dbbb - 0x1000e3c9 in .text ----------^ SET rule0 ^----------- [+] Generating YARA rule for matches off of bytes from gold - DarkComet_x64/8841880e35b0936d7768e489f386e259970c98489a7ddd00e693797a394b4e39 [*] Do you want to try and morph rule0 for accuracy and attempt to make it VT Retro friendly [Y/N] y [+] Check 03 - Dynamically morphing YARA rule0 [*] Dynamic morphing succeeded [*] Do you want to include matched sample names in rule meta? [Y/N] y [*] Do you want to include matched byte sequence in rule comments? [Y/N] y [+] Completed YARA rules /* SAMPLES: DarkComet_x64/84bc7ecfe0a9170892b9a2ebd0fc88a41b33951db3b57b7c51faed55930b3a26 DarkComet_x64/dacf28a75841b59284a14e8ad87b3a9dd93edca75adf2bd651d8de684ab1b53a DarkComet_x64/e1000ad839f519cc74b40013bea2c294b85a02460f860f58665e48ab4919ed90 DarkComet_x64/cb8e7cd1401d263c935add9d31738df96832df039bec31a9209b09fcf90c3c5a ... DarkComet_x64/422fc02a254a3e60336932cfe67121766fcefd7da4c734e8d0660d692b8a439b DarkComet_x64/c69a7d7f88c16345afa4ed7b6b456a286ab64df4bdbeef00e39b212f2884d6ca DarkComet_x64/2a19a370ae166a6e3f184031f9dc9a94d158317fbae50ca682c6e07e0acb537a DarkComet_x64/d003ec22e4d9c86f106c8d5d6e2c8788fdc158c78c1f98b452c872370cb4176e BYTES: 00000000000000000000000000F7014F70656E50726F63 ... INFO: binsequencer.py -c 75 -s -o -a x64 DarkComet_x64/ Match SUCCESS for morphing */ rule rule0 { meta: description = "Autogenerated by Binsequencer v.1.0.4 from DarkComet_x64/8841880e35b0936d7768e489f386e259970c98489a7ddd00e693797a394b4e39" author = "" date = "2018-06-05" strings: $rule0_bytes = { 000000000000000000000000????01????656E5072????6573??546F6B????????????4765????6F6B??????6E666F72????74??6F6E??????5265????6574????6C75??45????0000??01????75??6C5369??????????????67??75??72??56??6C75??45????00009601??????6B??????72??76??6C65????????6C75????????02????6743??????74??4B65????78????????5265674F????6E4B65????78????????5265????75??72??49??666F4B65????000047??????674465??6574??56??6C75??????????????6C6F????74??41??6449??69??????????????5369??????????????65????69??????????????75??74??6F6B??????72??76??6C656765????30??52656743??6F73??4B65????41????41??49????2E64??6C00005405??????????6D70??0000??05????????6C656E??????4B??????????6565??????01????6565????????????????C6????6574??75??72??6E74??72????6573??????02????6F????6C4C????6B????49??????????6F73??????01????70??6E6445??76??72??6E6D656E74??74??69????????????02????74??69????????????69????????????79??0000BB????????????6C41??6C6F????????4765????72??76??74??5072??6669????????????????01????74??69????????????????75??6573??0000????49????42??????????6442????65????76??4765????79??74??6D44??????????????79??????02????6F????6C556E6C6F??6B????6702????74??68??????????74??????6D65??????????72????74??44??????????????79??000038??46??????????????74??69??????????????4765??????73??45????6F72??????????6574??72????4164??????73??000003????656D6F76??44??????????????79??00006C04??6574??69????????????????75??6573??0000C2????6C6F????6C46????65????34??46??????????????65??????4765????72??76??74??5072??6669??????????69????????????03??????644C??????????????000046??????????6C41??6C6F??000039??????????65??72??76??74??5072??6669??????????69????????????02????74??6F6475??6546????????????????000049??????6E644E??????46????????????????6F6D70??72??5374??69????????????05????????656E000052??????6F73??????6E646C65??????4C??????6C46????65????????????6574??46????????????????78??74??72????6573??????????????????74??5469????????????6C65??69????????????72????74??46????????????????69????????????75????65??????????6574??69??????????????6572????????46????65??6573??75????65????43????????64??6573??75????65????08??????????46??72??69????????????6A????74??6704??6574??76??6E74??????02????74??6F6475????????6E646C65??0000??01????72????74??6573????6765??????78??536574??69??????????????34??5772??74??46????????????????74??72??76??5479??65??????????????566F6C75??6549??666F72????74??6F6E??????04??6572??69????????????72????64????????????656F66??6573??75????65????????????????74??45????6E74??0000E6??4765????78??74??6F64????72????6573??0000A4??????????74??5072????6573????????C303??????6446????????????????6574??75??72??6E74??69????????????79??000089??4765????656D70??69??????????????000012????6573??74??76??6E74????????4C????6B??????6F75????65????7A??4765????79??74??6D49??666F????03??????644C??????????????78??00009B??????????74??4D????6578??0000??01????74??75??72??6E74??69????????????79??0000AB02????74??6572??69????????????AA02????74??6572??69????????????4765????656D70????74????????B4??43??????74??5468????????000048??????????6C46????????????????6F46????????????????C0??????6565????4B????4E????????2E64??6C0000CB01????74??6576??????????70????????49????2E64??6C????????????44??6749????6D546578????????????????4465????74??70??69????????????DA??45??6444??????????????????????72??72??76????????????69??????????????73??78????????????72??6578????????????????72??70??6572??000011??4D65??????676542????????FD01??????64??74??69????????????4765????6C6749????6D546578????????????????6C6F6742??78??6E6469????????????72??6D??????????????6C5769????????????6F??????????????????6C65??69????????????D2??536574??69????????????78????????AE????????70??74??68??????????6765??????CA????6574??69????????????6E67??74????????02????6F77??69??????????????02????74??69????????????73??????01????74??6C6749????6D000069????????????73??????????02????656B??????73??6765??????12??4D65??????676542??78??????01????74??????????53656E644D????????6765??????9902????74??6F72??6772??75??64??69????????????02????67????69??????????????74??70????????6A????74????????53656E6444??6749????6D4D65??????6765??????????????5769????????????6E67??74????????????????5769??????????????74??555345??33??2E64??6C00005203????73??70??69?????????????????? } $string_0 = { 21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E } // !This program cannot be run in DOS mode. $string_1 = { 52696368 } // Rich $string_2 = { 2E74657874 } // .text $string_3 = { 602E64617461 } // `.data $string_4 = { 2E7064617461 } // .pdata $string_5 = { 402E72737263 } // @.rsrc $string_6 = { 402E72656C6F63 } // @.reloc $string_7 = { 41445641504933322E646C6C } // ADVAPI32.dll ... $string_1429 = { 5704505805405204104305402E04505804502002002002002002002002002002002002002E04D0550490 } // WEXTRACT.EXE .MUI $string_1430 = { 5007206F06407506307404E06106D0650 } // ProductName $string_1431 = { 5706906E06406F0770730 } // Windows $string_1432 = { 2004906E07406507206E06507402004507807006C06F0720650720 } // Internet Explorer $string_1433 = { 5007206F06407506307405606507207306906F06E0 } // ProductVersion $string_1434 = { 5606107204606906C06504906E06606F0 } // VarFileInfo $string_1435 = { 5407206106E07306C06107406906F06E0 } // Translation condition: all of them }

That about sums it up. Hopefully someone finds it useful but if not, c'est la vie.

The GIT for this script is up. Enjoy.

By Jeff White (karttoon)

Getting phished sucks. Getting phished really sucks when you've spent significant amounts of time analyzing phishing attacks only to end up falling prey to one anyway. It happens, this is my story...

If you're not familiar with who Shane Missler is, he's a 20 year old who recently won a Mega Millions jackpot. A HUGE jackpot, one of the largest they've ever had. Also being in Florida with Shane, he dominated our local news coverage for a short period of time. One thing that kept reoccurring was that he kept saying he wanted to help people and do good with his money. Cool, that's a nice thing to do and I wish him the best of luck and then I moved on to real news.

A few days later though, I saw a Tweet pop-up on my feed from an account claiming to be Shane, created in April of 2016, stating that he wanted to give back to everyone and offered USD $5,000 to the first 50,000 people who retweeted his message. Remembering his repeated messaging of wanting to do good I said "sure, what's the harm in retweeting?" and did so. I figured, if it's a fake account than I'll just unfollow and that'll be the end of it.

Fast forward another couple of days and I see a new Tweet pop-up on my feed, again from Shane. This time he says he's hired a company to put together a website to process the payments for the 50,000 people who met the requirement. I thought, "no way..." and went to the Twitter account. It looked like I remember it looking like when I saw it in the media and I started browsing his tweets.

They were thoughtful and offering positive messages with seemingly a lot of engagements. Huh, "I'll be damned!" I thought, this guy is actually doing it.

Now, in hindsight, besides all of the obvious red flags I even acknowledged and willfully ignored as this phish built-up, the basic math of it all should have been a no-brainer. At USD $5,000, across 50,000 people, you have USD $250M dollars which, again in hindsight, was far more than I knew Shane had since he took the cash payout which was significantly less than what he won.

So, I took the bait. I clicked on the website and it had some fancy JavaScript, a pretty background, and three different "payout" options: "PayPal", "Venmo", or "Check". I was surprised he had a physical cheque listed as an option but it added more credibility, in my mind at the time, that this guy might actually be serious if he's willing to mail it.

Naturally, I opted for PayPal. Now, the information asked for felt off, but as I do a lot of PayPal and they weren't asking for things that weren't already in public domain or easily Googleable, I let the little devil on my left shoulder shut the little angel on the right out - "this is totally going to be legit" as I filled in my name, address, and e-mail with thoughts dancing through my head about getting my entire family to sign-up ASAP!

At this point, it does some "checks" to "verify" your eligibility and, again, I thought "this dude is crazy but hey I'm all about that free money!".

The deeper I went into this rabbit hole, the more I self-convinced myself to ignore all of the glaringly obvious red flags.

Next up, it says it needs to verify you're a human on the totally legit site "areyouahuman[.]co". Sounds reasonable, we definitely don't want to give money away to a bot so let's see what we have to do...

Now, mind you, I was doing this all on my phone while also preoccupied with something else so when the "verify you're a human" page came up and said I needed to install 4 Apps on my phone and let them run for 30 seconds, it made me stop what I was doing and take pause. What the hell kind of verification is this? How does that even work? Is this malware? What are these apps? Are they trying to generate money to cover some of the costs of sending out USD $5,000 to every person? This last question was, if you haven't already figured it out, somewhat true. The apps were all legitimate, Google validated, and very popular games on the Google Play store. Regardless, I trudged on due to greed, played a game of Solitaire, and pondered why I was letting myself be fooled by such an obvious fraud.

I decided to skip ahead, dread beginning to rise, to the Amazon voucher. It was a Amazon survey for $1,000 which was the final straw, as there is no way it's tied to human verification. I decided to go back to Twitter and confirm my fears; almost every subtweet to the original was along the lines of "THIS IS FAKE!!!". Red faced, annoyed, had, I decided to figure out just exactly what I got myself into it.

First up, I confirmed it didn't matter what options I picked, what bullshit I filled in, I would "verify" and get sent over to the "areyouahuman[.]co" site. On the PC, the entries were of course different so they are doing some device/source detection and redirecting based on that. I don't think this site is necessarily related to the other, but you can clearly see the affiliate ID at the top.

Going through the source code on the page, it luckily appears to just be a scheme to generate revenue and using the affiliate ID for tracking. The person behind the ID would get cash for each successful app install, links clicked, and surveys taken thus making me a pawn in their game. The links all followed a similar pattern of using this site "jump[.]ogtrk[.]net" preceeded by the affiliate ID and whatever the AD is, as shown below:

<a href="https://jump[.]ogtrk[.]net/aff_c?offer_id=12646&aff_id=9480&aff_sub=e4484e0556d6808de3acde38d3a1925f&aff_sub2=0vhEVTB6vnEGatmzW%2Fui5lHzWFPiC5VSPb13HcDHnP63DDN5M9CkPHxxshjLpbVIb4ED5sdZ9sQ88M7imgoUew%3D%3D&aff_sub3=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJpYXQiOjE1MTY1NjU4ODYsImp0aSI6InlPc21WZ1hRbU9iNmNLVXEwcGRXN1R2OGFoVDFRbUlGUjEzWndFWXZTUFk9IiwiaXNzIjoib2dhZHMiLCJuYmYiOjE1MTY1NjU4ODYsImV4cCI6MTUxOTE1Nzg4NiwiZGF0YSI6eyJpcCI6Ijk2LjU4LjIyOC4xMDYiLCJyZWYiOiJhSFIwY0RvdkwzTm9ZVzVsYldsemMyeGxjbVp2ZFc1a1lYUnBiMjR1WTI5dEx3PT0iLCJ1YSI6Ik1vemlsbGFcLzUuMCAoTWFjaW50b3NoOyBJbnRlbCBNYWMgT1MgWCAxMF8xM18yKSBBcHBsZVdlYktpdFwvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lXC82My4wLjMyMzkuMTMyIFNhZmFyaVwvNTM3LjM2In19.F7XTVXlMvwcbLMAeBULRcmX1JAuwIl55HDSagOoE64eZTQZRe2pmMsMYT2QWiha1IyYoHd3TAMiyyPXW8ozWsw&aff_sub4=&aff_sub5=" target="_blank"><li><span class="lc-checks__feature ">Win A $1000 Amazon Voucher</span></li></a> <a href="https://jump[.]ogtrk[.]net/aff_c?offer_id=12306&aff_id=9480&aff_sub=e4484e0556d6808de3acde38d3a1925f&aff_sub2=0vhEVTB6vnEGatmzW%2Fui5lHzWFPiC5VSPb13HcDHnP63DDN5M9CkPHxxshjLpbVIb4ED5sdZ9sQ88M7imgoUew%3D%3D&aff_sub3=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJpYXQiOjE1MTY1NjU4ODYsImp0aSI6InlPc21WZ1hRbU9iNmNLVXEwcGRXN1R2OGFoVDFRbUlGUjEzWndFWXZTUFk9IiwiaXNzIjoib2dhZHMiLCJuYmYiOjE1MTY1NjU4ODYsImV4cCI6MTUxOTE1Nzg4NiwiZGF0YSI6eyJpcCI6Ijk2LjU4LjIyOC4xMDYiLCJyZWYiOiJhSFIwY0RvdkwzTm9ZVzVsYldsemMyeGxjbVp2ZFc1a1lYUnBiMjR1WTI5dEx3PT0iLCJ1YSI6Ik1vemlsbGFcLzUuMCAoTWFjaW50b3NoOyBJbnRlbCBNYWMgT1MgWCAxMF8xM18yKSBBcHBsZVdlYktpdFwvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lXC82My4wLjMyMzkuMTMyIFNhZmFyaVwvNTM3LjM2In19.F7XTVXlMvwcbLMAeBULRcmX1JAuwIl55HDSagOoE64eZTQZRe2pmMsMYT2QWiha1IyYoHd3TAMiyyPXW8ozWsw&aff_sub4=&aff_sub5=" target="_blank"><li><span class="lc-checks__feature ">What Is Your Favorite Starbucks Coffee?</span></li></a> <a href="https://jump[.]ogtrk[.]net/aff_c?offer_id=11916&aff_id=9480&aff_sub=e4484e0556d6808de3acde38d3a1925f&aff_sub2=0vhEVTB6vnEGatmzW%2Fui5lHzWFPiC5VSPb13HcDHnP63DDN5M9CkPHxxshjLpbVIb4ED5sdZ9sQ88M7imgoUew%3D%3D&aff_sub3=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJpYXQiOjE1MTY1NjU4ODYsImp0aSI6InlPc21WZ1hRbU9iNmNLVXEwcGRXN1R2OGFoVDFRbUlGUjEzWndFWXZTUFk9IiwiaXNzIjoib2dhZHMiLCJuYmYiOjE1MTY1NjU4ODYsImV4cCI6MTUxOTE1Nzg4NiwiZGF0YSI6eyJpcCI6Ijk2LjU4LjIyOC4xMDYiLCJyZWYiOiJhSFIwY0RvdkwzTm9ZVzVsYldsemMyeGxjbVp2ZFc1a1lYUnBiMjR1WTI5dEx3PT0iLCJ1YSI6Ik1vemlsbGFcLzUuMCAoTWFjaW50b3NoOyBJbnRlbCBNYWMgT1MgWCAxMF8xM18yKSBBcHBsZVdlYktpdFwvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lXC82My4wLjMyMzkuMTMyIFNhZmFyaVwvNTM3LjM2In19.F7XTVXlMvwcbLMAeBULRcmX1JAuwIl55HDSagOoE64eZTQZRe2pmMsMYT2QWiha1IyYoHd3TAMiyyPXW8ozWsw&aff_sub4=&aff_sub5=" target="_blank"><li><span class="lc-checks__feature ">Win Samsung S8</span></li></a>

The next logical question then was, "who the hell is the man behind the curtain?".

A quick WHOIS didn't provide any useful information. The domain was created fairly recently, which would make sense, but otherwise it had the usual GoDaddy abuse information; however, there was a Registrant Name which would be useful.

Domain Name: SHANEMISSLERFOUNDATION[.]COM Registry Domain ID: 2216035150_DOMAIN_COM-VRSN Registrar WHOIS Server: whois.godaddy.com Registrar URL: http://www.godaddy.com Updated Date: 2018-01-21T01:17:36Z Creation Date: 2018-01-21T01:17:35Z Registry Expiry Date: 2019-01-21T01:17:35Z Registrar: GoDaddy.com, LLC Registrar IANA ID: 146 Registrar Abuse Contact Email: abuse@godaddy.com Registrar Abuse Contact Phone: 480-624-2505 Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited Domain Status: clientRenewProhibited https://icann.org/epp#clientRenewProhibited Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited Name Server: NS03.DOMAINCONTROL.COM Name Server: NS04.DOMAINCONTROL.COM DNSSEC: unsigned URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/ >>> Last update of whois database: 2018-01-21T20:27:59Z <<< Domain Name: shanemisslerfoundation[.]com Registrar URL: http://www.godaddy.com Registrant Name: Sean Courtney Registrant Organization: Name Server: NS03.DOMAINCONTROL.COM Name Server: NS04.DOMAINCONTROL.COM DNSSEC: unsigned

Sean Courtney isn't a lot to go on. Looking in PassiveTotal there are 106 recorded Registrants that share this name. A lot of them seem unrelated so name alone might not be strong enough to go off. Looking at the resolutions for the domain show two IP addresses, both registered to GoDaddy. This implies that on the date it was registered they changed the IP address, which is always an interesting piece of data to pivot on.

Cross-correlating the two IP addresses with every other domain that shares the Registrant "Sean Courtney" showed a number of domains that overlapped.

107.180.11.206: "shanemisslerfoundation[.]com" "activatedcharcoal[.]life" "clashwrap[.]com" "clashup[.]org" 160.153.72.6: "shanemisslerfoundation[.]com" "seancourtney[.]org"

Now, these are shared GoDaddy IP addresses, so each IP has a significant amount of domains attached to them (500 and 1K, respectively) but it's building a stronger relationship. Additionally, the other domains also have another overlap which makes things more interesting. Specifically, they share an e-mail address used during registration.

seanstuhh7@gmail[.]com

Obviously, the one that immediately stands out in the domains is "seancourtney[.]org". Looking at this sites WHOIS information reveals a bit more. Relevant bits follow:

Domain Name: SEANCOURTNEY[.]ORG Registrar WHOIS Server: whois.godaddy.com Updated Date: 2017-10-23T03:46:48Z Creation Date: 2017-08-24T01:27:16Z Registrar: GoDaddy.com, LLC Registrar IANA ID: 146 Registrar Abuse Contact Email: abuse@godaddy.com Registrar Abuse Contact Phone: +1.4806242505 Registrant Name: Sean Courtney Registrant Street: 3105 N. Oakwood Ave Registrant Street: #121 Registrant City: Muncie Registrant State/Province: Indiana Registrant Postal Code: 47304 Registrant Country: US Registrant Phone: +1.3174400050 Registrant Email: seanstuhh7@gmail[.]com

Before diving in further on that, I want to take a second to talk about the e-mail address. If you Google it, you receive back a handful of other domains they've registered in the past all related to video game cheating. Similarly, using PassiveTotal to look at the historical domain registrations from this e-mail, a picture begins to emerge around content and theme.

seancourtney[.]org activatedcharcoal[.]life cpamethods[.]biz offtrackworld[.]us clashwrap[.]com clashup[.]org clashcomp[.]com clashway[.]org clashwork[.]com pokemaps[.]org clashpro[.]net miitomomines[.]org miitomomines[.]info clashpro[.]org overwatchlooter[.]org overwatchlooter[.]info accesspokemon[.]xyz accesspokemon[.]info thefuturedrake[.]com supercellko[.]com cheatcove[.]com robuxhack[.]info boombeachelite[.]com freexbltrials[.]com twitchshop[.]biz allthingsprofitable[.]net iotaexchange[.]us

Based on these and the few Google hits, it seems this individual tries to profit off "hacks" for very popular games, such as PokémonGo, Clash of Clans, and OverWatch. I also suspect that not one of these serve actual hacks or cheats but simply are used as a lure for desperate people. Again, if you play off peoples desire to win (or get free money) then you can more easily entice them into clicking your affiliate links and thus make money.

I have an e-mail, a name, and an address so lets see what else is available online.

Of course, almost immediately, I stumble onto the individuals LinkedIn page.

Sean Courtney, Advertiser, located in Muncie, Indiana.

He's also worked other Advertiser jobs in the past, but his most recent experience as an "Affiliate Advertiser" seems to line up perfectly with what we've uncovered so far. You'll also note the name of his currently employer, "OGAds". My gosh, that sounds awfully familiar...

You'll recall that when I looked at the source code of the page the surveys and apps were being funneled through, the domain was "ogtrk[.]net" - I wonder what the chances are those two are related?

Well, pretty fucking likely apparently.

The final icing on the cake is an all too now familiar app-install offer that OGAds displays proudly on their site.

That about wraps things up here. I'm sure they made a good chunk of change off everyone and it was a good lure (for me) so hats off to Sean and, most likely, OGAds. You're all a bunch of twats.

By Jeff White (karttoon)

*EDIT - 27AUG2018*
I converted this to a dedicated Cuckoo module and it far exceeds the capabilities here. Branch can be found here for Cuckoo and general code on GitHub here.

In this post I want to give a brief introduction to a new tool I'm working on called "Curtain". It will be complimentary to another post I'm working on for $dayjob where I created a Curtain Cuckoo module.

Curtain is basically just a small script that can be used to detonate malicious files or PowerShell scripts and then scrape out the ScriptBlock logs for fast analysis. I'll go into the concept behind it more in my other post.

The idea here then is to get that same functionality but without Cuckoo, as it's not always available to everyone or people may not want to much with installing custom modules, and I wanted something standalone.

That being said, there are still a few requirements for this alternate iteration which, admittedly, was my first version before I decided to try and streamline it more for work usage.

The usage of the script is fairly straight forward and you simply pass it either a PS1 script OR a file which it will try to execute and then report back the ScriptBlock event logs. For PS1 scripts, it will launch PowerShell, otherwise it will rely natively on the extension and the OS the recognize/execute it. It'll wait 10 seconds and then simply scrape the logs, parse them into a simple HTML file, and display it on the host.

Below is a simple example of the script in action...

$ ./curtain.sh test.ps1 [+] Reverting to snapshot - curtain [+] Starting headless VM 2017-11-09T10:51:39.463| ServiceImpl_Opener: PID 26072 [+] Copying curtain.ps1 to virtual Guest [+] Sending target file to detonate [+] Launching Curtain PS script for - test.ps1 [!] Sleeping for 10 seconds to let malware doing its thang... [+] Transferring output from script [+] Grabbing a screenshot of the desktop [+] Killing virtual Guest [+] Launching site...

If you click HERE you can see the resulting output that gets created.

Nothing too crazy or fancy but it allows you to see how things flowed and have some visibility into the deobfuscated PowerShell which was executed on the Guest machine. I haven't yet beautified it so it's pretty raw at the moment but figured I'd put it out there if there was a need for it.

The GIT for this script is up and you'll need to fill out a few variables in the "curtain.sh" script to make it work.

Hopefully this helps if you do not have Cuckoo available or just want a quick way to be able to parse out PowerShell execution activity from a code sample.

There is also a file "psorder.py" I've included in the GIT repo. At some point I was writing a manual PowerShell deobfuscator and, while that is a fruitless endeavor, there are some benefits using it in tandem with Curtain for token replacement/etc.




Older posts...