/* CAT(1) */

In this blog post I want to share a Python script that I wrote a few years back which generates YARA rules by using a byte->opcode abstraction method to match similar parts of code across files. At a high level, it's pretty straight-forward. It takes a cluster of files (ideally PE files) and attempts to find the longest common sequence (LCS) of x86/x64 opcodes with the idea that once you abstract back to just opcodes, you can find similar functions in other files and possibly unknown malware. There are a few other matching techniques employed that I'll dive into later but that is the gist. Hopefully the script can compliment hunters and analysts in searching for similar malware samples through automation. There are a lot of other similar tools already out there doing this, but it never hurts to have another option in the proverbial toolbox.

This script was written in Python and heavily utilizes Capstone and YARA Python modules. What initially started as a small script for CTF challenges, and an excuse to play with Capstone <3, has matured over time into more of an operational tool to leverage VirusTotal's Retrohunt capability. I've had a few wins with it but I can't say it's ever been terribly useful in my current capacity and thus it was more of an exercise in programming logic and interacting with files at a low-level.

The reality of this approach is that due to different versions of compilers, different compile time flags, architectures, variable values, etc, all can cause similar code in logic and structures that, at a byte level, is quite different. This is the gap that BinSequencer is trying to bridge.

Overview

The core matching technique in BinSequencer attempts to first determine where executable code may reside within a PE by looking at whether the "code" or "executable" bit is set within each section of the Windows PE file. Once it's identified the potential locations for code, it will disassemble the bytes into assembly instructions and strip out the opcode mnemonics. These opcode mnemonics will be strung together as a sequence in which the script attempts to find the longest match that exists across the sample set and then reverses the process of disassembling it back into bytes for YARA rules.

Below is an image I cobbled together to try and illustrate this particular process.

One of the pro's of this method is that it makes logic matching easier. The mnemonic strings are essentially byte-agnostic, meaning that going from byte to opcode is easy. Looking at a sequence of just mnemonic opcodes is also much faster than byte-by-byte comparison so it helps when it comes to scaling this across large sample sets as well. Unfortunately, we can't have a pro without an equal con. Converting from mnemonic opcode back into a byte can be extremely problematic and this additional processing can slow things down significantly. You end up having to test various byte-variations and variable lengths of the operands used by the instructions to properly generate an accurate YARA rule, not to mention the problems with following wrong branches when building up the YARA rules, but I'll touch on this more later.

Assembly Primer

Before getting into the actual script, it behooves me to attempt and give a mini-x86 assembly overview in order to help illustrate some of the aforementioned pros and cons. The x86 instruction layout is below.

[prefix][opcode][mod-r/m][simb][operands]

When we go from low (bytes) to high (opcode) we will convert a byte like 0x75 to the opcode "JNZ" (jump if not zero). The bytes 0xF85 also use this same "JNZ" opcode so, regardless of the underlying bytes (0x75 vs 0xF85), we end up with the same opcode "JNZ" and the logic of the code remains intact. This is the major benefit of this abstraction technique.

Most instructions are fairly straight forward, "PUSHAD" will always be 0x60 and "CALL" will always be 0xE8 or 0xFF; however, the variations in the underlying bytes dictate things like operand length and change the overall size of the bytes in use.

; e80ef2ffff call 0xfffff213 ; ff1548304000 call dword ptr [0x403048]

One of the above "CALL" instructions is 5 bytes in total while the other is 6 bytes. This is one area of variation in which it begins to complicate things when we end up disassembling "CALL" back into bytes for matching in a YARA rule. There are also built-in optimizations that x86 uses, such as 0x5, which is "ADD EAX, ???" where a value is added to the EAX register. These optimizations create additional variations in potential bytes because the initial bytes change a lot. The "XOR" opcode is a good example of one with baked in optimizations creating a lot of variation which must be accounted for within our YARA rule.

0x30 XOR r/m8 r8 0x31 XOR r/m16/32 r16/32 0x32 XOR r8 r/m8 0x33 XOR r16/32 r/m16/32 0x34 XOR AL imm8 0x35 XOR EAX imm16/32 0x80 XOR r/m8 imm8 0x81 XOR r/m16/32 imm16/32 0x82 XOR r/m8 imm8 0x83 XOR r/m16/32 imm8

This is 10 different byte-representations of the same opcode, so going from byte to opcode is easy, but reversing it can start to get tricky. As if that wasn't bad enough, it gets complicated further by things like the mod-r/m byte which changes the actual opcode meaning. The general 8-bit breakdown for the mod-r/m byte follows:

MOD (7,6) | Reg/Opcode (5,4,3) | R/M (2,1,0)

Using the 0x80 byte from the previous "XOR" example, take a look at the following two instructions.

xor byte ptr [0x418e58] = 8035 588e410037 sub byte ptr [0x418e58] = 802D 588e410037

The mod-r/m byte for the first instruction is 0x35 and 0x2D for the second. The 3 bits that dictate the opcode are 5,4, and 3. These correspond to a table of potential opcodess for that particular byte, so we can't just assume 0x80 is "XOR" when we do our conversion. For the 0x35 mod-r/m byte, the bits in question are 110, which equals 0x6 - the bits for 0x2D are 101, which equals 0x5. The table for that specific 0x80 byte is below.

0x0 = ADD 0x1 = OR 0x2 = ADC 0x3 = SBB 0x4 = AND 0x5 = SUB 0x6 = XOR 0x7 = CMP

I've probably re-written this script 3 times over the past 2-3 years tackling problems as they've developed from these types of variations that would pop-up. The "MOV" opcode has over 20 different variations by itself so sometimes things got quite messy!

BinSequencer

Alright, so back to the script. The Python "pefile" module is used for the extraction of code from the files, assuming it is a PE, and then the Capstone module is used for the actual disassembly engine. For each file, it will convert the extracted area of data and pull each instruction out, followed by each opcode mnemonic, which get strung into a "blob" which is used as the base for matching.

There are various ways to tune the functionality of this script from the command line which can be seen below.

usage: binsequencer.py [-h] [-c <integer_percent>] [-m <integer>] [-l <integer>] [-v] [-a {x86,x64}] [-g <file>] [-d] [-Q] [-n] [-o] [-s] ... Sequence a set of binaries to identify commonalities in code structure. positional arguments: file optional arguments: -h, --help show this help message and exit -c <integer_percent>, --commonality <integer_percent> Commonality percentage the sets criteria for matches, default 100 -m <integer>, --matches <integer> Set the minimum number of matches to find, default 1 -l <integer>, --length <integer> Set the minimum length of the instruction set, default 25 -v, --verbose Prints data while processing, use only for debugging -a {x86,x64}, --arch {x86,x64} Select code architecture of samples, default x86 -g <file>, --gold <file> Override gold selection -d, --default Accept default prompt values -Q, --quiet Disable output except for YARA rule -n, --nonpe Process non-PE files (eg PCAP/JAR/PDF/DOC) -o, --opcode Use only the opcode matching technique -s, --strings Include strings in YARA for matched hashes

The `-c` flag lets you specify the percentage of samples the sequence needs to exist in and the `-l` flag sets the minimum length of linear opcodes required. I've found 25 to be around the sweet spot for bare minimum accuracy as it usually covers a couple of functional blocks but, in general, the higher the better. When you go too low, it leads to non-unique common sequences of instructions, such as with prologue and epilogues or basic code re-use. The "blobs" will look like "jnz|jmp|add|mov|xor|call" with hundreds to a few hundred thousand opcodes.

Since we're doing comparisons, the script tries to identify the best initial file to use for analysis as the "gold" file. If it's doing a 100% match, it will search for the file with the lowest volume of instructions since it has to exist in every other file; it does the reverse if it dips below 100%. As an aside, it's also worth noting a limitation of YARA at this point. YARA has a hard-coded limit of 10,000 hex bytes that can used for a rule so I artificially limit the amount of opcodes that can be used to 4,000 as a safeguard.

To do the actual comparison, it uses a simple sliding window technique between the low match limit (25) and 4,000 opcodes, starting at the highest size and subtracting one opcode after each iteration until it empties and moves the window up by one offset. From there, it utilizes a number of tricks to speed things up, like black listing known bad sequences, so that it can zero in on the optimal matching length and reduce the number of iterations required.

Once it has a match of sequenced opcodes, it moves into the actual YARA generation which is where things get fun. One nice feature of YARA is the ability to do a boolean OR in the hex match. A "JMP" opcode, which can be the 0xE8 or 0xFF byte, would be "(E8|FF)". You can also do wildcards in YARA so "PUSH", which is 0x50 through 0x57 and 0x6A, can be represented as "(5?|6A)". Finally, you can account for overall instruction length with a byte jump in YARA like "[4-5]", which skips between 4 to 5 bytes before the next match.

As I intended to use this on primarily on VirusTotal, I began running into a few undocumented limitations VirusTotal imposed on the rules that can be used for retrohunting. If you have too many jumps, boolean OR's, and even in some scenarios just the length of the jump could cause your rule to fail. I suspect these limitations are primarily for efficiency/performance management but it's something that I had to account for throughout the script - your mileage may vary and I have not kept up with any changes.

Output

It's probably easier to just show some examples of the expected output to highlight what the script does. I've truncated parts of it and will interject commentary between the sections to explain each one. For this first example, I'll take a look at my favorite lame Malware - Hancitor!

python binsequencer.py Hancitor_Malware/ [+] Extracting instructions and generating sets [-]Hancitor_Malware/e7b3ef04c211fafa36772da62ab1d250970d27745182d0f3736896cf7673dc3a_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted [-]Hancitor_Malware/6e73879ca49b40974cce575626e31541b49c07daa12ec2e9765c432bfac07a20_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted [-]Hancitor_Malware/2b3c920dca2fd71ecadd0ae500b2be354d138841de649c89bacb9dee81e89fd4_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted ... [-]Hancitor_Malware/ab90ed6cb461f17ce1f901097a045aba7c984898a0425767f01454689698f2e9_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted [-]Hancitor_Malware/594ab467454aafa64fc6bbf2b4aa92f7628d5861560eee1155805bd0987dbac3_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted [-]Hancitor_Malware/643951eee2dac8c3677f5ef7e9cb07444f12d165f6e401c1cd7afa27d7552367_S1.exe .text - 3740 instructions extracted .edata - 477 instructions extracted

The script takes a path to where the files reside and will iterate over each one to extract instructions. If you specify the `-n` flag, it will treat the files as non-PE, this was done for analyzing files of raw shellcode but the opcode technique holds up over other files as well (PCAP/JAR/whatever), but I wouldn't recommend it and you may run into bugs.

[+] Golden hash (3736 instructions) - Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe

Since I ran the script with it's default settings, it attempts a 100% match across all samples and thus finds the sample with the lowest instruction count since any match *must* exist in this file.

[+] Zeroing in longest mnemonic instruction set in .text [-] Matches - 0 Block Size - 3259 Time - 0.00 seconds [-] Matches - 0 Block Size - 1630 Time - 0.13 seconds [-] Matches - 0 Block Size - 816 Time - 0.13 seconds [-] Matches - 0 Block Size - 409 Time - 0.14 seconds [-] Matches - 0 Block Size - 206 Time - 0.12 seconds [-] Matches - 0 Block Size - 105 Time - 0.10 seconds [-] Matches - 0 Block Size - 55 Time - 0.08 seconds [-] Matches - 0 Block Size - 30 Time - 0.07 seconds [+] Zeroing in longest mnemonic instruction set in .edata [-] Moving 1 instruction sets to review with a length of 477

The script should iterate over each section in the golden hash that it identified as potentially having code and check these against the other samples. In this case, it does 8 iterations of the sliding window for sizes above the minimum of 25 and finds no matches in ".text". Then it moves on to the ".edata" section, which has 477 instructions, and finds that this match exists in all of the samples.

[*] Do you want to display matched instruction set? [Y/N] y push|mov|sub|push|push|push|call|mov|add|test|je|push|mov|lea|add|push|push|push|call|cmp|jne|test|je|mov|xor|test|je|xor|inc|cmp|jb|pop|mov|pop|mov|pop|ret|pop|xor|pop|mov|pop|ret|mov|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|push|mov|push|push|call|dec|add|neg|sbb|inc|pop|ret|int3|int3|int3|int3|int3|push|mov|sub|mov|push|push|push|mov|xor|add|movzx|lea|movzx|add|mov|mov|test|je|mov|mov|mov|mov|movzx|or|sub|mov|mov|mov|mov|movzx|or|sub|jne|mov|test|je|mov|inc|movzx|or|movzx|or|sub|je|mov|mov|test|js|jg|je|inc|add|add|mov|cmp|jae|mov|jmp|mov|lea|pop|pop|pop|lea|mov|pop|ret|pop|pop|xor|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|mov|mov|mov|and|sub|xor|sub|cmp|jne|cmp|je|inc|cmp|jl|xor|pop|ret|int3|int3|int3|int3|push|mov|sub|push|mov|xor|push|push|xor|mov|mov|add|mov|mov|add|mov|add|mov|add|mov|mov|mov|mov|mov|mov|test|je|movsx|mov|cmp|jae|cmp|jae|mov|mov|mov|mov|movzx|add|mov|or|movzx|or|sub|jne|sub|test|je|mov|inc|movzx|or|movzx|or|sub|je|test|js|jg|je|mov|inc|cmp|jae|mov|mov|jmp|mov|mov|mov|add|pop|pop|mov|pop|mov|pop|ret|pop|xor|pop|mov|pop|mov|pop|ret|pop|pop|xor|pop|mov|pop|ret|int3|int3|int3|push|xor|mov|js|mov|mov|lodsd|mov|jmp|mov|lea|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|mov|sub|mov|push|push|push|mov|add|call|mov|push|push|call|mov|push|push|mov|call|push|push|mov|call|mov|add|mov|test|jne|test|je|test|je|mov|mov|add|mov|cmp|je|mov|mov|add|add|cmp|je|mov|mov|add|add|mov|test|je|push|call|jmp|test|je|push|push|push|call|mov|test|je|mov|test|movzx|js|lea|push|push|call|cmp|je|mov|mov|add|mov|add|mov|cmp|jne|mov|add|mov|cmp|jne|pop|pop|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|push|call|mov|push|call|push|call|add|pop|test|jne|ret|call|int3|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add

Again, designing this with the intent of being used during analysis, it prompts the user multiple times to display the results in various ways. The above lets you see the opcode sequence and gives a quick impression of whether or not the code construct seems to make logical sense. For example, you can see near the beginning 3 "PUSH" and then a "CALL", which is common behavior for pushing values to the stack before calling a function. This tells me what we're looking at is most likely code as opposed to data that just happened to be intermingled within a code section and disassembled.

As was stated before, it doesn't have to be code to work but the abstraction method will be less useful if it's not.

[*] Do you want to disassemble the underlying bytes? [Y/N] y 0x10003000: push ebp | 55 0x10003001: mov ebp, esp | 8BEC 0x10003003: sub esp, 0x1c | 83EC1C 0x10003006: push edi | 57 0x10003007: push dword ptr [ebp + 0xc] | FF750C 0x1000300a: push dword ptr [ebp + 8] | FF7508 0x1000300d: call 0x10003090 | E87E000000 0x10003012: mov edi, eax | 8BF8 0x10003014: add esp, 8 | 83C408 0x10003017: test edi, edi | 85FF ... 0x10003360: push esi | 56 0x10003361: push 0x403360 | 6860334000 0x10003366: call 0x10003140 | E8D5FDFFFF 0x1000336b: mov esi, eax | 8BF0 0x1000336d: push esi | 56 0x1000336e: call 0x10003270 | E8FDFEFFFF 0x10003373: push esi | 56 0x10003374: call 0x10003070 | E8F7FCFFFF 0x10003379: add esp, 0xc | 83C40C 0x1000337c: pop esi | 5E 0x1000337d: test eax, eax | 85C0 0x1000337f: jne 0x10003382 | 7501 0x10003381: ret | C3 0x10003382: call 0x10002010 | E889ECFFFF 0x10003387: int3 | CC ...

Similar to the previous display but with more details so you can see exactly what the assembly looks like.

[*] Do you want to display the raw byte blob? [Y/N] y 558BEC83EC1C57FF750CFF7508E87E0000008BF883C40885FF744456 ... [*] Do you want to keep this set? [Y/N] y [+] Keeping 1 mnemonic set using 100 % commonality out of 29 hashes [-] Length - 477 Section - .edata

If you did not want to keep the match, possibly due to the code being part of a known library, not being unique, the match being just data and not an actual code construct you're after, then it's possible to decline it and restart the matching process.

[+] Printing offsets of type: longest [-] Gold matches ----------v SET rule0 v---------- push|mov|sub|push|push|push|call|mov|add|test|je|push|mov|lea|add|push|push|push|call|cmp|jne|test|je|mov|xor|test|je|xor|inc|cmp|jb|pop|mov|pop|mov|pop|ret|pop|xor|pop|mov|pop|ret|mov|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|push|mov|push|push|call|dec|add|neg|sbb|inc|pop|ret|int3|int3|int3|int3|int3|push|mov|sub|mov|push|push|push|mov|xor|add|movzx|lea|movzx|add|mov|mov|test|je|mov|mov|mov|mov|movzx|or|sub|mov|mov|mov|mov|movzx|or|sub|jne|mov|test|je|mov|inc|movzx|or|movzx|or|sub|je|mov|mov|test|js|jg|je|inc|add|add|mov|cmp|jae|mov|jmp|mov|lea|pop|pop|pop|lea|mov|pop|ret|pop|pop|xor|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|mov|mov|mov|and|sub|xor|sub|cmp|jne|cmp|je|inc|cmp|jl|xor|pop|ret|int3|int3|int3|int3|push|mov|sub|push|mov|xor|push|push|xor|mov|mov|add|mov|mov|add|mov|add|mov|add|mov|mov|mov|mov|mov|mov|test|je|movsx|mov|cmp|jae|cmp|jae|mov|mov|mov|mov|movzx|add|mov|or|movzx|or|sub|jne|sub|test|je|mov|inc|movzx|or|movzx|or|sub|je|test|js|jg|je|mov|inc|cmp|jae|mov|mov|jmp|mov|mov|mov|add|pop|pop|mov|pop|mov|pop|ret|pop|xor|pop|mov|pop|mov|pop|ret|pop|pop|xor|pop|mov|pop|ret|int3|int3|int3|push|xor|mov|js|mov|mov|lodsd|mov|jmp|mov|lea|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|mov|sub|mov|push|push|push|mov|add|call|mov|push|push|call|mov|push|push|mov|call|push|push|mov|call|mov|add|mov|test|jne|test|je|test|je|mov|mov|add|mov|cmp|je|mov|mov|add|add|cmp|je|mov|mov|add|add|mov|test|je|push|call|jmp|test|je|push|push|push|call|mov|test|je|mov|test|movzx|js|lea|push|push|call|cmp|je|mov|mov|add|mov|add|mov|cmp|jne|mov|add|mov|cmp|jne|pop|pop|pop|mov|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|push|call|mov|push|call|push|call|add|pop|test|jne|ret|call|int3|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add|add ----------^ SET rule0 ^----------- Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe 0x10003000 - 0x100033fe in .edata [-] Remaining matches ----------v SET rule0 v---------- Hancitor_Malware/e7b3ef04c211fafa36772da62ab1d250970d27745182d0f3736896cf7673dc3a_S1.exe 0x10003000 - 0x100033fe in .edata Hancitor_Malware/6e73879ca49b40974cce575626e31541b49c07daa12ec2e9765c432bfac07a20_S1.exe 0x10003000 - 0x100033fe in .edata Hancitor_Malware/2b3c920dca2fd71ecadd0ae500b2be354d138841de649c89bacb9dee81e89fd4_S1.exe 0x10003000 - 0x100033fe in .edata ... Hancitor_Malware/ab90ed6cb461f17ce1f901097a045aba7c984898a0425767f01454689698f2e9_S1.exe 0x10003000 - 0x100033fe in .edata Hancitor_Malware/594ab467454aafa64fc6bbf2b4aa92f7628d5861560eee1155805bd0987dbac3_S1.exe 0x10003000 - 0x100033fe in .edata Hancitor_Malware/643951eee2dac8c3677f5ef7e9cb07444f12d165f6e401c1cd7afa27d7552367_S1.exe 0x10003000 - 0x100033fe in .edata ----------^ SET rule0 ^-----------

Once it has the match, it will show the offset within each file so that you can look at it further if needed. In this instance, you can see all of the matches happened in the ".edata" section at offset 0x10003000 through 0x100033FE. A quick look in IDA looks promising.

You can see the matched sequence extends across a number of function blocks and the code looks interesting with multiple instances of Win32 API calls commonly found in malware.

[+] Generating YARA rule for matches off of bytes from gold - Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe [*] Do you want to try and morph rule0 for accuracy and attempt to make it VT Retro friendly [Y/N] y [+] Check 01 - Checking for exact byte match

Before it actually does the YARA generation, the script can perform up to 3 various matching techniques in order to improve performance and accuracy. Each one builds upon the last and I'm going to spend a second detailing the three methods as they aren't all on display here.

You can skip the first two checks with the `-o` flag (although the third check will still utilize their techniques during morphing to some degree).

[+] Check 02 - Checking for optimal opcode match [*] Found optimal opcode match across all samples [*] Do you want to include matched sample names in rule meta? [Y/N] y [*] Do you want to include matched byte sequence in rule comments? [Y/N] y

Once the generation is complete and it validates the YARA rule, it will dump the rule to your console.

[+] Completed YARA rules /* SAMPLES: Hancitor_Malware/18046a720cd23c57981fdfed59e3df775476b0f189b7f52e2fe5f50e1e6003e7_S1.exe Hancitor_Malware/f4f026fbe3df5ee8ed848bd844fffb72b63006cfa8d1f053a9f3ee4c271e9188_S1.exe ... Hancitor_Malware/40a8bb6e3eed57ed7bc802cc29b4e57360aa10c2de01d755f9577f07e10b848b_S1.exe Hancitor_Malware/fed9cc2c7cfb97741470cb79c189a203545af88bdd67bc99e2d7499d343de653_S1.exe BYTES: 558BEC83EC1C57FF750 ... INFO: binsequencer.py Hancitor_Malware/ Match SUCCESS for morphing */ rule rule0 { meta: description = "Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe" author = "" date = "2018-06-05" strings: $rule0_bytes = { 558B??83????57FF????FF????E8????????8B??83????85??74??568B????8D????03????6A??5056FF??????????83????75??F6??????74??8B????33??85??74??80??????403B??72??5EB8????????5F8B??5DC35E33??5F8B??5DC3B8????????5F8B??5DC3CCCCCCCCCCCCCC558B??68????????FF????E8????????4883????F7??1B??405DC3CCCCCCCCCC558B??83????8B????5356578B????33??03??0FB7????8D????0FB7????03??89????89????85??74??8B????8B??8A??88????0FB6??83????2B??89????8B??89????8B??0FB6??83????2B????75??8A????84??74??8A????420FB6??83????0FB6????83????2B??74??8B????8B????85??78??7F??74??4783????83????89????3B??73??8B????EB??8B????8D????5F5E5B8D????8B??5DC35F5E33??5B8B??5DC3CCCCCCCCCCCCCCCCCC558B??8B????8B??81??????????2B??33??2D????????80????75??80??????74??4183????7C??33??5DC3CCCCCCCC558B??83????538B????33??565733??8B????8B??????03??8B????8B????03??89????03??8B????03??89????89????8B????8B????89????89????85??74??0FBF????89????3B??73??3B????73??8B????8B????8B??8B????0FB6????03??8A??83????0FB6??83????2B??75??2B??84??74??8A????420FB6??83????0FB6????83????2B??74??85??78??7F??74??8B????463B??73??8B????8B????EB??8B????8B????8B????03????5F5E8B??5B8B??5DC35F33??5E8B??5B8B??5DC35F5E33??5B8B??5DC3CCCCCC5633??64??????????78??8B????8B????AD8B????EB??8B????8D????8B????5EC3CCCCCCCCCCCCCCCCCCCCCCCCCCCC558B??83????8B????5356578B????03??E8????????8B??68????????56E8????????8B??68????????5689????E8????????68????????5689????E8????????8B????83????89????85??75??85??0F84????????85??0F84????????8B??????????8B????03??89????83??????74??8B????8B??03??03??83????74??8B??8B????03??03??8B????85??74??50FF??EB??85??74??6A??6A??50FF??8B??85??74??8B??85??0FB7??78??8D????5051FF????39??74??89??8B????83????8B????83????8B????83????75??8B????83????89????83??????75??5F5E5B8B??5DC3CCCCCCCCCCCCCCCCCC5668????????E8????????8B??56E8????????56E8????????83????5E85??75??C3E8????????CC000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 } condition: all of them }

You can manually validate the rule matches with YARA as expected before retrohunting.

rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//e7b3ef04c211fafa36772da62ab1d250970d27745182d0f3736896cf7673dc3a_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ... rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//6e73879ca49b40974cce575626e31541b49c07daa12ec2e9765c432bfac07a20_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ... rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//87a10cc169f9ffd0c75bb9846a99fb477fc4329840964b02349ae44a672729c2_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ... ... rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//643951eee2dac8c3677f5ef7e9cb07444f12d165f6e401c1cd7afa27d7552367_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ... rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//1c72f575d0c9574afcfcaab7e0b89fe0083dbe8ac20c0132a978eb1f6be59641_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ... rule0 [description="Autogenerated by Binsequencer v.1.0.4 from Hancitor_Malware/fff786ec23e6385e1d4f06dcf6859cc2ce0a32cee46d8f2a0c8fd780b3ecf89a_S1.exe",author="",date="2018-06-05"] Hancitor_Malware//ab90ed6cb461f17ce1f901097a045aba7c984898a0425767f01454689698f2e9_S1.exe 0x2200:$rule0_bytes: 55 8B EC 83 EC 1C 57 FF 75 0C FF 75 08 E8 7E 00 00 00 8B F8 83 C4 08 85 FF 74 44 56 8B 77 0C 8D ...

Below is another examples where I utilize some of the tuning features of the script. Specifically, the match must exist in at least 75% of the samples, it will include strings that exist across all 75% of them, it will skip the first two matching techniques, and it will set Capstone to x64 architecture.

$ python binsequencer.py -c 75 -s -o -a x64 DarkComet_x64/ [+] Extracting instructions and generating sets [-]DarkComet_x64/d4e28eebd4f485496590e55e245a22b17d58414ae22f5186da90b0cc8d4d2006 .text - 16007 instructions extracted [-]DarkComet_x64/8841880e35b0936d7768e489f386e259970c98489a7ddd00e693797a394b4e39 .text - 16340 instructions extracted [-]DarkComet_x64/c8dcafd948f8ae3b60c5d89365ad2fe328ceb74820fc1334e9d4853a3ad4c3e8 .text - 16340 instructions extracted ... [-]DarkComet_x64/24a09b9dadb7471f29e1d2af22219c8fb94a2643bf481228ac60f48f55fa4e5f .text - 16340 instructions extracted [-]DarkComet_x64/1593b9482d71a52917eb1d3104eba822b93798eb812010afbbb50ab857fcbfab .text - 16340 instructions extracted [-]DarkComet_x64/2022ec23240d89da591ec57db5507c41fb3335253f6e1eb9306598d4a8255ef9 .text - 16340 instructions extracted [+] Golden hash (16340 instructions) - DarkComet_x64/8841880e35b0936d7768e489f386e259970c98489a7ddd00e693797a394b4e39 [+] Zeroing in longest mnemonic instruction set in .text [-] Matches - 0 Block Size - 4000 Time - 2.94 seconds [-] Matches - 0 Block Size - 2000 Time - 2.44 seconds [-] Matches - 0 Block Size - 1000 Time - 1.85 seconds [-] Matches - 333 Block Size - 500 Time - 2.50 seconds [-] Matches - 39 Block Size - 750 Time - 1.86 seconds [-] Matches - 0 Block Size - 875 Time - 1.79 seconds [-] Matches - 0 Block Size - 813 Time - 1.76 seconds [-] Matches - 7 Block Size - 782 Time - 1.78 seconds [-] Matches - 0 Block Size - 797 Time - 1.75 seconds [-] Matches - 0 Block Size - 790 Time - 1.76 seconds [-] Matches - 2 Block Size - 787 Time - 1.75 seconds [-] Matches - 2 Block Size - 787 Time - 1.75 seconds [-] Matches - 1 Block Size - 788 Time - 1.75 seconds [-] Matches - 0 Block Size - 789 Time - 1.74 seconds [-] Moving 1 instruction sets to review with a length of 788 [*] Do you want to display matched instruction set? [Y/N] y add|add|add|add|add|add|add|add|outsb|push|jb|.byte|jae|push|outsd|imul|add|je|outsd|imul|outsb|outsw|jb|.byte|je|outsd|outsb|add|push|push|je|.byte|insb|jne|js|add|.byte|add|jne|insb|push|imul|push|jne|jb|push|.byte|insb|jne|js|add|xchg|add|imul|jb|jbe|insb|push|.byte|insb|jne|add|add|jb|.byte|je|jns|js|add|push|jo|outsb|jns|js|add|push|push|jne|jb|outsb|outsw|jns|add|add|insb|je|push|.byte|insb|jne|add|add|insb|outsd|.byte|.byte|je|outsb|outsb|imul|push|imul|push|imul|jne|je|outsd|imul|jb|jbe|insb|jae|xor|push|insb|outsd|jae|jns|push|push|xor|insb|insb|add|push|add|.byte|insd|jo|add|.byte|add|insb|outsb|add|add|imul|add|imul|mov|je|jne|jb|outsb|je|jb|.byte|jae|add|add|outsd|.byte|.byte|insb|outsd|.byte|imul|add|outsd|jae|add|add|jo|outsb|outsb|jbe|jb|outsb|insd|outsb|je|je|imul|add|je|imul|imul|jns|add|mov|.byte|.byte|insb|insb|insb|outsd|.byte|add|je|jb|jbe|je|push|jb|imul|add|add|je|imul|.byte|jne|jae|add|add|jae|push|.byte|.byte|.byte|jns|add|jbe|je|jns|je|insd|imul|jns|add|add|outsd|.byte|.byte|insb|push|outsb|insb|outsd|.byte|imul|add|je|push|.byte|je|.byte|.byte|insd|add|jb|.byte|je|imul|jns|add|cmp|imul|je|imul|je|.byte|jae|jb|outsd|jb|add|je|jb|.byte|jb|jae|add|add|insd|outsd|jbe|imul|jns|add|insb|add|je|imul|.byte|jne|jae|add|ret|insb|outsd|.byte|.byte|insb|jb|add|xor|imul|add|je|jb|jbe|je|push|jb|imul|imul|add|imul|add|add|.byte|insb|insb|insb|outsd|.byte|add|cmp|push|jb|jbe|je|push|jb|imul|imul|add|je|outsd|jne|imul|add|add|outsb|js|imul|outsd|insd|jo|jb|push|je|imul|add|outsb|add|push|add|outsd|jae|.byte|.byte|outsb|insb|add|outsd|.byte|.byte|insb|jb|add|add|je|imul|js|je|jb|.byte|jae|add|add|.byte|.byte|je|push|imul|insb|push|imul|jb|.byte|je|imul|imul|jne|.byte|add|je|imul|jb|add|jb|push|jae|jne|.byte|add|add|push|jae|jne|.byte|add|or|outsd|jb|imul|push|.byte|je|add|je|jbe|outsb|je|add|add|je|outsd|jne|.byte|.byte|.byte|outsb|insb|push|add|.byte|add|jb|.byte|je|jae|.byte|add|js|push|je|imul|xor|push|jb|je|imul|je|jb|jbe|push|jns|add|push|outsd|insb|jne|outsb|outsw|jb|.byte|je|outsd|outsb|add|add|jb|imul|jb|.byte|add|outsd|push|jae|jne|.byte|add|.byte|add|.byte|.byte|je|jbe|outsb|je|add|out|je|js|je|outsd|push|jb|.byte|jae|add|movsb|add|.byte|.byte|je|push|jb|.byte|jae|add|ret|add|.byte|imul|je|jne|jb|outsb|je|imul|jns|add|mov|je|insd|jo|imul|add|adc|jae|je|jbe|outsb|je|add|outsd|.byte|imul|outsd|jne|.byte|add|jp|je|jns|je|insd|outsb|outsw|add|add|imul|js|add|wait|add|.byte|.byte|je|jne|js|add|.byte|add|je|jne|jb|outsb|je|imul|jns|add|stosd|add|je|jb|imul|stosb|add|je|jb|imul|je|insd|jo|.byte|je|add|mov|jb|.byte|je|push|push|add|add|.byte|insb|imul|outsd|imul|rol|jo|push|xor|insb|insb|add|retf|add|je|jbe|.byte|.byte|.byte|.byte|jo|add|xor|insb|insb|add|insb|je|insd|push|js|add|jae|je|jo|imul|fiadd|outsb|imul|add|.byte|jb|jb|jbe|add|add|imul|jae|js|.byte|add|.byte|jb|js|add|add|.byte|jb|jo|jb|add|adc|jae|.byte|jo|std|add|push|je|imul|je|insb|je|insd|push|js|add|insb|outsd|outsd|js|outsb|imul|jb|insd|add|insb|push|imul|outsd|.byte|add|add|.byte|.byte|insb|push|imul|rol|push|je|imul|js|add|scasb|add|jo|je|push|.byte|add|retf|je|imul|outsb|push|je|add|add|outsd|ja|imul|.byte|add|je|imul|jae|add|add|je|insb|je|insd|add|imul|.byte|jae|add|add|imul|jae|add|adc|jae|.byte|outsd|js|add|add|je|add|push|outsb|jae|.byte|add|cdq|add|je|outsd|jb|jb|jne|push|imul|add|push|.byte|imul|je|jo|.byte|.byte|.byte|push|.byte|je|add|push|outsb|insb|je|insd|jae|.byte|add|push|imul|outsb|push|je|add|push|imul|.byte|je|push|push|push|xor|insb|insb|add|push|add|jae|jo|imul|add [*] Do you want to disassemble the underlying bytes? [Y/N] y 0x1000dbbb: add byte ptr [rax], al | 0000 0x1000dbbd: add byte ptr [rax], al | 0000 0x1000dbbf: add byte ptr [rax], al | 0000 0x1000dbc1: add byte ptr [rax], al | 0000 0x1000dbc3: add byte ptr [rax], al | 0000 0x1000dbc5: add byte ptr [rax], al | 0000 0x1000dbc7: add bh, dh | 00F7 0x1000dbc9: add dword ptr [rdi + 0x70], ecx | 014F70 0x1000dbcc: outsb dx, byte ptr gs:[rsi] | 656E ... 0x1000e3b4: insb byte ptr [rdi], dx | 2E646C 0x1000e3b7: insb byte ptr [rdi], dx | 6C 0x1000e3b8: add byte ptr [rax], al | 0000 0x1000e3ba: push rdx | 52 0x1000e3bb: add ebx, dword ptr [rdi + 0x76] | 035F76 0x1000e3be: jae 0x1000e42e | 736E 0x1000e3c0: jo 0x1000e434 | 7072 0x1000e3c2: imul ebp, dword ptr [rsi + 0x74], 0x71000066 | 696E7466000071 0x1000e3c9: add byte ptr [rdi + 0x5f], bl | 005F5F [*] Do you want to display the raw byte blob? [Y/N] y 00000000000000000000000000F7014F70656E50726F63 ... [*] Do you want to keep this set? [Y/N] y [+] Keeping 1 mnemonic set using 75 % commonality out of 50 hashes [-] Length - 788 Section - .text [+] Printing offsets of type: longest [-] Gold matches ----------v SET rule0 v---------- add|add|add|add|add|add|add|add|outsb|push|jb|.byte|jae|push|outsd|imul|add|je|outsd|imul|outsb|outsw|jb|.byte|je|outsd|outsb|add|push|push|je|.byte|insb|jne|js|add|.byte|add|jne|insb|push|imul|push|jne|jb|push|.byte|insb|jne|js|add|xchg|add|imul|jb|jbe|insb|push|.byte|insb|jne|add|add|jb|.byte|je|jns|js|add|push|jo|outsb|jns|js|add|push|push|jne|jb|outsb|outsw|jns|add|add|insb|je|push|.byte|insb|jne|add|add|insb|outsd|.byte|.byte|je|outsb|outsb|imul|push|imul|push|imul|jne|je|outsd|imul|jb|jbe|insb|jae|xor|push|insb|outsd|jae|jns|push|push|xor|insb|insb|add|push|add|.byte|insd|jo|add|.byte|add|insb|outsb|add|add|imul|add|imul|mov|je|jne|jb|outsb|je|jb|.byte|jae|add|add|outsd|.byte|.byte|insb|outsd|.byte|imul|add|outsd|jae|add|add|jo|outsb|outsb|jbe|jb|outsb|insd|outsb|je|je|imul|add|je|imul|imul|jns|add|mov|.byte|.byte|insb|insb|insb|outsd|.byte|add|je|jb|jbe|je|push|jb|imul|add|add|je|imul|.byte|jne|jae|add|add|jae|push|.byte|.byte|.byte|jns|add|jbe|je|jns|je|insd|imul|jns|add|add|outsd|.byte|.byte|insb|push|outsb|insb|outsd|.byte|imul|add|je|push|.byte|je|.byte|.byte|insd|add|jb|.byte|je|imul|jns|add|cmp|imul|je|imul|je|.byte|jae|jb|outsd|jb|add|je|jb|.byte|jb|jae|add|add|insd|outsd|jbe|imul|jns|add|insb|add|je|imul|.byte|jne|jae|add|ret|insb|outsd|.byte|.byte|insb|jb|add|xor|imul|add|je|jb|jbe|je|push|jb|imul|imul|add|imul|add|add|.byte|insb|insb|insb|outsd|.byte|add|cmp|push|jb|jbe|je|push|jb|imul|imul|add|je|outsd|jne|imul|add|add|outsb|js|imul|outsd|insd|jo|jb|push|je|imul|add|outsb|add|push|add|outsd|jae|.byte|.byte|outsb|insb|add|outsd|.byte|.byte|insb|jb|add|add|je|imul|js|je|jb|.byte|jae|add|add|.byte|.byte|je|push|imul|insb|push|imul|jb|.byte|je|imul|imul|jne|.byte|add|je|imul|jb|add|jb|push|jae|jne|.byte|add|add|push|jae|jne|.byte|add|or|outsd|jb|imul|push|.byte|je|add|je|jbe|outsb|je|add|add|je|outsd|jne|.byte|.byte|.byte|outsb|insb|push|add|.byte|add|jb|.byte|je|jae|.byte|add|js|push|je|imul|xor|push|jb|je|imul|je|jb|jbe|push|jns|add|push|outsd|insb|jne|outsb|outsw|jb|.byte|je|outsd|outsb|add|add|jb|imul|jb|.byte|add|outsd|push|jae|jne|.byte|add|.byte|add|.byte|.byte|je|jbe|outsb|je|add|out|je|js|je|outsd|push|jb|.byte|jae|add|movsb|add|.byte|.byte|je|push|jb|.byte|jae|add|ret|add|.byte|imul|je|jne|jb|outsb|je|imul|jns|add|mov|je|insd|jo|imul|add|adc|jae|je|jbe|outsb|je|add|outsd|.byte|imul|outsd|jne|.byte|add|jp|je|jns|je|insd|outsb|outsw|add|add|imul|js|add|wait|add|.byte|.byte|je|jne|js|add|.byte|add|je|jne|jb|outsb|je|imul|jns|add|stosd|add|je|jb|imul|stosb|add|je|jb|imul|je|insd|jo|.byte|je|add|mov|jb|.byte|je|push|push|add|add|.byte|insb|imul|outsd|imul|rol|jo|push|xor|insb|insb|add|retf|add|je|jbe|.byte|.byte|.byte|.byte|jo|add|xor|insb|insb|add|insb|je|insd|push|js|add|jae|je|jo|imul|fiadd|outsb|imul|add|.byte|jb|jb|jbe|add|add|imul|jae|js|.byte|add|.byte|jb|js|add|add|.byte|jb|jo|jb|add|adc|jae|.byte|jo|std|add|push|je|imul|je|insb|je|insd|push|js|add|insb|outsd|outsd|js|outsb|imul|jb|insd|add|insb|push|imul|outsd|.byte|add|add|.byte|.byte|insb|push|imul|rol|push|je|imul|js|add|scasb|add|jo|je|push|.byte|add|retf|je|imul|outsb|push|je|add|add|outsd|ja|imul|.byte|add|je|imul|jae|add|add|je|insb|je|insd|add|imul|.byte|jae|add|add|imul|jae|add|adc|jae|.byte|outsd|js|add|add|je|add|push|outsb|jae|.byte|add|cdq|add|je|outsd|jb|jb|jne|push|imul|add|push|.byte|imul|je|jo|.byte|.byte|.byte|push|.byte|je|add|push|outsb|insb|je|insd|jae|.byte|add|push|imul|outsb|push|je|add|push|imul|.byte|je|push|push|push|xor|insb|insb|add|push|add|jae|jo|imul|add ----------^ SET rule0 ^----------- DarkComet_x64/8841880e35b0936d7768e489f386e259970c98489a7ddd00e693797a394b4e39 0x1000dbbb - 0x1000e3c9 in .text [-] Remaining matches ----------v SET rule0 v---------- DarkComet_x64/d4e28eebd4f485496590e55e245a22b17d58414ae22f5186da90b0cc8d4d2006 0x1000d707 - 0x1000df19 in .text DarkComet_x64/c8dcafd948f8ae3b60c5d89365ad2fe328ceb74820fc1334e9d4853a3ad4c3e8 0x1000dbbb - 0x1000e3c9 in .text DarkComet_x64/3faaac0cc5a64fa2a14e2d733cd10faf93fcaa05f7a5bf60bbf4540b95c37bf9 0x1000dbbb - 0x1000e3c9 in .text DarkComet_x64/d5134afb4b67af145384a5ca68dac7345e375c4c97a5d1bdd9d0419a20c72c15 0x1000d707 - 0x1000df19 in .text ... DarkComet_x64/4c70caa63cce7fcc4e66d2696ffb1844536b353775b874458549957b19cedbdf 0x1000d707 - 0x1000df19 in .text DarkComet_x64/95216ff79bd77eada3004203ccf8ba3eee2cfbe3800b3227c3fe0b2f12f32362 0x1000dbbb - 0x1000e3c9 in .text DarkComet_x64/24a09b9dadb7471f29e1d2af22219c8fb94a2643bf481228ac60f48f55fa4e5f 0x1000dbbb - 0x1000e3c9 in .text DarkComet_x64/1593b9482d71a52917eb1d3104eba822b93798eb812010afbbb50ab857fcbfab 0x1000dbbb - 0x1000e3c9 in .text DarkComet_x64/2022ec23240d89da591ec57db5507c41fb3335253f6e1eb9306598d4a8255ef9 0x1000dbbb - 0x1000e3c9 in .text ----------^ SET rule0 ^----------- [+] Generating YARA rule for matches off of bytes from gold - DarkComet_x64/8841880e35b0936d7768e489f386e259970c98489a7ddd00e693797a394b4e39 [*] Do you want to try and morph rule0 for accuracy and attempt to make it VT Retro friendly [Y/N] y [+] Check 03 - Dynamically morphing YARA rule0 [*] Dynamic morphing succeeded [*] Do you want to include matched sample names in rule meta? [Y/N] y [*] Do you want to include matched byte sequence in rule comments? [Y/N] y [+] Completed YARA rules /* SAMPLES: DarkComet_x64/84bc7ecfe0a9170892b9a2ebd0fc88a41b33951db3b57b7c51faed55930b3a26 DarkComet_x64/dacf28a75841b59284a14e8ad87b3a9dd93edca75adf2bd651d8de684ab1b53a DarkComet_x64/e1000ad839f519cc74b40013bea2c294b85a02460f860f58665e48ab4919ed90 DarkComet_x64/cb8e7cd1401d263c935add9d31738df96832df039bec31a9209b09fcf90c3c5a ... DarkComet_x64/422fc02a254a3e60336932cfe67121766fcefd7da4c734e8d0660d692b8a439b DarkComet_x64/c69a7d7f88c16345afa4ed7b6b456a286ab64df4bdbeef00e39b212f2884d6ca DarkComet_x64/2a19a370ae166a6e3f184031f9dc9a94d158317fbae50ca682c6e07e0acb537a DarkComet_x64/d003ec22e4d9c86f106c8d5d6e2c8788fdc158c78c1f98b452c872370cb4176e BYTES: 00000000000000000000000000F7014F70656E50726F63 ... INFO: binsequencer.py -c 75 -s -o -a x64 DarkComet_x64/ Match SUCCESS for morphing */ rule rule0 { meta: description = "Autogenerated by Binsequencer v.1.0.4 from DarkComet_x64/8841880e35b0936d7768e489f386e259970c98489a7ddd00e693797a394b4e39" author = "" date = "2018-06-05" strings: $rule0_bytes = { 000000000000000000000000????01????656E5072????6573??546F6B????????????4765????6F6B??????6E666F72????74??6F6E??????5265????6574????6C75??45????0000??01????75??6C5369??????????????67??75??72??56??6C75??45????00009601??????6B??????72??76??6C65????????6C75????????02????6743??????74??4B65????78????????5265674F????6E4B65????78????????5265????75??72??49??666F4B65????000047??????674465??6574??56??6C75??????????????6C6F????74??41??6449??69??????????????5369??????????????65????69??????????????75??74??6F6B??????72??76??6C656765????30??52656743??6F73??4B65????41????41??49????2E64??6C00005405??????????6D70??0000??05????????6C656E??????4B??????????6565??????01????6565????????????????C6????6574??75??72??6E74??72????6573??????02????6F????6C4C????6B????49??????????6F73??????01????70??6E6445??76??72??6E6D656E74??74??69????????????02????74??69????????????69????????????79??0000BB????????????6C41??6C6F????????4765????72??76??74??5072??6669????????????????01????74??69????????????????75??6573??0000????49????42??????????6442????65????76??4765????79??74??6D44??????????????79??????02????6F????6C556E6C6F??6B????6702????74??68??????????74??????6D65??????????72????74??44??????????????79??000038??46??????????????74??69??????????????4765??????73??45????6F72??????????6574??72????4164??????73??000003????656D6F76??44??????????????79??00006C04??6574??69????????????????75??6573??0000C2????6C6F????6C46????65????34??46??????????????65??????4765????72??76??74??5072??6669??????????69????????????03??????644C??????????????000046??????????6C41??6C6F??000039??????????65??72??76??74??5072??6669??????????69????????????02????74??6F6475??6546????????????????000049??????6E644E??????46????????????????6F6D70??72??5374??69????????????05????????656E000052??????6F73??????6E646C65??????4C??????6C46????65????????????6574??46????????????????78??74??72????6573??????????????????74??5469????????????6C65??69????????????72????74??46????????????????69????????????75????65??????????6574??69??????????????6572????????46????65??6573??75????65????43????????64??6573??75????65????08??????????46??72??69????????????6A????74??6704??6574??76??6E74??????02????74??6F6475????????6E646C65??0000??01????72????74??6573????6765??????78??536574??69??????????????34??5772??74??46????????????????74??72??76??5479??65??????????????566F6C75??6549??666F72????74??6F6E??????04??6572??69????????????72????64????????????656F66??6573??75????65????????????????74??45????6E74??0000E6??4765????78??74??6F64????72????6573??0000A4??????????74??5072????6573????????C303??????6446????????????????6574??75??72??6E74??69????????????79??000089??4765????656D70??69??????????????000012????6573??74??76??6E74????????4C????6B??????6F75????65????7A??4765????79??74??6D49??666F????03??????644C??????????????78??00009B??????????74??4D????6578??0000??01????74??75??72??6E74??69????????????79??0000AB02????74??6572??69????????????AA02????74??6572??69????????????4765????656D70????74????????B4??43??????74??5468????????000048??????????6C46????????????????6F46????????????????C0??????6565????4B????4E????????2E64??6C0000CB01????74??6576??????????70????????49????2E64??6C????????????44??6749????6D546578????????????????4465????74??70??69????????????DA??45??6444??????????????????????72??72??76????????????69??????????????73??78????????????72??6578????????????????72??70??6572??000011??4D65??????676542????????FD01??????64??74??69????????????4765????6C6749????6D546578????????????????6C6F6742??78??6E6469????????????72??6D??????????????6C5769????????????6F??????????????????6C65??69????????????D2??536574??69????????????78????????AE????????70??74??68??????????6765??????CA????6574??69????????????6E67??74????????02????6F77??69??????????????02????74??69????????????73??????01????74??6C6749????6D000069????????????73??????????02????656B??????73??6765??????12??4D65??????676542??78??????01????74??????????53656E644D????????6765??????9902????74??6F72??6772??75??64??69????????????02????67????69??????????????74??70????????6A????74????????53656E6444??6749????6D4D65??????6765??????????????5769????????????6E67??74????????????????5769??????????????74??555345??33??2E64??6C00005203????73??70??69?????????????????? } $string_0 = { 21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E } // !This program cannot be run in DOS mode. $string_1 = { 52696368 } // Rich $string_2 = { 2E74657874 } // .text $string_3 = { 602E64617461 } // `.data $string_4 = { 2E7064617461 } // .pdata $string_5 = { 402E72737263 } // @.rsrc $string_6 = { 402E72656C6F63 } // @.reloc $string_7 = { 41445641504933322E646C6C } // ADVAPI32.dll ... $string_1429 = { 5704505805405204104305402E04505804502002002002002002002002002002002002002E04D0550490 } // WEXTRACT.EXE .MUI $string_1430 = { 5007206F06407506307404E06106D0650 } // ProductName $string_1431 = { 5706906E06406F0770730 } // Windows $string_1432 = { 2004906E07406507206E06507402004507807006C06F0720650720 } // Internet Explorer $string_1433 = { 5007206F06407506307405606507207306906F06E0 } // ProductVersion $string_1434 = { 5606107204606906C06504906E06606F0 } // VarFileInfo $string_1435 = { 5407206106E07306C06107406906F06E0 } // Translation condition: all of them }

That about sums it up. Hopefully someone finds it useful but if not, c'est la vie.

The GIT for this script is up. Enjoy.

Getting phished sucks. Getting phished really sucks when you've spent significant amounts of time analyzing phishing attacks only to end up falling prey to one anyway. It happens, this is my story...

If you're not familiar with who Shane Missler is, he's a 20 year old who recently won a Mega Millions jackpot. A HUGE jackpot, one of the largest they've ever had. Also being in Florida with Shane, he dominated our local news coverage for a short period of time. One thing that kept reoccurring was that he kept saying he wanted to help people and do good with his money. Cool, that's a nice thing to do and I wish him the best of luck and then I moved on to real news.

A few days later though, I saw a Tweet pop-up on my feed from an account claiming to be Shane, created in April of 2016, stating that he wanted to give back to everyone and offered USD $5,000 to the first 50,000 people who retweeted his message. Remembering his repeated messaging of wanting to do good I said "sure, what's the harm in retweeting?" and did so. I figured, if it's a fake account than I'll just unfollow and that'll be the end of it.

Fast forward another couple of days and I see a new Tweet pop-up on my feed, again from Shane. This time he says he's hired a company to put together a website to process the payments for the 50,000 people who met the requirement. I thought, "no way..." and went to the Twitter account. It looked like I remember it looking like when I saw it in the media and I started browsing his tweets.

They were thoughtful and offering positive messages with seemingly a lot of engagements. Huh, "I'll be damned!" I thought, this guy is actually doing it.

Now, in hindsight, besides all of the obvious red flags I even acknowledged and willfully ignored as this phish built-up, the basic math of it all should have been a no-brainer. At USD $5,000, across 50,000 people, you have USD $250M dollars which, again in hindsight, was far more than I knew Shane had since he took the cash payout which was significantly less than what he won.

So, I took the bait. I clicked on the website and it had some fancy JavaScript, a pretty background, and three different "payout" options: "PayPal", "Venmo", or "Check". I was surprised he had a physical cheque listed as an option but it added more credibility, in my mind at the time, that this guy might actually be serious if he's willing to mail it.

Naturally, I opted for PayPal. Now, the information asked for felt off, but as I do a lot of PayPal and they weren't asking for things that weren't already in public domain or easily Googleable, I let the little devil on my left shoulder shut the little angel on the right out - "this is totally going to be legit" as I filled in my name, address, and e-mail with thoughts dancing through my head about getting my entire family to sign-up ASAP!

At this point, it does some "checks" to "verify" your eligibility and, again, I thought "this dude is crazy but hey I'm all about that free money!".

The deeper I went into this rabbit hole, the more I self-convinced myself to ignore all of the glaringly obvious red flags.

Next up, it says it needs to verify you're a human on the totally legit site "areyouahuman[.]co". Sounds reasonable, we definitely don't want to give money away to a bot so let's see what we have to do...

Now, mind you, I was doing this all on my phone while also preoccupied with something else so when the "verify you're a human" page came up and said I needed to install 4 Apps on my phone and let them run for 30 seconds, it made me stop what I was doing and take pause. What the hell kind of verification is this? How does that even work? Is this malware? What are these apps? Are they trying to generate money to cover some of the costs of sending out USD $5,000 to every person? This last question was, if you haven't already figured it out, somewhat true. The apps were all legitimate, Google validated, and very popular games on the Google Play store. Regardless, I trudged on due to greed, played a game of Solitaire, and pondered why I was letting myself be fooled by such an obvious fraud.

I decided to skip ahead, dread beginning to rise, to the Amazon voucher. It was a Amazon survey for $1,000 which was the final straw, as there is no way it's tied to human verification. I decided to go back to Twitter and confirm my fears; almost every subtweet to the original was along the lines of "THIS IS FAKE!!!". Red faced, annoyed, had, I decided to figure out just exactly what I got myself into it.

First up, I confirmed it didn't matter what options I picked, what bullshit I filled in, I would "verify" and get sent over to the "areyouahuman[.]co" site. On the PC, the entries were of course different so they are doing some device/source detection and redirecting based on that. I don't think this site is necessarily related to the other, but you can clearly see the affiliate ID at the top.

Going through the source code on the page, it luckily appears to just be a scheme to generate revenue and using the affiliate ID for tracking. The person behind the ID would get cash for each successful app install, links clicked, and surveys taken thus making me a pawn in their game. The links all followed a similar pattern of using this site "jump[.]ogtrk[.]net" preceeded by the affiliate ID and whatever the AD is, as shown below:

<a href="https://jump[.]ogtrk[.]net/aff_c?offer_id=12646&aff_id=9480&aff_sub=e4484e0556d6808de3acde38d3a1925f&aff_sub2=0vhEVTB6vnEGatmzW%2Fui5lHzWFPiC5VSPb13HcDHnP63DDN5M9CkPHxxshjLpbVIb4ED5sdZ9sQ88M7imgoUew%3D%3D&aff_sub3=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJpYXQiOjE1MTY1NjU4ODYsImp0aSI6InlPc21WZ1hRbU9iNmNLVXEwcGRXN1R2OGFoVDFRbUlGUjEzWndFWXZTUFk9IiwiaXNzIjoib2dhZHMiLCJuYmYiOjE1MTY1NjU4ODYsImV4cCI6MTUxOTE1Nzg4NiwiZGF0YSI6eyJpcCI6Ijk2LjU4LjIyOC4xMDYiLCJyZWYiOiJhSFIwY0RvdkwzTm9ZVzVsYldsemMyeGxjbVp2ZFc1a1lYUnBiMjR1WTI5dEx3PT0iLCJ1YSI6Ik1vemlsbGFcLzUuMCAoTWFjaW50b3NoOyBJbnRlbCBNYWMgT1MgWCAxMF8xM18yKSBBcHBsZVdlYktpdFwvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lXC82My4wLjMyMzkuMTMyIFNhZmFyaVwvNTM3LjM2In19.F7XTVXlMvwcbLMAeBULRcmX1JAuwIl55HDSagOoE64eZTQZRe2pmMsMYT2QWiha1IyYoHd3TAMiyyPXW8ozWsw&aff_sub4=&aff_sub5=" target="_blank"><li><span class="lc-checks__feature ">Win A $1000 Amazon Voucher</span></li></a> <a href="https://jump[.]ogtrk[.]net/aff_c?offer_id=12306&aff_id=9480&aff_sub=e4484e0556d6808de3acde38d3a1925f&aff_sub2=0vhEVTB6vnEGatmzW%2Fui5lHzWFPiC5VSPb13HcDHnP63DDN5M9CkPHxxshjLpbVIb4ED5sdZ9sQ88M7imgoUew%3D%3D&aff_sub3=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJpYXQiOjE1MTY1NjU4ODYsImp0aSI6InlPc21WZ1hRbU9iNmNLVXEwcGRXN1R2OGFoVDFRbUlGUjEzWndFWXZTUFk9IiwiaXNzIjoib2dhZHMiLCJuYmYiOjE1MTY1NjU4ODYsImV4cCI6MTUxOTE1Nzg4NiwiZGF0YSI6eyJpcCI6Ijk2LjU4LjIyOC4xMDYiLCJyZWYiOiJhSFIwY0RvdkwzTm9ZVzVsYldsemMyeGxjbVp2ZFc1a1lYUnBiMjR1WTI5dEx3PT0iLCJ1YSI6Ik1vemlsbGFcLzUuMCAoTWFjaW50b3NoOyBJbnRlbCBNYWMgT1MgWCAxMF8xM18yKSBBcHBsZVdlYktpdFwvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lXC82My4wLjMyMzkuMTMyIFNhZmFyaVwvNTM3LjM2In19.F7XTVXlMvwcbLMAeBULRcmX1JAuwIl55HDSagOoE64eZTQZRe2pmMsMYT2QWiha1IyYoHd3TAMiyyPXW8ozWsw&aff_sub4=&aff_sub5=" target="_blank"><li><span class="lc-checks__feature ">What Is Your Favorite Starbucks Coffee?</span></li></a> <a href="https://jump[.]ogtrk[.]net/aff_c?offer_id=11916&aff_id=9480&aff_sub=e4484e0556d6808de3acde38d3a1925f&aff_sub2=0vhEVTB6vnEGatmzW%2Fui5lHzWFPiC5VSPb13HcDHnP63DDN5M9CkPHxxshjLpbVIb4ED5sdZ9sQ88M7imgoUew%3D%3D&aff_sub3=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJpYXQiOjE1MTY1NjU4ODYsImp0aSI6InlPc21WZ1hRbU9iNmNLVXEwcGRXN1R2OGFoVDFRbUlGUjEzWndFWXZTUFk9IiwiaXNzIjoib2dhZHMiLCJuYmYiOjE1MTY1NjU4ODYsImV4cCI6MTUxOTE1Nzg4NiwiZGF0YSI6eyJpcCI6Ijk2LjU4LjIyOC4xMDYiLCJyZWYiOiJhSFIwY0RvdkwzTm9ZVzVsYldsemMyeGxjbVp2ZFc1a1lYUnBiMjR1WTI5dEx3PT0iLCJ1YSI6Ik1vemlsbGFcLzUuMCAoTWFjaW50b3NoOyBJbnRlbCBNYWMgT1MgWCAxMF8xM18yKSBBcHBsZVdlYktpdFwvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lXC82My4wLjMyMzkuMTMyIFNhZmFyaVwvNTM3LjM2In19.F7XTVXlMvwcbLMAeBULRcmX1JAuwIl55HDSagOoE64eZTQZRe2pmMsMYT2QWiha1IyYoHd3TAMiyyPXW8ozWsw&aff_sub4=&aff_sub5=" target="_blank"><li><span class="lc-checks__feature ">Win Samsung S8</span></li></a>

The next logical question then was, "who the hell is the man behind the curtain?".

A quick WHOIS didn't provide any useful information. The domain was created fairly recently, which would make sense, but otherwise it had the usual GoDaddy abuse information; however, there was a Registrant Name which would be useful.

Domain Name: SHANEMISSLERFOUNDATION[.]COM Registry Domain ID: 2216035150_DOMAIN_COM-VRSN Registrar WHOIS Server: whois.godaddy.com Registrar URL: http://www.godaddy.com Updated Date: 2018-01-21T01:17:36Z Creation Date: 2018-01-21T01:17:35Z Registry Expiry Date: 2019-01-21T01:17:35Z Registrar: GoDaddy.com, LLC Registrar IANA ID: 146 Registrar Abuse Contact Email: abuse@godaddy.com Registrar Abuse Contact Phone: 480-624-2505 Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited Domain Status: clientRenewProhibited https://icann.org/epp#clientRenewProhibited Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited Name Server: NS03.DOMAINCONTROL.COM Name Server: NS04.DOMAINCONTROL.COM DNSSEC: unsigned URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/ >>> Last update of whois database: 2018-01-21T20:27:59Z <<< Domain Name: shanemisslerfoundation[.]com Registrar URL: http://www.godaddy.com Registrant Name: Sean Courtney Registrant Organization: Name Server: NS03.DOMAINCONTROL.COM Name Server: NS04.DOMAINCONTROL.COM DNSSEC: unsigned

Sean Courtney isn't a lot to go on. Looking in PassiveTotal there are 106 recorded Registrants that share this name. A lot of them seem unrelated so name alone might not be strong enough to go off. Looking at the resolutions for the domain show two IP addresses, both registered to GoDaddy. This implies that on the date it was registered they changed the IP address, which is always an interesting piece of data to pivot on.

Cross-correlating the two IP addresses with every other domain that shares the Registrant "Sean Courtney" showed a number of domains that overlapped.

107.180.11.206: "shanemisslerfoundation[.]com" "activatedcharcoal[.]life" "clashwrap[.]com" "clashup[.]org" 160.153.72.6: "shanemisslerfoundation[.]com" "seancourtney[.]org"

Now, these are shared GoDaddy IP addresses, so each IP has a significant amount of domains attached to them (500 and 1K, respectively) but it's building a stronger relationship. Additionally, the other domains also have another overlap which makes things more interesting. Specifically, they share an e-mail address used during registration.

seanstuhh7@gmail[.]com

Obviously, the one that immediately stands out in the domains is "seancourtney[.]org". Looking at this sites WHOIS information reveals a bit more. Relevant bits follow:

Domain Name: SEANCOURTNEY[.]ORG Registrar WHOIS Server: whois.godaddy.com Updated Date: 2017-10-23T03:46:48Z Creation Date: 2017-08-24T01:27:16Z Registrar: GoDaddy.com, LLC Registrar IANA ID: 146 Registrar Abuse Contact Email: abuse@godaddy.com Registrar Abuse Contact Phone: +1.4806242505 Registrant Name: Sean Courtney Registrant Street: 3105 N. Oakwood Ave Registrant Street: #121 Registrant City: Muncie Registrant State/Province: Indiana Registrant Postal Code: 47304 Registrant Country: US Registrant Phone: +1.3174400050 Registrant Email: seanstuhh7@gmail[.]com

Before diving in further on that, I want to take a second to talk about the e-mail address. If you Google it, you receive back a handful of other domains they've registered in the past all related to video game cheating. Similarly, using PassiveTotal to look at the historical domain registrations from this e-mail, a picture begins to emerge around content and theme.

seancourtney[.]org activatedcharcoal[.]life cpamethods[.]biz offtrackworld[.]us clashwrap[.]com clashup[.]org clashcomp[.]com clashway[.]org clashwork[.]com pokemaps[.]org clashpro[.]net miitomomines[.]org miitomomines[.]info clashpro[.]org overwatchlooter[.]org overwatchlooter[.]info accesspokemon[.]xyz accesspokemon[.]info thefuturedrake[.]com supercellko[.]com cheatcove[.]com robuxhack[.]info boombeachelite[.]com freexbltrials[.]com twitchshop[.]biz allthingsprofitable[.]net iotaexchange[.]us

Based on these and the few Google hits, it seems this individual tries to profit off "hacks" for very popular games, such as PokémonGo, Clash of Clans, and OverWatch. I also suspect that not one of these serve actual hacks or cheats but simply are used as a lure for desperate people. Again, if you play off peoples desire to win (or get free money) then you can more easily entice them into clicking your affiliate links and thus make money.

I have an e-mail, a name, and an address so lets see what else is available online.

Of course, almost immediately, I stumble onto the individuals LinkedIn page.

Sean Courtney, Advertiser, located in Muncie, Indiana.

He's also worked other Advertiser jobs in the past, but his most recent experience as an "Affiliate Advertiser" seems to line up perfectly with what we've uncovered so far. You'll also note the name of his currently employer, "OGAds". My gosh, that sounds awfully familiar...

You'll recall that when I looked at the source code of the page the surveys and apps were being funneled through, the domain was "ogtrk[.]net" - I wonder what the chances are those two are related?

Well, pretty fucking likely apparently.

The final icing on the cake is an all too now familiar app-install offer that OGAds displays proudly on their site.

That about wraps things up here. I'm sure they made a good chunk of change off everyone and it was a good lure (for me) so hats off to Sean and, most likely, OGAds. You're all a bunch of twats.

In this post I want to give a brief introduction to a new tool I'm working on called "Curtain". It will be complimentary to another post I'm working on for $dayjob where I created a Curtain Cuckoo module.

Curtain is basically just a small script that can be used to detonate malicious files or PowerShell scripts and then scrape out the ScriptBlock logs for fast analysis. I'll go into the concept behind it more in my other post.

The idea here then is to get that same functionality but without Cuckoo, as it's not always available to everyone or people may not want to much with installing custom modules, and I wanted something standalone.

That being said, there are still a few requirements for this alternate iteration which, admittedly, was my first version before I decided to try and streamline it more for work usage.

The usage of the script is fairly straight forward and you simply pass it either a PS1 script OR a file which it will try to execute and then report back the ScriptBlock event logs. For PS1 scripts, it will launch PowerShell, otherwise it will rely natively on the extension and the OS the recognize/execute it. It'll wait 10 seconds and then simply scrape the logs, parse them into a simple HTML file, and display it on the host.

Below is a simple example of the script in action...

$ ./curtain.sh test.ps1 [+] Reverting to snapshot - curtain [+] Starting headless VM 2017-11-09T10:51:39.463| ServiceImpl_Opener: PID 26072 [+] Copying curtain.ps1 to virtual Guest [+] Sending target file to detonate [+] Launching Curtain PS script for - test.ps1 [!] Sleeping for 10 seconds to let malware doing its thang... [+] Transferring output from script [+] Grabbing a screenshot of the desktop [+] Killing virtual Guest [+] Launching site...

If you click HERE you can see the resulting output that gets created.

Nothing too crazy or fancy but it allows you to see how things flowed and have some visibility into the deobfuscated PowerShell which was executed on the Guest machine. I haven't yet beautified it so it's pretty raw at the moment but figured I'd put it out there if there was a need for it.

The GIT for this script is up and you'll need to fill out a few variables in the "curtain.sh" script to make it work.

Hopefully this helps if you do not have Cuckoo available or just want a quick way to be able to parse out PowerShell execution activity from a code sample.

There is also a file "psorder.py" I've included in the GIT repo. At some point I was writing a manual PowerShell deobfuscator and, while that is a fruitless endeavor, there are some benefits using it in tandem with Curtain for token replacement/etc.

When Magic the Gathering (MtG) came out I was beyond excited. As a fairly young kid, this was unlike anything I'd ever seen before and for a few years I would spend every penny from chores, lunch money, and birthday cards to round out my collection. It was a blast, I really enjoyed it, and developed many fond memories around playing and collecting the cards.

Alas, as with all good things, that came to an end as life started to kick in. Fast forward 20+ years to today and I'm free to indulge myself in my old(new) hobbies again. One thing I always wanted to do was buy a booster box. SO.MANY.CARDS! But holy shit is it not cheap, or at least not for past-me, so it was always a pipe dream until now.

I've been shitposting with friends at work about doing a MtG draft at a conference we'll be attending one night, rehashing old memories, and then I felt a familiar feeling...an itch that needed to be scratched. I read up on the latest Amonkhet set and fell in love with the theme of it so I went out and bought an Amonkhet Booster Box, the Amonkhet Deck Builder's Toolkit, and an Amonkhet Bundle Box. Indulge I shall.

It's a metric crap top of cards, espceially going from 0, and I found myself with an overwhelming amount of information to take in and try to process. Tons of new rules, new abilities, new everything. I spent the majority of time looking up card rules to try and get a grasp of the game, not even knowing where to begin with building a new deck. I cobbled together some decks as I opened packs but I wondered if, within my cache of newfound cards, there may already be a deck someone else has built and posted online. Surely that would be the case with 1,100 cards, right?

TL;DR don't buy packs and expect to have any semblance of a pre-constructed deck, official or otherwise.

To come to this conclusion, which I admittedly already thought may be the case before buying all of these (still didn't deter me from making the purchase though, at least to do it once) I created a script to essentially search decks I scrape online and attempt to match my library against. The script mtgdeckhunter.py is up on Github with usage examples and output data. You can skip everything below if you're just interested in that instead of my rambling about it.

After some time on the net looking up new MtG rules, I came across three main sites (Deckbox, MtG Goldfish, and MtG Top 8) that had tons of decks available to peruse in all kinds of formats. The problem then became not finding the decks, but identifying which decks I might have most of the cards for. The idea being I could then just craft some decks other people put thought into, give them a whirl, see if I liked that style, then maybe go from there. Unfortunately none of these sites have that feature available except for MtG Goldfish - part of their monthly pay service.

I decided I'd try to craft something simple in Python to scrape the publicly available decks and see if I had any matches. What I've now realized is that if you just have one "set" (eg Amonkhet) then you're pretty much SoL on finding pre-made decks. Since there are so many editions in rotation, you rarely find real decks solely focused on just one set except the officially released ones. In hindsight, I'd have bought a few of the "Deck Builder's Toolkits" and Bundle Boxes for maybe the most recent 2-3 sets or just a couple of the official pre-constructed ones that actually don't seem too bad. Initially the idea of buying a pre-constructed deck had me sticking my nose up as if I'm some kind of MtG legend (I'm not).

Anywho, I'm not providing the decks I've pulled from these sites, that is an exercise left to the user, but if you have a ton of cards that you've got listed out on your computer, then maybe this program will prove helpful to you. I ended up not really using it at all...go figure...and built two EDH decks that have been pretty fun in my limited playing. I may revisit this once I have a more well rounded collection.

You can see my cards from the three purchases mentioned previously on Deckbox or in text format here, if you're interested in what I got. Deckbox says the total value of cards in the set is $283.95, which I think is pretty good since it's about 2x what I paid (doesn't account for the foils I got). The new formats (eg EDH) seem really fun and I'm excited to get back into this hobby.

Enjoy!

*NOTE: I likely won't be updating this code anytime soon for the reasons above. It's probably quite a bit buggy and after having collected 100,000+ decks, I quickly recognized using JSON as the storage format as being a terrible idea (80MB+ file). Also I'd say it's not particularly stable as it relies upon parsing these sites which can change their code at any moment.

Regular expressions (regex) are a language construct that allow you to define a search pattern. The flexibility of this language allows you to craft search patterns for tons of practical applications, including passive identification of network traffic. Specifically, they can allow you to pattern match on URL's so that you may quickly identify malicious sites frequently used by malware command and control (C2), domain generation algorithms (DGA's), and other such activities.

I fell in love with using regex as a defensive tool while doing incident response many years ago. The depth of control they provide naturally lends itself to the forensic, analyst, and responder lines of work. This blog may be old hat to most blue teamers out there, but if not, hopefully it serves as an educational resource on how you can use data to build PCRE's for network defense.

Over the course of this blog, I'll cover developing Perl compatible regex (PCRE) for the Emotet banking malware download URL's and develop PCRE's that encompass multiple campaigns that can then be used on a proxy device (blocking), in a SIEM (identification), or whatever system you have that supports utilizing these expressions. Emotet is a great candidate for review as it has varying domain structures that are ripe for pattern matching. I'll walk you through how I develop these PCRE's, along with refining them, and then finally how they can be vetted for false-positives (FP) to make them ready for production.

Throughout the blog, I'll be using a Python script I wrote called pcre_check to assit with the analysis. Essentially, all the tool does is take a parameter for a file containing your PCRE's, a parameter for a file containing the URL's, and then some flags for how to display the pattern matches and misses. This is helpful for the rapid development of PCRE's because, more often than naught, you find yourself in the midst of developing these when the shit has hit the fan...or at least I always did.

I'll be focusing solely on URL's in this example; however, on the off chance you're not familiar with regex, keep in mind that a myriad of tools, all the way down at the byte level and up to the application level that I'll be covering here can utilize regex. You should absolutely learn the basics at least as it's something that can be a life saver in your daily toolbox.

Before I get too much further in, here are a couple of helpful links, that I find myself constantly visiting, which you may find useful if you want to review or build your own PCRE's. I'll try to explain the regex syntax and logic as I go but I'll assume you know the basic structure of the language. If not, hit the references below.

This will be a long blog, and a little free flowing, as I develop these while enumerating step-by-step. Below are some jumps so you can skip around as needed.

Initial Sample Corpus

The Emotet banking malware download locations have a lot of different URL structures across their different campaigns. It's been popping up on my radar more and more lately so I want to try and enumerate the patterns here to further expand what I can catch. That being said, the very first thing I need to do is collect a decent samples of the various campaigns so that I can begin to try and match them. Prior to my current $dayjob, I'd approach this by hitting up multiple blogs from researchers or security companies and compile the URL set. When I didn't have access to systems that made this task fairly trivial, I would frequently build them from the below resources.

Usually just Googling the threat name, "Emotet domains", bring you to sites like this one which have links to Pastebin posts containing loads of samples. The more the better but in general, in my experience, I'd say between 15-30 URL's is usually enough to make a solid base for an individual pattern and then you can tweak it during the false-positive (FP) checking phase.

I've placed 696 Emotet URL's on Github which you can use to follow along or throw in a blocklist.

Pattern Recognition / Enumeration

Once you have a decent sample set, the next step is to analyze the data and look for patterns. I'll show the various changes to the PCRE's as I analyze the URL's and you can see how they evolve into the final product after each iteration. To better illustrate this, I'll just focus on the last 20 URL's at a time but normally I'll have open 3 terminals: top window editing the URL file, middle window with pcre_check output, bottom window editing the PCRE file. This layout allows me to quickly modify and validate changes on the fly and significantly reducing the time to turnaround.

Below is the first run of the script showing that none of the URLs matched and truncated to the last 20.

Round 1

$ python pcre_check.py -p emotet_pcres -u emotet_urls -n [+] NO HITS [+] http://12back.com/dw3wz-ue164-qqv/ http://4glory.net/p7lrq-s191-iv/ ... http://www.melodywriters.com/INVOICE-864339-98261/ http://www.prodzakaz.com.ua/H27560xzwsS/ http://www.stellaimpianti.it/download2467/ http://www.stepstonedev.com/field/download7812/ http://www.surreycountycleaners.com/t5wx-x064-mzdb/ http://www.voloskof.net/Sn83160EngQs/ http://www.wildweek.com/EDHFR-08-77623-document-May-04-2017/ http://www.ziyufang.studio/project/wp-content/plugins/nprojects/download5337/ http://wyskocil.de/ORDER-525808-73297/ http://xionglutions.com/NDKBS-51-84402-document-May-03-2017/ http://xionglutions.com/wl7dh-uf201-asnw/ http://xyphoid.com/RRT-13279129.dokument/ http://xyphoid.com/SCANNED/MM3431UCNPCEZRO/ http://yildiriminsaat.com.tr/JCV-71815736.dokument/ http://zahahadidmiami.com/K38258Q/ http://zeroneed.com/FNN-40446899.dokument/ http://ziarahsutera.com/5377959590/ http://zonasacra.com/zH83293YizhQ/ http://zvarga.com/15-12-07/CUST-9405847-8348/ http://zypern-aktiv.de/wp-content/plugins/wordfence/img9re-a789-stz/

There are a couple of things that jump out immediately on the first review.

For each of the PCRE, I've grown accustomed to starting them with the below structure.

^http:\/\/[^\x2F]+\/

This matches any line that begins ("^") with "http://" followed by any characters, except ("[^ ]") forward slash ("\x2F"), up to the first foward slash. This ensures we match the domain regardless of what TLD or subdomains may be present.

For ease of illustration, I'm going to group the variations and break them down individually.

[Group 01]

http://www.surreycountycleaners.com/t5wx-x064-mzdb/ http://xionglutions.com/wl7dh-uf201-asnw/ http://zypern-aktiv.de/wp-content/plugins/wordfence/img9re-a789-stz/

For this pattern, we have 4-5 alpha(lower)numeric, dash, 4-5 alpha(lower)numeric, dash, 3-4 alpha(lower). We'll also want to account for the last line which has the path multiple levels in. We can accomplish this by putting our "[^\x2F]+\/" section in a group and saying the group can repeat one or more times (eg match everything between the forward slashes until the last one, where our pattern is).

^http:\/\/([^\x2F]+\/)+[a-z0-9]{4,5}-[a-z0-9]{4,5}-[a-z]{3,4}\/$

[Group 02]

http://www.melodywriters.com/INVOICE-864339-98261/ http://wyskocil.de/ORDER-525808-73297/ http://zvarga.com/15-12-07/CUST-9405847-8348/

This next group appears to use a word in caps, dash, 6-7 numbers, dash, 4-5 numbers. We'll need to account for the subpaths again as well. In this case, I prefer to group full words instead of using a character range, which helps for trying to be false-positive adverse.

^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST)-[0-9]{6,7}-[0-9]{4,5}\/$

[Group 03]

http://www.prodzakaz.com.ua/H27560xzwsS/ http://www.voloskof.net/Sn83160EngQs/ http://xyphoid.com/SCANNED/MM3431UCNPCEZRO/ http://zahahadidmiami.com/K38258Q/ http://ziarahsutera.com/5377959590/ http://zonasacra.com/zH83293YizhQ/

I feel this group may end up getting split later. We have one URL which is purely numerical and then two which have no lowercase letters. We'll cross that bridge as we look at more samples, if necessary. Another thing to note is that this group has a very weak pattern in that it is very generic, which means it will likely match a lot of legitimate URL's and not hold up during FP testing. We'll cross that bridge when we get to it as well.

For now, it's a mix of 7-15 alphanumeric characters.

^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{7,15}\/$

[Group 04]

http://xyphoid.com/RRT-13279129.dokument/ http://yildiriminsaat.com.tr/JCV-71815736.dokument/ http://zeroneed.com/FNN-40446899.dokument/

This one, and the next three, all look pretty straight forward: 3 alpha(upper), dash, 8 numbers, period, "dokument" string.

^http:\/\/([^\x2F]+\/)+[A-Z]{3}-[0-9]{8}\.dokument\/$

[Group 05]

http://www.wildweek.com/EDHFR-08-77623-document-May-04-2017/ http://xionglutions.com/NDKBS-51-84402-document-May-03-2017/

Similarly, very structured (which is good for us): 5 alpha(upper), dash, 2 numbers, dash, 5 numbers, dash, "document" string, dash, "May" string, dash, 2 numbers, dash, "2017" string. I've defaulted to using "2017" as a string since it aligns with their usage of it as a date so it seems unlikely to change.

^http:\/\/([^\x2F]+\/)+[A-Z]{5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\/$

[Group 06]

http://www.stellaimpianti.it/download2467/ http://www.stepstonedev.com/field/download7812/ http://www.ziyufang.studio/project/wp-content/plugins/nprojects/download5337/

The string "download", 4 numbers.

^http:\/\/([^\x2F]+\/)+download[0-9]{4}\/$

I'll throw these into the emotet_pcres file and see how each performs against our target data set of known-bad Emotet sites.

[+] FOUND [+] Count: 24/696 Comment: Group 01 - [ t5wx-x064-mzdb ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-z0-9]{4,5}-[a-z0-9]{4,5}-[a-z]{3,4}\/$ [+] FOUND [+] Count: 43/696 Comment: Group 02 - [ INVOICE-864339-98261 ] PCRE: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST)-[0-9]{6,7}-[0-9]{4,5}\/$ [+] FOUND [+] Count: 177/696 Comment: Group 03 - [ H27560xzwsS ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{7,15}\/$ [+] FOUND [+] Count: 30/696 Comment: Group 04 - [ RRT-13279129.dokument ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{3}-[0-9]{8}\.dokument\/$ [+] FOUND [+] Count: 24/696 Comment: Group 05 - [ EDHFR-08-77623-document-May-04-2017 ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\/$ [+] FOUND [+] Count: 62/696 Comment: Group 06 - [ download2467 ] PCRE: ^http:\/\/([^\x2F]+\/)+download[0-9]{4}\/$

Pretty low across the board except for group 3, which is the one I mentioned is too loose to begin with. From here on out, if I don't list a particular group, it implies there was no change to the PCRE.

Round 2

The next 20 URL's are below.

http://web2present.com/Invoice-538878-14610/ http://webbmfg.com/krupy/gallery2/g2data/LUqc663BAyN333-HoO/ http://webbsmail.co.uk/DIDE-19-85247-document-May-04-2017/ http://webergy.co.uk/15-14-47/Cust-0910279-3981/ http://webics.org/Cust-951068-69554/ http://websajt.nu/ap6ohc-au152-urttp/ http://wescographics.com/17-40-07/Invoice-5558936-1201/ http://whiteroofradio.com/YD796MJO974-NNW/ http://wightman.cc/ipa0oab-j490-keap/ http://wilstu.com/hHiDSaaP03Y95TIGpIUS4Aa/ http://wingitproductions.org/NUDA-X-52454-DE/ http://wlrents.com/CUST.-Document-YDI-04-GQ389557/ http://wnyil.org/wnyil_transfer/Ups__com__WebTracking__tracknum__4DFH74180493688150/ORDER.-Document-SY-92-E736730/ http://wolffy.net/17-00-07/Invoice-9545415-1483/ http://wortis.com/CH760Wcv003-Luh/ http://www.anti-corruption.su/Cust-3708876-8210/ http://www.anti-corruption.su/TNO-59-97413-document-May-04-2017/ http://www.babyo.com.mx/Invoice-583156-73417/ http://www.doodle.tj/yW1NZ-sh00-cH/ http://zypern-aktiv.de/wp-content/plugins/wordfence/img9re-a789-stz/

It looks like we have a few new groups as well. I'll attempt to highlight in red the changes to the PCRE's which might make the changes clearer.

[Group 01] - [ t5wx-x064-mzdb ]

http://websajt.nu/ap6ohc-au152-urttp/ http://wightman.cc/ipa0oab-j490-keap/ http://www.doodle.tj/yW1NZ-sh00-cH/ http://zypern-aktiv.de/wp-content/plugins/wordfence/img9re-a789-stz/

You'll note that the third one now introduces capital letters; it's possible this is a separate campaign but I'll circle back to this later during review. The main changes will be the addition of the capital letters and adjustment on the ranges, which will likely be the case for the rest of the groups.

OLD: ^http:\/\/([^\x2F]+\/)+[a-z0-9]{4,5}-[a-z0-9]{4,5}-[a-z]{3,4}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{4,7}-[a-z0-9]{4,5}-[a-z]{2,5}\/$

[Group 02] - [ INVOICE-864339-98261 ]

http://web2present.com/Invoice-538878-14610/ http://webergy.co.uk/15-14-47/Cust-0910279-3981/ http://webics.org/Cust-951068-69554/ http://wescographics.com/17-40-07/Invoice-5558936-1201/ http://wolffy.net/17-00-07/Invoice-9545415-1483/ http://www.anti-corruption.su/Cust-3708876-8210/ http://www.babyo.com.mx/Invoice-583156-73417/

New strings "Invoice" and "Cust".

OLD: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST)-[0-9]{6,7}-[0-9]{4,5}\/$ NEW: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\/$

[Group 03] - [ H27560xzwsS ]

http://wilstu.com/hHiDSaaP03Y95TIGpIUS4Aa/

Range adjustment (making this one even more useless).

OLD: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{7,15}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{7,23}\/$

[Group 05] - [ EDHFR-08-77623-document-May-04-2017 ]

http://webbsmail.co.uk/DIDE-19-85247-document-May-04-2017/ http://www.anti-corruption.su/TNO-59-97413-document-May-04-2017/

Range adjustment.

OLD: ^http:\/\/([^\x2F]+\/)+[A-Z]{5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\/$ NEW: ^http:\/\/([^\x2F]+\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\/$

[Group 07] - [ LUqc663BAyN333-HoO ]

http://webbmfg.com/krupy/gallery2/g2data/LUqc663BAyN333-HoO/ http://whiteroofradio.com/YD796MJO974-NNW/ http://wortis.com/CH760Wcv003-Luh/

This cluser is defined by one dash towards the end: 11-14 alphanumeric, dash, 3 alpha.

^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{11,14}-[a-zA-Z]{3}\/$

[Group 08] - [ NUDA-X-52454-DE ]

http://wingitproductions.org/NUDA-X-52454-DE/

Only one sample so I'll match it exactly, 4 alpha(upper), dash, 1 alpha(upper), dash, 5 numbers, dash, 2 alpha(upper).

^http:\/\/([^\x2F]+\/)+[A-Z]{4}-[A-Z]{1}-[0-9]{5}-[A-Z]{2}\/$

[Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]

http://wlrents.com/CUST.-Document-YDI-04-GQ389557/ http://wnyil.org/wnyil_transfer/Ups__com__WebTracking__tracknum__4DFH74180493688150/ORDER.-Document-SY-92-E736730/

Similar to Group 2: same word choice, period, dash, "Document" string, dash, 2-3 alpha(upper), dash, 2 numbers, dash, 7-8 alpha(upper)numeric.

^http:\/\/([^\x2F]+\/)+(CUST|ORDER)\.-Document-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,8}\/$

Note that the delta in the output after each group is just something I've included after the fact to show the progress for the blog.

[+] FOUND [+] Count: 66/696 (+42) Comment: [Group 01] - [ t5wx-x064-mzdb ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{4,7}-[a-z0-9]{4,5}-[a-z]{2,5}\/$ [+] FOUND [+] Count: 80/696 (+37) Comment: [Group 02] - [ INVOICE-864339-98261 ] PCRE: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\/$ [+] FOUND [+] Count: 190/696 (+13) Comment: [Group 03] - [ H27560xzwsS ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{7,23}\/$ [+] FOUND [+] Count: 30/696 Comment: [Group 04] - [ RRT-13279129.dokument ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{3}-[0-9]{8}\.dokument\/$ [+] FOUND [+] Count: 59/696 (+35) Comment: [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\/$ [+] FOUND [+] Count: 62/696 Comment: [Group 06] - [ download2467 ] PCRE: ^http:\/\/([^\x2F]+\/)+download[0-9]{4}\/$ [+] FOUND [+] Count: 15/696 Comment: [Group 07] - [ LUqc663BAyN333-HoO ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{11,14}-[a-zA-Z]{3}\/$ [+] FOUND [+] Count: 3/696 Comment: [Group 08] - [ NUDA-X-52454-DE ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{4}-[A-Z]{1}-[0-9]{5}-[A-Z]{2}\/$ [+] FOUND [+] Count: 20/696 Comment: [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ] PCRE: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER)\.-Document-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,8}\/$

Round 3

The next set of 20 URL's.

http://theocforrent.com/BG-47535325/zp3x-r88-wuh.view/ http://thepogs.net/rs4eG-Md93-FSZV/ http://thesubservice.com/ORDER.-Document-9543529814/ http://theuntoldsorrow.co.uk/ORDER.-XI-80-UY913942/ http://tiger12.com/TGA-48-76252-doc-May-04-2017/ http://timmadden.com.au/qzw1s-wc740-m/ http://toppprogramming.com/Cust-8328499631/ http://tpsystem.net/TaVS391hyCaD623-dJ/ http://transfinity.co.uk/sam/fathers-day/htdocs/b2m-qp699-jxmln/ http://tridentii.com/OY-30676027.dokument/ http://tscoaching.co.uk/l1R-q60-pe/ http://uncover.jp/XwXL806QaDN792-jr/ http://uncover.jp/r-2psl-vo440-lz.doc/ http://visionsoflightphotography.com/FRMLW-RNT-41482-DE/ http://visuals.com/CUST.-VT-38-RH422386/ http://voxellab.com/BBM-07-75350-doc-May-04-2017/ http://vspacecreative.co.uk/O2-view-report-818/c1o-jn07-er.view/ http://wayanad.net/xhW017TRfP646-z/ http://wb0rur.com/ZGAG-59-63863-doc-May-05-2017/ http://www.doodle.tj/yW1NZ-sh00-cH/

One new variant in this set.

[Group 01] - [ t5wx-x064-mzdb ]

http://thepogs.net/rs4eG-Md93-FSZV/ http://timmadden.com.au/qzw1s-wc740-m/ http://transfinity.co.uk/sam/fathers-day/htdocs/b2m-qp699-jxmln/ http://tscoaching.co.uk/l1R-q60-pe/ http://www.doodle.tj/yW1NZ-sh00-cH/

Range adjustment and additiona case changes.

OLD: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{4,7}-[a-z0-9]{4,5}-[a-z]{2,5}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{4,7}-[a-zA-Z0-9]{4,5}-[a-zA-Z]{1,5}\/$

[Group 02] - [ INVOICE-864339-98261 ]

http://toppprogramming.com/Cust-8328499631/

This could be a different campaign as it breaks from the double-dashes but it's so similar to group 2 that I'll leave it for now and possibly revisit.

The second dash I'll make optional which should allow the lowest ranges of the numerical sections to match. I'll use an optinal capturing group ("(-)?") for the second dash. Effectively creating a capture group and then using the "?" value after will cause the group to match between zero and one time, thus becoming optional.

OLD: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\/$ NEW: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}(-)?[0-9]{4,5}\/$

[Group 04] - [ RRT-13279129.dokument ]

http://tridentii.com/OY-30676027.dokument/

Range adjustment.

OLD: ^http:\/\/([^\x2F]+\/)+[A-Z]{3}-[0-9]{8}\.dokument\/$ NEW: ^http:\/\/([^\x2F]+\/)+[A-Z]{2,3}-[0-9]{8}\.dokument\/$

[Group 05] - [ EDHFR-08-77623-document-May-04-2017 ]

http://tiger12.com/TGA-48-76252-doc-May-04-2017/ http://voxellab.com/BBM-07-75350-doc-May-04-2017/ http://wb0rur.com/ZGAG-59-63863-doc-May-05-2017/

Add "doc" string to grouping.

OLD: ^http:\/\/([^\x2F]+\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\/$ NEW: ^http:\/\/([^\x2F]+\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-(document|doc)-May-[0-9]{2}-2017\/$

[Group 07] - [ LUqc663BAyN333-HoO ]

http://tpsystem.net/TaVS391hyCaD623-dJ/ http://uncover.jp/XwXL806QaDN792-jr/ http://wayanad.net/xhW017TRfP646-z/

Range adjustment.

OLD: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{11,14}-[a-zA-Z]{3}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{11,15}-[a-zA-Z]{1,3}\/$

[Group 08] - [ NUDA-X-52454-DE ]

http://visionsoflightphotography.com/FRMLW-RNT-41482-DE/

Range adjustment.

OLD: ^http:\/\/([^\x2F]+\/)+[A-Z]{4}-[A-Z]{1}-[0-9]{5}-[A-Z]{2}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,3}-[0-9]{5}-[A-Z]{2}\/$

[Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]

http://thesubservice.com/ORDER.-Document-9543529814/ http://theuntoldsorrow.co.uk/ORDER.-XI-80-UY913942/ http://visuals.com/CUST.-VT-38-RH422386/

Couple of things going on here.

New grouping of words for second part and first entry is only numerical without dashes, which looks similar to the new entry for Group 2. To account for these, I'll use optional capturing groups again to build around them. It makes the rule slightly less accurate but with the other anchors in it, I think it'll still be fairly unique enough to not FP.

NEW: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER)\.-Document-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,8}\/$ OLD: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER)\.-(Document|XI|VT)((-[A-Z]{2,3})?-[0-9]{2})?-[A-Z0-9]{7,10}\/$

[Group 10] - [ zp3x-r88-wuh.view ]

http://theocforrent.com/BG-47535325/zp3x-r88-wuh.view/ http://uncover.jp/r-2psl-vo440-lz.doc/ http://vspacecreative.co.uk/O2-view-report-818/c1o-jn07-er.view/

The "doc" and "view" ones may be different campaigns but, again, I'll lump them together for now and will separate at the end if necessary: 1-4 alpha(lower)numeric, dash, 3-4 alpha(lower)numeric, dash, optional 5 alpha(lower)numeric, dash, 2-3 alpha(lower), period, group "view" or "doc" strings.

^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{5})?-[a-z]{2,3}\.(view|doc)\/$

The pcre_check output shows decent coverage improvements.

[+] FOUND [+] Count: 93/696 (+27) Comment: [Group 01] - [ t5wx-x064-mzdb ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{4,7}-[a-zA-Z0-9]{4,5}-[a-zA-Z]{1,5}\/$ [+] FOUND [+] Count: 89/696 (+9) Comment: [Group 02] - [ INVOICE-864339-98261 ] PCRE: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}(-)?[0-9]{4,5}\/$ [+] FOUND [+] Count: 190/696 Comment: [Group 03] - [ H27560xzwsS ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{7,23}\/$ [+] FOUND [+] Count: 56/696 (+26) Comment: [Group 04] - [ RRT-13279129.dokument ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{2,3}-[0-9]{8}\.dokument\/$ [+] FOUND [+] Count: 79/696 (+20) Comment: [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-(document|doc)-May-[0-9]{2}-2017\/$ [+] FOUND [+] Count: 62/696 Comment: [Group 06] - [ download2467 ] PCRE: ^http:\/\/([^\x2F]+\/)+download[0-9]{4}\/$ [+] FOUND [+] Count: 43/696 (+28) Comment: [Group 07] - [ LUqc663BAyN333-HoO ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{11,15}-[a-zA-Z]{1,3}\/$ [+] FOUND [+] Count: 10/696 (+7) Comment: [Group 08] - [ NUDA-X-52454-DE ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,3}-[0-9]{5}-[A-Z]{2}\/$ [+] FOUND [+] Count: 31/696 (+11) Comment: [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ] PCRE: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER)\.-(Document|XI|VT)((-[A-Z]{2,3})?-[0-9]{2})?-[A-Z0-9]{7,10}\/$ [+] FOUND [+] Count: 3/696 Comment: [Group 10] - [ zp3x-r88-wuh.view ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{5})?-[a-z]{2,3}\.(view|doc)\/$

Round 4

The next 20 sites.

http://pinoypiper.com/Sz1Mr-H23-Xw/ http://proiecte-pac.ro/ORDER.-5883789520/ http://proprints.dk/Rech-74779857260/ http://pulmad.ee/B6y-Fb95-NMW/ http://redkitecottages.com/Cust-Document-VMH-46-TJ804065/ http://reichertgroup.com/d0r-tl410-cxa/ http://sgbusiness.co.uk/YM-57911235-document-May-03-2017/ http://sign1.no/dhl___status___2668292851/ http://sloan3d.com/Cust-Document-WMV-26-EW054554/ http://stacibockman.com/g2c-o179-pocja/ http://streamingair.com/i0A-St59-m/ http://sublevel3.us/G7n-Gh58-y/ http://superalumnos.net/php/ORDER.-HW-84-Y947883/ http://technetemarketing.com/CUST.-8520279770/ http://teed.ru/YG-47124992/bc7za-l30-v.view/ http://texasbrits.com/m3s-r623-x/ http://thegilbertlawoffice.com/m-9q-d054-gu.doc/ http://thenursesagent.com/ORDER.-9592209302/ http://transfinity.co.uk/sam/fathers-day/htdocs/b2m-qp699-jxmln/ http://tscoaching.co.uk/l1R-q60-pe/

One new variant sticks out, otherwise business as usual.

[Group 01] - [ t5wx-x064-mzdb ]

http://pinoypiper.com/Sz1Mr-H23-Xw/ http://pulmad.ee/B6y-Fb95-NMW/ http://reichertgroup.com/d0r-tl410-cxa/ http://stacibockman.com/g2c-o179-pocja/ http://streamingair.com/i0A-St59-m/ http://sublevel3.us/G7n-Gh58-y/ http://texasbrits.com/m3s-r623-x/ http://transfinity.co.uk/sam/fathers-day/htdocs/b2m-qp699-jxmln/ http://tscoaching.co.uk/l1R-q60-pe/

Half of the 20 are for this group. Just some small range adjustments.

OLD: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{4,7}-[a-zA-Z0-9]{4,5}-[a-zA-Z]{1,5}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{3,7}-[a-zA-Z0-9]{3,5}-[a-zA-Z]{1,5}\/$

[Group 02] - [ INVOICE-864339-98261 ]

http://proiecte-pac.ro/ORDER.-5883789520/ http://proprints.dk/Rech-74779857260/ http://technetemarketing.com/CUST.-8520279770/ http://thenursesagent.com/ORDER.-9592209302/

It should be apparent now that Group 2 and 9 have a bit of overlap and I was going to wait till the end to course correct; however, I feel it's just too much at this point so I'm going to split it so the ones above, and previously matched in both groups, with the "ORDER" and "CUST" strings followed by 10 digits are a new unique group. That means I need to edit Group 2 and 9 to avoid these and the simplest way of doing that is removing the previous optional dash, making it absolutely required. See Group 9 and 12 for further iteration details.

OLD: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}(-)?[0-9]{4,5}\/$ NEW: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\/$

[Group 05] - [ EDHFR-08-77623-document-May-04-2017 ]

http://sgbusiness.co.uk/YM-57911235-document-May-03-2017/

This new one breaks from the two parts separated by a dash. I can add the dash to the character list and up the range, or I can opt for a optional grouping and up the range. I'm going to do the latter for the reason that it keeps the structure in tact; for this, I'm not as worried about FP's due to the ending part of the pattern being fairly unique.

OLD: ^http:\/\/([^\x2F]+\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-(document|doc)-May-[0-9]{2}-2017\/$ NEW: ^http:\/\/([^\x2F]+\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\/$

[Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]

http://redkitecottages.com/Cust-Document-VMH-46-TJ804065/ http://sloan3d.com/Cust-Document-WMV-26-EW054554/ http://superalumnos.net/php/ORDER.-HW-84-Y947883/

Similar to Group 2, I'm going to reverse course on the optional groupings so that the 10 digits are not captured. To account for the new variants in Group 9, I'm adding an optional grouping for the period after the first word and for the "Document" string, then moving the others back into the A-Z grouping that followed.

OLD: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER)\.-(Document|XI|VT)((-[A-Z]{2,3})?-[0-9]{2})?-[A-Z0-9]{7,10}\/$ NEW: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER|Cust)(.)?(-Document)?-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,10}\/$

[Group 10] - [ zp3x-r88-wuh.view ]

http://thegilbertlawoffice.com/m-9q-d054-gu.doc/

Range adjustment.

OLD: ^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{5})?-[a-z]{2,3}\.(view|doc)\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{4,5})?-[a-z]{2,3}\.(view|doc)\/$

[Group 11] - [ dhl___status___2668292851 ]

http://sign1.no/dhl___status___2668292851/

Not much to work with yet so it's fairly static.

^http:\/\/([^\x2F]+\/)+dhl___status___[0-9]{10}\/$

[Group 12] - [ ORDER.-5883789520 ]

Previous set: http://thesubservice.com/ORDER.-Document-9543529814/ http://toppprogramming.com/Cust-8328499631/ Current set: http://proiecte-pac.ro/ORDER.-5883789520/ http://proprints.dk/Rech-74779857260/ http://technetemarketing.com/CUST.-8520279770/ http://thenursesagent.com/ORDER.-9592209302/

Looking at the data in Group 2 and 9, this pattern will have: string grouping of "ORDER", "RECH", "CUST", "Cust", optional period, dash, optional "Document" string, 10-11 numbers. By the way, "rech" is shorthand for "rechnung", which is German for "bill" - you see these variations quite a bit in phishing campaigns as they focus on different regions.

^http:\/\/([^\x2F]+\/)+(ORDER|Rech|CUST|Cust)(.)?(-Document)?-[0-9]{10,11}\/$

Next iteration below.

[+] FOUND [+] Count: 127/696 (+34) Comment: [Group 01] - [ t5wx-x064-mzdb ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{3,7}-[a-zA-Z0-9]{3,5}-[a-zA-Z]{1,5}\/$ [+] FOUND [+] Count: 80/696 (-9) Comment: [Group 02] - [ INVOICE-864339-98261 ] PCRE: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\/$ [+] FOUND [+] Count: 190/696 Comment: [Group 03] - [ H27560xzwsS ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{7,23}\/$ [+] FOUND [+] Count: 56/696 Comment: [Group 04] - [ RRT-13279129.dokument ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{2,3}-[0-9]{8}\.dokument\/$ [+] FOUND [+] Count: 86/696 (+7) Comment: [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\/$ [+] FOUND [+] Count: 62/696 Comment: [Group 06] - [ download2467 ] PCRE: ^http:\/\/([^\x2F]+\/)+download[0-9]{4}\/$ [+] FOUND [+] Count: 43/696 Comment: [Group 07] - [ LUqc663BAyN333-HoO ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{11,15}-[a-zA-Z]{1,3}\/$ [+] FOUND [+] Count: 10/696 Comment: [Group 08] - [ NUDA-X-52454-DE ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,3}-[0-9]{5}-[A-Z]{2}\/$ [+] FOUND [+] Count: 36/696 (+5) Comment: [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ] PCRE: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER|Cust)(.)?(-Document)?-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,10}\/$ [+] FOUND [+] Count: 3/696 Comment: [Group 10] - [ zp3x-r88-wuh.view ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{4,5})?-[a-z]{2,3}\.(view|doc)\/$ [+] FOUND [+] Count: 3/696 Comment: [Group 11] - [ dhl___status___2668292851 ] PCRE: ^http:\/\/([^\x2F]+\/)+dhl___status___[0-9]{10}\/$ [+] FOUND [+] Count: 31/696 Comment: [Group 12] - [ ORDER.-5883789520 ] PCRE: ^http:\/\/([^\x2F]+\/)+(ORDER|Rech|CUST|Cust)(.)?(-Document)?-[0-9]{10,11}\/$

Round 5

Since there are only 31 URL's left I'm just going to add them all here and close out this phase.

http://akhmerov.com/AuHffUo4L1BcEmca0BW5e4UtI/ http://albrightfinancial.com/gescanntes-Dokument-66764196575/ http://anjep.com/TBWEV-YCAP-91327-DE/ http://arroyave.net/Rech-K-682-GO1130/ http://beowulf7.com/kgcee/ http://bitach.com/RIJW-FNFE-86299-DE/ http://bobrow.com/ito-6r-w193-pkr.doc/ http://boningue.com/g843enx500-Jh/ http://carriedavenport.com/Scan-58146582290/ http://davidberman.com/gescanntes-Dokument-85218870046/ http://dentaltravelpoland.co.uk/NUN-63376893/b4fe-nn88-s.view/ http://donnjo.com/Rechnung-IOOY-776-LUV2894/ http://frossweddingcollections.co.uk/qdu-7p-wi523-hgnt.doc/ http://froufrouandthomas.co.uk/c644kNg297-uy/ http://gabrielramos.com.br/lxu-3h-ip079-zgmg.doc/ http://genxvisual.com/U494KHq064-VK/ http://gestion-arte.com.ar/CLCJY-EMIE-76216-DE/ http://imnet.ro/gcxbh/ http://johncarta.com/jexaag/ http://kowalenko.ca/D603ImA780-xxJ/ http://kratiroff.com/Scan-62799108494/ http://lapetitenina.com/eyym/ http://magmaprod.com.br/FcmUZ9GGTFaq2SYC5HTuFgc4v7/ http://masmp.com/rby-4c-rp108-sqq.doc/ http://missgypsywhitemoon.com.au/ismpce/ http://music111.com/VAQT-DYBC-27274-DE/ http://myhorses.ca/lb8TApg9aZI6PP5RWRAIdmfU/ http://onlineme.w04.wh-2.com/LD-36666076/ir5r-mu75-h.view/ http://phoneworx.co.uk/HLqwOU1uNQ7rWLWkXW6VoMheZf/ http://teed.ru/YG-47124992/bc7za-l30-v.view/ http://thegilbertlawoffice.com/m-9q-d054-gu.doc/

[Group 03] - [ H27560xzwsS ]

http://akhmerov.com/AuHffUo4L1BcEmca0BW5e4UtI/ http://beowulf7.com/kgcee/ http://imnet.ro/gcxbh/ http://johncarta.com/jexaag/ http://lapetitenina.com/eyym/ http://magmaprod.com.br/FcmUZ9GGTFaq2SYC5HTuFgc4v7/ http://myhorses.ca/lb8TApg9aZI6PP5RWRAIdmfU/ http://phoneworx.co.uk/HLqwOU1uNQ7rWLWkXW6VoMheZf/

I'll adjust the ranges on this one but you can see from the above that it looks like two distinct campaigns. I have no doubt now that there will be more in this grouping but since it's almost over 200 URL's I'll review the entire set at the end.

OLD: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{7,23}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{4,26}\/$

[Group 07] - [ LUqc663BAyN333-HoO ]

http://boningue.com/g843enx500-Jh/ http://froufrouandthomas.co.uk/c644kNg297-uy/ http://genxvisual.com/U494KHq064-VK/ http://kowalenko.ca/D603ImA780-xxJ/

Range adjustment.

OLD: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{11,15}-[a-zA-Z]{1,3}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{10,15}-[a-zA-Z]{1,3}\/$

[Group 08] - [ NUDA-X-52454-DE ]

http://anjep.com/TBWEV-YCAP-91327-DE/ http://bitach.com/RIJW-FNFE-86299-DE/ http://gestion-arte.com.ar/CLCJY-EMIE-76216-DE/ http://music111.com/VAQT-DYBC-27274-DE/

Range adjustment. Curious these all end with "DE" too, possibly region based given the "Rech" stuff seen previously; will follow-up after.

OLD: ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,3}-[0-9]{5}-[A-Z]{2}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-[A-Z]{2}\/$

[Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]

http://arroyave.net/Rech-K-682-GO1130/ http://donnjo.com/Rechnung-IOOY-776-LUV2894/

Added "Rech" and "Rechnung" to initial string grouping along with expanding some ranges.

OLD: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER|Cust)(.)?(-Document)?-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,10}\/$ NEW: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z0-9]{6,10}\/$

[Group 10] - [ zp3x-r88-wuh.view ]

http://bobrow.com/ito-6r-w193-pkr.doc/ http://dentaltravelpoland.co.uk/NUN-63376893/b4fe-nn88-s.view/ http://frossweddingcollections.co.uk/qdu-7p-wi523-hgnt.doc/ http://gabrielramos.com.br/lxu-3h-ip079-zgmg.doc/ http://masmp.com/rby-4c-rp108-sqq.doc/ http://onlineme.w04.wh-2.com/LD-36666076/ir5r-mu75-h.view/ http://teed.ru/YG-47124992/bc7za-l30-v.view/ http://thegilbertlawoffice.com/m-9q-d054-gu.doc/

Range adjustment.

OLD: ^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{4,5})?-[a-z]{2,3}\.(view|doc)\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,5}-[a-z0-9]{2,4}(-[a-z0-9]{4,5})?-[a-z]{1,4}\.(view|doc)\/$

[Group 12] - [ ORDER.-5883789520 ]

http://albrightfinancial.com/gescanntes-Dokument-66764196575/ http://carriedavenport.com/Scan-58146582290/ http://davidberman.com/gescanntes-Dokument-85218870046/ http://kratiroff.com/Scan-62799108494/

Added "gescanntes" to initial string grouping (this is Dutch for "Scanned") and "Scan". Added "Dokument" to second optional grouping.

OLD: ^http:\/\/([^\x2F]+\/)+(ORDER|Rech|CUST|Cust)(.)?(-Document)?-[0-9]{10,11}\/$ NEW: ^http:\/\/([^\x2F]+\/)+(ORDER|Rech|CUST|Cust|gescanntes|Scan)(.)?(-Document|-Dokument)?-[0-9]{10,11}\/$

Alright, now I've cleared all of the remaining matches.

[+] FOUND [+] Count: 127/696 Comment: [Group 01] - [ t5wx-x064-mzdb ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{3,7}-[a-zA-Z0-9]{3,5}-[a-zA-Z]{1,5}\/$ [+] FOUND [+] Count: 80/696 Comment: [Group 02] - [ INVOICE-864339-98261 ] PCRE: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\/$ [+] FOUND [+] Count: 199/696 (+9) Comment: [Group 03] - [ H27560xzwsS ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{4,26}\/$ [+] FOUND [+] Count: 56/696 Comment: [Group 04] - [ RRT-13279129.dokument ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{2,3}-[0-9]{8}\.dokument\/$ [+] FOUND [+] Count: 86/696 Comment: [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\/$ [+] FOUND [+] Count: 62/696 Comment: [Group 06] - [ download2467 ] PCRE: ^http:\/\/([^\x2F]+\/)+download[0-9]{4}\/$ [+] FOUND [+] Count: 47/696 (+4) Comment: [Group 07] - [ LUqc663BAyN333-HoO ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{10,15}-[a-zA-Z]{1,3}\/$ [+] FOUND [+] Count: 14/696 (+4) Comment: [Group 08] - [ NUDA-X-52454-DE ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-[A-Z]{2}\/$ [+] FOUND [+] Count: 38/696 (+2) Comment: [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ] PCRE: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z0-9]{6,10}\/$ [+] FOUND [+] Count: 11/696 (+1) Comment: [Group 10] - [ zp3x-r88-wuh.view ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,5}-[a-z0-9]{2,4}(-[a-z0-9]{4,5})?-[a-z]{1,4}\.(view|doc)\/$ [+] FOUND [+] Count: 3/696 Comment: [Group 11] - [ dhl___status___2668292851 ] PCRE: ^http:\/\/([^\x2F]+\/)+dhl___status___[0-9]{10}\/$ [+] FOUND [+] Count: 35/696 (+4) Comment: [Group 12] - [ ORDER.-5883789520 ] PCRE: ^http:\/\/([^\x2F]+\/)+(ORDER|Rech|CUST|Cust|gescanntes|Scan)(.)?(-Document|-Dokument)?-[0-9]{10,11}\/$

Round 6

The next step is to validate the matches with the "-s" flag in pcre_check. This will show all of the respective matches under each PCRE. For this phase, I just eyeball it to make sure there is no overlap and what's expected in each group is present.

All of the PCRE's look solid except Group 3, which I already mentioned would need more TLC, as it overlaps with other PCRE's.

For Group 3, I'm going to visually break these down. I'll put 5 examples under each sub-grouping to show how I separated them. Some are very good for matching while others will just have to be left behind. TAKE NOTE BAD GUYS, BEING GENRIC IS GOOD, UNIQUE SNOWFLAKES ARE THE FIRST AGAINST THE WALL.

[Group 03] - [ dhl/paket/com/pkp/appmanager/8376315127 ]

http://8kindsoffun.com/dhl/paket/com/pkp/appmanager/8376315127/ http://balletopia.org/dhl/paket/com/pkp/appmanager/7293445574/ http://cnwconsultancy.com/dhl/paket/com/pkp/appmanager/0622636111/ http://cookieco.com/dhl/paket/com/pkp/appmanager/8333287922/ http://cspdx.com/dhl/paket/com/pkp/appmanager/6213914600/

I think thins one would have stood out earlier had it not been clobbered by the previous PCRE. The path is very unique and ends with 10 digits. This PCRE will replace the old one for Group 3 and the other new ones will start at Group 13.

^http:\/\/([^\x2F]+\/)+dhl\/paket\/com\/pkp\/appmanager\/[0-9]{10}\/$

[Group 13] - [ 6572646300 ]

http://alfareklama.cz/6572646300/ http://algicom.net/6673413599/ http://bourdin.name/0014489972/ http://carbitech.net/dhl/2354409458/ http://dsltech.co.uk/0217183208/ ... http://oscartvazquez.com/DHL24/15382203695/

I'm going to create a PCRE for this one but I don't expect it to live past the FP check. There is one that stands off from the rest here with 11 numbers instead of 10 - it may be that I just don't have enough samples to account for that campaign. Finally, I'll need to exclude the previous set of matches which also end with 10 digits. To do this, I'll use a negative lookbehind to ensure once we match 10 digits, "appmanager" was not in the URL path.

^http:\/\/([^\x2F]+\/)+(?<!appmanager\/)[0-9]{10,11}\/$

[ alpha(lower) ]

http://aifesdespets.fr/kkrxtsmodw/ http://beowulf7.com/kgcee/ http://bunngalow.com/injeutznnb/ http://carbofilms.com/cms/wp-content/upgrade/jcnfkvken/ http://dolphinrunvb.com/yozypdznpb/

I don't see any good patterns in this set or the next one.

[ alpha(lower)numeric ]

http://benard.ca/z49641l/ http://jaqua.us/hid4kiwcvd84fljkpqpl/ http://krakhud.pl/rguen0ebxndrci41frworbr/ http://micromatrices.com/qwh7zxijifxsnxg20mlwa/ http://patu.ch/bgrvm2wqpjw74hz/

[Group 14] [ SCANNED/RZ7498WEXEZB ]

http://icaredentalstudio.com/APE88743TZ/ http://lbcd.se/MFV09235UA/ http://lucasliftruck.com/SCANNED/RZ7498WEXEZB/ http://meanconsulting.com/K44975X/ http://sentios.lt/W95941C/ http://triadesolucoes.com.br/SCANNED/RBA6517MHPKCZDEX/ http://xyphoid.com/SCANNED/MM3431UCNPCEZRO/ http://zahahadidmiami.com/K38258Q/

This group was characterized by alpha(upper)numeric, which normally wouldn't be worth pattern matching, but I can see two patterns in the above that may be worth entertaining. For Group 14, I'll match on the URL's with "SCANNED" string in the path and the unique placement of the digits within the string: 2-3 alpha(upper), 4 digits, 6-9 alpha(upper).

^http:\/\/([^\x2F]+\/)+SCANNED\/[A-Z]{2,3}[0-9]{4}[A-Z]{6,9}\/$

[Group 15] [ K44975X ]

http://meanconsulting.com/K44975X/ http://sentios.lt/W95941C/ http://zahahadidmiami.com/K38258Q/

For Group 15, I'll match on 1 alpha(upper), 5 digits, 1 alpha(upper). The non-matched ones in the previous Group 14 may be an expanded part of this campaign but it's such a weak PCRE and prone to FP that I'm not going to bother with it. It's highly likely to not make the final cut either way.

^http:\/\/([^\x2F]+\/)+[A-Z]{1}[0-9]{5}[A-Z]{1}\/$

[ alphanumeric long 18-26 ]

http://akhmerov.com/AuHffUo4L1BcEmca0BW5e4UtI/ http://arosa.nl/crm/xs2ckmwotgcml95cxdhbo/ http://crosslink.ca/nWlKL3PdKyi1goahyZfbNr/ http://ideaswebstudio.com/v3mzbzaink00sndmyz/ http://infojass.com/gvtsl7ddrnjkupn50pp/

Nothing jumps out at me that would make for a good PCRE. It has a similar structure of alpha, digit, alpha but the ranges are very broad which makes it highly prone to FP again.

[ alphanumeric short 7-14 ]

http://akirmak.com/QhS33472le/ http://austinaaron.com/eCjH94174LaN/ http://campanus.cz/N6571iwA/ http://carolsgardeninn.com/vX94098JvVJ/ http://cdoprojectgraduation.com/eaSz15612O/ ... http://www.alfredomartinez.com.mx/Afz3999lDtz/ http://www.kreodesign.pl/test/O77405ccSC/ http://www.prodzakaz.com.ua/H27560xzwsS/ http://www.voloskof.net/Sn83160EngQs/ http://zonasacra.com/zH83293YizhQ/

This next one follows the same pattern I identified for Group 15: 1-5 alphanumeric, 4-5 digits, 1-5 alphanumeric. I'll just update Group 15 and see how it fairs in the FP check, but for what it's worth, it does match every single entry in this category which had 30+.

OLD: ^http:\/\/([^\x2F]+\/)+[A-Z]{1}[0-9]{5}[A-Z]{1}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[A-Za-z]{1,4}[0-9]{4,5}[a-zA-Z]{1,5}\/$

Refinement

Now that everything is clustered together, I'll do one final visual inspection to see if any other patterns jump out that allow us to tighten the rules up and avoid FP's.

[Group 01] - [ t5wx-x064-mzdb ]

http://12back.com/dw3wz-ue164-qqv/ http://4glory.net/p7lrq-s191-iv/ http://aconai.fr/v4OZ-PR72-gtS/ http://adamkranitz.com/gqj5ijg-y250-ex/ http://allisonhibbard.com/x4b-th601-m/

In Group 1, we can actually refine this a bit once you see the underlying pattern. Almost every part of this one changed so I'll just go back over it: 1-3 alpha, 1 digit, 1-3 alpha, dash, 1-2 alpha, 2-3 digit, dash, 1-5 alpha.

OLD: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{3,7}-[a-zA-Z0-9]{3,5}-[a-zA-Z]{1,5}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-zA-Z]{1,3}[0-9]{1}[a-zA-Z]{1,3}-[a-zA-Z]{1,2}[0-9]{2,3}-[a-zA-Z]{1,5}\/$

[Group 07] - [ LUqc663BAyN333-HoO ]

http://agenity.com/EAVx829uahI723-tv/ http://argoinf.com/YFSR334KgXCe907-z/ http://artmedieval.net/RK415njzzR555-p/ http://autoradio.com.br/fRq804tvz270-tWa/ http://belief-systems.com/obn247eaC420-Z/

In Group 7, the first part of the pattern can be refined: 1-4 alphanumeric, 3 digits, 1-5 alphanumeric, 3 digits.

OLD: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{10,15}-[a-zA-Z]{1,3}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{1,4}[0-9]{3}[a-zA-Z]{1,5}[0-9]{3}-[a-zA-Z]{1,3}\/$

[Group 08] - [ NUDA-X-52454-DE ]

http://altius.co.in/EJZB-T-66361-DE/ http://anjep.com/TBWEV-YCAP-91327-DE/ http://aquarthe.com/AIUO-P-70826-DE/ http://bitach.com/RIJW-FNFE-86299-DE/ http://cliftonsecurities.co.uk/YJTX-NMO-51102-DE/

In Group 8 they all end with "DE" so I'll convert that part to a static string.

OLD: ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-[A-Z]{2}\/$ NEW: ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-DE\/$

[Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]

http://archabits.com/ORDER.-AXN-60-X400251/ http://arrosio.com.ar/ORDER.-Document-SF-41-F318806/ http://arroyave.net/Rech-K-682-GO1130/ http://avenueevents.co.uk/Cust-PBP-03-D683320/ http://babyo.com.mx/Cust-Document-KEQ-04-FF065857/

In Group 9, every entry entry ends with 1-3 alpha(upper) followed by 4-6 digits.

OLD: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z0-9]{6,10}\/$ NEW: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z]{1,3}[0-9]{4,6}\/$

The final run for the PCRE's before FP testing.

[+] FOUND [+] Count: 127/696 Comment: [Group 01] - [ t5wx-x064-mzdb ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z]{1,3}[0-9]{1}[a-zA-Z]{1,3}-[a-zA-Z]{1,2}[0-9]{2,3}-[a-zA-Z]{1,5}\/$ [+] FOUND [+] Count: 80/696 Comment: [Group 02] - [ INVOICE-864339-98261 ] PCRE: ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\/$ [+] FOUND [+] Count: 29/696 (changed to new pattern) Comment: [Group 03] - [ dhl/paket/com/pkp/appmanager/8376315127 ] PCRE: ^http:\/\/([^\x2F]+\/)+dhl\/paket\/com\/pkp\/appmanager\/[0-9]{10}\/$ [+] FOUND [+] Count: 56/696 Comment: [Group 04] - [ RRT-13279129.dokument ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{2,3}-[0-9]{8}\.dokument\/$ [+] FOUND [+] Count: 86/696 Comment: [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\/$ [+] FOUND [+] Count: 62/696 Comment: [Group 06] - [ download2467 ] PCRE: ^http:\/\/([^\x2F]+\/)+download[0-9]{4}\/$ [+] FOUND [+] Count: 47/696 Comment: [Group 07] - [ LUqc663BAyN333-HoO ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{1,4}[0-9]{3}[a-zA-Z]{1,5}[0-9]{3}-[a-zA-Z]{1,3}\/$ [+] FOUND [+] Count: 14/696 Comment: [Group 08] - [ NUDA-X-52454-DE ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-DE\/$ [+] FOUND [+] Count: 38/696 Comment: [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ] PCRE: ^http:\/\/([^\x2F]+\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z]{1,3}[0-9]{4,6}\/$ [+] FOUND [+] Count: 11/696 Comment: [Group 10] - [ zp3x-r88-wuh.view ] PCRE: ^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,5}-[a-z0-9]{2,4}(-[a-z0-9]{4,5})?-[a-z]{1,4}\.(view|doc)\/$ [+] FOUND [+] Count: 3/696 Comment: [Group 11] - [ dhl___status___2668292851 ] PCRE: ^http:\/\/([^\x2F]+\/)+dhl___status___[0-9]{10}\/$ [+] FOUND [+] Count: 35/696 Comment: [Group 12] - [ ORDER.-5883789520 ] PCRE: ^http:\/\/([^\x2F]+\/)+(ORDER|Rech|CUST|Cust|gescanntes|Scan)(.)?(-Document|-Dokument)?-[0-9]{10,11}\/$ [+] FOUND [+] Count: 15/696 Comment: [Group 13] - [ 6572646300 ] PCRE: ^http:\/\/([^\x2F]+\/)+(?<!appmanager\/)[0-9]{10,11}\/$ [+] FOUND [+] Count: 3/696 Comment: [Group 14] [ SCANNED/RZ7498WEXEZB ] PCRE: ^http:\/\/([^\x2F]+\/)+SCANNED\/[A-Z]{2,3}[0-9]{4}[A-Z]{6,9}\/$ [+] FOUND [+] Count: 60/696 Comment: [Group 15] [ K44975X ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Za-z]{1,4}[0-9]{4,5}[a-zA-Z]{1,5}\/$

That leaves only 30 URL's that I was unable to reliably match - not too shabby! You can find the output of the pcre_check script showing the matches and non-matches HERE.

The current PCRE list is below.

^http:\/\/([^\x2F]+\/)+[a-zA-Z]{1,3}[0-9]{1}[a-zA-Z]{1,3}-[a-zA-Z]{1,2}[0-9]{2,3}-[a-zA-Z]{1,5}\/$ [Group 01] - [ t5wx-x064-mzdb ] ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\/$ [Group 02] - [ INVOICE-864339-98261 ] ^http:\/\/([^\x2F]+\/)+dhl\/paket\/com\/pkp\/appmanager\/[0-9]{10}\/$ [Group 03] - [ dhl/paket/com/pkp/appmanager/8376315127 ] ^http:\/\/([^\x2F]+\/)+[A-Z]{2,3}-[0-9]{8}\.dokument\/$ [Group 04] - [ RRT-13279129.dokument ] ^http:\/\/([^\x2F]+\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\/$ [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ] ^http:\/\/([^\x2F]+\/)+download[0-9]{4}\/$ [Group 06] - [ download2467 ] ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{1,4}[0-9]{3}[a-zA-Z]{1,5}[0-9]{3}-[a-zA-Z]{1,3}\/$ [Group 07] - [ LUqc663BAyN333-HoO ] ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-DE\/$ [Group 08] - [ NUDA-X-52454-DE ] ^http:\/\/([^\x2F]+\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z]{1,3}[0-9]{4,6}\/$ [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ] ^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,5}-[a-z0-9]{2,4}(-[a-z0-9]{4,5})?-[a-z]{1,4}\.(view|doc)\/$ [Group 10] - [ zp3x-r88-wuh.view ] ^http:\/\/([^\x2F]+\/)+dhl___status___[0-9]{10}\/$ [Group 11] - [ dhl___status___2668292851 ] ^http:\/\/([^\x2F]+\/)+(ORDER|Rech|CUST|Cust|gescanntes|Scan)(.)?(-Document|-Dokument)?-[0-9]{10,11}\/$ [Group 12] - [ ORDER.-5883789520 ] ^http:\/\/([^\x2F]+\/)+(?<!appmanager\/)[0-9]{10,11}\/$ [Group 13] - [ 6572646300 ] ^http:\/\/([^\x2F]+\/)+SCANNED\/[A-Z]{2,3}[0-9]{4}[A-Z]{6,9}\/$ [Group 14] [ SCANNED/RZ7498WEXEZB ] ^http:\/\/([^\x2F]+\/)+[A-Za-z]{1,4}[0-9]{4,5}[a-zA-Z]{1,5}\/$ [Group 15] [ K44975X ]

Rule Vetting

The last step is to check the PCRE's against a corpus of random URL's and see if they appear strict enough in their matching to be used in a production environment. This is critical if you plan to use them for blocking instead of just identification. I can't stress enough how important this phase is; while it's nice to be alerted on access to one of these URL's, it's solid gold if you can prevent attacks and C2 from happening in the first place. Of course, with any blocking action, the caveat is that one wrong block could spell disaster so these need to be as close to perfect as possible.

Ideally, you want to test against a large amount of URL's from your own environment that most closely resemble what traffic your users generate. Unfortunately that's not always possible, or you don't have users, so you need to either build your own corpus or find someone who can test the PCRE's for you.

There isn't much online in the way of random URL lists or logs but I've put together a few possible methods one could try to compile a fairly random set of URL's, and then I'll detail my preferred method.

The Twitter option works nicely and can generate hundreds of thousands of unique URL's per day. Given enough time, you'll have a solid base to test your PCRE's against.

To do this, you need to register an app with Twitter and get your API keys. Once you have those, I've included a Python script, twitter_scraper that you can input them into and run in a continous loop with a one-liner like the below.

while true; do sleep 5; python twitter_scraper.py >> twitter_urls; done

I've also included 2 million URL's on GitHub, which is just under the 25MB file limit compressed. These are ones that I've scraped in the past few days and should help you get started.

Typically I'll check this every so often and filter out things like URL shortening services or other sites that, for one reason or another, have bubbled up to the top of my domain list. This keeps it filled with fairly unique sites and helps improve entropy.

Below is a GIF of the sites streaming by in real time, showing some of the variety.

Once we have our list, we can run pcre_check against the URL's and see how our PCRE's fare.

$ python pcre_check.py -u twitter_urls -p emotet_pcres -s [+] FOUND [+] Count: 1290/2000000 Comment: [Group 13] - [ 6572646300 ] PCRE: ^http:\/\/([^\x2F]+\/)+(?<!appmanager\/)[0-9]{10,11}\/$ [-] MATCH [-] http://db.netkeiba.com/horse/1985105175/ ... http://www.northernminer.com/news/lukas-lundin-copper-commodity-choice/1003786598/ http://www.oita-trinita.co.jp/news/20170532318/ ... http://www.schuh.co.uk/womens/irregular-choice-x-disney-how-do-i-look?-pink-flat-shoes/1364153360/ ... http://www.yutaro-miura.com/info/event/2017/0528100324/ http://yapi.ta2o.net/maseli/2017052901/ [+] FOUND [+] Count: 2595/2000000 Comment: [Group 15] [ K44975X ] PCRE: ^http:\/\/([^\x2F]+\/)+[A-Za-z]{1,4}[0-9]{4,5}[a-zA-Z]{1,5}\/$ [-] MATCH [-] http://epcaf.com/c2805tw/ ... http://hobbyostrov.ru/automodels/electro-monster-1-10/tra3602g/ ... http://monipla.jp/mfpa/card2017ss/ http://ncode.syosetu.com/N0588Q/ ... http://www.nollieskateboarding.com/fs5050grind/ http://www.profootballweekly.com/2017/05/30/victor-cruz-prepared-to-produce-and-mentor-in-chicago-bears-transitioning-wr-corps/a4613p/

Using the "-s" (show matches) flag in pcre_check will allow you to manually review the false positives. If the sites don't look legitimate or match a little too perfectly, you'll want to do a little manual research to make sure they are in fact FP's and not true positives you didn't know about. I've truncated the results but above shows a few under each to give you an idea of the kind of output I'm looking for to conclude it's not up-to-par.

As you can see, Group 13 and 15 have numerous false-positives. This isn't surprising given Group 13 is simply 10 digits and Group 15 is a small range of alpha, digits, alpha, which continued to repeat itself throughout my analysis.

Additionally, I sent these PCRE's to some fellow miscreant punchers who ran them through over billions of URL's from their environment and received similar output with FP's only for Group 13 and 15.

The last check I'll perform for this set is to remove the trailing forward slash ("/") that was included in the PCRE's. The reason for this is that, while my Emotet seed list all included the forward slash, the URL's I'm scraping may not have it and I just want to try to further identify any potential issues.

$ python pcre_check.py -p emotet_pcres_mod -u twitter_urls

Nadda. Fantastic!

Wrapping-up

All in all, 13 total PCRE's make the cut and cover the seen Emotet download URL's. These will provide good historical forensic capability and good passive blocking for future victims of these campaigns.

With that, the below is the final list for publishing and available on GitHub, along with all of the above iterations.

^http:\/\/([^\x2F]+\/)+[a-zA-Z]{1,3}[0-9]{1}[a-zA-Z]{1,3}-[a-zA-Z]{1,2}[0-9]{2,3}-[a-zA-Z]{1,5}\/$ karttoon 31MAY2017 - Emotet download - [ t5wx-x064-mzdb ] ^http:\/\/([^\x2F]+\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\/$ karttoon 31MAY2017 - Emotet download - [ INVOICE-864339-98261 ] ^http:\/\/([^\x2F]+\/)+dhl\/paket\/com\/pkp\/appmanager\/[0-9]{10}\/$ karttoon 31MAY2017 - Emotet download - [ dhl/paket/com/pkp/appmanager/8376315127 ] ^http:\/\/([^\x2F]+\/)+[A-Z]{2,3}-[0-9]{8}\.dokument\/$ karttoon 31MAY2017 - Emotet download - [ RRT-13279129.dokument ] ^http:\/\/([^\x2F]+\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\/$ karttoon 31MAY2017 - Emotet download - [ EDHFR-08-77623-document-May-04-2017 ] ^http:\/\/([^\x2F]+\/)+download[0-9]{4}\/$ karttoon 31MAY2017 - Emotet download - [ download2467 ] ^http:\/\/([^\x2F]+\/)+[a-zA-Z0-9]{1,4}[0-9]{3}[a-zA-Z]{1,5}[0-9]{3}-[a-zA-Z]{1,3}\/$ karttoon 31MAY2017 - Emotet download - [ LUqc663BAyN333-HoO ] ^http:\/\/([^\x2F]+\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-DE\/$ karttoon 31MAY2017 - Emotet download - [ NUDA-X-52454-DE ] ^http:\/\/([^\x2F]+\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z]{1,3}[0-9]{4,6}\/$ karttoon 31MAY2017 - Emotet download - [ CUST.-Document-YDI-04-GQ389557 ] ^http:\/\/([^\x2F]+\/)+[a-z0-9]{1,5}-[a-z0-9]{2,4}(-[a-z0-9]{4,5})?-[a-z]{1,4}\.(view|doc)\/$ karttoon 31MAY2017 - Emotet download - [ zp3x-r88-wuh.view ] ^http:\/\/([^\x2F]+\/)+dhl___status___[0-9]{10}\/$ karttoon 31MAY2017 - Emotet download - [ dhl___status___2668292851 ] ^http:\/\/([^\x2F]+\/)+(ORDER|Rech|CUST|Cust|gescanntes|Scan)(.)?(-Document|-Dokument)?-[0-9]{10,11}\/$ karttoon 31MAY2017 - Emotet download - [ ORDER.-5883789520 ] ^http:\/\/([^\x2F]+\/)+SCANNED\/[A-Z]{2,3}[0-9]{4}[A-Z]{6,9}\/$ karttoon 31MAY2017 - Emotet download [ SCANNED/RZ7498WEXEZB ]

Hopefully this was helpful to some and demonstrated the ease in which these can be created to identify malicious patterns.

The more the merrier in the sharing community!

Ciao!




Older posts...