Key Points
-
DanaBot is a malware-as-a-service platform discovered in 2018 that is designed to steal sensitive information that may be used for wire fraud, conduct cryptocurrency theft, or perform espionage related activities
- The malware is heavily obfuscated which makes it very difficult and time consuming to reverse engineer and analyze
- Zscaler ThreatLabz has reverse engineered the various obfuscation techniques used by DanaBot and developed a set of tools using IDA Python scripts to assist with binary analysis
DanaBot, first discovered in 2018, is a malware-as-a-service platform that threat actors use to steal usernames, passwords, session cookies, account numbers, and other personally identifiable information (PII). The threat actors may use this stolen information to commit banking fraud, steal cryptocurrency, or sell access to other threat actors.
While DanaBot isn’t as prominent as it once was, the malware is still a formidable and active threat. Recently, version 2646 of the malware was spotted in the wild and also a researcher tweeted screenshots of Danabot’s advertisement website shown in Figure 1.
Figure 1: DanaBot’s advertisement website
Unfortunately, the DanBot developers have done a very good job of obfuscating the malware code. Therefore, it is very difficult and time consuming process to to reverse engineer and analyze. This is a companion blog post to a set of IDA Python scripts that Zscaler ThreatLabz is releasing on our Github page. The goal of the scripts is to help peel away some of the layers of DanaBot’s obfuscations and inspire additional research into not only the obfuscation techniques, but the malware itself.
Technical Analysis
The following sections summarize the numerous techniques that the DanaBot developers have implemented to obfuscate the malware binary code.
Junk Byte Jumps
One of the first anti-analysis techniques that DanaBot employs is a “junk byte jump” instruction. This is an anti-disassembly technique where a jump instruction will always jump over a junk byte. The junk byte is skipped during normal program execution, but causes IDA Pro to display an incorrect disassembly. An example of this technique is shown in Figure 2:
Figure 2: An example of a junk byte jump
The 01_junk_byte_jump.py IDA Python script searches for junk byte jump patterns and patches them with NOP instructions. This operation fixes IDA Pro’s disassembly as shown in Figure 3.
Figure 3: An example of a patched junk byte jump
Dynamic Returns
The next anti-analysis technique is a “dynamic return” operation. This technique calculates a new return address at the end of a function, causing a change in the program’s control flow. In DanaBot’s implementation, they are used to “extend” a function–exposing additional hidden code. An example of a dynamic return is shown in Figure 4.
Figure 4: An example of a dynamic return
Using the 02_dynamic_return.py IDA Python script, these dynamic returns can be patched, the functions extended, and the hidden code exposed. An example of this is shown in Figure 5.
Figure 5: An example of a patched dynamic return
Stack String Deobfuscation Preparation and Code Re-analysis
Before moving on to additional DanaBot anti-analysis techniques, we’ve included three IDA Python scripts:
The first two scripts are preparation steps to help with stack string deobfuscation described in a later section. The first script patches out a code pattern that is used to uppercase letters (this removes a small function basic block that interferes with stack string reconstruction) and the second script renames variables that store the letters used in stack strings.
Before running the third script, check that IDA Pro’s ”Options->Compiler…” is set to “Delphi” (see Figure 6.)
Figure 6: IDA Pro’s Compiler options
Since the previous scripts patched a lot of existing code and exposed a bunch of new code, the 05_reset_code.py script helps reset and re-analyze the modified code in IDA Pro to get a cleaner IDB database. Once the script and analysis completes, some manual clean up may be required. Our general method is:
- Search → Sequence of bytes…
- Search for the standard function prolog: 55 8B EC
- Sort by Function
- For each result without a defined function:
- Right click → Create function…
- Look for any addresses that are causing issues in the Output window
- Right click → Undefine
- Right click → Code
Junk StrAsg and StrCopy Function Calls
DanaBot adds a lot of junk code to slow down and complicate reverse engineering. One of the junk code patterns is adding extraneous StrAsg and StrCopy function calls. These functions are part of the standard Delphi library and are used to assign or copy data between variables. Figure 7 shows an example snippet of code with a number of these calls. If we trace the variable arguments we can see that they are usually assigned to themselves or a small set of other variables that aren’t used in actual malware code.
Figure 7: Example of junk StrAsg and StrCopy function calls
The IDA Python script 06_fake_UStrLAsg_and_UStrCopy.py tries to find and patch these junk calls. Figure 8 shows the result in the example from Figure 7.
Figure 8: Example of patched junk StrAsg and StrCopy function calls
Stack Strings
The next obfuscation method is DanaBot’s version of creating “stack strings”. The malware assigns letters of the alphabet to individual variables and then uses those variables, pointers to those variables, and various Delphi character/string handling functions to construct strings one character at a time. Figure 9 is an example construction of the string “wow64.dll”.
Figure 9: Example stack string of “wow64.dll”
These stack strings litter most of the malicious functions in DanaBot and very easily lead to reverse engineering fatigue. On top of that, while some of the constructed strings are used for malware purposes, most of them turn out to be junk strings. Figure 10 is a snippet of output from a script that will be introduced below. As can be seen in the figure, most of the strings are random DLL, executable, and Windows API names.
Figure 10: Example script output showing junk strings
The best way to extract these stack strings is by emulating the construction code, but due to the following reasons we experimented with another deobfuscation technique:
- There are thousands of these strings
- There are not clear start/end patterns to automatically extract the construction code
- They rely on standard Delphi functions which aren’t particularly easy to emulate
- Most of them are junk strings
- The sheer amount of construction code hinders malware analysis the most
The goal of the IDA Python scripts 07_stack_string_letters_to_last_StrCatN_call.py and 08_set_stack_string_letters_comments.py is not to extract a wholly accurate stack string, but enough of the string to determine whether the string is junk or not. After some trial and error experimentation, the scripts also try their best to remove the stack string construction code to allow for much easier analysis. If the string turns out to be legitimate, the original construction code is saved as comments so a proper extraction of the string can be done if/when needed.
Empty Loops and Junk Math Loops
After removing the junk StrAsg and StrCopy function calls and the stack strings, there will be a bunch of empty loops. The IDA Python script 09_empty_loops.py can be used to remove these loops. There will also be loops left that just contain junk math code (see Figure 11.) The IDA Python script 10_math_loops.py will remove these junk code math loops.
Figure 11: Example junk math loops
Junk Strings and Junk Global Variables
As we saw in the stack strings section above, there were a lot of DLL, executable, and Windows API name based junk strings. These junk strings exist as normal strings as well, see Figure 12 as an example.
Figure 12: Example junk strings
While we haven’t found good patterns to automatically remove references to these junk strings, the IDA Python script 11_rename_junk_variables.py renames them as “junk” to ease manual analysis.
DanaBot also adds a lot of junk code involving global variables and various math operations, see Figure 13 for an example.
Figure 13: Example junk global variable math
The IDA Python script 12_rename_junk_random_variables.py attempts to locate and rename these variables as “junk” to help with analysis.
Miscellaneous Tips and Tricks
Based on our experience reverse engineering DanaBot over the years, we have found the following miscellaneous tricks and tips to be helpful. The first is using the Interactive Delphi Reconstructor (IDR) program to export standard Delphi library function and variable names. We use Tools → MAP Generator and Tools → IDC Generator to export MAP and IDC files. While IDR creates an IDA IDC script, we don’t use it directly as it degrades the quality of the IDA Pro disassembly/decompilation. Instead, we use the scripts idr_idc_to_idapy.py and idr_map_to_idapy.py to extract the information from the generated IDC and MAP files and use the output scripts to import the naming information.
DanaBot resolves some of its Windows API functions by hash, so we use OALabs’ HashDB IDA Plugin (which recently added support for DanaBot’s hashing algorithm) to resolve the names by hash.
Finally, we make liberal use of IDA Pro’s right click → Collapse item feature to hide the remaining junk code, especially the renamed junk strings and global variables.
Before and After Example
As an overall example, Figure 14 is a screenshot for a section of DanaBot code before the deobfuscation scripts have been applied. The details of the code don’t particularly matter for this discussion, but the snippet shows DanaBot’s initialization of its 455-byte binary structure used in its initial “system information” command and control beacon.
Figure 14: Example of code before deobfuscations
Figure 15 is an example of the same code snippet after applying the deobfuscation scripts.
Figure 15: Example of code after deobfuscations
Conclusion
While there is still room for improvement, the DanaBot malware code is much easier to analyze and reason about. Expanding the scope to the entire binary, the deobfuscation techniques significantly reduce the complexity and time spent while reverse engineering the malware. We look forward to making further improvements/additions and welcome other researchers’ contributions to the existing scripts to peel away more layers of DanaBot’s obfuscation.
Zscaler Detection Status
Cloud Sandbox Detection
Indicators of Compromise
IOC |
Notes |
8c6224d9622b929e992500cb0a75025332c9cf901b3a25f48de6c87ad7b67114 |
SHA256 hash of DanaBot version 2646 main component |