Automating Pikabot’s String Deobfuscation
Introduction
Pikabot is a malware loader that first emerged in early 2023. One of its prominent features was the code obfuscation it used to evade detection and thwart technical analysis. Pikabot used this obfuscation method to encrypt binary strings, including the addresses of its command-and-control (C2) servers.
In this article, we briefly describe the obfuscation method used by Pikabot and we present an IDA plugin (with source code) that we developed to assist in our binary analysis.
As mentioned in our previous article, the obfuscation method was removed when Pikabot reemerged with a new version in early 2024. As of April 2024, this obfuscation method has not been used again in any Pikabot samples.
Key Takeaways
- Pikabot is a malware loader that was first observed in early 2023 and became very active following the takedown of Qakbot in August 2023.
- Previous versions of Pikabot encrypted strings using a combination of the AES-CBC and RC4 algorithms. Newer versions have replaced these advanced techniques with simpler algorithms.
- The string obfuscation’s implementation is similar to ADVobfuscator.
- In this article, we describe the binary strings’ obfuscation algorithm and our approach to decrypt the binary strings using IDA’s microcode.
- Zscaler ThreatLabz developed an IDA plugin to automatically decrypt Pikabot’s obfuscated strings and is releasing the source code.
Technical Analysis
Strings obfuscation
The steps for decrypting a Pikabot string are relatively simple. Each string is decrypted only when required (in other words, Pikabot does not decrypt all strings at once). Pikabot follows the steps below to decrypt a string:
- Pushes the encrypted string array onto the stack.
- Initializes the RC4 encryption algorithm. The RC4 key is different for each string (with very few exceptions).
- Decrypts the array with RC4, replaces all instances of the character ‘_’ (underscore) in the output with ‘=’ (equals sign), decodes the result using Base64, and decrypts it using the AES-CBC algorithm. The AES key and initialization vector (IV) are the same for all strings.
ANALYST NOTE: Some strings are encrypted only with the RC4 algorithm.
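The steps above can be sketched in a few lines of Python. This is only an illustration of the described order of operations, not code taken from Pikabot or from our plugin. The names rc4 and decrypt_string are hypothetical, and because the Python standard library has no AES implementation, the AES-CBC step is passed in as a caller-supplied function (aes_cbc_decrypt, also a hypothetical name) that is assumed to already know the sample’s key and IV.

```python
import base64
from typing import Callable, Optional


def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def decrypt_string(
    encrypted: bytes,
    rc4_key: bytes,
    aes_cbc_decrypt: Optional[Callable[[bytes], bytes]] = None,
) -> bytes:
    """Apply the RC4 -> Base64 -> AES-CBC chain described in the text.

    aes_cbc_decrypt is a hypothetical callback supplied by the caller.
    When it is None, the string is treated as RC4-only.
    """
    stage1 = rc4(rc4_key, encrypted)
    if aes_cbc_decrypt is None:
        return stage1
    # Pikabot swaps the Base64 padding character for an underscore.
    b64_text = stage1.replace(b"_", b"=")
    return aes_cbc_decrypt(base64.b64decode(b64_text))
```

For an RC4-only string, decrypt_string(encrypted, key) returns the plaintext directly. For the remaining strings, the caller would pass a function that wraps an AES-CBC implementation of their choice.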
Figure 1 shows the code used to decrypt the string Kernel32.dll.
Figure 1: Example Pikabot string decryption for Kernel32.dll.
Figure 2 shows the function that first decrypts the AES key and IV. The RC4 decrypted string passed to the function is then Base64 decoded, and is finally decrypted using AES.
Figure 2: Pikabot Base64 decoding and AES decryption function.
Decrypting Pikabot strings
The following information is required to decrypt a Pikabot string:
- The AES key and IV of a binary sample.
- The RC4 encrypted array of each string.
- The RC4 key of each encrypted string.
- The string’s size.
Our approach relies on IDA’s microcode. This decision helped us with several problems such as:
- IDA’s microcode converts the assignment/copy of the RC4 key into a strcpy function call. At the assembly level, this could be either multiple mov instructions or rep instructions, which would make detection and extraction more challenging.
- Extracting the RC4 encrypted array. Since IDA reconstructs the stack, it is much easier to search for and extract the encrypted array.
IDA’s microcode brings its own limitations (for example, decompilation failure for a function), but no such issues were encountered for the parts of the code we wanted to analyze.
In the sections below, we describe how each component was extracted.
Extracting the AES key/IV
To extract the AES key and IV, we iterate over all analyzed functions and discard any function whose size is not between 600 and 1,600 bytes.
Next, we scan the functions for the following patterns:
- Existence of RC4 encryption. This is the same heuristic we use for detecting RC4 encrypted strings.
- Existence of the values 0x3D and 0x5F (used before Base64 decoding the string) that are used with the microcode opcodes m_stx and m_jnz respectively.
Lastly, if all of the patterns above match, then the handler for decrypting a Pikabot string is invoked. For the classification of the key and the IV, we apply the following checks:
- The number of decrypted strings from the identified function must be two. Otherwise, the identified function is incorrect.
- The longest string is marked as the AES key (by taking the first 32 bytes) and the remaining decrypted string as the IV (by taking the first 16 bytes).
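The size filter and the key/IV classification can be illustrated as follows. This is a minimal sketch rather than the plugin’s actual code: the function names are hypothetical, the size bounds are assumed to be inclusive, and the decrypted strings are assumed to arrive as Python bytes objects.

```python
from typing import Optional, Sequence, Tuple

MIN_FUNC_SIZE = 600   # bytes, lower bound from the text (assumed inclusive)
MAX_FUNC_SIZE = 1600  # bytes, upper bound from the text (assumed inclusive)
AES_KEY_LEN = 32
AES_IV_LEN = 16


def is_candidate_function(size: int) -> bool:
    """Keep only functions whose size falls inside the expected range."""
    return MIN_FUNC_SIZE <= size <= MAX_FUNC_SIZE


def classify_key_iv(decrypted: Sequence[bytes]) -> Optional[Tuple[bytes, bytes]]:
    """Return (key, iv), or None if the function did not yield two strings."""
    if len(decrypted) != 2:
        return None  # the identified function is incorrect
    longest, other = sorted(decrypted, key=len, reverse=True)
    return longest[:AES_KEY_LEN], other[:AES_IV_LEN]
```

A function that yields any number of decrypted strings other than two is rejected, which mirrors the first check in the list above.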
Extracting the RC4 encrypted array
Pikabot constructs the RC4 encrypted array by pushing it onto the stack and then decrypting it. Our approach involves the following steps for detecting each part of the array:
- Use the detected RC4 encryption block address as a starting point.
- Search for the microcode opcode m_add in the decryption instruction. The detected microcode holds the starting stack offset of the encrypted array.
- Iterate backwards and search for the microcode opcodes m_mov and m_call. The second opcode is used in case the data is copied via a strcpy or memcpy call. If the stack offset matches, then we save the data and update the stack offset. This process is repeated until the reconstructed encrypted array has the expected size.
Extracting the RC4 encrypted array size
The length of the encrypted array is extracted in a similar way to the encrypted array itself. The detection pattern is:
- Use the detected RC4 encryption block address as a starting point.
- Search for the microcode opcodes m_jb, m_jae, and m_setb, and use the immediate constant in the instruction as the size.
Extracting the RC4 key
Extracting the RC4 key of each string proved to be the most challenging part of creating the plugin. In our first attempt, we extracted the RC4 key after detecting the initialization of the RC4 algorithm. However, this approach had the following issues:
- Incorrect extraction of the RC4 key: In many cases, an invalid/junk string was placed in-between the correct RC4 key and the RC4 algorithm initialization.
- Incorrect detection of RC4 initialization code block: For example, if the size of the encrypted array was 256 bytes then an incorrect RC4 key would be detected.
Instead of trying to detect the RC4 key by detecting the initialization of the RC4 algorithm, we decided to extract all strings from each targeted function. Then, we decrypted the RC4 encrypted array with each extracted RC4 key and validated the decrypted output by applying the following checks:
- If it matches the expected string size.
- If all characters of the string are readable.
ANALYST NOTE: After a successful decryption, the RC4 key is marked and not reused in order to limit false positives, for example when an output decrypted with the wrong key happens to contain no junk characters.
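The trial-decryption approach can be sketched as follows. This is an illustration under stated assumptions, not the plugin’s implementation: the function names are hypothetical, “readable” is interpreted as printable ASCII, and the size check is applied to the output up to its first NUL byte.

```python
from typing import Iterable, Optional, Set, Tuple


def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def is_readable(data: bytes) -> bool:
    """Assumption: readable means non-empty printable ASCII."""
    return len(data) > 0 and all(0x20 <= b < 0x7F for b in data)


def find_rc4_key(
    encrypted: bytes,
    expected_size: int,
    candidates: Iterable[bytes],
    used_keys: Set[bytes],
) -> Optional[Tuple[bytes, bytes]]:
    """Try every string found in the function as an RC4 key."""
    for key in candidates:
        if not key or key in used_keys:
            continue  # skip empty strings and keys already consumed
        plain = rc4(key, encrypted).split(b"\x00")[0]
        if len(plain) == expected_size and is_readable(plain):
            used_keys.add(key)  # mark the key so it is not reused
            return key, plain
    return None
```

Sharing one used_keys set across all encrypted arrays of a function reproduces the behavior described in the note above: once a key has decrypted one array, it is never tried against another.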
IDA Plugin
We tested our Pikabot plugin with IDA versions 8 and newer. To use the plugin, compile the source code using IDA's SDK and copy the generated DLL into the IDA plugins folder. After a Pikabot sample is loaded, the user can decompile a function, right-click in the decompiled output, and choose to decrypt strings either in the current function or in all functions (Figure 3).
Figure 3: IDA Pikabot plugin options.
For each decrypted string, the plugin sets a comment in the decompiled output. Figure 4 shows a function with the obfuscated strings before the plugin is invoked.
Figure 4: Before running the Pikabot string decryption plugin.
Figure 5 shows the output after our Pikabot IDA plugin is executed.
Figure 5: Output after running the Pikabot string decryption plugin.
Source Code
The source code for our IDA plugin to deobfuscate Pikabot strings can be found at this GitHub repository.
Conclusion
Older Pikabot variants include a string obfuscation implementation, which can make automation a complicated task. By using IDA’s microcode and developing our own plugin, we were able to speed up our analysis in most cases. Since this technique is no longer used by Pikabot, we decided to open source our IDA plugin to assist the research community with defeating current and future stack-based obfuscation techniques.
Zscaler Coverage
In addition to sandbox detections, Zscaler’s multilayered cloud security platform detects indicators related to Pikabot at various levels with the following threat names:
Indicators Of Compromise (IOCs)
The following samples were used for testing the plugin.
SHA256 | DESCRIPTION |
aebff5134e07a1586b911271a49702c8623b8ac8da2c135d4d3b0145a826f507 | Pikabot Sample |
4c53383c1088c069573f918c0f99fe30fa2dc9e28e800d33c4d212a5e4d36839 | Pikabot Sample |
15e4de42f49ea4041e4063b991ddfc6523184310f03e645c17710b370ee75347 | Pikabot Sample |
e97fd71f076a7724e665873752c68d7a12b1b0c796bc7b9d9924ec3d49561272 | Pikabot Sample |
a9f0c978cc851959773b90d90921527dbf48977b9354b8baf024d16fc72eae01 | Pikabot Sample |
1c125a10c33d862e6179b6827131e1aac587d23f1b7be0dbcb32571d70e34de4 | Pikabot Sample |
62f2adbc73cbdde282ae3749aa63c2bc9c5ded8888f23160801db2db851cde8f | Pikabot Sample |
b178620d56a927672654ce2df9ec82522a2eeb81dd3cde7e1003123e794b7116 | Pikabot Sample |
72f1a5476a845ea02344c9b7edecfe399f64b52409229edaf856fcb9535e3242 | Pikabot Sample |
Acknowledgments
The following projects were the initial inspiration for developing our plugin. In addition, they assisted with the usage of IDA’s SDK:
- HexRaysDeob - by Rolf Rolles
- Goomba - by Hex-Rays