The term malware is a contraction of “malicious software”. It designates any program specifically designed or modified to damage a computer system. Examples include keyloggers, trojan horses, rogue security software, ransomware and computer worms.
Since the appearance of the first computer worms, malware has continued to transform and diversify. With the growing variety of threats and their ever-evolving sophistication, today’s protection solutions no longer fulfill their mission properly.
There are three main methods for detecting malware: the static method, which consists of analyzing the malware code by disassembling or decompiling it; the dynamic method, which consists of analyzing the behavior of the malware; and the newer approach, which relies on machine learning techniques.
Static analysis, or disassembly (reverse engineering), consists of opening an executable file with a tool such as IDA Pro in order to find and analyze the assembly code corresponding to the binary content of the file. Some advanced tools even offer decompilation functions that reconstruct an approximation of the high-level source code used initially (usually C, C++ or C#).
By analyzing the code, it is thus possible to study the behavior of each part of the program and to deduce its functionalities.
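As a small illustration of what static inspection can surface without running a file, the sketch below extracts printable strings from a byte buffer and measures its Shannon entropy (high entropy often hints at packed or encoded content). It is far simpler than a disassembler such as IDA Pro; the function names and the sample bytes are assumptions made up for this example.

```python
import math
import re
from collections import Counter

def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII characters that are at least min_len long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, up to 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical buffer standing in for the content of an executable file.
blob = b"\x00\x01MZ\x90\x00cmd.exe /c whoami\x00\xff\xfehttp://example.invalid/a\x00ab\x00"
print(printable_strings(blob))
print(round(shannon_entropy(blob), 2))
```

On a real sample, the same functions would be applied to the bytes read from the file; embedded commands, URLs or unusually high entropy are only starting points for the manual analysis described above.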
Unlike static analysis, dynamic or behavioral analysis involves executing malware in a controlled environment in order to observe its actions and infer its functionalities. Compared to disassembly, this method has many advantages:
- It’s much faster. In a few minutes, it is already possible to get an overview of the main features of the malware.
- It does not require knowledge of assembly language or programming.
- It is not sensitive to malware protection techniques (obfuscation, blending, etc.).
On the other hand, this approach is not effective in every case. Some features of the malicious code require special conditions to be executed, so the analysis can miss them. The malware may also contain functionality to detect that it is running in a controlled environment, in which case it will not execute its malicious behavior. Besides, dynamic analysis does not always give access to sensitive information contained in the malware.
This leads us to constantly explore new ways to defeat malware. The newest analysis approach is based on machine learning (ML) techniques, and many vendors now rely on ML algorithms to detect malware. These techniques have proven very useful for cybersecurity in general and network protection in particular, because learning algorithms can automate several complex tasks that previously required significant time. Models have been created to detect phishing emails and spam, real-time threat research platforms have been designed, malware detectors have been developed and botnet traffic datasets have been built. In the following, we tackle this approach for malware detection.
During several analyses carried out to help our clients evaluate their antivirus solutions, we observed the limitations of tools that rely on a single analysis technique for detection.
An evaluative injection was performed by infecting the target with a reverse shell executable. The executable, compiled from custom C code, was detected only as a low-risk executable by the EDR (Endpoint Detection & Response) solution, which relies on dynamic and machine learning analysis techniques.
It is important to note that the custom-developed binary is based on the following:
1. Payload consisting of well-known Meterpreter reverse shell byte code
2. Payload custom encoding
3. Payload execution through “VALLOC” (Virtual memory allocation)
N.B.: the malicious communication channel established between the backdoor and the C&C server uses unencrypted (cleartext) protocols and was not detected by the EDR.
It was noted that the EDR detected the child processes spawned by the malicious file.
The EDR mapped the parent/child processes that were created to their corresponding Windows executables (cmd.exe, ipconfig.exe, powershell.exe, etc.).
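To make the idea of parent/child process mapping concrete, here is a minimal sketch that rebuilds a process tree from (pid, parent pid, image name) records and lists everything spawned by one process. The records, the name payload.exe and the helper functions are all invented for illustration; this is not the output or the internal logic of the EDR that was tested.

```python
from collections import defaultdict

# Hypothetical process records: (pid, parent pid, image name).
EVENTS = [
    (100, 1, "explorer.exe"),
    (200, 100, "payload.exe"),
    (210, 200, "cmd.exe"),
    (220, 210, "ipconfig.exe"),
    (230, 200, "powershell.exe"),
]

def build_tree(events):
    """Index image names by pid and child pids by parent pid."""
    names = {}
    children = defaultdict(list)
    for pid, ppid, name in events:
        names[pid] = name
        children[ppid].append(pid)
    return names, children

def descendants(pid, children):
    """All pids spawned directly or indirectly by pid, in depth-first order."""
    found = []
    for child in children.get(pid, []):
        found.append(child)
        found.extend(descendants(child, children))
    return found

names, children = build_tree(EVENTS)
spawned = [names[p] for p in descendants(200, children)]
print(spawned)
```

Seeing this lineage is only the first step: a product still has to decide whether a chain such as an unknown binary launching a shell and a script interpreter should be blocked.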
However, potentially malicious processes running scripts that leverage classic TTPs (tactics, techniques and procedures) were neither blocked nor removed by the EDR.
We managed to maintain access and perform many operations, such as initializing a shell session, executing a PowerShell script, analyzing the internal network through a compromised workstation and extracting host information, without being detected or blocked by the EDR solution.
We have gone a step further in this regard. Several tests were conducted to analyze input files to establish their malicious or harmless nature based on machine learning models.
The first model we used is Random Forest, which is built from decision trees. A decision tree classifies a sample by asking a sequence of questions that are generated automatically during training: each node in the tree asks one question about a feature of the sample, and the answers lead to a verdict of malware or not. The Random Forest algorithm combines multiple decision trees, each trained with different questions. Each tree is trained on a partial set of samples chosen at random, and the features considered for each set are also selected randomly. A sample is evaluated by every tree, and the algorithm decides whether the binary is malicious based on the response of the majority of the trees.
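The voting mechanism described above can be sketched in a few lines of standard-library Python. This is a deliberately reduced toy, not the model used in our tests: each "tree" asks a single question (is one randomly chosen feature above a threshold?), the two features and all training values are invented, and degenerate bootstrap samples are simply redrawn.

```python
import random
from statistics import mean

# Hypothetical training set: (feature vector, label), 1 = malware, 0 = benign.
# The two features are made-up scores between 0 and 1.
TRAIN = [
    ((0.10, 0.15), 0), ((0.20, 0.10), 0), ((0.30, 0.25), 0), ((0.15, 0.30), 0),
    ((0.70, 0.85), 1), ((0.80, 0.75), 1), ((0.90, 0.70), 1), ((0.75, 0.90), 1),
]

def train_stump(samples, rng):
    """Train a one-question tree on a randomly selected feature."""
    feature = rng.randrange(len(samples[0][0]))
    benign = [x[feature] for x, y in samples if y == 0]
    malicious = [x[feature] for x, y in samples if y == 1]
    # Question asked by this tree: "is the feature above the midpoint of the class means?"
    # (assumes malware scores higher on every feature, which holds for this toy data)
    threshold = (mean(benign) + mean(malicious)) / 2
    return feature, threshold

def train_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    while len(forest) < n_trees:
        boot = [rng.choice(data) for _ in data]  # random subset, drawn with replacement
        if len({y for _, y in boot}) < 2:
            continue  # toy simplification: redraw if one class is missing
        forest.append(train_stump(boot, rng))
    return forest

def malicious_vote_share(forest, x):
    """Fraction of trees that answer 'malicious' for sample x."""
    votes = [1 if x[feature] > threshold else 0 for feature, threshold in forest]
    return sum(votes) / len(votes)

def is_malicious(forest, x):
    """Majority vote across all trees."""
    return malicious_vote_share(forest, x) > 0.5

forest = train_forest(TRAIN)
print(is_malicious(forest, (0.85, 0.80)))
print(is_malicious(forest, (0.20, 0.20)))
```

Real implementations grow much deeper trees and choose their questions with impurity measures, but the final step is the same: count the votes and report the majority.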
The test reported the binary as benign (62.5% malicious).
Other ML algorithms were also tested. The malware was submitted to several models, including Random Forest, Logistic Regression, Naive Bayes, Support Vector Machines, k-nearest neighbors and neural networks. However, none of these algorithms classified the binary as malicious with certainty.
This means that the promise of a perfect AI product may not be fulfilled for antiviral detection, and vendors will need to include multiple types of detection together.