Exploring GenAI in Cybersecurity: Gemini for Malware Analysis

10/07/2024
G DATA Blog

How useful are Generative AI technologies in a security context? We took the plunge and gave it a try.

In recent years, Generative AI (Gen AI) has been cause for both excitement and concern. While its potential is widely recognized in industries like healthcare and finance [1], its application in cybersecurity remains a point of debate. Can AI be trusted with the critical task of protecting our digital infrastructure? Will it enhance our defenses or expose new vulnerabilities? These questions are at the forefront of discussions among professionals. Recently, we decided to explore this by conducting an experiment with Gemini (formerly known as Bard), a Gen AI tool developed by Google. While Gemini offers free access, we opted to use its paid version, Gemini Advanced, for this experiment to see how it could assist in malware analysis, particularly focusing on executable files.

Using Gemini for Malware Analysis: Our Approach

As cybersecurity analysts, we’re always looking for ways to improve our efficiency and accuracy. When we first heard about Gemini’s capabilities [1], we were curious about how it might fit into our malware analysis process. We decided to test it out by analyzing executable and script files. The first key to getting the most out of Gemini was crafting the right prompts to direct the AI’s focus. We settled on the following prompts:

Prompts

  1. Summarize the main behaviors observed in this code.
  2. Analyze and list any API calls that are commonly associated with malware.
  3. What are the Indicators of Compromise (IOCs) related to this file?
  4. How do the behaviors of this file map to the MITRE ATT&CK framework?
  5. Extract any embedded strings that might indicate malicious intent.
  6. Compare this file to known malware samples and suggest possible classifications.
  7. Evaluate the potential impact of this file if executed on a system.
  8. Can you give us the summary for communicating the results to both technical and non-technical stakeholders.
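As a rough illustration of how the prompts above could be organized for repeated use, here is a minimal Python sketch. It only builds the request text and does not call Gemini. The function name build_request and the delimiter lines are hypothetical conventions chosen for this example, not something Gemini requires.

```python
# The eight prompts from the list above, kept verbatim.
PROMPTS = [
    "Summarize the main behaviors observed in this code.",
    "Analyze and list any API calls that are commonly associated with malware.",
    "What are the Indicators of Compromise (IOCs) related to this file?",
    "How do the behaviors of this file map to the MITRE ATT&CK framework?",
    "Extract any embedded strings that might indicate malicious intent.",
    "Compare this file to known malware samples and suggest possible classifications.",
    "Evaluate the potential impact of this file if executed on a system.",
    "Can you give us the summary for communicating the results to both "
    "technical and non-technical stakeholders.",
]


def build_request(prompt: str, decompiled_code: str) -> str:
    """Pair one prompt with the decompiled code it should be applied to.

    The BEGIN/END delimiter lines are an assumption made for this sketch.
    """
    return (
        f"{prompt}\n\n"
        "----- BEGIN DECOMPILED CODE -----\n"
        f"{decompiled_code}\n"
        "----- END DECOMPILED CODE -----"
    )


if __name__ == "__main__":
    sample = "int main(void) { return 0; }"  # placeholder, not the real sample
    for number, prompt in enumerate(PROMPTS, start=1):
        # Print only the start of each request to keep the output short.
        print(f"[{number}] {build_request(prompt, sample)[:60]}...")
```

Keeping the prompts in one list makes it easy to run the same set against several samples and compare the answers.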

Methodology

The sample used in this article is RisePro Stealer, a well-known malware executable compiled with Microsoft Visual C/C++, with an initial file size of 1,300 KB. Decompiling this executable yields 5,588 KB of code.

Preparation work: Decompiling the Files with Ghidra and IDA Pro

We began by loading our malware sample into Ghidra and IDA Pro. After decompiling the files, we had high-level code that was ready for further analysis using Gemini.
Ghidra is an open-source reverse engineering tool, originally developed by the NSA, that excels at decompiling executable files. It converts complex binary code into a more readable format, which helps us understand the file’s structure and potential behavior.
IDA Pro is a disassembler that translates machine code into more human-readable assembly language. The “Hex-Rays” Decompiler add-on extends IDA Pro's capabilities by attempting to reconstruct the assembly code as a high-level language like C, which greatly improves code readability and understanding.

Using AI prompts to enhance our analysis

With the decompiled code in hand, we turned to Gemini. Using the prompts we had developed, we asked Gemini to help us identify suspicious patterns, behaviors, and potential Indicators of Compromise (IOCs).

Summarize the main behaviors observed in this code.

This prompt provides a quick overview of the code's actions, aiding in prioritizing analysis and identifying potential threats.

Analyze and list any API calls that are commonly associated with malware.

This prompt highlights suspicious API calls, enabling analysts to focus on areas likely related to malicious activity and potentially pivot to related malware families.
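Because Gemini's answers need cross-checking, a simple local scan can serve as a sanity check on the API calls it reports. The sketch below counts call sites of a few Windows API functions in decompiled text. The API names are real, but the selection is an illustrative assumption, far from exhaustive, and none of these calls is malicious on its own. The names SUSPICIOUS_APIS and find_suspicious_calls are made up for this example.

```python
import re

# Illustrative, non-exhaustive selection of Windows APIs that often appear
# in malware. Legitimate software uses them too.
SUSPICIOUS_APIS = {
    "VirtualAllocEx": "allocates memory in another process",
    "WriteProcessMemory": "writes into another process's memory",
    "CreateRemoteThread": "starts a thread in another process",
    "IsDebuggerPresent": "checks for an attached debugger",
    "URLDownloadToFileA": "downloads a file from a URL",
    "RegSetValueExA": "writes a registry value",
}


def find_suspicious_calls(code: str) -> dict[str, int]:
    """Count direct call sites (name followed by an opening parenthesis)."""
    counts: dict[str, int] = {}
    for name in SUSPICIOUS_APIS:
        hits = re.findall(rf"\b{re.escape(name)}\s*\(", code)
        if hits:
            counts[name] = len(hits)
    return counts


if __name__ == "__main__":
    snippet = (
        "mem = VirtualAllocEx(proc, 0, size, 0x3000, 0x40);\n"
        "WriteProcessMemory(proc, mem, buf, size, 0);\n"
    )
    for api, count in find_suspicious_calls(snippet).items():
        print(f"{api}: {count} call(s), {SUSPICIOUS_APIS[api]}")
```

This only catches direct calls by name. APIs resolved dynamically at run time would not show up here, which is one reason to compare the result against Gemini's list instead of relying on either alone.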

What are the Indicators of Compromise (IOCs) related to this file?

This prompt extracts specific artifacts (strings, file paths, registry keys, network indicators) that can be used for detection and response efforts.

How do the behaviors of this file map to the MITRE ATT&CK framework?

This prompt categorizes the code's actions within the ATT&CK framework, providing a structured understanding of its tactics and techniques and enabling analysts to assess its capabilities and potential impact.
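To make the idea of such a mapping concrete, here is a small sketch of a lookup table from behavior descriptions to ATT&CK techniques. The technique IDs and names are taken from the public MITRE ATT&CK Enterprise matrix. The behavior labels are hypothetical placeholders chosen for illustration and are not findings from our sample. The function name map_behaviors is likewise made up.

```python
# Behavior labels are illustrative assumptions, not results from the sample.
ATTACK_TECHNIQUES = {
    "reads browser credential stores": ("T1555.003", "Credentials from Web Browsers"),
    "collects system information": ("T1082", "System Information Discovery"),
    "takes screenshots": ("T1113", "Screen Capture"),
    "adds a registry run key": ("T1547.001", "Registry Run Keys / Startup Folder"),
    "sends data to its c2 server": ("T1041", "Exfiltration Over C2 Channel"),
}


def map_behaviors(behaviors):
    """Split behaviors into those with a known technique and those without."""
    mapped, unmapped = [], []
    for behavior in behaviors:
        key = behavior.strip().lower()
        if key in ATTACK_TECHNIQUES:
            technique_id, technique_name = ATTACK_TECHNIQUES[key]
            mapped.append((behavior, technique_id, technique_name))
        else:
            unmapped.append(behavior)
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = map_behaviors(["Takes screenshots", "mines cryptocurrency"])
    for behavior, technique_id, technique_name in mapped:
        print(f"{behavior} -> {technique_id} ({technique_name})")
    print("needs manual review:", unmapped)
```

Behaviors that land in the unmapped list are the ones an analyst would need to look up by hand.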

Compare this file to known malware samples and suggest possible classifications.

This prompt attempts to classify the malware based on its behavior and characteristics, potentially linking it to known threat actors or campaigns and enabling faster response and mitigation.

Evaluate the potential impact of this file if executed on a system.

This prompt helps assess the potential damage and consequences of the malware's execution, which aids in prioritizing response efforts and communicating the threat to customers.

Together, these prompts allowed us to guide Gemini in analyzing the files and gave us detailed insights into their behavior and potential threats.

Our Desired Outcomes: Verdict, Behavior, and IOCs

Combining Gemini with the decompilers of Ghidra and IDA Pro gave us an analysis of the files. Our main goals were to determine the file’s verdict, understand its behavior, and identify any Indicators of Compromise (IOCs).

Verdict: Our first task was to assess whether the file was malicious, benign, or potentially unwanted. Gemini's analysis of the file's behavior and structure provided us with initial data. While Gemini serves as a valuable aid in our analysis, it is ultimately the analyst who interprets the findings and combines them with traditional analysis methods.

Behavior: Next, we wanted to understand what the file would do if executed. Would it try to modify system settings, download additional content, or communicate with external servers? Gemini’s ability to analyze the decompiled code and compare it to known malware behavior gave us additional insights into its actions, revealing that the file exhibits suspicious activities such as system modifications and possible external communications.

IOCs (Indicators of Compromise): Finally, identifying IOCs was critical for detection and response. Gemini helped us list specific IP addresses, domain names, or file hashes that could indicate malicious behavior. 
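Since language models can misreport or invent indicators, it helps to check a list of IOCs against what is actually present in the decompiled text. The following is a minimal sketch using regular expressions for IPv4 addresses, URLs, and SHA-256 hashes. The patterns are simplified assumptions and the function name extract_iocs is made up for this example.

```python
import re

# Simplified patterns for illustration. Real-world extraction needs more care,
# for example defanged indicators or version strings that look like IPs.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Return de-duplicated, sorted candidate IOCs found in the text."""
    return {
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "sha256": sorted({match.lower() for match in SHA256_RE.findall(text)}),
        "urls": sorted(set(URL_RE.findall(text))),
    }


if __name__ == "__main__":
    # Documentation-only address and domain, not real indicators.
    demo = 'connect("192.0.2.10"); fetch("http://example.com/gate.php");'
    for kind, values in extract_iocs(demo).items():
        print(kind, values)
```

An indicator that Gemini reports but that does not appear anywhere in the input is a strong hint that it needs manual verification.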

Summary: After gathering all this information, Gemini also generated a summary that highlighted the most important findings. This summary was useful for communicating the results to both technical and non-technical readers. When prompted, Gemini can also provide recommendations and safety measures. 

Challenges in Using Gemini for Malware Analysis

While Gemini is a convincing aid in malware analysis, it is not without challenges. Some key limitations are:

Large Codebases: The size of the code can be a significant limitation. Gemini currently cannot take in very large codebases in full, and an analysis based on only part of the code will be incomplete or inaccurate. In addition, decompilation does not recover the original source code. Because compilation is a lossy process, the reconstructed high-level code can be wrong, incomplete, or corrupted.
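One possible way to work with oversized decompiler output is to split it into smaller pieces and submit them separately. This is not something we describe doing in this experiment, and it carries its own risk because each piece loses the context of the others. The sketch below splits text at line boundaries under a character budget. The function name chunk_source and the budget value are hypothetical.

```python
def chunk_source(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, breaking only at line ends.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # Close the current chunk before it would exceed the budget.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


if __name__ == "__main__":
    decompiled = "void f1(void) {}\n" * 1000  # placeholder for decompiler output
    parts = chunk_source(decompiled, max_chars=4000)  # budget is an assumption
    print(len(parts), "chunks, largest:", max(len(p) for p in parts))
```

Splitting at line ends is the simplest choice. Splitting at function boundaries would preserve more context per chunk but depends on the decompiler's output format.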

Heavily Obfuscated Code and Packed/Protected Samples: Malware authors often use various obfuscation techniques to conceal their code, making it difficult for Gemini to analyze and interpret it accurately. In packed or protected samples, the malware is compressed or encrypted, which adds another layer of complexity. As with traditional analysis methods, the code must first be deobfuscated and unpacked with additional steps and tools before it is fed into Gemini. These extra steps are necessary to ensure a thorough and accurate evaluation of the malware’s behavior and structure.
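A common heuristic for spotting packed or encrypted data before decompiling is byte entropy: compressed and encrypted content tends to score close to the maximum of 8 bits per byte, while plain code and text score lower. The sketch below computes Shannon entropy over a byte string. It is a generic illustration and not a step from our workflow, and any threshold you pick for "probably packed" is a judgment call.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


if __name__ == "__main__":
    plain = b"mov eax, ebx; push ebp; " * 200
    random_like = os.urandom(4096)  # stands in for encrypted or compressed data
    print(f"repetitive text: {shannon_entropy(plain):.2f} bits/byte")
    print(f"random bytes:    {shannon_entropy(random_like):.2f} bits/byte")
```

High entropy alone does not prove packing, since embedded images or archives score high as well. It is only a hint that unpacking may be needed before the decompiled output is worth sending to Gemini.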

Conclusion

Our experiment with Gemini has shown that Gen AI can be a powerful tool in the field of malware analysis. By combining it with traditional reverse engineering tools like Ghidra and IDA Pro, we can enhance our workflow, uncover insights, and get a head start on where to pivot in the traditional analysis that follows. While the debate over AI’s role in cybersecurity continues, our experiment suggests that, when used correctly and with its results cross-checked, AI can serve as an assistant for cybersecurity professionals. We invite other researchers to explore Gemini’s capabilities in other areas, such as Android malware, script-based files, and beyond, to further assess its potential and efficacy in diverse cybersecurity contexts.

Information for fellow researchers

RisePro Stealer (Win32.Trojan-Stealer.RisePro.WYQML3):

b0e194ed54bafa753bda5761c1264b67a5c438ee7a9ed624a83be913f037dcbb
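For researchers who obtain the sample, a quick way to confirm it is the same file is to compare its SHA-256 with the value above. This is a minimal sketch using Python's standard hashlib module. The function names and the file path are placeholders for this example.

```python
import hashlib

# SHA-256 of the RisePro sample as published above.
EXPECTED_SHA256 = "b0e194ed54bafa753bda5761c1264b67a5c438ee7a9ed624a83be913f037dcbb"


def sha256_of_file(path: str, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path: str) -> bool:
    """Return True if the file's SHA-256 equals the published value."""
    return sha256_of_file(path) == EXPECTED_SHA256


if __name__ == "__main__":
    import sys

    # Usage: python check_hash.py <path-to-sample>  (script name is a placeholder)
    if len(sys.argv) == 2:
        print("match" if matches_published_hash(sys.argv[1]) else "no match")
```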