Malware Mondays
Episode 1: Procmon
Start with the Process ID (PID). When you see the initial process drop new files, hash both files to check whether they are the same.
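A minimal sketch of that hash comparison, assuming both dropped files are already copied to a safe analysis folder. The function names are my own, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(first, second):
    """True when both files hash to the same value."""
    return sha256_of(first) == sha256_of(second)
```

Usage: same_content("dropped_1.bin", "dropped_2.bin") with two hypothetical file names; identical hashes mean the process dropped the same file twice.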
From the Process Tree, select the top process that executes other processes and choose Include Subtree. This gives a filter that includes the child processes.
Filters
CreateFile not only creates a file, it is also logged whenever a handle to a file is created, which happens a lot. Filter on WriteFile instead: if a file is important, data will probably be written to it. #RegSetValue #DeleteFile
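The same filtering idea can be applied offline to a CSV export of a Procmon capture. This is a sketch only: it assumes the export has the column names "PID", "Operation" and "Path", and the function name and the set of interesting operations are my own choices.

```python
import csv
import io

# Operations worth keeping; an assumption, extend to taste.
INTERESTING = {"WriteFile", "RegSetValue"}


def interesting_events(csv_text, pids=None, operations=INTERESTING):
    """Keep rows whose Operation is interesting and whose PID is in pids."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if row["Operation"] in operations
        and (pids is None or int(row["PID"]) in pids)
    ]
```

Passing the PIDs of the whole process subtree as pids mirrors the include-subtree filter from the Process Tree.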
Tip: save filters as a baseline, include process tree, take that as the baseline for analysing new samples.
Hotkeys
Ctrl-J to jump to file location
Done with particular events? Right-click → remove events before
source code
tip: check out my repositories with source code link and analyse executables in Procmon.
Reverse Engineering Malware with Ghidra
Overview notes
Native code: IDA (Pro), Ghidra, Binary Ninja
Interpreted code (.NET, Java): dnSpy, JD
Check out the courses:
- Identifying and defeating code obfuscation
- Identifying and defeating packing
- Identifying and defeating anti-reverse engineering and anti-analysis
Menu and windows
Windows:
- Program trees (provides an overview of the binary structure of the program)
- Symbol tree (breakdown and overview of all program symbols such as imports, way to identify functions)
- Data type manager (structures and other data types, typically from header files, or created by you)
- Listing (disassembly of the executable code from the program)
- sidebar displays program overview, entropy and a number of different bookmarks
- default is a linear view, may open a graph view per function
- can patch/modify the listing
- options change based on area of program you are exploring
- create bookmarks and add labels to help with analysis
- modify register values and convert data types (convert data into code)
Decompiler
Located to the right of the Listing window.
Converts machine code to assembly, then to P-code, then to C.
Warning: the decompiled code does not necessarily match the original source code.
You can modify the decompiler output:
- edit local variables, data structures, return types
- consider changing a signature instead of a local instance
- changes in signature are also reflected in the listing window
- leverage header files when possible (usually not available)
The decompiler creates structures, which help with program analysis and comprehension.
- It can lock analysis through the ‘Decompiler Parameter ID’ (locking signatures) and by committing parameters, return values and local variables.
- It can trace variable usage and highlight variables that may be impacted going forwards or backwards (navigate through a function by way of a variable).
- It provides the ability to export functions for use with other tools.
Demo: Analyzing a trojan
At the entry point there is a call to __security_init_cookie and then an unconditional jump to another location. This is a telltale sign that it is likely compiler-generated code. To identify main, scroll on and look for a call preceded by three pushes, which correspond to the three arguments to main:
- argc (argument count)
- argv (argument vector (list))
- envp (array of pointers to environment variables)
First, follow that unconditional jump and then scroll until you see three pushes. Double-click the function that is called after them. To get the graph, use the shortcut icon or the menu bar to open the function graph.
CodeBrowser
Highlighting: hold the ‘Windows’ key and highlight a mnemonic or data in the listing window.
Annotating the CodeBrowser allows you to add your insights into the program. You can use labels to mark locations of interest. Ghidra will add labels during autoanalysis. You can have multiple labels per address.
There are five types of comments:
- EOL
- plate
- post
- pre
- repeatable
Data types: can be applied at a single address or a selected range. Use the data type manager window to select the type you want to apply and drag and drop it onto the location where you want to apply it. This enhances your understanding of the code you’re analyzing.
Processor specific help: you can right-click on instructions and select ‘processor manual’
Tools and techniques to perform function analysis
Additional function windows:
- function call graph (shows function calls from current function)
- function graph
- function call trees depict a hierarchical relationship of function calls
Important: you need to understand where you are, when you begin analyzing a program. Generally, you have a function, strings or some other indicator that will drive you to begin analysis at a certain point. In the absence of those indicators, you will want to start at the beginning of the program, ignoring unnecessary code, like code generated by the compiler.
Ghidra can manage external programs. You can add additional programs (libraries) that your program depends on. This way you can quickly navigate to functions in those programs.
During flow analysis of a program, perhaps not all functions are identified. You can instead right-click and create or edit a function. This way you can change parameter types, change the calling convention, undefine functions, etcetera.
You can also add symbols, if they are available. Usually with malware you don’t have access to PDB files.
Demo: function analysis
One way to defeat anti-analysis techniques that use IsDebuggerPresent:
- Set a bp on IsDebuggerPresent from kernel32
- Set eax to 0 before test eax, eax
Analyzing shellcode: mov eax, fs:[0x30] (walking the PEB)
Headless mode: can be helpful when you have a lot of repetitive tasks, like analyzing multiple samples.
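As a sketch of such a repetitive task, the helper below builds one analyzeHeadless command line per sample. The launcher path, project folder, project name and script name are all placeholders of mine; only the -import and -postScript options come from Ghidra's headless analyzer.

```python
import subprocess


def headless_commands(analyze_headless, project_dir, project_name, samples, post_script):
    """Build one analyzeHeadless invocation per sample (nothing is executed here)."""
    return [
        [
            str(analyze_headless),   # path to Ghidra's analyzeHeadless launcher
            str(project_dir),        # folder that holds the Ghidra project
            project_name,            # project to create or reuse
            "-import", str(sample),  # binary to import and auto-analyze
            "-postScript", post_script,  # script to run after analysis
        ]
        for sample in samples
    ]


def run_all(commands):
    """Run the prepared commands one after another, stopping on the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Building the commands separately from running them makes it easy to print and review them before touching any sample.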
Demo: scripting analysis
You can for example import and run a script that prints stack strings as ASCII strings and highlights their location.
Looking at decompiled code, you can sometimes see local variables with hex values above. These are telltale signs of the use of stack strings. You can convert hex values to character sequences, which can sometimes show you ASCII strings.
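The hex-to-characters conversion can be sketched as follows. It assumes the values are 32-bit little-endian constants (as on x86) listed in the order they are written to the stack; the function name and the example values are my own, not taken from the demo.

```python
import struct


def decode_stack_string(dwords):
    """Join 32-bit little-endian constants and decode them as ASCII."""
    raw = b"".join(struct.pack("<I", value) for value in dwords)
    # Stop at the first NUL byte, the usual C string terminator.
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


# Three constants as they might appear above local variables in the decompiler.
print(decode_stack_string([0x6E72656B, 0x32336C65, 0x6C6C642E]))  # kernel32.dll
```

The little-endian packing is why the characters appear reversed inside each hex value.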
To use a script, you can use the Script Manager, right-click the script, and run it. The console below will show the output.
Malware analysis: Identifying and Defeating Packing
Primary signs of packing
- strings (the presence or absence)
- imports or a lack of
- sections (their number and names)
- entropy (high entropy in the sections is a leading indicator)
- signatures from known packers
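To make the entropy indicator concrete, here is a small sketch that computes the Shannon entropy of a byte string in bits per byte. Feeding it the raw bytes of a section is left out, and where exactly "high" begins is a judgment call that tools make differently.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )
```

Compressed or encrypted data approaches 8 bits per byte, while plain code and text score noticeably lower, which is why packed sections stand out.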
Detecting packing with signatures
PEstudio looks for patterns, each of which is a series of byte values. The byte pattern is compared against your sample and, if it matches, the name in the signature is displayed. Bytes displayed as xx are wildcards. The signature can either be compared at the entry point only (ep_only) or searched for in the entire binary.
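A rough sketch of how such a wildcard signature match could work. This is not PEstudio's or PEiD's actual implementation; the function names are mine, and the caller is assumed to already know the file offset of the entry point.

```python
def parse_signature(text):
    """Turn '60 E8 xx xx' into [0x60, 0xE8, None, None]; None is a wildcard."""
    return [None if token.lower() == "xx" else int(token, 16) for token in text.split()]


def matches_at(data, offset, pattern):
    """True when the pattern matches the bytes starting at offset."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(
        expected is None or data[offset + index] == expected
        for index, expected in enumerate(pattern)
    )


def find_signature(data, signature, entry_point_offset=None):
    """Match at the entry point only (ep_only) or scan the entire binary."""
    pattern = parse_signature(signature)
    if entry_point_offset is not None:
        hit = matches_at(data, entry_point_offset, pattern)
        return [entry_point_offset] if hit else []
    return [
        offset
        for offset in range(len(data) - len(pattern) + 1)
        if matches_at(data, offset, pattern)
    ]
```

Scanning the entire binary is slower and more prone to accidental matches than checking the entry point alone, which is one reason signatures carry the ep_only flag.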
PEiD established a signature database and while it is no longer maintained, this approach to signature writing is used by PEstudio. Also a warning: PEiD can actually execute your sample through its plugins!
Section names have no effect on execution. This might be why packers create meaningless section names. Common section names are .text or CODE, .rdata (read-only data), .bss (uninitialized data) and .reloc (relocation information).
Programs need to import functionality to interact with the operating system. There are three ways to do this. Usually it is done via dynamic linking, where the imports are listed in the import table and resolved when the program is loaded. With runtime linking, the program resolves the functions itself while it is running; malware can use this to create its own import table. The last one is static linking, meaning the functionality is compiled into the binary.
You can also look at the call graph of a binary. Most packers will begin the program from START. Main is something the compiler adds. There can be very limited cross-references to start, whereas even simple ‘Hello world!’ programs tend to have more cross-references.
Of the strings that you can see, important ones can be VirtualAlloc, memset and memcpy. These sometimes do not show up in the Imports table, which means the sample resolves its imports dynamically. This is an obfuscation technique.
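The check described above (an API name visible as a string but missing from the import table) can be sketched like this. The set of imported names is assumed to come from another tool such as PEstudio; the function names and the default watchlist are my own.

```python
import re


def ascii_strings(data, min_length=4):
    """Extract runs of printable ASCII characters, like the strings utility."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def unlisted_api_strings(data, imported_names,
                         watchlist=("VirtualAlloc", "memset", "memcpy")):
    """API names that occur as strings but are absent from the import table."""
    found = ascii_strings(data)
    return sorted(
        name
        for name in watchlist
        if name not in imported_names and any(name in text for text in found)
    )
```

A non-empty result is only a hint that imports may be resolved at runtime; confirm it in the debugger.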
Search for the DOS stub to see where other PE files have been extracted in memory. Remember: malware authors can change or remove the string ‘This program cannot be run in DOS mode’.
Remember: the program you’re analyzing is also in the PE file format. Other libraries it has loaded will also be loaded in the virtual address space of your process, so you will find legitimate PE files as well. There are techniques to reduce this noise.
Software breakpoints to set on memory allocations: VirtualAlloc or VirtualAllocEx.
#Windbg: bp kernel32!VirtualAlloc
#debugging: most debuggers allow you to search symbols (functions) of libraries, for example x /D /f KERNEL32!v* in WinDbg.
debugging-memory-allocations: after setting breakpoints on VirtualAlloc*, watch the return value in the EAX/RAX register. This is the address of the newly allocated memory. Then investigate permissions with commands such as !vprot in WinDbg. Then watch the contents of that memory as the program begins to use it. You can also search memory for evidence of a new PE file. It helps to avoid high-address regions of memory, where imported libraries are usually loaded.
investigate-memory-allocations: dump the memory you want to investigate. This technique requires trial and error. You may dump corrupt PE files, but you will develop a feel for how a PE file should look in memory or in a hex editor.
Demo: Unpacking a ransomware
- set a bp on, for example, VirtualAlloc* (stub)
- begin execution
- step out from the breakpoint
- see the return value in the EAX register (the memory address just allocated by the call to VirtualAlloc)
- dd memory address, which is still empty
- !vprot memory address, which gives information about the allocation (size, base, permissions)
- if you hit the breakpoint again, step out
- then we should inspect the contents in memory again
- resume execution until the final call to VirtualAlloc
- Then check for the existence of a PE file: the DOS stub, the 4d 5a (MZ) header.
- s -a 0 L?4fffffff "This program": search for the ASCII string from the beginning of the virtual address space across a range of 0x4fffffff bytes. That range helps us to avoid the higher addresses, where the DLLs and libraries could live in memory.
- View memory and extract content if necessary, for which we can use Process Hacker. You could use !vprot memory address on the address where you found the DOS stub. This identifies the base address (BaseAddress). The base of the allocation is AllocationBase.
- Given the base of the allocation, look up the specific memory region in the memory tab in Process Hacker → right click → Save.
- Open the file in a hex editor and look for 4d 5a.
- You can select the bytes before 4d 5a, remove them, and see if what remains is a valid PE file.
- You can use PEstudio to analyze the carved PE file (which could be corrupt due to the carving).
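The last steps (looking for 4d 5a in the saved dump and checking whether a PE file follows) can be sketched as below. It relies on two facts of the PE format: the DWORD at offset 0x3C of the DOS header (e_lfanew) points to the PE header, which starts with the bytes PE\0\0. The function names are my own, and a carved file can still be corrupt in other ways.

```python
import struct


def find_pe_headers(dump):
    """Return offsets of MZ headers whose e_lfanew points at a PE signature."""
    hits = []
    start = dump.find(b"MZ")
    while start != -1:
        if start + 0x40 <= len(dump):
            # e_lfanew: offset of the PE header, relative to the MZ header.
            (e_lfanew,) = struct.unpack_from("<I", dump, start + 0x3C)
            pe_offset = start + e_lfanew
            if dump[pe_offset:pe_offset + 4] == b"PE\x00\x00":
                hits.append(start)
        start = dump.find(b"MZ", start + 1)
    return hits


def carve(dump, offset):
    """Drop everything before the MZ header so the file starts at 4d 5a."""
    return dump[offset:]
```

The e_lfanew check filters out stray MZ byte pairs, which are common in large memory dumps.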
Key characteristics of unpacking when reverse engineering:
- Unpacking is often ‘one-way’; look for code with abnormal transfer of control (a normal function, when called, returns to the location it was called from)
- Unpacking code would also lack a standard epilogue: jmp instead of ret, push-ret and other deviations
- These occur at the end of a function (so skip over what happens before!). Look toward the end of the function for these characteristics.
- Use dynamic analysis to prove or disprove any theory you have.
Example of push-ret:
- first, push eax onto the stack (so the ret that follows transfers control to this location instead of back to the caller). You can trace backwards what is moved into eax earlier in that function and look at that memory location.
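As a crude aid for spotting push-ret candidates, the sketch below scans raw 32-bit x86 bytes for a single-byte push of a register (opcodes 0x50 to 0x57) immediately followed by ret (0xC3). This is a plain byte scan, not a disassembler, so it can report bytes that are really data or the middle of another instruction; the function name is mine.

```python
# Register order matches the x86 encoding of push r32 (0x50 + register index).
REGISTERS = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def find_push_ret(code):
    """Return (offset, register) pairs where 'push reg' is directly followed by 'ret'."""
    hits = []
    for offset in range(len(code) - 1):
        opcode, following = code[offset], code[offset + 1]
        if 0x50 <= opcode <= 0x57 and following == 0xC3:
            hits.append((offset, REGISTERS[opcode - 0x50]))
    return hits
```

Each hit still has to be checked in the disassembler, and then in the debugger, to see what the register holds at that point.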
Unpacking may utilize shellcode. You should continue tracing if using dynamic analysis and dump from memory and disassemble to perform static analysis.
Remember: shellcode is a binary blob, so a disassembler will not find the entry point on its own. Your dynamic analysis would show you the entry point.
shellcode: do not get lost in the details, look for where the shellcode finishes. If the code jumps back to the initial binary, this could mean the unpacking is complete.
Demo: Unpacking Simba
We can identify a push-ret where eax is pushed onto the stack and then ret is executed.
We track back the value of eax and find a memory address.
Then we see another strange epilogue: a push ecx and a jmp to a ret. We again need to know the value of ecx. In this case it’s a DWORD value (or 4 byte value). We move to that location, which is empty in static analysis, meaning it will be populated at runtime.
Dynamic analysis:
See the memory address of the last move into ecx and set your breakpoint there.
We set a breakpoint at the move instruction, so we can step one instruction beyond that and see the location in ecx.
We use !vprot memory address in ecx to investigate that memory and see the allocation base.
In Process Hacker we see this allocation base which we can use to save the contents and disassemble it in IDA.
#memory: IDA does not know where the entry point of the shellcode is. So you go back to your debugger and see the address where you are about to return to, in this case in ecx.
We can calculate the offset as follows: take the memory address we identified in ecx, which we are about to go to, and subtract the allocation base from it.
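The offset calculation as a tiny sketch. The addresses in the example are made up for illustration and are not the values from the demo; the function name is my own.

```python
def offset_in_dump(target_address, allocation_base):
    """Offset of the target address inside the dumped allocation."""
    if target_address < allocation_base:
        raise ValueError("target address lies before the allocation base")
    return target_address - allocation_base


# Hypothetical values: ecx = 0x02A31F40, AllocationBase = 0x02A30000.
print(hex(offset_in_dump(0x02A31F40, 0x02A30000)))  # 0x1f40
```

Because the dump starts at the allocation base, this offset is also the position of the entry point inside the saved file.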
We go back to IDA Pro and scroll to that offset in the shellcode.
Then you right-click on the offset location and select ‘Create function’.
Again, don’t look at all the code, but go to the end of the function and check for unusual epilogues.