Inline Assembly

This post introduces inline assembly: what it is, how to use it, and a few examples of it in practice.


What is inline assembly?


Inline assembly is a feature of some compilers that lets programmers place assembly instructions directly at chosen points in a program's source code. Up to this point, you have probably written C programs like the example below.

#include <stdio.h>

int main(){
    printf("Hello world!");  
    return 0;
}

We'll see the following if we open this program in x64dbg and set a breakpoint at the printf() function.

Note: __main is a function provided by the toolchain that initializes library functions and the execution environment. You might see it in programs compiled with MinGW.

This snippet of assembly is what we would expect from the C code we wrote, and it has the following structure:

Function Prologue
Variable Setup
Call printf()
Function Epilogue
Return

In the context of malware development in C, this standardization can be a negative feature of the programming language that often results in detection. However, inline assembly provides us a mechanism to directly interact with the operating system and a system's processor that would otherwise be significantly more difficult to achieve.

This ability for granular interaction allows us to modify the standard behaviors of our program, execute specific opcodes, and potentially bypass some detection engines and emulation environments.

Programmers who use Visual Studio may know that the MSVC compiler only supports inline assembly for x86 (32-bit) executables. CLion uses MinGW by default, so in this course we'll be able to use inline assembly in all cases.

How do we use inline assembly?

Basic inline assembly is inserted into a program with the statement '__asm("OPERATION");', where 'OPERATION' is a single instruction or a ';'-separated sequence of valid assembly instructions. Let's see what this looks like in practice.

#include <stdio.h>

int main(){
    __asm("int3"); // this will set a breakpoint that we'll catch in x64dbg
    printf("Hello world!");  
    return 0;
}

Looking at our program inside of x64dbg we'll see something like this.

Note: x64dbg can get a little buggy with inlined int3 instructions.

Looking at our int3 instruction to the left, you can see that x64dbg disassembles the 0xcc opcode back into its mnemonic for us. If we want to inline raw opcode bytes directly, we can do so with the following format.

#include <stdio.h>

int main() {
    __asm(".byte 0xcc");
    printf("Hello, World!\n");
    return 0;
}

Passing in variables to inline assembly

The inline assembly syntax supports an extended format allowing variables to be passed into (and out of) __asm() statements. The general syntax is as follows.

__asm(OPERATIONS
    :OUTPUT
    :INPUT
    :CLOBBERS
    );

Let's use this syntax to move a variable into an assembly statement. In the code below, we will pass a variable of type int into an assembly statement and then move that variable into the ecx register. We'll know we succeeded if we see 0x13 in ecx before our printf statement, which shows up as 0x13 in rcx in our debugger.

// build with -masm=intel (see the final listing): the template never switches back to AT&T syntax
#include <stdio.h>

int main() {
    int i = 0x13;                   // we expect this to be on the stack
    __asm(".intel_syntax noprefix;" // intel syntax is personal preference
            "mov ecx, eax;"         // move i from eax into ecx
            :                       // outputs, none in this case
            :"a"(i)                 // the "a" so that i gets passed into eax
            :"ecx"                  // clobbered registers: we overwrite ecx
            );
    printf("Our value is: %d\n", i);
    return 0;
}

And we succeeded!


Getting values out of inline assembly

Now that we've practiced getting a value into an __asm() statement, let's practice manipulating that value and then passing it back out. If we take another look at our syntax, we can do this by filling in the output line, which precedes the input line.

#include <stdio.h>

int main() {
    int i = 0x13;                            // we expect this to be on the stack
    int j = 0;

    __asm(".intel_syntax noprefix;"           // intel syntax is my preference
            "mov ecx, eax;"                   // move i from eax into ecx
            "inc ecx;"
            :"=c"(j)                          // outputs "c" -> ecx into j
            :"a"(i)                           // the "a" -> i into eax
            :                                 // clobbered registers, none
            );
    printf("Our i value is: %d\n", i);        // we expect decimal 19
    printf("Our j value is: %d\n", j);        // we expect decimal 20
    return 0;
}

And if we execute our program, we see that it performs as we expect!

Rbp-relative addressing is visible at addresses 0x7ff77dce13af, 0x7ff77dce13b6, and 0x7ff77dce13bd.

Note that in this example the compiler addresses local variables relative to rbp. Not every compiler or optimization level does this, so adapt the code in the next section to the way your program actually addresses its variables.

Spoofing a return address with inline assembly

Now that we have a good handle on how to manipulate inline assembly calls in our program, we can modify our code to spoof our control flow at runtime. Let's try it out.

#include <stdio.h>

void other_function();
void other_function_2();

void (*backup)();                            // backup address


int main() {
    int i = 0x13;                            // we expect this to be on the stack
    int j = 0;

    __asm(".intel_syntax noprefix;"           // intel syntax is my preference
            "mov ecx, eax;"                   // move i from eax into ecx
            "inc ecx;"
            :"=c"(j)                          // outputs "c" -> ecx into j
            :"a"(i)                           // the "a" -> i into eax
            :                                 // clobbered registers, none
            );
            
    printf("Our i value is: %d\n", i);        // we expect decimal 19
    printf("Our j value is: %d\n\n", j);      // we expect decimal 20
    
    printf("Calling other_function...\n");
    printf("-------------------------\n");
    
    other_function();                         // call other_function
    
    printf("-------------------------\n");
    return 0;                                 
}

void other_function_2(){                       // this will NOT execute
                                               // we'll set our return 
                                               // here with inline assembly
    
    printf("THIS IS other_function_2!\n");


}

void other_function(){
    
    void (*p)();
    p = other_function_2;                     // address of other_function_2
    
    printf("THIS IS other_function!\n");
    
    return;                                    
}

In this program, we're going to manipulate the printf() call inside of other_function(). The goal is to make the call stack look as if the printf() call came from other_function_2(). Let's take a look at the call stack before we implement inline assembly so we can tell the difference.

A normal call to printf() from other_function() looks like this.

Normal program call stack

If we follow the "To" address (00007FF7C0BB143C), we can see that we return to other_function() after the call to printf(), as we would expect. Combing through the call stack, we see that the calls go main() -> other_function() -> printf().

Normal program control flow

But by implementing a couple of snippets of inline assembly, we can manipulate some pointer locations and point this return address to other_function_2().

void other_function(){
    
    void (*p)();
    p = other_function_2;                     // address of other_function_2
    
    __asm(".intel_syntax noprefix;"           // spoof stack frame
            "mov %0, [rbp+0x8];"
            "mov [rbp+0x8], %1;"              // ..._2 stack frame
            :"=c"(backup)
            :"a"(p)
            :
            );
    
    printf("THIS IS other_function!\n");       // frame has other_function_2
    
    
    __asm(".intel_syntax noprefix;"            // restore old address
            "mov [rbp+0x8], %0;"
            :
            :"r"(backup)
            :
            );
                        
    return;                                    
}

Call stack before the spoof instruction.

Note: Pay attention to the "To" value of 000000A3094FFD48 => 00007FF68BCA1315

Call stack after the spoof instructions

Note: The change in "To" value of 000000A3094FFD48 => 00007FF68BCA141F

If we follow our new "To" address, we see that it points to other_function_2 instead of other_function! We've successfully manipulated a return address in our call stack and validated that the call to printf() retains all functionality.

Note: These low-level manipulations can cause buggy debugger output in the call stack tab, so don't be surprised if things look a little weird there. Use the stack view in x64dbg to get an accurate view of the program's stack values.

Stack view showing the correct address 7FF68BCA1420 on the stack.

Now, if we carefully step into the printf() call inside of other_function(), we'll see that it reports to the debugger that it will return to other_function_2(), even though we never invoked other_function_2()!

Using what you now know, it's possible to implement additional granular modifications to your programs to obfuscate your implant's functionality.

Our final C code looks like this:

//x86_64-w64-mingw32-gcc main.c -o inline.exe -masm=intel

#include <stdio.h>

void other_function();
void other_function_2();

void (*backup)();                            // backup address


int main() {
    int i = 0x13;                            // we expect this to be on the stack
    int j = 0;

    __asm(".intel_syntax noprefix;"           // intel is my preference
            "mov ecx, eax;"                   // move i from eax into ecx
            "inc ecx;"
            :"=c"(j)                          // outputs "c" -> ecx into j
            :"a"(i)                           // the "a" -> i into eax
            :                                 // clobbered registers, none
            );
            
    printf("Our i value is: %d\n", i);        // we expect decimal 19
    printf("Our j value is: %d\n\n", j);      // we expect decimal 20
    
    printf("Calling other_function...\n");
    printf("-------------------------\n");
    
    other_function();                         // call other_function
    
    printf("-------------------------\n");
    return 0;                                 
}

void other_function_2(){                       // this will NOT execute
                                               // we'll set our return 
                                               // here with inline assembly
    
    printf("THIS IS other_function_2!\n");


}



void other_function(){
    
    void (*p)();
    p = other_function_2;                     // address of other_function_2
    
    __asm(".intel_syntax noprefix;"           // spoof stack frame
            "mov %0, [rbp+0x8];"
            "mov [rbp+0x8], %1;"              // ..._2 stack frame
            :"=c"(backup)
            :"a"(p)
            :
            );
    
    printf("THIS IS other_function!\n");       // frame has other_function_2
    
    
    __asm(".intel_syntax noprefix;"            // restore old address
            "mov [rbp+0x8], %0;"
            :
            :"r"(backup)
            :
            );
                        
    return;                                    
}

Alternatives to inline assembly

C and C++ programmers who use the MSVC compiler often rely on intrinsics: functions built into the compiler by Microsoft that give a degree of low-level access without writing assembly. Compiler intrinsics are powerful, stable, and consistent implementations of some of the most useful inline assembly capabilities. A complete list of compiler intrinsics can be found in the References section at the end of this topic.

Malware developers using MSVC may also implement assembly in standalone .asm files linked to their program during building. You'll often see it implemented in the manner shown below.

Original Hell's Gate implementation, showing a standalone .asm file (right) whose routines are declared as external functions in main.c (left)

As you become more advanced in understanding and implementing the techniques covered in this course, you'll have to become familiar with assembly.

References

Compiler intrinsics
Extended Asm (Using the GNU Compiler Collection (GCC))
Amazon.com
HellsGate/hellsgate.asm at master · am0nsec/HellsGate (original C implementation of the Hell’s Gate VX technique)