====== Machine Code ======

With great appreciation of the original [[https://www.autohotkey.com/boards/viewtopic.php?t=32|MCode Tutorial]] by [[user:nnnik]]

===== What is Machine Code? =====

[[https://simple.wikipedia.org/wiki/Machine_code|Machine code]] is the lowest-level binary code that your computer can run. Programming languages like C, %%C++%%, Rust, and Go all compile to machine code so that your computer can execute them. Machine code can run several hundred times faster than equivalent AutoHotkey code, which does not compile to machine code.

In the AutoHotkey community the term "MCode" refers to tools and methods used for putting machine code into scripts. These MCode tools normally take code written in a language like C, use a compiler to turn that code into machine code, then turn that machine code either directly into AutoHotkey code or into text that can be loaded using a custom AutoHotkey library and then called by ''DllCall''.

MCode is important for //optimization// when writing scripts that need to process a relatively large amount of data quickly. Here are some common situations where MCode can be helpful:

  * Encoding or decoding data more than a few kilobytes in size, in formats like [[libraries:json|JSON]]
  * Hashing files or large amounts of text
  * Manipulating images like GDI+ bitmaps (especially for custom ImageSearch algorithms)
  * Performing real-time calculations, such as for a physics engine

MCode is not the only way to achieve these performance goals. It is possible, and sometimes more flexible, to use the normal tooling of those compiled languages to produce a standard machine code DLL that can be used from AutoHotkey. However, a script that comes with a custom DLL is harder to share because it takes multiple files, can take up more disk space than a normal script, and is more likely to be blocked by antivirus and corporate application filters.

===== Requirements =====

Before you can get started with MCode, you will need:

  * Some understanding of how to read and write C or %%C++%% code
  * A basic understanding of how to use ''DllCall'' to interact with C APIs (like the Windows API)
  * An MCode library which can create and load MCode
  * A compiler compatible with your chosen MCode library

If you are not very familiar with programming in C or %%C++%%, a great place to start is [[https://www.codecademy.com/catalog/language/c|Codecademy]], which offers a variety of free interactive programming courses.

There are not very many MCode libraries, but the most comprehensive is [[libraries:machine_code:mcl]]. Other options are listed on the [[libraries:machine_code|Libraries page]].

If you are using MCL and do not already have a compiler installed, the easiest compiler to install is likely [[https://jmeubank.github.io/tdm-gcc/|TDM-GCC]]. MCL supports most GCC-compatible compilers.

===== Learning MCode from Scratch =====

==== 1. Return a Number ====

One of the simplest MCode projects you can do is a bit of MCode that returns a number. For example, the C code:

  int alpha() {
      return 42;
  }

With a function this trivial, it is possible to manually convert the C code to MCode using an online compiler tool like [[https://godbolt.org/|godbolt]]. In godbolt, enter the C code, choose a compatible compiler such as "MinGW gcc", and change the output settings (enabling "compile to binary object") to reveal the machine code generated from the function. By default it outputs 64-bit machine code, so if you are using 32-bit AutoHotkey you may need to add ''-m32'' to the godbolt compiler options in order to get compatible machine code.

{{:guides:godbolt_alpha_return_42.png?direct|Screenshot of godbolt compiler}}

The hex code on every other line of the output can be entered into a script to form MCode. The simplest method is using a ''Buffer'' and ''NumPut'' to write the code into that buffer.

When writing large amounts of data it is most efficient to write the data in groups, using the ''Int64'' type rather than writing each individual byte with the ''Char'' type. To do this, construct a buffer large enough to hold the bytes of machine code, rounded up to the nearest multiple of 8. Note that we can safely ignore any ''90'' NOP bytes at the end of the code. In this case, 11 bytes were output by the compiler so we will make a buffer of **16** bytes.

Next, split the bytes into groups of at most 8:

''55 48 89 e5 b8 2a 00 00'' / ''00 5d c3''

Now reverse the bytes in each group, because ''NumPut'' writes an integer's lowest byte first (little-endian order):

''00 00 2a b8 e5 89 48 55'' / ''c3 5d 00''

Join the bytes together and add the ''0x'' prefix to create your constants:

''0x00002ab8e5894855'' / ''0xc35d00''

Now you can assemble your MCode:

  #Requires AutoHotkey v2.0
  
  mcode := Buffer(16)
  NumPut('Int64', 0x00002ab8e5894855, 'Int64', 0xc35d00, mcode)
  
  ; Once you have your buffer populated, you must use VirtualProtect to change
  ; the protection mode of the RAM held in the buffer to allow code within it to
  ; be executed. By default, memory is not executable for security reasons.
  if !DllCall("VirtualProtect", "Ptr", mcode, "Ptr", mcode.Size, "UInt", 0x40, "UInt*", &OldProtect := 0, "UInt")
      throw Error("Failed to mark memory as executable")
  
  ; Now that your buffer is executable, you may execute it using DllCall
  MsgBox DllCall(mcode, "Cdecl Int")

By calling the buffer with DllCall, execution will jump to the first byte of the machine code and execute whatever function is at that position. For code compiling only one function, it will normally be the start of that one function.

==== 2. Find the Length of a String ====

A function that just returns one number is not very useful.
A more useful function could be one to find the length of a string:

  int StringLen(short *str) {
      int i = 0;
      for (; str[i] != 0; i++) {}
      return i;
  }

This function loops over the characters of a Unicode (UTF-16) string until it reaches the terminating null character, then returns the count of characters. When fed into godbolt, we get an output like ''55 48 89 e5 48 83 ec 10 48 89 4d 10 c7 45 fc 00 00 00 00 eb 04 83 45 fc 01 8b 45 fc 48 98 48 8d 14 00 48 8b 45 10 48 01 d0 0f b7 00 66 85 c0 75 e4 8b 45 fc 48 83 c4 10 5d c3'' which is very quickly getting out of hand.

Performing the same manipulations as before, we can write the MCode as follows:

  #Requires AutoHotkey v2.0
  
  code := Buffer(64)
  NumPut(
      'Int64', 0x10ec8348e5894855,
      'Int64', 0x00fc45c7104d8948,
      'Int64', 0xfc458304eb000000,
      'Int64', 0x8d489848fc458b01,
      'Int64', 0x014810458b480014,
      'Int64', 0x75c0856600b70fd0,
      'Int64', 0x10c48348fc458be4,
      'Int64', 0xc35d,
      code
  )
  
  if !DllCall("VirtualProtect", "Ptr", code, "Ptr", code.Size, "UInt", 0x40, "UInt*", &OldProtect := 0, "UInt")
      throw Error("Failed to mark memory as executable")
  
  MsgBox "The string is " DllCall(code, "Str", "Hello", "Cdecl Int") " characters long"

As the functions get more complicated, the need for a dedicated MCode compiler becomes more apparent. An MCode compiler will take your C / %%C++%% input and turn it into something you can put directly into your AHK script.

When the function above is provided to joedf's [[https://www.autohotkey.com/boards/viewtopic.php?f=6&t=4642|MCode4GCC]], it will output a string like this:

''2,x86:i1QkBDHAZoM6AHQUjXQmAIPAAWaDPEIAdfbDjXQmAJDD,2,x64:ZoM5AHQiuAEAAAAPH0QAAEGJwEiDwAFmg3xB/gB18USJwMMPH0QAAEUxwESJwMM=''

This string can be passed to a pre-written MCode loader function in order to load your machine code. Note that it contains machine code for both 32 //and// 64 bit.

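The grouping and byte reversal used to build the hand-written ''NumPut'' constants in this section can be automated. The following is a minimal sketch in C and is not part of any MCode library; the names ''pack_le64'' and ''print_numput_constants'' are made up for illustration. It packs each group of up to 8 bytes so that the first byte lands in the lowest 8 bits, which is the same result as reversing the bytes by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from any MCode library): packs up to 8 bytes
 * into one 64-bit value in little-endian order, so bytes[0] lands in the
 * lowest 8 bits. Written as a hex literal, the bytes appear reversed. */
uint64_t pack_le64(const unsigned char *bytes, size_t count) {
    assert(count <= 8);
    uint64_t value = 0;
    for (size_t i = 0; i < count; i++)
        value |= (uint64_t)bytes[i] << (8 * i);
    return value;
}

/* Prints one 'Int64', 0x... pair per group of 8 bytes, ready to paste
 * into a NumPut call. The last group may hold fewer than 8 bytes. */
void print_numput_constants(const unsigned char *code, size_t size) {
    for (size_t offset = 0; offset < size; offset += 8) {
        size_t remaining = size - offset;
        size_t count = remaining < 8 ? remaining : 8;
        printf("'Int64', 0x%016llx,\n",
               (unsigned long long)pack_le64(code + offset, count));
    }
}
```

Passing the 11 bytes of ''alpha'' from section 1 to ''print_numput_constants'' yields the values ''0x00002ab8e5894855'' and ''0xc35d00'' (printed with leading zeros), matching the constants worked out by hand.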
The MCode4GCC string above can be loaded like this:

  #Requires AutoHotkey v2.0
  
  ptr := MCode('2,x86:i1QkBDHAZoM6AHQUjXQmAIPAAWaDPEIAdfbDjXQmAJDD,2,x64:ZoM5AHQiuAEAAAAPH0QAAEGJwEiDwAFmg3xB/gB18USJwMMPH0QAAEUxwESJwMM=')
  MsgBox "The string is " DllCall(ptr, "Str", "Hello", "Cdecl Int") " characters long"
  
  MCode(mcode) {
      static e := Map('1', 4, '2', 1), c := (A_PtrSize=8) ? "x64" : "x86"
      if (!regexmatch(mcode, "^([0-9]+),(" c ":|.*?," c ":)([^,]+)", &m))
          return
      if (!DllCall("crypt32\CryptStringToBinary", "str", m.3, "uint", 0, "uint", e[m.1], "ptr", 0, "uint*", &s := 0, "ptr", 0, "ptr", 0))
          return
      p := DllCall("GlobalAlloc", "uint", 0, "ptr", s, "ptr")
      if (c="x64")
          DllCall("VirtualProtect", "ptr", p, "ptr", s, "uint", 0x40, "uint*", &op := 0)
      if (DllCall("crypt32\CryptStringToBinary", "str", m.3, "uint", 0, "uint", e[m.1], "ptr", p, "uint*", &s, "ptr", 0, "ptr", 0))
          return p
      DllCall("GlobalFree", "ptr", p)
  }

Similarly, when provided to the MCL compiler it will output the loader itself, with the 32 and 64 bit MCode already baked in:

  #Requires AutoHotkey v2.0
  
  lib := MCode()
  MsgBox "The string is " DllCall(lib, "Str", "Hello", "Cdecl Int") " characters long"
  
  MCode() {
      static lib := false
      if lib
          return lib
      switch A_PtrSize {
          case 4: code := Buffer(20), exports := {StringLen: 0}, b64 := ""
              . "VTHAieWLVQhmgzxCAHQDQOv2XcM="
          case 8: code := Buffer(32), exports := {StringLen: 0}, b64 := ""
              . "McBBicBI/8Bmg3xB/gB18kSJwMOQkJCQkJCQkJCQkJA="
          default: throw Error(A_ThisFunc " does not support " A_PtrSize * 8 " bit AHK")
      }
      if !DllCall("Crypt32\CryptStringToBinary", "Str", b64, "UInt", 0, "UInt", 1, "Ptr", code, "UInt*", code.Size, "Ptr", 0, "Ptr", 0, "UInt")
          throw Error("Failed to convert MCL b64 to binary")
      if !DllCall("VirtualProtect", "Ptr", code, "Ptr", code.Size, "UInt", 0x40, "UInt*", &old := 0, "UInt")
          throw Error("Failed to mark MCL memory as executable")
      for k, v in exports.OwnProps()
          exports.%k% := code.Ptr + v
      return lib := {
          exports: exports,
          code: code,
          Ptr: exports.StringLen,
          StringLen: exports.StringLen,
      }
  }

With the MCL library on a system which has a compatible compiler installed, you can skip the need to separately compile and then paste a loader into your script to test. Instead, compiling and loading can be done as a single step inside your script:

  #Requires AutoHotkey v2.0
  #Include <MCL>
  
  library := MCL.FromC("
  (
  int StringLen(short *str) {
      int i = 0;
      for (; str[i] != 0; i++) {}
      return i;
  }
  )")
  
  MsgBox "The string is " DllCall(library, "Str", "Hello", "Cdecl Int") " characters long"

Only once you are done developing your MCode would you use the ''MCL.StandaloneAHKFromC'' method to generate the standalone loader, which does not require a compiler, to be pasted into a script.

==== 3. Multiple Functions ====

When working with C code that defines multiple functions, MCode becomes much trickier to generate. Consider the following code:

  int alpha() {
      return 42;
  }
  
  int beta() {
      return alpha();
  }

When [[https://godbolt.org/z/Y7aWP8jKq|compiled using godbolt]], we see two problems arise:

  - When the machine code is generated, only ''alpha'' is at the start of the machine code, so when your MCode is called only ''alpha'' can be executed. This can be fixed by counting the byte offset of ''beta'' from the start of the code and calling ''DllCall(mcode.Ptr + betaOffset, ...)''
  - If beta //were// to be called, //it doesn't know where alpha is to call it//.
If you look in the code, it encodes the call to ''alpha'' as ''e8 00 00 00 00'', which leaves the target of the call as a placeholder ''0'', //not// the location of ''alpha''.

Traditional MCode tools like MCode4GCC do nothing to address these problems. Instead, you have to compile and load each function separately, managing and passing around function pointers manually to everywhere they are used.

  alpha := MCode('2,x86:uCoAAADD,x64:uCoAAADD') ; int alpha() { return 42; }
  beta := MCode('2,x86:/2QkBA==,x64:SP/h') ; int beta(int (*alpha)(void)) { return alpha(); }
  MsgBox DllCall(beta, "Ptr", alpha, "Cdecl Int")
  
  MCode(mcode) {
      static e := Map('1', 4, '2', 1), c := (A_PtrSize=8) ? "x64" : "x86"
      if (!regexmatch(mcode, "^([0-9]+),(" c ":|.*?," c ":)([^,]+)", &m))
          return
      if (!DllCall("crypt32\CryptStringToBinary", "str", m.3, "uint", 0, "uint", e[m.1], "ptr", 0, "uint*", &s := 0, "ptr", 0, "ptr", 0))
          return
      p := DllCall("GlobalAlloc", "uint", 0, "ptr", s, "ptr")
      if (c="x64")
          DllCall("VirtualProtect", "ptr", p, "ptr", s, "uint", 0x40, "uint*", &op := 0)
      if (DllCall("crypt32\CryptStringToBinary", "str", m.3, "uint", 0, "uint", e[m.1], "ptr", p, "uint*", &s, "ptr", 0, "ptr", 0))
          return p
      DllCall("GlobalFree", "ptr", p)
  }

This kind of workaround is not necessary when using MCL, which uses its built-in linker and loader to automatically identify function offsets and adjust the MCode at run-time so that references to other functions resolve correctly instead of remaining ''0''.

For code that defines multiple functions to be called from AHK, those functions must be //exported// so that MCL can determine what should be included in the output MCode. Functions that are neither exported nor used by exported code are trimmed from the output automatically.

To export a function, you must include the MCL header file and then use the ''MCL_EXPORT'' macro.

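To make the unresolved call described in this section concrete: on x86, the ''e8'' call instruction stores a 32-bit offset measured from the address of the //next// instruction, so a loader that knows where both functions sit can compute that offset and write it over the placeholder. The sketch below only edits bytes in an array and never executes them. The helper ''patch_rel32_call'' and the simplified byte layout are assumptions made for illustration, not MCL's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills in the 4-byte operand of an x86 "call rel32"
 * instruction (opcode e8) located at call_offset so that it targets
 * target_offset. Both offsets are counted from the start of the same
 * code buffer. */
void patch_rel32_call(unsigned char *code, size_t call_offset, size_t target_offset) {
    assert(code[call_offset] == 0xe8);
    /* rel32 is measured from the end of the 5-byte call instruction. */
    int32_t rel = (int32_t)((int64_t)target_offset - (int64_t)(call_offset + 5));
    /* x86 is little-endian: store the lowest byte first. */
    uint32_t bits = (uint32_t)rel;
    for (int i = 0; i < 4; i++)
        code[call_offset + 1 + i] = (unsigned char)(bits >> (8 * i));
}
```

Suppose ''alpha'' occupies bytes 0 to 5 (''b8 2a 00 00 00 c3'') and a simplified ''beta'' (''e8 00 00 00 00 c3'') starts at byte 6. Calling ''patch_rel32_call(code, 6, 0)'' then writes ''f5 ff ff ff'', which is -11: the distance from the end of the call instruction back to the start of ''alpha''.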
The following script exports two functions with ''MCL_EXPORT'' and calls both from AutoHotkey:

  #Requires AutoHotkey v2.0
  #Include <MCL>
  
  lib := MCL.FromC("
  (
  #include <mcl.h>
  
  MCL_EXPORT(Factorial)
  unsigned int Factorial(unsigned int a, unsigned int b) {
      if (a > 0)
          return Factorial(a - 1, b * a);
      else
          return b;
  }
  
  MCL_EXPORT(FactorialCaller)
  unsigned int FactorialCaller(unsigned int a) {
      return Factorial(a, 1);
  }
  )")
  
  loop 5 {
      MsgBox (
          "FactorialCaller(" A_Index ") = " DllCall(lib.FactorialCaller, "UInt", A_Index, "Cdecl UInt") "`n"
          "Factorial(" A_Index ", 1) = " DllCall(lib.Factorial, "UInt", A_Index, "UInt", 1, "Cdecl UInt") "`n"
      )
  }

Optionally, the export macro can accept DllCall types as parameters. When these are specified, the functions are exported as //pre-made wrappers// that can be called without having to pull out DllCall and specify types in your AutoHotkey code.

  #Requires AutoHotkey v2.0
  #Include <MCL>
  
  lib := MCL.FromC("
  (
  #include <mcl.h>
  
  MCL_EXPORT(Factorial, UInt, a, UInt, b, Cdecl_UInt)
  unsigned int Factorial(unsigned int a, unsigned int b) {
      if (a > 0)
          return Factorial(a - 1, b * a);
      else
          return b;
  }
  
  MCL_EXPORT(FactorialCaller, UInt, a, Cdecl_UInt)
  unsigned int FactorialCaller(unsigned int a) {
      return Factorial(a, 1);
  }
  )")
  
  loop 5 {
      MsgBox (
          "FactorialCaller(" A_Index ") = " lib.FactorialCaller(A_Index) "`n"
          "Factorial(" A_Index ", 1) = " lib.Factorial(A_Index, 1) "`n"
      )
  }

==== 4. Global Variables ====

Although traditional MCode compilers like MCode4GCC did not have any tools for constructing and managing global variables inside your MCode, MCL supports this readily. As with exporting functions, you can export global variables with the ''MCL_EXPORT_GLOBAL'' macro. This provides the user with a pointer to where that global lives in RAM, so that it can be read and modified with NumGet and NumPut.

  #Requires AutoHotkey v2.0
  #Include <MCL>
  
  lib := MCL.FromC("
  (
  #include <mcl.h>
  
  MCL_EXPORT_GLOBAL(myGlobal)
  int myGlobal = 1;
  
  MCL_EXPORT(alpha)
  int alpha() {
      return myGlobal;
  }
  )")
  
  MsgBox DllCall(lib.alpha, "Cdecl Int")
  NumPut("Int", 42, lib.myGlobal)
  MsgBox DllCall(lib.alpha, "Cdecl Int")

As with functions, a global can also be exported with type information to form a wrapper.

  #Requires AutoHotkey v2.0
  #Include <MCL>
  
  lib := MCL.FromC("
  (
  #include <mcl.h>
  
  MCL_EXPORT_GLOBAL(myGlobal, Int)
  int myGlobal = 1;
  
  MCL_EXPORT(alpha, Cdecl_Int)
  int alpha() {
      return myGlobal;
  }
  )")
  
  MsgBox lib.alpha()
  lib.myGlobal := 42
  MsgBox lib.alpha()

==== 5. Callback to AHK ====

Although there are many things you can do with MCode, sometimes it is helpful or even necessary to call back to AutoHotkey for some actions. For example, it can be extremely valuable to call back to AutoHotkey for OutputDebug, MsgBox, or other debugging-oriented logging tools.

AutoHotkey provides a tool to export its own functions to be consumed as a callback from C/%%C++%% APIs, [[https://www.autohotkey.com/docs/v2/lib/CallbackCreate.htm|CallbackCreate]], which can be used very easily with your own MCode functions as well. The simplest way to do this is to create your callback, then pass it as a parameter to the function.

  #Requires AutoHotkey v2.0
  #Include <MCL>
  
  lib := MCL.FromC("
  (
  void alpha(void (*MsgBoxInt)(int), void (*MsgBoxStr)(short*)) {
      MsgBoxInt(123);
      MsgBoxStr(L"abc");
      MsgBoxInt(456);
  }
  )")
  
  pMsgBoxInt := CallbackCreate((i) => MsgBox(i), "Cdecl")
  pMsgBoxStr := CallbackCreate((p) => MsgBox(StrGet(p)), "Cdecl")
  
  DllCall(
      lib,
      "Ptr", pMsgBoxInt,
      "Ptr", pMsgBoxStr,
      "Cdecl"
  )

When working with MCL, you can do a little better by passing the functions as global variables instead of parameters. This lets you more easily add and remove debug functions to be used all throughout the MCode without having to update all your function calls each time.

  #Requires AutoHotkey v2.0
  #Include <MCL>
  
  lib := MCL.FromC("
  (
  #include <mcl.h>
  
  MCL_EXPORT_GLOBAL(MsgBoxInt, Ptr)
  void (*MsgBoxInt)(int);
  
  MCL_EXPORT_GLOBAL(MsgBoxStr, Ptr)
  void (*MsgBoxStr)(short*);
  
  MCL_EXPORT(alpha, CDecl)
  void alpha() {
      MsgBoxInt(123);
      MsgBoxStr(L"abc");
      MsgBoxInt(456);
  }
  )")
  
  lib.MsgBoxInt := CallbackCreate((i) => MsgBox(i), "Cdecl")
  lib.MsgBoxStr := CallbackCreate((p) => MsgBox(StrGet(p)), "Cdecl")
  
  lib.alpha()

It may be tempting to use CallbackCreate to wrap mathematical functions like Sqrt. However, this should normally be avoided because it will destroy any performance gains that you were aiming to achieve by using MCode in the first place. Instead, native machine code implementations of sqrt and other mathematical functions should be embedded or imported.

==== 6. Importing Functions ====

Writing C or %%C++%% code in an MCode environment, without any access to the standard libraries, can be extremely limiting. These limits can be eased significantly by importing key functions from DLLs that ship with Windows, or from third-party DLLs that you have on hand.

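From the C side, an imported function is simply a pointer supplied by whoever loads the code. A minimal sketch of that idea in plain C follows. ''approx_sqrt'' is a hypothetical stand-in for a function that would normally come from a DLL; it is used here only so that the example does not depend on any library.

```c
#include <assert.h>

/* The MCode-style function: it has no access to a math library, so the
 * square root routine is passed in as a function pointer. */
double hypotenuse(double (*sqrt_fn)(double), double a, double b) {
    return sqrt_fn(a * a + b * b);
}

/* Hypothetical stand-in for an imported sqrt, using Newton's method.
 * In the real workflow this pointer would be supplied by the host. */
double approx_sqrt(double x) {
    assert(x >= 0.0);
    if (x == 0.0)
        return 0.0;
    double guess = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++)
        guess = 0.5 * (guess + x / guess);
    return guess;
}
```

Calling ''hypotenuse(approx_sqrt, 3.0, 4.0)'' gives a value very close to 5. Because ''hypotenuse'' never names a specific library, the same machine code works with any routine that has a matching signature.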
The basic strategy is this:

  - Load the DLL using AutoHotkey
  - Fetch the function pointer using AutoHotkey
  - Pass that function pointer using the same strategy we did for CallbackCreate
  - Call that function from your MCode

For example, we can import ''sqrt'' from ''msvcrt.dll'' in order to do some mathematical calculations:

  #Requires AutoHotkey v2.0
  
  if !(hDll := DllCall("GetModuleHandle", "Str", "msvcrt", "Ptr"))
      throw OSError(,, "Failed to find DLL msvcrt")
  if !(pFunction := DllCall("GetProcAddress", "Ptr", hDll, "AStr", "sqrt", "Ptr"))
      throw Error(,, "Failed to find function sqrt from DLL msvcrt")
  
  ;double hypotenuse(double (*sqrt)(double), double a, double b) {
  ;    return sqrt(a * a + b * b);
  ;}
  lib := MCode("2,x86:8g8QRCQI8g8QTCQQi0QkBPIPWcDyD1nJ8g9YwfIPEUQkBP/g,x64:8g9ZyfIPWdJmDyjB8g9Ywkj/4Q==")
  MsgBox DllCall(lib, "Ptr", pFunction, "Double", 3.0, "Double", 4.0, "Cdecl Double")
  
  MCode(mcode) {
      static e := Map('1', 4, '2', 1), c := (A_PtrSize=8) ? "x64" : "x86"
      if (!regexmatch(mcode, "^([0-9]+),(" c ":|.*?," c ":)([^,]+)", &m))
          return
      if (!DllCall("crypt32\CryptStringToBinary", "str", m.3, "uint", 0, "uint", e[m.1], "ptr", 0, "uint*", &s := 0, "ptr", 0, "ptr", 0))
          return
      p := DllCall("GlobalAlloc", "uint", 0, "ptr", s, "ptr")
      if (c="x64")
          DllCall("VirtualProtect", "ptr", p, "ptr", s, "uint", 0x40, "uint*", &op := 0)
      if (DllCall("crypt32\CryptStringToBinary", "str", m.3, "uint", 0, "uint", e[m.1], "ptr", p, "uint*", &s, "ptr", 0, "ptr", 0))
          return p
      DllCall("GlobalFree", "ptr", p)
  }

When working with MCL, importing functions like this can be handled automatically by the ''MCL_IMPORT'' macro. Any standalone loader generated by MCL that has an import like this will automatically include any AHK-side import loading code that would otherwise have to be written manually. Imports done this way are automatically added to the global scope, so they do not have to be passed as a parameter to DllCall.

  #Requires AutoHotkey v2.0
  #Include <MCL>
  
  lib := MCL.FromC("
  (
  #include <mcl.h>
  
  MCL_IMPORT(double, msvcrt, sqrt, (double));
  
  double hypotenuse(double a, double b) {
      return sqrt(a * a + b * b);
  }
  )")
  
  MsgBox DllCall(lib, "Double", 3.0, "Double", 4.0, "Cdecl Double")

MCL can also be used to import functions from third-party DLLs, such as lua54.dll. For more information about that, please refer to the [[libraries:machine_code:mcl|library page for MCL]].

===== Compatibility Notes =====

MCode libraries generally support generating both 32 and 64 bit machine code, but they may not generate both at once. Although it is generally recommended to use AutoHotkey U64, when 32 bit is required there are some things to keep in mind.

C functions in 32 bit machine code default to the "Cdecl" calling convention, which will cause memory leaks if not accounted for. To account for that, you must either adjust any ''DllCall'' lines to specify "Cdecl" (which you can read more about on the DllCall documentation page) //or// adjust your C function declaration to use the standard calling convention by adding ''_stdcall'' to its signature:

  int _stdcall SomeFunc(int arg1, int arg2) {}

When exporting functions using the ''MCL_EXPORT'' macro, you can specify ''Cdecl_'' as a prefix to any type in order to ensure the Cdecl calling convention is used. For example, ''MCL_EXPORT(alpha, Int, a, Int, b, Cdecl_Int)''.
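
Returning to the ''_stdcall'' option described above, the calling convention can also be selected through a macro so that the same source builds cleanly for both targets. The sketch below assumes GCC or a compatible compiler. ''MCODE_STDCALL'' is a made-up name, and the function body is added only so that there is something to call. The attribute is applied only on 32-bit x86, because 64-bit Windows code uses a single calling convention and the attribute would be ignored there.

```c
#include <assert.h>

/* Hypothetical macro: expands to GCC's stdcall attribute on 32-bit x86
 * and to nothing on every other target. */
#if defined(__GNUC__) && defined(__i386__)
#define MCODE_STDCALL __attribute__((stdcall))
#else
#define MCODE_STDCALL
#endif

/* With stdcall the function itself removes its arguments from the stack,
 * which is the convention DllCall assumes when "Cdecl" is not given. */
int MCODE_STDCALL SomeFunc(int arg1, int arg2) {
    return arg1 + arg2;
}
```

A function declared this way could be called from 32-bit AutoHotkey without the "Cdecl" option, while 64-bit builds are unaffected by the macro.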