COM APIs

Via Microsoft

The Microsoft Component Object Model (COM) is a platform-independent, distributed, object-oriented system for creating binary software components that can interact. COM is the foundation technology for Microsoft's OLE (compound documents) and ActiveX (Internet-enabled components) technologies, as well as others.

To understand COM (and therefore all COM-based technologies), it is crucial to understand that it is not an object-oriented language but a standard. Nor does COM specify how an application should be structured; language, structure, and implementation details are left to the application developer. Rather, COM specifies an object model and programming requirements that enable COM objects (also called COM components, or sometimes simply objects) to interact with other objects. These objects can be within a single process, in other processes, and can even be on remote computers. They can be written in different languages, and they may be structurally quite dissimilar, which is why COM is referred to as a binary standard; a standard that applies after a program has been translated to binary machine code.

COM (the Component Object Model) is an object-oriented standard for designing objects (collections of function code called methods, and variable-like data called properties) to be accessible and usable across different programming languages. COM was designed by Microsoft and is reliant on the Windows API to function. Following the COM design, an application registers a "Component" with Windows that third-party applications can call on to retrieve an instance of a COM Object. When interacting with that object, the Windows API will automatically handle the communication between your application and the third-party software, allowing you to treat the object like a native object and not worry about inter-process communication (IPC).

A COM component registered by a software package provides one or more globally unique identifiers (GUIDs), which are 128-bit numbers formatted in hex like 123e4567-e89b-12d3-a456-426614174000. A GUID identifying an entire component is called a CLSID (Class Identifier). A GUID identifying a single interface offered by the component is called an IID (Interface Identifier). The CLSID is registered with Windows, sometimes along with a human-readable program ID such as InternetExplorer.Application, WinHttp.WinHttpRequest.5.1, WScript.Shell, or AutoCAD.Application. Windows keeps the list of CLSIDs in the Registry so that they can be looked up later. Your own application specifies that CLSID, or the human-readable program ID when one is available, when asking Windows for an object representing one of those registered components. Upon this request, Windows handles loading any third-party code transparently, allowing you to use the object without worrying about implementation details like DLL management.
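To make the "128-bit number formatted in hex" idea concrete, here is a minimal sketch in portable C that parses the textual form into 16 bytes and compares two identifiers. The names demo_guid_parse and demo_guid_equal are hypothetical helpers invented for this illustration, not part of the Windows API, and the bytes are stored in the order they appear in the text, which is an assumption made for simplicity rather than the in-memory layout Windows uses for its own GUID structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the value of one hex digit, or -1 if the character is not hex. */
static int demo_hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parses "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into
   16 bytes (128 bits), in textual order. Returns 0 on success, -1 otherwise. */
int demo_guid_parse(const char *text, uint8_t out[16])
{
    size_t i, digits = 0;

    if (strlen(text) != 36) return -1;
    memset(out, 0, 16);
    for (i = 0; i < 36; i++) {
        int v;
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (text[i] != '-') return -1;   /* hyphens sit at fixed positions */
            continue;
        }
        v = demo_hex_value(text[i]);
        if (v < 0) return -1;
        /* two hex digits make one byte: 32 digits -> 16 bytes -> 128 bits */
        out[digits / 2] = (uint8_t)((out[digits / 2] << 4) | v);
        digits++;
    }
    return 0;
}

/* Hypothetical helper: two identifiers match when all 16 bytes match. */
int demo_guid_equal(const uint8_t a[16], const uint8_t b[16])
{
    return memcmp(a, b, 16) == 0;
}
```

Parsing the example GUID from the text fills exactly 16 bytes, and the same GUID written in upper case compares equal, which reflects that a CLSID or IID is ultimately matched as a number rather than as a string.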

When your application retrieves an object from a COM component, that object can either connect to a new instance of the third-party code, or connect to an existing instance of the third-party code. For example, a COM Object registered by Microsoft Office could be requested so that your application can manipulate documents in the background, independently of any running Office applications. Alternatively, it could connect to a running application so that the document is manipulated in a visible application window.

Because of its flexibility, COM forms the basis of OLE (Object Linking and Embedding), ActiveX, Active Scripting, and even DirectX.

OLE: Under OLE, COM facilitates the embedding of objects representing documents of one type within another type of document. For example, a piece of media can be embedded in a slideshow, or a spreadsheet in a text document.

ActiveX: Under ActiveX, COM facilitates the embedding of software within other software. For example, a Java applet or a Flash player can be embedded within a web browser, or a web browser within another desktop application.

Active Scripting: Originally known as ActiveX Scripting, Active Scripting is a framework for developing scripting languages to take advantage of powerful high-level COM interfaces. In the early days of the web, this would have allowed scripting languages other than JavaScript to be embedded into a web page. Today, it still powers the VBScript and JScript engines used by hosts such as Windows Script Host. In many respects, AutoHotkey is considered to be an Active Scripting language. Its objects build off the COM base, allowing them to be passed seamlessly back and forth between AutoHotkey and third-party COM components.

In short, COM was Microsoft's primary solution for communication between software packages in the years before they leaned into the .NET Common Language Runtime. Much of the Windows API and a lot of third-party software still support COM interfaces, and using those interfaces will allow you to do amazing things. AutoHotkey, especially AutoHotkey v2, has a variety of tools for interacting with those interfaces, if only you take the time to learn how to use them.

Anatomy of a COM Object

A COM Object follows the C++ ABI for objects. COM objects are composed of structured data, and what is known as a virtual method table (vtable).

The virtual method table is an array of pointers to __stdcall functions. They are arranged in the order they are declared in headers. Each method is implemented by a regular function where the first parameter is "This", a pointer to the structured data of the object. All COM objects derive from the IUnknown interface, so a basic IUnknown-compatible COM object's vtable would look like this:

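// Interface Identifier (IID) {00000000-0000-0000-C000-000000000046}
// "..." below stands for the remaining parameters, omitted for brevity
typedef struct IUnknown IUnknown; // forward declaration so the vtable can refer to the object type
typedef struct IUnknownVtbl {
	HRESULT (__stdcall *QueryInterface)(IUnknown *This, ...); // From IUnknown, index 0
	ULONG (__stdcall *AddRef)(IUnknown *This); // From IUnknown, index 1
	ULONG (__stdcall *Release)(IUnknown *This); // From IUnknown, index 2
} IUnknownVtbl;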

And an object with this interface would look like this:

typedef struct IUnknown {
	IUnknownVtbl* vtbl;
	... // any data fields go here
} IUnknown;

So when you have a (pointer to a) COM object pObject of type IUnknown, you could call its method "QueryInterface" by:

  1. Retrieving the vtable: pObjectVtbl := NumGet(pObject, 0, "Ptr")
  2. Retrieving the function reference at index 0: pObjectQueryInterface := NumGet(pObjectVtbl, 0 * A_PtrSize, "Ptr")
  3. Calling the function passing the object as the first parameter: DllCall(pObjectQueryInterface, "Ptr", pObject, …)

(or in AHKv2, by using ComCall which performs all those steps for you)
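The three steps above can also be sketched in portable C, with memcpy playing the role of NumGet. Everything prefixed with Demo or demo_ is a hypothetical stand-in written for this illustration: the object is a toy, the interface is identified by a plain string instead of an IID, and the calling convention is the compiler default rather than __stdcall. The sketch also assumes that all pointers have the same size and representation, as they do on Windows.

```c
#include <stddef.h>
#include <string.h>

enum { DEMO_OK = 0, DEMO_NOINTERFACE = -1 };   /* hypothetical result codes */

typedef struct DemoObject DemoObject;
typedef long (*DemoQueryInterfaceFn)(DemoObject *This, const char *iid, void **out);
typedef unsigned long (*DemoRefCountFn)(DemoObject *This);

typedef struct DemoVtbl {
    DemoQueryInterfaceFn QueryInterface;   /* index 0 */
    DemoRefCountFn AddRef;                 /* index 1 */
    DemoRefCountFn Release;                /* index 2 */
} DemoVtbl;

struct DemoObject {
    const DemoVtbl *vtbl;      /* the vtable pointer lives at offset 0 */
    unsigned long refCount;    /* "any data fields go here" */
};

static unsigned long demo_AddRef(DemoObject *This) { return ++This->refCount; }
static unsigned long demo_Release(DemoObject *This) { return --This->refCount; }

static long demo_QueryInterface(DemoObject *This, const char *iid, void **out)
{
    if (strcmp(iid, "IUnknown") != 0) { *out = NULL; return DEMO_NOINTERFACE; }
    *out = This;
    demo_AddRef(This);         /* handing out a pointer adds a reference */
    return DEMO_OK;
}

static const DemoVtbl demo_vtbl = { demo_QueryInterface, demo_AddRef, demo_Release };

DemoObject demo_make(void)
{
    DemoObject obj = { &demo_vtbl, 1 };
    return obj;
}

/* Steps 1 and 2: read the vtable pointer at offset 0 of the object, then
   read the function pointer stored at index * pointer size in the vtable. */
static void demo_read_slot(const void *pObject, size_t index, void *slot, size_t slotSize)
{
    const char *pVtbl;
    memcpy(&pVtbl, pObject, sizeof pVtbl);
    memcpy(slot, pVtbl + index * slotSize, slotSize);
}

/* Step 3: call the function, passing the object as the first parameter. */
long demo_query_interface(DemoObject *pObject, const char *iid, void **out)
{
    DemoQueryInterfaceFn fn;
    demo_read_slot(pObject, 0, &fn, sizeof fn);
    return fn(pObject, iid, out);
}

unsigned long demo_add_ref(DemoObject *pObject)
{
    DemoRefCountFn fn;
    demo_read_slot(pObject, 1, &fn, sizeof fn);
    return fn(pObject);
}
```

Note that the caller never names the implementing function; it only knows the slot index and the parameter list, which is exactly the information ComCall needs.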

IUnknown is the most basic of COM Object interfaces, but to perform useful work it is typically necessary to work with objects that extend IUnknown, such as IDispatch. With an interface that extends another, the vtable will start with the functions from the original interface and then continue into the new extended functions. For IDispatch, this means its vtable would look like this:

// Interface Identifier (IID) {00020400-0000-0000-C000-000000000046}
typedef struct IDispatch IDispatch; // forward declaration so the vtable can refer to the object type
typedef struct IDispatchVtbl {
	// From IUnknown (indexes 0 to 2)
	HRESULT (__stdcall *QueryInterface)(IDispatch *This, ...);
	ULONG (__stdcall *AddRef)(IDispatch *This);
	ULONG (__stdcall *Release)(IDispatch *This);

	// From IDispatch (indexes 3 to 6)
	HRESULT (__stdcall *GetTypeInfoCount)(IDispatch *This, ...);
	HRESULT (__stdcall *GetTypeInfo)(IDispatch *This, ...);
	HRESULT (__stdcall *GetIDsOfNames)(IDispatch *This, ...);
	HRESULT (__stdcall *Invoke)(IDispatch *This, ...);
} IDispatchVtbl;

Therefore, the indexes of the IDispatch methods in the vtable start at 3, not 0. This is very important to keep in mind when looking up indexes in headers posted online. For example, it is often helpful to perform Google searches such as IDispatchVtbl filetype:h to find header files like this one. Instead of showing that the vtable begins with the IUnknown functions, such a header may just have the text BEGIN_INTERFACE, which is likely easier to write and manage but not very useful to us, the readers.
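As a rough illustration of that index arithmetic, the sketch below lays out a seven-slot table in portable C and computes each method's index from its position. DemoSlot, DemoDispatchVtbl, DEMO_SLOT_INDEX and demo_absolute_index are hypothetical names made up for this example. Every slot uses the same placeholder pointer type because only the positions matter here, and the calculation assumes the compiler inserts no padding between same-sized function pointers.

```c
#include <stddef.h>

typedef void (*DemoSlot)(void);   /* placeholder type: only slot positions matter */

typedef struct DemoDispatchVtbl {
    /* Inherited from IUnknown */
    DemoSlot QueryInterface;
    DemoSlot AddRef;
    DemoSlot Release;
    /* Added by IDispatch */
    DemoSlot GetTypeInfoCount;
    DemoSlot GetTypeInfo;
    DemoSlot GetIDsOfNames;
    DemoSlot Invoke;
} DemoDispatchVtbl;

/* Index of a method = its byte offset in the vtable / size of one pointer. */
#define DEMO_SLOT_INDEX(type, member) (offsetof(type, member) / sizeof(DemoSlot))

/* When a header lists only the methods an interface adds, the real index is
   the position within that header plus the number of inherited methods. */
size_t demo_absolute_index(size_t inheritedCount, size_t indexInHeader)
{
    return inheritedCount + indexInHeader;
}
```

For example, Invoke is the fourth method IDispatch adds (position 3 counting from zero), so with the three inherited IUnknown methods its vtable index is 3 + 3 = 6, which is the number to pass to ComCall.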


COM Object Reference [AutoHotkey v1.1+] (archived forum)

Inspection of IDispatch COM objects using Powershell