If you’re not familiar, “imphash” stands for “import hash”: a hash computed over the libraries and functions imported by a Windows Portable Executable (PE) file. You can get started playing with it quickly using the Python implementation in the pefile library.
To calculate an “imphash,” all imported libraries and their linked functions are dumped in string format, concatenated, then hashed (the pefile implementation uses MD5). VirusTotal also calculates this for the PE files it sees in its daily submissions, so it’s important to understand how this works and why.
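To make the calculation concrete, here is a minimal sketch of the string-building and hashing steps, modeled on how pefile approaches it. The function name imphash_from_imports and the sample import lists are made up for illustration, and imports by ordinal are deliberately left out, so treat this as a simplified approximation rather than a drop-in replacement for the library.

```python
import hashlib

# Extensions stripped from the library name before hashing.
_STRIPPED_EXTENSIONS = ("dll", "ocx", "sys")


def imphash_from_imports(imports):
    """Compute an imphash-style digest from ordered (library, function) pairs.

    Hypothetical helper for illustration; imports by ordinal are not handled.
    """
    parts = []
    for library, function in imports:
        name = library.lower()
        stem, _, ext = name.rpartition(".")
        if stem and ext in _STRIPPED_EXTENSIONS:
            name = stem
        parts.append(f"{name}.{function.lower()}")
    # Order is preserved on purpose: the hash depends on import-table order.
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()


# Made-up import lists: adding a single import changes the digest.
baseline = [("KERNEL32.dll", "GetProcAddress"), ("KERNEL32.dll", "LoadLibraryA")]
with_wts = baseline + [("WTSAPI32.dll", "WTSEnumerateSessionsA")]
print(imphash_from_imports(baseline) == imphash_from_imports(with_wts))  # False
```

Because names are lowercased and joined in table order, two builds with the same imports in the same order produce the same digest no matter what the rest of the code does.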
Why calculate this?
Simple: malware authors are humans, and humans stick with what they know. They go to the same watering holes. So, when a malware author tweaks a few things to throw off a signature of their malicious EXE or DLL, odds are high that their imphash will remain identical. This makes imphash a good way of tracking the life cycle of malware families and their authors. FireEye’s blog covers this in more detail.
How does this work in code?
Suppose you have the following snippet at the top of a C++ source file in your malware:
#pragma comment(lib, "WtsApi32.lib")
There are other ways to instruct your C++ linker to include a library, but they require modifying the project files and specifying import directories. That makes them less portable, and less clear to the next developer who reads your code, than the #pragma statement, which plainly indicates that you need to pull in WtsApi32.lib.
This #pragma comment tells the linker to link against that import library. Once your code calls a function from it, the DLL becomes an import, so it will show up in the list of imports that the Python pefile project will kick out.
import pefile

# parse the PE file and print the name of each imported DLL
pe = pefile.PE('path/to/exe.exe')
for entry in pe.DIRECTORY_ENTRY_IMPORT:
    print('\t' + entry.dll.decode('utf-8'))
You should see WTSAPI32.DLL in the list of imports if you run this Python against our hypothetical malware snippet after it’s compiled to an EXE file. Now, that is a Microsoft API, so it’s not unrealistic that good software will use it; it’s just not as commonly used as some of the others. Perhaps that single API isn’t the problem. It’s the combination of that import with the rest of the set you pull in that makes your malware unique.
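Since it is the full combination of imports that matters, it can help to look at function names as well as DLL names, alongside the hash itself. The sketch below is one way to do that with pefile. The helper names import_table and describe are made up for this example, and describe assumes the third-party pefile package is installed.

```python
def import_table(pe):
    """Return [(dll_name, [function names])] from a parsed pefile.PE object."""
    table = []
    # The attribute is absent when the file has no import directory.
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        dll = entry.dll.decode("utf-8", errors="replace")
        functions = []
        for imp in entry.imports:
            if imp.name is not None:
                functions.append(imp.name.decode("utf-8", errors="replace"))
            else:
                # Imported by ordinal, so there is no name to show.
                functions.append(f"ord{imp.ordinal}")
        table.append((dll, functions))
    return table


def describe(path):
    """Parse the file at `path` and return (imphash, import table)."""
    import pefile  # third-party package: pip install pefile

    pe = pefile.PE(path)
    return pe.get_imphash(), import_table(pe)
```

Calling describe('path/to/exe.exe') on two builds of the same project makes it easy to see which added or removed function accounts for a changed imphash.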
What are the options for a malware developer?
One option is to tell yourself: “I’ve got a fever, and the only prescription is more cowbell, ahem … stages.” Then proceed to modularize your malware, loading each module into memory only when needed, and keep your base stages simple so their imphashes are not that unique. The cons to this approach are: it’s complicated, and while a good practice for malware development, a seasoned developer will admit that even their later stage modules will eventually get captured in PE form by a competent forensicator.
Another option is to dynamically load all the things: keep your imports as small as your compiler will let you and bring any Microsoft or third-party imports into memory at runtime, rather than broadcasting them in your imports table in your PE file. Sound complicated? It’s probably not as complicated as you might think.
Looking at our example C++ import (the pragma statement above), suppose we did it this way …
// comment this out, we won’t need it any more:
//#pragma comment(lib, "WtsApi32.lib")
For simple imports, this may be all you need. Re-run the python3 pefile example above, and you’ll note that WTSAPI32.DLL will drop out of the list, and the imphash is now different.
But for more complicated use cases, loading the library into memory is not enough: you’ll also need to find and use specific functions in that DLL, like this example:
// declare the type definition of CryptStringToBinaryA (the Base64 function) from Crypt32
// this method signature is literally copy/paste from the actual CryptStringToBinaryA function
// all we have to do is follow the typedef pattern with the __stdcall pointer to set it up:
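typedef BOOL(__stdcall *pCryptStringToBinaryA)(
    LPCSTR pszString, DWORD cchString, DWORD dwFlags,
    BYTE *pbBinary, DWORD *pcbBinary, DWORD *pdwSkip, DWORD *pdwFlags);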
// get a handle on Crypt32.dll:
HMODULE hCrypt32 = LoadLibraryA("Crypt32.dll");
// get a pointer to CryptStringToBinaryA() based on our previous type definition:
pCryptStringToBinaryA pCrypt = (pCryptStringToBinaryA)GetProcAddress(hCrypt32, "CryptStringToBinaryA");
// now we can use our pointer just like we would have called the imported function:
if (!pCrypt("some base64 string here", 0, CRYPT_STRING_BASE64, …
In this more complicated example, we have removed the import of CRYPT32.DLL from the prying eyes of imphash and pefile. At runtime, our malicious little program will locate the DLL on the victim host’s file system, load it up, and make it usable. It wasn’t that difficult to do, really.
An astute reader will note that we just traded Crypt32 from the PE file’s import table in exchange for “Crypt32.dll” and “CryptStringToBinaryA” strings in the PE file. Yes, we did. But there are other ways to obfuscate strings in a binary (topic for another day).
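From the defender’s side, that trade is easy to check for. The sketch below pulls printable ASCII strings out of a file’s bytes and looks for API and DLL names that never made it into the import table. The watchlist and function names are hypothetical, and it only covers plain ASCII, so UTF-16 or obfuscated strings would slip past it.

```python
import re

# Hypothetical watchlist; a real one would be far longer.
WATCHLIST = (b"LoadLibrary", b"GetProcAddress", b"CryptStringToBinaryA", b"Crypt32.dll")


def printable_strings(data, min_length=5):
    """Yield runs of printable ASCII bytes, similar to the Unix `strings` tool."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.group()


def watchlist_hits(data, watchlist=WATCHLIST):
    """Return the watchlist entries found in any printable string, ignoring case."""
    found = set()
    for s in printable_strings(data):
        lowered = s.lower()
        for needle in watchlist:
            if needle.lower() in lowered:
                found.add(needle.decode("ascii"))
    return sorted(found)
```

Feeding it the bytes of a file, for example watchlist_hits(open('path/to/exe.exe', 'rb').read()), and comparing the hits against the import table shows which names are only resolved at runtime.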
False Flag Imphashes
Suppose that, during your cyber espionage, you want to plant a false flag malware artifact on a targeted host, and you know that imphash calculation will be used by the defenders during forensics. And suppose you have a malware artifact with a distinctive imphash that you’d like to emulate. Maybe you’re a red team doing adversary emulation, and this is a training objective. Maybe you’re a nation state trying to implicate another nation state. Or maybe you’re a criminal group trying to “drop a dime” on somebody else.
With the sample artifact in hand, you can walk its import table using the python3 pefile library and then simply add references in your C++ to match, making sure that you dynamically load any libraries you need to carry out your tasking that are NOT in the target artifact’s import table. Watch out for compiler/linker optimizations — you may need to do more than make a reference, you may need a code path that actually touches functions in those namespaces so they don’t get optimized out of your final PE file. This approach won’t necessarily make your malware look entirely like the other artifact, just the imphash, but it might be the tool you need in your toolbox for that operation.
Happy compiling and linking …