One of the things that bothers me about development is when you come across a problem and think to yourself, “surely someone else has solved this before”, but then you start searching and keep coming up empty. This is the issue I had when starting a recent project that needed to do things like detect which window (application) is open and active, in a cross-platform (Windows, Linux, Mac) way.
In fact, I had such a hard time finding existing cross-platform libraries that I started writing one myself, starting with the Windows side of things, but when the complexity got really high, I revisited my search, and finally got some results. I wanted to share what I found, since as I said, these resources were not easy to track down.
Quick list of options:
So far I have only found three libraries that are cross-platform, although I’m still keeping an eye out. See the embedded spreadsheet below for a comparison of features, or click here to open in a new tab:
Most Robust: Robot (Robot/robot and Robot/robot-js)
I really wish I had found this library sooner. The Robot library is an incredibly thorough and well put-together library that can handle pretty much anything you would need in order to manipulate windows and send keypresses. What makes it even more impressive is that it appears to be written and maintained by just one very dedicated individual: David Krutsko. Kudos to him on building this amazing library.
Robot can be used as a C++ library or as a NodeJS library. I have just started playing around with the C++ version, combined with Qt, and I can share a couple of tips for getting it working. For my project, I had to change a few things:
- Changed compiler from MinGW to MSVC.
  - Robot uses the standard <thread> library, which, at least for the version of MinGW shipped with Qt, is not available. So when trying to compile, I got a bunch of errors about mutex and threads. Someone put together a thread library for MinGW, but I couldn’t get it to work, so I just gave up and switched over to MSVC, which basically works out of the box with this library.
- Manually linked the Windows GDI library (wingdi.h / Gdi32.dll).
  - For Windows, Robot uses some bitmap / drawing functions that rely on the Windows gdi32 library. When trying to compile without any special setup, I got a bunch of linker errors related to the use of this library, things like: Clipboard.obj:-1: error: LNK2019: unresolved external symbol __imp_GetDIBits referenced in function "public: static bool __cdecl Robot::Clipboard::GetImage(class Robot::Image &)" (?GetImage@Clipboard@Robot@@SA_NAEAVImage@2@@Z)
  - If you are using QMake like I am, just add “LIBS += -lgdi32” to either your project file or a PRI that gets loaded for this library.
  - Also make sure that QMake can find the Windows Kit folder where the DLL and header files live. You can always add it manually through “INCLUDEPATH +=”.
Here is the PRI file that I created to pull all of the Robot source files into my Qt project. Depending on how you have structured your project, you may need to change the ROBOT_LIB_SOURCEPATH variable.
# https://github.com/Robot/robot
ROBOT_LIB_SOURCEPATH = $$PWD/../lib/robot/Source
INCLUDEPATH += $$ROBOT_LIB_SOURCEPATH
# Rarely needed: uncomment the line below to force the old (pre-C++11) libstdc++ ABI; =1 would force the new C++11 ABI
#DEFINES += "_GLIBCXX_USE_CXX11_ABI=0"
# Robot needs some extra win libs - primarily for clipboard and bitmap operations
INCLUDEPATH += "C:/Program Files (x86)/Windows Kits/8.1/Include"
LIBS += -lgdi32
HEADERS += \
$$ROBOT_LIB_SOURCEPATH/Enum.h \
$$ROBOT_LIB_SOURCEPATH/Bounds.h \
$$ROBOT_LIB_SOURCEPATH/Clipboard.h \
$$ROBOT_LIB_SOURCEPATH/Color.h \
$$ROBOT_LIB_SOURCEPATH/Global.h \
$$ROBOT_LIB_SOURCEPATH/Hash.h \
$$ROBOT_LIB_SOURCEPATH/Image.h \
$$ROBOT_LIB_SOURCEPATH/Keyboard.h \
$$ROBOT_LIB_SOURCEPATH/Memory.h \
$$ROBOT_LIB_SOURCEPATH/Module.h \
$$ROBOT_LIB_SOURCEPATH/Mouse.h \
$$ROBOT_LIB_SOURCEPATH/Point.h \
$$ROBOT_LIB_SOURCEPATH/Process.h \
$$ROBOT_LIB_SOURCEPATH/Range.h \
$$ROBOT_LIB_SOURCEPATH/Robot.h \
$$ROBOT_LIB_SOURCEPATH/Screen.h \
$$ROBOT_LIB_SOURCEPATH/Size.h \
$$ROBOT_LIB_SOURCEPATH/Timer.h \
$$ROBOT_LIB_SOURCEPATH/Types.h \
$$ROBOT_LIB_SOURCEPATH/Window.h
SOURCES += \
$$ROBOT_LIB_SOURCEPATH/Bounds.cc \
$$ROBOT_LIB_SOURCEPATH/Clipboard.cc \
$$ROBOT_LIB_SOURCEPATH/Color.cc \
$$ROBOT_LIB_SOURCEPATH/Hash.cc \
$$ROBOT_LIB_SOURCEPATH/Image.cc \
$$ROBOT_LIB_SOURCEPATH/Keyboard.cc \
$$ROBOT_LIB_SOURCEPATH/Memory.cc \
$$ROBOT_LIB_SOURCEPATH/Module.cc \
$$ROBOT_LIB_SOURCEPATH/Mouse.cc \
$$ROBOT_LIB_SOURCEPATH/Point.cc \
$$ROBOT_LIB_SOURCEPATH/Process.cc \
$$ROBOT_LIB_SOURCEPATH/Range.cc \
$$ROBOT_LIB_SOURCEPATH/Screen.cc \
$$ROBOT_LIB_SOURCEPATH/Size.cc \
$$ROBOT_LIB_SOURCEPATH/Timer.cc \
$$ROBOT_LIB_SOURCEPATH/Window.cc
Getting Robotjs (octalmage/robotjs) working with C++ and Qt
Robotjs (not to be confused with Robot-JS by Robot) is a NodeJS desktop manipulation library written by octalmage, aka Jason Stallings. One of the neat things about this library, besides it being written in C, is that Jason did a really nice job of keeping the C/C++ layer separate from the JavaScript / NodeJS layer. This separation allows us to use the native C code in its raw form, before it gets compiled by node-gyp into a Node addon.
I was able to do this surprisingly easily with Qt. I just set up a PRI file to import all the raw header and source files, making sure not to pull in the files that wrap functions for Node (primarily robotjs.cc). Here is my PRI file, but note that I’m not pulling in 100% of the files, since I only wanted to test part of the library:
NODE_MODULESPATH = $$PWD/../node_modules
NODE_SOURCEPATH = $$PWD/../node_modules/robotjs/src
INCLUDEPATH += $$NODE_SOURCEPATH
# Special - seems to be an issue with a windows constant - could also be related to _WIN32_WINNT issue on active-windows-lib
DEFINES += "MOUSEEVENTF_HWHEEL=0x1000"
HEADERS += \
windows.h \
$$NODE_SOURCEPATH/os.h \
$$NODE_SOURCEPATH/keycode.h \
$$NODE_SOURCEPATH/keypress.h \
$$NODE_SOURCEPATH/mouse.h \
$$NODE_SOURCEPATH/screen.h \
$$NODE_SOURCEPATH/types.h \
$$NODE_SOURCEPATH/deadbeef_rand.h
SOURCES += \
$$NODE_SOURCEPATH/keycode.c \
$$NODE_SOURCEPATH/keypress.c \
$$NODE_SOURCEPATH/mouse.c \
$$NODE_SOURCEPATH/screen.c \
$$NODE_SOURCEPATH/deadbeef_rand.c
I also had to set a constant, for MOUSEEVENTF_HWHEEL. I’ve moved on from using this library, so I can’t recall exactly why I had to add it, but I’m guessing the compiler was complaining about not knowing the value for it, which probably has to do with a screwed up path import on my end.
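For what it’s worth, my understanding is that the Windows SDK headers only declare MOUSEEVENTF_HWHEEL when _WIN32_WINNT targets Vista (0x0600) or later, which would fit the error I was seeing. If you would rather not force the value through DEFINES, a guarded fallback in a header that gets included ahead of the robotjs sources should do the same job. This is only a sketch and not something that ships with robotjs; the value is the one documented for the mouse_event flags.

```cpp
// Guarded fallback for MOUSEEVENTF_HWHEEL (a sketch, not part of robotjs).
// On Windows, pull in the real headers first so we never shadow the SDK value.
#ifdef _WIN32
#include <windows.h>
#endif

// Only define the constant if the SDK headers did not already provide it,
// e.g. because _WIN32_WINNT targets something older than Vista.
#ifndef MOUSEEVENTF_HWHEEL
#define MOUSEEVENTF_HWHEEL 0x01000
#endif

// Compile-time sanity check that the value is the documented flag.
static_assert(MOUSEEVENTF_HWHEEL == 0x1000,
              "MOUSEEVENTF_HWHEEL should be the horizontal wheel flag");
```

Unlike the DEFINES approach, this does not trigger a redefinition warning on toolchains where the SDK already declares the constant.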
Understanding the Inner Workings
If you want to roll your own solution, or just understand more about the inner workings of these libraries and how system API calls are used to interact with windows and input, I do actually have some advice for you.
First, start with the source code for the active-win library. It is well-commented and is basically the textbook example for how these system API calls are meant to be used, for Windows, Mac, and Linux.
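To give a feel for what these calls look like on the Windows side, here is a minimal sketch of my own (not code copied from active-win) that asks User32 for the foreground window and reads its title. The function name activeWindowTitle is made up for illustration; GetForegroundWindow, GetWindowTextLengthW and GetWindowTextW are the real Win32 calls. On other platforms the sketch simply reports failure, since Mac and Linux need entirely different APIs.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>  // link against user32
#endif

// Hypothetical helper: writes the title of the currently focused window
// into `out` and returns true on success. Returns false (and leaves `out`
// empty) when there is no foreground window or the platform is unsupported.
bool activeWindowTitle(std::wstring& out) {
    out.clear();
#ifdef _WIN32
    HWND hwnd = GetForegroundWindow();
    if (hwnd == nullptr) {
        return false;  // e.g. focus is changing between windows
    }
    // Length in characters, not counting the terminating null.
    const int length = GetWindowTextLengthW(hwnd);
    std::wstring buffer(static_cast<std::size_t>(length) + 1, L'\0');
    const int copied = GetWindowTextW(hwnd, &buffer[0], length + 1);
    assert(copied <= length);
    buffer.resize(static_cast<std::size_t>(copied));
    out = buffer;
    return true;
#else
    return false;  // Mac and Linux are out of scope for this sketch
#endif
}
```

A window with no title still counts as success here and just yields an empty string, so callers should check the return value rather than the string.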
Next, if you are developing for Windows, be aware that most of the desktop APIs use strings in a… ahem… “interesting way”. You will see strings passed around as “LPTSTR”, which is a type alias that resolves to either LPWSTR (long pointer to wide string, composed of wchar_t) or LPSTR (composed of char), depending on whether your project is built with UNICODE defined. This Stack Overflow answer has a good breakdown of some of the unique types that the Windows API uses.
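If the LPTSTR switch feels abstract, the mechanism can be imitated in a few lines of portable C++. The names below (TChar, TSTR, TString, windowClassName, widen) are stand-ins I made up so the sketch compiles anywhere; the real Windows headers provide TCHAR and the _T() / TEXT() macros for the same purpose.

```cpp
#include <cassert>
#include <string>

// Stand-ins that mimic how the Windows headers pick a character type.
// Build with -DUNICODE to flip everything to wide characters.
#ifdef UNICODE
using TChar = wchar_t;
#define TSTR(x) L##x
#else
using TChar = char;
#define TSTR(x) x
#endif

using TString = std::basic_string<TChar>;

// The same source line yields a wide or a narrow literal depending on UNICODE.
inline TString windowClassName() {
    return TSTR("Notepad");
}

// The explicit alternative: always work in wide strings and call the
// W-suffixed APIs. This helper handles plain ASCII only; real code on
// Windows would convert with MultiByteToWideChar instead.
inline std::wstring widen(const std::string& ascii) {
    std::wstring out;
    for (char c : ascii) {
        assert(static_cast<unsigned char>(c) < 0x80);
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

In practice I find it less confusing to skip the T-types entirely and call the wide versions (for example GetWindowTextW) directly, but you will still run into LPTSTR in plenty of sample code.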
For keypresses, be aware that although we have things like the Unicode standard so that characters and strings are encoded the same way across platforms, actual keypresses are usually represented by enums/constants that are unique per platform. For example, Windows represents keypresses through “Virtual-Key Codes”, the constants for which can be found here. Meanwhile, Linux and other Unix desktops that use X11 define their own set of constants (keysyms), which can be found here, among other places. I started compiling a database of these different constants across platforms, which can be found here.
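As a small illustration of why a mapping layer is needed, here is a sketch of a lookup table covering a handful of keys. The Key enum, the KeyCodes struct and the function names are my own invention for this example; the numeric values are the Windows virtual-key codes and X11 keysyms for those keys. A macOS column (the kVK_* codes) is left out to keep it short.

```cpp
#include <cassert>
#include <cstdint>

// Platform-neutral key names (hypothetical, just for this sketch).
enum class Key { Backspace, Tab, Enter, Escape, Space };

struct KeyCodes {
    Key key;
    std::uint32_t windowsVk;   // Windows virtual-key code
    std::uint32_t x11Keysym;   // X11 keysym
};

constexpr KeyCodes kKeyTable[] = {
    {Key::Backspace, 0x08, 0xFF08},  // VK_BACK,   XK_BackSpace
    {Key::Tab,       0x09, 0xFF09},  // VK_TAB,    XK_Tab
    {Key::Enter,     0x0D, 0xFF0D},  // VK_RETURN, XK_Return
    {Key::Escape,    0x1B, 0xFF1B},  // VK_ESCAPE, XK_Escape
    {Key::Space,     0x20, 0x0020},  // VK_SPACE,  XK_space
};

// Linear search is fine for a table this small; returns nullptr if missing.
inline const KeyCodes* findKey(Key key) {
    for (const KeyCodes& row : kKeyTable) {
        if (row.key == key) {
            return &row;
        }
    }
    return nullptr;
}

inline std::uint32_t windowsVirtualKey(Key key) {
    const KeyCodes* row = findKey(key);
    assert(row != nullptr);
    return row->windowsVk;
}

inline std::uint32_t x11Keysym(Key key) {
    const KeyCodes* row = findKey(key);
    assert(row != nullptr);
    return row->x11Keysym;
}
```

Notice that the control keys happen to share their low byte across both systems, while Space maps to its plain ASCII value on X11; these coincidences do not hold in general, which is why a table beats trying to compute one value from the other.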
Finally, when looking for existing source code to see how this problem has been tackled by others, don’t forget some of the less obvious places to look. Software that is likely to use system APIs to interact with windowed applications and keypresses includes:
- Automated testing software
- “Mocking” software
- Password database apps
  - For example, you can see how KeePass uses some native Windows API calls to see which window is open and send passwords to the right one.
- Screenshot apps
  - You can see here how Greenshot uses Dapplo.Windows to register system level keypress hooks – so they can listen for the Print Screen Key to be pressed, as well as custom combos.
- Productivity trackers (ones that automatically record which applications you are spending time in)
  - Good example: ActivityWatch – Python source code
- Browsers
Know of something better?
If you feel like I missed something big, feel free to leave a comment below. It is always interesting to see how different solutions are engineered around a shared problem. I’m still looking for a cross-platform solution for key “hooks” – where your code can listen for a keypress regardless of whether or not it is headed to your application, and you can interrupt the press and prevent it from propagating. For example, KeePass, in part, manually implements Windows BlockInput, which is part of the standard User32 lib.
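For reference, the Windows-only version of such a hook is not much code. The sketch below installs a low-level keyboard hook with SetWindowsHookExW(WH_KEYBOARD_LL, ...) and swallows Print Screen key-down events by returning a non-zero value instead of calling CallNextHookEx. The helper names (shouldSwallowKey, runKeyboardHookLoop) are hypothetical, and this is an illustration of the general technique rather than how KeePass or Greenshot do it. It does nothing for Mac or Linux, which is exactly the gap I am trying to fill.

```cpp
// Virtual-key code for Print Screen (VK_SNAPSHOT in the Windows headers).
constexpr unsigned long kVkPrintScreen = 0x2C;

// Pure decision function, kept separate so it can be tested on any platform.
constexpr bool shouldSwallowKey(unsigned long vkCode) {
    return vkCode == kVkPrintScreen;
}

#ifdef _WIN32
#include <windows.h>  // link against user32

// Called by Windows for every keyboard event, system-wide.
static LRESULT CALLBACK lowLevelKeyboardProc(int nCode, WPARAM wParam, LPARAM lParam) {
    if (nCode == HC_ACTION && (wParam == WM_KEYDOWN || wParam == WM_SYSKEYDOWN)) {
        const KBDLLHOOKSTRUCT* info = reinterpret_cast<const KBDLLHOOKSTRUCT*>(lParam);
        if (shouldSwallowKey(info->vkCode)) {
            return 1;  // non-zero stops the event from reaching other apps
        }
    }
    return CallNextHookEx(nullptr, nCode, wParam, lParam);
}

// Installs the hook and pumps messages until WM_QUIT; the hook only fires
// while the installing thread is running a message loop.
bool runKeyboardHookLoop() {
    HHOOK hook = SetWindowsHookExW(WH_KEYBOARD_LL, lowLevelKeyboardProc,
                                   GetModuleHandleW(nullptr), 0);
    if (hook == nullptr) {
        return false;
    }
    MSG msg;
    while (GetMessageW(&msg, nullptr, 0, 0) > 0) {
        TranslateMessage(&msg);
        DispatchMessageW(&msg);
    }
    UnhookWindowsHookEx(hook);
    return true;
}
#endif
```

Only key-down events are blocked here; the matching key-up still passes through, which a real implementation would probably want to handle as well.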