Cross-Platform Interacting with Windows, Titles, Keypresses, and More

Disclaimer: This post is over a year old (first published about 5 years ago). As such, please keep in mind that some of the information may no longer be accurate, best practice, or a reflection of how I would approach the same thing today.
Date Posted: Apr. 03, 2019
Last Updated: Apr. 03, 2019

One of the things that bothers me about development is when you come across a problem and think to yourself, “surely someone else has solved this before”, but then you start searching and keep coming up empty. This is the issue I had when starting a recent project that needed to do things like detect which window (application) is open and active, in a cross-platform (Windows, Linux, Mac) way.

In fact, I had such a hard time finding existing cross-platform libraries that I began writing one myself, starting with the Windows side of things. When the complexity got really high, I revisited my search and finally got some results. I wanted to share what I found, since, as I said, these resources were not easy to track down.

Quick list of options:

So far I have found only three cross-platform libraries, although I’m still keeping an eye out. See the embedded spreadsheet below for a comparison of features:

Most Robust: Robot (Robot/robot and Robot/robot-js)

I really wish I had found this library sooner. The Robot library is an incredibly thorough and well-put-together library that can handle pretty much anything you would need in order to manipulate windows and send keypresses. What makes it even more impressive is that it appears to be written and maintained by just one very dedicated individual: David Krutsko. Kudos to him for building this amazing library.

Robot can be used as either a C++ library or a Node.js library. I have just started playing around with the C++ version, combined with Qt, and I can share a few tips on getting it working. For my project, I had to change a few things:

  • Changed compiler from MinGW to MSVC.
    • Robot uses the standard C++ <thread> and <mutex> headers, which were not available, at least in the version of MinGW shipped with Qt. When I tried to compile, I got a bunch of errors about mutexes and threads. Someone put together a threading library for MinGW, but I couldn’t get it to work, so I gave up and switched to MSVC, which works pretty much out of the box with this library.
  • Manually linked the Windows GDI library (wingdi.h / Gdi32.dll)
    • For Windows, Robot uses some bitmap / drawing functions that rely on the Windows gdi32 library. When trying to compile without any special setup, I got a bunch of linker errors related to the use of this library – things like “Clipboard.obj:-1: error: LNK2019: unresolved external symbol __imp_GetDIBits referenced in function “public: static bool __cdecl Robot::Clipboard::GetImage(class Robot::Image &)” (?GetImage@Clipboard@Robot@@SA_NAEAVImage@2@@Z)”
      • If you are using QMake like I am, just add “LIBS += -lgdi32” to either your project file, or in a PRI that gets loaded for this library.
      • Also make sure that QMake can find the Windows Kit folder where the GDI header and library files live. You can always add the include folder manually through “INCLUDEPATH +=”.

Here is the PRI file that I created to pull all of the Robot source files into my Qt project. Depending on how your project is structured, you may need to change the ROBOT_LIB_SOURCEPATH variable.

# https://github.com/Robot/robot

ROBOT_LIB_SOURCEPATH = $$PWD/../lib/robot/Source
INCLUDEPATH += $$ROBOT_LIB_SOURCEPATH

# Rare use case: uncomment below to force the old (pre-C++11) libstdc++ ABI; =1 selects the new C++11 ABI
#DEFINES += "_GLIBCXX_USE_CXX11_ABI=0"

# Robot needs some extra win libs - primarily for clipboard and bitmap operations
INCLUDEPATH += "C:/Program Files (x86)/Windows Kits/8.1/Include"
LIBS += -lgdi32

HEADERS += \
    $$ROBOT_LIB_SOURCEPATH/Enum.h \
    $$ROBOT_LIB_SOURCEPATH/Bounds.h \
    $$ROBOT_LIB_SOURCEPATH/Clipboard.h \
    $$ROBOT_LIB_SOURCEPATH/Color.h \
    $$ROBOT_LIB_SOURCEPATH/Global.h \
    $$ROBOT_LIB_SOURCEPATH/Hash.h \
    $$ROBOT_LIB_SOURCEPATH/Image.h \
    $$ROBOT_LIB_SOURCEPATH/Keyboard.h \
    $$ROBOT_LIB_SOURCEPATH/Memory.h \
    $$ROBOT_LIB_SOURCEPATH/Module.h \
    $$ROBOT_LIB_SOURCEPATH/Mouse.h \
    $$ROBOT_LIB_SOURCEPATH/Point.h \
    $$ROBOT_LIB_SOURCEPATH/Process.h \
    $$ROBOT_LIB_SOURCEPATH/Range.h \
    $$ROBOT_LIB_SOURCEPATH/Robot.h \
    $$ROBOT_LIB_SOURCEPATH/Screen.h \
    $$ROBOT_LIB_SOURCEPATH/Size.h \
    $$ROBOT_LIB_SOURCEPATH/Timer.h \
    $$ROBOT_LIB_SOURCEPATH/Types.h \
    $$ROBOT_LIB_SOURCEPATH/Window.h

SOURCES += \
    $$ROBOT_LIB_SOURCEPATH/Bounds.cc \
    $$ROBOT_LIB_SOURCEPATH/Clipboard.cc \
    $$ROBOT_LIB_SOURCEPATH/Color.cc \
    $$ROBOT_LIB_SOURCEPATH/Hash.cc \
    $$ROBOT_LIB_SOURCEPATH/Image.cc \
    $$ROBOT_LIB_SOURCEPATH/Keyboard.cc \
    $$ROBOT_LIB_SOURCEPATH/Memory.cc \
    $$ROBOT_LIB_SOURCEPATH/Module.cc \
    $$ROBOT_LIB_SOURCEPATH/Mouse.cc \
    $$ROBOT_LIB_SOURCEPATH/Point.cc \
    $$ROBOT_LIB_SOURCEPATH/Process.cc \
    $$ROBOT_LIB_SOURCEPATH/Range.cc \
    $$ROBOT_LIB_SOURCEPATH/Screen.cc \
    $$ROBOT_LIB_SOURCEPATH/Size.cc \
    $$ROBOT_LIB_SOURCEPATH/Timer.cc \
    $$ROBOT_LIB_SOURCEPATH/Window.cc

Getting Robotjs (octaImage/robotjs) working with C++ and Qt

Robotjs (not to be confused with robot-js by Robot) is a Node.js desktop automation library written by octaImage, aka Jason Stallings. One of the neat things about this library, besides being written in C, is that Jason did a really nice job of keeping the C/C++ layer separate from the JavaScript/Node.js layer. That separation lets us use the native C code in its raw form, before node-gyp compiles it into a Node addon.

I was able to do this surprisingly easily with Qt: I just set up a PRI file to import all the raw header and source files, making sure not to pull in the files that wrap functions for Node (primarily robotjs.cc). Here is my PRI file; note that I’m not pulling in 100% of the files, since I only wanted to test part of the library:

NODE_MODULESPATH = $$PWD/../node_modules
NODE_SOURCEPATH = $$PWD/../node_modules/robotjs/src
INCLUDEPATH += $$NODE_SOURCEPATH

# Special - seems to be an issue with a windows constant - could also be related to _WIN32_WINNT issue on active-windows-lib
DEFINES += "MOUSEEVENTF_HWHEEL=0x1000"

HEADERS += \
    $$NODE_SOURCEPATH/os.h \
    $$NODE_SOURCEPATH/keycode.h \
    $$NODE_SOURCEPATH/keypress.h \
    $$NODE_SOURCEPATH/mouse.h \
    $$NODE_SOURCEPATH/screen.h \
    $$NODE_SOURCEPATH/types.h \
    $$NODE_SOURCEPATH/deadbeef_rand.h

SOURCES += \
    $$NODE_SOURCEPATH/keycode.c \
    $$NODE_SOURCEPATH/keypress.c \
    $$NODE_SOURCEPATH/mouse.c \
    $$NODE_SOURCEPATH/screen.c \
    $$NODE_SOURCEPATH/deadbeef_rand.c

I also had to define a constant, MOUSEEVENTF_HWHEEL. I’ve since moved on from this library, so I can’t recall exactly why I had to add it, but I’m guessing the compiler was complaining that it was undefined, which probably has to do with a misconfigured include path on my end.
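Looking back, one plausible cause (an assumption I have not re-tested against this setup) is that winuser.h only defines MOUSEEVENTF_HWHEEL when _WIN32_WINNT is at least 0x0600 (Windows Vista), and some older MinGW toolchains default _WIN32_WINNT to a lower version. If that is the cause, a guarded fallback like this minimal sketch supplies the value only when the Windows headers did not, instead of forcing it unconditionally through DEFINES. The name kHorizontalWheelFlag is purely illustrative.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>  // defines MOUSEEVENTF_HWHEEL only if _WIN32_WINNT >= 0x0600
#endif

// Supply the documented value (0x01000) only if the headers did not define it,
// so a correctly configured toolchain keeps its own definition.
#ifndef MOUSEEVENTF_HWHEEL
#define MOUSEEVENTF_HWHEEL 0x01000
#endif

// Illustrative name for the dwFlags bit used with SendInput/mouse_event
// to request a horizontal wheel scroll.
constexpr unsigned long kHorizontalWheelFlag = MOUSEEVENTF_HWHEEL;
```

Raising the target version instead (for example DEFINES += _WIN32_WINNT=0x0600 in the PRI file) should make the header’s own definition visible, though I haven’t verified that against robotjs specifically.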

Understanding the Inner Workings

If you want to roll your own solution, or just understand more about the inner workings of these libraries and how they use system API calls to interact with windows and input, I do have some advice for you.

First, start with the source code for the active-win library. It is well-commented and is basically the textbook example for how these system API calls are meant to be used, for Windows, Mac, and Linux.
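To give a feel for what those system calls look like, here is a minimal C++ sketch of the Windows half of “which window is active?”, assuming a Win32 desktop build where user32 is linked (it normally is by default). It uses GetForegroundWindow, GetWindowTextLengthW, GetWindowTextW, and GetWindowThreadProcessId; the ActiveWindowInfo struct and getActiveWindow function are hypothetical names, not part of active-win or Robot. The non-Windows branch is only a placeholder: on Linux/X11 this is typically done by reading the root window’s _NET_ACTIVE_WINDOW property, and on macOS through APIs such as NSWorkspace and CGWindowListCopyWindowInfo.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Hypothetical result type for this sketch.
struct ActiveWindowInfo {
    std::wstring title;       // window caption (UTF-16 on Windows)
    unsigned long processId;  // ID of the process that owns the window
};

// Returns the foreground window's title and owning process ID, or
// std::nullopt if there is no foreground window or the platform isn't handled.
std::optional<ActiveWindowInfo> getActiveWindow() {
#ifdef _WIN32
    HWND hwnd = GetForegroundWindow();  // can be NULL, e.g. while focus is changing
    if (hwnd == nullptr) {
        return std::nullopt;
    }

    // Query the length first so the buffer is big enough (+1 for the null terminator).
    const int length = GetWindowTextLengthW(hwnd);
    std::wstring title(static_cast<std::size_t>(length) + 1, L'\0');
    const int copied = GetWindowTextW(hwnd, &title[0], length + 1);
    title.resize(copied > 0 ? static_cast<std::size_t>(copied) : 0);

    DWORD pid = 0;
    GetWindowThreadProcessId(hwnd, &pid);
    return ActiveWindowInfo{std::move(title), static_cast<unsigned long>(pid)};
#else
    // Placeholder: X11 and macOS need their own implementations.
    return std::nullopt;
#endif
}
```

Calling the explicit W (wide) variants sidesteps the LPTSTR question, since the code no longer depends on whether the project defines UNICODE.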

Next, if you are developing for Windows, be aware that most of its desktop APIs handle strings in a… ahem… “interesting” way. You will see strings passed around as “LPTSTR”, which is a typedef that resolves to either LPWSTR (long pointer to a wide string, composed of wchar_t) or LPSTR (composed of char), depending on whether your project defines UNICODE. This Stack Overflow answer has a good breakdown of some of the unusual types that the Windows API uses.
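A simplified illustration of that switch follows. The Demo-prefixed names are hypothetical stand-ins written for this example, not the real TCHAR, LPTSTR, and TEXT definitions from the Windows headers, but they key off the same UNICODE macro in the same way, so the sketch also compiles on Linux and macOS. Note that wchar_t is 2 bytes (UTF-16) on Windows but typically 4 bytes on Linux and macOS, which is one more reason wide strings don’t travel well across platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical "Demo" stand-ins that mimic how the Windows headers switch on
// UNICODE. These are NOT the real TCHAR / LPTSTR / TEXT definitions.
#ifdef UNICODE
typedef wchar_t DemoTCHAR;             // like TCHAR -> WCHAR (wchar_t)
#define DEMO_TEXT(literal) L##literal  // like TEXT("x") -> L"x"
#else
typedef char DemoTCHAR;                // like TCHAR -> CHAR (char)
#define DEMO_TEXT(literal) literal     // like TEXT("x") -> "x"
#endif

typedef DemoTCHAR* DemoLPTSTR;         // like LPTSTR: LPWSTR or LPSTR
typedef const DemoTCHAR* DemoLPCTSTR;  // like LPCTSTR: LPCWSTR or LPCSTR

// An owning string type that follows the same switch.
using DemoTString = std::basic_string<DemoTCHAR>;

// Works unchanged in both build modes because every type follows DemoTCHAR.
DemoTString copyTitle(DemoLPCTSTR title) {
    return DemoTString(title);
}

// Fills a caller-provided buffer, the way many Win32 "get" functions do.
// Copies at most capacity - 1 characters and always null-terminates.
std::size_t fillBuffer(DemoLPTSTR buffer, std::size_t capacity, DemoLPCTSTR source) {
    if (capacity == 0) {
        return 0;
    }
    std::size_t i = 0;
    for (; i + 1 < capacity && source[i] != DEMO_TEXT('\0'); ++i) {
        buffer[i] = source[i];
    }
    buffer[i] = DEMO_TEXT('\0');
    return i;  // characters copied, excluding the terminator
}
```

The same switch is why an API like GetWindowText is really a macro for GetWindowTextW or GetWindowTextA; calling the W variant explicitly and using wchar_t throughout keeps code independent of the UNICODE setting.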

For keypresses, be aware that although we have standards like Unicode so that characters and strings are encoded the same way across platforms, actual keypresses are usually represented by enums/constants that are unique to each platform. For example, Windows represents keypresses through “Virtual-Key Codes”, the constants for which can be found here. Meanwhile, Unix-like systems running X11 use its own set of constants (keysyms), which can be found here among other places. I started compiling a database of these constants across platforms, which can be found here.
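One way to cope with per-platform key constants is a small translation table keyed on a platform-neutral enum. This is a hypothetical sketch: the Key enum, KeyCodes struct, and lookup functions are illustrative names, and the table covers only six keys. The numeric values are the standard Windows VK_*, X11 XK_* keysym, and macOS kVK_* constants named in the comments.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical platform-neutral key names (only a few keys, for illustration).
enum class Key { A, Return, Escape, Tab, Space, Backspace };

// The native constant for one key on each platform.
struct KeyCodes {
    Key key;
    std::uint16_t windowsVk;   // Windows Virtual-Key code (VK_*)
    std::uint32_t x11Keysym;   // X11 keysym (XK_*), as in keysymdef.h
    std::uint16_t macKeyCode;  // macOS virtual key code (kVK_*), a physical key position
};

constexpr std::array<KeyCodes, 6> kKeyTable{{
    // Windows uses the uppercase ASCII code for letter keys regardless of Shift;
    // X11 uses the lowercase keysym for the unshifted key.
    {Key::A,         0x41, 0x0061, 0x00},  // 'A'       / XK_a         / kVK_ANSI_A
    {Key::Return,    0x0D, 0xFF0D, 0x24},  // VK_RETURN / XK_Return    / kVK_Return
    {Key::Escape,    0x1B, 0xFF1B, 0x35},  // VK_ESCAPE / XK_Escape    / kVK_Escape
    {Key::Tab,       0x09, 0xFF09, 0x30},  // VK_TAB    / XK_Tab       / kVK_Tab
    {Key::Space,     0x20, 0x0020, 0x31},  // VK_SPACE  / XK_space     / kVK_Space
    {Key::Backspace, 0x08, 0xFF08, 0x33},  // VK_BACK   / XK_BackSpace / kVK_Delete (Mac "delete")
}};

// Translate a Windows VK code to the matching X11 keysym, if the table knows it.
std::optional<std::uint32_t> windowsVkToX11(std::uint16_t vk) {
    for (const KeyCodes& entry : kKeyTable) {
        if (entry.windowsVk == vk) return entry.x11Keysym;
    }
    return std::nullopt;
}

// Look up the macOS key code for a platform-neutral key.
std::optional<std::uint16_t> macKeyCodeFor(Key key) {
    for (const KeyCodes& entry : kKeyTable) {
        if (entry.key == key) return entry.macKeyCode;
    }
    return std::nullopt;
}
```

A linear scan is fine for a handful of entries; a full table covering every key would probably justify a generated lookup per direction instead.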

Finally, when looking for existing source code to see how others have tackled this problem, don’t overlook some less obvious places to look. Software that is likely to use system APIs to interact with windowed applications and keypresses includes:

Know of something better?

If you feel like I missed something big, feel free to leave a comment below. It is always interesting to see how different solutions are engineered around a shared problem. I’m still looking for a cross-platform solution for key “hooks”, where your code can listen for a keypress regardless of whether it is headed to your application, and can intercept the press and prevent it from propagating. For example, KeePass relies in part on the Windows BlockInput function, which is part of the standard User32 library.
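For reference, on Windows the usual building block for this is a low-level keyboard hook installed with SetWindowsHookExW(WH_KEYBOARD_LL, ...): the hook procedure sees keyboard input system-wide, and returning a non-zero value stops the key from reaching the rest of the hook chain and the target window. Unlike BlockInput, which blocks all keyboard and mouse input at once, this can be selective. This is a minimal, Windows-only sketch assuming a Win32 build; shouldSwallowKey, runKeyboardHook, and the choice of F12 are hypothetical examples. Keeping the blocking decision in a plain function means it could in principle be shared with the macOS (CGEventTapCreate) or X11 mechanisms, which work quite differently.

```cpp
#include <cassert>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Example policy (hypothetical): swallow F12. VK_F12 is 0x7B on Windows.
constexpr unsigned long kSwallowedVk = 0x7B;

// Platform-neutral decision, kept separate from the OS-specific plumbing.
bool shouldSwallowKey(unsigned long vkCode) {
    return vkCode == kSwallowedVk;
}

#ifdef _WIN32
// Called for keyboard input events system-wide while the hook is installed.
static LRESULT CALLBACK lowLevelKeyboardProc(int nCode, WPARAM wParam, LPARAM lParam) {
    if (nCode == HC_ACTION) {
        const auto* info = reinterpret_cast<const KBDLLHOOKSTRUCT*>(lParam);
        if (shouldSwallowKey(info->vkCode)) {
            // Non-zero: don't pass the event on to other hooks or the target window.
            return 1;
        }
    }
    return CallNextHookEx(nullptr, nCode, wParam, lParam);
}

// Installs the hook and runs a message loop until WM_QUIT. Low-level hooks are
// called on the installing thread, so that thread must keep pumping messages.
int runKeyboardHook() {
    HHOOK hook = SetWindowsHookExW(WH_KEYBOARD_LL, lowLevelKeyboardProc,
                                   GetModuleHandleW(nullptr), 0);
    if (hook == nullptr) {
        return 1;
    }
    MSG msg;
    while (GetMessageW(&msg, nullptr, 0, 0) > 0) {
        TranslateMessage(&msg);
        DispatchMessageW(&msg);
    }
    UnhookWindowsHookEx(hook);
    return 0;
}
#endif
```

On Windows, runKeyboardHook() would typically run on a dedicated thread, and posting WM_QUIT to that thread (for example with PostThreadMessageW) ends the loop and removes the hook.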
