In a previous article, we looked at automation with Python using the Paramiko module. Paramiko provides an SSH client or server and works well for automating tasks over SSH connections, such as managing Linux, Cisco, or other command-line-based systems. Python also contains other modules that interact directly with the Windows or Linux operating system from a command line and can be used to automate tasks that do not require an SSH connection. But what if we need to automate a task that requires interacting with a GUI? The answer is PyAutoGUI.
The following example assumes that you already have Python installed on your device.
Install PyAutoGUI
To use PyAutoGUI, we need to install it using pip with the command below:
pip install PyAutoGUI
There are quite a few dependencies for PyAutoGUI, and your list may be longer than what I have in the above screenshot. When it's all done, you should see everything that was installed:
Successfully installed PyTweening-1.0.4 mouseinfo-0.1.3 pyautoGUI-0.9.53 pygetwindow-0.0.9 pymsgbox-1.0.9 pyperclip-1.8.2 pyrect-0.2.0 pyscreeze-0.1.28
PyAutoGUI can also locate regions of the screen by comparing premade screenshots. This can be helpful in locating a button or other areas on the screen to click as part of our automation process. We will have a look at this in the example. To use this feature, we need to install the pillow module:
pip install pillow
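Before moving on, it can help to confirm that both packages are importable. The short sketch below is one way to check this with the standard library; the helper name is_installed is my own and is not part of PyAutoGUI. Note that Pillow is installed under the pip name pillow but is imported as PIL.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if Python can find a top-level module with this import name."""
    return find_spec(module_name) is not None


# Pillow installs under the pip name "pillow" but is imported as "PIL".
for name in ("pyautogui", "PIL"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports "missing", rerun the matching pip command from the same Python environment you plan to use for the script.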
Automate the Windows GUI
PyAutoGUI works with Windows, Linux, and macOS. My examples in this tutorial focus on Windows.
In general, we can use PyAutoGUI to control the mouse and keyboard and to locate areas of the screen, either by coordinates or by having PyAutoGUI "find" a previously taken screenshot. Let us review some of the common commands used in PyAutoGUI and then look at a simple example.
PyAutoGUI can move the mouse to any position on the screen using x and y coordinates. The upper left of the screen is position 0,0; x increases as you move right, and y increases as you move down. Here are a few common commands to drive the keyboard and mouse:
pyautogui.position()—Returns the current mouse position
pyautogui.size()—Returns the current screen size
pyautogui.moveTo(x, y, duration = numberofseconds)—Moves the mouse to the coordinates defined by x and y; duration defines how "fast" the mouse moves.
pyautogui.click(x, y, clicks, interval = seconds_between_clicks, button = 'left')—Clicks the mouse button (left or right) at a location defined by the coordinates of x and y. You can also define the interval between clicks and the button to click. It can also be used without parameters to click the mouse at its current location.
pyautogui.click(path to an image file)—pyautogui will find the screen location defined by the image and click the mouse on the center of the image.
pyautogui.write('This is text')—Writes the text "This is text" at the current cursor location.
pyautogui.press('enter')—Presses the Enter key.
These are just some of the commands you can use. Check out the documentation for more information.
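As a rough sketch of how a few of these calls fit together, the snippet below reads the screen size, works out the center point, and glides the mouse there. The names screen_center and demo are made up for this illustration, and demo() is deliberately left uncalled so that nothing moves until you run it yourself.

```python
def screen_center(width, height):
    """Return the (x, y) pixel at the middle of a screen of the given size."""
    # (0, 0) is the upper-left corner; x grows to the right, y grows downward.
    return width // 2, height // 2


def demo():
    """Move the mouse to the center of the screen and report where it ends up."""
    import pyautogui  # imported here so the helper above works without a display

    width, height = pyautogui.size()
    x, y = screen_center(width, height)
    pyautogui.moveTo(x, y, duration=0.5)  # take half a second to get there
    print("Mouse is now at", pyautogui.position())
```

Calling demo() from a Python prompt should move the pointer to the middle of the primary screen.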
Now, let's take some of these commands and look at a simple automation example. In this example, we will open the Windows Notepad application, type some text, and then close Notepad and answer the dialog box that appears when we close the application.
An explanation of the code follows below.
# Simple GUI automation Example using PyAutoGui
# By John Kull

import os
import pyautogui as pag
from time import sleep

os.startfile("C:\\WINDOWS\\system32\\notepad.exe")
sleep(1)
pag.click('C:\\Users\\johnk\\notepad1.png')
sleep(1)
pag.write("This is a Test!")
sleep(1)
pag.hotkey('alt', 'f')
sleep(1)
pag.press('x')
sleep(1)
pag.click('C:\\Users\\johnk\\dont-save.png')
Lines 1 and 2 are comments.
Lines 4–6 import the os module, pyautogui, and the sleep function from the time module. We will use the os module to call the notepad.exe file and launch Notepad, and the sleep function to add some delays to the program. Notice I imported pyautogui as pag. This lets me type pag in place of pyautogui throughout the code.
I often add delays to my code to allow time for a screen to appear when I am automating multiple screens. This is a primitive technique, but it does work well in most circumstances. As you advance in your Python skills, you can build code to check for an image to appear before moving on.
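Here is a minimal sketch of what that image check might look like. It is not part of the script above: wait_for and locate_center are hypothetical helper names, and the timeout and interval defaults are arbitrary. The polling function takes the lookup as an argument, so it can be tried out with any function that returns None until something appears.

```python
import time


def wait_for(locate, timeout=10.0, interval=0.5):
    """Call locate() repeatedly until it returns a non-None value or time runs out.

    Returns whatever locate() found, or None if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = locate()
        if found is not None:
            return found
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


def locate_center(image_path):
    """Return the center of image_path on screen, or None if it is not visible."""
    import pyautogui  # imported lazily; needs a desktop session to work

    try:
        return pyautogui.locateCenterOnScreen(image_path)
    except pyautogui.ImageNotFoundException:
        # Depending on the installed version, a miss raises instead of returning None.
        return None
```

In a script, a fixed sleep could then be swapped for something like spot = wait_for(lambda: locate_center('C:\\Users\\johnk\\notepad1.png')), followed by pag.click(*spot) when spot is not None.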
Lines 8 and 9 call notepad.exe and then pause for 1 second.
Lines 10 and 11 use the click function to click the top of the Notepad title bar and then pause for 1 second. By clicking the title bar, we make sure our script is focused on the Notepad window. I created the screenshot below and then used the pag.click function to have PyAutoGUI find that image and click it.
Lines 12 and 13 write the text "This is a Test!" in the Notepad window, which now has focus, and then pause for 1 second.
Lines 14 and 15 use the hotkey function to press "alt" and "f" to bring up the File menu in Notepad and then pause for 1 second.
Lines 16 and 17 press x to choose Exit from the File menu, which closes the program, and then pause for 1 second.
Line 18 looks for the image below from the Close File dialog box and clicks it to close the program without saving. We could have also used the press function to press "n," as it is underlined in the dialog box to indicate a keyboard shortcut. In general, I use keyboard shortcuts when they are available, as they can be easier than creating the screenshot; however, locating the screenshot has the advantage of making sure the dialog box is present. Either way can work.
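The two approaches can also be combined: use the screenshot only to confirm that the dialog box is present, and then answer it with the keyboard shortcut. The sketch below illustrates that idea under a few assumptions. press_if_visible and close_without_saving are hypothetical helpers, and the locate and press arguments are stand-ins for calls such as pyautogui.locateOnScreen and pyautogui.press.

```python
def press_if_visible(locate, press, key):
    """Press key only when locate() reports that the target is on screen.

    locate: callable returning a location, or None when nothing is found.
    press:  callable that sends a key press, for example pyautogui.press.
    Returns True if the key was pressed, False otherwise.
    """
    if locate() is None:
        return False
    press(key)
    return True


def close_without_saving(image_path):
    """Answer the save prompt with 'n', but only if its button is visible."""
    import pyautogui  # imported lazily; needs a desktop session to work

    def locate():
        try:
            return pyautogui.locateOnScreen(image_path)
        except pyautogui.ImageNotFoundException:
            return None  # some versions raise on a miss instead of returning None

    return press_if_visible(locate, pyautogui.press, "n")
```

With this in place, close_without_saving('C:\\Users\\johnk\\dont-save.png') would stand in for the final click, and its return value tells you whether the dialog was actually found.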
In this article, we examined how we can use the Python module PyAutoGUI to accomplish Windows GUI automation. We reviewed some common commands used with PyAutoGUI and looked at a simple demo that utilized many of the common commands available in PyAutoGUI. We have only scratched the surface with this demo. Now it is your turn. What will you automate?