UiPath Activities Guide

About the AI Computer Vision Activities Pack


The first version of the Computer Vision Activities pack is released in Beta and can only be downloaded from the Beta feed. For more information on how to install the Beta feed, please visit our Forum.

The AI Computer Vision pack contains refactored fundamental UiAutomation activities such as Click, Type Into, or Find Text. The main difference between the CV activities and their classic counterparts is that they use the Computer Vision neural network developed in-house by our Machine Learning department. The neural network can identify UI elements such as buttons, text input fields, or check boxes without the use of selectors.

Created mainly for automation in virtual desktop environments, such as Citrix machines, these activities bypass the issue of nonexistent or unreliable selectors: they send images of the window you are automating to the neural network, where they are analyzed and all UI elements are identified and labeled according to their type. Smart anchors then pinpoint the exact location of the UI element you are interacting with, ensuring that the action you intend to perform is successful.

Installing the UiPath.AI.ComputerVision.Activities Pack

The UiPath.AI.ComputerVision.Activities package requires the UiPath.UIAutomation.Activities v19.2.0 package or above as a project dependency. Due to a known issue, after creating a new project you must first remove the UiPath.UIAutomation.Activities pack that is installed by default as a dependency.

After the UIAutomation package is uninstalled, install the UiPath.AI.ComputerVision.Activities pack from the Beta feed in the Package Manager. Note that to find the package in the Beta feed, you must select Include Prerelease in the Package Manager. Installing it automatically installs a compatible UiPath.UIAutomation.Activities package.
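After installation, the project's dependency list reflects the change. As a rough illustration, the dependencies section of the project's project.json file might look something like the fragment below; the exact version strings are placeholders and depend on what the Beta feed serves at the time:

```json
{
  "dependencies": {
    "UiPath.AI.ComputerVision.Activities": "[1.0.0-beta]",
    "UiPath.UIAutomation.Activities": "[19.2.0]"
  }
}
```

You normally never edit this file by hand; the Package Manager maintains it for you, and it is shown here only to clarify what "installed as a dependency" means at the project level.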

Using the Computer Vision Activities

All of the activities in this pack function only inside a CV Screen Scope activity, which establishes the connection to the neural network server and thus enables you to analyze the UI of the apps you want to automate. Consequently, any workflow that uses the Computer Vision activities must begin by dragging a CV Screen Scope activity to the Designer panel. Once this is done, the Indicate on screen button in the body of the scope activity can be used to select the area of the screen you want to work in.


Double-clicking the informative screenshot displays the image that has been captured and highlights in purple all of the UI elements that have been identified by the neural network and OCR engine.


Area selection can also be used to select only a portion of the UI of the application you want to automate. This is especially useful in situations where there are multiple text fields that have the same label and cannot be properly identified.

Next, depending on the action you want to perform, drag a CV activity into the body of the CV Screen Scope container. For this example we use a CV Click activity. In the body of the CV Click activity you can choose what kind of UI element you want to click. In this example we want to click the Cash In input field, so we select the InputBox UI element and indicate it on screen. Notice that the input field is highlighted, because the neural network recognizes it as such.

After indicating, a Context Aware Anchor activity is automatically created. It contains a Find CV Text activity, which holds the anchor for the input field, and the CV Click activity, which clicks it.
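The generated structure in the Designer panel can be sketched roughly as the tree below. This is an illustrative outline of the activity hierarchy, not actual workflow markup, and the Cash In example follows the scenario above:

```
CV Screen Scope              ' establishes the connection to the CV server
└── Context Aware Anchor     ' created automatically after indicating
    ├── Find CV Text         ' locates the "Cash In" label (the anchor)
    └── CV Click             ' clicks the input field relative to the anchor
```

Because the click is resolved relative to the anchor at runtime, the automation keeps working even if the input field moves on screen, as long as the label stays next to it.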

A key feature of the new CV Click activity is that it can easily turn into whatever kind of Click activity you need. For example, clicking a text label generates a CV Click Text activity, while clicking a selected area generates a CV Click activity configured to click the selected image.

If a reliable anchor cannot be automatically selected for the specified UI element, you are prompted to choose a more appropriate anchor manually: the specified UI element is highlighted in red and all similar UI elements are highlighted in yellow. If the anchor you then select is also not appropriate, it is likewise highlighted in red, with all similar anchors highlighted in yellow.


If it is not possible to select an appropriate anchor by clicking a label, the best practice is to box-select the entire area containing both the UI element you want to interact with and its label, using that selection as the anchor. The image of the entire area is then used to find the UI element.


Please remember that whenever you report errors in the behavior of the neural network, you help it learn and indirectly help us give you a better product. Submit as many issues as you can; this gives us the opportunity to acknowledge and fix them!