UiPath

Let UiPath robots work reliably across dynamic UIs and VDIs

UiPath AI Computer Vision helps RPA robots recognize and interact with on-screen elements when selectors are brittle or unavailable. It is aimed at teams building automations for virtual desktops, remote apps, and other dynamic interfaces.

Tags: iOS, Vision, B2B, Computer Use, Cloud Hosted, Hybrid, For Teams

About

What It Is

UiPath AI Computer Vision is a computer-vision capability inside the UiPath RPA platform. It is designed for developers and automation teams that need robots to interact with interfaces that are difficult to automate with traditional selectors, especially virtual desktop infrastructure (VDI) environments and remote desktops.

According to UiPath, you build automations in UiPath using AI Computer Vision activities and recorder-based workflows. Deployment options include UiPath’s SaaS offering, on-premises installations for Linux and Windows, and a desktop/local-server option mentioned in the documentation.

What to Know

This is a practical automation layer rather than a general-purpose AI agent. It works well when the main challenge is UI recognition across dynamic screens, images, PDFs, Flash/Silverlight-era interfaces, and VDI streams. UiPath says the system uses a neural network with custom Screen OCR, text matching, and a multi-anchoring system.
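
To make the multi-anchoring idea concrete, here is a minimal Python sketch of anchor-based disambiguation. It is an illustration only, not UiPath's implementation or API: the `Element` class, the `find_by_anchors` function, the sample `screen` data, and the "smallest summed distance to all anchors" rule are assumptions made for this example. It also assumes a vision model has already returned detected elements with a type, OCR text, and center coordinates.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    """Hypothetical detected UI element (not a UiPath type)."""
    kind: str   # e.g. "label", "input", "button"
    text: str   # OCR text; empty when the element has none
    x: float    # center x in pixels
    y: float    # center y in pixels


def find_by_anchors(elements: List[Element], target_kind: str,
                    anchor_texts: List[str]) -> Optional[Element]:
    """Pick the target_kind element closest (by summed distance) to all anchors.

    Returns None if any anchor text is not found or no candidate exists.
    """
    anchors = []
    for wanted in anchor_texts:
        # Simple case-insensitive text match stands in for OCR text matching.
        match = next((e for e in elements
                      if e.text.strip().lower() == wanted.strip().lower()), None)
        if match is None:
            return None
        anchors.append(match)

    candidates = [e for e in elements if e.kind == target_kind]
    if not candidates:
        return None

    def total_distance(e: Element) -> float:
        return sum(hypot(e.x - a.x, e.y - a.y) for a in anchors)

    return min(candidates, key=total_distance)


# Assumed sample screen: two visually identical inputs told apart by anchors.
screen = [
    Element("label", "Username", 100, 100),
    Element("input", "", 250, 100),
    Element("label", "Password", 100, 160),
    Element("input", "", 250, 160),
    Element("button", "Log in", 250, 220),
]

password_field = find_by_anchors(screen, "input", ["Password", "Log in"])
```

In this sketch, adding a second anchor such as the "Log in" button makes the choice less sensitive to layout shifts than relying on a single label, which is the general motivation behind anchoring on more than one reference point.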

It is unclear how much of the broader UiPath platform is required, and the page lists no public pricing beyond a free trial prompt. This is not the right tool if you want a standalone consumer assistant or a model-agnostic agent platform. It is proprietary, and the source content does not mention MCP support or local model support.

Key Features

  • Recognizes on-screen elements using computer vision instead of only selectors
  • Supports automation in VDI environments such as Citrix, VMware, Microsoft RDP, and VNC
  • Works across desktop and web applications
  • Handles dynamic UI elements like tables, drop-downs, and checkboxes
  • Supports scrollable content through run-time auto-scroll

Use Cases

  • Automating legacy or dynamic enterprise apps where selectors are unreliable
  • Building RPA workflows for Citrix or other VDI-based desktops
  • Handling image-heavy interfaces, PDFs, and other content that is hard to target with selectors

Agenticness: Reactive Tool

Responds to prompts but takes no autonomous action.

High evidence
Last evaluated: Mar 28, 2026

Dimension Breakdown

  • Action Capability
  • Autonomy
  • Adaptation
  • State & Memory
  • Safety

Pricing
  • Free: Free trial available via UiPath
  • Pro: Pricing not publicly available
  • Enterprise: Contact sales

Details

  • Added: January 16, 2026
  • Refreshed: March 28, 2026

Quick Facts

  • Deployment: Hybrid (cloud + self-hosted)
  • Autonomy: Semi-autonomous
  • Model support: Single model
  • Open source: No
  • Team support: Enterprise
  • Pricing model: Subscription
  • Interface: GUI, API

Related Tools

Stagehand is an open-source browser automation SDK built for developers and LLM-powered agents. It combines code-based browser control with natural-language actions so you can build web workflows that are more resilient to page changes.
Tags: Open Source, Web, API

Fireflies.ai connects to Webex to record, transcribe, and summarize meetings automatically. It also extracts action items and makes notes searchable and shareable across team tools.
Tags: Paid, Web, Voice

browser-use helps you build agents that interact with websites, fill forms, and complete web tasks. It supports both a self-hosted open-source library and a cloud option for faster setup and scaling.
Tags: Open Source, iOS, Web Browsing

Perplexity Computer orchestrates 19 different AI models simultaneously, routing each subtask to the optimal model (Claude for reasoning, Gemini for deep research, GPT for long context). Tasks run in isolated Firecracker microVMs with 400+ app integrations and can persist for hours, days, or months. Requires Perplexity Max ($200/month).
Tags: iOS, API, Web Browsing