Computer Vision (CV)

Definition

The human eye is an amazing evolutionary system. It gives us the ability to see patterns and shapes, recognize faces, and much more. Computer vision is the field that aims to give machines a comparable ability. It involves the development of a theoretical and algorithmic basis for automatic visual understanding. To achieve this, computer vision uses a range of algorithms and machine learning principles to recognize, interpret, and understand images.

Uses of computer vision range from industrial machine vision systems, such as those that inspect bottles speeding by on a production line, to research into artificial intelligence and computers or robots that can comprehend the world around them. For computer vision to be effective in daily use, it must first be trained on example images.

Computer Vision in IA / RPA

In Intelligent Automation, computer vision has use cases ranging from the simple to the complex. In a simple use case, it recognizes where a button is on a screen so that the automation knows where to click. In a complex use case, it can recognize when a car is committing a parking violation.
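The simple use case above, finding a button on a screen, can be illustrated with template matching. The following is a minimal sketch, not how any particular RPA product works: it assumes the screenshot and the button image are small grayscale grids, and the names `find_template`, `click_point`, `SCREEN`, and `BUTTON` are hypothetical. It slides the button image over the screenshot and keeps the position with the smallest sum of absolute pixel differences.

```python
def find_template(screen, template):
    """Return ((row, col), score) for the best match of template in screen.

    Both arguments are 2D lists of grayscale pixel values. The score is the
    sum of absolute differences between pixels (lower is better, 0 is exact).
    """
    s_rows, s_cols = len(screen), len(screen[0])
    t_rows, t_cols = len(template), len(template[0])
    best_score, best_pos = None, None
    # Try every position where the template fits entirely inside the screen.
    for r in range(s_rows - t_rows + 1):
        for c in range(s_cols - t_cols + 1):
            score = sum(
                abs(screen[r + i][c + j] - template[i][j])
                for i in range(t_rows)
                for j in range(t_cols)
            )
            if best_score is None or score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


def click_point(pos, template):
    """Centre of the matched region: the (row, col) a robot would click."""
    r, c = pos
    return (r + len(template) // 2, c + len(template[0]) // 2)


# Toy 6x8 "screenshot" (assumption for illustration): 0 is background,
# and a 2x3 "button" is pasted with its top-left corner at row 3, col 4.
BUTTON = [[9, 5, 9],
          [9, 9, 9]]
SCREEN = [[0] * 8 for _ in range(6)]
for i, row in enumerate(BUTTON):
    for j, value in enumerate(row):
        SCREEN[3 + i][4 + j] = value

position, score = find_template(SCREEN, BUTTON)
target = click_point(position, BUTTON)
print(position, score, target)
```

Exact pixel matching like this breaks as soon as the button changes size, color, or style, which is one reason trained models are used for anything beyond the simplest screens.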

Ultimately, computer vision opens up a whole new set of possibilities for interaction. It gives digital workers the ability not only to see but, if trained broadly, to recognize the intent of a UI design, for example when a search button is replaced by a magnifying glass. In more complex situations, it lets them mimic the real-life patterns that people usually carry out.

Document Understanding

Document understanding uses artificial intelligence (AI) models to automate the classification of files and the extraction of information. It works best with unstructured documents, such as letters or contracts. These documents must contain text that can be identified based on phrases or patterns. The identified text determines both the type of file (its classification) and the information you'd like to extract (its extractors). In the RPA world, Document Understanding lets robots read, extract, interpret, and act upon data from documents.
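To make the roles of classification and extractors concrete, here is a minimal sketch that uses plain phrase matching and regular expressions in place of AI models. The document types, phrases, patterns, and sample text are all hypothetical and chosen only for illustration; a real document understanding system would learn these from data instead of hard-coding them.

```python
import re

# Hypothetical document types. Each has phrases that identify the type
# (classification) and patterns that pull out fields (extractors).
DOCUMENT_TYPES = {
    "contract": {
        "phrases": ["this agreement", "hereinafter"],
        "extractors": {
            "effective_date": r"effective as of (\d{4}-\d{2}-\d{2})",
            "party": r"between (.+?) and",
        },
    },
    "invoice": {
        "phrases": ["invoice number", "amount due"],
        "extractors": {
            "invoice_number": r"invoice number[:\s]+(\w+)",
            "amount_due": r"amount due[:\s]+\$?([\d,]+\.\d{2})",
        },
    },
}


def classify(text):
    """Return the type with the most identifying phrases present, or None."""
    lowered = text.lower()
    scores = {
        name: sum(phrase in lowered for phrase in spec["phrases"])
        for name, spec in DOCUMENT_TYPES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract(text, doc_type):
    """Run every extractor for doc_type; fields with no match are None."""
    fields = {}
    for field, pattern in DOCUMENT_TYPES[doc_type]["extractors"].items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        fields[field] = match.group(1) if match else None
    return fields


SAMPLE = (
    "This Agreement is made between Acme Ltd and Example Corp, "
    "hereinafter the Parties, effective as of 2024-01-15."
)
doc_type = classify(SAMPLE)
fields = extract(SAMPLE, doc_type)
print(doc_type, fields)
```

The same identified text drives both steps: the phrases decide which set of extractors applies, and the extractors then read the fields for that type.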

For documents with a fixed structure - like forms, passports, or licenses - it’s enough to create rules or templates, which will work for thousands of similar documents with no need for AI. By contrast, documents with varying layouts or with no fixed structure - like receipts, bills, or resumes - require advanced AI skills that can automatically determine the location of data even when the layout changes.
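The difference between the two cases can be seen in a small sketch of a positional template. It assumes a fixed-layout form represented as lines of text, with each field defined by its line number and column range. `LICENSE_TEMPLATE`, `apply_template`, and the sample forms are hypothetical.

```python
# Hypothetical template for a fixed-layout form: each field is defined by
# the line it sits on and the column range it occupies (start, end).
LICENSE_TEMPLATE = {
    "name": (0, 6, 26),
    "number": (1, 6, 16),
    "expires": (2, 9, 19),
}


def apply_template(lines, template):
    """Read each field from its fixed line and column range."""
    fields = {}
    for field, (line_no, start, end) in template.items():
        fields[field] = lines[line_no][start:end].strip()
    return fields


# A form that follows the layout the template was written for.
FORM_FIXED = [
    "NAME: Jane Example",
    "NO:   D123456789",
    "EXPIRES: 2030-06-30",
]

# The same data with one extra header line, so every field has moved.
FORM_SHIFTED = ["DRIVER LICENSE"] + FORM_FIXED

fixed_fields = apply_template(FORM_FIXED, LICENSE_TEMPLATE)
shifted_fields = apply_template(FORM_SHIFTED, LICENSE_TEMPLATE)
print(fixed_fields)
print(shifted_fields)
```

The template reads the first form correctly and would do the same for any other form with that exact layout. A single extra line is enough to make it return the wrong values for the second form, which is the situation where a model that locates data by content rather than by position becomes necessary.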