What is computer vision? (machine learning)


Computer vision is a field of machine learning that uses techniques such as image processing and deep learning to identify and interpret patterns in digital images. It is the process of teaching computers how to interpret and understand the visual world just like humans do. Computer vision is used in many applications such as autonomous vehicles, facial recognition, medical imaging, robotics, and manufacturing.



What is computer vision?

Computer Vision (CV) is a field of machine learning and computer science that helps computers understand the world by recognizing visual patterns and detecting objects, just like humans do.

Computer vision is one of the subfields of artificial intelligence.

To create computer vision algorithms, both classical machine learning methods and deep neural networks, including convolutional neural networks (CNNs), are used.

When did computer vision appear?

In the late 1960s, pioneers in the field of artificial intelligence began to discuss pattern recognition with computer algorithms more intensively. At the time, scientists believed that imitating the human visual system would help endow robots with intelligent behavior.

In 1966, researchers proposed connecting a camera to a computer and having the machine “describe what it sees,” but the technology of the time did not allow this to be realized.

Research in the 1970s laid the early foundations for many computer vision algorithms that exist today, including image edge detection, line marking, motion estimation, and more.

In the next decade, scientists worked on more rigorous mathematical analysis and quantitative aspects of the technology.

By the end of the 1990s, significant changes had taken place with increased interaction between the fields of computer graphics and computer vision. This included image-based rendering, view interpolation, panorama stitching, and more.

This decade also marked the first practical use of statistical learning methods for facial recognition in photographs.

The early 21st century saw a resurgence of feature-based methods, which began to be used in conjunction with machine learning and complex optimization frameworks. However, the real revolution came only with the development of deep learning, whose accuracy surpassed all approaches that existed at the time.

In 2012, the convolutional neural network AlexNet won the ImageNet competition with a top-5 error rate of 15.3%, far outperforming classical approaches. This event is considered the starting point of the modern history of computer vision.

How does computer vision work?

The mission of computer vision is to teach a computer to see and understand its environment through digital photographs and video recordings. Three components are used to achieve this goal:

  • image acquisition;
  • data processing;
  • data analysis.

Image acquisition is the process of turning the analog world into digital data. Webcams, digital and SLR cameras, professional 3D cameras, and laser rangefinders are used for this.

The data obtained in such ways must be further processed and analyzed to extract the maximum benefit.

The next step in computer vision is low-level data processing. At this stage, the algorithm identifies simple geometric primitives in the image: edges, points, and segments.

As a rule, data processing is carried out using complex mathematical algorithms. Popular methods of low-level analysis are:

  • edge detection;
  • segmentation;
  • classification and object detection.

Edge detection encompasses a variety of mathematical methods aimed at identifying points in an image where brightness changes sharply. The algorithm analyzes the image and translates it into a set of curved segments and lines. This method isolates the most important parts of an image, reducing the amount of data to be processed.
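
The idea above can be sketched in a few lines of code. The example below is a minimal, illustrative implementation of gradient-based edge detection with 3×3 Sobel kernels on a grayscale image stored as nested lists; real systems use optimized libraries such as OpenCV, but the principle is the same: estimate horizontal and vertical intensity gradients, then keep pixels where the gradient magnitude exceeds a threshold. The threshold value and the toy image are invented for illustration.

```python
# Minimal sketch of gradient-based edge detection (Sobel kernels).
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient

def detect_edges(image, threshold=100):
    """Return a binary edge map (1 = edge) for a 2D grayscale image."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):          # skip the 1-pixel border
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):        # convolve the 3x3 neighborhood
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            if math.hypot(gx, gy) > threshold:
                edges[y][x] = 1
    return edges

# A 6x6 image: dark left half, bright right half -> one vertical edge.
img = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edge_map = detect_edges(img)
```

Only the pixels along the dark/bright boundary end up marked as edges, which is exactly the data reduction the paragraph describes.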

An image processed by the edge detection method. Data: Towards Data Science.

Segmentation is commonly used to locate objects and their boundaries in images. During processing, the algorithm assigns a label to each pixel so that pixels can later be grouped by shared characteristics.

The result is a set of segments covering all parts of the image or contours extracted from it.
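
To make the "label per pixel" idea concrete, here is a minimal, illustrative sketch of segmentation via connected-component labeling on a binary mask. Modern segmentation systems use deep networks, but the output has the same shape: an integer label for every pixel, with touching pixels of the same kind sharing a label. The mask below is invented for illustration.

```python
# Minimal sketch of segmentation as connected-component labeling.
from collections import deque

def label_segments(mask):
    """Assign an integer label to each pixel of a binary mask.

    4-connected foreground pixels (value 1) that touch each other
    receive the same label; background pixels keep label 0.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] == 1 and labels[sy][sx] == 0:
                next_label += 1                    # start a new segment
                labels[sy][sx] = next_label
                queue = deque([(sy, sx)])
                while queue:                       # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels

# Two separate blobs -> two distinct segment labels.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
seg = label_segments(mask)
```

Each blob of touching foreground pixels receives its own label, giving a set of segments that together cover the whole image.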

Image segmentation using deep learning. Data: Towards Data Science.

Image classification involves extracting information about an image's content. A commonly cited example is determining whether a photograph contains a cat: the model analyzes the data and answers the question with “yes” or “no”.
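
The "cat or not" contract can be sketched with a deliberately simple classifier. The example below uses a nearest-centroid rule on raw pixel values; real classifiers are convolutional neural networks, but the interface is identical: image in, "yes"/"no" out. The tiny flattened 2×2 "images" and the bright-means-cat convention are invented purely for illustration.

```python
# Minimal sketch of binary image classification (nearest centroid).
def centroid(images):
    """Average the flattened pixel vectors of a list of images."""
    n = len(images)
    return [sum(img[i] for img in images) / n for i in range(len(images[0]))]

def classify(image, pos_centroid, neg_centroid):
    """Answer "yes" if the image is closer to the positive class centroid."""
    d_pos = sum((a - b) ** 2 for a, b in zip(image, pos_centroid))
    d_neg = sum((a - b) ** 2 for a, b in zip(image, neg_centroid))
    return "yes" if d_pos < d_neg else "no"

# Invented training data: flattened 2x2 images, bright pixels = "cat".
cats = [[200, 210, 190, 205], [220, 200, 215, 195]]
not_cats = [[30, 40, 20, 35], [25, 45, 50, 30]]
cat_c, other_c = centroid(cats), centroid(not_cats)

print(classify([198, 205, 200, 210], cat_c, other_c))  # -> yes
```

A deep network replaces the hand-made centroids with millions of learned parameters, but it still reduces the image to the same yes/no answer.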

Image classification is at the heart of a more complex computer vision algorithm: object detection. Detection allows the model, for example, to distinguish a cat from a dog and from other objects it knows within a single image.

Classification and detection of objects. Data: LaptrinhX.

Image analysis and understanding is the final step in computer vision, allowing machines to make their own decisions. This step uses the high-level data obtained in the previous steps. Examples of high-level analysis include reconstructing a 3D scene, recognizing objects, and tracking them.

Where is computer vision used?

Today, computer vision methods are used in many areas.


Video surveillance

Computer vision applications allow real-time processing of CCTV camera streams: recognizing objects, detecting intrusions into restricted areas, automatically admitting vehicles by license plate, and much more.

Face recognition

The technology is actively used to authenticate users in various situations, from providing access to a protected facility to unlocking a smartphone.

Recently, such systems have often been criticized by some human rights organizations and politicians. They believe that the widespread use of facial recognition systems threatens human rights and freedoms, and the use of technology should be limited.

Self-driving cars

A set of cameras and algorithms allows the robotic vehicle to navigate in space, distinguish between moving and static objects, and respond to their sudden appearance. Today, many automakers, including GM, Toyota, BMW and others, are actively working on the creation of fully autonomous vehicles.

Tesla has made significant progress with its Autopilot and Full Self-Driving driver assistance programs. They allow the car to control its speed, recognize traffic lights, road signs, and other cars, and independently turn at intersections and change lanes. Driver intervention is not required, but the driver must remain behind the wheel.


Robotics

Similar to self-driving cars, computer vision helps robots navigate in space, identify objects and obstacles, and interact with objects and people.

To date, there is no universal algorithm that allows smart devices to see and understand any environment in which they are placed. Each robot created for a specific task is trained to perform it.

Augmented reality

AR technologies actively use computer vision algorithms to recognize objects in the real world. This allows applications to detect surfaces and their dimensions so that 3D models can be positioned on them correctly.

For example, in 2017 IKEA released an application that lets the user see, through augmented reality, how furniture will look in a room. A virtual copy of the product can be viewed from all sides at full size.

Movement and gesture recognition

Computer vision algorithms have also found application in film production, creating video games, recognizing patterns of behavior of store visitors, analyzing the activity of athletes, and more.

Image recovery and processing

The technology is actively used for restoring old images, colorizing black and white images, upscaling video recordings to 4K format, as well as increasing the resolution in video games.

What are the challenges in computer vision?

Today, developers of computer vision algorithms face a number of difficulties. One of them is the small amount of initial data.

Despite the spread and falling cost of photo and video equipment, data scientists do not always have a sufficient amount of material at their disposal for training algorithms. This may be due to legal regulation, ethical considerations, or geographic barriers.

For example, the developer of an algorithm for recognizing crop types in agricultural fields is not always able to independently collect the photo and video material needed to train a high-precision model. Such a developer has to use data from open sources or obtained from third parties.

This leads to another problem – the low quality of training materials. This includes both photos and videos in low resolution, as well as errors in datasets, which greatly affect the final result.

Data labeling is complex, long, and monotonous manual work. People tend to make mistakes in this process, so datasets often contain incorrect labels, incompletely selected objects, and other artifacts.

In April 2021, scientists at the Massachusetts Institute of Technology found that 5.8% of the images in one of the most popular ImageNet test datasets were not labeled correctly. Among the most common mistakes are incorrect labels of objects: in photographs, a mushroom can be marked as a spoon, and a frog as a cat.

Such oversights in test datasets affect the quality of machine learning algorithms. The researchers urged developers of AI algorithms to be more careful when working with data when creating their models.

Another limitation is computing resources. Processing large amounts of media data requires expensive, powerful hardware. Cloud services partially solve the problem; however, transferring huge amounts of data requires a stable broadband Internet connection, especially when processing video streams in real time.

Edge computing can solve this problem. Under this paradigm, data is processed directly where it is collected. The corresponding calculations can be performed on single-board computers like the Raspberry Pi or Nvidia Jetson, or on video cameras equipped with a computing processor and AI algorithms.

With edge computing devices, only high-level data is transmitted to the central server, from which analytical tools can draw conclusions.
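
The bandwidth argument behind edge computing can be illustrated with a small sketch: the device runs detection locally and transmits only compact, high-level results instead of raw frames. The detector below is a stub invented for illustration; the point is the difference in payload size, not the detection itself.

```python
# Minimal sketch of the edge-computing idea: send detections, not frames.
import json

def stub_detector(frame):
    """Stand-in for an on-device detection model (invented for illustration)."""
    return [{"label": "person", "box": [10, 20, 50, 80], "score": 0.91}]

def edge_payload(frame):
    """What an edge device transmits: detections only, serialized as JSON."""
    return json.dumps(stub_detector(frame))

# A "raw frame": 640x480 grayscale, one byte per pixel.
frame = bytes(640 * 480)
payload = edge_payload(frame)

raw_size = len(frame)      # 307,200 bytes per frame if streamed raw
edge_size = len(payload)   # a few dozen bytes of high-level data
```

Streaming the raw frame costs hundreds of kilobytes; the high-level result is a few dozen bytes, which is why edge devices can work over far weaker links.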

However, this concept is still far from full realization: despite their low cost, single-board computers still lack the power to process large amounts of data, especially real-time video.

What are the trends in computer vision?

One of the main directions in the field of computer vision is generative adversarial networks (GANs). Recently, these algorithms have been used not only to stylize photos and videos as paintings by famous artists, but also to create high-quality fakes.

For example, the project This Person Doesn’t Exist uses a GAN to generate photorealistic images of people who do not exist. Other projects work on the same principle: This Cat Doesn’t Exist generates fake cats, and This Sneaker Doesn’t Exist generates sneakers.

Algorithms like these allow researchers and developers to create synthetic datasets for training models. Such datasets are easier to assemble and address some of the legal and ethical issues regarding the use of images.

Data generation startups are already implementing this concept successfully. In October 2021, Gretel.ai raised $50 million to support a platform for generating synthetic datasets. In July 2021, the British company Mindtech raised $3.25 million to develop a service for training computer vision algorithms on generated data.

Another important area in the field is the modeling of 3D scenes. To implement this idea, special algorithms are being developed that can recreate a scene in three-dimensional space from a series of photographs taken from different angles.

This technology is actively used in construction, robotics, animation, interior design and military affairs.

The researchers note that today it is difficult for algorithms to reproduce complex textures, such as leaves on trees. However, in the near future, such tools will be able to significantly simplify the work of 3D designers.

What is the role of computer vision in the metaverse?

For the metaverse, computer vision may turn out to be one of the main technologies, from tasks in virtual and augmented reality to the recognition of objects, people, and spaces.

During its rebranding event, Meta (formerly Facebook) showed realistic avatars, an environment for them to exist in, and a neural interface for controlling them. Computer vision technologies were used, among others, in their creation.

At Ignite 2021, Microsoft showed off its vision for the metaverse. The company introduced the Mesh for Teams collaboration tool for VR headsets, smartphones, tablets and PCs.

At the fall GTC 2021 conference, chipmaker NVIDIA announced the Omniverse Avatar platform for creating interactive 3D characters. It combines computer vision, natural language processing and recommender systems.

What threats does computer vision pose?

Despite the obvious benefits of computer vision for business and the public, the technology can also be used for malicious purposes.

Today, tools for creating deepfakes are actively developing. Methods for creating photo and video fakes have existed for a long time, but with the development of deep learning, the process of creating them has become much simpler, and the fakes themselves have become much more believable.

Fraudsters can use deepfakes to create fake pornographic videos, speeches by politicians and other celebrities.

In 2017, a Reddit user known as “deepfakes” posted several fake adult videos using the faces of celebrities such as Gal Gadot, Scarlett Johansson, Taylor Swift, and Katy Perry.

In the same year, deepfakes targeting politicians became more common: videos appeared on the Internet in which the face of Argentine President Mauricio Macri was replaced with Adolf Hitler’s, and German Chancellor Angela Merkel’s with Donald Trump’s.

Computer vision systems are often criticized for discrimination based on gender and race. Often, the reason for this is the lack of diversity in the datasets.

In 2019, a Black New Jersey resident spent 10 days in jail because of a facial recognition error. African Americans in other US cities have faced similar problems.

The technology has also been criticized for its excessive intrusion into the privacy of citizens. According to human rights activists, face recognition in public places and tracking people’s movements with the help of outdoor video surveillance cameras violate human rights to privacy.

Developers and the public offer various ways to solve the above problems, ranging from the creation of deepfake recognition systems to the legislative ban on the use of biometric identification systems. However, there is still no consensus on these issues.



Computer vision is an exciting and powerful branch of machine learning, allowing us to analyse and process digital images and videos in order to gain insight into the world around us. With the rapid advances in technology, computer vision has the potential to revolutionize many industries, from medical imaging to autonomous vehicles. We can expect computer vision to continue to evolve and to play a major role in our lives in the future.


What is computer vision?

Computer vision is a field of artificial intelligence that enables computers to understand and interpret visuals. It involves the use of machine learning algorithms to learn from images and videos, and then to detect objects, classify images, and recognize faces.
