Improving the Object Detection Ability of Artificial Intelligence

Object detection in 3D spaces is a key component of many computer vision applications, such as image sensing in autonomous driving or robot navigation. However, sophisticated, high-performance systems such as Lidar can be expensive and computationally complicated.

Improving the Object Detection Ability of Artificial Intelligence.
Image Credit: Shipman, M., (2022) Technique Improves AI Ability to Understand 3D Space Using 2D Images. [online] NC State News. Available at: https://news.ncsu.edu/2022/01/monocon-ai-3d/

Now, a team of researchers at North Carolina State University has developed a new system dubbed MonoCon that enhances the ability of object detection of artificial intelligence (AI) programs using 2D images.

We live in a 3D world, but when you take a picture, it records that world in a 2D image. 

Tianfu Wu, Corresponding Author and Assistant Professor of Electrical and Computer Engineering at North Carolina State University

One of the main questions the team had to address was whether it was possible to recover information on 3D structures and the surrounding environment using 2D images that have lost some of the relevant depth information.

AI programs that receive relevant visual information from standard cameras convert 2D images into information that allows them to navigate 3D spaces by placing certain objects, such as vehicles, people, traffic structures, etc., into their surroundings.

While the researchers were particularly interested in the use of the MonoCon system for application in autonomous driving, the system could also find application potential in robotics and manufacturing systems.

Building in Redundancy

While the majority of today’s self-driving or autonomous systems employ Lidar to navigate 3D environments, these systems are expensive. Lidar, an acronym for light detection and ranging, uses a series of eye-safe lasers to map an area and create a 3D representation of an environment.

However, due to the cost, it is expensive and doesn’t allow much room for redundancy as it is not economical to include multiple Lidar scanners on a vehicle.

But if an autonomous vehicle could use visual inputs to navigate through space, you could build in redundancy.

Tianfu Wu, Corresponding Author and Assistant Professor of Electrical and Computer Engineering at North Carolina State University

This is because, unlike Lidar, cameras are much cheaper, meaning it would be economically viable to incorporate multiple cameras, making it possible for redundancy to be built into the system, thereby making it safe and sturdy.

However, the possibility of using 2D input and extracting 3D data also offers another exciting opportunity and exemplifies one of the advanced capabilities MonoCon carries - effectively telling an AI system the exterior edges of a relevant object by placing objects into ‘bounding boxes’.

Training MonoCon

The AI is trained by reading a series of 2D images and placing these bounding boxes around certain articles in the image. Each one of these boxes is a cuboid comprised of eight points. The AI is then fed the relevant information to each of the points allowing it to understand the height, length and width as well as the linear measure between the corners.

The AI can then make its predictions about the objects in relation to one another and their distance from the camera. Once the AI has made its calculations, the researchers can fine-tune the data by correcting any mistakes, which over time improves the performance of the AI.

What sets our work apart is how we train the AI, which builds on previous training techniques.” Wu says. 

The proposed method is motivated by a well-known theorem in measure theory, the Cramér–Wold theorem. It is also potentially applicable to other structured-output prediction tasks in computer vision.

Tianfu Wu, Corresponding Author and Assistant Professor of Electrical and Computer Engineering at North Carolina State University

As well as asking the AI to predict the distance between the camera, various objects, items, articles and the proportions of the bounding boxes, the AI was also given the objective to predict the positions of each of the eight points on the box and work out the distance from the center of the bounding box.

This process is known as ‘auxiliary context’ and allows the AI to detect and predict the location of 3D objects more accurately using 2D images.

The MonoCon system was then tested on an extensive set of data called KITTI, which outperformed many of the other AI programs that have been developed to extract 3D data from 2D images. While MonoCon also performed well when asked to identify bicycles and/or pedestrians, other programs still fare better.

The next step is fine-tuning MonoCon using larger datasets in order to make it scalable for use in autonomous, self-driving driving applications.

We also want to explore applications in manufacturing, to see if we can improve the performance of tasks such as the use of robotic arms,” Wu concludes.

References and Further Reading

Shipman, M., (2022) Technique Improves AI Ability to Understand 3D Space Using 2D Images. [online] NC State News. Available at: https://news.ncsu.edu/2022/01/monocon-ai-3d/

Disclaimer: The views expressed here are those of the author expressed in their private capacity and do not necessarily represent the views of AZoM.com Limited T/A AZoNetwork the owner and operator of this website. This disclaimer forms part of the Terms and conditions of use of this website.

David J. Cross

Written by

David J. Cross

David is an academic researcher and interdisciplinary artist. David's current research explores how science and technology, particularly the internet and artificial intelligence, can be put into practice to influence a new shift towards utopianism and the reemergent theory of the commons.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Cross, David. (2022, February 07). Improving the Object Detection Ability of Artificial Intelligence. AZoRobotics. Retrieved on October 30, 2024 from https://www.azorobotics.com/News.aspx?newsID=12729.

  • MLA

    Cross, David. "Improving the Object Detection Ability of Artificial Intelligence". AZoRobotics. 30 October 2024. <https://www.azorobotics.com/News.aspx?newsID=12729>.

  • Chicago

    Cross, David. "Improving the Object Detection Ability of Artificial Intelligence". AZoRobotics. https://www.azorobotics.com/News.aspx?newsID=12729. (accessed October 30, 2024).

  • Harvard

    Cross, David. 2022. Improving the Object Detection Ability of Artificial Intelligence. AZoRobotics, viewed 30 October 2024, https://www.azorobotics.com/News.aspx?newsID=12729.

Tell Us What You Think

Do you have a review, update or anything you would like to add to this news story?

Leave your feedback
Your comment type
Submit

While we only use edited and approved content for Azthena answers, it may on occasions provide incorrect responses. Please confirm any data provided with the related suppliers or authors. We do not provide medical advice, if you search for medical information you must always consult a medical professional before acting on any information provided.

Your questions, but not your email details will be shared with OpenAI and retained for 30 days in accordance with their privacy principles.

Please do not ask questions that use sensitive or confidential information.

Read the full Terms & Conditions.