Orby AI Launches ActIO with Breakthrough Visual Grounding Capabilities

Orby AI (Orby), a technology trailblazer in generative AI solutions for the enterprise, today unveiled ActIO, the most capable large action model (LAM) AI foundation engine yet, with state-of-the-art (SOTA) performance on the Large Action Model Benchmark (LAMB).

The company also announced it has teamed up with Ohio State University’s Natural Language Processing (NLP) group to develop advanced AI techniques such as visual grounding, the ability of an AI agent to connect what it sees in an image with what it understands through language. OSU and Orby have co-authored and published an extensive research paper on the innovation, titled “Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents.” OSU refers to this new technique as UGround, which is now native to Orby’s ActIO foundation LAM.

Conventional large language models (LLMs) often struggle to connect visual information with textual understanding effectively; they can miss subtle details or misinterpret information entirely. Orby’s collaboration with OSU on visual grounding now gives machines the ability to identify what is visible and to understand its importance in the context of the specific task being performed.
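To make the idea concrete, a visual-grounding call takes a screenshot plus a natural-language reference and returns where on screen that reference points. The sketch below is purely illustrative: `GroundingResult`, `ground`, and `click` are hypothetical names standing in for the general technique, not Orby’s or OSU’s actual API.

```python
from dataclasses import dataclass

from PIL import Image  # pip install pillow


@dataclass
class GroundingResult:
    """Where on the screen a natural-language reference points."""
    x: int            # pixel coordinates of the element's center
    y: int
    confidence: float


def ground(screenshot: Image.Image, reference: str) -> GroundingResult:
    """Hypothetical grounding call: map a reference such as
    'the blue Submit button' to screen coordinates. A real system
    would run a multimodal model (e.g. UGround) here."""
    raise NotImplementedError("stand-in for a real grounding model")


# An agent would locate the element first, then act on it:
# shot = Image.open("checkout_page.png")
# hit = ground(shot, "the 'Place order' button")
# click(hit.x, hit.y)   # click() is likewise hypothetical
```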

“The advances we’ve made and the transition we are seeing right now within the AI world will be the most profound in our lifetimes, far bigger than the shift to mobile or to the web before it,” said Will Lu, Co-Founder and CTO at Orby. “Next-generation AI systems must be able to process and interpret visual information, like objects, scenes, and their relationships, as well as grasp the meaning of words and sentences, making the connection between the two. That’s precisely what we’ve done,” concluded Lu.

“This is an incredible milestone that we’ve achieved with Orby, and yet we’re only beginning to scratch the surface of what’s possible,” said Yu Su, Assistant Professor in the Department of Computer Science and Engineering at The Ohio State University.

Orby and OSU have open-sourced the new visual grounding model, which is now available on Hugging Face, allowing developers to use it in a variety of applications.
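As a minimal sketch of getting started, the released weights can be fetched directly from the Hugging Face Hub. The repository id `osunlp/UGround` is an assumption based on the OSU group’s release; check the model card for the exact id and the recommended inference stack.

```python
# pip install huggingface_hub
from huggingface_hub import snapshot_download

# Assumed repository id for the UGround release; verify it on the
# model card, since the OSU NLP group may publish several variants.
local_dir = snapshot_download(repo_id="osunlp/UGround")
print(f"model files downloaded to: {local_dir}")

# From here, load the weights with whatever loader the model card
# names; UGround builds on an open vision-language architecture, so a
# standard transformers-style pipeline is the likely route.
```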

Introducing ActIO

Unlike conventional LLM-based AI systems, which rely on language input for every step, ActIO is the first commercially available, patented LAM AI foundation with advanced capabilities for decision-making, planning, and adapting to dynamic situations.

With the industry’s highest accuracy and success rates among AI agents, ActIO can analyze complex contexts and nuances to make informed choices, taking the initiative to automate complex, repetitive enterprise workflows with minimal supervision.

The industry’s only multimodal generative AI foundation model purpose-built for complex enterprise use cases, ActIO tackles multi-step processes without constant human intervention: it breaks complex workflows into smaller tasks, executes and automates them, and refines its automations as the system learns.
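As an illustration only, a decompose-and-execute loop of the kind described above might look like the following sketch. Every function here (`plan`, `execute`, `run_workflow`) is a hypothetical stand-in, not Orby’s published interface.

```python
def plan(workflow: str) -> list[str]:
    """Stand-in for the LAM's planner: split a workflow into subtasks."""
    return [step.strip() for step in workflow.split(";") if step.strip()]


def execute(step: str) -> bool:
    """Stand-in for the LAM's executor: perform one subtask and report
    whether it succeeded."""
    print(f"executing: {step}")
    return True


def run_workflow(workflow: str, max_replans: int = 3) -> None:
    """Break the workflow into smaller tasks, execute them in order,
    and replan on failure (the adapt-as-it-learns loop described above)."""
    queue, replans = plan(workflow), 0
    while queue:
        step = queue.pop(0)
        if not execute(step) and replans < max_replans:
            queue = plan(workflow)  # adapt: regenerate the plan
            replans += 1


run_workflow("open invoice portal; download new invoices; file by vendor")
```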

ActIO has shown state-of-the-art performance across top GUI agent benchmarks, outperforming the best existing multimodal models. These benchmarks cover multiple scenarios, including web, desktop, and mobile, in both online and offline settings.

On the VisualWebBench test, ActIO-7B outperforms top models such as GPT-4o, Gemini 1.5 Pro, and LLaVA-1.6-34B. VisualWebBench is the first benchmark designed specifically to evaluate visual web understanding, making it possible to assess and compare systems and models on how effectively they process and understand visual content on the web.

ActIO also demonstrates state-of-the-art effectiveness and proficiency in supporting GUI agents: end-to-end GUI agents built with ActIO are now the best-performing systems. Rigorous testing showed that orchestrating large and small models together achieves the best accuracy across top digital agent benchmarks, dramatically surpassing GPT-4 with a 25% improvement. Detailed evaluation results are available on LAMB, a benchmark focused on evaluating large action models on a range of digital agent tasks using leading evaluation datasets and online environments.

With the addition of a strong action foundation model, Orby’s AI agent can automatically adapt to website UI updates, dynamic content, and unstructured data sets. And to keep users in the driver’s seat, workers can easily modify automation steps using natural language.
