
Watch a robot navigate the Google DeepMind offices using Gemini | TechCrunch

Generative AI has already shown a lot of promise in robotics. Applications include natural language interactions, robot learning, no-code programming and even design. This week, Google’s DeepMind Robotics team is showcasing another potential sweet spot between the two disciplines: navigation.

In a paper titled “Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs,” the team demonstrates how it used Google Gemini 1.5 Pro to teach a robot to respond to commands and navigate around an office. Naturally, DeepMind used some of the Everyday Robots that have been hanging around since Google shuttered that project amid widespread layoffs last year.

In a series of videos attached to the project, DeepMind employees open with a smart assistant-style “OK, Robot,” before asking the system to perform different tasks around the 9,000-square-foot office space.

In one example, a Googler asks the robot to take him somewhere to draw things. “OK,” the robot responds, wearing a jaunty yellow bowtie, “give me a minute. Thinking with Gemini …” The robot then proceeds to lead the human to a wall-sized white board. In a second video, a different person tells the robot to follow the directions on the whiteboard.

A simple map shows the robot how to get to the “Blue Area.” Again, the robot thinks for a moment before taking a long walk to what turns out to be a robotics testing area. “I’ve successfully followed the directions on the whiteboard,” the robot announces with a level of self-confidence most humans can only dream of.

Prior to these videos, the robots were familiarized with the space using what the team calls “Multimodal Instruction Navigation with demonstration Tours (MINT).” Effectively, that means walking the robot around the office while pointing out different landmarks with speech. Next, the team employs a hierarchical Vision-Language-Action (VLA) model that “combine[s] the environment understanding and common sense reasoning power.” Once the processes are combined, the robot can respond to written and drawn commands, as well as gestures.
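To make the idea concrete, here is a minimal, purely illustrative sketch of that two-stage approach, not DeepMind’s actual code: a demonstration tour becomes a topological graph of labeled waypoints, a goal node is matched against the instruction (Mobility VLA uses a long-context Gemini model for this step; a keyword match stands in here), and a graph search produces the waypoint path. All class and variable names are hypothetical.

```python
from collections import deque

class TourGraph:
    """Topological graph built from a demonstration tour of the office."""

    def __init__(self):
        self.labels = {}  # node id -> landmark label spoken during the tour
        self.edges = {}   # node id -> adjacent node ids

    def add_frame(self, node, label, neighbors=()):
        self.labels[node] = label
        self.edges.setdefault(node, []).extend(neighbors)
        for n in neighbors:
            self.edges.setdefault(n, []).append(node)

    def find_goal(self, instruction):
        # Stand-in for the VLM goal-finding step: match instruction words
        # against tour labels. Mobility VLA instead reasons over the tour
        # video and the instruction with a long-context multimodal model.
        words = set(instruction.lower().split())
        for node, label in self.labels.items():
            if words & set(label.lower().split()):
                return node
        return None

    def plan(self, start, instruction):
        goal = self.find_goal(instruction)
        if goal is None:
            return None
        # Breadth-first search over the topological graph for a waypoint path.
        frontier = deque([[start]])
        visited = {start}
        while frontier:
            path = frontier.popleft()
            if path[-1] == goal:
                return path
            for nxt in self.edges.get(path[-1], []):
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append(path + [nxt])
        return None

# Hypothetical three-waypoint tour of an office.
g = TourGraph()
g.add_frame("lobby", "front lobby")
g.add_frame("hall", "main hallway", neighbors=["lobby"])
g.add_frame("board", "whiteboard drawing wall", neighbors=["hall"])
print(g.plan("lobby", "go to the drawing wall"))  # → ['lobby', 'hall', 'board']
```

The low-level motion controller that actually drives between waypoints is a separate component; the sketch only covers the instruction-to-path stage the paragraph describes.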

Google says the robot achieved a success rate of roughly 90% across more than 50 interactions with employees.
