This new robotics challenge could bring us closer to human-level AI
Many of our intuitive planning and motor skills, things that we take for granted, are a lot more complicated than we think. Navigating unknown areas, finding and picking up objects, choosing routes, and planning tasks are complicated feats that we only appreciate when we try to turn them into computer programs.
Developing robots that can physically sense the world and interact with their environment is the realm of embodied artificial intelligence, one of the long-sought goals of AI scientists. And though progress in the field is still a far cry from the capabilities of humans and animals, the achievements are remarkable nonetheless.
In a recent development in embodied AI, scientists at IBM, the Massachusetts Institute of Technology, and Stanford University developed a new challenge that will help assess the ability of AI agents to find paths, interact with objects, and plan tasks efficiently. Titled “ThreeDWorld Transport Challenge,” the test is a virtual environment that will be presented at the Embodied AI Workshop during the Conference on Computer Vision and Pattern Recognition, held online in June.
No current AI technique comes close to solving the TDW Transport Challenge. But the results of the competition can help find new directions for the future of embodied AI and robotics research.
Reinforcement learning in virtual environments
At the heart of most robotics applications is reinforcement learning, a branch of machine learning that is based on actions, states, and rewards. A reinforcement learning agent is given a set of actions that it can apply to its environment to obtain rewards or reach a certain goal. These actions create changes to the state of the agent and the environment. The RL agent receives rewards based on how its actions bring it closer to its goal.
RL agents usually start by knowing nothing about their environment and selecting random actions. As they gradually receive feedback from their environment, they learn sequences of actions that can maximize their rewards.
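The loop described above (act, observe the new state, collect a reward, adjust) can be sketched with tabular Q-learning, one common reinforcement learning algorithm. This is only a minimal illustration under stated assumptions: the five-cell corridor, the reward of 1 at the goal, and every name and hyperparameter below are made up for the example and have nothing to do with the TDW environment or the researchers' models.

```python
import random

# Toy corridor: states 0..4, the goal is state 4. Purely illustrative,
# not the TDW environment.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    # The agent starts out knowing nothing: every action value is zero.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step budget per episode
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)  # explore (or break ties) at random
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best known action for each non-goal state."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:GOAL]]
```

Calling `greedy_policy(train())` should yield "step right" for every cell once the early random wandering has stumbled on the goal often enough for the reward to propagate back along the corridor.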
This scheme is used not only in robotics, but in many other applications such as self-driving cars and content recommendation. Reinforcement learning has also helped researchers master complicated games such as Go, StarCraft 2, and DOTA.
Creating reinforcement learning models presents several challenges. One of them is designing the right set of states, rewards, and actions, which can be very difficult in applications such as robotics, where agents face a continuous environment that is affected by complicated factors such as gravity, wind, and physical interactions with other objects. In contrast, environments like chess and Go have discrete states and actions.
Another challenge is gathering training data. Reinforcement learning agents need to train over data from millions of episodes of interactions with their environments. This constraint can slow down robotics applications because they must gather their data from the physical world as opposed to video and board games, which can be played in rapid succession on several computers.
To overcome this barrier, AI researchers have tried to create simulated environments for reinforcement learning applications. Today, self-driving cars and robotics often use simulated environments as a major part of their training regime.
“Training models using real robots can be expensive and sometimes involve safety considerations,” Chuang Gan, Principal Research Staff Member at the MIT-IBM Watson AI Lab, told TechTalks. “As a result, there has been a trend toward incorporating simulators, like what the TDW-Transport Challenge provides, to train and evaluate AI algorithms.”
But replicating the exact dynamics of the physical world is extremely difficult, and most simulated environments are a rough approximation of what a reinforcement learning agent would face in the real world. To address this limitation, the TDW Transport Challenge team has gone to great lengths to make the test environment as realistic as possible.
The environment is built on top of the ThreeDWorld platform, which the authors describe as “a general-purpose virtual world simulation platform supporting both near-photo realistic image rendering, physically-based sound rendering, and realistic physical interactions between objects and agents.”
“We aimed to use a more advanced physical virtual environment simulator to define a new embodied AI task requiring an agent to change the states of multiple objects under realistic physical constraints,” the researchers write in an accompanying paper.
Task and motion planning
Reinforcement learning tests have different degrees of difficulty. Most current tests involve navigation tasks, where an RL agent must find its way through a virtual environment based on visual and audio input.
The TDW Transport Challenge, on the other hand, pits the reinforcement learning agents against “task and motion planning” (TAMP) problems. TAMP requires the agent to not only find optimal movement paths but to also change the state of objects to achieve its goal.
The challenge takes place in a multi-roomed house adorned with furniture, objects, and containers. The reinforcement learning agent views the environment from a first-person perspective and must find one or several objects from the rooms and gather them at a specified destination. The agent is a two-armed robot, so it can only carry two objects at a time. Alternatively, it can use a container to carry several objects and reduce the number of trips it has to make.
At every step, the RL agent can choose one of several actions, such as turning, moving forward, or picking up an object. The agent receives a reward if it accomplishes the transfer task within a limited number of steps.
While this seems like the kind of problem any child could solve without much training, it is in fact a complicated task for current AI systems. The reinforcement learning program must find the right balance between exploring the rooms, finding optimal paths to the destination, choosing between carrying objects alone or in containers, and doing all this within the designated step budget.
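The choice between carrying objects by hand and fetching a container first comes down to arithmetic against the step budget. The sketch below is a deliberately crude cost model, not anything taken from the challenge itself: the only figure from the article is the two-object carrying limit, while `round_trip_steps`, `container_capacity`, and `fetch_steps` are hypothetical parameters invented for illustration.

```python
import math

HANDS = 2  # from the article: the two-armed robot carries two objects at a time


def steps_by_hand(n_objects, round_trip_steps):
    """Estimated steps when carrying at most two objects per trip."""
    trips = math.ceil(n_objects / HANDS)
    return trips * round_trip_steps


def steps_with_container(n_objects, round_trip_steps,
                         container_capacity, fetch_steps):
    """Estimated steps when a container is fetched first (hypothetical model)."""
    trips = math.ceil(n_objects / container_capacity)
    return fetch_steps + trips * round_trip_steps


def cheaper_plan(n_objects, round_trip_steps, container_capacity, fetch_steps):
    """Name the plan with the lower estimated step count."""
    by_hand = steps_by_hand(n_objects, round_trip_steps)
    with_container = steps_with_container(
        n_objects, round_trip_steps, container_capacity, fetch_steps)
    return "container" if with_container < by_hand else "by_hand"
```

Under these made-up numbers, five objects at 40 steps per round trip cost 120 steps by hand, so a 30-step detour for a container that holds all five pays off, whereas for two objects it does not. A real agent has to make this call without knowing the distances in advance.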
“Through the TDW-Transport Challenge, we’re proposing a new embodied AI challenge,” Gan said. “Specifically, a robotic agent must take actions to move and change the state of a large number of objects in a photo- and physically-realistic virtual environment which remains a complex goal in robotics.”
Abstracting challenges for AI agents
While TDW is a very complex simulated environment, the designers have still abstracted some of the challenges that robots would face in the real world. The virtual robot agent, dubbed Magnebot, has two arms with nine degrees of freedom, with joints at the shoulder, elbow, and wrist. However, the robot’s hands are magnets and can pick up any object without the need to handle it with fingers, which itself is a very challenging task.
The agent also perceives the environment in three different ways: an RGB-colored frame, a depth map, and a segmentation map that shows each object separately in hard colors. The depth and segmentation maps make it easier for the AI agent to read the dimensions of the scene and tell the objects apart when viewed from awkward angles.
To avoid confusion, the problems are posed in a simple structure (e.g., “vase:2, bowl:2, jug:1; bed”) as opposed to loose language commands (e.g., “Grab two bowls, a couple of vases, and the jug in the bedroom, and put them all on the bed”).
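A structured goal like this is trivial for a program to read, which is the point of the simplification. As a minimal sketch, assuming the challenge's goals look exactly like the example string quoted in the article (the real interface may differ), a hypothetical `parse_task` helper could split it into object counts and a destination:

```python
def parse_task(spec):
    """Split 'vase:2, bowl:2, jug:1; bed' into ({name: count}, destination).

    Hypothetical helper based only on the example string in the article.
    """
    objects_part, destination = spec.split(";")
    targets = {}
    for item in objects_part.split(","):
        name, count = item.split(":")
        targets[name.strip()] = int(count)
    return targets, destination.strip()
```

For the example above, `parse_task("vase:2, bowl:2, jug:1; bed")` returns `({"vase": 2, "bowl": 2, "jug": 1}, "bed")`. Extracting the same information from the loose-language version would require a language-understanding component on top of everything else.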
And to simplify the state and action space, the researchers have limited the Magnebot’s navigation to 25-centimeter movements and 15-degree rotations.
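To see how such a restriction shrinks the problem, consider a simple pose update on a flat floor. This is a sketch under stated assumptions, not the Magnebot API: the 25-centimeter and 15-degree increments come from the article, while the action names, the `(x, y, heading)` pose, and the `apply_action` function are hypothetical.

```python
import math

MOVE_STEP_M = 0.25   # 25-centimeter forward movement (from the article)
TURN_STEP_DEG = 15   # 15-degree rotation (from the article)


def apply_action(pose, action):
    """Return the new (x, y, heading_deg) pose; action names are hypothetical."""
    x, y, heading = pose
    if action == "turn_left":
        heading = (heading + TURN_STEP_DEG) % 360
    elif action == "turn_right":
        heading = (heading - TURN_STEP_DEG) % 360
    elif action == "move_forward":
        rad = math.radians(heading)
        x += MOVE_STEP_M * math.cos(rad)
        y += MOVE_STEP_M * math.sin(rad)
    else:
        raise ValueError(f"unknown action: {action}")
    return (x, y, heading)
```

With 15-degree turns the heading can only take 24 distinct values, so the agent chooses among a handful of discrete actions at each step instead of searching over continuous velocities and angles.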
These simplifications enable developers to focus on the navigation and task-planning problems that AI agents must overcome in the TDW environment.
Gan told TechTalks that despite the levels of abstraction introduced in TDW, the robot still needs to address the following challenges:
The synergy between navigation and interaction: The agent cannot move to grasp an object if this object is not in the egocentric view, or if the direct path to it is obstructed.
Physics-aware interaction: grasping might fail if the agent’s arm cannot reach an object.
Physics-aware navigation: collision with obstacles might cause objects to be dropped and significantly impede transport efficiency.
This makes one appreciate the complexity of human vision and agency. The next time you go to a supermarket, consider how easily you can find your way through aisles, tell products apart, reach for and pick up different items, place them in your basket or cart, and choose your path in an efficient way. And you’re doing all this without access to segmentation and depth maps and by reading items from a crumpled handwritten note in your pocket.
Pure deep reinforcement learning is not enough
The TDW-Transport Challenge is in the process of accepting submissions. In the meantime, the authors of the paper have already tested the environment with several known reinforcement learning techniques. Their findings show that pure reinforcement learning is very poor at solving task and motion planning challenges. A pure reinforcement learning approach requires the AI agent to develop its behavior from scratch, starting with random actions and gradually refining its policy to meet the goals in the specified number of steps.
According to the researchers’ experiments, pure reinforcement learning approaches barely managed to achieve above 10 percent success in the TDW tests.
“We believe this reflects the complexity of physical interaction and the large exploration search space of our benchmark,” the researchers wrote. “Compared to the previous point-goal navigation and semantic navigation tasks, where the agent only needs to navigate to specific coordinates or objects in the scene, the ThreeDWorld Transport challenge requires agents to move and change the objects’ physical state in the environment (i.e., task-and-motion planning), which the end-to-end models might fall short on.”
When the researchers tried hybrid AI models, where a reinforcement learning agent was combined with a rule-based high-level planner, they saw a considerable boost in the performance of the system.
“This environment can be used to train RL models which fall short on these types of tasks and require explicit reasoning and planning abilities,” Gan said. “Through the TDW-Transport Challenge, we hope to demonstrate that a neuro-symbolic, hybrid model can improve this issue and demonstrate a stronger performance.”
The problem, however, remains largely unsolved, and even the best-performing hybrid systems had around 50-percent success rates. “Our proposed task is very challenging and could be used as a benchmark to track the progress of embodied AI in physically realistic scenes,” the researchers wrote.
Mobile robots are becoming a hot area of research and applications. According to Gan, several manufacturers and smart factories have already expressed interest in using the TDW environment for their real-world applications. It will be interesting to see whether the TDW Transport Challenge will help usher in new innovations in the field.
“We’re hopeful the TDW-Transport Challenge can help advance research around assistive robotic agents in warehouses and home settings,” Gan said.
This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech, and what we need to look out for. You can read the original article here.
IBM, Call for Code, and the Linux Foundation announce new open source projects to combat racism
The Linux Foundation last week announced it was hosting seven open source projects in partnership with IBM and David Clark Cause’s Call for Code for Racial Justice.
Background: Call for Code for Racial Justice launched late last year to solicit solutions from the global coding community.
The goal of the challenge is to come up with novel open source solutions backed by IBM and partner technologies such as cloud computing and artificial intelligence. There are currently seven “solution starters” which are now hosted by the Linux Foundation.
According to IBM Call for Code director Ruth Davis:
The seven initiatives, per a Linux Foundation blog post, include:
Fair Change: A platform to help record, catalog, and access evidence of potentially racially charged incidents to help enable transparency, reeducation and reform as a matter of public interest and safety.
TakeTwo: [This project] aims to help mitigate bias in digital content, whether it is overt or subtle, with a focus on text across news articles, headlines, web pages, blogs, and even code.
Five Fifths Voter: This web app empowers minorities to exercise their right to vote and helps ensure their voice is heard by determining optimal voting strategies and limiting suppression issues.
Legit-Info: Local legislation can have significant impacts on areas as far-reaching as jobs, the environment, and safety. Legit-Info helps individuals understand the legislation that shapes their lives.
Incident Accuracy Reporting System: This platform allows witnesses and victims to corroborate evidence or provide additional information from multiple sources against an official police report.
Open Sentencing: To help public defenders better serve their clients and make a stronger case, Open Sentencing shows racial bias in data such as demographics.
Truth Loop: This app helps communities simply understand the policies, regulations, and legislation that will impact them the most.
Quick take: Politicians are apparently not going to solve the problem of racial injustice for us no matter how hard we vote. Luckily for us, racism manifests through digitally traceable means more often than not in the modern world. And that means we can fight it with technology.
Call for Code is an unwavering force for good and, just like its previous efforts to combat climate change and mitigate the damage done by natural disasters, this is a necessary target for its endeavors. There are few more pressing problems in society than racial injustice, and arguably none more ripe for attack by an eager global coding community.
For more information, check out the Call for Code for Racial Justice website here.
Skyrim modders are using AI to generate new spoken dialogue
If you’re unimpressed by some of Skyrim’s hilarious dialogue, a new AI app called VASynth lets you take over the scriptwriting.
The tool uses voice samples from Bethesda games to convert text into speech.
You can generate dialogue in the style of many voices from the publisher’s back catalog, including Skyrim, Fallout 4, and Morrowind.
YouTuber Adriac used it to produce a pretty impressive trailer, although the tone of voice sounds slightly artificial at times:
Others are using it for more quirky creations, like this mod that forces characters to compliment your naked body:
The tool was created by software developer Dan Ruta, who said it’s powered by a field of deep learning called neural speech synthesis:
The models used for each character are trained on in-game voice-acted lines.
After generating the audio from a text prompt, you can control the emotion and emphasis of the lines by adjusting the pitch and the duration of individual letters.
If you wanna try it out for yourself, Ruta’s produced a handy video guide for users and a more detailed text tutorial. You can drop him a tip for his work at the app’s Patreon.
Whether it’s a harbinger of doom for professional voice actors or just a fun way of customizing dialogue, the tool shows AI can have a powerful role in video game modding.