Newswise — Researchers have trained a robotic ‘chef’ to watch and learn from cooking videos, and recreate the dish itself.

The scientists at the University of Cambridge taught a robotic chef to cook salads by giving it a set of eight easy recipes. They showed the robot a video of a person making one of the recipes, and the robot could recognize which recipe it was and cook it.

Furthermore, the videos played a crucial role in expanding the robot's recipe collection. By the end of the experiment, the robot even created its own original recipe, bringing the total to nine. The findings, published in IEEE Access, highlight the significance of video content as a valuable and abundant source of information for automated food preparation. The research could make robot chefs easier and cheaper to deploy.

Robot chefs have long been depicted in science fiction, but in reality, cooking remains a complex challenge for robots. Some companies have developed prototype robot chefs, but they are not yet available for commercial use. These robot chefs are still far behind humans in terms of their culinary abilities and skills.

Human cooks have the ability to learn new recipes by observing others, whether it's watching someone cook in person or following instructional videos online. However, teaching a robot to prepare a variety of dishes is an expensive and time-consuming process.

Grzegorz Sochacki, the first author of the paper from Cambridge's Department of Engineering, stated that their objective was to investigate whether they could train a robot chef to learn in the same incremental way that humans do. They aimed to teach the robot to recognize the ingredients and understand how they come together in a dish.

Sochacki, a PhD candidate in Professor Fumiya Iida's Bio-Inspired Robotics Laboratory, along with his colleagues, created eight basic salad recipes and recorded themselves preparing those salads. They utilized a publicly available neural network that had already been trained to recognize various objects, including the fruits and vegetables used in the salad recipes such as broccoli, carrot, apple, banana, and orange. The team employed this neural network to train their robot chef.

Using computer vision techniques, the robot examined each frame of the video. It identified the objects and features present, such as the ingredients, a knife, and the human demonstrator's arms, hands, and face. Both the recipes and the videos were converted into vectors, which are mathematical representations. The robot then performed mathematical operations on these vectors to measure how similar a demonstration was to each recipe it knew.
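The vector comparison can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the team's method: the article does not say how the vectors were built or which similarity measure was used, so the ingredient-count encoding, the choice of cosine similarity, and the names `INGREDIENTS`, `to_vector`, `cookbook`, and `best_match` are all hypothetical.

```python
import math

# Hypothetical fixed ingredient order for the vectors (an assumption,
# not taken from the paper).
INGREDIENTS = ["apple", "banana", "broccoli", "carrot", "orange"]


def to_vector(counts):
    """Turn a mapping of ingredient name to quantity into a fixed-length vector."""
    return [float(counts.get(name, 0)) for name in INGREDIENTS]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative two-recipe cookbook; the quantities are made up.
cookbook = {
    "apple_carrot": to_vector({"apple": 2, "carrot": 2}),
    "banana_orange": to_vector({"banana": 1, "orange": 2}),
}


def best_match(demo_counts):
    """Return the name and score of the cookbook recipe closest to the demonstration."""
    demo = to_vector(demo_counts)
    scores = {name: cosine_similarity(demo, vec) for name, vec in cookbook.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Calling `best_match({"apple": 2, "carrot": 2})` would select the hypothetical `apple_carrot` recipe, because its vector points in the same direction as the demonstration's.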

By accurately recognizing the ingredients and observing the actions of the human chef, the robot was able to deduce which recipe was being prepared. For instance, if the robot observed the human demonstrator holding a knife in one hand and a carrot in the other, it could infer that the carrot would be chopped up as part of the recipe.
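The knife-and-carrot inference can be pictured as a simple rule applied to the labels detected in each frame. The sketch below is an assumption for illustration only: the article does not describe the robot's actual inference logic, and the function `infer_chopped`, its label names, and the co-occurrence rule are hypothetical.

```python
def infer_chopped(frames, tool="knife"):
    """Guess which ingredients get chopped.

    frames: list of sets of object labels detected in each video frame.
    Rule (an assumption): an ingredient seen in the same frame as the
    knife is treated as being chopped.
    """
    # Labels that are not ingredients; hypothetical names for this sketch.
    non_ingredients = {tool, "person", "hand", "arm", "face"}
    chopped = []
    for labels in frames:
        if tool not in labels:
            continue  # no knife in view, so no chopping is inferred
        for label in sorted(labels - non_ingredients):
            if label not in chopped:
                chopped.append(label)  # keep order of first appearance
    return chopped


# Example per-frame detections; the values are made up.
frames = [
    {"person", "carrot"},
    {"person", "knife", "carrot"},
    {"person", "knife", "apple"},
    {"apple"},
]
chopped = infer_chopped(frames)
```

A real system would also have to cope with missed detections, which matters here because the robot recognised recipes correctly while detecting only some of the demonstrator's actions.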

Out of the 16 videos it observed, the robot identified the correct recipe 93% of the time, even though it detected only 83% of the human chef's actions. The robot could also tell that slight variations in a recipe, such as a double portion or ordinary human error, were variations and not new recipes. It also correctly recognized a demonstration of a new, ninth salad, added it to its cookbook, and made it.

“It’s amazing how much nuance the robot was able to detect,” said Sochacki. “These recipes aren’t complex – they’re essentially chopped fruits and vegetables, but it was really effective at recognising, for example, that two chopped apples and two chopped carrots is the same recipe as three chopped apples and three chopped carrots.”  
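One way to see why two apples and two carrots can count as the same recipe as three of each is to compare the direction of the recipe vectors and ignore their length. The sketch below assumes cosine similarity and a cut-off of 0.9 for deciding that a demonstration is a new recipe; both are illustrative assumptions, and the names `recognise_or_learn` and `threshold` are hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two ingredient-count dictionaries."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognise_or_learn(cookbook, demo, threshold=0.9):
    """Return the matching recipe name, or add the demo as a new recipe.

    threshold is a hypothetical cut-off: demonstrations scoring below it
    against every known recipe are stored as new recipes.
    """
    best_name, best_score = None, 0.0
    for name, recipe in cookbook.items():
        score = cosine_similarity(demo, recipe)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return best_name  # a variation of a known recipe
    new_name = f"recipe_{len(cookbook) + 1}"
    cookbook[new_name] = dict(demo)  # learn the new recipe
    return new_name


cookbook = {"recipe_1": {"apple": 2, "carrot": 2}}
larger_portion = recognise_or_learn(cookbook, {"apple": 3, "carrot": 3})
small_mistake = recognise_or_learn(cookbook, {"apple": 2, "carrot": 3})
different_salad = recognise_or_learn(cookbook, {"banana": 1, "orange": 2})
```

In this sketch the larger portion and the small mistake both map back to `recipe_1`, while the banana and orange salad shares no ingredients with it and is stored as a new entry.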

The videos used to train the robot chef differed from the food videos commonly found on social media platforms, which often include rapid cuts, visual effects, and frequent transitions between the person preparing the food and the dish itself. The training videos were simpler and had to keep each ingredient in clear view. For instance, the robot struggled to identify a carrot if the human demonstrator's hand was wrapped around it; the demonstrator had to hold the carrot up so that the whole vegetable was visible.

“Our robot isn’t interested in the sorts of food videos that go viral on social media – they’re simply too hard to follow,” said Sochacki. “But as these robot chefs get better and faster at identifying ingredients in food videos, they might be able to use sites like YouTube to learn a whole range of recipes.”

The research was supported in part by Beko plc and the Engineering and Physical Sciences Research Council (EPSRC), part of UK Research and Innovation (UKRI).

Journal Link: IEEE Access