Artificial intelligence (AI) models have already made their way into a wide range of real-world settings, helping humans to tackle everyday problems faster and more efficiently. Recently, computer scientists have also been exploring their potential for assisting humans with creative tasks, such as coming up with interesting designs, poems and even recipes.
Two researchers at PeopleTec, a technology company based in Alabama, have developed a computational model that analyzes images of the inside of a user's fridge and then creates complex recipes from the ingredients it finds there. Their approach, presented in a paper posted as a preprint on arXiv, combines models that can identify objects in images with GPT-4, the well-known large language model (LLM) developed by OpenAI.
“In 2020, a group of our AI researchers had a party game where one team would propose a list of ingredients and another would use a language model to dream up an original recipe,” David Noever, one of the researchers who carried out the study, told Tech Xplore.
“As the ingredients got more interesting, the recipes got worse and worse, finally degenerating into nonsense like ‘A Recipe for Hungarian Shoe Leather, serving 2 for breakfast’—basically, complete gibberish. But since 2020, both the image and language models have gotten so good, that now we could really build an application that would solve the practical cook’s challenge—just look in your fridge right now, take some basic preconceived ideas about what one might want to cook today and generate a great novel recipe.”
A key objective of the recent work by Noever and his colleague Samantha Elizabeth Miller Noever was to showcase recent advances in AI in a practical, useful way. To generate recipes from images, they used the application programming interfaces (APIs) of image-analysis models and of the text generator underpinning ChatGPT.
“The basic idea behind our work was to combine raw food and recipe ingredients using image analysis, then to ask a powerful language model to construct a plausible cooking recipe, including the expected title, proportion, and steps,” Noever explained.
“One of the interesting twists of this language-image approach is to constrain the recipe style generator in different and often complex ways based on, say, minimizing the cost of the meal, changing the portion sizes, or accommodating dietary restrictions. The hard parts of that really fall on how good the language model is, which of course, has made tremendous breakthroughs in only the last few months.”
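As a rough illustration of the pipeline Noever describes, the sketch below assumes the image-analysis stage has already returned a list of ingredient labels, and shows how those labels and a user's constraints (servings, cost, dietary restrictions) might be folded into a single text prompt for a language model. The function name, constraint fields and prompt wording are hypothetical, not taken from the paper, and the actual call to the language model is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecipeConstraints:
    """Hypothetical user constraints; field names are illustrative only."""
    servings: int = 2
    max_cost_usd: Optional[float] = None
    dietary: List[str] = field(default_factory=list)


def build_recipe_prompt(ingredients: List[str], c: RecipeConstraints) -> str:
    """Fold detected ingredient labels and user constraints into one prompt."""
    # An image model may report the same item more than once; normalize case,
    # drop blanks and de-duplicate while keeping first-seen order.
    unique = list(dict.fromkeys(i.strip().lower() for i in ingredients if i.strip()))
    lines = [
        "Write an original cooking recipe.",
        "Include a title, ingredient proportions and numbered steps.",
        f"Available ingredients: {', '.join(unique)}.",
        f"Serves {c.servings}.",
    ]
    # Optional constraints are appended only when the user sets them.
    if c.max_cost_usd is not None:
        lines.append(f"Keep the total cost under ${c.max_cost_usd:.2f}.")
    if c.dietary:
        lines.append(f"The recipe must be: {', '.join(c.dietary)}.")
    return "\n".join(lines)


prompt = build_recipe_prompt(
    ["Eggs", "spinach", "eggs", "cheddar cheese"],
    RecipeConstraints(servings=4, max_cost_usd=10, dietary=["vegetarian"]),
)
print(prompt)
```

Keeping the constraints as separate, optional fields means a new restriction (seasonal availability, say) can be added without touching the rest of the prompt, which mirrors the "constrain the generator in different ways" idea in the quote.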
The researchers evaluated their approach in a series of tests, feeding it more than 2,000 images of open refrigerators containing different ingredients. From these images, their model generated a 100-page recipe book of unique recipes featuring the top 30 ingredients pictured in the input images.
“Several researchers had helped us by building pictures of the top 30 items to find in a refrigerator,” Noever said. “We trained our own image models to give back a list of those in different settings. The novel part of our work was to continuously make the recipe fit to different constraints. Of course, there are billions of combinations available for even a small list of ingredients, but putting seasonal availability, leftovers, serving sizes, cost and dietary restrictions really got the problem and solution rolling.”
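Noever's point about the size of the search space can be made concrete with a back-of-the-envelope count. If any non-empty subset of the top 30 items could serve as a recipe's ingredient list, there are 2^30 − 1 such subsets, just over a billion, before proportions or cooking steps are even considered. The sketch below computes that figure, the number of lists with exactly five ingredients, and the effect of one hypothetical dietary constraint; the assumption that 6 of the 30 items are meat is purely illustrative and not a figure from the paper.

```python
from math import comb

N_ITEMS = 30  # the article's "top 30" refrigerator items

# Every non-empty subset of the 30 items is a possible ingredient list.
all_subsets = 2 ** N_ITEMS - 1

# Restricting recipes to exactly k ingredients gives C(30, k) possible lists.
five_ingredient_lists = comb(N_ITEMS, 5)

# Hypothetical constraint: if 6 of the 30 items were meat and the user asks
# for a vegetarian dish, only non-empty subsets of the other 24 items qualify.
N_MEAT = 6  # illustrative assumption, not taken from the paper
vegetarian_subsets = 2 ** (N_ITEMS - N_MEAT) - 1

print(f"{all_subsets:,}")            # 1,073,741,823
print(f"{five_ingredient_lists:,}")  # 142,506
print(f"{vegetarian_subsets:,}")     # 16,777,215
```

Each constraint of this kind shrinks the space multiplicatively: excluding six items here cuts the number of candidate ingredient lists by a factor of 2^6 = 64.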
In the future, the computational approach introduced by Noever and Noever could be integrated into a smartphone application or other software tools designed to inspire both amateur and expert cooks. The work could also encourage other teams worldwide to apply LLMs and other AI models to recipe generation or other creative problems.
“In our next studies, we plan to develop a convenient mobile application that takes a picture and inventories a real refrigerator in its likely cluttered state, without the constraint of using all ingredients but mixing and matching,” Noever added. “This would be a great refinement of this initial work.”
More information:
David Noever et al., The Multimodal And Modular Ai Chef: Complex Recipe Generation From Imagery, arXiv (2023). DOI: 10.48550/arxiv.2304.02016