AI/ML News

About artificial intelligence and machine learning

Newfound AI creativity: Key facts about foundation models and how they help robots tell good jokes

Have you ever seen a photo of an avocado-shaped teapot or read a clever article that deviates slightly from the topic? If so, then you may have discovered the latest trend in artificial intelligence (AI).

Machine learning systems such as DALL-E, GPT, and PaLM are making a name for themselves as innovative tools capable of creative tasks.

These systems are known as "foundation models", and they are not all hype and party tricks. So how does this new approach to AI work? Does it mean the end of human creativity and the start of a deep-fake nightmare?

1. What is a foundation model?

Foundation models work by training a single, very large model on a broad pool of general data, then adapting it to new tasks. Previous models tended to start from scratch for each new task. Teaching a model to match photographs (such as a snapshot of a pet cat) with captions ("Mr. Fuzzyboots the tabby cat is relaxing in the sun") requires scanning hundreds of millions of examples.

After it’s trained, this model is able to tell what cats (and other things) look like in pictures. The model can also be used for several other useful AI tasks, such as creating new images from a caption alone ("Show me a koala dunking a basketball") or editing images based on written instructions ("Make it look like this monkey is paying taxes").

2. How does it work?

Foundation models are based on "deep neural networks," which are loosely inspired by how the brain works. This involves sophisticated mathematics and a considerable amount of computing power, but it boils down to a complicated form of pattern matching.

For example, a deep neural network can associate the word "cat" with patterns of pixels that often appear in images of cats, such as soft, fuzzy, hairy blobs of texture. The more examples the model sees, and the larger the model (the more "layers" or "depth" it has), the more complicated these patterns and correlations can be.
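The idea of stacked layers doing pattern matching can be sketched with a toy example. Below is a hypothetical three-layer network with random weights, for illustration only; it is not any real foundation model, whose billions of parameters are learned from data rather than drawn at random:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Nonlinearity: stacking it between layers is what lets
    # the network represent increasingly complicated patterns
    return np.maximum(0, x)

# A toy "deep" network: three layers of weights.
layers = [rng.normal(size=(16, 32)),   # layer 1: 16 pixel inputs -> 32 features
          rng.normal(size=(32, 32)),   # layer 2: patterns of patterns
          rng.normal(size=(32, 1))]    # layer 3: a single "cat-likeness" score

def forward(pixels):
    h = pixels
    for w in layers[:-1]:
        h = relu(h @ w)               # each layer matches patterns in the previous one
    return float(h @ layers[-1])      # final score for the input

fake_image = rng.normal(size=16)      # stand-in for real pixel values
score = forward(fake_image)
print(score)
```

Training would adjust the weights so the score is high for genuine cat photos; adding more layers (depth) and more weights (parameters) is what separates a toy like this from a 540-billion-parameter system.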

In a way, foundation models are just an extension of the "deep learning" models that have dominated AI research for the past decade. However, they do have unprogrammed or "emergent" behaviors that can be both surprising and novel.

For example, Google's PaLM language system appears to be able to provide explanations for difficult metaphors and jokes. This goes beyond simply imitating the types of information it was originally designed to process.

3. For now, access is limited

The sheer scale of these AI systems is hard to comprehend. PaLM has 540 billion parameters, meaning that even if every person on the planet memorized 50 numbers, we collectively still wouldn't have enough storage to reproduce the model.
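The back-of-envelope claim checks out: roughly 8 billion people each memorizing 50 numbers gives about 400 billion numbers, still short of PaLM's 540 billion parameters.

```python
population = 8_000_000_000        # roughly the world population
numbers_each = 50
memorized = population * numbers_each
palm_parameters = 540_000_000_000

print(memorized)                   # 400000000000
print(memorized < palm_parameters) # True: humanity falls short by 140 billion
```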

The models are so large that training them requires enormous computational and other resources. One estimate put the cost of training OpenAI's GPT-3 language model at around US$5 million.

As a result, only major tech firms such as OpenAI, Google and Baidu can afford to build foundation models at the moment. These companies put a limit on who can use the services, which makes economic sense. Usage limits may give us some hope that these systems will not be used for nefarious purposes (such as creating fake news or defamatory material) any time soon. However, independent researchers are also unable to interrogate these models and report their findings in a transparent and accountable manner. So we don't yet know the full implications of their use.

4. What will these models bring to 'creative' industries?

In the near future, more foundation models will be produced. Smaller models are already being released in open-source versions. Software firms are beginning to experiment with licensing and commercializing these services, while AI researchers are working hard to make the software more effective and accessible.

The remarkable creativity demonstrated by PaLM and DALL-E 2 indicates that creative professions could be affected by this technology sooner than expected.

It was long assumed that robots would take over "blue collar" jobs first, while professions that require creativity and education, so-called "white collar" jobs, would be relatively safe from automation.

However, deep learning AI models already excel at tasks such as analyzing X-rays and detecting the eye condition macular degeneration. Foundation models may soon offer cheap, "good enough" creativity in fields such as advertising, copywriting, stock illustration, and graphic design.

The future of creative jobs may be a little different than we expected.

5. What does it mean for legal facts, news, and media?

Since we will no longer be able to assume that creative content is the work of a human, foundation models will eventually influence legislation in areas such as intellectual property and evidence.

We'll also have to deal with the disinformation and misinformation these systems generate. We already face serious disinformation problems, as seen in the unfolding Russian invasion of Ukraine and the nascent issue of deep-fake images and video. Foundation models are poised to intensify these challenges.

It's time to plan!

As researchers who investigate the effects of AI on society, we believe foundation models will cause major transformations. They are tightly controlled (for now), so we may have a little time to consider their implications before they become a big issue. The genie isn't quite out of the bottle yet, but foundation models are a large bottle, and inside there is a very clever genie.