AI Essentials: What are transformers?

This blog continues our series called “AI Essentials,” which aims to bridge the knowledge gap surrounding AI-related topics. Originally designed for natural language processing (NLP), transformers have revolutionized AI due to their ability to handle large-scale tasks efficiently. What does this mean for startups and public policy?

Humans aren’t very good multitaskers, but the same isn’t true for AI. Recent innovations have enabled AI models to quickly and efficiently analyze massive amounts of data in parallel, meaning they process different pieces of an input at the same time rather than step by step, as older models did. These AI models are called transformers.

Transformers are far faster than traditional models, and their design lets AI systems understand the relationships between different pieces of data, whether that’s words in a sentence, pixels in an image, or even chunks of code. Originally built for tasks like language translation and text generation, transformers have since expanded their reach to other fields like computer vision and even code generation.

Most leading AI models today are transformers — it’s the “T” in ChatGPT, for example — and startups are leveraging transformers to deploy AI solutions that scale, work faster, and use fewer resources — key advantages when budgets and timeframes are tight.

Transformers’ efficiency lies in their design. They use a self-attention mechanism, which weighs how strongly each piece of the input relates to every other piece; positional encoding, which keeps track of the order of the data; and an encoder-decoder structure, in which one part of the model processes the input (the encoder) and another generates the output (the decoder).
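
For readers who want to peek under the hood, here is a minimal sketch of the scaled dot-product self-attention at the heart of that design, written in Python with NumPy. The six-token input and the random projection matrices are illustrative stand-ins; in a real transformer those matrices are learned during training.

```python
import numpy as np

def softmax(x):
    """Turn raw scores into weights that sum to 1 along the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X is (seq_len, d_model): one embedding per token. Wq, Wk, and Wv
    project the embeddings into queries, keys, and values.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token relates to every other token
    weights = softmax(scores)                # attention weights, one row per token
    return weights @ V                       # each output row blends information from the whole sequence

# Toy example: 6 tokens ("Humpty Dumpty sat on a wall"), embedding size 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                  # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 8): one context-aware vector per token
```

Notice that every row of the attention matrix is computed at once; that single matrix multiplication over the whole sequence is the parallelism that makes transformers faster than step-by-step models.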

If we were translating “Humpty Dumpty sat on a wall” into Spanish, each of those elements would play a role. As the sentence was encoded, self-attention would capture the relationship between “sat” and “wall.” Even though the entire phrase would be processed at the same time (that is, in parallel), positional encoding would keep track of the order of the words. Finally, the decoder would generate the output: “Humpty Dumpty se sentó en un muro.”
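
To make the order-keeping step concrete, here is a small sketch of the sinusoidal positional encoding introduced in the original transformer paper, “Attention Is All You Need.” The sequence length and embedding size are toy values chosen to match the example above.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: a unique signature for each position.

    Even-numbered embedding dimensions use sine and odd-numbered use cosine,
    at wavelengths that vary across the embedding, so no two positions match.
    """
    pos = np.arange(seq_len)[:, None]    # positions 0 .. seq_len-1
    i = np.arange(d_model)[None, :]      # embedding dimensions
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

# One row per word of "Humpty Dumpty sat on a wall", embedding size 8.
pe = positional_encoding(6, 8)
# Adding these rows to the token embeddings lets the model process all six
# words in parallel while still knowing which word came first, second, and so on.
print(pe.shape)  # (6, 8)
```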

Transformers are adaptable to a wide range of tasks, and a similar process would unfold if we were processing an image of Humpty Dumpty, except with patches of the image instead of words. The ability to use one model for tasks across multiple modes (like text, images, or other media) can reduce development time and cost compared to building a separate model for each task.

For startups, these cost savings can be compounded by the fact that some of these models are open source, letting them leverage cutting-edge AI without the high costs typically associated with building models from scratch. However, one of the biggest hurdles for startups is accessing the high-powered compute resources needed to train and run these models. Without sufficient compute, startups may struggle to train models at the required scale or to compete with larger firms. Proposals like the National AI Research Resource aim to provide startups and researchers with high-performance compute resources, helping ensure a more level playing field.

As transformers evolve and play a larger role in generative AI (the creation of text, images, and other media), they are also prompting new policy discussions around intellectual property, data privacy, and fraud prevention. For startups working with AI, it is crucial that policy and regulatory frameworks work to enable innovation.