Meta AI presented a series of language models – LLaMA
Meta AI launched LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. According to the developers, LLaMA can compete with, or even outperform, the best existing models such as GPT-3, Chinchilla, and PaLM.
Large Language Models (LLMs) trained on massive amounts of data have shown their ability to perform a variety of tasks, from fundamental ones such as text summarization, drafting textual instructions, and writing poetry, to more complex ones such as creating descriptions for AI-generated art.
As the training dataset for LLaMA, the developers used a mixture of several sources covering a diverse set of domains: English CommonCrawl, C4, GitHub, Wikipedia, Books, ArXiv, and Stack Exchange. Unlike Chinchilla, PaLM, or GPT-3, LLaMA relies only on publicly available data, which makes the work compatible with open-sourcing, whereas most existing models rely on data that is either not publicly available or undocumented.
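For a rough sense of what such a mixture looks like in practice, here is a minimal Python sketch of weighted sampling across the listed sources. The proportions are only indicative (roughly those reported in the paper), the source names are placeholders, and this is not Meta's actual data pipeline.

```python
import random

# Illustrative sampling proportions, roughly as reported in the LLaMA paper;
# treat these as an assumption for the sketch, not an exact reproduction.
SOURCE_WEIGHTS = {
    "common_crawl": 0.670,
    "c4": 0.150,
    "github": 0.045,
    "wikipedia": 0.045,
    "books": 0.045,
    "arxiv": 0.025,
    "stack_exchange": 0.020,
}

def sample_source(rng: random.Random) -> str:
    """Pick which corpus the next training document is drawn from."""
    sources, weights = zip(*SOURCE_WEIGHTS.items())
    return rng.choices(sources, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in SOURCE_WEIGHTS}
    for _ in range(10_000):
        counts[sample_source(rng)] += 1
    print(counts)  # counts come out roughly proportional to SOURCE_WEIGHTS
```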
To improve training speed, the LLaMA models use an efficient implementation of the causal multi-head attention operator, which reduces memory usage and computation. To improve training efficiency further, the developers also rely on checkpointing to reduce the number of activations recomputed during the backward pass.
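The PyTorch sketch below illustrates both ideas in simplified form: a memory-efficient causal attention call that avoids materializing the full attention matrix, and activation checkpointing applied to a transformer block. It uses stock PyTorch operators (F.scaled_dot_product_attention and torch.utils.checkpoint) rather than Meta's custom implementation, and note that standard checkpointing trades memory for extra recomputation, whereas the paper describes a more selective scheme that saves expensive activations so that less is recomputed; the code only shows the general mechanism.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def causal_attention(q, k, v):
    # scaled_dot_product_attention dispatches to a fused / memory-efficient
    # kernel when available; is_causal=True applies the causal mask without
    # explicitly building a (seq x seq) mask tensor.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

class Block(torch.nn.Module):
    """A bare-bones self-attention block (illustrative, not LLaMA's architecture)."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.qkv = torch.nn.Linear(dim, 3 * dim, bias=False)
        self.proj = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq, head_dim) for the attention kernel
        q, k, v = (z.view(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
                   for z in (q, k, v))
        out = causal_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, t, d)
        return x + self.proj(out)

x = torch.randn(2, 128, 512, requires_grad=True)
block = Block(dim=512, n_heads=8)
# Activation checkpointing: the block's intermediate activations are not stored
# for the backward pass; they are recomputed when gradients are needed.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```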
Contrary to previous studies, Meta's research on LLaMA demonstrates that state-of-the-art performance can be achieved by training solely on publicly available data, without resorting to proprietary datasets. The developers hope that releasing these models to the research community will accelerate the development of large language models, help improve their reliability, and mitigate known problems such as toxicity and bias.
Read more details about the research in the paper.