On November 15, 2022, Meta unveiled a new large language model called Galactica.
Galactica is pitched as a first step toward a scientific assistant, a tool that could one day augment scientific work. The model is meant to help researchers who are buried under a growing mass of papers and increasingly unable to separate the meaningful from the inconsequential. It performs scientific NLP tasks at a high level, along with specialized tasks such as citation prediction, mathematical reasoning, molecular property prediction, and protein annotation. Meta promoted the model as a shortcut for researchers and students; in the company’s words, Galactica “can summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.”
The Galactica language model is an illustration of NLP applied to science: it is trained on 48 million examples drawn from scientific articles, websites, textbooks, lecture notes, and encyclopedias. This training data, a curated high-quality scientific dataset called NatureBook, makes the model capable of working with scientific terminology, mathematical notation, and chemical formulas, as well as source code.
Meta AI introduced the model with the release of a paper and a GitHub repository. With a few adjustments to address its current limitations, Galactica is expected to revolutionize scientific fields such as machine learning, mathematics, computer science, biology, physics, and chemistry. Galactica demonstrates the potential of language models as a new interface for science and has been open-sourced.
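Because the weights are open-sourced, Galactica checkpoints can be run locally. Below is a minimal sketch using the Hugging Face `transformers` library and the `facebook/galactica-125m` checkpoint hosted on the Hugging Face Hub; the checkpoint choice, prompt format, and generation settings are illustrative assumptions, not part of Meta's announcement.

```python
def build_prompt(question: str) -> str:
    """Wrap a free-form question in a simple question-answering prompt.

    The exact prompt format is an assumption for illustration; see
    Meta's paper and repository for the formats used in training.
    """
    return f"Question: {question}\n\nAnswer:"


def generate_answer(question: str,
                    checkpoint: str = "facebook/galactica-125m") -> str:
    """Load a Galactica checkpoint and generate a short continuation.

    Imports are done inside the function because loading the model
    downloads the weights (hundreds of MB for the smallest checkpoint).
    """
    # Galactica checkpoints use the OPT architecture in transformers.
    from transformers import AutoTokenizer, OPTForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = OPTForCausalLM.from_pretrained(checkpoint)
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    outputs = model.generate(inputs.input_ids, max_new_tokens=60)
    return tokenizer.decode(outputs[0])


# Example usage (downloads weights on first run):
# print(generate_answer("What is the transformer architecture?"))
```

Larger checkpoints (up to 120B parameters) follow the same loading pattern but require substantially more memory.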
Enroll at HURU School to learn more about Artificial Intelligence and NLP Models.