Why is Gemma the most advanced open model?

Over the past few years, open large language models (LLMs) have attracted a great deal of attention because they can understand and generate natural language and perform complex tasks such as reading comprehension, summarization, translation, dialogue, and math. Well-known models such as GPT-3, and open models such as Llama 2 and Mistral, have billions to hundreds of billions of parameters. They show impressive capabilities, but they also bring challenges: high computational cost, potential safety risks, and difficulty of access and use.

To address these problems, Google launched a new generation of open models, Gemma, built from the same research and technology used to create the Gemini models. Gemini is Google's largest and most capable family of AI models. Gemma is inspired by and derived from Gemini, and its name comes from the Latin word gemma, meaning "gem." Beyond the model weights, Gemma ships with tools that support developer innovation, foster collaboration, and guide responsible use of the models.

So, what are the advantages of the Gemma models? Why are they the most advanced open models? Let's find out.

First, Gemma delivers the best performance in its size range. Because the models share technical and infrastructure components with Gemini, they outperform Meta's Llama 2 and other larger open models on most of 18 key benchmarks covering language understanding, reasoning, and math. Moreover, the Gemma models can run directly on a developer's laptop or desktop, without additional hardware or cloud services. Notably, Gemma surpasses significantly larger models on some key benchmarks while adhering to Google's strict standards for safety and responsibility. For more details on performance, dataset composition, and modeling methods, see the technical report [1].

Second, the Gemma models are designed around Google's AI Principles. To make the pretrained models safe and reliable, Google used automated techniques to filter certain personal information and other sensitive data out of the training sets. Google also applied extensive fine-tuning and reinforcement learning from human feedback (RLHF) to align the instruction-tuned models with responsible behaviors. To understand and reduce the models' risk profile, Google conducted robust evaluations, including manual red-teaming, automated adversarial testing, and assessments of model capabilities for dangerous activities. These evaluations are outlined in the model card [2].

Finally, the Gemma models are easy to use. Google provides toolchains for inference and supervised fine-tuning (SFT) across all major frameworks, including JAX, PyTorch, and TensorFlow (through native Keras 3.0). Google also offers ready-to-use Colab and Kaggle notebooks, as well as integrations with popular tools such as Hugging Face, MaxText, NVIDIA NeMo, and TensorRT-LLM, so developers can start working with Gemma quickly. Alongside the models, Google released a new Responsible Generative AI Toolkit to help developers and researchers prioritize building safe and responsible AI applications; it includes a methodology for building safety classifiers, a model debugging tool, and guidance on best practices for responsible model building.
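As a quick illustration of the Hugging Face integration mentioned above, the sketch below loads an instruction-tuned Gemma checkpoint and generates a short reply. This is a minimal example under stated assumptions, not the official quickstart: the model id google/gemma-7b-it, the bfloat16 precision, and the 64-token generation limit are illustrative choices, and the checkpoint is gated, so you first need to accept the license and authenticate with the Hugging Face Hub.

```python
# Minimal sketch: run an instruction-tuned Gemma model with Hugging Face Transformers.
# Assumptions: the gated checkpoint "google/gemma-7b-it" is accessible after
# accepting the license, and torch + transformers (+ accelerate for device_map) are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"  # instruction-tuned 7B variant (assumed id)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps memory usage modest
    device_map="auto",           # place weights on GPU if available, else CPU
)

prompt = "Explain in one sentence what an open model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same checkpoint can also be loaded through Keras 3.0 or fine-tuned with the SFT toolchains mentioned above; the Transformers route is shown here only because it requires the least setup.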

As of now, Gemma can fairly be called the most advanced open model. It gives developers a powerful, flexible, and reliable AI tool that helps them build AI applications responsibly, solve a wide range of complex problems, and create a better future.

If you have any questions, please feel free to leave a comment below.

This article was published on 2024-02-22 and last updated on 2024-09-23.

This article is copyrighted by torchtree.com and unauthorized reproduction is prohibited.