EmbeddingGemma on ShareAI: 300M Multilingual Embeddings

EmbeddingGemma is now on ShareAI
We’re announcing that EmbeddingGemma, Google’s compact open embedding model, is now available on ShareAI.
At 300 million parameters, EmbeddingGemma delivers state-of-the-art performance for its size. It’s built from Gemma 3 with T5Gemma initialization and uses the same research and technology behind the Gemini models. The model produces dense vector representations of text, making it well-suited for search and retrieval, as well as classification, clustering, and semantic similarity.
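To make the workflow concrete, here is a minimal sketch of generating embeddings and scoring semantic similarity with the sentence-transformers library. The checkpoint name google/embeddinggemma-300m and the exact output dimension are assumptions based on the public release; check the model card for specifics.

```python
# Minimal sketch using the sentence-transformers library. The checkpoint id
# "google/embeddinggemma-300m" is an assumption based on the public release;
# verify it against the model card before use.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

sentences = [
    "How do I reset my password?",
    "Steps to recover account access",
    "Best hiking trails near Zurich",
]

# Encode text into dense vectors; one row per sentence.
embeddings = model.encode(sentences)
print(embeddings.shape)  # e.g. (3, 768)

# Cosine similarity of the first sentence against the other two;
# the password/recovery pair should score highest.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)
```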
Why it matters
The model’s small size and on-device focus make it practical to deploy in resource-constrained environments such as mobile phones, laptops, and desktops, broadening access to state-of-the-art embeddings.
Training dataset
EmbeddingGemma was trained with data in 100+ spoken languages.
- Web documents: A diverse collection of web text ensures exposure to broad linguistic styles, topics, and vocabulary, with content in 100+ languages.
- Code and technical documents: Including programming languages and specialized scientific content helps the model learn structure and patterns that improve understanding of code and technical questions.
- Synthetic and task-specific data: Curated synthetic data teaches specific skills for information retrieval, classification, and sentiment analysis, fine-tuning performance for common embedding applications.
This combination of diverse sources is crucial for a powerful multilingual embedding model that can handle a wide range of tasks and data formats.
What you can build
Use EmbeddingGemma for search and retrieval, semantic similarity, classification pipelines, and clustering—especially when you need high-quality embeddings that can run on constrained devices.
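As one example, a tiny retrieval pipeline might look like the sketch below, again assuming the sentence-transformers checkpoint from the earlier example: embed a small corpus once, embed the query, and rank by cosine similarity.

```python
# A hedged sketch of a small semantic-search pipeline, assuming the same
# checkpoint as in the earlier example.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

corpus = [
    "EmbeddingGemma is a 300M-parameter open embedding model.",
    "The capital of France is Paris.",
    "Vector databases store embeddings for fast nearest-neighbor search.",
]
# Embed the corpus once; in a real system these vectors would live in an index.
corpus_embeddings = model.encode(corpus)

query = "Which model produces text embeddings?"
query_embedding = model.encode(query)

# Rank corpus entries by cosine similarity to the query, best first.
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
for idx in scores.argsort(descending=True).tolist():
    print(f"{scores[idx]:.3f}  {corpus[idx]}")
```

For production retrieval, check the model card for any recommended task prompts for queries and documents, and store the corpus embeddings in a vector index rather than recomputing them per query.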
Get started
Available now on ShareAI.
Run it. Test it. Ship it.