With the introduction of EmbeddingGemma, Google is offering a multilingual text embedding model designed to run directly on phones, laptops, and other edge devices for mobile-first generative AI.
Unveiled September 4, EmbeddingGemma is a 308-million-parameter model that lets developers build applications using techniques such as retrieval-augmented generation (RAG) and semantic search that run directly on the targeted hardware, Google explained. Based on the lightweight Gemma 3 model architecture, EmbeddingGemma is trained on more than 100 languages and is small enough to run in less than 200MB of RAM with quantization. It offers customizable output dimensions, ranging from 768 down to 128 via Matryoshka representation, along with a 2K token context window.
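As a rough illustration of how the Matryoshka-style customizable output might look in practice, the following sketch loads the model through sentence-transformers (one of the tools Google lists as supported) and truncates embeddings from 768 to 128 dimensions. The model ID google/embeddinggemma-300m and the truncate_dim option are assumptions, not details from this article; check the official documentation for the exact usage.

# Minimal sketch, assuming the Hugging Face model ID "google/embeddinggemma-300m"
# and a sentence-transformers version that supports truncate_dim (v2.7+).
from sentence_transformers import SentenceTransformer

# Load the model with output truncated from 768 to 128 dimensions via
# Matryoshka representation; smaller vectors trade some accuracy for
# lower memory and storage cost on edge devices.
model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=128)

sentences = [
    "EmbeddingGemma runs on-device.",
    "The model supports more than 100 languages.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 128)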
EmbeddingGemma empowers developers to build flexible, privacy-centric, on-device applications, according to Google. Model weights for EmbeddingGemma can be downloaded from Hugging Face, Kaggle, and Vertex AI. Working together with the Gemma 3n model, EmbeddingGemma can unlock new use cases for mobile RAG pipelines, semantic search, and more, Google said. EmbeddingGemma works with tools such as sentence-transformers, llama.cpp, MLX, Ollama, LiteRT, transformers.js, LMStudio, Weaviate, Cloudflare, LlamaIndex, and LangChain. Documentation for EmbeddingGemma can be found at ai.google.dev.
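To make the semantic search use case concrete, here is a minimal on-device retrieval sketch under the same assumptions as above (hypothetical model ID, sentence-transformers API); a real RAG pipeline would feed the top-scoring passage to a generative model such as Gemma 3n.

# Minimal semantic search sketch, assuming the hypothetical model ID above.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

docs = [
    "EmbeddingGemma is built on the Gemma 3 architecture.",
    "Model weights are available on Hugging Face, Kaggle, and Vertex AI.",
    "The model runs in under 200MB of RAM with quantization.",
]
doc_embeddings = model.encode(docs)

# Embed the query and rank documents by cosine similarity.
query_embedding = model.encode("How much memory does the model need?")
scores = util.cos_sim(query_embedding, doc_embeddings)  # shape (1, len(docs))
best = scores.argmax().item()
print(docs[best])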