Real-Time Voice Translation

June 19, 2026 By Rowena Cletus

Twenty years after launching one of its earliest machine learning experiments in translation, Google has unveiled Gemini 3.5 Live Translate, its latest audio model designed for live speech-to-speech translation.

Translation has evolved significantly since Google’s initial efforts, with more than a trillion words now being translated every month across its products for billions of users worldwide. The introduction of Gemini 3.5 Live Translate represents the company’s next step in breaking down language barriers through artificial intelligence.

The new model automatically detects more than 70 languages and generates smooth, natural-sounding translated speech while preserving the speaker’s intonation, pacing and pitch. Unlike conventional turn-by-turn translation systems that wait for a speaker to finish before translating, Gemini 3.5 Live Translate delivers continuous speech translation in real time.

The model balances two competing needs: waiting for additional context, which improves translation accuracy, and staying synchronized with the speaker. This approach enables fluid conversations with minimal delay and eliminates many of the awkward pauses commonly associated with live translation systems.
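To make the trade-off between context and delay concrete, here is a minimal sketch of a fixed-lookahead ("wait-k" style) schedule. It is an illustration only: Google has not said how Gemini 3.5 Live Translate decides when to speak, and the function names, the `lookahead` parameter, and the one-output-word-per-source-word simplification are all assumptions made for this example.

```python
def emit_schedule(num_source_words: int, lookahead: int) -> list[int]:
    """Return, for each output word, how many source words were heard before it is emitted.

    Hypothetical policy: output word i (1-based) is emitted once the translator
    has heard i + lookahead source words, or the whole utterance if it is shorter.
    """
    return [
        min(i + lookahead, num_source_words)
        for i in range(1, num_source_words + 1)
    ]


def average_lag(num_source_words: int, lookahead: int) -> float:
    """Average number of words the translation trails behind the speaker."""
    schedule = emit_schedule(num_source_words, lookahead)
    lags = [heard - i for i, heard in enumerate(schedule, start=1)]
    return sum(lags) / len(lags)


if __name__ == "__main__":
    utterance_length = 10
    # lookahead = utterance_length - 1 models a turn-by-turn system that waits
    # for the speaker to finish; smaller values model streaming translation.
    for lookahead in (0, 2, utterance_length - 1):
        lag = average_lag(utterance_length, lookahead)
        print(f"lookahead={lookahead}: average lag {lag} words")
```

A larger lookahead gives the translator more context before it commits to each word, at the cost of trailing further behind the speaker. A turn-by-turn system sits at the far end of that range, since it waits for the whole utterance.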

Google is rolling out Gemini 3.5 Live Translate across multiple platforms starting today:

  • Available for developers through public preview via the Gemini Live API and Google AI Studio
  • Available for enterprises through private preview in Google Meet beginning this month
  • Available for consumers through Google Translate on Android and iOS devices

Designed for Real-Time Communication

Gemini 3.5 Live Translate processes speech as it is streamed, allowing conversations to flow more naturally across different languages. The system supports multilingual inputs without requiring users to manually configure language settings.

The model also features strong noise robustness, enabling reliable performance in loud and unpredictable environments. These capabilities make it suitable for various applications, including multilingual meetings, live interpretation, educational sessions, broadcasts, and customer interactions.

Through the Gemini Live API, developers can integrate real-time voice translation into their own applications. Several developer platforms, including Agora, Fishjam, LiveKit, Pipecat and Vision Agents, already support the technology, allowing developers to focus on user experience while relying on existing real-time media infrastructure.

Grab Testing Real-Time Translation

Among the early adopters exploring the technology is Grab, which is testing Gemini 3.5 Live Translate to enable near real-time multilingual communication between drivers and passengers during pickups.

The feature could prove particularly valuable in regions where multiple languages are commonly spoken, helping improve communication and reduce misunderstandings. According to Google, Grab users currently make more than 10 million voice calls each month through the platform.

With Gemini 3.5 Live Translate, Google aims to make conversations across languages feel more natural and immediate, bringing users closer to seamless global communication powered by AI.