TONELENS
“Google Translate tells you the words. ToneLens tells you the truth.”
Solo Developer
Submitted — Gemini Live Agent Challenge 2026
Overview
ToneLens is a real-time emotional intelligence agent that watches conversations through your camera, listens through your microphone, and streams back translation, emotional subtext, cultural context, and tactical suggestions — all simultaneously via the Gemini Live API.
The core insight behind ToneLens is that words carry only 30% of meaning. The remaining 70% lives in tone, hesitation, confidence, and cultural context. Existing tools like Google Translate decode language. ToneLens decodes intent.
Built in 7 days for the Gemini Live Agent Challenge 2026, ToneLens runs four distinct agent modes — Travel, Meeting, Present, and Negotiate — each with a specialized system prompt, real-time emotional scoring, and autonomous agent actions triggered by keyword detection across the conversation stream.
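The four modes share one runtime but swap system prompts. A minimal sketch of that wiring, assuming a dict keyed by mode name — the mode names come from the project, but the prompt wording and the `system_prompt` helper are illustrative assumptions, not ToneLens's actual prompts:

```python
# Hypothetical mode-to-prompt mapping; prompt text is illustrative only.
MODE_PROMPTS = {
    "travel": (
        "You are a real-time travel interpreter. Translate, then surface "
        "cultural context and polite phrasings the speaker should know."
    ),
    "meeting": (
        "You are a meeting copilot. Track commitments, hesitation, and "
        "unspoken disagreement across participants."
    ),
    "present": (
        "You are a presentation coach. Score audience engagement and "
        "suggest pacing or tone adjustments."
    ),
    "negotiate": (
        "You are a negotiation advisor. Flag anchoring, pressure tactics, "
        "and moments of genuine flexibility."
    ),
}

def system_prompt(mode: str) -> str:
    """Return the specialized prompt for a mode, failing fast on typos."""
    try:
        return MODE_PROMPTS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
```

Keeping the prompts in one table makes it cheap to hot-swap modes mid-session without rebuilding the Live connection.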
Screenshots
Key Features
Tech Deep Dive
ToneLens uses a two-model pipeline to work around a critical constraint: the Gemini Live native audio model does not support function declarations or text modality simultaneously.

Audio streams bidirectionally via WebSockets to a FastAPI backend on Cloud Run. The Gemini Live session handles real-time multimodal understanding and responds in audio only. A second Vertex AI call to gemini-2.0-flash then reformats the transcribed response into a strict four-line structured output: TRANSLATION, EMOTION, SUBTEXT, and SUGGEST.

Agent actions are triggered via keyword detection in Python rather than function calling, which keeps the Live session config clean. Session state and conversation history are persisted in Firestore, and the entire pipeline runs end-to-end in under two seconds.
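The two Python-side pieces described above — keyword-triggered agent actions and parsing the strict four-line output — can be sketched as follows. This is a minimal illustration under stated assumptions: the keyword-to-action table and function names are hypothetical, not ToneLens's actual code; only the four field names (TRANSLATION, EMOTION, SUBTEXT, SUGGEST) come from the project.

```python
# Hypothetical keyword -> agent-action table; entries are illustrative.
ACTION_KEYWORDS = {
    "reservation": "create_calendar_hold",
    "allergy": "flag_dietary_alert",
}

def detect_actions(transcript: str) -> list[str]:
    """Scan a transcript for trigger keywords instead of using
    Live-session function calling, which the native audio model
    does not support alongside text modality."""
    lowered = transcript.lower()
    return [action for kw, action in ACTION_KEYWORDS.items() if kw in lowered]

# The four field labels are from the project's structured-output format.
FIELDS = ("TRANSLATION", "EMOTION", "SUBTEXT", "SUGGEST")

def parse_structured(reply: str) -> dict[str, str]:
    """Parse the second-pass model's four-line reply into a dict,
    raising if any expected field is missing."""
    out: dict[str, str] = {}
    for line in reply.strip().splitlines():
        key, _, value = line.partition(":")
        key = key.strip().upper()
        if key in FIELDS:
            out[key] = value.strip()
    missing = [f for f in FIELDS if f not in out]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return out
```

Validating the second model's output before streaming it to the client means a malformed reply fails loudly server-side rather than rendering a broken overlay.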