QuData News | All about Voice Technology: voice assistants, speech recognition

June 12, 2026

Gemini 3.5 Live Translate: real-time, real voices

Google has launched Gemini 3.5 Live Translate, an advanced AI model that delivers near-instant, continuous voice translation across more than 70 languages. Unlike traditional tools, it translates speech as it’s spoken, enabling fluid conversations while preserving the speaker’s natural tone, pitch, and pacing.

June 20, 2024

From barks to words: AI decodes dog vocalizations

AI has learned to decode dog barks, distinguishing playful from aggressive barks and identifying the dog’s age, sex, and breed. Models originally trained on human speech have achieved impressive accuracy, offering significant advances in animal care and communication research.

May 23, 2024

A new era of multimodal AI with GPT-4o

During its Spring Update event, OpenAI presented GPT-4o, an omnimodel that integrates text, audio, and image processing, allowing it to work faster and more efficiently than before.

February 22, 2024

BASE TTS: the power of a billion-parameter text-to-speech model

Amazon's latest TTS model with its innovative architecture sets a new benchmark for speech synthesis. BASE TTS not only achieves unparalleled speech naturalness but also demonstrates remarkable adaptability in handling diverse language attributes and nuances.

September 1, 2023

Meta's SeamlessM4T: A Breakthrough in Multilingual Communication

SeamlessM4T breaks down language barriers with its comprehensive translation and transcription capabilities. This AI model can easily convert speech or text, enabling real-time translation and fostering cross-cultural understanding.

June 14, 2023

Generative AI Transforms Virtual Characters

Generative AI is revolutionizing the world of gaming by transforming virtual characters and enhancing their conversational skills. The NVIDIA Avatar Cloud Engine (ACE) for Games empowers developers to infuse intelligence into NPCs, reshaping gaming experiences and pushing the boundaries of what is possible.

January 19, 2023

VALL-E, a neural codec language model, can reproduce a voice from a three-second audio recording

Text-to-speech models usually require significantly longer training samples, while VALL-E creates a much more natural-sounding synthetic voice from just a few seconds of audio.

November 14, 2022

How sound can model the world

MIT researchers have developed a machine-learning technique that precisely collects and models the underlying acoustics of a location from just a limited number of sound recordings.

September 3, 2021

w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-supervised Speech Pre-training

Motivated by the success of masked language modeling (MLM) in pre-training natural language processing models, the developers propose w2v-BERT, which explores MLM for self-supervised speech representation learning.
