THE POWER OF TWO

BabbleLabs enhances digitized speech with AI trained on NVIDIA GPUs in Google Cloud

Speech enhancement innovator BabbleLabs has created an AI-powered app that reduces the effect of ambient noise on digitized speech, using deep learning algorithms and neural networks trained using NVIDIA GPUs in Google Cloud.

  • 55% improvement in speech intelligibility
  • Initial software developed in just 6 months
  • Faster turnaround times for end-to-end training runs

Speech recognition technology isn’t perfect – yet. Often, ambient noise can have a major impact on how well human speech is heard and interpreted. But now that’s set to change, with BabbleLabs working to improve the quality and accuracy of speech processing – even in the noisiest of environments.

One of the company’s software products, Clear Cloud, enhances voices and tunes out background noises – such as sirens, crowds, and wind – to eliminate unwanted noise from speech-sensitive audio. Available as enterprise software for conferencing, communication, and contact center server applications, Clear Cloud uses advanced AI to process live audio streams at low latency.

Training Clear Cloud’s algorithms takes hugely complex programming and analysis of thousands of hours of unique noisy speech recordings – something that simply isn’t achievable with general-purpose computing.

Since BabbleLabs was founded in 2017, its team has been working with GPUs, starting out with dual NVIDIA GeForce GTX 1080 Ti GPUs in desk-side servers. As the venture grew and computing demands outpaced on-premises capacity, BabbleLabs looked to NVIDIA GPUs on Google Cloud to continue development.

“GPUs on Google Cloud are a key element in BabbleLabs’ delivery of the world’s best speech enhancement technology.”

Chris Rowen, CEO and Co-Founder at BabbleLabs

Leading accuracy requires leading performance

“Historically, speech developers have made algorithm simplifications in order to fit the necessary processing onto available digital signal processors,” explains Chris Rowen, CEO and co-founder of BabbleLabs. “Higher accuracies require a new class of sophisticated neural network-based speech algorithms.”

For BabbleLabs, that means using a range of NVIDIA GPUs in Google Cloud, stringently assessed and specifically selected for each workload. “We carefully benchmark each NVIDIA GPU to understand the particular strengths of each implementation, and apply different GPUs to different tasks,” says Chris.

BabbleLabs now uses NVIDIA Tesla V100 GPUs to train the neural networks that underpin Clear Cloud. Using the TensorFlow deep learning framework, the team runs optimized programming environments in both Python and C, supported by the cuDNN library. This has allowed the company’s speech experts to tune the software to cover a comprehensive range of languages, accents, and vocabularies.

Of course, it’s not light work. A single full training run of Clear Cloud’s most advanced speech enhancement network requires as many as 10^20 floating-point operations. Even for a high-end general-purpose CPU, that could take more than five years to complete.
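
As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. The operation count (taken here as 10^20) and the five-year CPU estimate come from the text above; the function names and the 365.25-day year are illustrative assumptions, and the result is only the sustained throughput those two figures imply, not a measured benchmark.

```python
# Back-of-the-envelope check of the training-cost figures quoted in the text.
# Assumption: an average calendar year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def implied_throughput(total_ops: float, years: float) -> float:
    """Sustained operations per second needed to finish total_ops in `years`."""
    return total_ops / (years * SECONDS_PER_YEAR)


def run_time_years(total_ops: float, ops_per_second: float) -> float:
    """Years needed to execute total_ops at a given sustained rate."""
    return total_ops / ops_per_second / SECONDS_PER_YEAR


TOTAL_OPS = 1e20   # from the text: up to 10^20 floating-point operations
CPU_YEARS = 5.0    # from the text: "more than five years" on a high-end CPU

# Because the text says "more than" five years, this is an upper bound
# on the sustained CPU rate the claim implies.
cpu_rate = implied_throughput(TOTAL_OPS, CPU_YEARS)
print(f"Implied sustained CPU rate: at most {cpu_rate / 1e9:.0f} GFLOP/s")
```

The same helper, `run_time_years`, can be reused to estimate a run time for any assumed sustained rate. No GPU throughput figure is given in the text, so none is assumed here.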

This GPU-powered environment and optimization work have enabled BabbleLabs to run end-to-end training on hundreds of thousands of hours of audio in just weeks. That’s boosted accuracy and performance across the Clear Cloud software, which improves intelligibility by 55% in high-noise conditions, putting BabbleLabs ahead of the pack.

“NVIDIA GPUs are a key element in BabbleLabs’ delivery of the world’s best speech enhancement technology,” says Chris.

Industry-leading enhancement for communications

With NVIDIA GPUs in Google Cloud, BabbleLabs was able to get its Clear Cloud software up and running in just six months, quickly establishing itself as a leader in the space.

Crucially, GPU-powered production and testing are far more cost-effective for BabbleLabs than running the same training environments in non-GPU configurations – which is helping the team offer its product at a highly competitive price point.

This is all making advanced speech enhancement more accessible across a range of applications, opening up new opportunities for speech recognition, speaker authentication, and communication between humans and machines alike.

As Chris explains: “The sheer performance of GPUs, combined with their robust support in deep learning programming environments, allows us to train bigger, more complex networks with vastly more data and deploy them commercially at low cost.”
