Using NVIDIA GPUs for model training in Google Cloud AI Platform: A Technical How-To

NVIDIA GPUs are optimized for AI and make it easy to train complex machine learning models quickly. Now, with NVIDIA GPUs available alongside AI Platform in Google Cloud, creating fast, optimized training tasks and workloads is easier than ever.

Using NVIDIA GPUs for training in Google Cloud AI Platform

NVIDIA® GPUs have brought best-in-class AI performance to Google Cloud – enabling more organizations and teams to harness the game-changing power of GPU technology – and redefining possible. That’s the Power of Two in action.

One of the fastest and simplest ways to experience the power of NVIDIA GPUs in Google Cloud is to provision a virtual machine with GPUs for training workloads in the Google Cloud AI Platform.

NVIDIA GPUs can accelerate training for a multitude of machine learning (ML) models, but they are especially useful for dense, high-throughput use cases such as image classification, video analysis, and natural language processing (NLP) for conversational AI.

Typically, training an ML model using a single CPU can take days. However, by offloading the task to one or more NVIDIA GPUs, you can cut that time to hours.

In this blog, we’ll walk you through the simple process of getting started and training your first ML model using NVIDIA GPUs in Google Cloud.

Before you begin

Google Cloud currently delivers NVIDIA GPUs to more cloud regions than any other cloud service provider. You can check the zone list, including which GPUs are available in each zone, on our GPUs on Compute Engine page.

Additionally, it’s important to note that not all NVIDIA GPU types are available in every region. Read our full guide to check what’s available where you are.

This is especially important to consider if you want to run multiple GPU types for your training job, as each type of GPU must be available in the same region.

What you’ll need

Once you’ve confirmed that the GPUs you need are available in the Google Cloud regions where you need them, you can build your training workload in Google Cloud.

The Google Cloud AI Platform and NVIDIA GPUs support both TensorFlow and custom ML frameworks, giving you the freedom to create and train your models however you choose. However, if you wish to use a non-TensorFlow framework, you’ll need to create a custom container for your training job.
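As an illustration, a custom-container training job is configured by pointing the job spec’s masterConfig at your own image. The sketch below builds such a request body as a plain dictionary; the project, image URI, and job ID are hypothetical placeholders, and you would submit the resulting body via the AI Platform Training jobs API.

```python
# Minimal sketch of an AI Platform Training job spec that uses a
# custom container image instead of a built-in TensorFlow runtime.
# All names (job ID, image URI) are hypothetical placeholders.

def build_custom_container_job(job_id, image_uri, region):
    """Return a request body for creating a custom-container training job."""
    return {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "CUSTOM",
            "masterType": "n1-standard-8",
            "masterConfig": {
                # Your training code, framework, and dependencies are
                # baked into this container image.
                "imageUri": image_uri,
            },
            "region": region,
        },
    }

job = build_custom_container_job(
    "pytorch_train_001",                          # hypothetical job ID
    "gcr.io/my-project/pytorch-trainer:latest",   # hypothetical image
    "us-central1",
)
print(job["trainingInput"]["masterConfig"]["imageUri"])
```

Because the framework lives inside the image, AI Platform only needs to know which machine to run it on and where the image is stored.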

Once you’ve built a clean data set for training, chosen your framework, and set up your untrained model in AI Platform, it’s time to provision the NVIDIA GPUs that will handle the training workload.

Requesting an NVIDIA GPU-enabled machine in Google Cloud AI Platform

To configure your training program to access NVIDIA GPU-enabled machines in AI Platform, use Compute Engine machine types and attach GPUs.

If you configure your training job with Compute Engine machine types, you can attach a custom number of GPUs to accelerate the job.

To do this, you must create a valid acceleratorConfig specifying the number and type of NVIDIA GPUs to attach. Each GPU type carries different deployment restrictions, so please check our Compute Engine machine types guide to ensure your configuration is valid.
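The steps above can be sketched as a trainingInput section that pairs a Compute Engine machine type with an acceleratorConfig. This is a minimal illustration, not a definitive spec: the bucket path and module name are hypothetical, and you should confirm that your chosen GPU count is valid for the machine type and region.

```python
# Sketch: attaching NVIDIA GPUs to a Compute Engine machine type
# via acceleratorConfig in an AI Platform Training job request.
# The bucket and module names below are hypothetical placeholders.

def build_gpu_training_input(machine_type, gpu_type, gpu_count):
    """Return a trainingInput section with GPUs attached to the master."""
    return {
        "scaleTier": "CUSTOM",
        "masterType": machine_type,      # e.g. "n1-standard-8"
        "masterConfig": {
            "acceleratorConfig": {
                "count": gpu_count,      # must be valid for this GPU type
                "type": gpu_type,        # e.g. "NVIDIA_TESLA_T4"
            },
        },
        "region": "us-central1",
        "pythonModule": "trainer.task",                         # hypothetical
        "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],   # hypothetical
    }

training_input = build_gpu_training_input("n1-standard-8", "NVIDIA_TESLA_T4", 2)
print(training_input["masterConfig"]["acceleratorConfig"])
```

Using scaleTier CUSTOM with an explicit machine type is what lets you choose the exact number of GPUs, rather than accepting a predefined tier.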

Get started today

Whatever plans you have for AI – whether you want to understand your customers better, infer meaning from text, identify anomalies across visual data sets, or create innovative new AI applications of your own design – AI Platform is a solid place to start.

Now, with NVIDIA GPUs available for rapid provisioning through the platform, it’s also a great place to harness the Power of Two for yourself.

Find out more about the powerful combination of NVIDIA GPUs and Google Cloud, gain AI inspiration, and discover how you can redefine possible for your organization at thepoweroftwo.solutions.
