

Wes Vaske | February 2019

AI Matters: Getting to the Heart of Data Intelligence with Memory and Storage

As I've been exploring infrastructure for AI and Machine Learning systems, I've noticed a curious lack of performance data that discusses the underlying storage or memory in any way. Instead, the conversation is dominated by the various compute resources available (GPUs, CPUs, FPGAs, TPUs, etc.).

At first this was concerning to me, as I primarily support our storage products by providing relevant solution engineering validation and information. As I began running my experiments, however, I found that the lack of performance data discussing non-compute infrastructure wasn't due to a lack of necessity, but to it simply being ignored as a problem.

Here’s some data to provide a bit of context.

Disk Throughput and Images/Sec by GPU

This chart shows the performance of various AI training systems on a specific model and dataset. (Specifically, this is the ResNet-50 image classification model trained on the ImageNet dataset, a dataset of 1.2 million images that's around 145GB in size.)

Going back to 2014, we see that the storage in an AI training system only needed to supply about 50 MB/second of disk throughput to satisfy the demands of 8x top-of-the-line (at the time) GPUs. And, as much as I'd like you to use SSDs for every workload, it'd be a stretch to say flash drives were required to support this use case. 50 MB/second is pretty trivial.

Move forward a generation from the K80 GPUs to the P100s, and we see a significant increase in storage requirements, from 50 MB/second up to 150 MB/second. While that increase was large, it still wasn't anything to worry about; 150 MB/s may not be trivial for an HDD-based system, but it doesn't present any real architectural challenges.

However, the latest generation (along with further software optimizations) pushes things into new territory. The same model, ResNet-50, processing the same dataset now needs nearly a gigabyte per second of storage throughput to keep the GPUs running at peak efficiency. An HDD-based system has a hard time meeting those requirements.
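To make those numbers concrete, here's a rough back-of-the-envelope sketch of where the throughput requirements come from. The images/second rates below are illustrative assumptions chosen to reproduce the throughput figures above, not measurements taken from the chart:

```python
# Back-of-the-envelope estimate of the disk throughput an AI training system
# needs to keep its GPUs fed. Dataset figures come from the article; the
# images/second rates are illustrative assumptions, not measured values.

DATASET_IMAGES = 1_200_000                         # ImageNet training images
DATASET_BYTES = 145 * 1024**3                      # ~145 GB on disk
AVG_IMAGE_BYTES = DATASET_BYTES / DATASET_IMAGES   # ~127 KB per image

def required_throughput_mb_s(images_per_second: float) -> float:
    """Disk throughput (MB/s) needed to stream images at the given rate."""
    return images_per_second * AVG_IMAGE_BYTES / 1024**2

# Hypothetical training rates for three GPU generations:
for gpus, rate in [("8x K80 (2014)", 400), ("8x P100", 1200), ("8x V100", 8000)]:
    print(f"{gpus:>14}: ~{required_throughput_mb_s(rate):,.0f} MB/s")
```

At roughly 127 KB per image, a system pushing a few hundred images per second only needs tens of MB/s from disk, while one pushing several thousand images per second needs close to a gigabyte per second.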

So, it now makes sense why we haven't been talking about storage performance when we talk about AI systems: until recently, there was no need to. Additionally, if this trend continues (and we have no reason to believe it won't), the future will depend on our ability to build storage systems that can manage the demands of the next generations of GPUs.

Okay, so we agree that storage performance matters, but just how important is it? What is the real impact of an incorrect storage (and memory) architecture on our AI systems?

To answer these questions, I ran some additional experiments to try to shed light on the issue. The data below was generated with the same model and dataset as above: ResNet-50 trained against the ImageNet dataset. The hardware was a dual Intel® Xeon 8180M server with 8x Nvidia® V100 GPUs. Each GPU has 32GB of memory, the system has 3TB of memory, and my storage was 8x 3.2TB Micron 9200 NVMe™ solid-state drives in RAID10.

Training speed by model and batch size

I tested the impact of two variables: memory amount and disk throughput. Each variable was adjusted by changing the appropriate Docker container parameters (mem_limit and device_read_bps).

For memory, the container either had access to all the available memory (3TB) or a reduced amount, so that only half of the dataset could fit in the filesystem cache once the system reached steady state (128GB).

For storage, the container either had unlimited access to the NVMe storage, or it was limited to 500 MB/s of throughput. That number was chosen because it's roughly half of the observed peak throughput (1.2 GB/second) and corresponds to the types of disks available for GPU instances from various cloud providers.
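For reference, here is a minimal sketch of how those two limits could be applied using the Docker SDK for Python. The image name, training command, and NVMe device path are hypothetical placeholders; only mem_limit and device_read_bps correspond to the parameters named above.

```python
# Sketch: constraining a training container's memory and disk read throughput
# with the Docker SDK for Python. Image, command, and device path are
# placeholders for illustration only.
import docker

client = docker.from_env()

container = client.containers.run(
    "my-training-image:latest",           # hypothetical training image
    command="python train_resnet50.py",   # hypothetical entry point
    runtime="nvidia",                      # expose the GPUs to the container
    detach=True,
    mem_limit="128g",                      # constrained case: only ~half the dataset fits in the page cache
    device_read_bps=[                      # cap read throughput from the NVMe volume
        {"Path": "/dev/nvme0n1", "Rate": 500 * 1024 * 1024}  # 500 MB/s
    ],
)
```

Dropping the mem_limit and device_read_bps arguments gives the unconstrained case, so the same training job can be rerun under each combination of limits.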

The results shouldn't be surprising. If the storage in an AI system can't keep up with the GPUs and there isn't enough memory to cache the dataset, then system performance is seriously degraded. Thankfully, this is a problem we can solve. While you get the maximum efficiency out of an AI system with lots of memory and very fast disks, having either fast disks or dense memory gets you most of the way there.

The last set of experiments I'll discuss here looks at the impact of GPU memory on training performance. These tests were run on the same hardware as above (8x V100 GPUs), but I scaled the batch size (the number of images sent to the GPU for processing at one time) as well as the "complexity" of the algorithm (the number of layers in the ResNet model).

Each line represents the training throughput (in images per second) for a particular model. Once a batch size becomes too large, there isn't enough memory and the application crashes (shown above where each line ends).

There are a couple of things we can take away from this chart. The first and most obvious is that throughput increases with batch size. This chart specifically shows training, but the same behavior can be seen in inference: bigger batches increase the throughput.

The next takeaway is that the maximum batch size depends on the complexity of the model. As a model gets larger and more complex, the model's weights take up more of the GPU memory space, leaving less space for the data. Depending on your specific use case, it's possible to push model complexity to the point where you can't train the model at all, even when limiting it to a batch size of 1. This is especially evident when deploying models to intelligent edge or IoT devices, which typically have far less memory capacity than the GPUs I used here.
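As a rough illustration of that trade-off, the sketch below estimates how the maximum batch size shrinks as the ResNet variant gets deeper. The per-image activation footprints and the 4x multiplier for gradients and optimizer state are illustrative assumptions; real memory usage depends on the framework, precision, and optimizer.

```python
# Rough sketch of why deeper ResNets hit the GPU memory wall at smaller batch
# sizes. Activation footprints per image are assumptions for illustration.

GPU_MEMORY_GB = 32  # per-GPU memory on the V100s used in these tests

# (model, parameter count in millions, assumed activation GB per image)
MODELS = [
    ("ResNet-50",  25.6, 0.10),
    ("ResNet-101", 44.5, 0.16),
    ("ResNet-152", 60.2, 0.22),
]

for name, params_m, act_gb_per_image in MODELS:
    # weights + gradients + optimizer state: very roughly 4x the fp32 weights
    fixed_gb = params_m * 1e6 * 4 * 4 / 1024**3
    max_batch = int((GPU_MEMORY_GB - fixed_gb) / act_gb_per_image)
    print(f"{name}: ~{fixed_gb:.1f} GB fixed, estimated max batch ~ {max_batch}")
```

The fixed cost of the model grows with depth while the GPU's capacity stays the same, so the space left for per-image activations, and therefore the largest usable batch, shrinks.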

To sum it all up, there are three main components you need to consider when architecting an AI system:

  • Storage performance
  • System memory density
  • GPU/Accelerator memory density

Storage performance and system memory density are critical to getting the maximum performance out of your system, while GPU/accelerator memory matters for both performance and future model development.

For more detail, learn more about aligning memory and storage to specific AI and machine learning models. Watch the webinar on demand: AI Matters: Getting to the Heart of Data Intelligence with Memory and Storage, featuring me, Chris Gardner from Forrester®, and Eric Booth from Micron's Compute and Networking unit.

SMTS Systems Performance Engineer

Wes Vaske

Wes Vaske is a principal storage solutions engineer with Micron.