A Hands-On Code Walkthrough of Mixed Precision Training in Deep Learning


Abstract: This article introduces mixed precision training in deep learning, explains the theory behind its practical use through hands-on code, and benchmarks the resulting models.

This article is shared from the Huawei Cloud community post 《浅谈深度学习中的混合精度训练》 (A Brief Look at Mixed Precision Training in Deep Learning), by 李长安.

1 Mixed Precision Training

Mixed precision training was first proposed in the paper Mixed Precision Training, which describes the approach in detail and explains how to implement it; interested readers are encouraged to read the paper.

1.1 Half Precision and Single Precision

Half precision (also known as FP16) reduces a neural network's memory footprint compared with the higher-precision FP32 and FP64 formats, which allows larger networks to be trained and deployed; FP16 also takes less time than FP32 or FP64 when converting and moving data.

Single precision (also known as 32-bit) is the common floating-point format (float in C-family languages); the 64-bit format is known as double precision (double).

As the figure in the original post illustrates, a half-precision value occupies exactly half the storage of a single-precision one: 16 bits instead of 32.
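A quick check in numpy makes the same point about storage width (nothing here is Paddle-specific):

import numpy as np

# Byte width of each floating-point format
print(np.dtype(np.float16).itemsize)  # 2 bytes -- half precision
print(np.dtype(np.float32).itemsize)  # 4 bytes -- single precision
print(np.dtype(np.float64).itemsize)  # 8 bytes -- double precision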

1.2 Why Use Mixed Precision Training

Mixed precision training refers to training with a mix of single precision (float32) and half precision (float16).

Compared with float32, the advantages of float16 come down to two things: lower memory usage and faster computation.

Lower memory usage: this one is straightforward; in fp16 a typical model needs only half the memory it used to. Halving the memory footprint and bandwidth brings two benefits:

The model occupies less memory, so a larger batch size can be used during training.

Communication volume during training (especially multi-GPU, or multi-node multi-GPU) drops substantially, which cuts waiting time and speeds up the flow of data.

Faster computation: many current GPUs include hardware optimized for fp16 arithmetic. The paper points out that on recent GPUs, half-precision compute throughput can be 2 to 8 times that of single precision.

Loss scaling: fp16's representable range is far narrower than fp32's, so small gradient values can underflow to zero during the backward pass. Mixed precision training therefore multiplies the loss by a scaling factor before backpropagation, which scales every gradient by the same factor, and divides the gradients back down before the weight update.
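A minimal numpy sketch of why the scaling matters; the factor 1024 matches the init_loss_scaling used in the training code later:

import numpy as np

# fp16's smallest positive value is about 6e-8, so a tiny gradient underflows
small_grad = 1e-8
print(np.float16(small_grad))        # 0.0 -- the gradient is lost

# Scaling the loss (and therefore every gradient) by 1024 keeps the value
# inside fp16's representable range
scaled = np.float16(small_grad * 1024)
print(scaled)                        # ~1.0e-05 -- survives

# Before the weight update the gradient is unscaled in fp32, recovering it
print(np.float32(scaled) / 1024)     # ~1.0e-08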

2 Experiment Design

This experiment compares the two training modes from two angles: accuracy and speed. ResNet-18 is used as the test model, trained on a food-image dataset with five classes.

# Unzip the dataset
!cd data/data64280/ && unzip -q trainset.zip

2.1 Dataset Preprocessing

import os
import numpy as np
import pandas as pd

all_file_dir = 'data/data64280/trainset'
img_list = []
label_list = []
label_id = 0
# Each subdirectory of the dataset root is one food class
class_list = [c for c in os.listdir(all_file_dir) if os.path.isdir(os.path.join(all_file_dir, c))]
for class_dir in class_list:
    image_path_pre = os.path.join(all_file_dir, class_dir)
    for img in os.listdir(image_path_pre):
        img_list.append(os.path.join(image_path_pre, img))
        label_list.append(label_id)
    label_id += 1
img_df = pd.DataFrame(img_list)
label_df = pd.DataFrame(label_list)
img_df.columns = ['images']
label_df.columns = ['label']
df = pd.concat([img_df, label_df], axis=1)
# Shuffle the rows before writing the index file out
df = df.reindex(np.random.permutation(df.index))
df.to_csv('food_data.csv', index=False)
import pandas as pd
# Read the index file back in
df = pd.read_csv('food_data.csv')
image_path_list = df['images'].values
label_list = df['label'].values
# Split into training and validation sets (80/20)
all_size = len(image_path_list)
train_size = int(all_size * 0.8)
train_image_path_list = image_path_list[:train_size]
train_label_list = label_list[:train_size]
val_image_path_list = image_path_list[train_size:]
val_label_list = label_list[train_size:]
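It is worth confirming that the split looks sane before moving on; a quick sketch using the variables above:

# Sanity-check the 80/20 split and the per-class balance
print(len(train_image_path_list), len(val_image_path_list))
print(pd.Series(train_label_list).value_counts())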

2.2 Custom Dataset

import numpy as np
from PIL import Image
from paddle.io import Dataset
import paddle.vision.transforms as T

class MyDataset(Dataset):
    """
    Step 1: inherit from paddle.io.Dataset
    """
    def __init__(self, image, label, transform=None):
        """
        Step 2: implement the constructor and define how the data is read
        """
        super(MyDataset, self).__init__()
        self.imgs = image      # list of image paths
        self.labels = label    # list of class ids
        self.transform = transform

    def __getitem__(self, index):
        # Required method: reads the sample at the given index
        img = Image.open(self.imgs[index]).convert("RGB")
        img = np.array(img).astype('float32') / 255.0
        label = np.array([self.labels[index]]).astype('int64')
        if self.transform is not None:
            img = self.transform(img)
        # Whatever is returned here is what each batch yields during training
        return img, label

    def __len__(self):
        # Required method: returns the dataset size (the number of images),
        # not to be confused with the number of batches in the loader
        return len(self.imgs)
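A quick sanity check of the dataset class (a sketch; it assumes the path lists from section 2.1 are in scope):

ds = MyDataset(image=train_image_path_list, label=train_label_list)
img, label = ds[0]
print(img.shape, label)  # an (H, W, 3) float32 array and an int64 label
print(len(ds))           # the number of training images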

2.3 Training Setup

import paddle
from paddle.metric import Accuracy
import warnings
warnings.filterwarnings("ignore")
import paddle.vision.transforms as T

transform = T.Compose([
    T.Resize([224, 224]),
    T.ToTensor(),
    # T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    # T.Transpose(),
])
train_dataset = MyDataset(image=train_image_path_list, label=train_label_list, transform=transform)
train_loader = paddle.io.DataLoader(train_dataset, places=paddle.CPUPlace(), batch_size=16, shuffle=True)
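To verify the input pipeline end to end, you can pull a single batch from the loader (a sketch; the shapes assume the 224x224 resize and batch_size=16 above):

# Fetch one batch and inspect the tensor shapes
for imgs, labels in train_loader():
    print(imgs.shape)    # [16, 3, 224, 224]
    print(labels.shape)  # [16, 1]
    break

With the data pipeline in place, the ResNet definition follows.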
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle
from paddle import ParamAttr
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
from paddle.nn.initializer import Uniform
import math
__all__ = ["ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"]
class ConvBNLayer(nn.Layer):
    def __init__(self,
                 num_channels,
                 num_filters,
                 filter_size,
                 stride=1,
                 groups=1,
                 act=None,
                 name=None,
                 data_format="NCHW"):
        super(ConvBNLayer, self).__init__()
        self._conv = Conv2D(
            in_channels=num_channels,
            out_channels=num_filters,
            kernel_size=filter_size,
            stride=stride,
            padding=(filter_size - 1) // 2,
            groups=groups,
            weight_attr=ParamAttr(name=name + "_weights"),
            bias_attr=False,
            data_format=data_format)
        if name == "conv1":
            bn_name = "bn_" + name
        else:
            bn_name = "bn" + name[3:]
        self._batch_norm = BatchNorm(
            num_filters,
            act=act,
            param_attr=ParamAttr(name=bn_name + "_scale"),
            bias_attr=ParamAttr(bn_name + "_offset"),
            moving_mean_name=bn_name + "_mean",
            moving_variance_name=bn_name + "_variance",
            data_layout=data_format)

    def forward(self, inputs):
        y = self._conv(inputs)
        y = self._batch_norm(y)
        return y
class BottleneckBlock(nn.Layer):
    def __init__(self,
                 num_channels,
                 num_filters,
                 stride,
                 shortcut=True,
                 name=None,
                 data_format="NCHW"):
        super(BottleneckBlock, self).__init__()
        self.conv0 = ConvBNLayer(
            num_channels=num_channels,
            num_filters=num_filters,
            filter_size=1,
            act="relu",
            name=name + "_branch2a",
            data_format=data_format)
        self.conv1 = ConvBNLayer(
            num_channels=num_filters,
            num_filters=num_filters,
            filter_size=3,
            stride=stride,
            act="relu",
            name=name + "_branch2b",
            data_format=data_format)
        self.conv2 = ConvBNLayer(
            num_channels=num_filters,
            num_filters=num_filters * 4,
            filter_size=1,
            act=None,
            name=name + "_branch2c",
            data_format=data_format)
        if not shortcut:
            self.short = ConvBNLayer(
                num_channels=num_channels,
                num_filters=num_filters * 4,
                filter_size=1,
                stride=stride,
                name=name + "_branch1",
                data_format=data_format)
        self.shortcut = shortcut
        self._num_channels_out = num_filters * 4

    def forward(self, inputs):
        y = self.conv0(inputs)
        conv1 = self.conv1(y)
        conv2 = self.conv2(conv1)
        if self.shortcut:
            short = inputs
        else:
            short = self.short(inputs)
        y = paddle.add(x=short, y=conv2)
        y = F.relu(y)
        return y
class BasicBlock(nn.Layer):
    def __init__(self,
                 num_channels,
                 num_filters,
                 stride,
                 shortcut=True,
                 name=None,
                 data_format="NCHW"):
        super(BasicBlock, self).__init__()
        self.stride = stride
        self.conv0 = ConvBNLayer(
            num_channels=num_channels,
            num_filters=num_filters,
            filter_size=3,
            stride=stride,
            act="relu",
            name=name + "_branch2a",
            data_format=data_format)
        self.conv1 = ConvBNLayer(
            num_channels=num_filters,
            num_filters=num_filters,
            filter_size=3,
            act=None,
            name=name + "_branch2b",
            data_format=data_format)
        if not shortcut:
            self.short = ConvBNLayer(
                num_channels=num_channels,
                num_filters=num_filters,
                filter_size=1,
                stride=stride,
                name=name + "_branch1",
                data_format=data_format)
        self.shortcut = shortcut

    def forward(self, inputs):
        y = self.conv0(inputs)
        conv1 = self.conv1(y)
        if self.shortcut:
            short = inputs
        else:
            short = self.short(inputs)
        y = paddle.add(x=short, y=conv1)
        y = F.relu(y)
        return y
class ResNet(nn.Layer):
    def __init__(self, layers=50, class_dim=1000, input_image_channel=3, data_format="NCHW"):
        super(ResNet, self).__init__()
        self.layers = layers
        self.data_format = data_format
        self.input_image_channel = input_image_channel
        supported_layers = [18, 34, 50, 101, 152]
        assert layers in supported_layers, \
            "supported layers are {} but input layer is {}".format(
                supported_layers, layers)
        if layers == 18:
            depth = [2, 2, 2, 2]
        elif layers == 34 or layers == 50:
            depth = [3, 4, 6, 3]
        elif layers == 101:
            depth = [3, 4, 23, 3]
        elif layers == 152:
            depth = [3, 8, 36, 3]
        num_channels = [64, 256, 512,
                        1024] if layers >= 50 else [64, 64, 128, 256]
        num_filters = [64, 128, 256, 512]
        self.conv = ConvBNLayer(
            num_channels=self.input_image_channel,
            num_filters=64,
            filter_size=7,
            stride=2,
            act="relu",
            name="conv1",
            data_format=self.data_format)
        self.pool2d_max = MaxPool2D(
            kernel_size=3,
            stride=2,
            padding=1,
            data_format=self.data_format)
        self.block_list = []
        if layers >= 50:
            for block in range(len(depth)):
                shortcut = False
                for i in range(depth[block]):
                    if layers in [101, 152] and block == 2:
                        if i == 0:
                            conv_name = "res" + str(block + 2) + "a"
                        else:
                            conv_name = "res" + str(block + 2) + "b" + str(i)
                    else:
                        conv_name = "res" + str(block + 2) + chr(97 + i)
                    bottleneck_block = self.add_sublayer(
                        conv_name,
                        BottleneckBlock(
                            num_channels=num_channels[block]
                            if i == 0 else num_filters[block] * 4,
                            num_filters=num_filters[block],
                            stride=2 if i == 0 and block != 0 else 1,
                            shortcut=shortcut,
                            name=conv_name,
                            data_format=self.data_format))
                    self.block_list.append(bottleneck_block)
                    shortcut = True
        else:
            for block in range(len(depth)):
                shortcut = False
                for i in range(depth[block]):
                    conv_name = "res" + str(block + 2) + chr(97 + i)
                    basic_block = self.add_sublayer(
                        conv_name,
                        BasicBlock(
                            num_channels=num_channels[block]
                            if i == 0 else num_filters[block],
                            num_filters=num_filters[block],
                            stride=2 if i == 0 and block != 0 else 1,
                            shortcut=shortcut,
                            name=conv_name,
                            data_format=self.data_format))
                    self.block_list.append(basic_block)
                    shortcut = True
        self.pool2d_avg = AdaptiveAvgPool2D(1, data_format=self.data_format)
        self.pool2d_avg_channels = num_channels[-1] * 2
        stdv = 1.0 / math.sqrt(self.pool2d_avg_channels * 1.0)
        self.out = Linear(
            self.pool2d_avg_channels,
            class_dim,
            weight_attr=ParamAttr(
                initializer=Uniform(-stdv, stdv), name="fc_0.w_0"),
            bias_attr=ParamAttr(name="fc_0.b_0"))

    def forward(self, inputs):
        y = self.conv(inputs)
        y = self.pool2d_max(y)
        for block in self.block_list:
            y = block(y)
        y = self.pool2d_avg(y)
        y = paddle.reshape(y, shape=[-1, self.pool2d_avg_channels])
        y = self.out(y)
        return y

def ResNet18(**args):
    model = ResNet(layers=18, **args)
    return model
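Before training, it helps to sanity-check the definition; a sketch using paddle.summary (available in Paddle 2.x), which should report roughly 11M parameters for ResNet-18:

# Print a layer-by-layer summary of the 18-layer variant (five food classes)
model = ResNet18(class_dim=5)
paddle.summary(model, (1, 3, 224, 224))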

2.4 Defining the Training Loops

import time
import paddle
import paddle.nn.functional as F

def train(model):
    model.train()
    epochs = 5
    # Use Adam as the optimizer
    optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
    for epoch in range(epochs):
        for batch_id, data in enumerate(train_loader()):
            x_data = data[0]
            y_data = data[1]
            predicts = model(x_data)
            # Compute the loss and the top-2 accuracy
            loss = F.cross_entropy(predicts, y_data)
            acc = paddle.metric.accuracy(predicts, y_data, k=2)
            loss.backward()
            if batch_id % 10 == 0:
                print("epoch: {}, batch_id: {}, loss is: {}, acc is: {}".format(epoch, batch_id, loss.numpy(), acc.numpy()))
            optim.step()
            optim.clear_grad()
def train_amp(model):
    model.train()
    epochs = 5
    # Use Adam as the optimizer
    optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
    # Step 1: create the GradScaler once, before the training loop
    scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
    for epoch in range(epochs):
        for batch_id, data in enumerate(train_loader()):
            x_data = data[0]
            y_data = data[1]
            # Step 2: run the forward pass under auto_cast, which casts
            # supported ops to fp16 automatically
            with paddle.amp.auto_cast():
                predicts = model(x_data)
                loss = F.cross_entropy(predicts, y_data)
            # Step 3: scale the loss and backpropagate the scaled loss
            scaled = scaler.scale(loss)
            scaled.backward()
            acc = paddle.metric.accuracy(predicts, y_data, k=2)
            if batch_id % 10 == 0:
                print("epoch: {}, batch_id: {}, loss is: {}, acc is: {}".format(epoch, batch_id, loss.numpy(), acc.numpy()))
            # Step 4: let the scaler unscale the gradients and apply the update
            scaler.minimize(optim, scaled)
            optim.clear_grad()
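The pattern above is Paddle's level-'O1' AMP: auto_cast decides per operator whether to run in fp16. If your Paddle version (2.1 or later) supports it, level 'O2' keeps nearly the whole model in fp16 for a larger speedup; a hedged sketch:

# Pure-fp16 ('O2') variant -- assumes Paddle >= 2.1
model = paddle.amp.decorate(models=model, level='O2')  # cast the parameters
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
with paddle.amp.auto_cast(level='O2'):
    predicts = model(x_data)
    loss = F.cross_entropy(predicts, y_data)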

2.5 Running the Training

This section runs the two training modes back to back, focusing mainly on training speed.

model = ResNet18(class_dim=5)  # five food classes
start = time.time()
train(model)
end = time.time()
print('no_amp:', end - start)
epoch: 0, batch_id: 0, loss is: [0.21116894], acc is: [1.]
epoch: 1, batch_id: 0, loss is: [0.00010776], acc is: [1.]
epoch: 2, batch_id: 0, loss is: [2.5868081e-05], acc is: [1.]
epoch: 3, batch_id: 0, loss is: [1.442422e-05], acc is: [1.]
epoch: 4, batch_id: 0, loss is: [1.1086402e-05], acc is: [1.]
no_amp: 740.6813971996307
start1 = time.time()
train_amp(model)
end1 = time.time()
print('with amp:', end1 - start1)
epoch: 0, batch_id: 0, loss is: [0.512834], acc is: [1.]
epoch: 1, batch_id: 0, loss is: [0.00025519], acc is: [1.]
epoch: 2, batch_id: 0, loss is: [5.9364465e-05], acc is: [1.]
epoch: 3, batch_id: 0, loss is: [3.2305197e-05], acc is: [1.]
epoch: 4, batch_id: 0, loss is: [2.4556812e-05], acc is: [1.]
with amp: 740.9603228569031

3 Summary

For this experiment, the model was trained for only 5 epochs, so the time advantage did not get a chance to show; interested readers can increase the number of epochs or test a deeper network. Note also that fp16's speedup comes from GPU hardware support (such as Tensor Cores), so make sure the benchmark is actually running on a GPU rather than falling back to the CPU.
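A quick check of the active device (Paddle 2.x API):

import paddle
# Confirm the run is on a GPU before timing the AMP comparison
print(paddle.device.get_device())  # expect something like 'gpu:0'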

Judging from the training logs, the loss values during mixed precision training are higher than those of the model trained without it.

 

Click Follow to be among the first to learn about Huawei Cloud's latest technologies~
