Deep Learning Computation: Opening the Toolbox and Upgrading from Basic User to Power User

Author: 袖梨 2026-07-04

The Deep Learning "Toolbox": Layers, Blocks, Parameters, and GPUs

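In the previous posts we learned how to build networks from ready-made building blocks. But have you ever wondered how those blocks are made? How do you make your own? How do you save and load a model? How do you speed things up with a GPU?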

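Today we open the deep learning "toolbox" and upgrade from "basic user" to "power user"! We will use a PC-building analogy throughout: think of the deep learning library as a computer mall, where layers are hardware components, blocks are assembled machines, parameters are the hardware settings, and the GPU is a high-performance graphics card!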

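I. Layers and Blocks: From Components to a Complete Machine

1. What are layers and blocks?

Layer: like a single hardware component of a computer (CPU, graphics card, memory)
Block: like a complete machine assembled from those components (the tower)

In plain terms:

A layer is a single component with a single function
A block is several components assembled together, with more complete functionality

In PyTorch, nn.Module is the base class of all layers and blocks, just as all hardware has to conform to some standard interface.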

2. Custom blocks: assemble your own computer!

Let's define a custom MLP block, just like assembling a computer ourselves:

import torch
from torch import nn
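from torch.nn import functional as F

# Custom MLP block (like assembling your own computer)
class MLP(nn.Module):
    def __init__(self):
        super().__init__()  # The parent constructor must be called first
        self.hidden = nn.Linear(20, 256)  # Hidden layer (the CPU)
        self.out = nn.Linear(256, 10)     # Output layer (the graphics card)

    def forward(self, X):
        # Forward pass: how data flows through the block
        return self.out(F.relu(self.hidden(X)))

# Try it out
net = MLP()
X = torch.randn(2, 20)  # Input: a batch of 2 samples with 20 features each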
print(net(X))

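Defining your own block is that simple! You only need to:

Inherit from nn.Module
Define the layers in __init__
Implement the forward function (the forward pass)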

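3. Sequential blocks: assembly on a production line!

If all you need is to chain layers one after another, PyTorch provides nn.Sequential, which works like a production line for assembling computers: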

import torch
from torch import nn
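from torch.nn import functional as F

# Define the MLP with Sequential (production-line assembly)
net = nn.Sequential(
    nn.Linear(20, 256),  # Step 1: install the CPU
    nn.ReLU(),           # Step 2: install the cooler
    nn.Linear(256, 10)   # Step 3: install the graphics card
)

# Try it out
X = torch.randn(2, 20)  # Input data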
print(net(X))

The advantage of Sequential is that it is simple and intuitive. It fits the case where layers are connected strictly one after another.
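
To make the production line less of a black box, here is a minimal sketch of how such a container could be written by hand. MySequential is a hypothetical name used only for illustration, not PyTorch's actual implementation. It relies on add_module to register each layer and on children() to walk them in insertion order.

```python
import torch
from torch import nn


class MySequential(nn.Module):
    """Hypothetical hand-written sequential container (illustration only)."""

    def __init__(self, *layers):
        super().__init__()
        for idx, layer in enumerate(layers):
            # add_module registers the layer so its parameters are tracked
            self.add_module(str(idx), layer)

    def forward(self, X):
        # Run the registered children in the order they were added
        for layer in self.children():
            X = layer(X)
        return X


net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
out = net(torch.randn(2, 20))
print(out.shape)
```

The point of the sketch is that a container is itself just a block: it stores its children and decides in forward how data passes through them.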

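4. Running code in the forward pass: flexible assembly!

The forward function can do more than call layers. It can run arbitrary Python code, just as you can make adjustments on the fly while assembling a computer: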

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)  # This weight is never trained
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)  # Compute with the constant weight
        X = self.linear(X)  # Reuse the same layer
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

net = FixedHiddenMLP()
print(net(X))

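See! Inside forward you can use not only layers but also loops, conditionals, and even constants that receive no gradient. That flexibility is exactly what makes a deep learning framework powerful!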
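
One detail worth knowing: a constant stored as a plain tensor attribute is not part of the module's state. It does not appear in state_dict and is not moved by net.to(device). A possible alternative, sketched below with a hypothetical FixedWeightBlock class, is to store the constant with register_buffer.

```python
import torch
from torch import nn


class FixedWeightBlock(nn.Module):  # hypothetical name, for illustration
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(20, 20)
        # A buffer is saved in state_dict and follows .to(device),
        # but it is not a parameter and is never trained
        self.register_buffer("rand_weight", torch.rand(20, 20))

    def forward(self, X):
        return torch.relu(self.linear(X) @ self.rand_weight + 1)


block = FixedWeightBlock()
param_names = [name for name, _ in block.named_parameters()]
buffer_names = [name for name, _ in block.named_buffers()]
print(param_names)   # only the trainable tensors of self.linear
print(buffer_names)  # the constant weight
```

Whether a buffer is worth it depends on the use case: for a throwaway constant the plain attribute is fine, while a buffer helps when the constant must be saved or moved with the model.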

5. Nested blocks: a server inside your computer!

Blocks can contain other blocks! It is like installing a server inside your computer:

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU()
        )
        self.linear = nn.Linear(32, 16)
    
    def forward(self, X):
        return self.linear(self.net(X))

# Deep nesting: a block inside a block inside a block
chimera = nn.Sequential(
    NestMLP(),
    nn.Linear(16, 20),
    FixedHiddenMLP()
)

print(chimera(X))

Nested blocks let us build networks modularly: even a complex network is made of simple blocks!

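II. Parameter Management: Inspecting and Adjusting the Hardware Settings!

1. Parameter access: check the hardware settings!

Once a model is built or trained, we often need to look at its parameters, much like checking a computer's hardware settings: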

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

# Access the parameters of the output layer (index 2, i.e. the second Linear layer)
print(net[2].state_dict())

Each layer's parameters live in its state_dict, which works like a hardware spec sheet.

Accessing a specific parameter:

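# Access the weight of the output layer
print(type(net[2].weight))
print(net[2].weight)
print(net[2].weight.data)  # Values only, without gradient tracking

# Access the bias
print(net[2].bias)
print(net[2].bias.data)

# Access the gradient (it is None until backpropagation has run)
print(net[2].weight.grad is None)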

Accessing all parameters at once:

# Access all parameters
print(*[(name, param.shape) for name, param in net.named_parameters()])

# Or look one up directly by name
print(net.state_dict()['2.bias'].data)
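
A common follow-up is to count how many trainable numbers a model has. The sketch below rebuilds the same small network so it runs on its own; the variable names keys and n_params are just illustrative.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# state_dict keys have the form "<index>.<parameter name>";
# ReLU has no parameters, so index 1 does not appear
keys = list(net.state_dict().keys())
print(keys)

# Count trainable scalars: (4*8 + 8) + (8*1 + 1) = 49
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(n_params)
```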

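2. Collecting parameters from nested blocks: open up the server and check its settings!

How do we access the parameters of nested blocks? Just search recursively: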

import torch
from torch import nn
from torch.nn import functional as F

# Redefine the classes we need (so this code block runs on its own)
class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)
    
    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU()
        )
        self.linear = nn.Linear(32, 16)
    
    def forward(self, X):
        return self.linear(self.net(X))

# Build the chimera network
chimera = nn.Sequential(
    NestMLP(), 
    nn.Linear(16, 20), 
    FixedHiddenMLP()
)

# Collect parameters from the nested blocks
print(*[(name, param.shape) for name, param in chimera.named_parameters()])

No matter how deep the nesting goes, named_parameters() finds every parameter!
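
Because nested blocks behave like nested lists, a single parameter can also be reached by indexing level by level. Here is a small sketch; block1, block2 and rgnet are hypothetical helper names chosen for this example.

```python
import torch
from torch import nn


def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4), nn.ReLU())


def block2():
    net = nn.Sequential()
    for i in range(2):
        # Give each nested block an explicit name
        net.add_module(f"block{i}", block1())
    return net


rgnet = nn.Sequential(block2(), nn.Linear(4, 1))
print(rgnet)  # printing the model shows the nesting structure

# First top-level block -> its second sub-block -> that sub-block's first layer
bias = rgnet[0][1][0].bias
print(bias.shape)
```

The same tensor shows up in named_parameters() under a dotted name that mirrors the path through the hierarchy.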

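3. Parameter initialization: give the hardware sensible defaults!

Good parameter initialization matters, just like giving hardware sensible default settings.

Default initialization:

PyTorch has default initialization schemes:

Linear-layer weights: sampled uniformly from [-1/sqrt(in_features), 1/sqrt(in_features)]
Biases: sampled from the same uniform range (they are not set to 0 by default)

Built-in initialization:

PyTorch also provides built-in initialization methods: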

import torch
from torch import nn

# Create a network first
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)  # Optional: nn.Linear parameters already exist at construction time

# Normal-distribution initialization
def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

net.apply(init_normal)  # apply calls init_normal on every submodule
print(net[0].weight.data[0], net[0].bias.data[0])

Constant initialization:

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)
        nn.init.zeros_(m.bias)

net.apply(init_constant)
print(net[0].weight.data[0], net[0].bias.data[0])
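
Different layers can also receive different initializers by calling apply on individual submodules. The sketch below assumes Xavier uniform initialization for the first layer and the constant 42 for the last one; both choices and the function names init_xavier and init_42 are purely illustrative.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))


def init_xavier(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)


def init_42(m):
    if isinstance(m, nn.Linear):
        nn.init.constant_(m.weight, 42)


net[0].apply(init_xavier)  # only the first Linear layer
net[2].apply(init_42)      # only the last Linear layer
print(net[0].weight.data[0])
print(net[2].weight.data)
```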

Custom initialization:

You can also write your own initialization logic:

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

def my_init(m):
    if type(m) == nn.Linear:
        print("Init", *[(name, param.shape)
                        for name, param in m.named_parameters()][0])
        nn.init.uniform_(m.weight, -10, 10)
        # Custom rule: keep weights with absolute value >= 5, set the rest to 0
        m.weight.data *= (m.weight.data.abs() >= 5).float()

net.apply(my_init)
print(net[0].weight[:2])

Setting parameters directly:

You can even modify parameter values directly:

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

net[0].weight.data[:] += 1     # Add 1 to every weight
net[0].weight.data[0, 0] = 42  # Set the first weight to 42
print(net[0].weight.data[0])

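4. Parameter tying: two computers sharing one graphics card!

Sometimes we want several layers to share parameters, like two computers using the same graphics card: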

import torch
from torch import nn

# The shared layer
shared = nn.Linear(8, 8)
net = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    shared,           # First use of shared
    nn.ReLU(),
    shared,           # Second use of shared (the same object!)
    nn.ReLU(),
    nn.Linear(8, 1)
)

X = torch.rand(size=(2, 4))
net(X)

# Check that the two are identical
print(net[2].weight.data[0] == net[4].weight.data[0])

# Change one and the other changes too
net[2].weight.data[0, 0] = 100
print(net[2].weight.data[0] == net[4].weight.data[0])

Parameter tying saves memory and lets the model reuse the same weights at different positions!
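
What happens to gradients when a layer is used twice? The sketch below, with illustrative variable names, checks that both positions hold the same object, that parameters() lists the shared tensors only once, and that backpropagation leaves a single gradient tensor on the shared weight. That gradient accumulates the contributions from both uses.

```python
import torch
from torch import nn

shared = nn.Linear(8, 8)
net = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),
    shared, nn.ReLU(),
    shared, nn.ReLU(),
    nn.Linear(8, 1),
)

# Both positions hold the very same module object
same_object = net[2] is net[4]

# parameters() skips duplicates: 3 distinct Linear layers -> 6 tensors
n_tensors = len(list(net.parameters()))

# One backward pass; the shared weight has a single .grad tensor
net(torch.rand(2, 4)).sum().backward()
same_grad = net[2].weight.grad is net[4].weight.grad
print(same_object, n_tensors, same_grad)
```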

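III. Deferred Initialization: Build First, Decide the Specs Later!

1. What is deferred initialization?

Have you ever run into this situation: you are defining a network but do not know the input dimension yet?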

import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
print(net[0].weight)  # Not initialized yet: prints an UninitializedParameter

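LazyLinear implements deferred initialization: it does not know the input dimension yet, so it does not allocate its parameters.

2. Initialization happens on the first forward pass!

The first time you pass data in, PyTorch infers the input dimension automatically and then initializes the parameters: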

import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

X = torch.rand(2, 20)
net(X)  # First forward pass: the parameters are initialized now!
print(net[0].weight.shape)  # torch.Size([256, 20])

The benefit of deferred initialization is that you do not have to work out each layer's input dimension by hand!
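
One practical consequence: custom initializers that write into the weights need the weights to exist first. A minimal sketch, assuming a single dry-run batch of 20 features is available, runs one forward pass and only then applies the initializer. The flag names lazy_before and lazy_after are illustrative.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
lazy_before = isinstance(net[0].weight, UninitializedParameter)

# Dry run: one forward pass lets PyTorch infer in_features and allocate weights
net(torch.rand(2, 20))
lazy_after = isinstance(net[0].weight, UninitializedParameter)


def init_normal(m):
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)


net.apply(init_normal)  # safe now that every weight has a concrete shape
print(lazy_before, lazy_after)
print(net[0].weight.shape, net[2].weight.shape)
```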

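IV. Custom Layers: Build Your Own Hardware!

1. Layers without parameters: build a simple component!

Let's build a layer with no parameters, like making a simple adapter: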

import torch
from torch import nn
from torch.nn import functional as F

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()  # Subtract the mean to center the data

layer = CenteredLayer()
print(layer(torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])))

A layer without parameters really is that simple: you only need to implement forward!

Let's try putting this layer into a network:

import torch
from torch import nn

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())
Y = net(torch.rand(4, 8))
print(Y.mean())  # Should be close to 0 (up to floating-point error)

2. Layers with parameters: build a component with a switch!

Now let's build a layer that has parameters, like making a component with a switch:

import torch
from torch import nn
from torch.nn import functional as F

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))  # Weight parameter
        self.bias = nn.Parameter(torch.randn(units,))             # Bias parameter

    def forward(self, X):
        # Note: use self.weight and self.bias directly in the forward pass.
        # Do not write self.weight.data: that detaches the tensor from the
        # computation graph, so no gradient can be backpropagated to it!
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)

# Try it out
linear = MyLinear(5, 3)
print(linear.weight)

# Forward pass
print(linear(torch.rand(2, 5)))

# Use it inside Sequential
net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
print(net(torch.rand(2, 64)))

A layer with parameters needs to:

Define its parameters with nn.Parameter in __init__
Use those parameters for the computation in forward
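
Why nn.Parameter rather than a plain tensor? A short sketch with a hypothetical ScaleShift layer shows the difference: only attributes wrapped in nn.Parameter are registered, and only they receive gradients after backward().

```python
import torch
from torch import nn


class ScaleShift(nn.Module):  # hypothetical layer, for illustration
    def __init__(self, units):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(units))
        self.shift = nn.Parameter(torch.zeros(units))
        self.not_tracked = torch.ones(units)  # plain tensor: not registered

    def forward(self, X):
        return X * self.scale + self.shift


layer = ScaleShift(3)
names = [name for name, _ in layer.named_parameters()]
layer(torch.ones(2, 3)).sum().backward()
print(names)
# d(sum)/d(scale) is the column sum of X: two rows of ones give 2 per unit
print(layer.scale.grad)
```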

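V. Reading and Writing Files: Save and Load Your Computer!

1. Loading and saving tensors: save a single component!

Let's start simple by saving and loading a tensor: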

import torch

# Save a tensor
x = torch.tensor([3.0])
torch.save(x, 'x-file')

# Load the tensor
x2 = torch.load('x-file')
print(x2)

Saving and loading a list of tensors:

import torch

x = torch.tensor([3.0])
y = torch.tensor([4.0])
torch.save([x, y], 'x-files')
x2, y2 = torch.load('x-files')
print(x2, y2)

Saving and loading a dictionary:

import torch

x = torch.tensor([3.0])
y = torch.tensor([4.0])
mydict = {'x': x, 'y': y}
torch.save(mydict, 'mydict')
mydict2 = torch.load('mydict')
print(mydict2)

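2. Loading and saving model parameters: save the full machine configuration!

Saving all of a model's parameters is like saving the complete configuration of your computer: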

import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)
    
    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

# Create the network and run a forward pass
net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)

# Save the model parameters
torch.save(net.state_dict(), 'mlp.params')

Loading the model parameters is like using the configuration file to assemble an identical computer:

import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)
    
    def forward(self, x):
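        return self.output(F.relu(self.hidden(x)))

# The network structure must be created first (same as when saving)
clone = MLP()
clone.load_state_dict(torch.load('mlp.params'))
clone.eval()  # Switch to evaluation mode

# Run the clone on an input (to verify the copy, feed it the X used before saving and compare with Y)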
X = torch.randn(size=(2, 20))
Y_clone = clone(X)
print(Y_clone)

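Note:

What gets saved is the parameters, not the whole model
Before loading, you must first create a network with the same structure
eval() switches to evaluation mode (dropout is disabled, and batch normalization uses its running statistics)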
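
To actually confirm that a reloaded model matches the original, both have to see the same input. Here is a self-contained round-trip sketch; it writes to a temporary directory, and make_net is a hypothetical helper that stands in for the MLP class above.

```python
import os
import tempfile

import torch
from torch import nn


def make_net():
    # Same architecture on both sides of the save/load round trip
    return nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))


net = make_net()
X = torch.randn(2, 20)
Y = net(X)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mlp.params")
    torch.save(net.state_dict(), path)

    clone = make_net()  # fresh random weights...
    clone.load_state_dict(torch.load(path))  # ...overwritten by the saved ones

clone.eval()
same = torch.allclose(clone(X), Y)
print(same)
```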

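VI. GPU: Install a High-Performance Graphics Card and Fly!

1. Compute devices: check whether you have a graphics card!

First, see which compute devices you have: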

import torch

# Check for GPUs
print(torch.device('cpu'))
print(torch.cuda.device_count())  # How many GPUs are there?
print(torch.cuda.is_available())  # Is a GPU usable?

Choosing a device:

import torch

# Return GPU i if it exists, otherwise fall back to the CPU
def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# Return all available GPUs, or [cpu] if there are none
def try_all_gpus():
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

print(try_gpu())
print(try_gpu(10))
print(try_all_gpus())

2. Tensors and the GPU: move your data onto the graphics card!

Creating a tensor on the GPU:

import torch

# Return GPU i if it exists, otherwise fall back to the CPU
def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# Create a tensor on GPU 0 (or on the CPU if no GPU is present)
X = torch.ones(2, 3, device=try_gpu())
print(X)

Moving a tensor from the CPU to the GPU:

import torch

def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# Create on the CPU
Z = torch.tensor([1, 2, 3])
print(Z.device)

# Move to the GPU
if torch.cuda.is_available():
    Z_gpu = Z.cuda(0)
    print(Z_gpu.device)

    # Or use the to() method
    Z_gpu2 = Z.to('cuda:0')
    print(Z_gpu2.device)

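Note:

Only tensors on the same device can be combined in an operation
If X is on GPU 0 and Y is on GPU 1, they cannot be added directly!

Computing on the GPU: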

import torch

def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

X = torch.ones(2, 3, device=try_gpu())
Y = torch.rand(2, 3, device=try_gpu())
print(X + Y)  # Both are on the same device, so the addition works

3. Neural networks and the GPU: move your model onto the graphics card!

Moving a network to the GPU:

import torch
from torch import nn

def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# Create the network, then move it to the GPU
net = nn.Sequential(nn.Linear(3, 1))
net = net.to(device=try_gpu())

# The input data must be on the same device
X = torch.ones(2, 3, device=try_gpu())
print(net(X))

# Check which device the model parameters are on
print(net[0].weight.data.device)

Remember:

The model and the data must be on the same device
Recommendation: pick the device first, then move both the model and the data to it
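
That recommendation can be sketched as a device-agnostic pattern, which runs unchanged on machines with or without a GPU. The variable names are illustrative, and the CPU tensor stands in for a batch coming from a data loader.

```python
import torch
from torch import nn

# Pick the device once, up front
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Move the model to that device
net = nn.Sequential(nn.Linear(3, 1)).to(device)

# Data usually starts on the CPU; move each batch before the forward pass
X = torch.ones(2, 3)
y = net(X.to(device))
print(y.device)
```

Keeping the device in a single variable means the rest of the code never has to mention cuda or cpu explicitly.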

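VII. Summary: From "Basic User" to "Power User"!

Today we learned:

Layers and blocks:
- A layer is a component; a block is an assembled machine
- Custom block: inherit from nn.Module and implement forward
- Sequential: a simple block that runs layers in order
- Blocks can be nested and combined flexibly
Parameter management:
- Accessing parameters: state_dict, named_parameters
- Initializing parameters: built-in methods, custom methods
- Parameter tying: several layers share the same parameters
Deferred initialization:
- LazyLinear: initialized only on the first forward pass
- No need to compute input dimensions by hand
Custom layers:
- Without parameters: just implement forward
- With parameters: define them with nn.Parameter
Reading and writing files:
- Saving/loading tensors: torch.save, torch.load
- Saving/loading model parameters: state_dict
- A network with the same structure must be created before loading
GPU acceleration:
- Checking devices: torch.device, torch.cuda.is_available()
- Tensors on the GPU: the device argument, the to method, the cuda method
- Models on the GPU: net.to(device)
- The model and the data must be on the same device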