Ubuntu PyTorch调试模型如何操作

作者：袖梨 2026-07-03

在Ubuntu上使用PyTorch调试模型时，可以遵循以下步骤：

1. 环境准备

安装Python和PyTorch：

sudo apt updatesudo apt install python3 python3-pippip3 install torch torchvision torchaudio

安装调试工具：
- pdb：Python的内置调试器。
- ipdb：基于IPython的增强版调试器。
- torch.autograd.set_detect_anomaly(True)：启用PyTorch的自动异常检测。

2. 编写调试代码

使用`pdb`

在代码中插入pdb.set_trace()来设置断点。

import torchimport torch.nn as nnclass SimpleModel(nn.Module):def __init__(self):super(SimpleModel, self).__init__()self.fc = nn.Linear(10, 5)def forward(self, x):x = self.fc(x)pdb.set_trace()# 设置断点return xmodel = SimpleModel()input_tensor = torch.randn(1, 10)output = model(input_tensor)

使用`ipdb`

同样插入ipdb.set_trace()，但会提供更好的交互体验。

import ipdb# ... 同上 ...def forward(self, x):x = self.fc(x)ipdb.set_trace()# 设置断点return x

3. 运行代码并调试

在终端中运行你的Python脚本：

python3 your_script.py

程序会在断点处暂停，你可以使用以下命令进行调试：

n：下一步
s：进入函数
c：继续执行
l：显示当前代码位置
p <variable>：打印变量值
q：退出调试

4. 使用日志记录

在关键步骤添加日志输出，帮助理解程序的执行流程和数据状态。

import logginglogging.basicConfig(level=logging.DEBUG)# ... 同上 ...def forward(self, x):logging.debug(f"Input shape: {x.shape}")x = self.fc(x)logging.debug(f"Output shape: {x.shape}")return x

5. 可视化工具

使用TensorBoard或其他可视化工具来监控模型的训练过程和性能指标。

from torch.utils.tensorboard import SummaryWriterwriter = SummaryWriter('runs/simple_experiment')# ... 在训练循环中使用writer记录数据 ...

6. 单元测试

编写单元测试来验证模型的各个部分是否按预期工作。

import unittestclass TestSimpleModel(unittest.TestCase):def test_forward(self):model = SimpleModel()input_tensor = torch.randn(1, 10)output = model(input_tensor)self.assertEqual(output.shape, (1, 5))if __name__ == '__main__':unittest.main()

7. 使用PyTorch Profiler

分析模型的性能瓶颈，优化计算图。

from torch.profiler import profile, record_function, ProfilerActivitywith profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:with record_function("model_inference"):output = model(input_tensor)print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))