{ "cells": [ { "cell_type": "markdown", "id": "64715782-d3fc-42dd-8d82-9b97bbf8c7bf", "metadata": {}, "source": [ "

Graduate Course: Deep Learning
Lab Report

\n", "
\n", "
Course: Deep Learning (M502019B)
\n", "
Experiment: Convolutional Neural Networks
\n", "
Student ID: 25120323
\n", "
Name: 柯劲帆
\n", "
Instructor: 原继东
\n", "
Report date: August 13, 2025
\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "id": "74dbbe2c-7b00-40c7-964b-bd01e2835292", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pytorch version: 2.7.1+cu118\n", "CUDA version: 11.8\n", "CUDA device count: 1\n", "CUDA device name: NVIDIA TITAN Xp\n", "CUDA device capability: (6, 1)\n", "CUDA device memory: 11.90 GB\n", "CPU count: 8\n" ] } ], "source": [ "import os\n", "import numpy as np\n", "import torch\n", "from torch.autograd import Variable\n", "from torch.utils.data import Dataset, DataLoader, Subset, random_split\n", "from torch import nn\n", "from torchvision import datasets, transforms\n", "from PIL import Image\n", "from multiprocessing import cpu_count\n", "import matplotlib.pyplot as plt\n", "from tqdm.notebook import tqdm\n", "import pandas as pd\n", "import collections\n", "from typing import Literal, Union, Optional, List\n", "\n", "print('Pytorch version:',torch.__version__)\n", "if not torch.cuda.is_available():\n", " print('CUDA is_available:', torch.cuda.is_available())\n", "else:\n", " print('CUDA version:', torch.version.cuda)\n", " print('CUDA device count:', torch.cuda.device_count())\n", " print('CUDA device name:', torch.cuda.get_device_name())\n", " print('CUDA device capability:', torch.cuda.get_device_capability())\n", " print('CUDA device memory:', f'{torch.cuda.get_device_properties(0).total_memory/1024/1024/1024:.2f}', 'GB')\n", "print('CPU count:', cpu_count())\n", "\n", "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n", "seed = 42\n", "np.random.seed(seed)\n", "torch.manual_seed(seed)\n", "torch.cuda.manual_seed(seed)\n", "cpu_count = cpu_count()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1195679d-2174-425f-ab51-86b9ce66dc5c", "metadata": {}, "source": [ "# 1. 
二维卷积实验\n", "\n", "- 手写二维卷积的实现,并在至少一个数据集上进行实验,从训练时间、预测精度、Loss变化等角度分析实验结果(最好使用图表展示)(只用循环几轮即可)\n", "- 使用torch.nn实现二维卷积,并在至少一个数据集上进行实验,从训练时间、预测精度、Loss变化等角度分析实验结果(最好使用图表展示)\n", "- 不同超参数的对比分析(包括卷积层数、卷积核大小、batchsize、lr等)选其中至少1-2个进行分析\n", "- 使用PyTorch实现经典模型AlexNet并在至少一个数据集进行试验分析" ] }, { "cell_type": "code", "execution_count": 2, "id": "58d823c9-e690-4a63-bee6-49f9b1485e90", "metadata": {}, "outputs": [], "source": [ "torch.cuda.empty_cache()" ] }, { "cell_type": "markdown", "id": "c659cade-b5aa-4530-9537-bfbe9d4d35cf", "metadata": {}, "source": [ "创建数据集。\n", "\n", "**车辆分类数据集**\n", "\n", "- 输入图片,输出对应的类别\n", "- 共1358张车辆图片\n", "- 分别属于汽车、客车和货车三类\n", " - 汽车:779张\n", " - 客车:218张\n", " - 货车:360张\n", "- 每个类别随机取20-30%当作测试集\n", "- 各图片的大小不一,需要将图片拉伸到相同大小\n", "\n", "对于原数据集进行`8:2`划分处理。将各个类别的数据分别进行划分。\n", "\n", "这里已经将数据集划分完毕,将各部分数据的路径和列表保存在csv文件中。划分代码`dataset/Vehicles/split_dataset.py`内容如下:\n", "\n", "```python\n", "import os\n", "import random\n", "import pandas as pd\n", "\n", "train_list = list()\n", "test_list = list()\n", "\n", "root_dir = \"raw\"\n", "class_index = 0\n", "for vehicle in os.listdir(root_dir):\n", " img_list = [i for i in os.listdir(os.path.join(root_dir, vehicle)) if i.endswith(\".jpg\")]\n", " random.shuffle(img_list)\n", " split_num = int(len(img_list) * 0.8)\n", " for img in img_list[0 : split_num]:\n", " train_list.append([os.path.join(root_dir, vehicle, img), class_index])\n", " for img in img_list[split_num : ]:\n", " test_list.append([os.path.join(root_dir, vehicle, img), class_index])\n", " class_index += 1\n", "\n", "train_list.sort()\n", "test_list.sort()\n", "\n", "pd.DataFrame(data=train_list, columns=[\"Vehicle\", \"Label\"]).to_csv(\"./train.csv\", index=False)\n", "pd.DataFrame(data=test_list, columns=[\"Vehicle\", \"Label\"]).to_csv(\"./test.csv\", index=False)\n", "```" ] }, { "cell_type": "code", "execution_count": 3, "id": "0209565b-6409-4615-9f4e-d015a0fbe839", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"Vehicle Train Dataset Size: 1085\n", "Vehicle Test Dataset Size: 272\n", "A Train Sample:\n", "\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAIcAAACdCAYAAACeqmv3AAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjUsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvWftoOwAAAAlwSFlzAAAPYQAAD2EBqD+naQAACWRJREFUeJzt3VtIFV0bB/D/fH6mIpEoWiBhiAYGBpFUiJIdQCIJg+gmqAi6iIIQKuqi9KoIk8IMCqITXYpFVHSV3oRoEUlGlolCRphmmREZstd387nf7YzPHNfsQ+//B17s2WvPrL15XMdZawyllALRAv6T6AxQ8mJwkIjBQSIGB4kYHCRicJCIwUEiBgeJGBwk+quDY2RkBIZh4MKFC9rO2dXVBcMw0NXVpe2cySrpguPWrVswDAMvXrxIdFZC8e7dOzQ0NKCyshKZmZkwDAMjIyOJztaCki44/nbd3d1obW3F9PQ0ysrKEp0dWwyOONuxYwe+f/+O169fY8+ePYnOjq2UDI4/f/7gzJkzWLt2LZYsWYLs7GxUV1ejs7NT/MzFixdRVFSErKwsbNy4Ef39/ZY0AwMD2LVrF3Jzc5GZmYmKigo8ePDAMT+/fv3CwMAAJiYmHNPm5uZi8eLFjumSQUoGx48fP3D9+nXU1NTg/PnzaGpqwvj4OGpra/Hq1StL+jt37qC1tRWHDx/GqVOn0N/fj82bN2NsbCya5s2bN9iwYQPevn2LkydPoqWlBdnZ2aivr8e9e/ds89Pb24uysjK0tbXp/qqJpZLMzZs3FQD1/PlzMc3s7KyamZmZd+zbt29q6dKl6sCBA9Fjw8PDCoDKyspSo6Oj0eM9PT0KgGpoaIge27JliyovL1e/f/+OHotEIqqyslKVlpZGj3V2dioAqrOz03KssbHR03dtbm5WANTw8LCnz8VLSpYcaWlpWLRoEQAgEolgcnISs7OzqKiowMuXLy3p6+vrUVhYGH29bt06rF+/Ho8fPwYATE5O4unTp9i9ezemp6cxMTGBiYkJfP36FbW1tRgcHMSnT5/E/NTU1EAphaamJr1fNMFSMjgA4Pbt21i9ejUyMzORl5eH/Px8PHr0CFNTU5a0paWllmMrV66MdiE/fPgApRROnz6N/Pz8eX+NjY0AgC9fvoT6fZLRfxOdAT/u3r2L/fv3o76+HsePH0dBQQHS0tJw7tw5DA0NeT5fJBIBABw7dgy1tbULpikpKQmU51SUksHR3t6O4uJidHR0wDCM6PG5/3KzwcFBy7H3799jxYoVAIDi4mIAQHp6OrZu3ao/wykqJauVtLQ0AICKuTe6p6cH3d3dC6a/f//+vDZDb28venp6sG3bNgBAQUEBampqcO3aNXz+/Nny+fHxcdv8eOnKppKkLTlu3LiBJ0+eWI4fPXoUdXV16OjowM6dO7F9+3YMDw/j6tWrWLVqFX7+/Gn5TElJCaqqqnDo0CHMzMzg0qVLyMvLw4kTJ6Jprly5gqqqKpSXl+PgwYMoLi7G2NgYuru7MTo6ir6+PjGvvb292LRpExobGx0bpVNTU7h8+TIA4NmzZwCAtrY25OTkICcnB0eOHHHz88RHgntLFnNdWenv48ePKhKJqLNnz6qioiKVkZGh1qxZox4+fKj27dunioqKouea68o2NzerlpYWtXz5cpWRkaGqq6tVX1+f5dpDQ0Nq7969atmyZSo9PV0VFhaquro61d7eHk0TtCs7l6eF/mLzngwMpbhuhRaWkm0Oig8GB4kYHCRicJCIwUEiBgeJGBwkcj1Cahge4siwec88qmKT1u40QSjxxYIH4s4uB5p+WrgZ3mLJQSIGB4n0TLwZti/t3zRiD5iKupBKeEN8YT2Q7FVQL
PNXCToxwpKDRAwOEmmpVgL1KrRNCsvVk9/Wv6drWsp0u8/Zf2ebinbea8e8253IBZYcJGJwkIjBQSL/bY6Y+szLyJy3a5jOZNs+kd+z5sd9Do3Y83r5Yuasz3vhqYEicvrdAzY5WHKQjMFBonC6srHVgVNXNSatU6mtXJ7XuQj9J4W3KtD/LKERk18vtYqXCsiumvFUQ/8fSw4SMThIxOAgkf82R0ydZa5D50WcubLzcgnzJK3foXabitvTGQ2b1EFmAfw0CFzwNNS+AJYcJGJwkIjBQSItw+e6hsutbYyIuww4nthXdsI7j1PWbcZz/A6Jc/ictGJwkCjhO/vEdk+9dVXlwWIFuSj2LoTVM56mseV+uJ65XRlLDhIxOEjE4CCRnuFzc20X0jol+88GmcL3f1W3DLuGhbcxfL8f9IwlB4kYHCRicJBIz22CpilnFYkZu9BxAa8ScVGH4RBLuyyEy+ge92DJQSIGB4ncVys2e3DYrxl2GCv2OTrtZcQ5yKIrw2ZY3u4aYbGbYTDfUMZFTRQaBgeJGBwk8t2Vnb8w2LRZSpI9pUPbqjYP0+dmvn8RTT+ln24uSw4SMThIpGlW1k44+xCHV3HpuaFXXxY89F01Y8lBIgYHiRgcJHLf5rDb2tnynpc9t9wLbz9Rd+JxDcBD28Z8l52emYkolhwkYnCQiMFBIg/jHOYhcj0Z8L3BiOWudi8Z8rcKPF5tDvvBFfd3sSt/XzOKJQeJGBwkSvhCat/7c+laHmXZFNTLFX1Wio5TuMkxq82Sg0QMDhIxOEjkoc0xv6Kc9wxiy51gAXKkRdjbmkjXcfeWc3bctl+cThSs882Sg0QMDhIxOEjke5wjLrfP2TYdAjxTK4wce7mEruyYbhPUPbzPkoNEDA4S+Z6VDfJIq8SLw96iXtL6rGbC/plZcpCIwUEiBgeJEj9l74XfSjbhw/kOfN7E5nQa3n1OoWFwkCicaiXoYwl1Xj/VBbjBjE+HpNAwOEjE4CCR/z3B3N7ulfL1f4KXb9s9BNth4TT3IaXQMDhIxOAgUSjjHPMeqOyUOKXbJPF6RLW7pPZ70Lu/xByWHCRicJAolGrFU+dP10MTw2K30NrLwiVNvV5Pa6Xsxs9d/LgsOUjE4CARg4NEWtocQW68jmU3/Gv+bKB1yl74bSuENLKuaw81N1hykIjBQSIGB4n0jHOEsCepY9qkX1Wnic14iePCuYBz9iw5SMTgIJGm4fMEDHTbbPsc5OlWid/PzD+7asZSDXP4nIJgcJCIwUEiTW2OeO37aeefayZ/u8F/oyiejzJlyUEiBgeJtFQr5q5j8hfrySWsB0sHvBGMJQfJGBwkYnCQKKRFTXIl6noB9l9Hf8PCSzuCC6lJKwYHiRgcJIr7PqTB2iP/ltu//NE9icGSg0QMDhIl1fbW5irnr+31mqcb7N/Wchl2ZUkrBgeJGBwkMtS/dzybHLDkIBGDg0QMDhIxOEjE4CARg4NEDA4SMThIxOAg0f8AFhzOq9bpiZQAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "{'Image Type': , 'Image Shape': torch.Size([3, 32, 32]), 'Label Type': , 'Label Value': 1}\n" ] } ], "source": [ "class Vehicle(Dataset):\n", " def __init__(self, root: str=\"./dataset\", train: bool=True, transform=None):\n", " root = os.path.join(root, \"Vehicles\")\n", " csv_file = os.path.join(root, \"train.csv\" if train else \"test.csv\")\n", " self.data = pd.read_csv(csv_file)\n", " self.root = root\n", " self.transform = transform\n", "\n", " def __len__(self):\n", " return len(self.data)\n", " \n", " def __getitem__(self, index):\n", " row = self.data.iloc[index]\n", " img_name, label = row['Vehicle'], row['Label']\n", " img_path = os.path.join(self.root, img_name)\n", " image = Image.open(img_path)\n", " label = int(label)\n", " if self.transform:\n", " image = self.transform(image)\n", " return image, label\n", "\n", "image_size = 32\n", "transform = transforms.Compose(\n", " [\n", " transforms.ToTensor(),\n", " transforms.Resize((image_size, image_size), antialias=True),\n", " transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),\n", " ]\n", ")\n", "train_vehicle_dataset = Vehicle(root=\"./dataset\", train=True, transform=transform)\n", "test_vehicle_dataset = Vehicle(root=\"./dataset\", train=False, transform=transform)\n", "\n", "print('Vehicle Train Dataset Size:', len(train_vehicle_dataset))\n", "print('Vehicle Test Dataset Size:', len(test_vehicle_dataset))\n", "\n", "image, label = train_vehicle_dataset[0]\n", "sample = {\n", " 'Image Type': type(image),\n", " 'Image Shape': image.shape,\n", " 'Label Type': type(label),\n", " 'Label Value': label\n", "}\n", "print('A Train Sample:\\n')\n", "plt.figure(figsize=(1.5, 1.5))\n", "plt.imshow(image.permute(1, 2, 0).numpy().astype(np.uint8) / 255)\n", "plt.title(f\"Label: {label}\")\n", "plt.axis('off')\n", "plt.show()\n", "print(sample)\n", "\n", "num_classes = 3" ] }, { 
"cell_type": "markdown", "id": "a6917da1-4db8-4dba-b140-41d89e25ff4d", "metadata": {}, "source": [ "定义多分类任务的trainer。" ] }, { "cell_type": "code", "execution_count": 4, "id": "128a7e24-939d-4374-90d2-3f1a1fa682a7", "metadata": {}, "outputs": [], "source": [ "class MultiCLSTrainer():\n", " def __init__(\n", " self,\n", " model,\n", " train_dataset: Union[Dataset, DataLoader],\n", " eval_dataset: Union[Dataset, DataLoader],\n", " learning_rate: float,\n", " num_epochs: int,\n", " batch_size: int,\n", " weight_decay: float = 0.0,\n", " adam_beta1: float = 0.9,\n", " adam_beta2: float = 0.999,\n", " test_dataset: Union[Dataset, DataLoader] = None,\n", " plot: bool = True, \n", " print_test_result: bool = True,\n", " logging_steps: int = 1,\n", " eval_steps: int = 1,\n", " print_log_epochs: int = 1,\n", " print_eval: bool = True\n", " ):\n", " self.model = model\n", " self.learning_rate = learning_rate\n", " self.num_epochs = num_epochs\n", " self.batch_size = batch_size\n", " self.plot = plot\n", " self.print_test_result = print_test_result\n", " self.logging_steps = logging_steps\n", " self.eval_steps = eval_steps\n", " self.print_log_epochs = print_log_epochs\n", " self.print_eval = print_eval\n", " \n", " if isinstance(train_dataset, Dataset):\n", " self.train_dataloader = DataLoader(\n", " dataset=train_dataset, batch_size=batch_size, shuffle=True, \n", " num_workers=cpu_count-1, pin_memory=True\n", " )\n", " else:\n", " self.train_dataloader = train_dataset\n", " if isinstance(eval_dataset, Dataset):\n", " self.eval_dataloader = DataLoader(\n", " dataset=eval_dataset, batch_size=batch_size, shuffle=True, \n", " num_workers=cpu_count-1, pin_memory=True\n", " )\n", " else:\n", " self.eval_dataloader = eval_dataset\n", " if isinstance(test_dataset, Dataset):\n", " self.test_dataloader = DataLoader(\n", " dataset=test_dataset, batch_size=batch_size, shuffle=True, \n", " num_workers=cpu_count-1, pin_memory=True\n", " )\n", " else:\n", " self.test_dataloader = 
test_dataset\n", "\n", " self.total_train_steps = self.num_epochs * len(self.train_dataloader)\n", "\n", " self.optimizer = torch.optim.AdamW(\n", " model.parameters(), lr=learning_rate, \n", " weight_decay=weight_decay, betas=(adam_beta1, adam_beta2)\n", " )\n", " self.criterion = nn.CrossEntropyLoss()\n", "\n", " def train(self):\n", " train_loss_curve = []\n", " eval_loss_curve = []\n", " eval_acc_curve = []\n", " step = 0\n", " with tqdm(total=self.total_train_steps) as pbar:\n", " for epoch in range(self.num_epochs):\n", " total_train_loss = 0\n", " for x, targets in self.train_dataloader:\n", " x = x.to(device=device, dtype=torch.float32)\n", " targets = targets.to(device=device, dtype=torch.long)\n", "\n", " self.optimizer.zero_grad()\n", " output = self.model(x)\n", " loss = self.criterion(output, targets)\n", " total_train_loss += loss.item()\n", " if (step + 1) % self.logging_steps == 0:\n", " train_loss_curve.append((step + 1, loss.item()))\n", " \n", " loss.backward()\n", " self.optimizer.step()\n", " step += 1\n", " pbar.update(1)\n", "\n", " if self.eval_steps > 0 and (step + 1) % self.eval_steps == 0:\n", " avg_eval_loss, avg_eval_acc = self.eval()\n", " eval_loss_curve.append((step + 1, avg_eval_loss))\n", " eval_acc_curve.append((step + 1, avg_eval_acc))\n", " eval_info = {\n", " 'Epoch': f'{(step + 1) / len(self.train_dataloader):.1f}/{self.num_epochs}',\n", " 'Total Valid Loss': f'{avg_eval_loss:.2f}',\n", " 'Avg Valid Acc': f'{avg_eval_acc:.2%}'\n", " }\n", " if self.print_eval:\n", " print(eval_info)\n", " if self.print_log_epochs > 0 and (epoch + 1) % self.print_log_epochs == 0:\n", " log_info = {\n", " 'Epoch': f'{(step + 1) / len(self.train_dataloader):.1f}/{self.num_epochs}',\n", " 'Total Train Loss': f'{total_train_loss:.2f}'\n", " }\n", " print(log_info)\n", "\n", " return_info = {}\n", " if self.test_dataloader:\n", " test_acc = self.test()\n", " if self.print_test_result:\n", " print('Avg Test Acc:', f'{test_acc:.2%}')\n", " 
return_info['test_acc'] = test_acc\n", "        if self.plot:\n", "            self.plot_results(train_loss_curve, eval_loss_curve, eval_acc_curve)\n", "        return_info['curves'] = {\n", "            'train_loss_curve': train_loss_curve,\n", "            'eval_loss_curve': eval_loss_curve,\n", "            'eval_acc_curve': eval_acc_curve\n", "        }\n", "        return return_info\n", "\n", "    def eval(self):\n", "        self.model.eval()  # BatchNorm/Dropout must run in inference mode during evaluation\n", "        total_eval_loss = 0\n", "        total_eval_acc = 0\n", "        total_eval_samples = 0\n", "        with torch.inference_mode():\n", "            for x, targets in self.eval_dataloader:\n", "                x = x.to(device=device, dtype=torch.float32)\n", "                targets = targets.to(device=device, dtype=torch.long)\n", "                output = self.model(x)\n", "                loss = self.criterion(output, targets)\n", "                total_eval_loss += loss.item()\n", "                preds = nn.functional.softmax(output, dim=1).argmax(dim=1)\n", "                total_eval_acc += (preds == targets).float().sum().item()\n", "                total_eval_samples += targets.numel()\n", "        avg_eval_loss = total_eval_loss / len(self.eval_dataloader)\n", "        avg_eval_acc = total_eval_acc / total_eval_samples\n", "        self.model.train()  # restore training mode before returning to the training loop\n", "        return avg_eval_loss, avg_eval_acc\n", "\n", "    def test(self):\n", "        self.model.eval()\n", "        total_test_acc = 0\n", "        total_test_samples = 0\n", "        with torch.inference_mode():\n", "            for x, targets in self.test_dataloader:\n", "                x = x.to(device=device, dtype=torch.float32)\n", "                targets = targets.to(device=device, dtype=torch.long)\n", "                output = self.model(x)\n", "                preds = nn.functional.softmax(output, dim=1).argmax(dim=1)\n", "                total_test_acc += (preds == targets).float().sum().item()\n", "                total_test_samples += targets.numel()\n", "        avg_test_acc = total_test_acc / total_test_samples\n", "        self.model.train()\n", "        return avg_test_acc\n", "\n", "    def plot_results(self, train_loss_curve, eval_loss_curve, eval_acc_curve):\n", "        fig, axes = plt.subplots(1, 2, figsize=(10, 4))\n", "\n", "        train_log_steps, train_losses = zip(*train_loss_curve)\n", "        axes[0].plot(train_log_steps, train_losses, label='Training Loss', color='blue')\n", "        eval_log_steps, eval_losses = zip(*eval_loss_curve)\n", "        
axes[0].plot(eval_log_steps, eval_losses, label='Validation Loss', color='orange')\n", " axes[0].set_xlabel('Step')\n", " axes[0].set_ylabel('Loss')\n", " axes[0].set_title('Loss Curve')\n", " axes[0].legend()\n", " axes[0].grid(True, linestyle='--', linewidth=0.5, alpha=0.6)\n", "\n", " eval_log_steps, eval_accuracies = zip(*eval_acc_curve)\n", " axes[1].plot(eval_log_steps, eval_accuracies, label='Validation Accuracy', color='green', marker='o')\n", " axes[1].set_xlabel('Step')\n", " axes[1].set_ylabel('Accuracy')\n", " axes[1].set_title('Validation Accuracy Curve')\n", " axes[1].legend()\n", " axes[1].grid(True, linestyle='--', linewidth=0.5, alpha=0.6)\n", " \n", " plt.tight_layout()\n", " plt.show()" ] }, { "cell_type": "markdown", "id": "87f4b8dd-5718-42b7-823d-6b8414551bc2", "metadata": {}, "source": [ "## 1.1. 题目一\n", "\n", "**手写二维卷积的实现,并在至少一个数据集上进行实验,从训练时间、预测精度、Loss变化等角度分析实验结果(最好使用图表展示)(只用循环几轮即可)**" ] }, { "cell_type": "code", "execution_count": 5, "id": "0e6166df-cb5a-43c4-9456-60041d30e717", "metadata": {}, "outputs": [], "source": [ "torch.cuda.empty_cache()" ] }, { "cell_type": "markdown", "id": "df8196e0-cc68-465d-a05f-0325b376954b", "metadata": {}, "source": [ "在传统的二维卷积中,卷积是通过一个滑动的卷积核进行计算的,这就意味着会有大量的`for`循环,会增加计算的时间复杂度。\n", "\n", "对于拥有良好矩阵运算性能的GPU来说,上面的计算可以进行优化,即:将卷积核转化为矩阵,原图像数据也裁剪成对应的矩阵,叠加起来,这样需要多层`for`循环的卷积运算就可以由一次矩阵运算完成。\n", "\n", "具体运算流程如下:\n", "1. 将原图像进行`padding`操作;\n", "2. 使用`nn.functional.unfold()`将原图像矩阵重塑为`(batch_size, -1, in_channels, kernel_size, kernel_size)`,其中`-1`会被替代为每张图片裁剪成了多少块,等于传统二维卷积的卷积核循环滑动计算次数;\n", "3. 将卷积核重塑为对应图片碎块的卷积核矩阵;\n", "4. 将两者进行矩阵相乘,一次计算完毕,加上偏置`bias`;\n", "5. 
重塑相乘结果,转化为正确的输出矩阵。\n", "\n", "代码实现如下。" ] }, { "cell_type": "code", "execution_count": 6, "id": "080d8193-bb86-42b2-a685-fe51b8f7519e", "metadata": {}, "outputs": [], "source": [ "class My_Conv2d(nn.Module):\n", " def __init__(self, in_channels:int, out_channels:int, kernel_size:int, padding:int=0, bias=True):\n", " super(My_Conv2d, self).__init__()\n", " self.has_bias = bias\n", " self.in_channels = in_channels\n", " self.out_channels = out_channels\n", " self.kernel_size = kernel_size\n", " self.padding = padding\n", " self.weight = nn.Parameter(torch.Tensor(out_channels, in_channels, kernel_size, kernel_size))\n", " nn.init.xavier_uniform_(self.weight)\n", " if self.has_bias:\n", " self.bias = nn.Parameter(torch.zeros(out_channels, requires_grad=True, dtype=torch.float32))\n", "\n", " def forward(self, x):\n", " batch_size, _, input_height, input_width = x.shape\n", " if self.padding > 0:\n", " x = nn.functional.pad(x, (self.padding, self.padding, self.padding, self.padding))\n", " x = nn.functional.unfold(x, kernel_size=self.kernel_size)\n", " x = x.permute(0, 2, 1).contiguous()\n", " weight_unfold = self.weight.view(self.out_channels, -1).t()\n", " x = torch.matmul(x, weight_unfold)\n", " if self.has_bias:\n", " x += self.bias\n", " output_height = input_height + 2 * self.padding - self.kernel_size + 1\n", " output_width = input_width + 2 * self.padding - self.kernel_size + 1\n", " x = x.view(batch_size, output_height, output_width, self.out_channels).permute(0, 3, 1, 2).contiguous()\n", " return x\n", "\n", "\n", "class Model_1_1(nn.Module):\n", " def __init__(self, image_size: int, num_classes=3):\n", " super(Model_1_1, self).__init__()\n", " self.net = nn.Sequential(collections.OrderedDict([\n", " ('conv1', My_Conv2d(in_channels=3, out_channels=128, kernel_size=3, padding=1, bias=False)),\n", " ('bn1', nn.BatchNorm2d(128)),\n", " ('relu1', nn.ReLU(inplace=True)),\n", " ('conv2', My_Conv2d(in_channels=128, out_channels=512, kernel_size=3, padding=1, 
bias=False)),\n", " ('bn2', nn.BatchNorm2d(512)),\n", " ('relu2', nn.ReLU(inplace=True)),\n", " ('pool', nn.AvgPool2d(image_size)),\n", " ('flatten', nn.Flatten()),\n", " ('fc', nn.Linear(in_features=512, out_features=num_classes))\n", " ]))\n", "\n", " def forward(self, x):\n", " return self.net(x)" ] }, { "cell_type": "markdown", "id": "b63eb5a9-dea0-4e04-870d-2b48e59cd721", "metadata": {}, "source": [ "运行测试。" ] }, { "cell_type": "code", "execution_count": 7, "id": "ddceee2f-7084-4ae8-9c73-adb4b18b8f32", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9950ac2fadc44ba6b6f369265d354bc0", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/500 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "training_args = {\n", " 'train_dataset': train_vehicle_dataset,\n", " 'eval_dataset': test_vehicle_dataset,\n", " 'learning_rate': 2.0e-4,\n", " 'num_epochs': 100,\n", " 'batch_size': 256,\n", " 'weight_decay': 0.01,\n", " 'logging_steps': 3,\n", " 'eval_steps': 50,\n", " 'print_log_epochs': 0\n", "}\n", "\n", "model = Model_1_1(image_size=image_size, num_classes=num_classes).to(device)\n", "trainer = MultiCLSTrainer(model=model, **training_args)\n", "_ = trainer.train()" ] }, { "cell_type": "markdown", "id": "08c6d960-9d0e-4282-9c3f-b493140c14ae", "metadata": {}, "source": [ "模型能够正常收敛并且达到$90\\%$以上的准确率。" ] }, { "cell_type": "markdown", "id": "29a8a92f-d610-44c1-883a-64d253afd351", "metadata": {}, "source": [ "## 题目二\n", "\n", "**使用torch.nn实现二维卷积,并在至少一个数据集上进行实验,从训练时间、预测精度、Loss变化等角度分析实验结果(最好使用图表展示)**" ] }, { "cell_type": "code", "execution_count": 8, "id": "9b2e4cf3-297a-480a-a8c3-5b9a1a391d25", "metadata": {}, "outputs": [], "source": [ "torch.cuda.empty_cache()" ] }, { "cell_type": "markdown", "id": "2f8a9d69-cacf-4481-b17e-970d6c4745e4", "metadata": {}, "source": [ "使用上面定义的二维卷积进行车辆分类的训练和预测。\n", "\n", "同时,使用`nn.Conv2d`组建相同结构的模型,与手写二维卷积组建的模型进行比较。" ] }, { "cell_type": "code", 
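"execution_count": null, "id": "f0e1d2c3-b4a5-4697-8889-9a0b1c2d3e4f", "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "\n", "# Added sketch: check that the unfold-based convolution above matches nn.Conv2d\n", "# numerically when both use the same weight tensor (shapes are illustrative).\n", "ref = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1, bias=False)\n", "x = torch.randn(2, 3, 16, 16)\n", "\n", "# Reproduce My_Conv2d.forward with the reference weights: pad, unfold (im2col),\n", "# one matmul against the flattened kernels, then reshape back to NCHW.\n", "x_unf = nn.functional.unfold(nn.functional.pad(x, (1, 1, 1, 1)), kernel_size=3)\n", "out = x_unf.permute(0, 2, 1) @ ref.weight.view(8, -1).t()\n", "out = out.view(2, 16, 16, 8).permute(0, 3, 1, 2)\n", "print(torch.allclose(out, ref(x), atol=1e-5))" ] }, { "cell_type": "code", 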
"execution_count": 9, "id": "ad4d1aac-32e3-456b-b905-1708c72fe989", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1113bcd152a0499b90f9530a09eb1836", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/500 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "class Model_1_2(nn.Module):\n", " def __init__(self, image_size: int, num_classes=3):\n", " super(Model_1_2, self).__init__()\n", " self.net = nn.Sequential(collections.OrderedDict([\n", " ('conv1', nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3, padding=1, bias=False)),\n", " ('bn1', nn.BatchNorm2d(128)),\n", " ('relu1', nn.ReLU(inplace=True)),\n", " ('conv2', nn.Conv2d(in_channels=128, out_channels=512, kernel_size=3, padding=1, bias=False)),\n", " ('bn2', nn.BatchNorm2d(512)),\n", " ('relu2', nn.ReLU(inplace=True)),\n", " ('pool', nn.AvgPool2d(image_size)),\n", " ('flatten', nn.Flatten()),\n", " ('fc', nn.Linear(in_features=512, out_features=num_classes))\n", " ]))\n", "\n", " def forward(self, x):\n", " return self.net(x)\n", "\n", "\n", "training_args = {\n", " 'train_dataset': train_vehicle_dataset,\n", " 'eval_dataset': test_vehicle_dataset,\n", " 'learning_rate': 2.0e-4,\n", " 'num_epochs': 100,\n", " 'batch_size': 256,\n", " 'weight_decay': 0.1,\n", " 'logging_steps': 3,\n", " 'eval_steps': 50,\n", " 'print_log_epochs': 0\n", "}\n", "\n", "model = Model_1_2(image_size=image_size, num_classes=num_classes).to(device)\n", "trainer = MultiCLSTrainer(model=model, **training_args)\n", "_ = trainer.train()" ] }, { "cell_type": "markdown", "id": "5fc9101e-75f7-4b13-9abf-913e2db4543f", "metadata": {}, "source": [ "很显然,在车辆分类的任务上,手动实现的二维卷积和`nn.Conv2d`都能够完成任务,且准确率相差不大。\n", "\n", "但是`nn.Conv2d`的优化显然比手动实现的好,每个epoch的训练用时和显存占用情况都优于手动实现的二维卷积。" ] }, { "cell_type": "markdown", "id": "c4d35fc5-2af1-40b2-a762-80fc88d2401e", "metadata": {}, "source": [ "## 1.3. 
题目三\n", "\n", "**不同超参数的对比分析(包括卷积层数、卷积核大小、batchsize、lr等)选其中至少1-2个进行分析**" ] }, { "cell_type": "code", "execution_count": 10, "id": "8d936883-9a7e-4f38-a361-87c44f2663c7", "metadata": {}, "outputs": [], "source": [ "torch.cuda.empty_cache()" ] }, { "cell_type": "markdown", "id": "05937845-82a2-4261-9e8f-2da9d4f6e78d", "metadata": {}, "source": [ "接下来从**卷积层数**进行对比分析。分别构造具有1、2、3、4个卷积层的模型,进行车辆分类任务的训练和预测。为控制变量,卷积层的输出统一为512个特征,变量为卷积层层数和各卷积层之间out_channels的大小分配。" ] }, { "cell_type": "code", "execution_count": 11, "id": "583c3b5d-f807-4c51-9c6a-3212cf69ecec", "metadata": {}, "outputs": [], "source": [ "class Model_1_3(nn.Module):\n", " def __init__(self, conv_config: list[tuple[int]], image_size: int, num_classes=3):\n", " super(Model_1_3, self).__init__()\n", " assert len(conv_config) >= 1\n", " layers = collections.OrderedDict()\n", " for i, (in_c, out_c, k, s, p, d) in enumerate(conv_config, start=1):\n", " layers[f\"conv{i}\"] = nn.Conv2d(\n", " in_channels=in_c, out_channels=out_c, kernel_size=k, \n", " stride=s, padding=p, dilation=d, bias=False\n", " )\n", " layers[f\"bn{i}\"] = nn.BatchNorm2d(out_c)\n", " layers[f\"relu{i}\"] = nn.ReLU(inplace=True)\n", " layers[\"avgpool\"] = nn.AvgPool2d(image_size)\n", " layers[\"flatten\"] = nn.Flatten()\n", " layers[\"fc\"] = nn.Linear(in_features=512, out_features=num_classes)\n", " self.net = nn.Sequential(layers)\n", "\n", " def forward(self, x):\n", " return self.net(x)" ] }, { "cell_type": "code", "execution_count": 12, "id": "22e14159-fd98-43c3-8a54-3c68a63d7d51", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "模型1(1层卷积)开始训练:\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9667fd6f7b7d4b77ac575b5d89ff6023", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/500 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "conv_configs = [\n", " [(3, 512, 3, 1, 1, 1),],\n", " [(3, 128, 3, 1, 1, 1), (128, 512, 3, 1, 1, 
1),],\n", " [(3, 64, 3, 1, 1, 1), (64, 256, 3, 1, 1, 1), (256, 512, 3, 1, 1, 1),],\n", " [(3, 64, 3, 1, 1, 1), (64, 128, 3, 1, 1, 1), (128, 256, 3, 1, 1, 1), (256, 512, 3, 1, 1, 1),],\n", "]\n", "plot_colors = ['blue', 'green', 'orange', 'purple']\n", "\n", "fig, axes = plt.subplots(1, 2, figsize=(7, 3.5))\n", "\n", "axes[0].set_xlabel('Step')\n", "axes[0].set_ylabel('Loss')\n", "axes[0].set_title('Validation Loss Curve')\n", "axes[0].grid(True, linestyle='--', linewidth=0.5, alpha=0.6)\n", "axes[1].set_xlabel('Step')\n", "axes[1].set_ylabel('Accuracy')\n", "axes[1].set_title('Validation Accuracy Curve')\n", "axes[1].grid(True, linestyle='--', linewidth=0.5, alpha=0.6)\n", "\n", "training_args = {\n", " 'train_dataset': train_vehicle_dataset,\n", " 'eval_dataset': test_vehicle_dataset,\n", " 'learning_rate': 2.0e-5,\n", " 'num_epochs': 100,\n", " 'batch_size': 256,\n", " 'weight_decay': 0.1,\n", " 'logging_steps': 3,\n", " 'eval_steps': 50,\n", " 'plot': False,\n", " 'print_log_epochs': 0,\n", " 'print_eval': False\n", "}\n", "\n", "for index, conv_config in enumerate(conv_configs):\n", " model = Model_1_3(conv_config, image_size=image_size, num_classes=num_classes).to(device)\n", " \n", " print(f\"模型{index + 1}({len(conv_config)}层卷积)开始训练:\")\n", " trainer = MultiCLSTrainer(model=model, **training_args)\n", " curves = trainer.train()['curves']\n", "\n", " eval_log_steps, eval_losses = zip(*curves['eval_loss_curve'])\n", " axes[0].plot(\n", " eval_log_steps, eval_losses,\n", " label=f\"conv layers={len(conv_config)}\", color=plot_colors[index]\n", " )\n", " eval_log_steps, eval_accuracies = zip(*curves['eval_acc_curve'])\n", " axes[1].plot(\n", " eval_log_steps, eval_accuracies, \n", " label=f\"conv layers={len(conv_config)}\", color=plot_colors[index]\n", " )\n", "\n", "axes[0].legend()\n", "axes[1].legend()\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "388f5b62-31b5-4b2f-be4b-145095fcd03f", "metadata": {}, "source": [ 
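"One way to ground this depth comparison (added sketch): tally the learnable convolution weights in each configuration — the counts below ignore BatchNorm parameters and the final fully connected layer, and all kernels here are 3×3 with `bias=False`:\n", "\n", "```python\n", "configs = [\n", "    [(3, 512)],\n", "    [(3, 128), (128, 512)],\n", "    [(3, 64), (64, 256), (256, 512)],\n", "    [(3, 64), (64, 128), (128, 256), (256, 512)],\n", "]\n", "for cfg in configs:\n", "    params = sum(in_c * out_c * 3 * 3 for in_c, out_c in cfg)\n", "    print(len(cfg), 'conv layer(s):', params)\n", "```\n", "\n", 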
"模型训练的显存占用、单个epoch的训练/测试时长都随着卷积层的数量增加而增加。\n", "\n", "从曲线上看,模型训练的稳定程度随着卷积层数量的增加而增加。\n", "\n", "当卷积层数量逐渐增加,正确率提高,说明模型的拟合能力也逐渐提高。" ] }, { "cell_type": "markdown", "id": "dbd99d1b-9599-44e3-825a-47013608523a", "metadata": {}, "source": [ "对**卷积核大小**进行比较分析。分别构造卷积核大小为3、5、7、9的模型,进行车辆识别任务的训练和预测。" ] }, { "cell_type": "code", "execution_count": 13, "id": "4bca87f1-f637-4651-a4ed-6256fe1d4a26", "metadata": {}, "outputs": [], "source": [ "torch.cuda.empty_cache()" ] }, { "cell_type": "code", "execution_count": 14, "id": "6edf7de8-45ee-4735-93a1-dae1b711cf0f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "模型1(卷积核大小=3)开始训练:\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e13bf785dfed43c58c6bb48c23618c40", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/500 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "conv_configs = [\n", " [(3, 128, 3, 1, 1, 1), (128, 512, 3, 1, 1, 1),],\n", " [(3, 128, 5, 1, 2, 1), (128, 512, 5, 1, 2, 1),],\n", " [(3, 128, 7, 1, 3, 1), (128, 512, 7, 1, 3, 1),],\n", " [(3, 128, 9, 1, 4, 1), (128, 512, 9, 1, 4, 1),]\n", "]\n", "plot_colors = ['blue', 'green', 'orange', 'purple']\n", "\n", "fig, axes = plt.subplots(1, 2, figsize=(7, 3.5))\n", "\n", "axes[0].set_xlabel('Step')\n", "axes[0].set_ylabel('Loss')\n", "axes[0].set_title('Validation Loss Curve')\n", "axes[0].grid(True, linestyle='--', linewidth=0.5, alpha=0.6)\n", "axes[1].set_xlabel('Step')\n", "axes[1].set_ylabel('Accuracy')\n", "axes[1].set_title('Validation Accuracy Curve')\n", "axes[1].grid(True, linestyle='--', linewidth=0.5, alpha=0.6)\n", "\n", "training_args = {\n", " 'train_dataset': train_vehicle_dataset,\n", " 'eval_dataset': test_vehicle_dataset,\n", " 'learning_rate': 1.0e-5,\n", " 'num_epochs': 100,\n", " 'batch_size': 256,\n", " 'weight_decay': 0.1,\n", " 'logging_steps': 3,\n", " 'eval_steps': 50,\n", " 'plot': False,\n", " 'print_log_epochs': 0,\n", " 'print_eval': 
False\n", "}\n", "\n", "for index, conv_config in enumerate(conv_configs):\n", " model = Model_1_3(conv_config, image_size=image_size, num_classes=num_classes).to(device)\n", " \n", " print(f\"模型{index + 1}(卷积核大小={conv_config[0][2]})开始训练:\")\n", " trainer = MultiCLSTrainer(model=model, **training_args)\n", " curves = trainer.train()['curves']\n", "\n", " eval_log_steps, eval_losses = zip(*curves['eval_loss_curve'])\n", " axes[0].plot(\n", " eval_log_steps, eval_losses,\n", " label=f\"kernel size={conv_config[0][2]}\", color=plot_colors[index]\n", " )\n", " eval_log_steps, eval_accuracies = zip(*curves['eval_acc_curve'])\n", " axes[1].plot(\n", " eval_log_steps, eval_accuracies, \n", " label=f\"kernel size={conv_config[0][2]}\", color=plot_colors[index]\n", " )\n", "\n", "axes[0].legend()\n", "axes[1].legend()\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "cdff91d8-f1ac-4f1f-81d7-f2d8a82e8c80", "metadata": {}, "source": [ "随着卷积核的增大,训练时长增加,显存占用也增加,性能也增加。这是由于卷积核增大,参数量增加的结果。" ] }, { "cell_type": "markdown", "id": "2b049ba1-b908-4e9b-ac37-3db4abeb5df2", "metadata": {}, "source": [ "## 1.4. 
题目四\n", "\n", "**使用PyTorch实现经典模型AlexNet并在至少一个数据集进行试验分析**" ] }, { "cell_type": "code", "execution_count": 15, "id": "f91714e7-a95c-496f-94a9-5a170e4df6c6", "metadata": {}, "outputs": [], "source": [ "torch.cuda.empty_cache()" ] }, { "cell_type": "markdown", "id": "9b52cd44-23ef-43a7-9f19-b12193cd430b", "metadata": {}, "source": [ "构建AlexNet网络。为匹配车辆识别数据集,输出维度为3。" ] }, { "cell_type": "code", "execution_count": 16, "id": "b9bd9b7d-ae8c-44e3-bb78-f332018b8c89", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f18a1422c7bd4516bad0cf07665e720b", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/500 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "class AlexNet(nn.Module):\n", " def __init__(self):\n", " super(AlexNet, self).__init__()\n", " self.features = nn.Sequential(collections.OrderedDict([\n", " ('conv1', nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=0)), # 55 * 55\n", " ('relu1', nn.ReLU(inplace=True)),\n", " ('pool1', nn.MaxPool2d(kernel_size=3, stride=2)), # 27 * 27\n", " ('conv2', nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, stride=1, padding=2)), # 27 * 27\n", " ('relu2', nn.ReLU(inplace=True)),\n", " ('pool2', nn.MaxPool2d(kernel_size=3, stride=2)), # 13 * 13\n", " ('conv3', nn.Conv2d(in_channels=256, out_channels=384, kernel_size=3, stride=1, padding=1)), # 13 * 13\n", " ('relu3', nn.ReLU(inplace=True)),\n", " ('conv4', nn.Conv2d(in_channels=384, out_channels=384, kernel_size=3, stride=1, padding=1)), # 13 * 13\n", " ('relu4', nn.ReLU(inplace=True)),\n", " ('conv5', nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, stride=1, padding=1)), # 13 * 13\n", " ('relu5', nn.ReLU(inplace=True)),\n", " ('pool5', nn.MaxPool2d(kernel_size=3, stride=2)), # 6 * 6\n", " ]))\n", " self.classifier = nn.Sequential(collections.OrderedDict([\n", " ('fc6', nn.Linear(in_features=9216, out_features=4096)),\n", " ('relu6', 
nn.ReLU(inplace=True)),\n", " ('dropout6', nn.Dropout(p=0.5)),\n", " ('fc7', nn.Linear(in_features=4096, out_features=4096)),\n", " ('relu7', nn.ReLU(inplace=True)),\n", " ('dropout7', nn.Dropout(p=0.5)),\n", " ('fc8', nn.Linear(in_features=4096, out_features=3)),\n", " ]))\n", "\n", " def forward(self, x):\n", " x = self.features(x)\n", " x = torch.flatten(x, 1)\n", " x = self.classifier(x)\n", " return x\n", "\n", "\n", "alexnet_transform = transforms.Compose(\n", " [\n", " transforms.ToTensor(),\n", " transforms.Resize((227, 227), antialias=True),\n", " transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),\n", " ]\n", ")\n", "train_alexnet_dataset = Vehicle(root=\"./dataset\", train=True, transform=alexnet_transform)\n", "test_alexnet_dataset = Vehicle(root=\"./dataset\", train=False, transform=alexnet_transform)\n", "\n", "training_args = {\n", " 'train_dataset': train_alexnet_dataset,\n", " 'eval_dataset': test_alexnet_dataset,\n", " 'learning_rate': 1.0e-5,\n", " 'num_epochs': 100,\n", " 'batch_size': 256,\n", " 'weight_decay': 5.0e-5,\n", " 'logging_steps': 3,\n", " 'eval_steps': 50,\n", " 'print_log_epochs': 0\n", "}\n", "\n", "model = AlexNet().to(device)\n", "trainer = MultiCLSTrainer(model=model, **training_args)\n", "_ = trainer.train()" ] }, { "cell_type": "markdown", "id": "4f6d3679-fd02-4b5b-9f06-5dcd0e53b301", "metadata": {}, "source": [ "The experiment shows that AlexNet converges normally on the vehicle recognition dataset, reaching over $90\\%$ accuracy. However, because of the model's high capacity, overfitting appears toward the end of training; raising the dropout probability or adding data augmentation could mitigate this." ] }, { "attachments": {}, "cell_type": "markdown", "id": "5ac6541c-c367-4fc1-a6b7-1cf2c3cc0062", "metadata": {}, "source": [ "# 2. 
Dilated Convolution Experiment\n", "\n", "- Implement dilated convolution with torch.nn; the dilation rates must satisfy the HDC (hybrid dilated convolution) condition (e.g. 1, 2, 5) and multiple layers must be stacked. Run experiments on at least one dataset and analyze the results in terms of training time, prediction accuracy, loss curves, etc. (preferably with charts).\n", "- Compare the results of the dilated convolution model with those of the plain convolution model, again in terms of training time, prediction accuracy, loss curves, etc.\n", "- Comparative analysis of different hyperparameters (number of convolution layers, kernel size, choice of dilation rates, batch size, learning rate, etc.); analyze at least 1-2 of them (optional)." ] }, { "cell_type": "code", "execution_count": 17, "id": "c5111a37-674f-4468-a8cb-4a267814e645", "metadata": {}, "outputs": [], "source": [ "torch.cuda.empty_cache()" ] }, { "cell_type": "markdown", "id": "8aaefc6f-3866-4bf2-b070-b2a03f3eeaa6", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "A comparative analysis of **dilation**. We build models with dilation schedules of\n", "- \\[\\[1, 1, 1\\], \\[1, 1, 1\\]\\] (plain convolution)\n", "- \\[\\[1, 2, 5\\], \\[1, 2, 5\\]\\]\n", "- \\[\\[1, 3, 5\\], \\[1, 3, 5\\]\\]\n", "- \\[\\[1, 3, 7\\], \\[1, 3, 7\\]\\]\n", "\n", "and train and evaluate each on the vehicle classification task. To control variables, the `channels` progression is fixed at \\[3, 16, 32, 64, 128, 256, 512\\] for every model." ] }, { "cell_type": "code", "execution_count": 18, "id": "b31ad185-f854-4ae4-97dc-3044ab157288", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "模型1(dilation=[1, 1, 1])开始训练:\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "67473f0f22bb4aadb3634aa5ed099f5a", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/500 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "conv_configs = [\n", " [(3, 16, 3, 1, 1, 1), (16, 32, 3, 1, 1, 1), (32, 64, 3, 1, 1, 1), \n", " (64, 128, 3, 1, 1, 1), (128, 256, 3, 1, 1, 1), (256, 512, 3, 1, 1, 1),],\n", " [(3, 16, 3, 1, 1, 1), (16, 32, 3, 1, 2, 2), (32, 64, 3, 1, 5, 5), \n", " (64, 128, 3, 1, 1, 1), (128, 256, 3, 1, 2, 2), (256, 512, 3, 1, 5, 5),],\n", " [(3, 16, 3, 1, 1, 1), (16, 32, 3, 1, 3, 3), (32, 64, 3, 1, 5, 5), \n", " (64, 128, 3, 1, 1, 1), (128, 256, 3, 1, 3, 3), (256, 512, 3, 1, 5, 5),],\n", " [(3, 16, 3, 1, 1, 1), (16, 32, 3, 1, 3, 3), (32, 64, 3, 1, 7, 7), \n", " (64, 128, 3, 1, 1, 1), (128, 256, 3, 1, 3, 3), (256, 512, 3, 1, 7, 7),],\n", "]\n", 
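"\n", "# Added note (sketch): each config tuple is assumed to be\n", "# (in_channels, out_channels, kernel_size, stride, padding, dilation).\n", "# With stride=1, kernel_size=3 and padding == dilation, every layer preserves\n", "# the spatial size, since out = in + 2*padding - dilation*(kernel_size - 1).\n", "for config in conv_configs:\n", "    for in_c, out_c, k, s, p, d in config:\n", "        assert s == 1 and p == d and d * (k - 1) == 2 * p\n", 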
"plot_colors = ['blue', 'green', 'orange', 'purple']\n", "\n", "fig, axes = plt.subplots(1, 2, figsize=(7, 3.5))\n", "\n", "axes[0].set_xlabel('Step')\n", "axes[0].set_ylabel('Loss')\n", "axes[0].set_title('Validation Loss Curve')\n", "axes[0].grid(True, linestyle='--', linewidth=0.5, alpha=0.6)\n", "axes[1].set_xlabel('Step')\n", "axes[1].set_ylabel('Accuracy')\n", "axes[1].set_title('Validation Accuracy Curve')\n", "axes[1].grid(True, linestyle='--', linewidth=0.5, alpha=0.6)\n", "\n", "training_args = {\n", " 'train_dataset': train_vehicle_dataset,\n", " 'eval_dataset': test_vehicle_dataset,\n", " 'learning_rate': 5.0e-6,\n", " 'num_epochs': 100,\n", " 'batch_size': 256,\n", " 'weight_decay': 0.1,\n", " 'logging_steps': 3,\n", " 'eval_steps': 20,\n", " 'plot': False,\n", " 'print_log_epochs': 0,\n", " 'print_eval': False\n", "}\n", "\n", "for index, conv_config in enumerate(conv_configs):\n", " model = Model_1_3(conv_config, image_size=image_size, num_classes=num_classes).to(device)\n", " dilation_str = f'dilation=[{conv_config[0][5]}, {conv_config[1][5]}, {conv_config[2][5]}]'\n", " \n", " print(f\"模型{index + 1}({dilation_str})开始训练:\")\n", " trainer = MultiCLSTrainer(model=model, **training_args)\n", " curves = trainer.train()['curves']\n", "\n", " eval_log_steps, eval_losses = zip(*curves['eval_loss_curve'])\n", " axes[0].plot(\n", " eval_log_steps, eval_losses,\n", " label=dilation_str, color=plot_colors[index]\n", " )\n", " eval_log_steps, eval_accuracies = zip(*curves['eval_acc_curve'])\n", " axes[1].plot(\n", " eval_log_steps, eval_accuracies, \n", " label=dilation_str, color=plot_colors[index]\n", " )\n", "\n", "axes[0].legend()\n", "axes[1].legend()\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "e02fc389-bdf4-467f-8e14-511a66255602", "metadata": {}, "source": [ "Judging from the loss and accuracy curves, all four dilation configurations eventually converge to similarly good results (around $90\\%$ accuracy); a likely reason is that the models' capacity is high relative to the small amount of training data.\n", "\n", "In terms of convergence speed, however, at the left end of the accuracy curve the model with dilation $[1, 3, 7]$ already reaches about $70\\%$ accuracy at the first evaluation, whereas plain convolution is still below $50\\%$; the other two configurations fall in between. The validation loss shows the same pattern.\n", "\n", "This suggests that models with a wider spread of dilation rates converge faster: dilated convolution enlarges the receptive field without adding parameters, giving the model stronger fitting ability early in training.\n", "\n", "In terms of training speed, the dilated models take noticeably longer than plain convolution; a plausible explanation is that gathering the spatially scattered inputs of a dilated kernel has a less favorable memory access pattern and therefore costs extra compute." ] }, { "attachments": {}, "cell_type": "markdown", "id": "995aecd0-3490-44f5-8733-7c626e368e6d", "metadata": {}, "source": [ "# 3. Residual Network Experiment\n", "\n", "- Implement the residual network with the given structure, run experiments on at least one dataset, and analyze the results in terms of training time, prediction accuracy, loss curves, etc. (preferably with charts)\n" ] }, { "cell_type": "code", "execution_count": 19, "id": "97f25712-792e-4428-9856-290419980557", "metadata": {}, "outputs": [], "source": [ "torch.cuda.empty_cache()" ] }, { "cell_type": "code", "execution_count": 20, "id": "ae6a09f9-f568-4f2f-bba4-f247106c367a", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9833117f476b460aa3b2d5d051464f1c", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/500 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "class BasicResidualBlock(nn.Module):\n", " def __init__(self, in_channels, out_channels, stride=1):\n", " super(BasicResidualBlock, self).__init__()\n", " self.conv1 = nn.Sequential(\n", " nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False),\n", " nn.BatchNorm2d(out_channels),\n", " nn.ReLU(inplace=True)\n", " )\n", " self.conv2 = nn.Sequential(\n", " nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False),\n", " nn.BatchNorm2d(out_channels),\n", " )\n", " self.relu = nn.ReLU(inplace=True)\n", " self.shortcut = nn.Sequential()\n", " if stride != 1 or in_channels != out_channels:\n", " self.shortcut = nn.Sequential(\n", " nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),\n", " nn.BatchNorm2d(out_channels)\n", " )\n", "\n", " def forward(self, x):\n", " return self.relu(self.conv2(self.conv1(x)) + self.shortcut(x))\n", "\n", "\n", "class ResNet(nn.Module):\n", " def __init__(self, num_classes=3):\n", " super(ResNet, 
self).__init__()\n", " self.features = nn.Sequential(collections.OrderedDict([\n", " ('conv1', nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1, bias=False)),\n", " ('bn1', nn.BatchNorm2d(64)),\n", " ('relu1', nn.ReLU(inplace=True)),\n", " ('resnet_block2', BasicResidualBlock(in_channels=64, out_channels=64)),\n", " ('resnet_block3', BasicResidualBlock(in_channels=64, out_channels=64)),\n", " ('resnet_block4', BasicResidualBlock(in_channels=64, out_channels=128, stride=2)),\n", " ('resnet_block5', BasicResidualBlock(in_channels=128, out_channels=128)),\n", " ('resnet_block6', BasicResidualBlock(in_channels=128, out_channels=256, stride=2)),\n", " ('resnet_block7', BasicResidualBlock(in_channels=256, out_channels=256)),\n", " ('resnet_block8', BasicResidualBlock(in_channels=256, out_channels=512, stride=2)),\n", " ('resnet_block9', BasicResidualBlock(in_channels=512, out_channels=512)),\n", " ('pool', nn.AvgPool2d(4)),\n", " ]))\n", " self.classifier = nn.Linear(in_features=512, out_features=num_classes)\n", "\n", " def forward(self, x):\n", " x = self.features(x)\n", " x = self.classifier(torch.flatten(x, 1))\n", " return x\n", "\n", "\n", "training_args = {\n", " 'train_dataset': train_vehicle_dataset,\n", " 'eval_dataset': test_vehicle_dataset,\n", " 'learning_rate': 5.0e-6,\n", " 'num_epochs': 100,\n", " 'batch_size': 256,\n", " 'weight_decay': 0.1,\n", " 'logging_steps': 3,\n", " 'eval_steps': 50,\n", " 'print_log_epochs': 0\n", "}\n", "model = ResNet(num_classes=num_classes).to(device)\n", "trainer = MultiCLSTrainer(model=model, **training_args)\n", "_ = trainer.train()" ] }, { "cell_type": "markdown", "id": "82793282-b7c0-49e4-9292-f5f58738d0e6", "metadata": {}, "source": [ "The experiment shows that the residual network outperforms the plain convolutional network. The skip connections let each block pass its input through largely unchanged, so features of the original image are preserved rather than lost in the stacked convolutions, and gradients also flow more easily through the deep network.\n", "\n", "However, because the network's capacity is high, the model overfits; adding dropout could mitigate this." ] }, { "cell_type": "markdown", "id": "dbd67b91-d8b8-4ffc-882a-3d47ee3b9d87", "metadata": { "jp-MarkdownHeadingCollapsed": true }, "source": [ "# Reflections\n", "\n", 
"Through this convolutional neural network lab, I deepened my understanding of how CNNs work. By designing and training models with different architectures on multiple datasets and comparing their performance, I came to better appreciate the role each component of a CNN plays.\n", "\n", "In the experiments, I implemented a custom 2-D convolution and compared it with PyTorch's built-in 2-D convolution on the vehicle classification task. This gave me a concrete understanding of how convolution is rewritten as matrix multiplication and how GPUs accelerate such matrix operations. I also studied how hyperparameters such as the number of convolution layers and the kernel size affect model performance.\n", "\n", "Studying the dilation parameter made me realize that small differences in design choices can lead to significant performance differences. In particular, when comparing plain convolution with dilated convolution under different dilation schedules, I observed that configurations with a wider spread of dilation rates converged noticeably faster in the early stage of training while still fitting the data well and not being more prone to overfitting. This insight helped me understand how to balance a network's learning speed against its generalization ability.\n", "\n", "The residual network experiment taught me the importance of architectural design. Residual connections effectively alleviate the vanishing-gradient problem of deep networks while preserving more of the original feature information; this was clearly reflected in the experiment, where the residual network outperformed the plain convolutional network on almost every metric.\n", "\n", "During the experiments I also ran into challenges, such as tuning the network to avoid overfitting and understanding the theory behind the different architectures. Through repeated trials and reading the related literature, I gradually overcame these difficulties and gained a deeper grasp of the concepts.\n", "\n", "Overall, through this lab I learned the building blocks of convolutional neural networks, how different hyperparameters affect model performance, and the ideas behind representative CNN architectures, laying a foundation for later course projects. I will continue to study CNNs and apply them to more practical problems; this lab has strengthened both my technical skills and my enthusiasm for deep learning." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.13" } }, "nbformat": 4, "nbformat_minor": 5 }