Introduction: the paper improves on normalization layers by proposing spectral normalization (SN-GAN), which increases the training stability of the discriminator. Advantages: 1. the Lipschitz constant is the only hyperparameter that needs tuning; 2. it is simple to implement and adds very little computational cost.
The original (2014) GAN objective is $E_{x\sim q_{data}}[\log D(x)] + E_{x'\sim p_G}[\log(1-D(x'))]$.
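As an aside, this objective is just the (sign-flipped) binary cross-entropy loss of the discriminator. A minimal PyTorch sketch, assuming a hypothetical discriminator `D` that ends in a sigmoid and placeholder batches `x_real` / `x_fake`:

```python
import torch

def discriminator_objective(D, x_real, x_fake):
    # E_{x ~ q_data}[log D(x)] + E_{x' ~ p_G}[log(1 - D(x'))]
    # D is assumed to end in a sigmoid, so its outputs lie in (0, 1);
    # the discriminator maximizes this quantity (minimizes its negation).
    return torch.log(D(x_real)).mean() + torch.log(1.0 - D(x_fake)).mean()
```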
For a single input sample $x$, which may come either from the real data distribution or from the generator's output distribution, the contribution to the objective is $q_{data}(x)\log D(x) + p_G(x)\log(1-D(x))$.
With the generator fixed, the optimal discriminator is found by setting the derivative with respect to $D(x)$ to zero: $\frac{q_{data}(x)}{D(x)} - \frac{p_G(x)}{1-D(x)} = 0$.
Solving this equation, i.e. $q_{data}(x)\,(1-D(x)) = p_G(x)\,D(x)$ (the paper states the result directly without derivation), gives $D^*(x) = \frac{q_{data}(x)}{q_{data}(x)+p_G(x)}$.
Since the last layer of the original GAN's discriminator is a sigmoid, we can write $D^*(x) = \frac{q_{data}(x)}{q_{data}(x)+p_G(x)} = \mathrm{sigmoid}(f^*(x))$,
where $f^*(x) = \log q_{data}(x) - \log p_G(x)$, whose gradient is $\nabla_x f^*(x) = \frac{1}{q_{data}(x)}\nabla_x q_{data}(x) - \frac{1}{p_G(x)}\nabla_x p_G(x)$.
However, this gradient can be unbounded or even incomputable, so in practice some regularization has to be imposed. A series of successful works has addressed this (e.g. Guo-Jun Qi's LS-GAN, WGAN, WGAN-GP) by restricting the Lipschitz constant of the discriminator through regularization, i.e. by solving $\arg\max_{\|f\|_{Lip}\leq K} V(G, D)$,
where $\|f\|_{Lip}$ is the Lipschitz norm of $f$: the smallest constant $M$ such that for any $x, x'$, $\frac{\|f(x)-f(x')\|_2}{\|x-x'\|_2} \leq M$.
To constrain the discriminator $f$ to satisfy $\|f\|_{Lip}\leq 1$, we use the composition property $\|g_1 \circ g_2\|_{Lip} \leq \|g_1\|_{Lip} \cdot \|g_2\|_{Lip}$:
it is then sufficient to make the Lipschitz constant of every layer no larger than 1. For a linear layer $g(h) = Wh$,
we have $\|g\|_{Lip} = \sigma(W)$, where $\sigma(W)$ is the matrix 2-norm of $W$ (also called the spectral norm). Spectral normalization therefore applies the normalization $W_{SN} := \frac{W}{\sigma(W)}$,
so that $\sigma(W_{SN}) = 1$ and the 1-Lipschitz condition is satisfied for this layer.
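A quick numerical check of this claim (a throwaway sketch; the matrix shape is arbitrary):

```python
import torch

W = torch.randn(64, 128)
sigma = torch.linalg.svdvals(W)[0]    # sigma(W): the largest singular value of W
W_sn = W / sigma
print(torch.linalg.svdvals(W_sn)[0])  # ~1.0: the spectral norm of W_SN is 1
```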
The key question is how to compute $\sigma(W)$ efficiently; it equals the largest singular value of $W$ (equivalently, the square root of the largest eigenvalue of $W^T W$). Computing it directly (e.g. with a full SVD) is expensive, so a better approach is to estimate $\sigma(W)$ with the power iteration method.
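A minimal standalone sketch of this power-iteration estimate (the function name and signature are mine, not from the paper or the reference repo):

```python
import torch

def estimate_sigma(W, n_iters=1, u=None, eps=1e-12):
    """Estimate sigma(W), the largest singular value, by power iteration.

    W: a 2-D weight matrix; u: persistent estimate of the left singular
    vector, reused across training steps so that one iteration per step
    is usually enough."""
    if u is None:
        u = torch.randn(W.shape[0])
    for _ in range(n_iters):
        v = W.t().mv(u)
        v = v / (v.norm() + eps)
        u = W.mv(v)
        u = u / (u.norm() + eps)
    sigma = torch.dot(u, W.mv(v))  # u^T W v ~= largest singular value
    return sigma, u
```

In practice `u` is stored on the layer and only one iteration is run per forward pass, because the weights change slowly between updates; this is exactly what the reference implementation further below does.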
Comparison with other regularization methods:
1. Weight normalization / WGAN: WN faces a contradiction. It constrains the weights so strongly that optimization tends to drive the weight matrix toward rank 1, whereas training a good GAN requires a larger norm so that more features can be learned.
2. Orthonormal regularization: forcing all singular values to 1 destroys the spectral information of the weight matrix.
3. WGAN-GP: the penalty depends heavily on the support of the generator's output distribution, which makes the regularization unstable; in addition, WGAN-GP is computationally more expensive (see the sketch after this list).
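For context, a common sketch of the WGAN-GP gradient penalty referred to in item 3 (`D`, `x_real`, `x_fake` are placeholders and image-shaped inputs are assumed); the extra differentiation through `D` is what makes it comparatively expensive:

```python
import torch

def gradient_penalty(D, x_real, x_fake, lambda_gp=10.0):
    # Penalize ||grad_x D(x_hat)||_2 away from 1 on points x_hat interpolated
    # between real and generated samples, hence the dependence on the
    # generator's support mentioned above.
    eps = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
    x_hat = (eps * x_real + (1 - eps) * x_fake).requires_grad_(True)
    d_hat = D(x_hat)
    grads = torch.autograd.grad(outputs=d_hat.sum(), inputs=x_hat,
                                create_graph=True)[0]  # second backward pass later
    return lambda_gp * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```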
Reference PyTorch implementation: https://github.com/hellopipu/pytorch-spectral-normalization-gan
torch.mv(a, b) multiplies a matrix by a vector, where b must be a 1-D vector; torch.t() transposes a matrix; getattr(x, 'y') is equivalent to x.y, and setattr(x, 'y', v) is equivalent to x.y = v; register_parameter(name, parameter) adds a parameter to a module so that it can be retrieved by name.
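A tiny sanity-check sketch of these calls (the attribute names `u` and `note` are arbitrary, chosen only for illustration):

```python
import torch
from torch import nn
from torch.nn import Parameter

layer = nn.Linear(3, 4)
W = getattr(layer, 'weight')        # same as layer.weight, shape (4, 3)
u = torch.randn(4)
v = torch.mv(torch.t(W), u)         # W^T u, a 1-D vector of length 3
layer.register_parameter('u', Parameter(u, requires_grad=False))
setattr(layer, 'note', 'demo')      # same as layer.note = 'demo'
print(getattr(layer, 'u').shape)    # the new parameter can be looked up by name
```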
```python
import torch
from torch import nn
from torch.nn import Parameter


def l2normalize(v, eps=1e-12):
    return v / (v.norm() + eps)


class SpectralNorm(nn.Module):
    def __init__(self, module, name='weight', power_iterations=1):
        super(SpectralNorm, self).__init__()
        self.module = module
        self.name = name
        self.power_iterations = power_iterations
        if not self._made_params():
            self._make_params()
        self._update_u_v()

    def _update_u_v(self):
        u = getattr(self.module, self.name + "_u")
        v = getattr(self.module, self.name + "_v")
        w = getattr(self.module, self.name + "_bar")

        height = w.data.shape[0]
        for _ in range(self.power_iterations):
            v.data = l2normalize(torch.mv(torch.t(w.view(height, -1).data), u.data))
            u.data = l2normalize(torch.mv(w.view(height, -1).data, v.data))

        sigma = u.dot(w.view(height, -1).mv(v))                  # sigma = u^T W v
        setattr(self.module, self.name, w / sigma.expand_as(w))  # update W to W_SN

    def _made_params(self):
        try:
            u = getattr(self.module, self.name + "_u")
            v = getattr(self.module, self.name + "_v")
            w = getattr(self.module, self.name + "_bar")
            return True
        except AttributeError:
            return False

    def _make_params(self):
        w = getattr(self.module, self.name)
        # for nn.Conv2d, w has shape (out_channels, in_channels, kernel_h, kernel_w)
        height = w.data.shape[0]                   # height = out_channels
        width = w.view(height, -1).data.shape[1]   # width = in_channels x kernel_h x kernel_w

        # initial left singular vector estimate, shape: (out_channels,)
        u = Parameter(w.data.new(height).normal_(0, 1), requires_grad=False)
        # initial right singular vector estimate, shape: (in_channels x kernel_h x kernel_w,)
        v = Parameter(w.data.new(width).normal_(0, 1), requires_grad=False)
        u.data = l2normalize(u.data)
        v.data = l2normalize(v.data)
        w_bar = Parameter(w.data)

        del self.module._parameters[self.name]  # the normalized weight is re-set in _update_u_v

        # register the new parameters on the wrapped module
        self.module.register_parameter(self.name + "_u", u)
        self.module.register_parameter(self.name + "_v", v)
        self.module.register_parameter(self.name + "_bar", w_bar)

    def forward(self, *args):
        self._update_u_v()  # normalize W first, then call the wrapped module's forward()
        return self.module.forward(*args)


if __name__ == '__main__':
    conv2 = SpectralNorm(nn.Conv2d(64, 64, 4, stride=2, padding=(1, 1)))
```

Note: once SN is used, do not add BN or other normalization layers on top of it, because batch norm's two operations of dividing by the batch standard deviation and multiplying by a learned scale factor clearly break the Lipschitz continuity of the discriminator. Taking a resblock as an example, SN is introduced as follows:
```python
import math

import torch.nn.functional as F
from torch import nn
from torch.nn import init, utils


class Block(nn.Module):
    def __init__(self, in_ch, out_ch, h_ch=None, ksize=3, pad=1,
                 activation=F.relu, downsample=False):
        super(Block, self).__init__()

        self.activation = activation
        self.downsample = downsample
        self.learnable_sc = (in_ch != out_ch) or downsample
        if h_ch is None:
            h_ch = in_ch
        else:
            h_ch = out_ch

        # every convolution is wrapped with spectral normalization
        self.c1 = utils.spectral_norm(nn.Conv2d(in_ch, h_ch, ksize, 1, pad))
        self.c2 = utils.spectral_norm(nn.Conv2d(h_ch, out_ch, ksize, 1, pad))
        if self.learnable_sc:
            self.c_sc = utils.spectral_norm(nn.Conv2d(in_ch, out_ch, 1, 1, 0))

        self._initialize()

    def _initialize(self):
        init.xavier_uniform_(self.c1.weight.data, math.sqrt(2))
        init.xavier_uniform_(self.c2.weight.data, math.sqrt(2))
        if self.learnable_sc:
            init.xavier_uniform_(self.c_sc.weight.data)

    def forward(self, x):
        return self.shortcut(x) + self.residual(x)

    def shortcut(self, x):
        if self.learnable_sc:
            x = self.c_sc(x)
        if self.downsample:
            return F.avg_pool2d(x, 2)
        return x

    def residual(self, x):
        h = self.c1(self.activation(x))
        h = self.c2(self.activation(h))
        if self.downsample:
            h = F.avg_pool2d(h, 2)
        return h
```
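A quick usage sketch of this block (the batch size, channel counts, and spatial size are arbitrary):

```python
import torch

# hypothetical usage: a downsampling residual block inside a discriminator
block = Block(in_ch=64, out_ch=128, downsample=True)
x = torch.randn(8, 64, 32, 32)
y = block(x)
print(y.shape)  # torch.Size([8, 128, 16, 16])
```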