Actually, if you compare the two carefully, the difference is small and the amount of code is similar, because PyTorch's automatic gradient computation is not being used here.
To define your own autograd operator, you need to subclass torch.autograd.Function and implement the forward and backward functions.
# -*- coding: utf-8 -*-
import torch


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

In fact, TensorFlow and PyTorch are quite similar: both use a graph to keep track of the computation. The difference is that TensorFlow uses static graphs, while PyTorch uses dynamic graphs.
The advantage of static graphs is that you can optimize the graph structure up front, which pays off when the same computation is run many times.
The advantage of dynamic graphs is that you can use ordinary imperative control flow to perform a different computation for each data point.
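As a minimal sketch of what a dynamic graph makes possible (this example is not from the original tutorial), the loop below applies a hidden layer a random number of times on every pass, using plain Python control flow; autograd simply follows whatever graph was built on that particular pass:

import random
import torch

# Hypothetical sketch: the hidden layer is applied a random number of times,
# so each forward pass builds a different computation graph.
x = torch.randn(64, 1000)
w1 = torch.randn(1000, 100, requires_grad=True)
w2 = torch.randn(100, 100, requires_grad=True)
w3 = torch.randn(100, 10, requires_grad=True)

h = x.mm(w1).clamp(min=0)
for _ in range(random.randint(0, 3)):   # ordinary Python control flow decides the depth
    h = h.mm(w2).clamp(min=0)           # reuse the same weight for every extra layer
y_pred = h.mm(w3)

loss = y_pred.pow(2).sum()
loss.backward()                         # autograd traverses whatever graph was recorded this time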
The nn package defines a set of Modules, which are roughly equivalent to neural network layers.
A Module receives input Tensors and computes output Tensors, but it may also hold internal state, such as Tensors containing learnable parameters.
The nn package also defines a set of useful loss functions that are commonly used when training neural networks.
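As a small illustration (a sketch, not part of the original code), a single nn.Linear module already behaves this way: calling it maps an input Tensor to an output Tensor, its weight and bias are the internal learnable state, and nn.MSELoss is one of the ready-made loss functions:

import torch

layer = torch.nn.Linear(3, 2)                 # a Module holding weight (2x3) and bias (2) as learnable state
x = torch.randn(5, 3)                         # a batch of 5 input vectors
y_pred = layer(x)                             # input Tensor -> output Tensor of shape (5, 2)

loss_fn = torch.nn.MSELoss(reduction='sum')   # a loss function provided by the nn package
loss = loss_fn(y_pred, torch.zeros(5, 2))
print(loss.item(), [p.shape for p in layer.parameters()])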
(Note how this differs from the earlier way of defining the model.)
# -*- coding: utf-8 -*-
import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# The earlier way of defining the model:
# model = torch.nn.Sequential(
#     torch.nn.Linear(D_in, H),
#     torch.nn.ReLU(),
#     torch.nn.Linear(H, D_out),
# )

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
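Once the loop has finished, the trained model can be used like an ordinary function. A minimal usage sketch (assuming the variables above are still in scope) evaluates it on a new input without tracking gradients:

# Assumes `model` and D_in from the training script above are in scope.
x_new = torch.randn(1, D_in)
with torch.no_grad():              # no graph is recorded, so this is cheaper than a training step
    prediction = model(x_new)      # Tensor of shape (1, D_out)
print(prediction)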