training loop related

August 14, 2023

So I'm working with a very simple model:

# random seed

# create linear regression model class
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()  # required before assigning nn.Parameter attributes
        self.weight = nn.Parameter(torch.randn(1,
        self.bias = nn.Parameter(torch.randn(1,

    # forward method to define the computation in the model
    # Defines the computation performed at every call.
    # Should be overridden by all subclasses.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x + self.bias

# set up the loss function (L1Loss is mean absolute error)
loss_fn = nn.L1Loss()

# set up the optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(),

epochs = 10

### training loop
for epoch in range(epochs):

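    # forward pass (call the model itself rather than .forward() so hooks run)
    y_pred = model_0(X_train)

    # calculate loss
    loss = loss_fn(y_pred, y_train)
    print("loss", loss)

    # optimizer zero grad. gradients accumulate across backward() calls (by default)
    # so reset them on every iteration
    optimizer.zero_grad()

    # backpropagation
    loss.backward()

    # gradient descent
    optimizer.step()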



the methodology is this:

### building training loop

1. loop through data
2. forward pass
3. calculate loss
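4. optimizer zero grad - reset the gradients left over from the previous iteration
5. loss backward - move backwards through the network to calculate the gradient of the loss w.r.t. each parameter of our model (backpropagation)
6. optimizer step - adjust the parameters using those gradients to reduce the loss (gradient descent)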
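
To see the six steps run end to end, here is a minimal sketch of the whole script. Several things in it are assumptions that are not in my notes above: the seed value (42), the synthetic straight-line data, the learning rate (0.01), and creating the parameters with a plain `torch.randn(1)`. Treat it as one way to wire the pieces together, not the exact original code.

```python
import torch
from torch import nn

torch.manual_seed(42)  # assumed seed value

# Assumed synthetic data: points on the line y = 0.7 * x + 0.3
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = 0.7 * X + 0.3
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]


class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()  # must run before parameters are assigned
        # nn.Parameter sets requires_grad=True by default
        self.weight = nn.Parameter(torch.randn(1))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x + self.bias


model_0 = LinearRegressionModel()
loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)  # assumed lr

epochs = 10
losses = []
for epoch in range(epochs):          # 1. loop through the data
    y_pred = model_0(X_train)        # 2. forward pass
    loss = loss_fn(y_pred, y_train)  # 3. calculate loss
    optimizer.zero_grad()            # 4. reset accumulated gradients
    loss.backward()                  # 5. backpropagation
    optimizer.step()                 # 6. gradient descent update
    losses.append(loss.item())
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

Calling `zero_grad()` anywhere before `backward()` works; the point is only that the gradients from the previous iteration are cleared before new ones are added.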

but let's break the code down

first up L1Loss

class L1Loss(_Loss):
    __constants__ = ['reduction']

    def __init__(self, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super().__init__(size_average, reduce, reduction)

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return l1_loss(input, target, reduction=self.reduction)

def l1_loss(
    input: Tensor,
    target: Tensor,
    size_average: Optional[bool] = None,
    reduce: Optional[bool] = None,
    reduction: str = "mean",
) -> Tensor:
    r"""l1_loss(input, target, size_average=None, reduce=None, reduction='mean') -> Tensor

    Function that takes the mean element-wise absolute value difference.

    See :class:`~torch.nn.L1Loss` for details.
    """
    if has_torch_function_variadic(input, target):
        return handle_torch_function(
            l1_loss, (input, target), input, target, size_average=size_average, reduce=reduce, reduction=reduction
        )
    if not (target.size() == input.size()):
        warnings.warn(
            "Using a target size ({}) that is different to the input size ({}). "
            "This will likely lead to incorrect results due to broadcasting. "
            "Please ensure they have the same size.".format(target.size(), input.size()),
            stacklevel=2,
        )
    if size_average is not None or reduce is not None:
        reduction = _Reduction.legacy_get_string(size_average, reduce)

    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
    return torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
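
A quick check of what the source above does in practice. This is a small sketch with made-up tensors (the values are only for illustration): it compares `nn.L1Loss` with the mean absolute difference computed by hand, tries `reduction="sum"`, and then triggers the size-mismatch warning by passing a column vector against a flat target.

```python
import warnings

import torch
from torch import nn

# Made-up example values
pred = torch.tensor([1.0, 2.0, 4.0])
target = torch.tensor([1.5, 2.0, 2.0])

# reduction='mean' (the default) averages the absolute differences
mean_loss = nn.L1Loss()(pred, target)
manual_mean = (pred - target).abs().mean()

# reduction='sum' adds them up instead
sum_loss = nn.L1Loss(reduction="sum")(pred, target)

# Shape mismatch: (3, 1) against (3,) broadcasts to (3, 3),
# so every prediction is compared with every target
pred_col = pred.unsqueeze(dim=1)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    broadcast_loss = nn.L1Loss()(pred_col, target)

print("mean:", mean_loss.item(), "manual:", manual_mean.item())
print("sum:", sum_loss.item())
print("broadcast:", broadcast_loss.item(), "warnings:", len(caught))
```

The mismatched call still returns a number, just not the one you wanted, which is why it is worth checking that `y_pred` and `y_train` have the same shape.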

  • handle_torch_function lets custom tensor types supply their own implementation of an operation. has_torch_function_variadic checks whether either argument (input or target) defines a custom __torch_function__; if one does, handle_torch_function calls that override instead of running the rest of l1_loss. For ordinary torch.Tensor inputs this branch is skipped.

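  • Broadcasting allows element-wise operations between tensors of different but compatible shapes. Here torch.broadcast_tensors expands input and target to a common shape before the loss is computed, which is why a size mismatch only produces a warning instead of an error.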

  • torch._C._nn.l1_loss:

    • This is the C++ backend implementation of the L1 loss computation.

    • It takes the expanded input and target tensors along with the specified reduction mode and computes the L1 loss.
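
To watch the `__torch_function__` dispatch from the first bullet happen, here is a small sketch. `LoggingTensor` and the `called` list are hypothetical names I made up for illustration; the subclass just records which functions get routed through its override and then defers to the default behaviour.

```python
import torch
from torch import nn

called = []  # names of the functions routed through the override


class LoggingTensor(torch.Tensor):
    """Hypothetical subclass that records the torch functions it intercepts."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        called.append(getattr(func, "__name__", repr(func)))
        # Defer to the default behaviour so the loss is still computed
        return super().__torch_function__(func, types, args, kwargs)


pred = torch.tensor([1.0, 2.0, 4.0]).as_subclass(LoggingTensor)
target = torch.tensor([1.5, 2.0, 2.0])

loss = nn.L1Loss()(pred, target)
loss_value = loss.item()

print(called)
print(loss_value)
```

Because `pred` is a `LoggingTensor`, `has_torch_function_variadic` is true and `l1_loss` hands the call to the override, so `"l1_loss"` shows up in `called`. With two plain tensors the list would stay empty.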