so i'm working with a very simple model:
```python
import torch
from torch import nn

# random seed for reproducibility
torch.manual_seed(42)

# create linear regression model class
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(1,
                                               requires_grad=True,
                                               dtype=torch.float32))
        self.bias = nn.Parameter(torch.randn(1,
                                             requires_grad=True,
                                             dtype=torch.float32))

    # forward defines the computation performed at every call
    # and should be overridden by all nn.Module subclasses
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x + self.bias

# instantiate the model
model_0 = LinearRegressionModel()

# setup loss function (mean absolute error)
loss_fn = nn.L1Loss()

# setup optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.0001)
```
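the loop below uses `X_train` and `y_train`, which aren't shown here. any pair of matching tensors will do — for example, synthetic data from a known linear relationship (stand-in values, not the original dataset):

```python
import torch

# synthetic stand-in data (the original X_train / y_train aren't shown):
# a known linear relationship y = 0.7x + 0.3
weight, bias = 0.7, 0.3
X_train = torch.arange(0, 1, 0.02).unsqueeze(dim=1)  # shape (50, 1)
y_train = weight * X_train + bias                    # shape (50, 1)
```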
```python
epochs = 10

### training loop
for epoch in range(epochs):
    model_0.train()

    # forward pass (call the module itself, not .forward(), so hooks run)
    y_pred = model_0(X_train)

    # calculate loss
    loss = loss_fn(y_pred, y_train)
    print("loss", loss)

    # optimizer zero grad: gradients accumulate across iterations
    # (by default), so reset them before the backward pass
    optimizer.zero_grad()

    # backpropagation
    loss.backward()

    # gradient descent step
    optimizer.step()

    model_0.eval()

print(model_0.state_dict())
```
the methodology is this:
### building training loop
1. loop through the data
2. forward pass
3. calculate the loss
4. optimizer zero grad
5. loss backward: move backwards through the network to calculate the gradient of each model parameter w.r.t. the loss (backpropagation)
6. optimizer step (adjust parameters to reduce the loss)
but let's break the code down. first up: `L1Loss`
```python
class L1Loss(_Loss):
    __constants__ = ['reduction']

    def __init__(self, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super().__init__(size_average, reduce, reduction)

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return l1_loss(input, target, reduction=self.reduction)


def l1_loss(
    input: Tensor,
    target: Tensor,
    size_average: Optional[bool] = None,
    reduce: Optional[bool] = None,
    reduction: str = "mean",
) -> Tensor:
    r"""l1_loss(input, target, size_average=None, reduce=None, reduction='mean') -> Tensor

    Function that takes the mean elementwise absolute value difference.

    See :class:`~torch.nn.L1Loss` for details.
    """
    if has_torch_function_variadic(input, target):
        return handle_torch_function(
            l1_loss, (input, target), input, target, size_average=size_average, reduce=reduce, reduction=reduction
        )
    if not (target.size() == input.size()):
        warnings.warn(
            "Using a target size ({}) that is different to the input size ({}). "
            "This will likely lead to incorrect results due to broadcasting. "
            "Please ensure they have the same size.".format(target.size(), input.size()),
            stacklevel=2,
        )
    if size_average is not None or reduce is not None:
        reduction = _Reduction.legacy_get_string(size_average, reduce)
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
    return torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
```
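concretely, `nn.L1Loss` with the default `reduction='mean'` is just the mean absolute difference — a quick sanity check:

```python
import torch
from torch import nn

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 1.0, 1.0])

loss_fn = nn.L1Loss()                # default reduction='mean'
manual = (pred - target).abs().mean()

print(float(loss_fn(pred, target)))  # 1.0
print(float(manual))                 # 1.0, same thing

# reduction='none' keeps the per-element absolute differences
per_elem = nn.L1Loss(reduction='none')(pred, target)
print(per_elem.tolist())             # [0.0, 1.0, 2.0]
```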

`handle_torch_function` is PyTorch's dispatch mechanism for custom tensor types. If either `input` or `target` has a custom `__torch_function__` implementation, `handle_torch_function` routes the call to that implementation instead of the default one. This lets you extend or override the behavior of PyTorch functions — here, the L1 loss computation — so that operations on your own tensor subclasses are handled in a way that is meaningful for those types.
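a minimal sketch of that dispatch in action — `LoggingTensor` is a hypothetical subclass name; any `torch.Tensor` subclass with a `__torch_function__` classmethod gets intercepted this way:

```python
import torch
import torch.nn.functional as F

class LoggingTensor(torch.Tensor):
    intercepted = []  # record which torch functions were dispatched here

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        cls.intercepted.append(getattr(func, "__name__", str(func)))
        # fall back to the default behavior
        return super().__torch_function__(func, types, args, kwargs or {})

pred = torch.tensor([1.0, 2.0, 3.0]).as_subclass(LoggingTensor)
target = torch.tensor([1.0, 1.0, 1.0])

# F.l1_loss sees a tensor with a custom __torch_function__ and routes
# the call through LoggingTensor before computing anything
loss = F.l1_loss(pred, target)
print(LoggingTensor.intercepted)  # 'l1_loss' appears among the intercepted calls
```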
Broadcasting allows elementwise operations between tensors of different shapes.
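this is what `torch.broadcast_tensors` does in the `l1_loss` source above: both tensors are expanded to a common shape before the elementwise difference is taken (and why the size-mismatch warning exists — broadcasting silently succeeds even when you didn't intend it):

```python
import torch

a = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1)
b = torch.tensor([10.0, 20.0])            # shape (2,)

# both are expanded to the common shape (3, 2)
ea, eb = torch.broadcast_tensors(a, b)
print(ea.shape, eb.shape)  # torch.Size([3, 2]) torch.Size([3, 2])
print((ea + eb).tolist())  # [[11.0, 21.0], [12.0, 22.0], [13.0, 23.0]]
```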

`torch._C._nn.l1_loss` is the C++ backend implementation of the L1 loss computation: it takes the expanded input and target tensors along with the specified reduction mode and computes the actual loss.
