torch.nn.parameter

August 12, 2023

A neural network can be thought of as a machine that learns by finding patterns in data. Much as humans learn from examples, a neural network learns from data by adjusting its internal settings, called parameters. These parameters act as adjustable knobs that the network turns to improve its performance on a specific task, such as image recognition or language translation.

The ability to modify parameters lets the network adapt to diverse patterns in data. Much as a musician adjusts their technique to suit different melodies, the network tunes its parameters to pick out distinct features in data, whether edges in images or linguistic patterns in text.

Parameters are special tensors used within neural networks that are automatically registered as attributes of a Module. These parameters represent learnable weights and biases that are optimized during the training process to enhance the model's performance in a specific task.

The Parameter class provides a structured way to handle these knobs within PyTorch. It ensures that these knobs are treated as special attributes of the neural network (Module), making it easy to manage and optimize them during training. Additionally, the concept of uninitialized parameters (UninitializedParameter) helps handle cases where the initial shape of parameters is unknown, giving flexibility during model construction.
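The registration behavior described above is easy to see in a toy module (a hypothetical `TinyModel`, sketched here purely for illustration): only the attribute assigned as a `Parameter` shows up in `named_parameters()`.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigned as a Parameter: automatically registered with the Module
        self.weight = nn.Parameter(torch.randn(3, 2))
        # Assigned as a plain Tensor: kept as ordinary state, not registered
        self.cache = torch.zeros(3, 2)

model = TinyModel()
names = [name for name, _ in model.named_parameters()]
print(names)  # ['weight'] -- 'cache' does not appear
```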

  • import torch
    from torch._C import _disabled_torch_function_impl
    from collections import OrderedDict
    
    # Metaclass to combine _TensorMeta and the instance check override for Parameter.
    class _ParameterMeta(torch._C._TensorMeta):
        # Make `isinstance(t, Parameter)` return True for custom tensor instances that have the _is_param flag.
        def __instancecheck__(self, instance):
            return super().__instancecheck__(instance) or (
                isinstance(instance, torch.Tensor) and getattr(instance, '_is_param', False))
    
    • Metaclasses control how classes themselves are constructed: they can regulate class creation, define class attributes, and add special functionality. Here, _ParameterMeta customizes the behavior of the Parameter class to support the special characteristics of parameters. torch._C._TensorMeta is the PyTorch metaclass that defines the common behavior and attributes of tensor-like classes.

    It offers a consistent way to identify parameters and distinguish them from regular tensors, whether the object is an instance of the Parameter class or a custom tensor type.

    • _ParameterMeta overrides the __instancecheck__ method so that instances of custom tensor types are also recognized as parameters, provided they carry the _is_param flag. This guarantees that such instances can be treated as parameters even though they do not subclass Parameter.

    • Because Module checks isinstance(value, Parameter) when attributes are assigned, this override means that any tensor-like object carrying the _is_param flag is automatically registered as a parameter when assigned as an attribute of a Module instance. This automatic registration simplifies the management of parameters within neural networks.
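A small sketch of the instance-check override: setting the `_is_param` flag on a plain tensor makes `isinstance` report it as a `Parameter`, even though its actual class never changes.

```python
import torch
from torch.nn.parameter import Parameter

t = torch.randn(2, 2)
assert not isinstance(t, Parameter)

# The metaclass's __instancecheck__ also accepts any torch.Tensor
# carrying the _is_param flag, so this plain tensor now "counts"
# as a Parameter without its class changing.
t._is_param = True
print(isinstance(t, Parameter))  # True
print(type(t) is Parameter)      # False -- the class is unchanged
```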

  • class Parameter(torch.Tensor, metaclass=_ParameterMeta):
        r"""A kind of Tensor that is to be considered a module parameter.
    
        Parameters are :class:`~torch.Tensor` subclasses, that have a
        very special property when used with :class:`Module` s - when they're
        assigned as Module attributes they are automatically added to the list of
        its parameters, and will appear e.g. in :meth:`~Module.parameters` iterator.
        Assigning a Tensor doesn't have such effect. This is because one might
        want to cache some temporary state, like last hidden state of the RNN, in
        the model. If there was no such class as :class:`Parameter`, these
        temporaries would get registered too.
    
        Args:
            data (Tensor): parameter tensor.
            requires_grad (bool, optional): if the parameter requires gradient. See
                :ref:`locally-disable-grad-doc` for more details. Default: `True`
        """
        def __new__(cls, data=None, requires_grad=True):
            if data is None:
                data = torch.empty(0)
            if type(data) is torch.Tensor or type(data) is Parameter:
                # For ease of BC maintenance, keep this path for standard Tensor.
                # Eventually (tm), we should change the behavior for standard Tensor to match.
                return torch.Tensor._make_subclass(cls, data, requires_grad)
    
            # Path for custom tensors: set a flag on the instance to indicate parameter-ness.
            t = data.detach().requires_grad_(requires_grad)
            if type(t) is not type(data):
                raise RuntimeError(f"Creating a Parameter from an instance of type {type(data).__name__} "
                                   "requires that detach() returns an instance of the same type, but return "
                                   f"type {type(t).__name__} was found instead. To use the type as a "
                                   "Parameter, please correct the detach() semantics defined by "
                                   "its __torch_dispatch__() implementation.")
            t._is_param = True
            return t
    
        # Note: the 3 methods below only apply to standard Tensor. Parameters of custom tensor types
        # are still considered that custom tensor type and these methods will not be called for them.
        def __deepcopy__(self, memo):
            if id(self) in memo:
                return memo[id(self)]
            else:
                result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad)
                memo[id(self)] = result
                return result
    
        def __repr__(self):
            return 'Parameter containing:\n' + super().__repr__()
    
        def __reduce_ex__(self, proto):
            state = torch._utils._get_obj_state(self)
    
            # See Note [Don't serialize hooks]
            hooks = OrderedDict()
            if not state:
                return (
                    torch._utils._rebuild_parameter,
                    (self.data, self.requires_grad, hooks)
                )
    
            return (
                torch._utils._rebuild_parameter_with_state,
                (self.data, self.requires_grad, hooks, state)
            )
    
        __torch_function__ = _disabled_torch_function_impl
    • Constructor (__new__):

      • The __new__ method constructs a new instance of the Parameter class.

      • Defining a dedicated Tensor subclass gives parameters their own type, Parameter, with attributes and behavior tailored to their role in a neural network. This makes parameters easy to identify, manage, and track throughout training and optimization.

      • If no data tensor is provided, an empty tensor is created.

      • If data is already a standard torch.Tensor or a Parameter, it is wrapped and returned as a Parameter via torch.Tensor._make_subclass.

      • torch.Tensor._make_subclass creates a new tensor instance of the given class cls that shares the underlying data of the provided tensor. The result inherits all the properties and methods of torch.Tensor, plus any custom methods and behaviors defined on the subclass.

        •   class NewCustomTensor(torch.Tensor):
                def some_custom_method(self):
                    print("This is some custom method.")

            # A float tensor is required for requires_grad=True
            data = torch.tensor([1., 2., 3.], requires_grad=True)
            # Third positional argument sets require_grad on the new instance
            custom_tensor = torch.Tensor._make_subclass(NewCustomTensor, data, True)
      • If data is a custom tensor type, it is detached and marked as requiring gradients.

      • Otherwise, when the data is neither a standard torch.Tensor nor a Parameter, a custom tensor type is being used. In that case, the code detaches the data tensor from the computation graph (discarding its gradient history) via detach() and then sets the requires_grad attribute in place via requires_grad_(). This ensures the parameter's gradient tracking behaves as expected.

      • The resulting tensor is marked as a parameter by setting the _is_param flag.

      • This is used to distinguish parameters from regular tensors when they are assigned as attributes to a Module.
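The constructor paths above can be exercised directly; a minimal sketch:

```python
import torch
from torch.nn.parameter import Parameter

# No data given: __new__ falls back to an empty tensor
p_empty = Parameter()
print(p_empty.shape)  # torch.Size([0])

# Standard Tensor input: wrapped via torch.Tensor._make_subclass
p = Parameter(torch.ones(2, 3))
print(type(p) is Parameter)  # True
print(p.requires_grad)       # True (the default)

# requires_grad=False is useful for weights that should stay frozen
frozen = Parameter(torch.ones(2, 3), requires_grad=False)
print(frozen.requires_grad)  # False
```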

  • __torch_function__ = _disabled_torch_function_impl

      • The __torch_function__ attribute is set to _disabled_torch_function_impl, which disables the torch function dispatch mechanism for Parameter.

      • This prevents unintended interactions with the torch function interface.

    • Setting __torch_function__ = _disabled_torch_function_impl disables the __torch_function__ override mechanism for the Parameter class itself. Operations on Parameter instances therefore fall through to the default torch.Tensor behavior rather than any subclass-specific dispatch.
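One visible consequence of the disabled dispatch, sketched below: ordinary operations on a `Parameter` produce plain `Tensor`s rather than new `Parameter`s.

```python
import torch
from torch.nn.parameter import Parameter

p = Parameter(torch.ones(3))

# With __torch_function__ disabled on Parameter, this multiplication
# falls through to the default Tensor implementation, so the result
# is a plain torch.Tensor, not a Parameter.
result = p * 2
print(type(result) is torch.Tensor)   # True
print(isinstance(result, Parameter))  # False
```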

  • UninitializedTensorMixin

    • The purpose of UninitializedTensorMixin is to provide a consistent way to handle uninitialized tensors (parameters or buffers) in PyTorch, preventing unintended errors or behaviors when working with them. It defines a set of rules and restrictions to ensure that uninitialized tensors are treated properly and that certain operations are restricted until the tensors are properly initialized. This is particularly useful for scenarios where tensors need to be created dynamically or lazily, and their properties are not fully known at the time of creation.

    •     class UninitializedTensorMixin:
              _allowed_methods = [
                  torch.Tensor.__hash__,
                  torch.Tensor.size,
                  torch.Tensor.copy_,
                  torch.Tensor.is_floating_point,
                  torch.Tensor.half,
                  torch.Tensor.float,
                  torch.Tensor.double,
                  torch.Tensor.char,
                  torch.Tensor.short,
                  torch.Tensor.int,
                  torch.Tensor.long,
                  torch.Tensor.cuda,
                  torch.Tensor.cpu,
                  torch.Tensor.to,
                  torch.Tensor.get_device,
                  torch._has_compatible_shallow_copy_type,
              ]
      
              def materialize(self, shape, device=None, dtype=None):
                  r"""Create a Parameter or Tensor with the same properties of the uninitialized one.
                  Given a shape, it materializes a parameter in the same device
                  and with the same `dtype` as the current one or the specified ones in the
                  arguments.
      
                  Args:
                        shape (tuple): the shape for the materialized tensor.
                      device (:class:`torch.device`): the desired device of the parameters
                          and buffers in this module. Optional.
                      dtype (:class:`torch.dtype`): the desired floating point type of
                          the floating point parameters and buffers in this module. Optional.
                  """
                  if device is None:
                      device = self.data.device
                  if dtype is None:
                      dtype = self.data.dtype
                  self.data = torch.empty(shape, device=device, dtype=dtype)
                  self.__class__ = self.cls_to_become
      
              @property
              def shape(self):
                  raise RuntimeError(
                      'Can\'t access the shape of an uninitialized parameter or buffer. '
                      'This error usually happens in `load_state_dict` when trying to load '
                      'an uninitialized parameter into an initialized one. '
                      'Call `forward` to initialize the parameters before accessing their attributes.')
      
              def share_memory_(self):
                  raise RuntimeError(
                      'Can\'t share memory on an uninitialized parameter or buffer. '
                      'Call `forward` to initialize the parameters before calling '
                      '`module.share_memory()`.')
      
              def __repr__(self):
                  return f'<{self.__class__.__name__}>'
      
              def __reduce_ex__(self, proto):
                  # See Note [Don't serialize hooks]
                  return (
                      self.__class__,
                      (self.requires_grad,)
                  )
      
              @classmethod
              def __torch_function__(cls, func, types, args=(), kwargs=None):
                  # method-wrapper is to detect access to Tensor properties that are
                  # wrapped in descriptors
                  if func in cls._allowed_methods or func.__class__.__name__ == 'method-wrapper':
                      if kwargs is None:
                          kwargs = {}
                      return super().__torch_function__(func, types, args, kwargs)
                  raise ValueError(
                      'Attempted to use an uninitialized parameter in {}. '
                      'This error happens when you are using a `LazyModule` or '
                      'explicitly manipulating `torch.nn.parameter.{}` '
                      'objects. When using LazyModules Call `forward` with a dummy batch '
                      'to initialize the parameters before calling torch functions'.format(func, cls.__name__))
    • materialize: This method creates a tensor or parameter with the same properties as the uninitialized one. Given a shape, and optionally a device and dtype, it allocates the underlying tensor on that device with that dtype, defaulting to the device and dtype of the current data.

      • This is useful when a placeholder tensor or parameter must exist before its actual shape is known, for example when a network's architecture is defined before the first input determines the parameter shapes (as in lazy modules). Deferring allocation this way improves memory efficiency and avoids redundant allocations.
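A brief sketch of materialize in action with UninitializedParameter: once a shape is supplied, storage is allocated and the instance's class is swapped to Parameter (via cls_to_become).

```python
import torch
from torch.nn.parameter import Parameter, UninitializedParameter

u = UninitializedParameter()
print(u)  # <UninitializedParameter>

# Supplying a shape allocates storage and swaps the class to Parameter
u.materialize((4, 3))
print(type(u) is Parameter)  # True
print(u.shape)               # torch.Size([4, 3])
```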
    • __torch_function__: This method allows customization of how torch functions should behave when applied to an instance of the class. It restricts certain operations for uninitialized tensors and raises an error if an unauthorized operation is attempted.

      • The method checks whether func (the PyTorch function being called) appears in the list of permitted methods, or whether it is a method-wrapper (which indicates property access). If either condition holds, it forwards the call to the real implementation via super().__torch_function__.

      • If func is neither in the allowed list nor a method-wrapper, it raises a ValueError with a detailed message instructing the user to initialize the parameters first (e.g. by calling forward with a dummy batch when using LazyModules).
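The allow-list in action, as a small sketch: size() is permitted on an uninitialized parameter, while an arithmetic op raises ValueError, and the shape property raises RuntimeError until materialization.

```python
import torch
from torch.nn.parameter import UninitializedParameter

u = UninitializedParameter()

# Allowed: torch.Tensor.size is in _allowed_methods
size = u.size()
print(size)  # torch.Size([0]) -- the placeholder's empty backing tensor

# Disallowed: torch.mul is not in _allowed_methods, so
# __torch_function__ raises before the op can run
blocked = False
try:
    torch.mul(u, 2.0)
except ValueError:
    blocked = True

# The shape property is also guarded until materialization
shape_blocked = False
try:
    u.shape
except RuntimeError:
    shape_blocked = True

print(blocked, shape_blocked)  # True True
```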