No, when using batch normalization it is not okay to simply drop the bias term from the forward propagation. Batch normalization normalizes a layer's pre-activations by subtracting the mean and dividing by the standard deviation computed over each mini-batch, but the bias (offset) remains an important parameter that should not be discarded.
The forward propagation for a layer with batch normalization is typically expressed as follows:

z = Wx + b
μ = mean(z) over the mini-batch
σ² = var(z) over the mini-batch
ẑ = (z − μ) / √(σ² + ε)
y = γẑ + β

Here, W is the weight matrix, b is the bias term, γ and β are the learnable scale and shift parameters introduced by batch normalization, and ε is a small constant to avoid division by zero.
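To make the computation concrete, here is a minimal NumPy sketch of this forward pass. The function name, argument names, and shapes are illustrative assumptions for a fully connected layer, not the API of any particular framework:

```python
import numpy as np

def dense_batchnorm_forward(x, W, b, gamma, beta, eps=1e-5):
    """Dense layer followed by batch normalization (training-time statistics).

    x:     (batch, in_features)  mini-batch of inputs
    W:     (in_features, out_features) weight matrix
    b:     (out_features,) bias term of the dense layer
    gamma: (out_features,) learnable scale introduced by batch norm
    beta:  (out_features,) learnable shift introduced by batch norm
    eps:   small constant to avoid division by zero
    """
    z = x @ W + b                            # affine transform: z = Wx + b
    mu = z.mean(axis=0)                      # per-feature mean over the mini-batch
    var = z.var(axis=0)                      # per-feature variance over the mini-batch
    z_hat = (z - mu) / np.sqrt(var + eps)    # normalize
    y = gamma * z_hat + beta                 # scale and shift
    return y

# Example usage with random values
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))
W = rng.normal(size=(8, 4))
b = np.zeros(4)
gamma, beta = np.ones(4), np.zeros(4)
print(dense_batchnorm_forward(x, W, b, gamma, beta).shape)  # (32, 4)
```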
Dropping the bias term would affect the expressiveness of the model, because the bias is what allows the network to apply an offset to the normalized values. That offset is still present and used in the batch normalization equations above, as the shift added after normalization.