Handle an Exploding Gradient
In this mock interview, Angie asks Raj (MLE @ Snapchat) to explain “how he’d handle an exploding gradient.” Below is a supplemental written solution that shows how to approach the question.
Answer
An exploding gradient arises during backpropagation, when gradients must be propagated back through many successive layers of a network. Those gradients are computed with the chain rule, which multiplies together many per-layer gradients; when those factors are consistently larger than one, the product grows exponentially with depth. One way of handling it is gradient clipping, a brute-force technique that caps the gradients at a certain threshold (or rescales them to a maximum norm).
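Gradient clipping can be sketched without any framework. The snippet below, a minimal illustration (the function name and the flat list-of-floats representation are assumptions for this sketch), rescales gradients whenever their global L2 norm exceeds a threshold; libraries such as PyTorch offer the same idea as `torch.nn.utils.clip_grad_norm_`.

```python
import math

def clip_gradients(grads, max_norm):
    """Rescale a list of gradient values so their global L2 norm
    does not exceed max_norm (clip-by-norm, a common variant)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        return [g * scale for g in grads]
    return list(grads)
```

Clipping by norm preserves the gradient's direction and only shrinks its magnitude, which is why it is usually preferred over clipping each component independently.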
You could also use what's become a lot more common in the past few years, which is batch normalization. This technique normalizes the outputs of a particular layer or activation using the mean and standard deviation computed over the batch of examples, then applies a learned scale and shift. Keeping activations in a stable range can help keep the gradients at more reasonable, stable values.
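As a rough, framework-free sketch of the normalization step (real layers such as PyTorch's `BatchNorm1d` also maintain running statistics for inference; the function name and scalar-activation batch here are assumptions for illustration):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalar activations to roughly zero mean and
    unit variance, then apply a learned scale (gamma) and shift (beta)."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    # eps guards against division by zero when the batch has no variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```

The learned `gamma` and `beta` let the network recover any scale it actually needs, so normalization stabilizes training without restricting what the layer can represent.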
You can also change or choose your architecture to help mitigate these exploding gradients. Most directly, you could reduce the number of hidden layers, which reduces the number of multiplications the chain rule has to perform.
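To see why depth matters, consider a toy model where every layer contributes the same gradient factor; the product the chain rule forms then grows exponentially with the number of layers (the function and the constant per-layer factor are purely illustrative assumptions):

```python
def backprop_gradient(per_layer_grad, num_layers):
    """Toy model: the chain rule multiplies one gradient factor per
    layer, so the result scales as per_layer_grad ** num_layers."""
    g = 1.0
    for _ in range(num_layers):
        g *= per_layer_grad
    return g

# With a per-layer factor of 2.0, ten layers already give 2 ** 10 = 1024,
# while halving the depth to five layers gives only 32.
```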
You could also choose architectures with skip connections, the transformer being one example. A skip connection is basically a pathway from one layer to a layer further along in the network, rather than only to the layer that directly follows it. That gives the gradient a path that bypasses several consecutive layers, which can definitely help mitigate the exploding gradient problem.
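A skip (residual) connection is easy to sketch: the block's input is added to its transformed output, so the identity path carries the signal, and during backpropagation the gradient, around the transformation. The `tiny_layer` below is a hypothetical stand-in for a real learned layer:

```python
def residual_block(x, layer):
    """Skip connection: add the input to the layer's output, giving
    the gradient a direct identity path around `layer`."""
    return layer(x) + x

def tiny_layer(x):
    # Hypothetical transformation that heavily attenuates its input.
    return 0.1 * x
```

Even though `tiny_layer` shrinks its input to a tenth, `residual_block(2.0, tiny_layer)` returns roughly 2.2: the identity path dominates, which is what keeps gradients well scaled across many stacked blocks.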
What makes this answer effective
The answer correctly identifies the cause of the exploding gradient. It explicitly names the chain rule as the source of the repeated multiplications of large gradients, which produce exponentially large gradients during backpropagation. It also offers multiple ways of handling the exploding gradient, including gradient clipping, batch normalization, and choosing a different architecture.