Stochastic Gradient Descent Optimizes Overparameterized Deep ReLU Networks
Despite the apparent randomness of the procedure, deep neural network training is remarkably stable in practice: starting from randomly initialized weights and iterating (stochastic) gradient descent on a highly nonconvex objective, sufficiently overparameterized deep ReLU networks reliably converge to a global minimum of the training loss.
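As an illustration only, the following is a minimal NumPy sketch of this training procedure: a small, wide two-hidden-layer ReLU network with Gaussian random initialization, trained by per-sample SGD on a tiny synthetic regression set until the training loss is driven near zero. The width, learning rate, initialization scale, and data are arbitrary choices for the sketch and are not the specific setting analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (assumed for illustration): n points in d dimensions.
n, d, m = 20, 5, 256            # m = hidden width, chosen large (overparameterized)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

relu = lambda z: np.maximum(z, 0.0)

# Gaussian random initialization, scaled by 1/sqrt(fan-in) (an assumption of this sketch).
W1 = rng.standard_normal((m, d)) / np.sqrt(d)
W2 = rng.standard_normal((m, m)) / np.sqrt(m)
w3 = rng.standard_normal(m) / np.sqrt(m)

lr = 0.05  # step size, picked by hand for this toy problem

for epoch in range(200):
    for i in rng.permutation(n):          # stochastic: one sample per update
        x, target = X[i], y[i]

        # Forward pass through two ReLU layers and a linear output.
        z1 = W1 @ x;  a1 = relu(z1)
        z2 = W2 @ a1; a2 = relu(z2)
        pred = w3 @ a2

        # Backward pass for the squared loss 0.5 * (pred - target)^2.
        err = pred - target
        dw3 = err * a2
        dz2 = (err * w3) * (z2 > 0)
        dW2 = np.outer(dz2, a1)
        dz1 = (W2.T @ dz2) * (z1 > 0)
        dW1 = np.outer(dz1, x)

        # SGD update.
        w3 -= lr * dw3
        W2 -= lr * dW2
        W1 -= lr * dW1

    if epoch % 50 == 0 or epoch == 199:
        preds = relu(relu(X @ W1.T) @ W2.T) @ w3
        print(f"epoch {epoch:3d}  train loss {0.5 * np.mean((preds - y) ** 2):.6f}")
```

On this toy instance the printed training loss decreases toward zero, which is the empirical behavior the paper's title refers to; the paper itself supplies the conditions under which such convergence is guaranteed.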