TR2017-140
Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks
- "Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks", Advances in Neural Information Processing Systems (NIPS), December 2017.BibTeX TR2017-140 PDF
@inproceedings{Ziming2017dec,
  author = {Zhang, Ziming and Brand, Matthew},
  title = {Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks},
  booktitle = {Advances in Neural Information Processing Systems (NIPS)},
  year = 2017,
  month = dec,
  url = {https://www.merl.com/publications/TR2017-140}
}
Abstract:
By lifting the ReLU function into a higher-dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm converges globally to a stationary point with an R-linear convergence rate of order one. In experiments on the MNIST database, DNNs trained with this BCD algorithm consistently yielded lower test-set error rates than identical DNN architectures trained with all of the stochastic gradient descent (SGD) variants in the Caffe toolbox.
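The lifting idea admits a compact illustration: since ReLU(z) = max(0, z) is the Euclidean projection of z onto the nonnegative orthant, i.e. argmin_{u >= 0} ||u - z||^2, each activation can be replaced by a nonnegative auxiliary block penalized toward its pre-activation, making the training objective convex in each block of variables with the others held fixed. The sketch below is an illustrative reconstruction, not the authors' code: it instantiates the lifting for a single hidden layer only, with assumed symbol names (W1, W2, U) and hyperparameters (gamma, lam), and approximates the nonnegativity-constrained U-subproblem with a few projected-gradient steps rather than the full convex solves of the paper.

# Minimal sketch of block coordinate descent on a lifted one-hidden-layer
# ReLU network with Tikhonov (ell-2) regularization. The ReLU activation
# a = max(0, W1 x) is relaxed to a nonnegative auxiliary block U penalized
# toward W1 X, so the objective is convex in each of the blocks W1, W2, U
# with the others fixed. Hyperparameters and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, h, c, N = 20, 50, 10, 200            # input dim, hidden dim, classes, samples
X = rng.standard_normal((d, N))
Y = np.eye(c)[rng.integers(0, c, N)].T  # one-hot targets, shape (c, N)

gamma, lam = 1.0, 1e-2                  # lifting penalty and Tikhonov weight
W1 = 0.1 * rng.standard_normal((h, d))
W2 = 0.1 * rng.standard_normal((c, h))
U = np.maximum(0.0, W1 @ X)             # initialize auxiliary block at ReLU(W1 X)

def objective():
    return (np.sum((Y - W2 @ U) ** 2)
            + gamma * np.sum((U - W1 @ X) ** 2)
            + lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)))

for it in range(50):
    # W2 block: ridge regression Y ~ W2 U (closed form, strongly convex)
    W2 = Y @ U.T @ np.linalg.inv(U @ U.T + lam * np.eye(h))
    # W1 block: ridge regression U ~ W1 X (closed form, strongly convex)
    W1 = U @ X.T @ np.linalg.inv(X @ X.T + (lam / gamma) * np.eye(d))
    # U block: convex problem min ||Y - W2 U||^2 + gamma ||U - W1 X||^2
    # subject to U >= 0, approximated by projected-gradient steps
    L = np.linalg.norm(W2, 2) ** 2 + gamma          # Lipschitz constant / 2
    for _ in range(20):
        grad = -2 * W2.T @ (Y - W2 @ U) + 2 * gamma * (U - W1 @ X)
        U = np.maximum(0.0, U - grad / (2 * L))     # step, then project onto U >= 0

print(f"final lifted objective: {objective():.3f}")

Because every block update solves (or descends on) a convex subproblem, the lifted objective is monotonically non-increasing across iterations; the paper's contribution is to extend this scheme to arbitrary-depth networks and to prove global convergence to a stationary point at an R-linear rate.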