BIBLIOGRAPHY
[61] B. T. Polyak and A. B. Juditsky, “Acceleration of stochastic approximation by averag-
ing,” SIAM Journal on Control and Optimization, vol. 30, no. 4, pp. 838–855, 1992.
[62] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout:
a simple way to prevent neural networks from overfitting,” The Journal of Machine
Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[63] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, “DeCAF:
A deep convolutional activation feature for generic visual recognition,” in International
conference on machine learning, pp. 647–655, 2014.
[64] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, “CNN features off-the-
shelf: an astounding baseline for recognition,” in Proceedings of the IEEE conference
on computer vision and pattern recognition workshops, pp. 806–813, 2014.
[65] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE international conference on
computer vision, pp. 1440–1448, 2015.
[66] J. Johnson, “cnn-benchmarks.” https://github.com/jcjohnson/cnn-benchmarks.
[67] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison,
L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” 2017.
[68] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghe-
mawat, G. Irving, M. Isard, et al., “TensorFlow: A system for large-scale machine
learning,” in 12th USENIX Symposium on Operating Systems Design and Imple-
mentation (OSDI 16), pp. 265–283, 2016.
[69] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and
Z. Zhang, “MXNet: A flexible and efficient machine learning library for heterogeneous
distributed systems,” arXiv preprint arXiv:1512.01274, 2015.
[70] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Van-
houcke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, pp. 1–9, 2015.
[71] M. Lin, Q. Chen, and S. Yan, “Network in network,” arXiv preprint arXiv:1312.4400,
2013.
[72] S. Zagoruyko and N. Komodakis, “Wide residual networks,” arXiv preprint
arXiv:1605.07146, 2016.
[73] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations
for deep neural networks,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, pp. 1492–1500, 2017.
[74] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, “Deep networks with
stochastic depth,” in European conference on computer vision, pp. 646–661, Springer,
2016.
[75] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected
convolutional networks,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, pp. 4700–4708, 2017.
[76] G. Larsson, M. Maire, and G. Shakhnarovich, “FractalNet: Ultra-deep neural networks
without residuals,” arXiv preprint arXiv:1605.07648, 2016.
[77] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer,
“SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model
size,” arXiv preprint arXiv:1602.07360, 2016.
[78] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al., “Gradient-based learning applied to
document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[79] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy,
A. Khosla, M. Bernstein, et al., “ImageNet large scale visual recognition challenge,”
International journal of computer vision, vol. 115, no. 3, pp. 211–252, 2015.
[80] A. Canziani, A. Paszke, and E. Culurciello, “An analysis of deep neural network models
for practical applications,” arXiv preprint arXiv:1605.07678, 2016.
[81] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,”
in European conference on computer vision, pp. 630–645, Springer, 2016.