Title: Fast & Sample-efficient Algorithms for Neural Network Training with Theoretical Guarantees
Abstract: As deep learning (DL) breakthroughs drive the recent artificial intelligence (AI) boom, neural networks have become the most powerful tools for developing AI learning systems. Despite this early success, DL-based AI learning systems lack public trust because they operate in a “black-box” manner. To trust their decisions, we need to know how an AI system arrives at its conclusions, that is, to turn the black magic of AI into scientific theorems. Moreover, as the sizes of both datasets and learning models grow rapidly, modern learning tasks demand a significant amount of reliable (labeled) data and computational resources to learn and extract the desired information. This talk presents two algorithms, (i) magnitude-based neural network pruning and (ii) self-training via unlabeled data, together with theoretical explanations, to address the challenges above in training neural networks. First, network pruning removes unnecessary parts of a neural network (weights, neurons, or even layers), and we observe equal or even better performance at a significantly reduced computational cost. In addition, we connect the numerical success of pruning with theoretical characterizations of a benign optimization landscape in training pruned networks, which leads to reduced sample complexity and fast convergence. Second, self-training leverages a large amount of unlabeled data to improve learning when labeled data are limited. Acquiring labeled data involves human experts, which is costly and slow, so the amount of labeled data is not always sufficient to guarantee convergence when training neural networks. However, with the help of unlabeled data, we prove that the self-training algorithm converges to within a bounded distance of the desired point even with insufficient labeled data. In addition, we provide quantitative characterizations of how the amount of unlabeled data relates to the performance improvement, from both theoretical and numerical perspectives.
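The abstract describes magnitude-based pruning only at a high level. As a rough illustration of the general idea (not the speaker's implementation; the function name `magnitude_prune`, the per-layer quantile threshold, and the 50% sparsity level are assumptions made here for concreteness), the sketch below zeroes out the smallest-magnitude weights of each linear layer in a PyTorch model:

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Illustrative magnitude-based pruning: zero out the smallest-magnitude
    weights in every linear layer. `sparsity` is the fraction of weights
    removed per layer (an assumed hyperparameter, not from the talk)."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            weights = module.weight.data
            # Threshold at the `sparsity`-quantile of absolute weight values.
            threshold = torch.quantile(weights.abs().flatten(), sparsity)
            mask = (weights.abs() > threshold).float()
            module.weight.data.mul_(mask)

# Example: prune half the weights of a small two-layer network.
net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
magnitude_prune(net, sparsity=0.5)
```

In practice, pruned networks are usually fine-tuned (or the mask is re-applied after each update) so that the removed weights stay at zero; the abstract's point is that this smaller network can match or exceed the original's performance at lower computational cost.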
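Similarly, self-training can be sketched as a generic pseudo-labeling loop: the current model labels a batch of unlabeled samples, and only confident pseudo-labels are fed back into training alongside the labeled data. The sketch below is a minimal illustration under assumed choices (the confidence threshold, the weighting of the unlabeled loss, and the name `self_training_step` are not from the talk), not the algorithm analyzed in the speaker's theory:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def self_training_step(model, optimizer, labeled_x, labeled_y, unlabeled_x,
                       confidence=0.9, unlabeled_weight=1.0):
    """One illustrative iteration of pseudo-label self-training.

    The current model labels the unlabeled batch; only predictions above
    the confidence threshold contribute to the loss, together with the
    usual supervised term on the labeled batch."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        keep = conf >= confidence  # trust only confident pseudo-labels

    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(labeled_x), labeled_y)
    if keep.any():
        loss = loss + unlabeled_weight * F.cross_entropy(
            model(unlabeled_x[keep]), pseudo_y[keep])
    loss.backward()
    optimizer.step()
    return loss.item()
```

The abstract's claim is that adding such unlabeled-data terms lets training converge to within a bounded distance of the desired solution even when the labeled data alone would be insufficient.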
Speaker Bio: Shuai is currently a postdoc at Rensselaer Polytechnic Institute. He received his Ph.D. from the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic Institute in 2021, supervised by Prof. Meng Wang. He received his Bachelor’s degree in Electrical Engineering from the University of Science and Technology of China in 2016. His research interests span artificial intelligence/deep learning, optimization, data science, and signal processing, with a focus on the theoretical foundations of deep learning and the development of explainable and efficient learning algorithms.