3:30 pm to 4:30 pm
3305 Newell-Simon Hall
Abstract: Transformers are ubiquitous. They influence nearly every aspect of modern AI. However, the mechanics of their training remain poorly understood. This poses a problem for the field, given the immense amounts of data, computational power, and energy being invested in training these networks. I highlight an intriguing recent empirical result from our group: self-attention catastrophically fails to train unless it is paired with a skip connection. This contrasts with other components of a transformer, which continue to demonstrate good (albeit suboptimal) performance when skip connections are removed. In this talk, I explore why this is the case and what could be done to enhance the fundamental training efficiency of modern transformers. We even showcase some practical cases in which removing self-attention completely can lead to significantly improved performance.
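For readers unfamiliar with the architectural distinction the abstract draws, the following is a minimal, hypothetical PyTorch sketch (not code from the talk) contrasting a self-attention sub-layer with and without its residual skip connection; the class name, dimensions, and pre-norm layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Illustrative self-attention sub-layer with an optional skip connection."""

    def __init__(self, dim: int = 64, heads: int = 4, use_skip: bool = True):
        super().__init__()
        self.use_skip = use_skip
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out, _ = self.attn(h, h, h)  # self-attention over the sequence
        # With the skip, the block computes x + Attn(LN(x)); without it, the
        # attention output must carry the full signal alone, the setting in
        # which (per the abstract) training fails catastrophically.
        return x + out if self.use_skip else out

# Usage: toggling use_skip=False isolates the failure mode discussed in the talk.
block = AttentionBlock(use_skip=False)
y = block(torch.randn(2, 16, 64))  # (batch, sequence, dim)
```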
Bio: Simon Lucey, Ph.D., is the Director of the Australian Institute for Machine Learning (AIML) and a professor in the School of Computer and Mathematical Sciences at the University of Adelaide. He is also Director of the CommBank Foundational AI Research Centre. Prior to this, he was an associate research professor at Carnegie Mellon University's Robotics Institute (RI) in Pittsburgh, USA, where he spent over 10 years as an academic. He was also Principal Research Scientist at the autonomous vehicle company Argo AI from 2017 to 2022. He has received various career awards, notably the AmCham AI Scientist of the Year in 2024. He is also currently a member of the Australian Government's AI Expert Group and its National Robotics Strategy committee. Simon's research interests span AI, machine learning, computer vision, and robotics.
Sponsor
The VASC seminar is generously sponsored by HeyGen, an all-in-one AI-powered video generation platform that leverages advances in computer vision, generative modeling, and multimodal learning to make high-quality video creation both scalable and accessible.
