Machine translation and computer vision have greatly benefited from advances in deep learning. Large and diverse amounts of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, such as the scarcity of video resources, limitations in hand pose estimation, and 3D spatial grounding from poses. This talk will present these challenges and the How2✌️Sign dataset (https://how2sign.github.io), recorded at CMU in collaboration with UPC, BSC, Gallaudet University, and Facebook.
Xavier Giro-i-Nieto is an associate professor at the Universitat Politècnica de Catalunya (UPC) in Barcelona, a member of the Image Processing Group (GPI), the Intelligent Data Science and Artificial Intelligence Research Center (IDEAI-UPC), and the Institute of Industrial Robotics (IRI), and also a visiting researcher at the Barcelona Supercomputing Center (BSC). He graduated in Telecommunications Engineering at ETSETB (UPC) in 2000, after completing his master's thesis on image compression at the Vrije Universiteit Brussel (VUB) with Prof. Peter Schelkens. After working for one year at Sony Brussels, he returned to UPC to obtain a PhD in computer vision, supervised by Prof. Ferran Marqués and by Prof. Shih-Fu Chang of the Digital Video and MultiMedia laboratory at Columbia University, which he visited repeatedly between 2008 and 2014. He serves as an associate editor of IEEE Transactions on Multimedia, is an area chair for ICCV 2021, and reviews for top-tier conferences and journals in machine learning (NeurIPS, ICML, TPAMI), computer vision (CVPR, ECCV, ICCV), and multimedia (ACMMM, ICMR).
Sponsored in part by: Facebook Reality Labs Pittsburgh