BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Robotics Institute Carnegie Mellon University - ECPv6.15.12.1//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-ORIGINAL-URL:https://www.ri.cmu.edu
X-WR-CALDESC:Events for Robotics Institute Carnegie Mellon University
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20230312T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20231105T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20240310T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20241103T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20250309T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20251102T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20260308T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20261101T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20270314T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20271107T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20260513T153000
DTEND;TZID=America/New_York:20260513T163000
DTSTAMP:20260530T180338
CREATED:20260429T191158Z
LAST-MODIFIED:20260429T191629Z
UID:151146-1778686200-1778689800@www.ri.cmu.edu
SUMMARY:Quanta Perception as Probabilistic Events
DESCRIPTION:Abstract:  Autonomous systems ultimately rely on extracting information from light\, yet remain brittle in extreme environments\, from nighttime navigation to high-speed robotics. This limitation stems from a classical imaging abstraction: conventional sensors integrate photon flux over fixed exposure windows\, imposing trade-offs between sensitivity\, dynamic range\, and temporal resolution that degrade perception when photons are scarce or dynamics are rapid. Emerging quanta (single-photon) image sensors overcome these limits by detecting individual photons\, but they generate photon streams that exceed the compute and latency budgets of real-time systems by orders of magnitude. \n\nHere we introduce probabilistic events\, a computational primitive for real-time quanta perception at the limit of individual photons. By computing the posterior distribution over the time since the last abrupt intensity change\, we represent photon streams as recursively computed belief states. Rather than the binary\, fixed-threshold triggers of event cameras\, this recursive Bayesian formulation yields three simultaneous\, low-latency signals: motion-adaptive scene flux\, high-fidelity activity maps\, and an entropy measure quantifying perceptual uncertainty. This representation enables perception in extreme conditions\, including detecting and estimating the pose of a running person at ~0.05 lux illumination—without retraining standard vision models. Our approach sustains input throughputs exceeding 50\,000 quanta frames per second on commodity GPU hardware—four to five orders of magnitude faster than state-of-the-art quanta reconstruction baselines—yielding kilohertz-scale outputs even for megapixel arrays. 
By replacing frame reconstruction with direct probabilistic inference over photon streams\, this work enables real-time perception at the photon limit and bridges photon-counting quanta sensing with practical robotic vision.\n \nBio: Varun Sundar is a graduate student at the University of Wisconsin–Madison\, pursuing a Ph.D. in computer science. At UW–Madison\, he is advised by Prof. Mohit Gupta and focuses on single-photon imaging techniques. His work has been published at venues such as CVPR\, ICCV\, and SIGGRAPH\, and has included live demos at ICCP 2023\, CVPR 2024\, and SIGGRAPH 2024 (the last of which won the best-in-show award in the Emerging Technologies track). In 2026\, he was awarded the Ivanisevic Award at UW–Madison\, which recognizes outstanding computer science dissertators. He previously received a bachelor’s degree in electrical engineering from the Indian Institute of Technology\, Madras in 2020. \nHomepage: https://varun19299.github.io/ \nSponsor:\nThe VASC seminar is generously sponsored by HeyGen\, an all-in-one AI-powered video generation platform that leverages advances in computer vision\, generative modeling\, and multimodal learning to make high-quality video creation both scalable and accessible.
URL:https://www.ri.cmu.edu/event/quanta-perception-as-probabilistic-events/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2026/04/5-13-26.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20260427T153000
DTEND;TZID=America/New_York:20260427T163000
DTSTAMP:20260530T180338
CREATED:20260421T163228Z
LAST-MODIFIED:20260421T163228Z
UID:151102-1777303800-1777307400@www.ri.cmu.edu
SUMMARY:Learning Through Fitting: Advancing Non-Pixel Representations for Visual Inference
DESCRIPTION:Abstract: Gridded pixel and voxel representations form the backbone of visual computing\, but they struggle to scale efficiently to large\, high-dimensional data\, such as volumetric medical scans and complex scientific simulations. Consequently\, continuous\, nongridded models such as implicit neural representations (INRs) and Gaussian splatting have gained significant research traction over the past five years. However\, these models have largely been confined to signal reconstruction rather than serving as foundational data types for downstream analysis. In this talk\, I will present our recent work on elevating continuous models beyond mere signal representation. First\, I will discuss how injecting learned priors into INRs via strategic parameter initialization enables powerful new capabilities\, including rapid\, amortized fitting to novel signals and even semantic segmentation. Second\, I will briefly outline our recent efforts in performing visual recognition tasks directly on 2D Gaussian image representations. Finally\, I will highlight interesting future directions in this “learning through fitting” paradigm of visual computing. \nBio: Guha Balakrishnan is an Assistant Professor in the Electrical and Computer Engineering Department at Rice University. His research group tackles a diverse range of problems across computer vision and imaging\, with a primary focus on developing efficient neural representations for complex visual signals and advancing responsible AI through uncertainty estimation and interpretability techniques. He frequently grounds these methods in real-world applications by collaborating with domain experts in scientific disciplines such as medicine and the geosciences. His scientific contributions have been recognized with several honors\, including the NSF CAREER Award and the MICCAI Best Paper Award. Before joining Rice\, he completed his Ph.D. 
at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL)\, and earned his undergraduate degrees in Computer Science and Computer Engineering from the University of Michigan\, Ann Arbor. \nHomepage:  www.guhabalakrishnan.com \nSponsor:\nThe VASC seminar is generously sponsored by HeyGen\, an all-in-one AI-powered video generation platform that leverages advances in computer vision\, generative modeling\, and multimodal learning to make high-quality video creation both scalable and accessible.
URL:https://www.ri.cmu.edu/event/learning-through-fitting-advancing-non-pixel-representations-for-visual-inference/
LOCATION:4305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2026/04/4-27-26-Balakrishnan.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20260413T153000
DTEND;TZID=America/New_York:20260413T163000
DTSTAMP:20260530T180338
CREATED:20260331T235136Z
LAST-MODIFIED:20260331T235318Z
UID:150830-1776094200-1776097800@www.ri.cmu.edu
SUMMARY:Generative Re-Photography with Video Models
DESCRIPTION:Abstract: I will introduce “generative re-photography” methods that use new generative video models to get more out of your photos\, even the blurry ones. First\, I will present a method for converting motion-blurred images to video. This method can even predict the “past” and “future” (right before and after the capture) of a motion-blurred image. I will then show how this method can bring “historical scenes to life\,” such as photos of soldiers landing on the coast of northern France during the Normandy invasion of 1944 or a boxing match between Muhammad Ali and Jürgen Blin in 1971. Then\, I will present a robust post-capture refocusing method that converts a single defocus-blurred image into a focal stack spanning multiple focus distances. Our work overturns the conventional wisdom of photography\, suggesting that these “corrupted images” can actually reveal more about the world than the “perfect” images that have long been the holy grail of image processing. Additionally\, our findings suggest that video models implicitly understand how camera capture settings affect image appearance\, and I will discuss how this exciting capability could inspire new directions for computational photography. \nBio: Sai Tedla is a PhD student at York University\, Toronto\, supervised by Michael Brown. He currently works at the intersection of computational photography and generative models. He is a visiting student at the University of Toronto supervised by David Lindell and Kyros Kutulakos\, and will soon join the university as a Schmidt AI Postdoctoral Fellow. Additionally\, Sai is currently an intern at Sony AI Japan and has previously interned at Samsung AI Center Toronto and Adobe NextCam. 
\nHomepage:  https://sites.google.com/view/tedlasai \n  \nSponsor \nThe VASC seminar is generously sponsored by HeyGen\, an all-in-one AI-powered video generation platform that leverages advances in computer vision\, generative modeling\, and multimodal learning to make high-quality video creation both scalable and accessible.
URL:https://www.ri.cmu.edu/event/generative-re-photography-with-video-models/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2026/03/4-13-26-rotated.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20260309T153000
DTEND;TZID=America/New_York:20260309T163000
DTSTAMP:20260530T180338
CREATED:20260303T211052Z
LAST-MODIFIED:20260303T211052Z
UID:150524-1773070200-1773073800@www.ri.cmu.edu
SUMMARY:Nano-optics for smart sensing and display
DESCRIPTION:Abstract: Nano-optical devices provide a new way to control light at the subwavelength scale\, enabling optical functionalities beyond those of conventional optics. By engineering the nanostructures\, we can tailor the optical response as a function of space\, polarization\, wavelength\, and angle of incidence\, effectively turning the optical front end into a controllable\, programmable physical layer. This creates an interesting interplay between optical design and computation: on one hand\, nano-optics can be incorporated and co-designed within the computational pipeline\, enabling new approaches to smart sensing\, imaging\, and display\; on the other hand\, computational methods can be used to discover and optimize new classes of optical instruments that go beyond intuitive\, hand-designed architectures. \nIn this talk\, I will first introduce the basics of nano-optics\, highlighting key opportunities and current limitations. I will then present several concrete examples: nano-optics for depth sensing\, polarization imaging\, and new nano-optics-based AR display architectures. I will conclude with a view of what it would take to make these systems robust and scalable\, and where collaboration with the computer vision community can have the most impact. \nBio: Zhujun Shi is an Assistant Professor of Physics and Astronomy at the University of Pittsburgh. Her group explores new frontiers in light manipulation using nanophotonics. Prior to joining Pitt\, she was a research scientist at Meta Reality Labs. She received her B.S. in Physics from Tsinghua University in 2015 and her Ph.D. in Physics from Harvard University in 2020. \nHomepage:  https://www.shiphotonics.org/ \nSponsor \nThe VASC seminar is generously sponsored by HeyGen\, an all-in-one AI-powered video generation platform that leverages advances in computer vision\, generative modeling\, and multimodal learning to make high-quality video creation both scalable and accessible.
URL:https://www.ri.cmu.edu/event/nano-optics-for-smart-sensing-and-display/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2026/03/shi-6.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20260202T153000
DTEND;TZID=America/New_York:20260202T163000
DTSTAMP:20260530T180338
CREATED:20260123T150837Z
LAST-MODIFIED:20260123T150837Z
UID:150171-1770046200-1770049800@www.ri.cmu.edu
SUMMARY:From Lab to Reality: Reliable 3D Vision in the Wild
DESCRIPTION:VIRTUAL SEMINAR \nAbstract: While deep learning has revolutionized 3D computer vision\, a significant gap remains between the performance achieved in controlled laboratory settings and that in complex\, uncontrolled real-world environments. This talk addresses the critical challenges of robustness and generalization required to bridge this gap. I will first discuss our contributions to 3D reconstruction\, including robust multi-view reconstruction\, physically grounded 3D shape generation\, and 3D Gaussian Splatting under sparse-view conditions. Next\, I will cover 3D interaction\, with a focus on generalizable object pose estimation\, and demonstrate how leveraging different types of reference information can facilitate pose estimation for previously unseen objects in uncontrolled environments. Finally\, I will conclude by outlining future directions toward multi-modal 3D understanding\, unified 3D representations\, and the development of 3D foundation models. \nBio: Chen Zhao is a Postdoctoral Research Fellow at the Computer Vision Lab\, EPFL\, working with Dr. Mathieu Salzmann and Prof. Pascal Fua\, who also supervised his PhD at EPFL. His research interests lie in 3D computer vision\, with a specific focus on 3D reconstruction\, 3D interaction\, and 3D understanding. \nHomepage:  https://sailor-z.github.io/ \nSponsor:  The VASC seminar is generously sponsored by HeyGen\, an all-in-one AI-powered video generation platform that leverages advances in computer vision\, generative modeling\, and multimodal learning to make high-quality video creation both scalable and accessible.
URL:https://www.ri.cmu.edu/event/from-lab-to-reality-reliable-3d-vision-in-the-wild/
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2026/01/2-2-26-chen.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20251215T153000
DTEND;TZID=America/New_York:20251215T163000
DTSTAMP:20260530T180338
CREATED:20251203T162327Z
LAST-MODIFIED:20251210T163815Z
UID:149661-1765812600-1765816200@www.ri.cmu.edu
SUMMARY:Should we skip attention?
DESCRIPTION:Abstract: Transformers are ubiquitous. They influence nearly every aspect of modern AI. However\, the mechanics of their training remain poorly understood. This poses a problem for the field given the immense amounts of data\, computational power\, and energy being invested in training these networks. I highlight a recent\, intriguing empirical result from our group: self-attention catastrophically fails to train unless it is paired with a skip connection. This contrasts with other components of a transformer\, which continue to perform well (albeit suboptimally) when skip connections are removed. In this talk\, I explore why this is the case and what could be done to improve the fundamental training efficiency of modern transformers. We even showcase some practical cases in which removing self-attention completely can lead to significantly improved performance. \nBio: Simon Lucey\, Ph.D.\, is the Director of the Australian Institute for Machine Learning (AIML) and a professor in the School of Computer and Mathematical Sciences at the University of Adelaide. He is also Director of the CommBank Foundational AI Research Centre. Prior to this\, he was an associate research professor at Carnegie Mellon University’s Robotics Institute (RI) in Pittsburgh\, USA\, where he spent over 10 years as an academic. He was also Principal Research Scientist at the autonomous vehicle company Argo AI from 2017 to 2022. He has received various career awards\, notably the AmCham AI Scientist of the Year in 2024. He is also currently a member of the Australian Government’s AI Expert Group and its National Robotics Strategy committee. Simon’s research interests span AI\, machine learning\, computer vision\, and robotics. 
\n  \nSponsor \nThe VASC seminar is generously sponsored by HeyGen\, an all-in-one AI-powered video generation platform that leverages advances in computer vision\, generative modeling\, and multimodal learning to make high-quality video creation both scalable and accessible.
URL:https://www.ri.cmu.edu/event/should-we-skip-attention/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/12/12-12-25-Lucy.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20251208T153000
DTEND;TZID=America/New_York:20251208T163000
DTSTAMP:20260530T180338
CREATED:20251202T190520Z
LAST-MODIFIED:20251206T151245Z
UID:149651-1765207800-1765211400@www.ri.cmu.edu
SUMMARY:What Can We Learn from a Million Models?
DESCRIPTION:Abstract: Machine learning has transformed many fields by learning from large collections of data. Yet\, it is rarely applied to its own outputs: the models themselves. Today\, with millions of publicly available models\, a natural question arises: what can we do with so many models? In this talk\, I will motivate two core applications that leverage this untapped potential\, demonstrating their utility in the context of computer vision: (i) identifying emerging trends in model design\, and (ii) reducing the need to train models from scratch through model recycling. To support these goals\, I introduce the Model Atlas: a structured graph that represents models\, their attributes\, and the weight-space transformations that interconnect them. My research into weight-space learning enables the construction of this atlas by treating models themselves as data and inferring properties such as functionality\, performance\, and lineage directly from their weights. I will present key observations and methodologies that make weight-space learning possible at scale. As a visual prelude\, you can explore the repository under study at: https://horwitz.ai/model-atlas . \nBio: Eliahu Horwitz is a Google PhD Fellow in Machine Learning and ML Foundations and a final-year PhD candidate in Computer Science at The Hebrew University of Jerusalem\, advised by Prof. Yedid Hoshen. His research centers on learning representations of neural network weights and understanding model populations directly in weight space. He is particularly interested in how weight-space learning can enable new downstream capabilities\, such as model forensics\, model discovery\, and interpretability\, and in how treating models as data points can advance broader areas of machine learning. Eliahu is also a recipient of the Israeli Council for Higher Education Scholarship and has previously interned at Google Research. 
\nHomepage:  https://horwitz.ai \nSponsor \nThe VASC seminar is generously sponsored by HeyGen\, an all-in-one AI-powered video generation platform that leverages advances in computer vision\, generative modeling\, and multimodal learning to make high-quality video creation both scalable and accessible.
URL:https://www.ri.cmu.edu/event/what-can-we-learn-from-a-million-models/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/12/12-8-25.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20251103T153000
DTEND;TZID=America/New_York:20251103T163000
DTSTAMP:20260530T180338
CREATED:20251027T174614Z
LAST-MODIFIED:20251027T174614Z
UID:149194-1762183800-1762187400@www.ri.cmu.edu
SUMMARY:From Video Generation to Video World Models
DESCRIPTION:Abstract:\nVideo diffusion models have achieved remarkable success in content creation\, yet they still fall short of simulating interactive worlds that respond to users in real time. This talk examines the fundamental challenges preventing these models from evolving into true world simulators. I will present a series of works (CausVid\, Self-Forcing\, MotionStream\, and State-Space World Model) that collectively mark a paradigm shift from non-causal diffusion models to autoregressive–diffusion hybrids capable of streaming long-duration videos with real-time interactivity. These advances move beyond passive video generation toward dynamic\, immersive experiences\, unlocking new possibilities across gaming\, robotics\, live video editing\, and augmented/virtual reality. \nBio: Xun Huang was previously a Research Scientist at Adobe and NVIDIA and an Adjunct Professor at Carnegie Mellon University. He is currently the Founder and CEO of a stealth startup. He obtained his Ph.D. from Cornell University in 2020\, advised by Professor Serge Belongie. His doctoral research was recognized with fellowships from NVIDIA\, Adobe\, and Snap. His research interests lie broadly in deep generative models\, with a recent focus on video world models. \nHomepage:  xunhuang.me \nSponsor \nThe VASC seminar is generously sponsored by HeyGen\, an all-in-one AI-powered video generation platform that leverages advances in computer vision\, generative modeling\, and multimodal learning to make high-quality video creation both scalable and accessible.
URL:https://www.ri.cmu.edu/event/from-video-generation-to-video-world-models/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/10/11-3-25.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20251015T130000
DTEND;TZID=America/New_York:20251015T140000
DTSTAMP:20260530T180338
CREATED:20251008T135546Z
LAST-MODIFIED:20251008T140726Z
UID:149052-1760533200-1760536800@www.ri.cmu.edu
SUMMARY:Seeing Deep Inside Scattering Tissue Using Efficient\, Noise-Robust Wavefront Shaping
DESCRIPTION:Abstract:\nScattering limits our ability to see inside biological tissue\, as light is severely distorted by tissue components with varying refractive indices. One promising method to overcome scattering aberrations is wavefront shaping. This technique involves placing a spatial light modulator (SLM) in the microscope’s optical path to correct the wavefront emitted from a point deep within the tissue. The goal is to bring photons from a single target point to a single sensor point despite tissue aberrations. This technique has the potential to revolutionize tissue imaging by enabling high-SNR imaging deep within scattering biological targets. However\, estimating wavefront-shaping modulations in practice is challenging\, since the modulations must be estimated in real time\, using non-invasive feedback\, and under a low photon budget. \nIn the first part of this talk\, I will discuss efforts to derive noise-robust score functions that can identify effective modulation corrections using non-invasive feedback. I will review previous approaches and introduce a new\, simple\, noise-robust method that uses confocal correction of both incoming and outgoing light with linear single-photon fluorescent excitation. We show that even though we measure light only outside the tissue and have no direct way to measure how well light is focused inside it\, maximizing the single-photon confocal intensity guarantees that all light is also focused into a spot inside the tissue. \nGiven a score function\, estimating the desired modulation becomes an optimization problem. However\, since the desired modulation depends on the unknown tissue structure\, typical optimization strategies involve slow sequential scanning\, in which each modulation parameter is queried independently. In the second part of this talk\, I will present a novel approach for rapid modulation optimization. 
This method leverages optical computing ideas and uses the optical system to directly measure the gradient of the score function\, allowing simultaneous updates of all modulation parameters from a single measurement. \nBio: Anat Levin is a Professor in the Department of Electrical and Computer Engineering at the Technion\, Israel\, working in the field of computational imaging. She received a Ph.D. in computer science from the Hebrew University in 2006. From 2007 to 2009 she was a postdoc at MIT CSAIL\, and from 2009 to 2016 she was an Assistant and then Associate Professor in the Department of Computer Science and Applied Mathematics at the Weizmann Institute of Science.\nProf. Levin has received numerous awards for her research\, including the CVPR PAMI Young Researcher Award in 2013\, the Eurographics Young Researcher Award in 2010\, the Eurographics Outstanding Technical Contributions Award in 2024\, the Blavatnik Award in 2018\, and three ERC grants. \nHomepage:  https://webee.technion.ac.il/people/anat.levin/ \n  \nSponsor \nThe VASC seminar is generously sponsored by HeyGen\, an all-in-one AI-powered video generation platform that leverages advances in computer vision\, generative modeling\, and multimodal learning to make high-quality video creation both scalable and accessible.
URL:https://www.ri.cmu.edu/event/seeing-deep-inside-scattering-tissue-using-efficient-noise-robust-wavefront-shaping/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/10/10-15-25.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250922T153000
DTEND;TZID=America/New_York:20250922T163000
DTSTAMP:20260530T180338
CREATED:20250909T182843Z
LAST-MODIFIED:20250918T191536Z
UID:148799-1758555000-1758558600@www.ri.cmu.edu
SUMMARY:From Sparse to Dense\, and Back to Sparse Again?
DESCRIPTION:Abstract: In the 1980s and 1990s\, computer vision architectures were built on sparse samples of points. In the 2000s\, dense models became popular for visual recognition because heuristically defined sparse models did not cover all the important parts of an image. However\, with deep learning and end-to-end training\, this need not remain the case: sparse models may still offer significant advantages\, both by avoiding unnecessary computation and by being more flexible. In this talk\, I will discuss point cloud deep learning\, how to make it aware of invariances\, and its diverse applications\, such as point cloud segmentation\, interaction modeling among objects\, point cloud completion\, and world models for robot manipulation tasks. Point cloud approaches also excel as 2D image recognition backbones. I will introduce our work AutoFocusFormer\, which uses point cloud backbones and decoders to solve dense 2D prediction tasks such as segmentation\, with a novel end-to-end learned adaptive hierarchical downsampling module. This module is especially helpful for detecting tiny\, distant objects in the scene that conventional grid downsampling would have decimated. Finally\, I will introduce recent work applying AutoFocusFormer to Gaussian splatting semantic SLAM\, which greatly advanced the state of the art.\n \nBio: Fuxin Li is currently an associate professor in the School of Electrical Engineering and Computer Science at Oregon State University. He has held research positions at Apple Inc.\, the University of Bonn\, and the Georgia Institute of Technology. He obtained his Ph.D. from the Institute of Automation\, Chinese Academy of Sciences\, in 2009. He has won an NSF CAREER award\, an Amazon Research Award\, and a CVPR 2024 Best Student Paper runner-up award\, (co-)won the PASCAL VOC semantic segmentation challenges from 2009 to 2012\, and led a team to a 4th-place finish in the DAVIS Video Segmentation Challenge 2017. 
He is a program chair of CVPR 2025. He has published more than 90 papers in computer vision\, machine learning\, and their applications. His main research interests are point cloud deep networks\, human understanding of deep learning\, video object segmentation\, multi-target tracking\, and uncertainty estimation in deep learning.\n \nHomepage:  Dr. Fuxin Li\nSponsor:  The VASC seminar is generously sponsored by HeyGen\, an all-in-one AI-powered video generation platform that leverages advances in computer vision\, generative modeling\, and multimodal learning to make high-quality video creation both scalable and accessible.
URL:https://www.ri.cmu.edu/event/from-sparse-to-dense-and-back-to-sparse-again/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/09/9-22-25-Li.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250915T153000
DTEND;TZID=America/New_York:20250915T163000
DTSTAMP:20260530T180338
CREATED:20250909T173311Z
LAST-MODIFIED:20250909T173311Z
UID:148795-1757950200-1757953800@www.ri.cmu.edu
SUMMARY:Whole-Body Conditioned Egocentric Video Prediction
DESCRIPTION:Abstract: We train models to Predict Ego-centric Video from human Actions (PEVA)\, given the past video and an action represented by the relative 3D body pose. By conditioning on kinematic pose trajectories\, structured by the joint hierarchy of the body\, our model learns to simulate how physical human actions shape the environment from a first-person point of view. We train an auto-regressive conditional diffusion transformer on Nymeria\, a large-scale dataset of real-world egocentric video and body pose capture. We further design a hierarchical evaluation protocol with increasingly challenging tasks\, enabling a comprehensive analysis of the model’s embodied prediction and control abilities. Our work represents an initial attempt to tackle the challenges of modeling complex real-world environments and embodied agent behaviors with video prediction from the perspective of a human. \nBio: Yutong Bai is currently a postdoctoral researcher at UC Berkeley (Berkeley AI Research)\, advised by Prof. Alexei (Alyosha) Efros\, Prof. Jitendra Malik\, and Prof. Trevor Darrell. Prior to that\, she obtained her PhD in Computer Science at Johns Hopkins University\, advised by Prof. Alan Yuille. She has interned at Meta AI (FAIR Labs) and Google Brain\, and was selected as a 2023 Apple Scholar and an MIT EECS Rising Star. Her work was nominated for the CVPR 2022 Best Paper Award. \nHer research aims to build intelligent systems from first principles: systems that do not merely fit patterns or follow instructions\, but that gradually develop structure\, abstraction\, and behavior through learning itself. She is interested in how intelligence emerges not from handcrafted pipelines or task-specific heuristics\, but from exposure to behaviorally rich\, understructured environments where models must learn what to attend to\, how to reason\, and how to improve. 
This involves designing learning systems that are not narrowly optimized for a single goal\, but that can self-organize and grow increasingly competent through interaction\, experience\, and computation. While she sees scale as a powerful tool\, she does not view it as the whole solution: larger models open up capacity\, but what fills that capacity—and how it forms—is just as important. Her research explores how to use scale to amplify the right signals—not just data quantity\, but the structural richness of behavior and the dynamics of learning itself. \n\nSpeaker Homepage: yutongbai.com
URL:https://www.ri.cmu.edu/event/whole-body-conditioned-egocentric-video-prediction/
LOCATION:Newell-Simon Hall 3305
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/09/9-15-25-yutong.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250501T120000
DTEND;TZID=America/New_York:20250501T130000
DTSTAMP:20260530T180338
CREATED:20250422T202936Z
LAST-MODIFIED:20250422T202936Z
UID:146276-1746100800-1746104400@www.ri.cmu.edu
SUMMARY:When Spatial Computing meets Accelerated Computing
DESCRIPTION:Abstract:  NVIDIA has been pioneering Accelerated Computing for the past three decades\, driving innovations that have transformed society. Among all personal computing mediums\, Spatial Computing and Extended Reality (XR) stand out as some of the most promising beneficiaries of accelerated computing. In this talk\, we will explore the latest developments and trends in the XR ecosystem\, highlighting a range of form factors: from augmented reality on handheld devices and AI glasses\, to fully immersive mixed reality experiences in head-mounted displays. We will also delve into some of the most compelling immersive use cases. Additionally\, we will discuss NVIDIA’s contributions at the intersection of XR and AI\, illustrating how AI is being leveraged to mold and enhance XR experiences. The dynamic interplay between spatial computing and accelerated computing will shape the future of computing and society. \nBio:  David Chu is VP of Spatial Computing and XR at NVIDIA\, where he is bringing accelerated computing to spatial computing. Formerly\, he was VP of Engineering at Magic Leap\, where he led teams on perception\, deep learning and immersion for AR. Prior to that\, David was at Google\, where he worked on AR and VR\, as well as cloud services\, edge computing and cloud gaming. Before that\, David was faculty at the University of Illinois\, Urbana-Champaign (UIUC)\, and a staff member at Microsoft Research. David’s work has received awards such as Best of CES\, Best Papers in MobiSys\, and Best Demos in MobiSys and SenSys. He has served as PC Chair for ISMAR and NetGames\, and General Chair of HotMobile. His individual work has appeared in such places as CNBC\, Fast Company\, VentureBeat\, TechCrunch\, PC Magazine\, GameSpot\, Ars Technica\, Slashdot\, The Verge\, Engadget\, Yahoo and Wired. \nHomepage:  https://www.linkedin.com/in/chudavid/
URL:https://www.ri.cmu.edu/event/when-spatial-computing-meets-accelerated-computing/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/04/5-1-25-david_chu.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250428T153000
DTEND;TZID=America/New_York:20250428T163000
DTSTAMP:20260530T180338
CREATED:20250416T212119Z
LAST-MODIFIED:20250416T212119Z
UID:146213-1745854200-1745857800@www.ri.cmu.edu
SUMMARY:Generating a Physical World
DESCRIPTION:Abstract:  Generating an interactive\, enlivened\, and physical world enables a wide range of applications in entertainment\, embodied AI\, education\, and creative designs. Recent image/video models have shown promise in producing realistic visuals\, yet they operate purely at the pixel level and lack underlying physical grounding\, leading to failures in physical fidelity and user interactivity. In this talk\, I’ll introduce our recent efforts in physical world generation by grounding pixel models onto physical models. This methodology inherently incorporates physical world knowledge about 3D spatial structures and dynamics\, simultaneously acquiring visual realism\, physical fidelity\, and user interactivity. I’ll showcase how this methodology is applied to enable fast generation of diverse worlds\, with which users can interact via 3D actions. \nBio:  Hong-Xing “Koven” Yu (https://kovenyu.com/) is a 5th-year PhD candidate at the Computer Science Department of Stanford University\, advised by Prof. Jiajun Wu. His research interest centers around how AI can understand and generate the physical world. He is a recipient of the SIGGRAPH Asia Best Paper Award\, the Stanford SoE Fellowship\, the Qualcomm Innovation Fellowship\, and the Meshy Fellowship\, and a finalist of the NVIDIA Fellowship\, the Meta Fellowship\, the Jane Street Fellowship\, and the Roblox Fellowship. \nHomepage:  https://kovenyu.com/
URL:https://www.ri.cmu.edu/event/generating-a-physical-world/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/04/4-28-25-Hong-Xing_Koven_Yu.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250324T153000
DTEND;TZID=America/New_York:20250324T163000
DTSTAMP:20260530T180338
CREATED:20250321T160455Z
LAST-MODIFIED:20250321T160455Z
UID:145781-1742830200-1742833800@www.ri.cmu.edu
SUMMARY:Autoregressive Models: Foundations and Open Questions
DESCRIPTION:Abstract: \nThe success of Autoregressive (AR) models in language today is so tremendous that their scope has\, in turn\, been largely narrowed to specific instantiations. In this talk\, we will revisit the foundations of classical AR models\, discussing essential concepts that may have been overlooked in modern practice. We will then introduce our recent research on broadening the scope of modern AR models in the context of image generation\, challenging the common beliefs about how AR models can be built. We will also discuss open questions and potential directions for future research. \nBio: \nKaiming He is an Associate Professor in the Department of EECS at MIT\, which he joined in Feb 2024. Before that\, he was a research scientist at industrial labs including Facebook AI Research (FAIR\, 2016-2024) and Microsoft Research (MSR\, 2011-2016). His research covers a wide range of topics in Computer Vision and Machine Learning. His work has been recognized by numerous prestigious awards in the community\, including the PAMI Young Researcher Award 2018 and multiple Best Paper Awards at top-tier conferences such as CVPR\, ICCV\, and ECCV.
URL:https://www.ri.cmu.edu/event/autoregressive-models-foundations-and-open-questions/
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/03/3-24-25-He.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250320T140000
DTEND;TZID=America/New_York:20250320T150000
DTSTAMP:20260530T180338
CREATED:20250310T154425Z
LAST-MODIFIED:20250317T150342Z
UID:145620-1742479200-1742482800@www.ri.cmu.edu
SUMMARY:The New Era of Video Generation
DESCRIPTION:Abstract: \nTraditional video production is slow\, expensive\, and requires specialized skills. Founded by CMU alumni\, HeyGen is an AI-native video platform designed to revolutionize the video creation process by making visual storytelling accessible to all. We’ve grown to more than 20M users and tens of millions in revenue in less than one year\, with recognition as the #1 Fastest Growing Software Product by G2 in 2025. \nIn this talk\, I will discuss how HeyGen leverages cutting-edge AI to enable users to create\, localize\, personalize\, and interact with videos through our state-of-the-art human-centric video engine. In particular\, I will cover our key use cases\, including avatar videos\, video translation\, and interactive avatars\, as well as a number of AI innovations for video generation. I will also share more about our in-house research on video generation and the real-world challenges of building AI applications. \nBio: \nDr. Rong Yan is currently the CTO of HeyGen\, an innovative AI-driven video platform that allows users to create videos with AI-generated avatars and voices. Its mission is to make visual storytelling accessible to everyone. Before joining HeyGen\, Rong was VP of Engineering at HubSpot Inc.\, responsible for its data products\, including data platform\, automation\, data integration and reporting. Before HubSpot\, he was Senior Director of Product Engineering at Snapchat\, Director of Data at Square\, and Engineering Manager of Ads Ranking at Facebook. Dr. Yan received his M.Sc. (2004) and Ph.D. (2006) degrees from Carnegie Mellon University’s School of Computer Science. He has received two Best Paper runner-up awards\, published more than 60 papers\, and co-chaired more than 10 conferences and workshops in related domains.
URL:https://www.ri.cmu.edu/event/the-new-era-of-video-generation/
LOCATION:Newell-Simon Hall 4305
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/03/3-20-25-Rong-scaled.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250224T153000
DTEND;TZID=America/New_York:20250224T163000
DTSTAMP:20260530T180338
CREATED:20250213T141722Z
LAST-MODIFIED:20250213T141722Z
UID:145416-1740411000-1740414600@www.ri.cmu.edu
SUMMARY:Discovering and Erasing Undesired Concepts
DESCRIPTION:Abstract: \nThe rapid growth of generative models allows an ever-increasing variety of capabilities. Yet\, these models may also produce undesired content such as unsafe or misleading images\, private information\, or copyrighted material. \nIn this talk\, I will discuss practical methods to prevent undesired generations. First\, I will show how the challenge of avoiding undesired generations manifested itself in a simple Capture-the-Flag LLM setting\, where even our top defense strategy was breached. Next\, I will demonstrate a similar vulnerability in state-of-the-art concept erasure methods for Text-to-Image models. Finally\, I will distinguish between erasure through Guidance-Based Avoidance and Destruction-Based Removal methods. I will discuss the trade-offs of each approach and their behavior in various settings.\n \nBio: \nNiv is a postdoctoral researcher at New York University hosted by Prof. Chinmay Hegde. He received a BSc in mathematics with physics as part of the Technion Excellence Program. He received his PhD in computer science from the Hebrew University of Jerusalem\, advised by Prof. Yedid Hoshen. Niv was awarded the Israeli data science scholarship for outstanding postdoctoral fellows (VATAT). He is interested in anomaly detection\, representation learning\, and AI safety for Vision & Language models. \n\nHomepage:  https://nivc.github.io/
URL:https://www.ri.cmu.edu/event/discovering-and-erasing-undesired-concepts/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/02/2-24-25-Cohen.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250217T153000
DTEND;TZID=America/New_York:20250217T163000
DTSTAMP:20260530T180338
CREATED:20250211T211319Z
LAST-MODIFIED:20250211T211319Z
UID:145390-1739806200-1739809800@www.ri.cmu.edu
SUMMARY:Controllable Visual Imagination
DESCRIPTION:Abstract: \nGenerative models have empowered human creators to visualize their imaginations without requiring artistic skill or labor. Prominent examples are large-scale text-to-image generation models. However\, these models are often difficult to control and do not respect 3D perspective geometry or the temporal consistency of videos. In this talk\, I will showcase several of our recent efforts to improve controllability for image\, video\, and 3D generation and editing. Specifically\, I will talk about how we improve semantic control for 2D image generation\, generate realistic textures from reference images for 3D objects\, and synthesize novel views\, lighting\, and weather for 3D scenes. \n\nBio: \nJia-Bin Huang is a Capital One-endowed Associate Professor of Computer Science at the University of Maryland College Park. Before coming to UMD\, Huang was a research scientist at Meta Reality Labs. Before that\, he was an Assistant Professor of Electrical and Computer Engineering at Virginia Tech. Huang received his Ph.D. from the University of Illinois\, Urbana-Champaign (UIUC) in 2016. His research interests include 3D computer vision\, generative models\, and computational photography. Huang is the recipient of the Thomas & Margaret Huang Award\, an NSF CRII award\, faculty awards from Samsung\, Google\, 3M\, and Qualcomm\, and a Google Research Scholar Award. \n\nHomepage:  https://jbhuang0604.github.io/
URL:https://www.ri.cmu.edu/event/controllable-visual-imagination/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2025/02/2-17-25.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241202T153000
DTEND;TZID=America/New_York:20241202T163000
DTSTAMP:20260530T180338
CREATED:20241125T192417Z
LAST-MODIFIED:20241125T195030Z
UID:144499-1733153400-1733157000@www.ri.cmu.edu
SUMMARY:Practical Challenges and Recent Advances in Data Attribution
DESCRIPTION:Abstract: \nData plays an increasingly crucial role in both the performance and the safety of AI models. Data attribution is an emerging family of techniques aimed at quantifying the impact of individual training data points on a model trained on them\, which have found data-centric applications such as training data curation\, instance-based explanation\, and copyright compensation. In this talk\, I will explore practical challenges of deploying data attribution in real-world applications. \n  \nIn the first part\, I will examine the adversarial robustness of data attribution methods\, particularly in the context of fairly compensating training data providers. Our study reveals a critical vulnerability\, demonstrating how malicious data providers can manipulate their data to unfairly inflate their compensation. In the second part\, I will address the limited flexibility of existing influence function approaches and introduce a novel method that extends data attribution to broader machine learning paradigms\, including survival analysis and contrastive learning. If time permits\, I will also briefly introduce our efforts to tackle challenges related to computational efficiency and group effects in data attribution\, and discuss the current advancements and open problems in this field. \n  \nBio: \nJiaqi Ma is an Assistant Professor at the University of Illinois Urbana-Champaign (UIUC). His research interests lie in the broad area of trustworthy AI\, with recent focuses including data attribution\, machine unlearning\, explainable machine learning\, and training data curation. Jiaqi’s work has been recognized with the Gary M. Olson Outstanding Student Award from the University of Michigan and a Best Paper Award from the DPFM Workshop at ICLR 2024. Prior to joining UIUC\, Jiaqi earned his PhD from the University of Michigan and worked as a postdoctoral researcher at Harvard University. \n  \nHomepage:  Jiaqi Ma
URL:https://www.ri.cmu.edu/event/practical-challenges-and-recent-advances-in-data-attribution/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/11/12-2-24.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241125T153000
DTEND;TZID=America/New_York:20241125T163000
DTSTAMP:20260530T180338
CREATED:20241112T181808Z
LAST-MODIFIED:20241112T182342Z
UID:144237-1732548600-1732552200@www.ri.cmu.edu
SUMMARY:High-resolution cloth simulation in milliseconds: Efficient GPU Cloth Simulation with Non-distance Barriers and Subspace Reuse Interactions
DESCRIPTION:Abstract: \nWe show how to push the performance of high-resolution cloth simulation\, making the simulation interactive (in milliseconds) for models with one million degrees of freedom (DOFs) while keeping every triangle untangled. The guarantee of being penetration-free is inspired by the interior-point method\, which converts the inequality constraints to barrier potentials. However\, we propose a major overhaul of this modality by defining a novel and simple barrier formulation that does not depend on the distance between mesh primitives. Such a non-distance barrier model allows a new way to integrate collision detection into the simulation pipeline. Another contributor to the performance boost is the so-called subspace reuse strategy. This is based on the observation that low-frequency strain vibrations are nearly orthogonal to the deformation induced by collisions or self-collisions\, which is often of high frequency. Subspace reuse then takes care of low-frequency residuals\, while high-frequency residuals can also be effectively smoothed by GPU-based iterative solvers. We show that our method outperforms existing fast cloth simulators by nearly an order of magnitude while keeping the entire simulation penetration-free and producing high-quality animations of high-resolution models. \n  \nBio: \nDr. Yin Yang is currently an Associate Professor with the Kahlert School of Computing at the University of Utah. Before joining the U\, he was a faculty member at Clemson University and the University of New Mexico. He received his Ph.D. in Computer Science from The University of Texas at Dallas in 2013\, where he was awarded the David Daniel Fellowship Prize. He was a Research/Teaching Assistant at UT Dallas as well as UT Southwestern Medical Center. His research mainly focuses on real-time physics-based computer graphics\, animation and simulation with a strong emphasis on interdisciplinarity. He was a Research Intern at Microsoft Research Asia in 2012. He received the NSF CRII (2015) and CAREER (2019) awards. \n  \nHomepage:  https://yangzzzy.github.io \n 
URL:https://www.ri.cmu.edu/event/high-resolution-cloth-simulation-in-milliseconds-efficient-gpu-cloth-simulation-with-non-distance-barriers-and-subspace-reuse-interactions/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/11/Jang-11-25-24-3.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241118T153000
DTEND;TZID=America/New_York:20241118T163000
DTSTAMP:20260530T180338
CREATED:20241112T173856Z
LAST-MODIFIED:20241112T174652Z
UID:144232-1731943800-1731947400@www.ri.cmu.edu
SUMMARY:Generative Modelling for 3D Multimodal Understanding of Human Physical Interactions
DESCRIPTION:Abstract: \nGenerative modelling has been extremely successful in synthesizing text\, images\, and videos. Can the same machinery also help us better understand how to physically interact with the multimodal 3D world? In this talk\, I will introduce some of my group’s work in answering this question. I will first discuss how we can enable 2D image generation models to edit images in a 3D-aware manner\, and how to generate audio for muted egocentric videos. I will then zoom in specifically on hand interactions by introducing (1) FoundHand\, a large-scale generative model for synthesizing realistic 2D hand images\, and (2) GigaHands\, a new large-scale 3D hand activities dataset designed to push the boundary of hand interaction modeling. Finally\, I will conclude with an outlook on the future of generative modeling for understanding 3D human interactions. \nBio: \nSrinath Sridhar (https://srinathsridhar.com) is an Assistant Professor of Computer Science at Brown University\, where he leads the Interactive 3D Vision & Learning Lab (https://ivl.cs.brown.edu). He received his PhD from the Max Planck Institute for Informatics and was subsequently a postdoctoral researcher at Stanford. His research interests are in 3D computer vision and machine learning. Specifically\, his group focuses on visual understanding of 3D human physical interactions with applications ranging from robotics to mixed reality. He is a recipient of the NSF CAREER award and a Google Research Scholar award\, and his work received the Eurographics Best Paper Honorable Mention. He spends part of his time as a visiting academic at Amazon Robotics\, and has previously spent time at Microsoft Research Redmond and Honda Research Institute. \nHomepage:  https://srinathsridhar.com/
URL:https://www.ri.cmu.edu/event/generative-modelling-for-3d-multimodal-understanding-of-human-physical-interactions/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/11/11-18-24-ssrinath_photo_.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241028T153000
DTEND;TZID=America/New_York:20241028T163000
DTSTAMP:20260530T180338
CREATED:20241022T152443Z
LAST-MODIFIED:20241025T005916Z
UID:143824-1730129400-1730133000@www.ri.cmu.edu
SUMMARY:Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality
DESCRIPTION:Abstract:  Spatial audio in Extended Reality (XR) provides users with better awareness of where virtual elements are placed\, and efficiently guides them to events such as notifications\, system alerts from different windows\, or approaching avatars. Humans\, however\, are inaccurate in localizing sound cues\, especially with multiple sources\, due to limitations in human auditory perception such as angular discrimination error and front-back confusion. This decreases the efficiency of XR interfaces because users misidentify which XR element a sound is coming from. To address this\, we propose Auptimize\, a novel computational approach for placing XR sound sources\, which mitigates such localization errors by utilizing the ventriloquist effect. Auptimize disentangles the sound source locations from the visual elements and relocates the sound sources to optimal positions for unambiguous identification of sound cues\, avoiding errors due to inter-source proximity and front-back confusion. Our evaluation shows that Auptimize decreases spatial audio-based source identification errors compared to playing sound cues at the paired visual-sound locations. We demonstrate the applicability of Auptimize for diverse spatial audio-based interactive XR scenarios.\n \nBio:  Hyunsung Cho is a fourth-year Ph.D. student in the Human-Computer Interaction Institute (HCII) at Carnegie Mellon University\, advised by Prof. David Lindlbauer. Her research focuses on designing\, implementing\, and evaluating context-aware Extended Reality (XR) interfaces and multimodal interaction techniques in XR to enable seamless\, unobtrusive human-computer interactions. Her work combines computational modeling of human perception and behavior\, user-centered design\, and intelligent systems to create adaptive interfaces for diverse user contexts. Her research has received Best Paper Awards and Methods Recognition at ACM CSCW and ACM ISS. She holds an M.S. and a B.S. in Computer Science from KAIST. She has previously worked as a Research Scientist Intern at Meta’s Reality Labs Research and Nokia Bell Labs’ Pervasive Systems research group.\n \nHomepage:  https://hyunsungcho.com/\n\n \n \nSponsored in part by:   Meta Reality Labs Pittsburgh
URL:https://www.ri.cmu.edu/event/auptimize-optimal-placement-of-spatial-audio-cues-for-extended-reality/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/10/profile_hyunsung_cho-3.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241028T153000
DTEND;TZID=America/New_York:20241028T163000
DTSTAMP:20260530T180338
CREATED:20241022T151715Z
LAST-MODIFIED:20241025T010021Z
UID:143820-1730129400-1730133000@www.ri.cmu.edu
SUMMARY:EgoTouch: On-Body Touch Input Using AR/VR Headset Cameras
DESCRIPTION:Abstract:  In augmented and virtual reality (AR/VR) experiences\, a user’s arms and hands can provide a convenient and tactile surface for touch input. Prior work has shown on-body input to have significant speed\, accuracy\, and ergonomic benefits over in-air interfaces\, which are common today. In this work\, we demonstrate high-accuracy\, bare-hands (i.e.\, no special instrumentation of the user) skin input using just an RGB camera\, like those already integrated into all modern XR headsets. Our results show this approach can be accurate and robust across diverse lighting conditions\, skin tones\, and body motion (e.g.\, input while walking). Finally\, our pipeline also provides rich input metadata including touch force\, finger identification\, angle of attack\, and rotation. We believe these are the requisite technical ingredients to more fully unlock on-skin interfaces that have been well motivated in the HCI literature but have lacked robust and practical methods.\n \nBio:  I’m a PhD student in the Future Interfaces Group at Carnegie Mellon University\, where I’m advised by Chris Harrison. I’m interested in creating new ways for people to interact with the world using my background in sensing and machine learning. Previously\, I graduated with a Bachelor’s and Master’s from IIT Madras\, where I majored in Engineering Design and Data Science.\n \nHomepage:  https://vimalmollyn.com
URL:https://www.ri.cmu.edu/event/egotouch-on-body-touch-input-using-ar-vr-headset-cameras/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/10/10-28-24-2-vimal_figlab-1.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241028T153000
DTEND;TZID=America/New_York:20241028T163000
DTSTAMP:20260530T180338
CREATED:20241022T150007Z
LAST-MODIFIED:20241025T010101Z
UID:143816-1730129400-1730133000@www.ri.cmu.edu
SUMMARY:Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis
DESCRIPTION:Abstract:  This talk will present our approach for reconstructing objects from sparse-view images captured in unconstrained environments. In the absence of ground-truth camera poses\, we will demonstrate how to utilize estimates from off-the-shelf systems and address two key challenges: refining noisy camera poses in sparse views and effectively handling outlier poses.\n \nBio:  Qitao is a second-year Master’s student in Computer Vision at CMU\, RI\, advised by Prof. Shubham Tulsiani. His research focuses on camera pose estimation and 3D reconstruction in the wild. He holds a Bachelor’s degree from Shandong University in China and was a visiting student at the University of Central Florida\, where he worked with Prof. Chen Chen.\n \nHomepage:  https://qitaozhao.github.io/\n\n\nSponsored in part by:   Meta Reality Labs Pittsburgh
URL:https://www.ri.cmu.edu/event/sparse-view-pose-estimation-and-reconstruction-via-analysis-by-generative-synthesis/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/10/10-28-24-1.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241021T153000
DTEND;TZID=America/New_York:20241021T163000
DTSTAMP:20260530T180338
CREATED:20241015T152153Z
LAST-MODIFIED:20241015T152153Z
UID:143666-1729524600-1729528200@www.ri.cmu.edu
SUMMARY:Building Scalable Visual Intelligence: From Representation to Understanding and Generation
DESCRIPTION:Abstract: \nIn this talk\, we will dive into our recent work on vision-centric generative AI\, focusing on how it helps with understanding and creating visual content like images and videos. We’ll cover the latest advances\, including multimodal large language models for visual understanding and diffusion transformers for visual generation. We’ll explore how these two areas are closely connected\, along with the challenges and opportunities in building powerful and scalable visual intelligence. Plus\, we’ll look at why these developments matter\, both in practical applications and as key steps toward creating robust visual intelligence that can better understand and interact with the sensory-rich world around us. \n\nBio: \nSaining Xie is an Assistant Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University and is affiliated with NYU Center for Data Science. He is also a visiting faculty researcher at Google DeepMind. Before joining NYU in 2023\, he was a research scientist at FAIR\, Meta. In 2018\, he received his Ph.D. degree in computer science from the University of California San Diego. He works in computer vision and machine learning\, with a particular interest in scalable visual representation learning for multimodal understanding and generation. His work has been recognized with the Marr Prize honorable mention\, CVPR best paper finalists and an Amazon research award. \n\nHomepage:  Saining Xie \n\nSponsored in part by:   Meta Reality Labs Pittsburgh
URL:https://www.ri.cmu.edu/event/building-scalable-visual-intelligence-from-represention-to-understanding-and-generation/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/10/10-21-24.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241014T153000
DTEND;TZID=America/New_York:20241014T163000
DTSTAMP:20260530T180338
CREATED:20241003T135843Z
LAST-MODIFIED:20241003T150700Z
UID:143543-1728919800-1728923400@www.ri.cmu.edu
SUMMARY:High-Fidelity Neural Radiance Fields
DESCRIPTION:Abstract: \nI will present three recent projects that focus on high-fidelity neural radiance fields for walkable VR spaces: \nVR-NeRF (SIGGRAPH Asia 2023) is an end-to-end system for the high-fidelity capture\, model reconstruction\, and real-time rendering of walkable spaces in virtual reality using neural radiance fields. To this end\, we designed and built a custom multi-camera rig to densely capture walkable spaces in high fidelity and with multi-view high dynamic range images in unprecedented quality and density. To represent highly detailed scenes\, we introduce a novel perceptual color space for learning accurate HDR appearance\, and an efficient mip-mapping mechanism for level-of-detail rendering with anti-aliasing. Our multi-GPU renderer enables high-fidelity volume rendering at the full VR resolution of dual 2K×2K at 36 Hz on our custom demo machine. \nHybridNeRF (CVPR 2024 Highlight) leverages the strengths of NeRF-style volumetric rendering and SDF-style surface representations by rendering most objects as surfaces while modeling the (typically) small fraction of challenging regions volumetrically. We evaluate HybridNeRF on the challenging Eyeful Tower dataset along with other commonly used view synthesis datasets. When compared to state-of-the-art baselines\, including recent rasterization-based approaches\, HybridNeRF improves error rates by 15–30% while achieving real-time framerates (at least 36 FPS) for virtual-reality resolutions (2K×2K). \nSpecNeRF (CVPR 2024 Highlight) proposes a learnable Gaussian directional encoding to better model view-dependent effects under near-field lighting conditions. Importantly\, our new directional encoding captures the spatially-varying nature of near-field lighting and emulates the behavior of prefiltered environment maps. As a result\, it enables the efficient evaluation of preconvolved specular color at any 3D location with varying roughness coefficients. We further introduce a data-driven geometry prior that helps alleviate the shape-radiance ambiguity in reflection modeling. \n  \nBio: \nChristian Richardt is a Research Scientist at Meta Reality Labs Research in Pittsburgh\, PA. His research combines insights from vision\, graphics and perception to reconstruct visual information from images and videos\, to create high-quality visual experiences with a focus on VR experiences. Christian was previously an Associate Professor and EPSRC-UKRI Innovation Fellow in the Visual Computing Group and the CAMERA Centre at the University of Bath\, UK. Before that\, he was a postdoc at the Intel Visual Computing Institute at Saarland University and Max-Planck-Institut für Informatik in Saarbrücken\, Germany. Previously\, he was a postdoc in the REVES team at Inria Sophia Antipolis\, France. Christian graduated with a PhD and BA from the University of Cambridge in 2012 and 2007\, respectively. His doctoral research investigated the full life cycle of RGBD videos: from their acquisition\, via filtering and processing\, to the evaluation of stereoscopic display. \n  \nHomepage:  https://richardt.name \n  \nSponsored in part by:   Meta Reality Labs Pittsburgh
URL:https://www.ri.cmu.edu/event/high-fidelity-neural-radiance-fields/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/10/Richardt-10-14-24.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241007T153000
DTEND;TZID=America/New_York:20241007T163000
DTSTAMP:20260530T180338
CREATED:20240930T133727Z
LAST-MODIFIED:20240930T141221Z
UID:143480-1728315000-1728318600@www.ri.cmu.edu
SUMMARY:Reconstructing Everything
DESCRIPTION:Abstract: \nThe presentation will be about a long-running\, perhaps quixotic effort to reconstruct all of the world’s structures in 3D from Internet photos\, why this is challenging\, and why this effort might be useful in the era of generative AI. \n  \nBio: \nNoah Snavely is a Professor in the Computer Science Department at Cornell University and Cornell Tech\, and a research scientist at Google DeepMind in NYC. Noah’s research interests are in computer vision and graphics\, in particular in recovering structure from large photo collections for use in understanding and visualizing the world around us. Noah is the recipient of a PECASE\, a Microsoft New Faculty Fellowship\, an Alfred P. Sloan Fellowship\, a SIGGRAPH Significant New Researcher Award\, and is a Fellow of the ACM. \n  \nHomepage:  https://www.cs.cornell.edu/~snavely/ \n  \nSponsored in part by:   Meta Reality Labs Pittsburgh \n 
URL:https://www.ri.cmu.edu/event/reconstructing-everything/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/09/NoahSnavely-10-7-24.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240923T153000
DTEND;TZID=America/New_York:20240923T163000
DTSTAMP:20260530T180338
CREATED:20240912T190116Z
LAST-MODIFIED:20240912T190116Z
UID:143316-1727105400-1727109000@www.ri.cmu.edu
SUMMARY:Stochastic Graphics Primitives
DESCRIPTION:Abstract:\n\nFor decades computer graphics has successfully leveraged stochasticity to enable both expressive volumetric representations of participating media like clouds and efficient Monte Carlo rendering of large-scale\, complex scenes. In this talk\, we’ll explore how these complementary forms of stochasticity (representational and algorithmic) may be applied more generally across computer graphics and vision. In the first part of the talk\, I’ll discuss our work on rendering probabilistic representations of 3D geometry\, which explains the connection between classical volume rendering and more recent techniques like NeRF. For the second part of the talk\, I’ll discuss our work on Monte Carlo simulation\, where we’ve developed accelerated random walk techniques for physics simulation that are analogous to Monte Carlo rendering for light transport. \n\n \nBio:\n\n\nBailey is a PhD candidate in the Computer Science Department at Carnegie Mellon University\, where he is advised by Ioannis Gkioulekas. He works on theory and core algorithms for stochastic graphics primitives\, which are leveraged in applications across both computer graphics and vision. He received his Bachelor's in Mathematics and Computer Science from Dartmouth College in 2018\, where he had the privilege of working with Wojciech Jarosz. During his PhD\, Bailey has interned with Adobe Research\, the Exploratory Design Group at Apple\, and the High-Fidelity Physics Research team at NVIDIA. He is a recipient of the NSF Graduate Research Fellowship\, the NVIDIA Graduate Research Fellowship\, a Best Paper award at SIGGRAPH 2024\, and a Best Student Paper Honorable Mention award at CVPR 2024. \n\n \nHomepage:  bailey-miller.com\n\n\n \nSponsored in part by:   Meta Reality Labs Pittsburgh
URL:https://www.ri.cmu.edu/event/stochastic-graphics-primitives/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/09/Miller-9-23-24.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240916T153000
DTEND;TZID=America/New_York:20240916T163000
DTSTAMP:20260530T180338
CREATED:20240911T223750Z
LAST-MODIFIED:20240911T224008Z
UID:143256-1726500600-1726504200@www.ri.cmu.edu
SUMMARY:Vectorizing Raster Signals for Spatial Intelligence
DESCRIPTION:Abstract: This seminar will focus on how vectorized representations can be generated from raster signals to enhance spatial intelligence. I will discuss the core methodology behind this transformation\, with a focus on applications in AR/VR and robotics. The seminar will also briefly cover follow-up work that explores rigging and re-animating objects from casual single videos without templates\, showcasing the potential of this approach for scaling 3D content creation. \nBio: Mosam Dabhi is pursuing a PhD at Carnegie Mellon University\, specializing in computer vision and AI\, with a focus on transforming raster signals into vectorized representations for spatial intelligence. His work has contributed to enhancing the performance of production headsets and XR devices\, improving their spatial responsiveness. Mosam has developed scalable algorithms for collecting 3D ground-truth data\, which are critical for real-world applications. His key contributions include the development of the 3D Lifting Foundation Model (3D LFM) and RAT4D\, enabling 3D content generation from casual videos. His research has applications in robotics\, AR/VR\, and spatial intelligence\, advancing AI’s capability to interact with the physical world. \nHomepage:  https://mosamdabhi.github.io \n  \nSponsored in part by:   Meta Reality Labs Pittsburgh
URL:https://www.ri.cmu.edu/event/vectorizing-raster-signals-for-spatial-intelligence/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/09/mdabhi_ID-338x450-2-9-16-24.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240916T153000
DTEND;TZID=America/New_York:20240916T163000
DTSTAMP:20260530T180338
CREATED:20240911T223235Z
LAST-MODIFIED:20240911T224437Z
UID:143251-1726500600-1726504200@www.ri.cmu.edu
SUMMARY:Remote Rendering and 3D Streaming for Resource-Constrained XR Devices
DESCRIPTION:Abstract: An overview of the motivation and challenges for remote rendering and real-time 3D video streaming on XR headsets. \nBio: Edward is a third-year PhD student in the ECE department interested in computer systems for VR/AR devices. \nHomepage: https://users.ece.cmu.edu/~elu2/ \n  \nSponsored in part by:   Meta Reality Labs Pittsburgh \n  \n  \n 
URL:https://www.ri.cmu.edu/event/remote-rendering-and-3d-streaming-for-resource-constrained-xr-devices/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/09/Lu-9-16-24.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240916T153000
DTEND;TZID=America/New_York:20240916T163000
DTSTAMP:20260530T180338
CREATED:20240905T175713Z
LAST-MODIFIED:20240909T145806Z
UID:143181-1726500600-1726504200@www.ri.cmu.edu
SUMMARY:Instant Visual 3D Worlds Through Split-Lohmann Displays
DESCRIPTION:Abstract:\nSplit-Lohmann displays provide a novel approach to creating instant visual 3D worlds that support realistic eye accommodation. Unlike commercially available VR headsets that show content at a fixed depth\, the proposed display can optically place each pixel region at a different depth\, instantly creating eye-tracking-free 3D worlds without using time-multiplexing. This enables real-time streaming of 3D content over a large depth range at high spatial resolution\, offering an exciting step towards a more immersive real-time 3D experience. We demonstrate the technology’s capabilities through a lab prototype\, showcasing high-quality visuals across various static\, dynamic\, and interactive 3D scenes.\n\n \nBio:\n\nYingsi is a PhD candidate in Electrical and Computer Engineering at Carnegie Mellon University\, advised by Aswin C. Sankaranarayanan and Matthew P. O’Toole. Her research focuses on designing and building next-generation computational 3D displays for Virtual\, Augmented\, and Mixed Reality. This interdisciplinary work combines computer vision\, optics\, signal processing\, and machine learning. Yingsi received the Best Paper Award at SIGGRAPH 2023 and the Best Demo Award at ICCP 2023.\nYingsi holds a B.S. in Computer Science from Columbia University and a B.A. in Physics from Colgate University. She was a research intern at Meta Reality Labs in the Display Systems Research team (2024) and Snap Research in the Computational Imaging team (2020). She was also a software engineering intern at Google Search (2019).\n\n \n\nHomepage:  https://yingsiqin.github.io/\n\n \nSponsored in part by:   Meta Reality Labs Pittsburgh
URL:https://www.ri.cmu.edu/event/instant-visual-3d-worlds-through-split-lohmann-displays/
LOCATION:3305 Newell-Simon Hall
CATEGORIES:Seminar,VASC Seminar
ATTACH;FMTTYPE=image/jpeg:https://www.ri.cmu.edu/app/uploads/2024/09/9-16-24.jpeg
END:VEVENT
END:VCALENDAR