
Keynote Lectures

Towards Detailed Understanding of the Visual World
Fahad Khan, MBZUAI, United Arab Emirates

Virtual Reality in Mental Health: A Self-Counselling Approach
Mel Slater, University of Barcelona, Spain

Multi-Modal Human-Machine Interaction: Joint Optimization of Single Modalities and Automatic Learning of Communication Channel Fusion
Gerhard Rigoll, Technical University of Munich, Germany

The Predictable Side of Unpredictable Humans
Alvitta Ottley, Washington University, United States

 

Towards Detailed Understanding of the Visual World

Fahad Khan
MBZUAI
United Arab Emirates
 

Brief Bio
Fahad Khan is currently a Professor of Computer Vision at MBZUAI, United Arab Emirates. He also holds a faculty position at the Computer Vision Laboratory, Linköping University, Sweden. He received an M.Sc. degree in Intelligent Systems Design from Chalmers University of Technology, Sweden, and a Ph.D. degree in Computer Vision from the Computer Vision Center Barcelona and the Autonomous University of Barcelona, Spain. He has achieved top ranks in various international challenges (Visual Object Tracking VOT: 1st in 2014, 2016, and 2018, 2nd in 2015; VOT-TIR: 1st in 2015 and 2016; OpenCV Tracking: 1st in 2015; PASCAL VOC Segmentation and Action Recognition tasks: 1st in 2010). He received the best paper award in the computer vision track at IEEE ICPR 2016. He has published over 100 peer-reviewed conference papers, journal articles, and book contributions. His research interests span a wide range of topics within computer vision and machine learning. He regularly serves as a senior program committee member for leading conferences such as CVPR, ICCV, and NeurIPS.


Abstract
Machine perception, the ability to understand the visual world from sensor input such as camera images, is one of the central problems in Artificial Intelligence. Recent years have witnessed tremendous progress in various visual perception tasks with real-world applications in, e.g., robotics, autonomous driving, and surveillance. In this talk, I will first present our recent results towards understanding state-of-the-art deep learning-based visual recognition networks in terms of their robustness and generalizability. Next, I will move one step beyond instance-level recognition and discuss image- and video-based conversation models that merge the representational abilities of a pretrained visual encoder with the generative power of an LLM, capable of understanding and conversing about images and videos with grounding capabilities.
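To make the "visual encoder + LLM" idea concrete, below is a minimal, hypothetical sketch of a widely used recipe: projecting frozen visual features into the LLM's token-embedding space so that image tokens can be consumed alongside the text prompt. The dimensions, module names, and shapes are illustrative assumptions, not the speaker's actual architecture.

```python
# Minimal sketch (assumed recipe, not the speaker's implementation) of
# bridging a frozen vision encoder and an LLM with a learned projection.
import torch
import torch.nn as nn

VIS_DIM, LLM_DIM, NUM_PATCHES = 1024, 4096, 256  # assumed sizes


class VisionToLLMProjector(nn.Module):
    """Maps patch features from a frozen vision encoder into the LLM's
    token-embedding space so they can be prepended to the text prompt."""

    def __init__(self, vis_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vis_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(patch_feats)  # (batch, patches, llm_dim)


# Dummy stand-ins for frozen vision-encoder output and embedded prompt tokens.
image_feats = torch.randn(1, NUM_PATCHES, VIS_DIM)
text_embeds = torch.randn(1, 32, LLM_DIM)

projector = VisionToLLMProjector(VIS_DIM, LLM_DIM)
visual_tokens = projector(image_feats)

# The LLM would consume the visual tokens followed by the text prompt.
llm_input = torch.cat([visual_tokens, text_embeds], dim=1)
print(llm_input.shape)  # torch.Size([1, 288, 4096])
```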



 

 

Virtual Reality in Mental Health: A Self-Counselling Approach

Mel Slater
University of Barcelona
Spain
www.event-lab.org
 

Brief Bio
Mel Slater is a Distinguished Investigator at the University of Barcelona in the Institute of Neurosciences, and co-Director of the Event Lab (Experimental Virtual Environments for Neuroscience and Technology). He was previously Professor of Virtual Environments at University College London in the Department of Computer Science. He has been involved in virtual reality research since the early 1990s and has been first supervisor of 40 PhDs in graphics and virtual reality since 1989. He held a European Research Council Advanced Grant, TRAVERSE (2009-2015), and now holds a second Advanced Grant, MoTIVE (2018-2023). He received a Research Award from the Alexander von Humboldt Foundation in 2021 and was elected to the IEEE VGTC Virtual Reality Academy in 2022. He is Field Editor of Frontiers in Virtual Reality and Chief Editor of its Human Behaviour in Virtual Reality section. His publications can be seen at http://publicationslist.org/melslater.


Abstract
Extensive research into virtual reality and its applications started in the 1990s. To date there have been over 1,700,000 scientific publications and patents that mention the term “virtual reality”, and about 5% of these specifically include the term “mental health”. Early work concentrated on specific phobias such as fear of heights and fear of flying, then expanded into social phobia, general anxiety disorders, and more complex syndromes such as depression. VR has most commonly been used in research in the context of exposure and cognitive behavioural therapy, and the evidence suggests that the results are at least as good as conventional in vivo treatment. VR has also been used in the study and treatment of psychotic illnesses such as paranoia. In this talk I will review research in this field and then discuss a particular paradigm that uses VR for self-counselling, including its role in helping people to overcome obesity.



 

 

Multi-Modal Human-Machine Interaction: Joint Optimization of Single Modalities and Automatic Learning of Communication Channel Fusion

Gerhard Rigoll
Technical University of Munich
Germany
 

Brief Bio
Gerhard Rigoll received the Dr.-Ing. degree in 1986 in the area of automatic speech recognition from Stuttgart University, Germany. From 1986 to 1988 he worked as a postdoctoral fellow at the IBM T.J. Watson Research Center in Yorktown Heights, USA, on acoustic modeling and speaker adaptation for the IBM Tangora speech recognition system. From 1991 to 1993 he worked as a guest researcher, in the framework of the EC Scientific Training Programme in Japan, at the NTT Human Interface Laboratories in Tokyo, Japan, in the area of neural networks and hybrid speech recognition systems. In 1993 he was appointed full professor of computer science at Gerhard-Mercator-University in Duisburg, Germany, and joined TU Munich (TUM) in 2002, where he now heads the Institute for Human-Machine Communication. His research interests are in the field of pattern recognition and machine learning for human-machine communication, covering areas such as speech and handwriting recognition, gesture recognition, face detection & identification, action & emotion recognition, and interactive computer graphics. Dr. Rigoll is an IEEE Fellow (for contributions to multimodal human-machine communication) and is the author or co-author of more than 550 papers covering the above-mentioned application areas. He has served as a reviewer for many scientific journals and has been a session chairman and program committee member for numerous international conferences. He has also been involved in international research and teaching activities as a visiting professor at NAIST in Nara, Japan (2005), and as a lecturer at TUM-Asia in Singapore since 2011. Since 2017 he has been coordinator of the electrical engineering section of the Chinese-German College for Postgraduate Studies (CDHK) at Tongji University in Shanghai, China.


Abstract
In multi-modal human-machine communication, users interact with machines through different human communication channels, such as voice, vision, or haptics. It is therefore not surprising that human-machine communication has benefited strongly from the extremely dynamic development of advanced machine learning methods over the last decade, since these methods have been the driving factor in most classical pattern recognition areas, such as speech and emotion recognition or computer vision.

In this talk, some recent research outcomes from the author's institution will be presented, including face recognition from partial and occluded face information, recognition of low-resolution face images, and action recognition including gait identification with graph neural networks. The talk will end with a multi-modal recognition task for a multi-party speaker activity detection scenario, where advanced deep learning methods are employed not only for single-modality recognition but especially for the fusion of audio-visual information, in order to solve a genuinely complex multi-modal recognition problem. This approach points to the future perspective of human-machine communication: employing advanced machine learning methods to jointly optimize the recognition components for the different modalities and to automatically learn the strategies for their fusion, creating truly multi-modal interactive systems.
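As a rough illustration of that last point, here is a small hypothetical sketch of training single-modality branches jointly with a learned audio-visual fusion gate, using a toy active-speaker-detection head. All module names, dimensions, and the gating scheme are my own assumptions for illustration, not the institute's actual models.

```python
# Toy sketch: single-modality encoders plus a *learned* fusion gate,
# so both the per-channel representations and the fusion strategy are
# optimized jointly end-to-end (illustrative assumptions throughout).
import torch
import torch.nn as nn


class LearnedAVFusion(nn.Module):
    def __init__(self, audio_dim=128, video_dim=512, hidden=256):
        super().__init__()
        self.audio_net = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.video_net = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        # Fusion weights are trainable parameters, so the weighting of the
        # communication channels is learned from data, not fixed by hand.
        self.gate = nn.Sequential(nn.Linear(2 * hidden, 2), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(hidden, 1)  # speaking / not speaking

    def forward(self, audio, video):
        a, v = self.audio_net(audio), self.video_net(video)
        w = self.gate(torch.cat([a, v], dim=-1))   # (batch, 2) channel weights
        fused = w[:, :1] * a + w[:, 1:] * v        # weighted channel fusion
        return torch.sigmoid(self.classifier(fused))


model = LearnedAVFusion()
audio = torch.randn(4, 128)   # e.g. per-frame audio embeddings
video = torch.randn(4, 512)   # e.g. per-face visual embeddings
print(model(audio, video).shape)  # torch.Size([4, 1])
```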



 

 

The Predictable Side of Unpredictable Humans

Alvitta Ottley
Washington University
United States
https://alvitta.com
 

Brief Bio
Dr. Alvitta Ottley is an Associate Professor in the Computer Science & Engineering Department at Washington University in St. Louis, Missouri, USA. She also holds a courtesy appointment in the Psychological and Brain Sciences Department. Her research uses interdisciplinary approaches to solve problems such as how best to display information for effective decision-making and how to design human-in-the-loop visual analytics interfaces that are more attuned to how people think. Dr. Ottley received an NSF CRII Award in 2018 for using visualization to support medical decision-making, an NSF CAREER Award for creating context-aware visual analytics systems, and the 2022 EuroVis Early Career Award. In addition, her work has appeared in leading conferences and journals such as CHI, VIS, and TVCG, receiving best paper and honorable mention awards.


Abstract
Building AI systems that interact naturally with humans presents a tremendous challenge. This is because humans are not simply logic machines; our emotions, experiences, and social contexts all influence our behavior in unpredictable ways. So how can we, as AI developers, navigate this complexity and design systems that respond intelligently to their human counterparts? In this talk, I discuss how we approach this problem in the context of Visual Analytics by embracing the parallels between human reasoning and AI models. For example, this talk will demonstrate how humans, like AI systems, can be modeled as a set of rules and parameters, which facilitates predicting behavioral outcomes in specific scenarios. By employing inferential techniques such as Bayesian reasoning, we will go beyond observed actions and discuss how we might model and infer the deeper motivations and beliefs that drive them. Truly understanding humans is non-trivial, but we can forge a path toward effective human-AI interaction by leveraging the similarities between human thought processes and AI modeling.
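To make the Bayesian-reasoning idea concrete, here is a toy, self-contained sketch of updating a belief about a user's latent interest from observed interactions. The categories, likelihood values, and click sequence are purely hypothetical and are not drawn from Dr. Ottley's work.

```python
# Toy Bayesian update over a user's latent interest (hypothetical example).
CATEGORIES = ["crime", "housing", "transit"]          # assumed item attributes
prior = {c: 1.0 / len(CATEGORIES) for c in CATEGORIES}


def likelihood(clicked_category: str, hypothesis: str) -> float:
    # If the user cares about `hypothesis`, matching items are clicked more often.
    return 0.7 if clicked_category == hypothesis else 0.15


def update(belief: dict, clicked_category: str) -> dict:
    # Bayes' rule: posterior ∝ prior × likelihood, then normalize.
    unnormalized = {h: belief[h] * likelihood(clicked_category, h) for h in belief}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}


belief = prior
for click in ["crime", "crime", "transit", "crime"]:  # observed interactions
    belief = update(belief, click)

print(belief)  # posterior concentrates on "crime"
```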


