Towards Detailed Understanding of the Visual World
Fahad Khan, Linköping University, Sweden
Designing Data for a ‘Post-infographic’ World
Stefanie Posavec, Independent Work, United Kingdom
Virtual Reality in Mental Health: A Self-Counselling Approach
Mel Slater, University of Barcelona, Spain
Multi-Modal Human-Machine Interaction: Joint Optimization of Single Modalities and Automatic Learning of Communication Channel Fusion
Gerhard Rigoll, Technical University of Munich, Germany
Towards Detailed Understanding of the Visual World
Fahad Khan
Linköping University
Sweden
Brief Bio
Fahad Khan is currently Professor of Computer Vision at MBZUAI, United Arab Emirates. He also holds a faculty position at the Computer Vision Laboratory, Linköping University, Sweden. He received an M.Sc. degree in Intelligent Systems Design from Chalmers University of Technology, Sweden, and a Ph.D. degree in Computer Vision from the Computer Vision Center Barcelona and the Autonomous University of Barcelona, Spain. He has achieved top ranks in various international challenges (Visual Object Tracking, VOT: 1st in 2014, 2016, and 2018, 2nd in 2015; VOT-TIR: 1st in 2015 and 2016; OpenCV Tracking: 1st in 2015; PASCAL VOC Segmentation and Action Recognition tasks: 1st in 2010). He received the best paper award in the computer vision track at IEEE ICPR 2016. He has published over 100 peer-reviewed conference papers, journal articles, and book contributions. His research interests span a wide range of topics within computer vision and machine learning. He serves as a regular senior program committee member for leading conferences such as CVPR, ICCV, and NeurIPS.
Abstract
Machine perception, the ability to understand the visual world from sensor inputs such as cameras, is one of the central problems in Artificial Intelligence. Recent years have witnessed tremendous progress in various visual perception tasks with real-world applications in, e.g., robotics, autonomous driving, and surveillance. In this talk, I will first present our recent results towards understanding state-of-the-art deep learning-based visual recognition networks in terms of their robustness and generalizability. Next, I will present our results on learning visual recognition models with limited human supervision. Finally, I will discuss moving one step further, from instance-level recognition to video-based conversation models that merge the representational abilities of a pretrained visual encoder with the generative powers of an LLM and are capable of understanding and conversing about videos.
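The video-based conversation models mentioned above couple a pretrained visual encoder with a generative LLM. As a rough, hypothetical illustration of that coupling (the module names, dimensions, and projection scheme below are assumptions made for the sketch, not the speaker's actual architecture), per-frame visual features can be projected into the language model's embedding space and treated as prefix tokens:

```python
import torch
import torch.nn as nn

class VideoChatSketch(nn.Module):
    """Toy stand-in: frozen per-frame visual features are projected into the
    language model's embedding space and prepended to the text prompt."""

    def __init__(self, vis_dim=768, llm_dim=512, vocab_size=1000):
        super().__init__()
        # A small projector is often the only newly trained component.
        self.projector = nn.Linear(vis_dim, llm_dim)
        self.token_embed = nn.Embedding(vocab_size, llm_dim)   # LLM token embeddings
        self.llm = nn.TransformerEncoder(                       # stand-in for a pretrained LLM
            nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(llm_dim, vocab_size)

    def forward(self, frame_feats, prompt_ids):
        # frame_feats: (batch, num_frames, vis_dim) from a frozen visual encoder
        # prompt_ids:  (batch, prompt_len) tokenised user question about the video
        vis_tokens = self.projector(frame_feats)
        txt_tokens = self.token_embed(prompt_ids)
        fused = torch.cat([vis_tokens, txt_tokens], dim=1)      # video tokens act as a prefix
        return self.lm_head(self.llm(fused))                    # next-token logits

model = VideoChatSketch()
logits = model(torch.randn(1, 8, 768), torch.randint(0, 1000, (1, 16)))
print(logits.shape)  # torch.Size([1, 24, 1000])
```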
Designing Data for a ‘Post-infographic’ World
Stefanie Posavec
Independent Work
United Kingdom
Brief Bio
Stefanie Posavec is a designer, artist, and author focused on creating playful, accessible, human-scaled approaches to communicating with data. Her work has been exhibited at major galleries including the V&A, the Design Museum (Designs of the Year 2016), Somerset House, the Wellcome Collection, Bletchley Park (all UK), the Centre Pompidou (Paris), and MoMA (New York), where her work is also in the permanent collection. Recently she undertook the book design and art direction of chart creation for the activist Greta Thunberg’s much-lauded The Climate Book. Her latest illustrated book (I am a book. I am a portal to the universe., co-authored with Miriam Quick) has received multiple accolades, including winning the UK Royal Society’s Young People’s Book Prize 2021. She has also co-authored two books that emphasize a handmade, personal approach to data: Dear Data and the journal Observe, Collect, Draw!
Abstract
About a decade ago, data visualisation and infographics exploded across our media landscape; they are now a ubiquitous presence across all its forms. Now that this design space has matured, where can we go next, and how will we get there?
Stefanie will use examples from her practice to explore what a ‘post-infographic’ design approach might look like, where data is not merely visualised but instead – through experimentation with the dataviz design process – becomes a community conversation starter, a souvenir, an artistic material, and more.
Virtual Reality in Mental Health: A Self-Counselling Approach
Mel Slater
University of Barcelona
Spain
Brief Bio
Mel Slater is a Distinguished Investigator at the University of Barcelona in the Institute of Neurosciences and co-Director of the Event Lab (Experimental Virtual Environments for Neuroscience and Technology). He was previously Professor of Virtual Environments at University College London in the Department of Computer Science. He has been involved in virtual reality research since the early 1990s and has been first supervisor of 40 PhDs in graphics and virtual reality since 1989. He held a European Research Council Advanced Grant, TRAVERSE (2009-2015), and now holds a second Advanced Grant, MoTIVE (2018-2023). He received a Research Award from the Alexander von Humboldt Foundation in 2021 and was elected to the IEEE VGTC Virtual Reality Academy in 2022. He is Field Editor of Frontiers in Virtual Reality and Chief Editor of its Human Behaviour in Virtual Reality section. His publications can be seen at http://publicationslist.org/melslater.
Abstract
Extensive research into virtual reality and its applications started in the 1990s. To date, there have been over 1,700,000 scientific publications and patents that mention the term “virtual reality”, and about 5% of these also include the term “mental health”. Early work concentrated on specific phobias such as fear of heights and fear of flying, then expanded into social phobia, general anxiety disorders, and more complex syndromes such as depression. VR has most commonly been used in research in the context of exposure and cognitive behavioural therapy, and the evidence suggests that the results are at least as good as those of conventional in vivo treatment. VR has also been used in the study and treatment of psychotic illnesses such as paranoia. In this talk I will review research in this field and then discuss a particular paradigm that uses VR for self-counselling, including its role in helping people to overcome obesity.
Multi-Modal Human-Machine Interaction: Joint Optimization of Single Modalities and Automatic Learning of Communication Channel Fusion
Gerhard Rigoll
Technical University of Munich
Germany
Brief Bio
Gerhard Rigoll received the Dr.-Ing. degree in 1986 in the area of automatic speech recognition from Stuttgart University, Germany. From 1986 to 1988 he worked as a postdoctoral fellow at the IBM T.J. Watson Research Center in Yorktown Heights, USA, on acoustic modeling and speaker adaptation for the IBM Tangora speech recognition system. From 1991 to 1993 he worked as a guest researcher, within the framework of the EC Scientific Training Programme in Japan, at the NTT Human Interface Laboratories in Tokyo, Japan, in the area of neural networks and hybrid speech recognition systems. In 1993 he was appointed full professor of computer science at Gerhard-Mercator-University in Duisburg, Germany, and he joined TU Munich (TUM) in 2002, where he now heads the Institute for Human-Machine Communication. His research interests are in pattern recognition and machine learning for human-machine communication, covering areas such as speech and handwriting recognition, gesture recognition, face detection and identification, action and emotion recognition, and interactive computer graphics. Dr. Rigoll is an IEEE Fellow (for contributions to multimodal human-machine communication) and is the author or co-author of more than 550 papers covering the above-mentioned application areas. He has served as a reviewer for many scientific journals and has been session chair and program committee member for numerous international conferences. He has also been involved in international research and teaching activities, as visiting professor at NAIST in Nara, Japan (2005) and as a lecturer at TUM-Asia in Singapore since 2011. Since 2017 he has been coordinator of the electrical engineering section of the Chinese-German College for Postgraduate Studies (CDHK) at Tongji University in Shanghai, China.
Abstract
In multi-modal human-machine communication, users interact with machines through different human communication channels, such as voice, vision, or haptics. It is therefore not surprising that human-machine communication has benefited strongly from the extremely dynamic development of advanced machine learning methods over the last decade, since these methods have been the driving factors in most classical pattern recognition areas, such as speech and emotion recognition and computer vision.
In this talk, some recent research outcomes from the author's institution will be introduced, including face recognition from partial and occluded face information, recognition of low-resolution face images, and action recognition, including gait identification with graph neural networks. The talk will end with a multi-modal recognition task for a multi-party speaker activity detection scenario, in which advanced deep learning methods are employed not only for single-modality recognition but, in particular, for the fusion of audio-visual information, solving a genuinely complex multi-modal recognition problem. This approach points towards the future of human-machine communication: employing advanced machine learning methods to jointly optimize the recognition components for the different modalities and to automatically learn the strategies for their fusion, creating truly multi-modal interactive systems.
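As a hypothetical illustration of what learning the channel fusion (rather than hand-designing it) can mean in practice, the sketch below gates per-time-step audio and visual features with learned weights before a speaking/not-speaking decision; the feature dimensions, encoders, and gating scheme are assumptions made for this example, not the systems discussed in the talk.

```python
import torch
import torch.nn as nn

class LearnedAVFusion(nn.Module):
    """Toy sketch: encode each modality separately, then let a learned gate
    decide, per time step, how much to trust audio vs. visual evidence."""

    def __init__(self, audio_dim=40, video_dim=512, hidden=128):
        super().__init__()
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)   # e.g. filterbank features
        self.video_enc = nn.GRU(video_dim, hidden, batch_first=True)   # e.g. face-crop embeddings
        # Two softmax weights per time step: one for audio, one for video.
        self.gate = nn.Sequential(nn.Linear(2 * hidden, 2), nn.Softmax(dim=-1))
        self.head = nn.Linear(hidden, 1)                                # speaking / not speaking

    def forward(self, audio, video):
        # audio: (batch, T, audio_dim), video: (batch, T, video_dim), time-aligned
        a, _ = self.audio_enc(audio)
        v, _ = self.video_enc(video)
        w = self.gate(torch.cat([a, v], dim=-1))                        # (batch, T, 2)
        fused = w[..., :1] * a + w[..., 1:] * v                         # learned weighted sum
        return torch.sigmoid(self.head(fused)).squeeze(-1)              # speaking probability

model = LearnedAVFusion()
prob = model(torch.randn(2, 50, 40), torch.randn(2, 50, 512))
print(prob.shape)  # torch.Size([2, 50])
```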