
Tutorials

The role of the tutorials is to provide a platform for more intensive scientific exchange among researchers interested in a particular topic and to serve as a meeting point for the community. Tutorials complement the depth-oriented technical sessions by providing participants with broad overviews of emerging fields. A tutorial can be scheduled for 1.5 or 3 hours.



Tutorial on
First Person (Egocentric) Vision: History and Applications


Instructor

Francesco Ragusa
University of Catania
Italy
 
Brief Bio
Francesco Ragusa is a Research Fellow at the University of Catania. He has been a member of the IPLAB research group (University of Catania) since 2015. He completed an Industrial Doctorate in Computer Science in 2021. During his PhD studies, he spent a period as a Research Student at the University of Hertfordshire, UK. He received his master's degree in Computer Science (cum laude) in 2017 from the University of Catania. Francesco has authored one patent and more than 10 papers in international journals and international conference proceedings. He serves as a reviewer for several international conferences in the fields of computer vision and multimedia, such as CVPR, ECCV, BMVC, WACV, ACM Multimedia, ICPR, and ICIAP, and for international journals, including TPAMI, Pattern Recognition Letters, and IET Computer Vision. Francesco Ragusa is a member of IEEE, CVF, and CVPL. He has been involved in different research projects and has focused on human-object interaction anticipation from egocentric videos as the key to analyzing and understanding human behavior in industrial workplaces. He has been co-founder and CEO of NEXT VISION s.r.l., an academic spin-off of the University of Catania, since 2021. His research interests concern Computer Vision, Pattern Recognition, and Machine Learning, with a focus on First Person Vision.
Abstract

Wearable devices equipped with a camera and computing abilities are attracting the attention of both the market and society, with commercial devices increasingly available and many companies announcing the upcoming release of new devices. The main appeal of wearable devices is due to their mobility and their ability to enable user-machine interaction through Augmented Reality. Due to these characteristics, wearable devices provide an ideal platform for developing intelligent assistants able to assist humans and augment their abilities, for which Artificial Intelligence and Computer Vision play a major role.

Unlike classic computer vision (the so-called "third person vision"), which analyses images collected from a static point of view, first person (egocentric) vision assumes that images are collected from the point of view of the user, which provides privileged information about the user's activities and the way they perceive and interact with the world. Indeed, the visual data acquired with wearable cameras usually provides useful information about the users, their intentions, and how they interact with the world.

This tutorial will discuss the challenges and opportunities offered by first person (egocentric) vision, covering the historical background and seminal works, presenting the main technological tools and building blocks, and discussing applications.

Keywords
Wearable, first person vision, egocentric vision, augmented reality, visual localization, action recognition, action anticipation, human-object interaction.

Aims and Learning Objectives
The participants will understand the main advantages of first person (egocentric) vision over third person vision for analyzing the user's behavior, building personalized applications, and predicting future events. Specifically, the participants will learn about: 1) the main differences between third person and first person (egocentric) vision, including the way in which the data is collected and processed, 2) the devices which can be used to collect data and provide services to the users, 3) the algorithms which can be used to manage first person visual data, for instance, to perform localization, indexing, object detection, action recognition, human-object interaction detection, and the prediction of future events.
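
As a small illustration of point 3, the sketch below runs a generic, COCO-pretrained object detector on a single egocentric frame. This is a minimal sketch assuming a PyTorch/torchvision environment; it is not an egocentric-specific model from the tutorial, and the frame path is a placeholder.

    import torch
    from torchvision.models.detection import fasterrcnn_resnet50_fpn
    from torchvision.io import read_image

    # Load a COCO-pretrained detector (generic objects, not egocentric-specific).
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    # Placeholder path to a single frame grabbed from a wearable camera.
    frame = read_image("egocentric_frame.jpg").float() / 255.0

    with torch.no_grad():
        predictions = model([frame])[0]

    # Keep confident detections only; such boxes could seed a
    # hand-object interaction or action recognition pipeline.
    keep = predictions["scores"] > 0.7
    print(predictions["boxes"][keep], predictions["labels"][keep])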

Target Audience
First year PhD students, graduate students, researchers, practitioners.

Prerequisite Knowledge of Audience
Fundamentals of Computer Vision and Machine Learning (including Deep Learning).

Detailed Outline
The tutorial is divided into two parts and will cover the following topics:
Part I: History and motivation
• Agenda of the tutorial;
• Definitions, motivations, history and research trends of First Person (egocentric) Vision;
• Seminal works in First Person (Egocentric) Vision;
• Differences between Third Person and First Person Vision;
• First Person Vision datasets;
• Wearable devices to acquire/process first person visual data;
• Main research trends in First Person (Egocentric) Vision;
Part II: Fundamental tasks for first person vision systems:
• Localization;
• Hand/Object detection;
• Attention;
• Action/Activity recognition;
• Action anticipation;
• Industrial Applications.







Secretariat Contacts
e-mail: visigrapp.secretariat@insticc.org

Tutorial on
Efficient 3D Facial Animation using Low Cost Motion Capture Solutions


Instructors

João Marcelo Xavier Natario Teixeira
Voxar Labs
Brazil
 
Brief Bio
João Marcelo Xavier Natário Teixeira holds a Ph.D. in Computer Science from the Federal University of Pernambuco (UFPE), with a sandwich Ph.D. period at the University of Chile. Currently Associate Professor 2 and head of the Department of Electronics and Systems at UFPE, he is a member of the Board of the Postgraduate Program in Design at UFPE and also a member of the Board of the Postgraduate Program in Electrical Engineering. He has experience in the field of Computer Science, with emphasis on interaction devices, Virtual and Augmented Reality, high-performance computing, and parallel programming. He is currently the CTO of the Voxar Labs Research Group at the UFPE Computer Science Center. He has previously participated as a member of CERV (Virtual Reality Special Committee of the Brazilian Computer Society) and has been involved in organizing important events in the field of Virtual Reality, such as SVR and ISMAR.
Artur Tavares de Carvalho Cruz
Universidade Federal de Pernambuco
Brazil
 
Brief Bio
Artur Cruz graduated in Design from the Federal University of Pernambuco (UFPE). He is a Master's student in Digital Artifact Design at UFPE. Artur, as the CEO and Director of Computer Graphics at SirCruX Studios, is engaged in developing solutions in 3D Animation and Branded Entertainment for product visualization and animated series for streaming. He focuses on the use of procedural technologies and Facial and Body Motion Capture. Additionally, Artur is committed to developing tools to improve the 3D Animation Pipeline specifically for long-term productions involving smaller teams, aiming to enhance efficiency and productivity in more compact creative environments.
Abstract

A large number of 3D facial animation techniques have emerged with different goals: making the representation of facial expressions more realistic, decreasing computing time, reducing the need for specialized equipment, etc. With new techniques come new definitions, concepts, and terms that correlate with methods that have existed for decades or with more recent ones. Parameterization, interpolation, blendshapes, motion capture, and others are concepts that often appear in the literature, in a generic way, as techniques, but which actually sit at different levels of the information hierarchy. The first part of this tutorial aims to clearly classify the different techniques and concepts of the 3D facial animation literature, locating them within each step of the 3D facial animation pipeline, through a parametric analysis.
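
To make the blendshape concept concrete, the snippet below sketches the basic linear blendshape model used by most capture-driven pipelines: a neutral mesh plus a weighted sum of per-expression vertex offsets. The arrays here are synthetic placeholders; in practice they come from a rigged character.

    import numpy as np

    # Synthetic data: a neutral face mesh with V vertices and two blendshape targets.
    V = 1000
    neutral = np.random.rand(V, 3)                  # neutral (rest) vertex positions
    targets = {
        "jawOpen":   neutral + np.random.randn(V, 3) * 0.01,
        "smileLeft": neutral + np.random.randn(V, 3) * 0.01,
    }

    def blend(neutral, targets, weights):
        """Linear blendshape model: neutral + sum_i w_i * (target_i - neutral)."""
        out = neutral.copy()
        for name, w in weights.items():
            out += w * (targets[name] - neutral)
        return out

    # Weights in [0, 1], e.g. streamed per frame from a facial capture solution.
    frame_vertices = blend(neutral, targets, {"jawOpen": 0.6, "smileLeft": 0.3})
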
The second part of the proposed tutorial will focus on Facepipe, an efficient and low-cost alternative for capturing facial motion information and applying it in 3D facial animation software. The proposed solution will be compared to existing ones regarding cost, features, and utilization. Finally, a hands-on, step-by-step walkthrough of Facepipe will demonstrate the possibilities it brings to the 3D animation community.

Keywords
3D avatars, 3D facial animation, facial movement analysis.

Aims and Learning Objectives
The participants will learn about the most used alternatives for 3D facial animation and the main differences between them, together with their advantages and disadvantages. Specifically, the participants will learn about: 1) a review of the 3D facial animation pipeline, 2) the devices that can be used to collect facial data and the existing solutions to perform facial capture and then apply it to 3D character animation, 3) a step-by-step guide on how to use Facepipe to perform facial capture and 3D facial animation.

Target Audience
PhD students, graduate students, researchers, designers, animators.

Prerequisite Knowledge of Audience
Fundamentals of 3D modeling (Blender will be used).

Detailed Outline
The proposed tutorial is divided into two parts and will cover the following topics:

Part I: A review of the 3D facial animation pipeline
• Review methodology
• The Facial Action Coding System (FACS)
• Intermediate Techniques
• Input Techniques
• Parametric Analysis of techniques

Part II: Facepipe and its utilization
• How Facepipe works
• Capturing data using Facepipe
• Using captured data in Blender (see the sketch below)
• Step-by-step guide
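
As a hint of what Part II covers, the fragment below sketches how captured per-frame weights could be keyframed onto shape keys through Blender's Python API. The object name, shape key names, and the "captured" dictionary are hypothetical placeholders; the actual Facepipe export format and naming conventions are covered in the tutorial.

    import bpy

    # Hypothetical capture result: {frame: {shape_key_name: weight}}.
    captured = {
        1: {"jawOpen": 0.10, "smileLeft": 0.00},
        2: {"jawOpen": 0.45, "smileLeft": 0.20},
        3: {"jawOpen": 0.70, "smileLeft": 0.35},
    }

    face = bpy.data.objects["Face"]          # placeholder object name
    key_blocks = face.data.shape_keys.key_blocks

    for frame, weights in captured.items():
        for name, value in weights.items():
            kb = key_blocks[name]
            kb.value = value
            kb.keyframe_insert(data_path="value", frame=frame)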







Secretariat Contacts
e-mail: visigrapp.secretariat@insticc.org

Tutorial on
Context-Aware Applications Using Visual Programming: Case Studies on
Mobile Apps and Humanoid Robot Applications


Instructor

Martin Zimmermann
Offenburg University
Germany
 
Brief Bio
Martin Zimmermann is a full professor in the Department of Economics at the University of Applied Sciences Offenburg, where he has been since 2001. From 2002 to 2008 he served as Department Chair. From 1994 to 1998 he worked as a scientist at the IBM European Networking Center and at Deutsche Bank Group. From 1998 to 2001 he was a Professor at the University of Applied Sciences Rapperswil, Switzerland. Between 2016 and 2020, Martin Zimmermann was Vice Director of the newly founded Department of Computer Science at the University of Lucerne. He received an M.S. (Diplom) from KIT in 1989 and a PhD in 1996 from the J.W. Goethe University Frankfurt. He has authored more than 50 papers in international book chapters, journals, and conference proceedings. He has received several best paper awards, e.g., for his contributions to the topics of visual programming and context-based mobile applications. His research interests span mobile devices, the development of cross-platform mobile applications, and visual programming, with a focus on user-centered context-based mobile applications. Much of his work has been on improving the understanding, design, development, and performance of context-based applications.
Abstract

The main advantage of context-aware applications is that they provide tailored services by analyzing the environmental context, such as location, time, weather conditions, and season, and adapting their functionality to changing context data without explicit user interaction. For example, mobile devices can obtain context information in various ways in order to provide more adaptable, flexible, and user-friendly services. In the case of a tourist app, a user would like to see relevant tourist attractions on a map together with distance information, depending on their current location. Humanoid robots require sensors to gather information about the environment so that the robot can make decisions about its position or about actions the situation requires. As a consequence, context-aware applications can sense clues about the situational environment, making them more intelligent, adaptive, and personalized.
Visual Programming Languages (VPLs) let users develop software programs by combining visual program elements, such as sensor and actuator objects, loops, or conditional statements, rather than by specifying them textually. VPLs are considered an innovative approach to addressing the inherent complexity of developing programs, especially for beginners.
The tutorial will cover important technologies which can be used to build user-centered context-based applications, discussing challenges and open problems, and will give conclusions and insights for research in the field. Several concrete scenarios will be analyzed, and corresponding mobile applications will be developed during the tutorial by the participants using a cloud-based visual programming environment.
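
As a concrete illustration of the tourist-app scenario above, the snippet below sketches the distance computation such a location-based service performs for each attraction, written here in plain Python rather than in the visual programming environment used in the tutorial; the coordinates are made-up examples.

    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two GPS fixes, in kilometres."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    # Current device location and a few (made-up) attractions.
    here = (48.473, 7.951)
    attractions = {"Cathedral": (48.480, 7.945), "Museum": (48.460, 7.970)}

    for name, (lat, lon) in attractions.items():
        print(name, round(haversine_km(*here, lat, lon), 2), "km")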

https://drive.google.com/drive/folders/1RnSbUnR9b_9f-V_FbPYFHiudUHq-v87c


Keywords
Context-aware Services; Sensors; Visual Programming; Mobile Applications; Location-based Services; Robot Applications; Humanoid Robots.

Aims and Learning Objectives
Participants will understand how modern technologies (e.g. NFC, BLE, Machine Learning) can provide important contextual information to develop user-centered context-based apps. Building on this, participants will learn how to develop context-based mobile applications using visual programming. For this purpose, several concrete scenarios will be analyzed and corresponding mobile applications will be developed during the tutorial. The participants will understand the main advantages of user-centered context-aware mobile applications and will be able to design user-centered context-based mobile applications by using visual programming.

Target Audience
Researchers, practitioners.

Prerequisite Knowledge of Audience
Fundamentals of Computer Science Concepts & Programming Language Concepts.

Detailed Outline
The tutorial is divided into three parts and will cover the following topics:

Part 1: User-centered context-aware applications 
- User-centered context-aware applications
- Categories of context information that are practically significant
- Technologies for context-aware applications (QR Codes, NFC, BLE, Machine Learning)
- Case Studies: Mobile Applications and Humanoid Robot Applications

Part 2: Visual Programming
- Visual programming vs. traditional programming
- Visual elements
- Cloud-based development environments: MIT App Inventor, Thunkable

Part 3: Case Study
- Development of a user-centered context-based mobile application
- Development process using visual programming
- Analysis & design of a location-based mobile service by using a cloud-based development environment.

The tutorial will cover the main technologies which can be used to build context-aware applications, discussing challenges and open problems, and will give conclusions and insights for research in the field.







Secretariat Contacts
e-mail: visigrapp.secretariat@insticc.org

Tutorial on
Grid-based Layout Algorithms for Visual Sorting, Dimensionality Reduction, and Optimization


Instructor

Kai Uwe Barthel
HTW Berlin
Germany
 
Brief Bio
Kai Uwe Barthel has been a professor of visual computing at the University of Applied Sciences in Berlin, Germany, since 2001, where he teaches courses in machine learning, computer vision, and visual information retrieval. He leads the Visual Computing Group, which focuses on technologies that facilitate the discovery of media resources. His recent research includes image understanding and retrieval, and visual image navigation systems such as "navigu.net". As part of his Ph.D. thesis at the Technical University of Berlin, he developed fractal image compression schemes that significantly outperformed the JPEG standard of the time. He later led a research project on 3D video coding at the Technical University of Berlin and served as head of R&D at N-Tec Media and LuraTech Inc. He was a member of the JPEG2000 standardization committee. His work in image and video compression resulted in two patents. In 2009, he founded "pixolution", a visual image search company known for its technology used by many stock image agencies. He has numerous publications and awards, including winning the Video Browser Showdown - The Video Retrieval Competition three times.
Abstract

Grid-based layout algorithms have a wide range of applications in visual computing and computer graphics. These techniques are invaluable for tasks such as visualizing high-dimensional data, visually sorting images, and solving optimization problems. In this tutorial, we will focus on managing large amounts of textures or images that can overwhelm human perception. We begin with an overview of dimensionality reduction techniques and an examination of their inadequacy for sorting non-point data such as images or textures. The principles and concepts of various image sorting techniques are explained, including Self-Organizing Maps (SOM), Self-Sorting Maps (SSM), and the recently introduced Linear Assignment Sorting (LAS). We also present neural networks for learning latent permutations. A major challenge in image sorting is the lack of appropriate metrics for evaluating sorting quality as perceived by humans. Based on extensive user testing, we present the novel Distance Preservation Quality metric, which shows a stronger correlation with user-perceived sorting quality compared to other metrics. Efficiency is another focus of this tutorial, including techniques such as integral image filtering and the use of linear assignment solvers for fast sorting. We also offer practical tips and tricks for achieving specific layout shapes and positioning specifications.
Importantly, this tutorial goes beyond static image sorting to include dynamic image graph visualization that adapts to the ever-changing nature of image collections. Furthermore, we show how these techniques can be effectively applied to classical dimensionality reduction or optimization tasks, such as the Traveling Salesman Problem.
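
To give a flavour of the grid-based arrangement problem, the sketch below assigns items to the cells of a regular grid by solving a linear assignment between a 2D projection of their feature vectors and the grid coordinates. This is a naive baseline on random stand-in features, not the LAS algorithm or the Distance Preservation Quality metric presented in the tutorial.

    import numpy as np
    from sklearn.decomposition import PCA
    from scipy.optimize import linear_sum_assignment

    rows, cols = 8, 8
    n = rows * cols
    features = np.random.rand(n, 512)        # stand-in for image feature vectors

    # Project features to 2D and normalize to the unit square.
    xy = PCA(n_components=2).fit_transform(features)
    xy = (xy - xy.min(0)) / (xy.max(0) - xy.min(0))

    # Target grid cell centers, also in the unit square (row-major order).
    grid = np.array([(c / (cols - 1), r / (rows - 1)) for r in range(rows) for c in range(cols)])

    # Cost: squared distance between each item's 2D position and each grid cell.
    cost = ((xy[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    item_idx, cell_idx = linear_sum_assignment(cost)

    # With a square cost matrix, item_idx is 0..n-1, so cell_idx[i] is the cell for item i.
    print(cell_idx[:10])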

Keywords
Grid-based arrangements, Image sorting, Dimensionality reduction, Optimization techniques, Exploratory Image Search, Distance Preservation Quality Metric.

Aims and Learning Objectives
The participants will understand the basics and advantages of grid-based sorting. They will explore different sorting algorithms and draw insights from user testing to determine appropriate metrics for evaluating different sorting methods. Participants will also be introduced to the simple and efficient Linear Assignment Sorting technique, along with optimization strategies and practical programming examples. By the end of the workshop, attendees will have the skills and knowledge necessary to implement image sorting methods while taking into account specific layout requirements. In addition, the workshop will enable participants to adapt these algorithms to various optimization challenges.

Target Audience
First year PhD students, graduate students, researchers, practitioners.

Prerequisite Knowledge of Audience
Fundamentals of Computer Vision and Machine Learning.

Detailed Outline
- Welcome, gather participant information, and provide an overview of tutorial goals, schedule, and materials.
- Discuss the limitations of dimensionality reduction techniques (e.g., PCA, MDS, Isomap, LLE, t-SNE) for image organization.
- Introduce image sorting concepts, including SOM, SSM, LAS, neural networks for permutations.
- Examples of optimization and dimensionality reduction using the new algorithms.
- Highlight appropriate visual feature vectors for image sorting.
- Present insights into human perception and quality evaluation metrics.
- Explain optimization techniques for faster image sorting (e.g., integral image filtering; see the sketch after this list).
- Provide tips for dealing with layout constraints and fixed positioning during image sorting.
- Explain techniques for dynamic image graph visualization.
- Provide examples of industrial applications.
- Conclude with a final Q&A and recap of the key points of the tutorial.
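
One of the optimization techniques mentioned above, integral image filtering, is sketched below: once the cumulative sums are built, the mean over any axis-aligned box can be read in constant time, which is what makes repeated neighbourhood averaging during sorting cheap. The input map here is a random placeholder.

    import numpy as np

    def integral_image(img):
        """Cumulative sums with a zero border so box sums need no edge cases."""
        ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
        ii[1:, 1:] = img.cumsum(0).cumsum(1)
        return ii

    def box_mean(ii, r0, c0, r1, c1):
        """Mean of img[r0:r1, c0:c1] in O(1) using the integral image."""
        total = ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
        return total / ((r1 - r0) * (c1 - c0))

    img = np.random.rand(64, 64)             # placeholder 2D map (e.g., one feature channel)
    ii = integral_image(img)
    print(box_mean(ii, 10, 10, 20, 20), img[10:20, 10:20].mean())  # both values should match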







Secretariat Contacts
e-mail: visigrapp.secretariat@insticc.org
