

The role of the tutorials is to provide a platform for a more intensive scientific exchange amongst researchers interested in a particular topic and to serve as a meeting point for the community. Tutorials complement the depth-oriented technical sessions by providing participants with broad overviews of emerging fields. A tutorial can be scheduled for 1.5 or 3 hours.


A Tutorial on First Person (Egocentric) Vision 
Instructors: Francesco Ragusa and Antonino Furnari



Francesco Ragusa
University of Catania
Brief Bio
Francesco Ragusa is a Research Fellow at the University of Catania. He has been a member of the IPLAB research group (University of Catania) since 2015. He completed an Industrial Doctorate in Computer Science in 2021, and during his PhD studies he spent a period as a Research Student at the University of Hertfordshire, UK. He received his master’s degree in computer science (cum laude) in 2017 from the University of Catania. Francesco has authored one patent and more than 10 papers in international journals and international conference proceedings. He serves as a reviewer for several international conferences in the fields of computer vision and multimedia, such as CVPR, ECCV, BMVC, WACV, ACM Multimedia, ICPR and ICIAP, and for international journals, including TPAMI, Pattern Recognition Letters and IET Computer Vision. Francesco Ragusa is a member of IEEE, CVF and CVPL. He has been involved in several research projects and has focused on human-object interaction anticipation from egocentric videos as the key to analyzing and understanding human behavior in industrial workplaces. Since 2021, he has been co-founder and CEO of NEXT VISION s.r.l., an academic spin-off of the University of Catania. His research interests concern Computer Vision, Pattern Recognition, and Machine Learning, with a focus on First Person Vision.
Antonino Furnari
Mathematics and Computer Science, University of Catania
Brief Bio
Antonino Furnari is an Assistant Professor at the University of Catania. He received his PhD in Mathematics and Computer Science in 2017 from the University of Catania and has authored one patent and more than 50 papers in international book chapters, journals and conference proceedings. Antonino Furnari is involved in the organization of several international events, such as the Assistive Computer Vision and Robotics (ACVR) workshop series (since 2016), the International Computer Vision Summer School (ICVSS) (since 2017), the Egocentric Perception Interaction and Computing (EPIC) workshop series (since 2018) and the EGO4D workshop series (since 2022). Since 2018, he has been involved in the collection, release, and maintenance of the EPIC-KITCHENS dataset series, in particular the egocentric action anticipation and action detection challenges. Since 2021, he has been involved in the collection and benchmarking of the EGO4D dataset. He has also been co-founder of NEXT VISION s.r.l., an academic spin-off of the University of Catania, since 2021. His research interests concern Computer Vision, Pattern Recognition, and Machine Learning, with a focus on First Person Vision. More information is available at

Wearable devices equipped with a camera and computing capabilities are attracting the attention of both the market and society, with commercial devices increasingly available and many companies announcing the upcoming release of new devices. The main appeal of wearable devices lies in their mobility and in their ability to enable user-machine interaction through Augmented Reality. Thanks to these characteristics, wearable devices provide an ideal platform for developing intelligent assistants able to help humans and augment their abilities, a goal in which Artificial Intelligence and Computer Vision play a major role.

Unlike classic computer vision (so-called “third person vision”), which analyzes images collected from a static point of view, first person (egocentric) vision assumes that images are collected from the point of view of the user. This viewpoint gives privileged information about the user’s activities and the way they perceive and interact with the world. Indeed, the visual data acquired with wearable cameras usually provides useful information about the users, their intentions, and how they interact with the world.

This tutorial will discuss the challenges and opportunities offered by first person (egocentric) vision, covering the historical background and seminal works, presenting the main technological tools and building blocks, and discussing applications.


Keywords
Wearable, First Person Vision, Egocentric Vision, Augmented Reality, Visual Localization, Action Recognition, Action Anticipation

Aims and Learning Objectives
The participants will understand the main advantages of first person (egocentric) vision over third person vision for analyzing the user’s behavior, building personalized applications and predicting future events. Specifically, the participants will learn about:
1) the main differences between third person and first person (egocentric) vision, including the way in which the data is collected and processed;
2) the devices that can be used to collect data and provide services to the users;
3) the algorithms that can be used to process first person visual data, for instance to perform localization, indexing, object detection, action recognition, and the prediction of future events.
Target Audience
First-year PhD students, graduate students, researchers and practitioners.
Prerequisite Knowledge of Audience
Fundamentals of Computer Vision and Machine Learning (including Deep Learning).
Detailed Outline
The tutorial is divided into two parts and will cover the following topics:
Part I: History and motivation
• Agenda of the tutorial;
• Definitions, motivations, history and research trends of First Person (Egocentric) Vision;
• Seminal works in First Person (Egocentric) Vision;
• Differences between Third Person and First Person Vision;
• First Person Vision datasets;
• Wearable devices to acquire/process first person visual data;
• Main research trends in First Person (Egocentric) Vision;
Part II: Fundamental tasks for first person vision systems:
• Localization;
• Hand/Object detection;
• Attention;
• Action/Activity recognition;
• Action anticipation;
The tutorial will cover the main technological tools (devices and algorithms) that can be used to build first person vision applications, discuss challenges and open problems, and conclude with insights for research in the field.