22 - 25 January, 2008 Funchal, Madeira - Portugal  
  Keynote Lectures
Keynote lectures are plenary sessions scheduled for about 45 minutes, plus 10 minutes for questions.

Keynote Lectures List:
- Prof. Adrian Hilton, Computer Vision and Graphics at the University of Surrey, U.K.
Title: Video-based Animation of People

- Prof. Geneviève Lucet, Computer Services for Research at the UNAM, Mexico
Title: Virtual Reality, a Knowledge Tool for Cultural Heritage

- Prof. Peter Sturm, INRIA Rhône-Alpes, France
Title: General Imaging - Design, Modelling and Applications

- Prof. Sharathchandra Pankanti, IBM - Exploratory Computer Vision Group, USA
Title: Retail Vision-Based Self-Checkout: Exploring Real Time Real Purpose General Vision System

  Keynote Lecture 1

Video-based Animation of People
    Prof. Adrian Hilton
Computer Vision and Graphics at the University of Surrey
Brief Bio:

Adrian Hilton is Professor of Computer Vision and Graphics and Head of the Visual Media Research Group at the University of Surrey, UK. Over the past decade he has published over 100 refereed journal and international conference articles on robust computer vision techniques for building models of real-world objects from images, meeting the requirements of the entertainment and communication industries.

His scientific contributions have been recognised by two journal and one conference best paper awards. Innovative contributions of this research, which led to the first commercial hand-held 3D scanner and the first system for capturing animated models of people, have been recognised through two EU IST Awards for Innovation, a DTI Manufacturing Industry Achievement Award and a Computer Graphics World Innovation Award. He currently serves as an area editor for the journal Computer Vision and Image Understanding, on the EPSRC Peer Review College for UK funding applications and on the Executive of the IEE Professional Network in Multimedia Communications. He is a Chartered Engineer and a member of the IEE, IEEE and ACM.

Capturing and representing a person's appearance during movement, in a form that can be manipulated for highly realistic computer animation in games and film, is an open research problem. This talk will present a number of approaches that have been introduced to capture people from multiple-view video using both model-based and model-free computer vision methodologies. Surface Motion Capture (SurfCap) will be introduced, which allows representation and animation control of people with the captured dynamics of clothing during movement. SurfCap will be presented as a technology analogous to skeletal human motion capture using markers (MoCap), which has become a standard production tool. Surface motion graphs are used to animate people from multiple captured surface sequences, allowing control of movement and action. Surface matching methods based on geometry image sequences using spherical parameterisation are used to transition between captured motion sequences and to reconstruct skeletal movement. SurfCap's potential as a future technology for production in games and film will be discussed.
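The surface motion graph idea above can be sketched, in highly simplified form, as a graph whose nodes are captured surface sequences and whose edges connect frames similar enough to allow a visually smooth transition. The random descriptors, the Euclidean similarity measure, and the threshold below are illustrative assumptions, not the speaker's actual method:

```python
import numpy as np

# Hypothetical per-frame shape descriptors for two captured surface
# sequences (in practice these would come from spherical geometry images).
rng = np.random.default_rng(0)
walk = rng.normal(size=(30, 8))   # 30 frames, 8-D descriptor per frame
jog = rng.normal(size=(25, 8))    # 25 frames

def transition_candidates(seq_a, seq_b, threshold):
    """Return (frame_a, frame_b) pairs whose descriptors are close
    enough to serve as transition points between the two sequences."""
    pairs = []
    for i, fa in enumerate(seq_a):
        for j, fb in enumerate(seq_b):
            if np.linalg.norm(fa - fb) < threshold:
                pairs.append((i, j))
    return pairs

# Edges of a toy surface motion graph between the two motions;
# animation then amounts to concatenating sequence segments joined
# at these transition frames.
edges = transition_candidates(walk, jog, threshold=3.0)
```

A real system would replace the descriptor distance with a full surface-matching cost, but the graph-building structure is the same.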
  Keynote Lecture 2

Virtual Reality, a Knowledge Tool for Cultural Heritage
    Prof. Geneviève Lucet
Computer Services for Research at the UNAM
Brief Bio:

Geneviève Lucet graduated from UP8 in Paris, with a degree in Architecture; she received her Ph.D in Architecture from the Universidad Nacional Autónoma de México (UNAM), where she has worked since 1988 as a researcher in the Department of Academic Computing Services. Her research interests focus on Prehispanic architecture, specifically on registering and studying this architectural heritage with computer graphics and virtual reality techniques. Her historical study of the archaeological site of Cacaxtla and the constructive sequence she proposed have been a reference for further research on the site. She has also made surveys in Teotihuacan, Bonampak, and Suchilquitongo.

Since 2000, she has been Director of Computer Services for Research at the UNAM. In 2002, she supervised the construction of IXTLI, the University’s Visualization Observatory, an immersive virtual reality installation built for research and teaching, the first of its kind in a Latin American university. Currently, her responsibilities as Director include coordinating IXTLI.

Based on experience in building 3D models of Mesoamerican archaeological sites, we will discuss the issue of accuracy in relation to historical data, as well as the need to integrate an understanding of the actual state of the ruins, i.e. the initial building and the later transformations that affected the site, in order to construct quality models. An archaeological survey cannot rely solely on computer science techniques but must include knowledge of the historical object. Furthermore, the modeling phase necessarily leads to a synthesis of the information based on criteria that may vary depending on the objectives of each project. Hence the difficulty of obtaining a neutral survey of the historical site or monument while at the same time satisfying all the requirements of both restorers and historians of architecture.

The applications of immersive virtual reality are also discussed, specifically as a tool for studying the architectural record of a site or monument: the possibilities it offers for visualizing and interacting with the virtual world make it possible to analyze, understand, and explain ancient architecture, although better interfaces are needed to explore the available information. This technology is also used to improve teaching methods, especially in, but not restricted to, the fields of art history and architecture; the different ways in which it was applied and the cognitive factors involved in these experiences are also discussed.

Finally, we will comment on the difficulties we have faced while attempting to achieve the same level of quality in the projection in the virtual reality facility as that of the 3D model, and in controlling the parameters that might help reproduce an accurate perception of the 3D world.

  Keynote Lecture 3

General Imaging - Design, Modelling and Applications
    Prof. Peter Sturm
INRIA Rhône-Alpes
Brief Bio:

Peter obtained MSc degrees from INPG (National Polytechnical Institute of Grenoble, France) and the University of Karlsruhe, both in 1994, and a PhD degree from INPG in 1997. His PhD thesis was awarded the SPECIF award (given to one French PhD thesis per year in Computer Science). After a two-year post-doc at Reading University, working with Steve Maybank, he joined INRIA on a permanent research position in 1999.

He has been a member of programme committees for over 40 events, including all major conferences in computer vision, image processing and pattern recognition. He was an Area Chair for the 2006 European Conference on Computer Vision and is on the Editorial Board of the Image and Vision Computing journal. He is organization co-chair of the 2008 European Conference on Computer Vision, and has organized workshops and given tutorials at several conferences.

His main research topics are in Computer Vision, and specifically related to camera (self-)calibration, 3D reconstruction and motion estimation, both for traditional perspective cameras and omnidirectional sensors.

During his undergraduate studies, he had his own one-person software company, within which he was mainly writing and selling software for the organization of sports events. He was involved in the organization of the 2001 Judo World Championships, the 1999 Sumo Amateur World Championships (the first ever to be held outside Japan), the 1994 Judo University World Championships, two European Championships and numerous other international and national events.

Different image-based applications may benefit from different imaging technologies. A popular example is omnidirectional cameras, which, due to their large field of view, are extremely useful in robotics and video surveillance. We will review these and other imaging technologies that go beyond usual cameras and explain the motivations for using them. The core of the lecture will concern the geometric modelling of different imaging systems and its use for calibrating them and performing structure-from-motion computations such as motion estimation and 3D modelling.
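One common way to model imaging systems of very different designs is to describe each pixel by the 3D ray it observes. The minimal sketch below, with entirely illustrative class and function names, contrasts a pinhole camera (all rays through one centre) with a "general" camera that simply stores a calibrated ray per pixel:

```python
import numpy as np

def pinhole_ray(u, v, fx, fy, cx, cy):
    """Back-project a pixel to its viewing ray for a pinhole camera:
    origin at the optical centre, direction in camera coordinates."""
    d = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return np.zeros(3), d / np.linalg.norm(d)

class GeneralCamera:
    """A non-central camera model: each pixel carries its own ray
    (origin + unit direction), e.g. obtained from calibration."""
    def __init__(self):
        self.rays = {}  # (u, v) -> (origin, direction)

    def set_ray(self, u, v, origin, direction):
        d = np.asarray(direction, float)
        self.rays[(u, v)] = (np.asarray(origin, float),
                             d / np.linalg.norm(d))

    def ray(self, u, v):
        return self.rays[(u, v)]

cam = GeneralCamera()
# A pixel whose ray does not pass through a common centre.
cam.set_ray(0, 0, origin=[0.1, 0.0, 0.0], direction=[0.0, 0.0, 1.0])
```

Once every pixel is reduced to a ray, calibration and structure-from-motion can be phrased in terms of ray intersections rather than any specific lens geometry.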

  Keynote Lecture 4

Retail Vision-Based Self-Checkout: Exploring Real Time Real Purpose General Vision System
    Prof. Sharathchandra Pankanti
IBM - Exploratory Computer Vision Group
Brief Bio:

Sharath Pankanti received the Ph.D. degree in computer science from Michigan State University, East Lansing, in 1995. He joined IBM T. J. Watson Research Center, Yorktown Heights, NY, in 1995 and was with the IBM Advanced Identification Project until 1999. During 2000–2001, he worked on “footprints”—a system for tracking people based on their infrared emission. From 2001 to 2003, he worked on PeopleVision, a system for detecting and tracking individuals in indoor and outdoor environments. From 2003 to 2004, he worked on large-scale biometric indexing systems and since 2005, has worked on object recognition and human interface designs for effective security and convenience. He has co-edited a comprehensive book on biometrics Biometrics: Personal Identification (Kluwer, 1999) and coauthored A Guide to Biometrics (Springer, 2004).

Self-checkout systems are perceived as the future of retail checkout and are emerging as attractive business solutions that empower retailers and consumers alike. Self-checkout systems allow a shopper to check out (i.e., purchase) products from a physical store with as little assistance from store staff as possible. They need to validate the shopper's item selection and accept appropriate payment for the transaction. Some simple self-service systems, such as bank ATMs, gas pumps, and airline kiosks, are already very successful. Before self-checkout becomes ubiquitous for all point-of-sale applications, the following three fundamental, challenging (and often conflicting) problems need to be overcome: (i) Cost: the system must be reasonably inexpensive to build/install and should work with as much of the existing equipment as possible; (ii) Security: the system must be effective against theft (small false miss rate) without annoying honest customers (small false alarm rate); and (iii) Usability: a usable (e.g., high-throughput) system must not unduly inconvenience the user or the owner (e.g., the retailer). In other words, inexpensive self-checkout lanes that are more accurate, easier to use, and faster will provide a better shopping experience.

Conventional automatic self-checkout systems at retail stores fall short on the three metrics defined above. The cost of typical self-checkout systems is significantly higher than that of cashiered checkouts because the sensor instrumentation involves customized fabrication. The accuracy of the system may not be acceptable, since the sensor measurements used for verification are impoverished in discriminatory information (e.g., item weight). Finally, conventional self-checkout technology has a very limited view of user interaction and is therefore not very user-friendly.

New generations of self-checkout systems are increasingly considering camera-based "video analytic" technology because it helps address all of these concerns. Driven largely by the cell phone and consumer photography markets, inexpensive high-quality cameras are becoming commodity items. Coupled with the increasingly powerful CPUs found in point-of-sale terminals, minimal additional resources are needed to run such systems. Moreover, the visual information that can be obtained from cameras is much richer than that provided by other sensors, thus allowing better detection of fraud. Finally, cameras are relatively unobtrusive to the customer and also provide new avenues for further augmenting usability. Designing a camera-based self-checkout system is a challenging computer vision research problem because there are tens of thousands of items in a store. Moreover, there is a wide variety of forms, colors, shapes, and sizes that must be accounted for. Furthermore, because of the variety of illumination conditions, learning invariant visual features of the shopping items is also very complex.

In this talk, we will present our research on a computer-vision-based self-checkout system design that is completely automatic in operation, from image capture and object segmentation through training/learning and matching. Based on real data involving thousands of shopping items collected over an extended period of time (more than 20 months), our experimental results demonstrate that visual technology is an effective and inexpensive component of the design of next-generation self-checkout systems. Here are some of the specific results we will elucidate in the presentation:

(i) Cost: the estimated cost of the visual augmentation is relatively low and would allow the removal of some of the existing sensor-based subsystems without affecting accuracy. The estimated cost and resource requirements of a self-checkout system based exclusively on vision sensors are very attractive.

(ii) Accuracy: in several technology tests spanning more than a billion matches, we show that the false positive rates (fraction of times one item is mistaken for another) and false reject rates (fraction of times an item fails to match another image of the same item) of the visually augmented system are significantly better than those of its conventional counterpart. Our results demonstrate that the new visually augmented self-checkout system is at least twice as accurate and twice as shopper-friendly as the existing technology. We also show that an exclusively vision-based self-checkout system can offer acceptable accuracy. We show that the statistical feature matcher performs significantly better in our design than its structural counterpart.

(iii) Usability: the results also quantify how the new technology will significantly improve shopper assistance, lane throughput, and shopper queue lengths. Further, we demonstrate that the visual system can be effectively trained from very few samples arbitrarily selected from the shopping data. This “on-the-fly learning” design feature offers a significant advantage, since manual training is impractical in a real store with tens of thousands of items (many of which change their appearance on a weekly basis).
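The false positive and false reject rates defined in the accuracy results above can be computed from a set of labelled match attempts. A minimal sketch with made-up toy data (not the speaker's evaluation protocol or results):

```python
def error_rates(results):
    """results: list of (same_item, matched) boolean pairs.
    False positive rate = impostor attempts the matcher accepted;
    false reject rate   = genuine attempts the matcher rejected."""
    genuine = [matched for same, matched in results if same]
    impostor = [matched for same, matched in results if not same]
    frr = genuine.count(False) / len(genuine)
    fpr = impostor.count(True) / len(impostor)
    return fpr, frr

# Toy trials: 4 genuine attempts (1 rejected), 4 impostor attempts
# (1 wrongly accepted).
trials = [(True, True), (True, True), (True, True), (True, False),
          (False, False), (False, False), (False, False), (False, True)]
fpr, frr = error_rates(trials)  # fpr = 0.25, frr = 0.25
```

At the scale quoted in the talk (over a billion matches), the same two ratios are simply accumulated over all item-pair comparisons.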

In summary, we conclude that the visual appearance of items is rich in information, that we can reliably extract this information, and that it is sufficiently distinctive to yield a real-life, practical, general-purpose vision system with acceptable item verification performance.