VISAPP 2018 Abstracts

Area 1 - Image Formation and Preprocessing

Full Papers
Paper Nr: 17
Title:

Towards an Augmented Reality Head Mounted Display System Providing Stereoscopic Wide Field of View for Indoor and Outdoor Environments with Interaction through the Gaze Direction

Authors:

Jessica Combier, Bertrand Vandeportaele and Patrick Danès

Abstract: An Augmented Reality prototype is presented. Its hardware architecture is composed of a Head Mounted Display, a wide Field of View (FOV) passive stereo-vision system, a gaze tracker and a laptop. An associated software architecture is proposed to immerse the user in augmented environments where he/she can move freely. The system maps the unknown real-world (indoor or outdoor) environment and localizes itself within this map by means of state-of-the-art binocular Simultaneous Localization and Mapping techniques. It overcomes the FOV limitations of conventional augmented reality devices by using wide-angle cameras and associated algorithms. It also solves the parallax issue induced by the distinct locations of the two cameras and of the user’s eyes by using Depth Image Based Rendering. An embedded gaze tracker, together with environment modeling techniques, enables gaze-controlled interaction. A simple application is presented, in which a virtual object is inserted into the user’s FOV and follows his/her gaze. While the targeted real-time performance has not yet been achieved, the paper discusses ways to improve both frame rate and latency. Other future work is also outlined.

Paper Nr: 74
Title:

A Novel Handwritten Digits Recognition Method based on Subclass Low Variances Guided Support Vector Machine

Authors:

Abstract: Handwritten Digits Recognition (HWDR) is one of the most popular applications in computer vision and has always been a challenging task in pattern recognition. It remains a hard practical problem and many of its sub-problems are still unresolved. To develop high-performance automatic HWDR, several learning algorithms have been proposed, studied and modified. Much of this effort has involved handwritten digit classification with Support Vector Machines (SVMs). More specifically, in the current study we focus on one-class SVM (OSVM) approaches, which are of particular interest for our problem. The Covariance Guided OSVM (COSVM) algorithm improves upon the OSVM method by emphasizing the low-variance directions. However, COSVM does not handle multi-modal target class data. Thus, we design a new subclass algorithm based on COSVM, which takes advantage of the variance information of the target class clusters. To investigate the effectiveness of the novel Subclass COSVM (SCOSVM), we compared our proposed approach with methods based on other contemporary one-class classifiers, on the well-known standard MNIST benchmark dataset and the Optical Recognition of Handwritten Digits dataset. The experimental results verify the significant superiority of our method.

Paper Nr: 110
Title:

Infrared Image Enhancement in Maritime Environment with Convolutional Neural Networks

Authors:

Purbaditya Bhattacharya, Jörg Riechen and Udo Zölzer

Abstract: An image enhancement approach with Convolutional Neural Networks (CNNs) for infrared (IR) images from a maritime environment is proposed in this paper. The approach includes different CNNs to improve the resolution and to reduce noise artefacts in maritime IR images. The denoising CNN employs a residual architecture which is trained to reduce graininess and fixed pattern noise. The super-resolution CNN employs a similar architecture to learn the mapping from low-resolution to multi-scale high-resolution images. The performance of the CNNs is evaluated on the IR test dataset with standard evaluation methods, and the evaluation results show an overall improvement in the quality of the IR images.

Paper Nr: 122
Title:

Flash and Storm: Fast and Highly Practical Tone Mapping based on Naka-Rushton Equation

Authors:

Nikola Banić and Sven Lončarić

Abstract: Tone mapping operators (TMOs) are used to convert high dynamic range (HDR) images to their low dynamic range (LDR) versions, mostly to display them on standard display devices. The problem with many TMOs that produce high-quality results is that they are too slow to be used in real-time applications. In this paper, a new TMO is proposed whose steps are primarily designed to achieve high speed and to be practically implementable. Under this constraint, the secondary goal is to produce low dynamic range images of high quality. The proposed TMO is based on the Naka-Rushton equation used in combination with additional improvements, and it has O(1) per-pixel complexity. The presented and discussed results show that, besides being faster and more practical, the proposed TMO outperforms many state-of-the-art TMOs in terms of resulting LDR image quality. To further demonstrate its practicality, the source code written in C++, Matlab, Python, Java, and HTML+JavaScript is available at http://TO_BE_INSERTED.
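The core of the Naka-Rushton compression the abstract refers to can be sketched in a few lines. This is only the basic equation V = I/(I + σ), not the paper's full TMO with its additional improvements; the choice of σ as the geometric mean luminance is an illustrative assumption:

```python
import numpy as np

def naka_rushton_tonemap(hdr, eps=1e-6):
    # V = I / (I + sigma): compresses luminance into the range (0, 1)
    luminance = np.maximum(hdr, eps)
    sigma = np.exp(np.mean(np.log(luminance)))  # geometric mean luminance (assumed choice)
    return luminance / (luminance + sigma)

hdr = np.random.rand(4, 4) * 1000.0  # synthetic HDR luminance values
ldr = naka_rushton_tonemap(hdr)      # all values land in (0, 1)
```

Because the whole mapping is a fixed per-pixel formula once σ is known, the O(1) per-pixel complexity claimed in the abstract is plausible for this core step.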

Paper Nr: 138
Title:

A Simple and Exact Algorithm to Solve l1 Linear Problems - Application to the Compressive Sensing Method

Authors:

Igor Ciril, Jérôme Darbon and Yohann Tendero

Abstract: This paper considers l1-regularized linear inverse problems that frequently arise in applications. One striking example is the so-called compressive sensing method, which proposes to reconstruct a high-dimensional signal u ∈ R^n from low-dimensional measurements b = Au ∈ R^m, with m ≪ n. Basis pursuit is another example. For most of these problems the number of unknowns is very large. The recovered signal is obtained as the solution to an optimization problem, and the quality of the recovered signal directly depends on the quality of the solver. Theoretical works predict a sharp transition phase for the exact recovery of sparse signals. However, to the best of our knowledge, other state-of-the-art algorithms are not effective enough to accurately observe this transition phase. This paper proposes a simple algorithm that computes an exact l1 minimizer under the constraint Au = b. This algorithm can be employed in many problems, as soon as A has full row rank. In addition, a numerical comparison with standard algorithms available in the literature is exhibited. These comparisons illustrate that our algorithm compares advantageously: the aforementioned transition phase is empirically observed with much better quality.
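The basis pursuit problem mentioned above (min ||u||_1 subject to Au = b) can be solved by a generic linear-programming reformulation. This sketch uses SciPy's linprog as a baseline solver; it is not the paper's exact algorithm:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    # min ||u||_1  s.t.  A u = b, via the split u = p - q with p, q >= 0:
    # minimize sum(p) + sum(q) subject to [A, -A] [p; q] = b
    m, n = A.shape
    c = np.ones(2 * n)
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 30))   # m = 10 measurements, n = 30 unknowns
u_true = np.zeros(30)
u_true[[3, 17]] = [2.0, -1.5]       # sparse ground-truth signal
b = A @ u_true
u_hat = basis_pursuit(A, b)         # feasible solution of minimal l1 norm
```

Observing the sharp recovery transition the abstract discusses would amount to repeating such solves over a grid of sparsity levels and measurement counts and recording how often u_hat matches u_true.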

Paper Nr: 140
Title:

Supervised Person Re-ID based on Deep Hand-crafted and CNN Features

Authors:

Salma Ksibi, Mahmoud Mejdoub and Chokri Ben Amar

Abstract: Gaussian Fisher Vector (GFV) encoding is an extension of the conventional Fisher Vector (FV) that effectively discards noisy background information by localizing the pedestrian position in the image. Nevertheless, GFV can only provide a shallow description of the pedestrian features. In order to capture more complex structural information, we propose in this paper a layered extension of GFV that we call LGFV. The representation is based on two nested layers that hierarchically refine the FV encoding from one layer to the next by integrating more spatial neighborhood information. Besides, we present in this paper a new rich multi-level semantic pedestrian representation built simultaneously upon complementary deep hand-crafted and deep Convolutional Neural Network (CNN) features. The deep hand-crafted feature is depicted by the combination of GFV mid-level features and high-level LGFV ones, while the deep CNN feature is obtained by learning, in a classification mode, an effective embedding of the raw pedestrian pixels. The proposed deep hand-crafted features produce competitive accuracy with respect to the deep CNN ones while requiring neither pre-training nor data augmentation, and the proposed multi-level representation further boosts the re-ID performance.

Paper Nr: 188
Title:

Dual-channel Geometric Registration of a Multispectral-augmented Endoscopic Prototype

Authors:

O. Zenteno, A. Krebs, S. Treuillet, Y. Lucas, Y. Benezeth and F. Marzani

Abstract: Multispectral measurement and analysis have proven to be useful for detecting and monitoring gastric pathologies at early stages. We developed a multispectral-augmented endoscopic prototype which allows exploration in the visible and near-infrared range (400-1000 nm), increasing the common number of bands under analysis. The prototype comprises a fiberscope connected to two multispectral snapshot cameras, which is inserted through the instrument channel of a commercial endoscope. However, due to aseptic practices, the system must be sterilized between exams, forcing physicians to remove and reintroduce it for each examination and leading to different relative positions between modalities. In the present work, we introduce an axial displacement correction function for dual-channel registration (i.e., RGB and multispectral) based on the insertion depth of the fiberscope. The performance was assessed using a chessboard pattern and its corner coordinates as ground truth. The mean RMSE of the control points after registration using our method was 2.3 ± 0.7 pixels, whereas the RMSE using a frame-by-frame homographic registration was 1.2 ± 0.4 pixels. In addition, the technique was tested on mouth exploration samples to simulate in-vivo acquisition. The results reveal that our method provides similar results to a homographic transformation, which would be impossible to perform in-vivo.

Short Papers
Paper Nr: 49
Title:

Joint Brightness and Tone Stabilization of Capsule Endoscopy Videos

Authors:

Sibren van Vliet, André Sobiecki and Alexandru Telea

Abstract: Pill endoscopy cameras generate hours-long videos that need to be manually inspected by medical specialists. Technical limitations of pill cameras often create large and uninformative color variations between neighboring frames, which make exploration more difficult. To increase the exploration efficiency, we propose an automatic method for joint intensity and hue (tone) stabilization that reduces such artifacts. Our method works in real time, has no free parameters, and is simple to implement. We thoroughly tested our method on several real-world videos and quantitatively and qualitatively assessed its results and optimal parameter values by both image quality metrics and user studies. Both types of comparisons strongly support the effectiveness, ease-of-use, and added value claims for our new method.
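The intensity half of such a stabilization can be sketched as a per-frame gain that tracks a smoothed reference mean. This is only an illustrative baseline (the function name and smoothing factor alpha are assumptions), not the authors' parameter-free joint intensity-and-tone method:

```python
import numpy as np

def stabilize_brightness(frames, alpha=0.9):
    # Scale each frame so its mean intensity follows an exponentially
    # smoothed reference, damping abrupt brightness jumps between frames
    out, ref = [], None
    for f in frames:
        m = f.mean()
        ref = m if ref is None else alpha * ref + (1 - alpha) * m
        out.append(np.clip(f * (ref / max(m, 1e-6)), 0, 255))
    return out

frames = [np.full((2, 2), 100.0), np.full((2, 2), 50.0)]  # sudden darkening
out = stabilize_brightness(frames)  # second frame is pulled toward the reference
```

A joint stabilizer in the spirit of the paper would apply an analogous correction to the hue channel as well, so that both brightness and tone vary smoothly across neighboring frames.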

Paper Nr: 98
Title:

Reliable Stereoscopic Video Streaming Considering Important Objects of the Scene

Authors:

Ehsan Rahimi and Chris Joslin

Abstract: In this paper, we introduce a new reliable method of stereoscopic video streaming based on a multiple description coding strategy. The proposed multiple description coding generates 3D video descriptions considering the interesting objects contained in the scene. To find interesting objects in the scene, we use two metrics derived from the second-order statistics of the depth map image in a block-wise manner. Having detected the objects, the proposed multiple description coding algorithm generates the 3D video descriptions for the color video using a non-identical decimation method with respect to the identified objects. The objective test results confirm that the proposed method provides better performance than both polyphase subsampling multiple description coding and our previous work using pixel variation.
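The polyphase subsampling baseline that the abstract compares against can be sketched directly. This is the standard 2x2 polyphase split into four descriptions, not the proposed object-aware non-identical decimation:

```python
import numpy as np

def polyphase_descriptions(frame):
    # Each description keeps one pixel per 2x2 block, so any single
    # description received intact yields a quarter-resolution frame
    return [frame[i::2, j::2] for i in (0, 1) for j in (0, 1)]

frame = np.arange(16).reshape(4, 4)
descs = polyphase_descriptions(frame)  # four 2x2 descriptions
```

The robustness of multiple description coding comes from this redundancy: losing one description still leaves three subsampled versions from which the frame can be approximated.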

Paper Nr: 111
Title:

360 Panorama Super-resolution using Deep Convolutional Networks

Authors:

Vida Fakour-Sevom, Esin Guldogan and Joni-Kristian Kämäräinen

Abstract: We propose deep convolutional neural network (CNN) based super-resolution for 360 (equirectangular) panorama images used by virtual reality (VR) display devices (e.g. VR glasses). The proposed super-resolution adopts the recent CNN architecture proposed in (Dong et al., 2016) and adapts it for equirectangular panorama images, which have specific characteristics compared to standard cameras (e.g. projection distortions). We demonstrate how the adaptation can be performed by optimizing the trained network input size and fine-tuning the network parameters. In our experiments with 360 panorama images of rich natural content, CNN based super-resolution achieves an average PSNR improvement of 1.36 dB over the baseline (bicubic interpolation) and 1.56 dB with our equirectangular-specific adaptation.
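The PSNR gains reported above are computed from the mean squared error between a reconstruction and its reference image. A minimal sketch of the metric (assuming an 8-bit peak value) is:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    # Peak signal-to-noise ratio in dB; higher means closer to the reference
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 255.0)
worst = psnr(a, b)   # maximal per-pixel error at the 8-bit peak gives 0 dB
```

An "average PSNR improvement of 1.36 dB" then means this quantity, averaged over the test images, is 1.36 dB higher for the CNN output than for the bicubic baseline.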

Paper Nr: 117
Title:

Efficient Line Tracker in the Parameter Space based on One-to-one Hough Transform

Authors:

Yannick Wend Kuni Zoetgnande and Antoine Manzanera

Abstract: We propose a new method for line detection and tracking in videos in real time. It is based on an optimised version of the dense Hough transform, which computes, via a one-to-one projection, an accumulator in the polar parameter space using the gradient direction in the grayscale image. Our method then performs mode (cluster) tracking in the Hough space, using prediction and matching of the clusters based on both their position and appearance. The implementation takes advantage of the high-performance video processing library Video++, which makes it simple and efficient to parallelise many video primitives.
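The one-to-one accumulation idea (each strong-gradient pixel casts a single vote at the angle given by its own gradient direction, instead of voting over all angles as in the classical Hough transform) can be sketched in NumPy. The binning parameters and threshold here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def one_to_one_hough(gray, mag_thresh=50.0, n_rho=128, n_theta=128):
    # Gradient magnitude selects edge pixels; gradient direction gives
    # the line normal, so each pixel votes for exactly one (rho, theta)
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx)
    ys, xs = np.nonzero(mag > mag_thresh)
    th = theta[ys, xs]
    rho = xs * np.cos(th) + ys * np.sin(th)    # signed distance to origin
    rho_max = np.hypot(*gray.shape)
    acc = np.zeros((n_rho, n_theta), dtype=np.int64)
    r_idx = np.clip(((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int), 0, n_rho - 1)
    t_idx = np.clip(((th + np.pi) / (2 * np.pi) * (n_theta - 1)).astype(int), 0, n_theta - 1)
    np.add.at(acc, (r_idx, t_idx), 1)          # one vote per edge pixel
    return acc

img = np.zeros((64, 64))
img[32, :] = 255.0                  # a single horizontal line
acc = one_to_one_hough(img)         # votes concentrate in two (rho, theta) bins
```

Tracking then reduces to following the modes (clusters of votes) of this accumulator from frame to frame.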

Paper Nr: 2
Title:

A Real-Time Edge-Preserving Denoising Filter

Authors:

Simon Reich, Florentin Wörgötter and Babette Dellen

Abstract: Even in today's world, where augmented reality glasses and 3D sensors are rapidly becoming less expensive and more widely used, the most important sensor remains the 2D RGB camera. Every camera is an optical device and prone to sensor noise, especially in dark environments or environments with extremely high dynamic range. The filter introduced here removes a wide variety of noise, for example Gaussian noise and salt-and-pepper noise, while preserving edges. Due to the highly parallel structure of the method, the implementation on a GPU runs in real time, allowing us to process standard images within tens of milliseconds. The filter is first tested on 2D image data, and on the Berkeley Image Dataset and the Coco Dataset we outperform other standard methods. Afterwards, we show a generalization to arbitrary dimensions using noisy low-level sensor data. As a result, the filter can be used not only for image enhancement, but also for noise reduction on sensors like accelerometers, gyroscopes, or GPS trackers, which are widely used in robotic applications.
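As a point of reference for edge-preserving removal of salt-and-pepper noise, a classic baseline is the median filter. This sketch is not the paper's filter (whose details are in the full text), just the kind of standard method it would be compared against:

```python
import numpy as np

def median_filter(img, k=3):
    # Sliding k x k median: removes isolated impulses while keeping
    # step edges, unlike a linear (e.g. Gaussian) smoothing filter
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

img = np.zeros((5, 5))
img[2, 2] = 255.0             # a single salt impulse
out = median_filter(img)      # the impulse is removed entirely
```

Like the paper's method, every output pixel here depends only on a small neighborhood, which is what makes such filters amenable to highly parallel GPU implementations.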

Paper Nr: 40
Title:

Efficient Projective Transformation and Lanczos Interpolation on ARM Platform using SIMD Instructions

Authors:

Abstract: This paper proposes a novel way of exploiting NEON SIMD instructions for accelerating projective transformation on ARM platforms. Instead of applying data parallelism to linear algorithms, we study the effectiveness of SIMD intrinsics on this non-linear algorithm. For image resampling, Lanczos interpolation is used since it is adequately accurate, despite its rather large complexity. Multithreading is also employed for optimal use of system resources. Moreover, qualitative and quantitative results of NEON's performance are presented and analyzed.
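The Lanczos kernel used for resampling is a windowed sinc, L(x) = sinc(x)·sinc(x/a) for |x| < a and 0 otherwise. This scalar Python sketch shows the kernel math only; the paper's contribution is the NEON SIMD vectorisation of this computation, which is not reproduced here:

```python
import math

def lanczos_kernel(x, a=3):
    # L(x) = a * sin(pi x) * sin(pi x / a) / (pi x)^2 for 0 < |x| < a
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def lanczos_resample_1d(samples, t, a=3):
    # Interpolate a 1-D signal at fractional position t using the
    # 2a nearest samples weighted by the kernel
    i0 = math.floor(t)
    acc = 0.0
    for i in range(i0 - a + 1, i0 + a + 1):
        if 0 <= i < len(samples):
            acc += samples[i] * lanczos_kernel(t - i, a)
    return acc

vals = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = lanczos_resample_1d(vals, 3.0)   # at an integer position the sample is reproduced
```

The "rather large complexity" the abstract mentions comes from the 2a kernel evaluations per output sample (per axis, for 2-D images), which is exactly the inner loop that SIMD vectorisation targets.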

Paper Nr: 65
Title:

Point Cloud Simplification by Clustering for Robotics and Computer Vision Applications

Authors:

Benjamin Bird, Barry Lennox and Simon Watson

Abstract: This paper introduces new point cloud simplification methods (reducing complexity whilst preserving integrity), targeted at mobile robotics applications where computational power is limited and online processing is beneficial. A review of existing point cloud simplification methods was conducted, which highlighted that existing methods focus on maintaining the accuracy of the point cloud rather than on the computational requirements. The proposed algorithms are compared to known clustering algorithms intended for embedded applications, and are evaluated based on the computational time required to generate a clustered point cloud, the quality of the resulting point cloud, and its suitability to form a polygonal mesh. All algorithms were benchmarked on several popular single board computers (SBCs) as well as an x86 computer. The proposed algorithms are shown to have significantly better performance in terms of computational time compared to existing methods, whilst attempting to maintain overall quality.

Paper Nr: 68
Title:

Multi-Forest Classification and Layered Exhaustive Search using a Fully Hierarchical Hand Posture/Gesture Database

Authors:

Abstract: In this paper, we propose a systematic approach to building an entirely hierarchical hand posture database. The hierarchy provides the possibility of considering a large number of hand poses while requiring a low time-space complexity for construction. Furthermore, two algorithms (random decision forest and exhaustive search) are chosen and tested on this database. We show that by utilizing such a database one can achieve better performance in classifier training and search strategies (two main categories of algorithms in the field of machine learning) compared with conventional (all-in-one-layer) databases.

Paper Nr: 73
Title:

Depth-Map Compression using Directional Wavelets Transform

Authors:

Sourena Maroofi and Boshra Rajaei

Abstract: In this work, a new coding scheme for depth-map compression in multiview video applications is presented. The special structure of these images, with sharp edges and large, smoothly varying regions, distinguishes them from natural images. Normal quantization in coders employing conventional transforms may result in information spreading close to sharp edges. Here, we propose a sharp variant of the well-known directional wavelet transform which prevents the wavelet support from cutting across sharp depth-map edges. This is achieved by estimating the location of edges using the corresponding texture frames in a multiview system. We call the coding algorithm the sharp directional wavelet (SDIW) transform, and we numerically show that a significant decrease in the energy of the wavelet coefficients improves performance compared to other coding schemes using, for instance, separable wavelets.

Paper Nr: 99
Title:

Low Complex Image Resizing Algorithm using Fixed-point Integer Transformation

Authors:

James McAvoy, Ehsan Rahimi and Chris Joslin

Abstract: This paper proposes an efficient image resizing algorithm, including both halving and doubling, in the DCT domain. The proposed image resizing algorithm works on a 4-by-4 DCT block framework with lower complexity compared to similar previous methods. Compared to images that were halved or doubled through bilinear interpolation, the proposed algorithm produces images with similar or higher PSNR or SSIM values at a significantly lower computational cost. The test results also confirm that our approach improves on current frequency-domain resizing algorithms through a fixed-point integer transformation which reduces the computational cost by more than 60% with negligible dB loss.
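Frequency-domain halving in general works by keeping the low-frequency quarter of a block's DCT coefficients and inverse-transforming. This floating-point sketch illustrates that idea on an 8x8 block; the paper itself uses a fixed-point integer transform on 4-by-4 blocks, which this does not reproduce:

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_halve(block):
    # Keep the low-frequency (N/2 x N/2) corner of the orthonormal DCT
    # and inverse-transform; the /2 compensates the size change
    # (a factor of sqrt(2) per axis for orthonormal transforms)
    n = block.shape[0]
    coeffs = dctn(block, norm='ortho')
    return idctn(coeffs[:n // 2, :n // 2] / 2.0, norm='ortho')

block = np.full((8, 8), 100.0)   # constant test block
small = dct_halve(block)         # 4x4 block, still constant 100
```

Discarding high-frequency coefficients acts as the anti-alias low-pass step, which is why DCT-domain resizing avoids a separate filtering pass.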

Paper Nr: 100
Title:

A Video Dataset for an Efficient Camcording Attack Evaluation

Authors:

Asma Kerbiche, Saoussen Ben Jabra, Ezzeddine Zagrouba and Vincent Charvillat

Abstract: Any video watermarking scheme dedicated to copyright protection should be robust against several attacks, especially malicious and dangerous attacks such as camcording. Indeed, this attack has become a real problem for cinematographic production companies. However, until now researchers have either not evaluated the robustness of their video watermarking approaches against this attack, or have considered it a combination of some usual attacks. To address this problem, several studies have proposed camcording simulators, which encourage and help researchers in the video watermarking domain to include camcording in their robustness evaluations. In this paper, a dataset of camcorder videos dedicated to an efficient robustness evaluation of watermarking schemes is proposed, which can also support research on the creation of camcording simulators. In this dataset, videos are captured in realistic scenarios in a cinema and are recorded using five capture devices and from four positions. Moreover, the proposed dataset contains marked versions of the proposed videos using three different video watermarking techniques, which allows researchers to compare their approaches with these techniques. Experimental results show that robustness evaluation based on the proposed dataset is more efficient than simulator-based evaluation, thanks to the diversity of the capture devices used and the realistic conditions of the video recordings.

Paper Nr: 115
Title:

Stereo and LIDAR Fusion based Detection of Humans and Other Obstacles in Farming Scenarios

Authors:

Stefan-Daniel Suvei, Frederik Haarslev, Leon Bodenhagen and Norbert Krüger

Abstract: In this paper we propose a fusion method which uses the depth information acquired from a LIDAR sensor to guide a block matching stereo algorithm. The resulting fused point clouds are then used for obstacle detection, either by processing the raw data and clustering the protruding objects in the scene, or by applying a Convolutional Neural Network on the 3D points and labeling them into classes. The performance of the proposed method is evaluated by carrying out a series of experiments on different data sets obtained from the SAFE robotic platform. The results show that the fusion algorithm significantly improves the F1 detection score of the trained networks.

Paper Nr: 123
Title:

Unsupervised Learning for Color Constancy

Authors:

Nikola Banić and Sven Lončarić

Abstract: Most digital camera pipelines use color constancy methods to reduce the influence of the illumination and the camera sensor on the colors of scene objects. The highest color correction accuracy is obtained with learning-based color constancy methods, but they require a significant amount of calibrated training images with known ground-truth illumination. Such calibration is time-consuming, preferably done for each sensor individually, and therefore a major bottleneck in achieving high color constancy accuracy. Statistics-based methods do not require calibrated training images, but they are less accurate. In this paper, an unsupervised learning-based method is proposed that learns its parameter values after approximating the unknown ground-truth illumination of the training images, thus avoiding calibration. In terms of accuracy, the proposed method outperforms all statistics-based and many state-of-the-art learning-based methods. The results are presented and discussed.
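A representative statistics-based method of the kind the abstract contrasts with learning-based ones is the gray-world assumption: the average scene reflectance is taken to be achromatic, so the per-channel mean estimates the illuminant. A minimal sketch (the von Kries-style scaling here is an illustrative choice, not the paper's method):

```python
import numpy as np

def gray_world(img):
    # Estimate the illuminant as the normalized per-channel mean and
    # divide it out, so a gray scene maps back to equal channel values
    illum = img.reshape(-1, 3).mean(axis=0)
    illum = illum / np.linalg.norm(illum)
    return img / (illum * np.sqrt(3)), illum

img = np.ones((4, 4, 3)) * np.array([2.0, 1.0, 1.0])  # gray scene under reddish light
corrected, illum = gray_world(img)                    # channels become equal again
```

Such statistics-based estimates need no calibration, which is exactly the property the proposed unsupervised method keeps while aiming for learning-based accuracy.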

Paper Nr: 129
Title:

The Application of Neural Networks for Facial Landmarking on Mobile Devices

Authors:

Connah Kendrick, Kevin Tan, Kevin Walker and Moi Hoon Yap

Abstract: Many modern mobile applications incorporate face detection and landmarking into their systems, such as Snapchat, beauty filters and camera auto-focusing systems, where they implement regression-based machine learning algorithms for accurate face landmark detection, allowing the manipulation of facial appearance. Mobile applications that incorporate machine learning have to overcome issues such as lighting, occlusion, camera quality and false detections. A solution could be provided through the resurgence of deep learning with neural networks, as they are showing significant improvements in accuracy and reliability in comparison to state-of-the-art machine learning. Here, we demonstrate the process of using trained networks on mobile devices and review their effectiveness. We also examine the effects of employing max-pooling layers as an efficient method to reduce the required processing power. We compared networks with three different numbers of max-pooling layers and ported one to the mobile device; the other two could not be ported due to memory restrictions. We will be releasing all code to build, train and use the model in a mobile application. The results show that despite the limited processing capability of mobile devices, neural networks can be used for difficult challenges while still working in real time. We show a network running on a mobile device on a live data stream and give a recommendation on the structure of the network.

Paper Nr: 165
Title:

Does Vision Work Well Enough for Industry?

Authors:

Frederik Hagelskjær, Anders Glent Buch and Norbert Krüger

Abstract: A multitude of pose estimation algorithms has been developed in the last decades and many proprietary computer vision packages exist which can simplify the setup process. Despite this, pose estimation still lacks the ease of use that robots have attained in the industry. The statement "vision does not work" is still not uncommon in the industry, even from integrators. This points to difficulties in setting up solutions in industrial applications. In this paper, we analyze and investigate the current usage of pose estimation algorithms. A questionnaire was sent out to both universities and industry. From this survey, it is clear that the actual setup time of pose estimation solutions is on average 1–2 weeks, which poses a severe hindrance to the application of pose estimation algorithms. Finally, steps required for facilitating the use of pose estimation systems are discussed that can reduce complexities and thus the setup times in industrial applications.

Paper Nr: 171
Title:

Fast Detection and Removal of Glare in Gray Scale Laparoscopic Images

Authors:

Nefeli Lamprinou and Emmanouil Z. Psarakis

Abstract: Images captured by laparoscopic cameras often suffer from glare due to specular reflections from surgical tools and some tissue surfaces, which can disturb the surgeon's attention. In this paper, inspired by their form, the photometric distortions caused by specular reflections are modeled as the superposition of a smooth curve and a pulse-shaped curve. Based on this model, a new fast technique for the detection and removal of glare in gray scale laparoscopic images is proposed. The proposed technique, as well as other state-of-the-art image inpainting algorithms, is used in a number of experiments based on artificial and real laparoscopic data, and the proposed algorithm seems to outperform its rivals.

Paper Nr: 183
Title:

Infrared Microscopic Imaging Analysis

Authors:

Anselmo Jara, Guillermo Machuca, Sergio Torres and Pablo Gutiérrez

Abstract: In this paper, we present image processing advances and applications of mid-wavelength infrared (MWIR) microscopy imaging. Practical issues related to image acquisition, image nonuniformity correction, infrared image quality assessment, and even the experimental estimation of the MWIR microscope's optical Point Spread Function are discussed. The built-up MWIR microscope imaging system allows us to analyse thermal features near the system's diffraction limit, at up to 200 frames per second, and to focus on an area of less than 2 mm². On the basis of this technology, our group has focused its efforts on exothermic biological processes, achieving the results presented in this paper.

Paper Nr: 191
Title:

A Study on Calibration Methods for Infrared Focal Plane Array Cameras

Authors:

Rasim Caner Çalık, Emre Tunali, Burak Ercan and Sinan Öz

Abstract: Imaging systems that are benefiting from infrared focal plane arrays (IRFPA) inevitably suffer from some visually unpleasant artifacts due to limits of detector materials and manufacturing processes. To address these artifacts and benefit the most from IRFPAs, factory level calibrations become obligatory. Considering nonlinear characteristics of infrared focal plane arrays, fixed pattern noise elimination, a.k.a. non-uniformity correction (NUC), and bad pixel replacement are considered as the most crucial calibration processes for capturing details of the scene. In this paper, we present two different NUC methods from two different families (temperature and integration time based NUC), together with a bad pixel detection strategy in order to achieve wide dynamic range and maximized contrast span.

Paper Nr: 194
Title:

White Blood Cells Counting Via Vector Field Convolution Nuclei Segmentation

Authors:

Simone Porcu, Andrea Loddo, Lorenzo Putzu and Cecilia Di Ruberto

Abstract: Haematological procedures like the analysis, counting and classification of White Blood Cells (WBCs) are very helpful in the medical field for recognizing a pathology, e.g., the correlation between WBC analysis and leukaemia. Expert technicians perform these procedures manually, so the results are influenced by their tiredness and subjectivity. Their automation is still an open issue. Our proposal aims to replicate every single step of the haematologists' job with a semi-automatic system. The main targets of this work are to decrease the time needed for an analysis and to improve the efficiency of the procedure. It is based on the Vector Field Convolution (VFC) to describe cell edges, going beyond more classic methods like the active contour model. This approach is crucial for addressing the segmentation of clumped and overlapping WBCs. To sum up, we define a system that is able to recognise leukocytes, to differentiate them from the other blood cells and, finally, to separate overlapping leukocytes. Experimental results obtained on three public datasets show that the method is accurate and robust, outperforming state-of-the-art methods for cell clump identification and cell counting.

Area 2 - Image and Video Analysis

Full Papers
Paper Nr: 14
Title:

Computer Vision based System for Apple Detection in Crops

Authors:

Mercedes Marzoa Tanco, Gonzalo Tejera and Matías Di Martino

Abstract: In recent times there has been an increasing need to improve the competitiveness of apple production. Automatic estimation of the crop yield or automatic harvesting may contribute to this improvement. This article proposes a simple and efficient approach to automatically detect the apples present in a given set of images. We tested the proposed algorithm on several images taken in many different apple crops under natural lighting conditions. The proposed method has two main steps. First, we implement a classification step in which each pixel is classified as part of an apple (positive pixel) or as part of the background (negative pixel). Then, a second step explores the morphology of the set of positive pixels to detect the most likely configuration of circular structures. We compare the performance of methods such as Support Vector Machine, k-Nearest Neighbor and a basic Decision Tree on the classification step. A database with 266 high-resolution images was created and made publicly available. This database was manually labeled and, for each image, we provide a label (positive or negative) for each pixel, plus the location of the center of each apple.

Paper Nr: 15
Title:

Fast 3D Scene Alignment with Stereo Images using a Stixel-based 3D Model

Authors:

Dennis W. J. M. van de Wouw, Willem P. Sanberg, Gijs Dubbelman and Peter H. N. de With

Abstract: Scene alignment for images recorded from different viewpoints is a challenging task, especially considering strong parallax effects. This work proposes a diorama-box model for a 2.5D hierarchical alignment approach, which is specifically designed for image registration from a moving vehicle using a stereo camera. For this purpose, the Stixel World algorithm is used to partition the scene into super-pixels, which are transformed to 3D. This model is further refined by assigning a slanting orientation to each stixel and by interpolating between stixels, to prevent gaps in the 3D model. The resulting alignment shows promising results, where under normal viewing conditions, more than 96% of all annotated points are registered with an alignment error up to 5 pixels at a resolution of 1920x1440 pixels, executing at near-real time performance (4 fps) for the intended application.

Paper Nr: 28
Title:

Multiple Sclerosis Lesion Segmentation using Improved Convolutional Neural Networks

Authors:

Erol Kazancli, Vesna Prchkovska, Paulo Rodrigues, Pablo Villoslada and Laura Igual

Abstract: Multiple Sclerosis (MS) lesion segmentation is critical for the diagnosis, treatment and follow-up of MS patients. Nowadays, MS lesion segmentation in Magnetic Resonance Imaging (MRI) is a time-consuming manual process carried out by medical experts, which is subject to intra- and inter-expert variability. Machine learning methods, including Deep Learning, have been applied to this problem, obtaining solutions that outperform other conventional automatic methods. Deep Learning methods have turned out to be especially promising, attaining human expert performance levels. Our aim is to develop a fully automatic method that will help experts in their task and reduce the necessary time and effort in the process. In this paper, we propose a new approach to the MS lesion segmentation problem based on Convolutional Neural Networks (CNN). We study different CNN approaches and compare their segmentation performance. With a specific CNN approach we obtain an average Dice score of 57.5% and a true positive rate of 59.7% on a real dataset of 59 patients, outperforming the other CNN approaches and a commonly used automatic tool for MS lesion segmentation.

Paper Nr: 62
Title:

Towards Image Colorization with Random Forests

Authors:

Helge Mohn, Mark Gaebelein, Ronny Hänsch and Olaf Hellwich

Abstract: Image colorization refers to the task of assigning color values to grayscale images. While previous work is based on either user input or very large training data sets, the proposed method is fully automatic and based on several orders of magnitude less training data. A Random Forest variation is tailored to the regression task of estimating the proper color values when presented with a grayscale image patch. A simple position prior as well as scale invariance are included in order to improve the estimation results. The proposed approach leads to satisfactory results over various colorization tasks and compares favorably with the state of the art based on convolutional networks.

Paper Nr: 108
Title:

Anomaly Detection in Crowded Scenes Using Log-Euclidean Covariance Matrix

Authors:

Efsun Sefa Sezer and Ahmet Burak Can

Abstract: In this paper, we propose an approach for anomaly detection in crowded scenes. For this purpose, two important types of features that encode motion and appearance cues are combined with the help of covariance matrices. Covariance matrices are symmetric positive definite (SPD) matrices which lie on a Riemannian manifold and are not suitable for Euclidean operations. To make covariance matrices suitable for use in Euclidean space, they are converted to log-Euclidean covariance matrices (LECM) using the log-Euclidean framework. LECM features created in two different ways are then used with a one-class SVM to detect abnormal events. Experiments carried out on an anomaly detection benchmark dataset, and comparisons with previous studies, show that successful results are obtained.
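The log-Euclidean mapping the abstract refers to can be sketched in a few lines (a minimal numpy illustration, not the authors' implementation; the feature dimensionality and the ridge term are our assumptions):

```python
import numpy as np

def log_euclidean(spd):
    """Matrix logarithm of an SPD matrix via eigendecomposition:
    log(C) = U diag(log(l)) U^T. This maps SPD matrices from their
    Riemannian manifold into a vector space where Euclidean
    operations (and hence a standard one-class SVM) apply."""
    vals, vecs = np.linalg.eigh(spd)
    return (vecs * np.log(vals)) @ vecs.T

# Covariance of stacked per-pixel feature vectors (rows = samples);
# the small ridge keeps the matrix strictly positive definite.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))
cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(5)
lecm = log_euclidean(cov)
```

The flattened upper triangle of `lecm` would then serve as the feature vector fed to the one-class SVM.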

Paper Nr: 113
Title:

Keyframe-based Video Summarization with Human in the Loop

Authors:

Antti E. Ainasoja, Antti Hietanen, Jukka Lankinen and Joni-Kristian Kämäräinen

Abstract: In this work, we focus on the popular keyframe-based approach for video summarization. Keyframes represent important and diverse content of an input video, and a summary is generated by temporally expanding the keyframes to key shots which are merged into a continuous dynamic video summary. In our approach, keyframes are selected from scenes that represent semantically similar content. For scene detection, we propose a simple yet effective dynamic extension of a video Bag-of-Words (BoW) method which provides over-segmentation (high recall) for keyframe selection. For keyframe selection, we investigate two effective approaches: local region descriptors (visual content) and optical flow descriptors (motion content). We provide several interesting findings. 1) While scenes (visually similar content) can be effectively detected by region descriptors, optical flow (motion changes) provides better keyframes. 2) However, the suitable parameters of the motion descriptor based keyframe selection vary from one video to another and average performances remain low. To avoid more complex processing, we introduce a human-in-the-loop step where a user selects keyframes produced by the three best methods. 3) Our human-assisted and learning-free method achieves superior accuracy to learning-based methods and for many videos is on par with average human accuracy.

Short Papers
Paper Nr: 12
Title:

Line-based Registration of Photogrammetric Point Clouds with 3D City Models by Means of Mixed Integer Linear Programming

Authors:

Steffen Goebbels and Regina Pohle-Fröhlich

Abstract: This paper describes a method to align photogrammetric point clouds with CityGML 3D city models. Amongst others, we use photogrammetric point clouds that are generated from videos taken from the driver’s perspective of a car. Clouds are computed with the Structure-from-Motion algorithm. We detect wall planes to rotate these clouds so that walls become vertical. This allows us to find buildings’ footprints by accumulating points that are orthogonally projected to the ground. Thus, the main alignment step can be performed in 2D. To this end, we match detected footprints with corresponding footprints of CityGML models in the x-y-plane based on line segments. These line segments are detected using a probabilistic Hough transform. Then we apply a Mixed Integer Linear Program to find a maximum number of matching line segment pairs. Using a Linear Program, we optimize a rigid affine transformation to align the lines of these pairs. Finally, we use height information along CityGML terrain intersection lines to estimate scaling and translation in the z-direction. By combining the results, we obtain an affine mapping that aligns the point cloud with the city model. Linear Programming is not widely applied to registration problems; however, the presented technique is a fast alternative to Iterative Closest Point algorithms for aligning photogrammetric point clouds with clouds sampled from city models.

Paper Nr: 42
Title:

Mind the Regularized GAP, for Human Action Classification and Semi-supervised Localization based on Visual Saliency

Authors:

Marc Moreaux, Natalia Lyubova, Isabelle Ferrané and Frédéric Lerasle

Abstract: This work addresses the issue of image classification and localization of human actions based on visual data acquired from RGB sensors. Our approach is inspired by the success of deep learning in image classification. In this paper, we describe our method and how the concept of Global Average Pooling (GAP) applies in the context of semi-supervised class localization. We benchmark it against Class Activation Mapping, introduced in (Zhou et al., 2016), propose a regularization over the GAP maps to enhance the results, and study whether a combination of these two ideas can result in better classification accuracy. The models are trained and tested on the Stanford 40 Action dataset (Yao et al., 2011) depicting people performing 40 different actions such as drinking, cooking or watching TV. Compared to the aforementioned baseline, our model improves the classification accuracy by 5.3 percentage points, achieves a localization accuracy of 50.3%, and drastically diminishes the computation needed to retrieve the class saliency from the base convolutional model.
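The Class Activation Mapping baseline mentioned above can be sketched as follows (a hedged numpy illustration; the shapes and variable names are ours, not the paper's):

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """feature_maps: (C, H, W) activations of the last conv layer.
    fc_weights: (num_classes, C) weights of the dense layer that sits
    on top of Global Average Pooling (GAP). The map for one class is
    the channel-wise weighted sum of the feature maps, which localizes
    the evidence for that class without pixel-level supervision."""
    return np.tensordot(fc_weights[class_idx], feature_maps, axes=1)

rng = np.random.default_rng(0)
maps = rng.random((8, 7, 7))    # toy conv activations
weights = rng.random((40, 8))   # toy classifier weights (40 actions)
cam = class_activation_map(maps, weights, class_idx=3)
```

Because GAP commutes with the weighted sum, the same weights that score the class also produce its saliency map at no extra training cost.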

Paper Nr: 47
Title:

Segmentation of 3D Point Clouds using a New Spectral Clustering Algorithm Without a-priori Knowledge

Authors:

Hannes Kisner and Ulrike Thomas

Abstract: For many applications, such as pose estimation, it is important to obtain good segmentation results as a pre-processing step. Spectral clustering is an efficient method to achieve high-quality results without a priori knowledge about the scene. Among existing methods, the k-means based spectral clustering approach and the bi-spectral clustering approach are suitable for 3D point clouds. In this paper, a new method is introduced and its results are compared to these well-known spectral clustering algorithms. When implementing spectral clustering methods, the key issues are: how to define similarity, how to build the graph Laplacian, and how to choose the number of clusters with little or no a priori knowledge. The suggested spectral clustering approach is described and evaluated on 3D point clouds. The advantage of this approach is that no a priori knowledge about the scene is necessary; neither the number of clusters nor the number of objects needs to be known. With this approach, high-quality segmentation results are achieved.
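The key issues listed above (similarity, graph Laplacian, cluster count) can be illustrated generically; the sketch below uses a Gaussian similarity and the eigengap heuristic, which is one common way to pick the cluster count without prior knowledge, and is not the authors' specific algorithm:

```python
import numpy as np

def unnormalized_laplacian(points, sigma=1.0):
    # Gaussian similarity between 3D points, then L = D - W.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    return np.diag(w.sum(axis=1)) - w

def estimate_clusters(laplacian, max_k=10):
    # Eigengap heuristic: the number of clusters is where the sorted
    # Laplacian spectrum shows its largest jump.
    vals = np.linalg.eigvalsh(laplacian)[:max_k]
    return int(np.argmax(np.diff(vals))) + 1
```

On two well-separated groups of 3D points, the Laplacian has two near-zero eigenvalues and the heuristic returns 2 without being told so.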

Paper Nr: 66
Title:

The Shape of an Image - A Study of Mapper on Images

Authors:

Alejandro Robles, Mustafa Hajij and Paul Rosen

Abstract: We study the topological construction called Mapper in the context of simply connected domains, in particular on images. The Mapper construction can be considered as a generalization of contour, split, and join trees on simply connected domains. A contour tree on an image domain assumes the height function to be a piecewise linear Morse function. This is a rather restrictive class of functions and does not allow us to explore the topology for most real world images. The Mapper construction avoids this limitation by assuming only continuity of the height function, allowing this construction to robustly deal with a significantly larger set of images. We provide a customized construction for Mapper on images, give a fast algorithm to compute it, and show how to simplify the Mapper structure in this case. Finally, we provide a simple procedure that guarantees the equivalence of Mapper to contour, join, and split trees on a simply connected domain.

Paper Nr: 82
Title:

Optimization of Person Re-Identification through Visual Descriptors

Authors:

Naima Mubariz, Saba Mumtaz, M. M. Hamayun and M. M. Fraz

Abstract: Person re-identification is a complex computer vision task which provides authorities a valuable tool for maintaining high-level security. In surveillance applications, human appearance is considered critical since it possesses high discriminating power. Many re-identification algorithms have been introduced that employ a combination of visual features, each solving one particular challenge of re-identification. This paper presents a new type of feature descriptor which incorporates multiple recently introduced visual feature representations, such as Gaussian of Gaussian (GOG) and the latest version of Weighted Histograms of Overlapping Stripes (WHOS), into a single descriptor. These two feature types demonstrate complementary properties that create greater overall robustness to re-identification challenges such as variations in lighting, pose, background, etc. The new descriptor is evaluated on several benchmark datasets, such as VIPeR, CAVIAR4REID, GRID, 3DPeS, iLIDS, ETHZ1 and PRID450s, and compared with several state-of-the-art methods to demonstrate the effectiveness of the proposed approach.

Paper Nr: 93
Title:

3D Adaptive Histogram Equalization Method for Medical Volumes

Authors:

Paulo Amorim, Thiago Moraes, Jorge Silva and Helio Pedrini

Abstract: Medical imaging plays a fundamental role in the diagnosis and treatment of several diseases, enabling the visualization of internal organs and tissues for use in clinical procedures. The quality of medical images can be degraded by several factors, such as noise and poor contrast. The application of filtering and contrast enhancement techniques is usually necessary to improve the quality of images, which facilitates the segmentation and classification stages. In this paper, we develop and analyze a novel three-dimensional adaptive histogram equalization method for improving contrast in the context of medical imaging. Several data sets are used to demonstrate the effectiveness of the proposed approach.
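As background for the method described above, plain (non-adaptive) histogram equalization of a volume can be sketched as follows; the paper's contribution is a 3D adaptive variant, which this minimal numpy illustration does not reproduce:

```python
import numpy as np

def equalize_volume(vol, bins=256):
    # Map voxel intensities through the normalized cumulative
    # histogram (CDF), spreading the intensities over [0, 1]. An
    # adaptive method would instead compute such mappings in local
    # 3D neighborhoods and blend them.
    hist, edges = np.histogram(vol, bins=bins)
    cdf = hist.cumsum().astype(float)
    cdf /= cdf[-1]
    return np.interp(vol.ravel(), edges[:-1], cdf).reshape(vol.shape)

rng = np.random.default_rng(0)
volume = rng.normal(loc=100.0, scale=10.0, size=(16, 16, 16))
equalized = equalize_volume(volume)
```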

Paper Nr: 95
Title:

Foveal Vision for Instance Segmentation of Road Images

Authors:

Benedikt Ortelt, Christian Herrmann, Dieter Willersinn and Jürgen Beyerer

Abstract: Instance-based semantic labeling is an important task for the interpretation of images in autonomous or assisted driving applications. By not only indicating the semantic class of each pixel but also separating different instances of the same class, even when they neighbor each other in the image, it can replace a multi-class object detector. In addition, it offers better localization of objects in the image by replacing the object detector's bounding box with a fine-grained object shape. The recently presented Cityscapes dataset promoted this topic by offering a large set of data labeled on pixel level. Building on the previous work of \cite{uhrig2016b}, this work proposes two improvements over this baseline strategy, leading to significant performance gains. First, a better distance measure for angular differences, which is unaffected by the $-\pi/\pi$ discontinuity, is proposed. This leads to improved object center localization. Second, imagery from the vehicle perspective includes a fixed vanishing point. A foveal concept counteracts the fact that objects get smaller in the image towards this point. This strategy especially improves the results for small objects at large distances from the vehicle.
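A wrap-around-safe angular difference of the kind referred to above can, for illustration, be written as follows (our sketch, not necessarily the measure used in the paper):

```python
import math

def angular_diff(a, b):
    # Smallest signed difference between two angles in radians; the
    # atan2 form is immune to the -pi/pi wrap-around, so two nearly
    # opposite-signed angles close to +-pi still compare as close.
    return math.atan2(math.sin(a - b), math.cos(a - b))
```

For example, angles just below pi and just above -pi differ by only 0.2 rad under this measure, whereas a naive subtraction would report nearly 2*pi.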

Paper Nr: 141
Title:

Combination of Texture and Geometric Features for Age Estimation in Face Images

Authors:

Marcos Vinicius Mussel Cirne and Helio Pedrini

Abstract: Automatic age estimation from facial images has recently received increasing interest due to a variety of applications, such as surveillance, human-computer interaction, forensics, and recommendation systems. Despite such advances, age estimation remains an open problem due to several challenges associated with the aging process. In this work, we develop and analyze an automatic age estimation method from face images based on a combination of textural and geometric features. Experiments are conducted on the Adience dataset (Adience Benchmark, 2017; Eidinger et al., 2014), a large, well-known benchmark used to evaluate both age and gender classification approaches.

Paper Nr: 151
Title:

Image Segmentation of Multi-shaped Overlapping Objects

Authors:

Kumar Abhinav, Jaideep Singh Chauhan and Debasis Sarkar

Abstract: In this work, we propose a new segmentation algorithm for images containing convex objects present in multiple shapes with a high degree of overlap. The proposed algorithm is carried out in two steps: first, we identify the visible contours, segment them using concave points, and group the segments belonging to the same object. The next step is to assign a shape identity to these grouped contour segments. For images containing objects in multiple shapes, we begin by identifying the shape classes of the contours, followed by assigning a shape entity to these classes. We provide a comprehensive experimentation of our algorithm on two crystal image datasets. One dataset comprises images containing objects in multiple shapes overlapping each other, and the other dataset contains standard images with objects present in a single shape. We test our algorithm against two baselines, with our proposed algorithm outperforming both.

Paper Nr: 156
Title:

Dynamic Multiscale Tree Learning using Ensemble Strong Classifiers for Multi-label Segmentation of Medical Images with Lesions

Authors:

Samya Amiri, Mohamed Ali Mahjoub and Islem Rekik

Abstract: We introduce a dynamic multiscale tree (DMT) architecture that learns how to leverage the strengths of different state-of-the-art classifiers for supervised multi-label image segmentation. Unlike previous works that simply aggregate or cascade classifiers for addressing image segmentation and labeling tasks, we propose to embed strong classifiers into a tree structure that allows bi-directional flow of information between its classifier nodes to gradually improve their performance. Our DMT is a generic classification model that inherently embeds different cascades of classifiers while enhancing learning transfer between them to boost their classification accuracies. Specifically, each node in our DMT can nest a Structured Random Forest (SRF) classifier or a Bayesian Network (BN) classifier. The proposed SRF-BN DMT architecture has several appealing properties. First, while SRF operates at the patch level (regular image regions), BN operates at the super-pixel level (irregular image regions), thereby enabling the DMT to integrate multi-level image knowledge in the learning process. Second, the proposed DMT robustly overcomes the limitations of the aggregated classifiers through the ascending and descending flow of contextual information between each parent node and its children nodes. Third, we train DMT at different scales to capture coarse-to-fine image details. Last, DMT outperforms several state-of-the-art segmentation methods for multi-label segmentation of brain images with gliomas.

Paper Nr: 169
Title:

Towards Spatio-temporal Face Alignment in Unconstrained Conditions

Authors:

Abstract: Face alignment is an essential task for many applications. Its objective is to locate feature points on the face in order to identify its geometric structure. Under unconstrained conditions, the different variations that may occur in the visual context, together with the instability of face detection, make it a difficult problem to solve. While many methods have been proposed, their performances under these constraints are still not satisfactory. In this article, we claim that face alignment should be studied using image sequences rather than still images, as has been done so far. We show the importance of taking the temporal information into consideration under unconstrained conditions.

Paper Nr: 172
Title:

Real-time Image Registration with Region Matching

Authors:

Charles Beumier and Xavier Neyt

Abstract: Image registration, the task of aligning two images, is a fundamental operation for applications like image stitching or image comparison. In our project in surveillance for route clearance operations, a drone will be used to detect suspicious people and vehicles. This paper presents an approach for real-time image alignment of video images acquired by a moving camera. The high correlation between successive images allows for relatively simple algorithms. We considered region segmentation as an alternative to the more classical corner or interest point detectors and evaluated the appropriateness of connected component labeling with a connectivity defined by the gray-level similarity between neighboring pixels. Real-time processing is intended thanks to a very fast segment-based (as opposed to pixel-based) connected component labeling. The regions, even if not always pleasing to the human eye, proved stable enough to be linked across images by trivial features such as the area and the centroid. The vector shifts between matching regions were filtered and modeled by an affine transform. The paper discusses the execution times obtained in this feasibility study for all the steps needed for image registration and indicates the planned improvements to achieve real-time performance.
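Linking regions across frames by area and centroid, as described above, can be sketched greedily (a toy version; the function name, tolerances and region representation are our assumptions):

```python
import math

def match_regions(regions_a, regions_b, max_shift=20.0, area_tol=0.2):
    # Each region is (area, cx, cy). A region in frame A is matched to
    # the nearest region in frame B whose area differs by at most
    # area_tol (relative) and whose centroid moved less than max_shift.
    matches = []
    for i, (area_a, cx_a, cy_a) in enumerate(regions_a):
        best, best_d = None, max_shift
        for j, (area_b, cx_b, cy_b) in enumerate(regions_b):
            if abs(area_a - area_b) > area_tol * area_a:
                continue
            d = math.hypot(cx_a - cx_b, cy_a - cy_b)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches.append((i, best))
    return matches
```

The shift vectors between matched centroids would then be filtered and fed to the affine-model estimation step.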

Paper Nr: 175
Title:

Plantation Rows Identification by Means of Image Tiling and Hough Transform

Authors:

Guilherme Afonso Soares, Daniel Abdala and Mauricio Escarpinati

Abstract: In this work we address the problem of plantation row identification in UAV images of coffee crop fields. A fair number of approaches address the problem using the Hough Transform. However, it assumes that the plantation rows are straight, which is rarely the case in aerial images. We propose a tiling scheme which allows one to acceptably approximate the rows inside each tile by straight lines, making it feasible to apply the Hough Transform. Experimental results compared to ground truths seem to indicate that the proposed approach successfully approximates real plantation rows.
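The tiling scheme can be sketched as follows (a minimal illustration; the tile size is an assumed parameter and the per-tile Hough Transform step is omitted):

```python
import numpy as np

def tile_image(img, tile):
    # Split the image into tiles small enough that curved plantation
    # rows are locally well approximated by straight lines; a Hough
    # Transform would then be run on each tile independently and the
    # per-tile line segments stitched back together.
    h, w = img.shape[:2]
    return [img[y:y + tile, x:x + tile]
            for y in range(0, h, tile)
            for x in range(0, w, tile)]

tiles = tile_image(np.zeros((100, 100)), tile=50)
```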

Paper Nr: 63
Title:

AngioUnet - A Convolutional Neural Network for Vessel Segmentation in Cerebral DSA Series

Authors:

Christian Neumann, Klaus-Dietz Tönnies and Regina Pohle-Fröhlich

Abstract: The U-net is a promising architecture for medical segmentation problems. In this paper, we show how this architecture can be effectively applied to cerebral DSA series. Using multiple images as input allows for better distinguishing between vessels and background. Furthermore, the U-net can be trained on a small corpus when combined with useful data augmentations such as mirroring, rotation, and additive biasing. Our variant of the network achieves a DSC of 87.98% on the segmentation task. We compare this to different configurations and discuss the effect of various artifacts such as bones, glue, and screws.

Paper Nr: 83
Title:

Image Analysis based on Radon-type Integral Transforms Over Conic Sections

Authors:

Dhekra El Hamdi, Mai K. Nguyen, Hedi Tabia and Atef Hamouda

Abstract: This paper presents a generalized Radon transform defined on conic sections, called the Conic Radon Transform (CRT), for image analysis. The proposed CRT extends the classical Radon transform (RT), which integrates an image function f(x,y) over straight lines. As the CRT is capable of detecting conic sections at any position and orientation in the original images, it makes it possible to build a new descriptor based on integrating an image over conic sections. In order to test and verify the utility and performance of this new approach, we develop, in this work, the Radon transforms defined on circles and on parabolas, and then build a descriptor combining the features extracted by the circular RT, parabolic RT and linear RT. This descriptor is applied to object classification. A number of experiments on both synthetic and real datasets illustrate the efficiency and the advantages of this new approach, which takes into account the global features of the different (circular, parabolic and linear) shapes in the images under study.
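For illustration, a coarse discrete version of the circular variant, integrating the image over circles about a fixed center, could look like this (our sketch; the paper's sampling of circle centers and parameters is not reproduced):

```python
import numpy as np

def circular_radon(img, center, radii):
    # Sum image values lying in a half-pixel-wide band around circles
    # of the given radii about `center` -- a discrete stand-in for the
    # integral over each circle.
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - center[0], xx - center[1])
    return np.array([img[np.abs(r - rad) < 0.5].sum() for rad in radii])

# A synthetic image containing a bright circle of radius 5: the
# transform responds most strongly at that radius.
img = np.zeros((21, 21))
yy, xx = np.mgrid[0:21, 0:21]
img[np.abs(np.hypot(yy - 10, xx - 10) - 5.0) < 0.5] = 1.0
profile = circular_radon(img, center=(10, 10), radii=[3.0, 5.0, 7.0])
```

Stacking such profiles over many centers and radii, alongside the linear and parabolic counterparts, yields the kind of combined descriptor the abstract describes.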

Paper Nr: 101
Title:

Automatic Object Segmentation on RGB-D Data using Surface Normals and Region Similarity

Authors:

Hamdi Yalin Yalic and Ahmet Burak Can

Abstract: In this study, a method for automatic object segmentation on RGB-D data is proposed. Surface normals extracted from the depth data are first used to determine segment candidates. Filtering steps are applied to the depth map to get a better representation of the data. After filtering, an adapted version of region-growing segmentation is performed using surface normal comparisons on the depth data. The extracted surface segments are then compared in terms of spatial color similarity and depth proximity, and finally region merging is applied to obtain object segments. The method is tested on a well-known dataset containing complex table-top scenes with multiple objects, and produces segmentation results comparable to related works.

Paper Nr: 135
Title:

Pulmonary Lobe Segmentation in CT Images using Alpha-Expansion

Authors:

Nicola Giuliani, Christian Payer, Michael Pienn, Horst Olschewski and Martin Urschler

Abstract: Fully-automatic lung lobe segmentation in pathological lungs is still a challenging task. A new approach for automatic lung lobe segmentation is presented based on airways, vessels, fissures and prior knowledge on lobar shape. The anatomical information and prior knowledge are combined into an energy equation, which is minimized via graph cuts to yield an optimal segmentation. The algorithm is quantitatively validated on an in-house dataset of 25 scans and on the LObe and Lung Analysis 2011 (LOLA11) dataset, which contains a range of different challenging lungs (total of 55) with respect to lobe segmentation. Both experiments achieved solid results including a median absolute distance from manually set fissure markers of 1.04mm (interquartile range: 0.88-1.09mm) on the in-house dataset and a score of 0.866 on the LOLA11 dataset. We conclude that our proposed method is robust even in case of pathologies.

Paper Nr: 143
Title:

Computer Vision System for Weld Bead Analysis

Authors:

Luciane Baldassari Soares, Atila Astor Weis, Bruna de Vargas Guterres, Ricardo Nagel Rodrigues and Silvia Silva da Costa Botelho

Abstract: Welding processes are very important in different industries and require precision and attention at each step performed. This article proposes an autonomous weld bead geometric analysis system to verify the presence of geometric failures that may compromise weld integrity. Using a vision system attached to a linear welding robot, images of pre-welded and post-welded metal plates are captured and compared, and metrics are applied for evaluation. The proposed method uses a Hidden Markov Model (HMM) to identify the weld bead edges and calculates several evaluation metrics to detect geometric failures such as misalignment and lack or excess of fusion, among others.

Paper Nr: 158
Title:

Material Classification in the Wild: Do Synthesized Training Data Generalise Better than Real-world Training Data?

Authors:

Grigorios Kalliatakis, Anca Sticlaru, George Stamatiadis, Shoaib Ehsan, Ales Leonardis, Juergen Gall and Klaus D. McDonald-Maier

Abstract: We question the dominant role of real-world training images in the field of material classification by investigating whether synthesized data can generalise more effectively than real-world data. Experimental results on three challenging real-world material databases show that the best performing pre-trained convolutional neural network (CNN) architectures can achieve up to 91.03% mean average precision when classifying materials in cross-dataset scenarios. We demonstrate that synthesized data achieve an improvement in mean average precision when used as training data in conjunction with pre-trained CNN architectures, which spans from approximately 5% to 19% across three widely used material databases of real-world images.

Paper Nr: 174
Title:

FLASH: A New Key Structure Extraction used for Line or Crack Detection

Authors:

Yannick Faula, Stéphane Bres and Véronique Eglin

Abstract: The extraction of key structures like points, short lines or regions is a major issue in computer vision. Many fields of application require large image acquisition and fast extraction of fine structures. Several methods have been proposed, with different accuracies and execution times. In this study, we focus on situations where existing local feature extractors do not give satisfactory results in terms of both accuracy and processing time. In particular, we focus on short-line extraction in locally low-contrast images. To this end, we propose a new Fast Local Analysis by threSHolding (FLASH) designed to process large images under hard time constraints. We apply FLASH to the field of concrete infrastructure monitoring, where robots and UAVs (Unmanned Aerial Vehicles) are increasingly used for automated defect detection (e.g., of cracks). For large concrete surfaces, there are several hard constraints, such as computational time and reliability. Results show that the computations are faster than several existing algorithms without a learning stage, and lead to automated monitoring of infrastructures.

Paper Nr: 176
Title:

Feature Extraction and Pattern Recognition from Fisheye Images in the Spatial Domain

Authors:

Konstantinos K. Delibasis and Ilias Maglogiannis

Abstract: Feature extraction for pattern recognition is a very common task in image analysis and computer vision. Most of the work has been reported for images and image sequences acquired by perspective cameras. This paper discusses algorithms for feature extraction and pattern recognition in images acquired by omnidirectional (fisheye) cameras. Work has been reported using operators in the frequency domain, which in the case of fisheye/omnidirectional images involves spherical harmonics. In this paper we review the recent literature, including relevant results from our team, and argue that features can be extracted from spherical images by modifying the existing operators in the spatial domain, without the need to correct the image for distortions.

Paper Nr: 180
Title:

Automatic Skin Tone Extraction for Visagism Applications

Authors:

Abstract: In this paper we propose a skin tone classification system over three skin colors: dark, medium and light. We work on two methods which do not require any camera or color calibration. The first computes color histograms in various color spaces on representative facial sliding patches, which are combined into a large feature vector. The dimensionality of this vector is reduced using Principal Component Analysis, and a Support Vector Machine determines the skin color of each region. The skin tone is extrapolated using a voting scheme. The second method uses Convolutional Neural Networks to automatically extract chromatic features from augmented sets of facial images. Both algorithms were trained and tested on publicly available datasets. The SVM method achieves an accuracy of 86.67%, while the CNN approach obtains an accuracy of 91.29%. The proposed system is developed as an automatic analysis module in an optical visagism system, where the skin tone is used in an eyewear virtual try-on application that allows users to virtually try glasses on their face using a mobile device with a camera. The system proposes only esthetically and functionally fitting frames to the user, based on facial features, skin tone included.

Paper Nr: 187
Title:

An Efficient 2D Curve Matching Algorithm under Affine Transformations

Authors:

Sinda Elghoul and Faouzi Ghorbel

Abstract: Most of the existing works on partially occluded shape recognition are suited to Euclidean transformations. As a result, performance degrades under affine and perspective transformations. This paper presents a new estimation and matching method for 2D partially occluded shape recognition under affine transformations, including translation, rotation, scaling, and shearing. The proposed algorithm is designed to estimate the motion between two open 2D shapes based on an Affine Curve Matching Algorithm (ACMA). ACMA considers the normalized affine arc length parameterization of the 2D contour. It then correlates the contours in order to minimize the L2 distance over planar affine transformations, by means of a method based upon a pseudo-inverse matrix. Experiments are carried out on the Multiview Curve Dataset (MCD). They demonstrate that our algorithm outperforms other methods proposed in the state of the art.

Area 3 - Image and Video Understanding

Full Papers
Paper Nr: 6
Title:

A Search Space Strategy for Pedestrian Detection and Localization in World Coordinates

Authors:

Mikael Nilsson, Martin Ahrnbom, Håkan Ardö and Aliaksei Laureshyn

Abstract: The focus of this work is detecting pedestrians, captured in a surveillance setting, and locating them in world coordinates. Commonly adopted search strategies operate in the image plane to address the object detection problem with machine learning, for example using a scale-space pyramid with the sliding-window methodology, or object proposals. In contrast, a new search space is presented here, which exploits camera calibration information and geometric priors. The proposed search strategy allows detectors to directly estimate pedestrian presence at world coordinates of interest. Results are demonstrated on real-world outdoor data collected along a path in dim light conditions, with the goal of locating pedestrians in world coordinates. The proposed search strategy yields a mean localization error under 20 cm, while image plane search methods, with additional processing adopted for localization, yield mean errors around or above 30 cm. This is achieved while observing only 3-4% of the patches required by the image plane searches for the same task.
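The core of such a world-coordinate search is projecting candidate ground-plane positions into the image with the calibrated camera; a minimal pinhole sketch (our notation and toy calibration values, not the paper's):

```python
import numpy as np

def project_world_points(K, R, t, points_w):
    # Pinhole projection x = K (R X + t): each candidate pedestrian
    # position X (e.g. a foot point on the ground plane) maps to the
    # image location a detector should evaluate, so the search is
    # enumerated in world coordinates rather than over image scales.
    pts_c = R @ points_w.T + t[:, None]
    uvw = K @ pts_c
    return (uvw[:2] / uvw[2]).T

K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 0.0])
pixels = project_world_points(K, R, t, np.array([[0.0, 0.0, 5.0]]))
```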

Paper Nr: 8
Title:

Evaluation of Visual Object Trackers on Equirectangular Panorama

Authors:

Ugur Kart, Joni-Kristian Kämäräinen, Lixin Fan and Moncef Gabbouj

Abstract: Equirectangular (360° spherical) panorama is the most widely adopted format to store and broadcast virtual reality (VR) videos. Equirectangular projection provides a new challenge in adapting existing computer vision methods to the novel input type. In this work, we introduce a new dataset which consists of high quality equirectangular videos captured using a high-end VR camera (Nokia OZO). We also provide the original wide angle (8 × 195°) videos and densely annotated bounding boxes for evaluating object detectors and trackers. We introduce the dataset, compare state-of-the-art trackers for object tracking in equirectangular panorama and report a detailed analysis of the failure cases, which reveals potential factors to improve the existing visual object trackers for the new type of input.

Paper Nr: 10
Title:

Image Restoration using Autoencoding Priors

Authors:

Siavash Arjomand Bigdeli and Matthias Zwicker

Abstract: We propose to leverage denoising autoencoder networks as priors to address image restoration problems. We build on the key observation that the output of an optimal denoising autoencoder is a local mean of the true data density, and the autoencoder error (the difference between the output and input of the trained autoencoder) is a mean shift vector. We use the magnitude of this mean shift vector, that is, the distance to the local mean, as the negative log likelihood of our natural image prior. For image restoration, we maximize the likelihood using gradient descent by backpropagating the autoencoder error. A key advantage of our approach is that we do not need to train separate networks for different image restoration tasks, such as non-blind deconvolution with different kernels, or super-resolution at different magnification factors. We demonstrate state of the art results for non-blind deconvolution and super-resolution using the same autoencoding prior.
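The core update the abstract describes, using the autoencoder error (output minus input) as a mean-shift-style prior gradient combined with a data-fidelity term, can be sketched roughly as follows. This is a minimal NumPy sketch: `dae` and `degrade_op` are hypothetical stand-ins for the trained denoising autoencoder and the known degradation operator, the degradation is assumed self-adjoint for brevity, and in practice the prior gradient is obtained by backpropagating through the network.

```python
import numpy as np

def restoration_step(x, dae, degraded, degrade_op, step=0.1, prior_weight=0.5):
    """One gradient step for restoration with a denoising-autoencoder prior.

    The autoencoder error dae(x) - x approximates a mean-shift vector toward
    the local mean of the natural-image density, so following it increases
    the prior likelihood; the data term keeps x consistent with the observation.
    """
    prior_grad = dae(x) - x                            # mean-shift vector
    data_grad = degrade_op(degraded - degrade_op(x))   # data term (operator assumed self-adjoint)
    return x + step * (data_grad + prior_weight * prior_grad)

# Toy check with an identity "degradation" and a shrinking "autoencoder".
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
observed = np.zeros((8, 8))
shrink = lambda im: 0.5 * im     # stand-in for a trained DAE
identity = lambda im: im
x_new = restoration_step(x, shrink, observed, identity)
print(x_new.shape)
```

With these toy operators the step is simply a contraction of `x` toward the all-zero observation, which makes the update easy to verify by hand.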

Paper Nr: 11
Title:

Hierarchical Deformable Part Models for Heads and Tails

Authors:

Fatemeh Shokrollahi Yancheshmeh, Ke Chen and Joni-Kristian Kämäräinen

Abstract: Imbalanced long-tail distributions of visual class examples inhibit accurate visual detection, which is addressed by a novel Hierarchical Deformable Part Model (HDPM). HDPM constructs a sub-category hierarchy by alternating bootstrapping and Visual Similarity Network (VSN) based discovery of head and tail sub-categories. We experimentally evaluate HDPM and compare with other sub-category aware visual detection methods with a moderate size dataset (Pascal VOC 2007), and demonstrate its scalability to a large scale dataset (ILSVRC 2014 Detection Task). The proposed HDPM consistently achieves significant performance improvement in both experiments.

Paper Nr: 29
Title:

Head Detection with Depth Images in the Wild

Authors:

Diego Ballotta, Guido Borghi, Roberto Vezzani and Rita Cucchiara

Abstract: Head detection and localization is a demanding task and a key element for many computer vision applications, like video surveillance, Human Computer Interaction and face analysis. The stunning amount of work done on detecting faces in RGB images, together with the availability of huge face datasets, has allowed very effective systems to be set up in that domain. However, due to illumination issues, infrared or depth cameras may be required in real applications. In this paper, we introduce a novel method for head detection on depth images that exploits the classification ability of deep learning approaches. In addition to reducing the dependency on external illumination, depth images implicitly embed useful information to deal with the scale of the target objects. Two public datasets have been exploited: the first one, called Pandora, is used to train a deep binary classifier with face and non-face images. The second one, collected by Cornell University, is used to perform a cross-dataset test during daily activities in unconstrained environments. Experimental results show that the proposed method outperforms state-of-the-art methods working on depth images.

Paper Nr: 46
Title:

Learning Transformation Invariant Representations with Weak Supervision

Authors:

Benjamin Coors, Alexandru Condurache, Alfred Mertins and Andreas Geiger

Abstract: Deep convolutional neural networks are the current state-of-the-art solution to many computer vision tasks. However, their ability to handle large global and local image transformations is limited. Consequently, extensive data augmentation is often utilized to incorporate prior knowledge about desired invariances to geometric transformations such as rotations or scale changes. In this work, we combine data augmentation with an unsupervised loss which enforces similarity between the predictions of augmented copies of an input sample. Our loss acts as an effective regularizer which facilitates the learning of transformation invariant representations. We investigate the effectiveness of the proposed similarity loss on rotated MNIST and the German Traffic Sign Recognition Benchmark (GTSRB) in the context of different classification models including ladder networks. Our experiments demonstrate improvements with respect to the standard data augmentation approach for supervised and semi-supervised learning tasks, in particular in the presence of little annotated data. In addition, we analyze the performance of the proposed approach with respect to its hyperparameters, including the strength of the regularization as well as the layer where representation similarity is enforced.
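A minimal sketch of such an unsupervised similarity loss, penalizing disagreement between the predictions for two augmented copies of the same input. The squared L2 distance between softmax outputs used here is one simple choice of similarity measure; the paper's exact formulation may differ.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def augmentation_similarity_loss(logits_a, logits_b):
    """Consistency regularizer: mean squared L2 distance between the
    class distributions predicted for two augmented copies of an input.
    Requires no labels, so it also applies to unlabeled samples."""
    p, q = softmax(logits_a), softmax(logits_b)
    return float(np.mean(np.sum((p - q) ** 2, axis=-1)))

# Identical predictions incur zero loss; disagreeing ones do not.
a = np.array([[2.0, 0.0, 0.0]])
print(augmentation_similarity_loss(a, a))       # 0.0
print(augmentation_similarity_loss(a, -a) > 0)  # True
```

In training, this term would be added to the supervised loss with a weight controlling the regularization strength, one of the hyperparameters the abstract mentions analyzing.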

Paper Nr: 51
Title:

A Survey on Databases for Facial Expression Analysis

Authors:

Raphaël Weber, Catherine Soladié and Renaud Séguier

Abstract: Facial expression databases are essential to develop and test systems for facial expression analysis. We propose in this paper a survey based on the review of 61 databases. To the best of our knowledge, no other survey covers so many databases. We identify 18 characteristics to describe a database and group them into 6 categories (population, modalities, data acquisition hardware, experimental conditions, experimental protocol and annotations). These characteristics are useful to create or choose a database relevant to the targeted application context. We propose to classify the databases according to these characteristics so that researchers can choose the database suited to their application context. We bring to light the trends among posed, spontaneous and in-the-wild databases. We finish with future directions, including crowdsourcing and databases with groups of people.

Paper Nr: 72
Title:

Fine-Grained Retrieval with Autoencoders

Authors:

Tiziano Portenier, Qiyang Hu, Paolo Favaro and Matthias Zwicker

Abstract: In this paper we develop a representation for fine-grained retrieval. Given a query, we want to retrieve data items of the same class, and, in addition, rank these items according to intra-class similarity. In our training data we assume partial knowledge: class labels are available, but the intra-class attributes are not. To compensate for this knowledge gap we propose using an autoencoder, which can be trained to produce features both with and without labels. Our main hypothesis is that network architectures that incorporate an autoencoder can learn features that meaningfully cluster data based on the intra-class variability. We propose and compare different architectures to construct our features, including a Siamese autoencoder (SAE), a classifying autoencoder (CAE) and a separate classifier-autoencoder (SCA). We find that these architectures indeed improve fine-grained retrieval compared to features trained purely in a supervised fashion for classification. We perform experiments on four datasets, and observe that the SCA generally outperforms the other two. In particular, we obtain state of the art performance on fine-grained sketch retrieval.

Paper Nr: 109
Title:

Empirical Evaluation of Variational Autoencoders for Data Augmentation

Authors:

Javier Jorge, Jesús Vieco, Roberto Paredes, Joan Andreu Sanchez and José Miguel Benedí

Abstract: Since the beginning of Neural Networks, different mechanisms have been required to provide a sufficient number of examples to avoid overfitting. Data augmentation, the most common one, focuses on the generation of new instances by applying different distortions to the real samples. Usually, these transformations are problem-dependent, and they result in a synthetic set of, likely, unseen examples. In this work, we have studied a generative model, based on the encoder-decoder paradigm, that works directly in the data space, that is, with images. This model encodes the input in a latent space where different transformations can be applied. Afterwards, we can decode the latent vectors to obtain new samples. We have analysed various procedures according to the distortions that we could carry out, as well as the effectiveness of this process in improving the accuracy of different classification systems. To do this, we could use both the latent space and the original space after decoding the altered versions of these vectors. Our results have shown that using this pipeline (encoding-altering-decoding) helps the generalisation of the selected classifiers.

Paper Nr: 119
Title:

Cross-context Analysis for Long-term View-point Invariant Person Re-identification via Soft-biometrics using Depth Sensor

Authors:

Athira Nambiar, Alexandre Bernardino and Jacinto. C. Nascimento

Abstract: We propose a novel methodology for cross-context analysis in person re-identification using 3D features acquired from consumer-grade depth sensors. Such features, although theoretically invariant to perspective changes, are nevertheless immersed in noise that depends on the view point, mainly due to the low depth resolution of these sensors and imperfections in skeleton reconstruction algorithms. Thus, the re-identification of persons observed in different poses requires the analysis of features whose characteristics transfer well between view-points. Taking view-point as context, we propose a cross-context methodology to improve the re-identification of persons across different view-points. In contrast to 2D cross-view re-identification methods, our approach is based on 3D features that do not require an explicit mapping between view-points, but it nevertheless takes advantage of feature selection methods that improve the re-identification accuracy.

Paper Nr: 127
Title:

Pedestrian Attribute Recognition with Part-based CNN and Combined Feature Representations

Authors:

Yiqiang Chen, Stefan Duffner, Andrei Stoian, Jean-Yves Dufour and Atilla Baskurt

Abstract: In video surveillance, pedestrian attributes such as gender, clothing or hair type are useful cues to identify people. The main challenge in pedestrian attribute recognition is the large variation in visual appearance and location of attributes due to different poses and camera views. In this paper, we propose a neural network combining high-level learnt Convolutional Neural Network (CNN) features and low-level handcrafted features to address the problem of highly varying viewpoints. We first extract robust low-level Local Maximal Occurrence (LOMO) features and learn a body-part-specific CNN to model attribute patterns related to different body parts. For small datasets with few training examples, we propose a new learning strategy, where the CNN is pre-trained in a triplet structure on a person re-identification task and then fine-tuned on attribute recognition. Finally, we fuse the two feature representations to recognise pedestrian attributes. Our approach achieves state-of-the-art results on three public pedestrian attribute datasets.

Paper Nr: 153
Title:

Combined Framework for Real-time Head Pose Estimation using Facial Landmark Detection and Salient Feature Tracking

Authors:

Jilliam María Díaz Barros, Frederic Garcia, Bruno Mirbach, Kiran Varanasi and Didier Stricker

Abstract: This paper presents a novel approach to the head pose estimation (HPE) problem in real-world and demanding applications. We propose a new framework that combines the detection of facial landmarks with the tracking of salient features within the head region. That is, rigid facial landmarks are detected in a given face image, while at the same time, salient features are detected within the head region. The 3D coordinates of both sets of features result from their intersection with a simple geometric head model (e.g., cylinder or ellipsoid). We then formulate the HPE problem as a perspective-n-point problem that we solve separately by minimizing the reprojection error of each 3D feature set and its corresponding facial or salient features in the next face image. The resulting head pose estimates are then combined using a Kalman filter, which allows us to take advantage of the high accuracy of facial landmarks while handling extreme head poses by using salient features. Results are comparable to those from the related literature, with the advantage of being robust in real-world situations that might not be covered in the evaluated datasets.

Short Papers
Paper Nr: 20
Title:

Improving Bag-of-Visual-Words Towards Effective Facial Expressive Image Classification

Authors:

Dawood Al Chanti and Alice Caplier

Abstract: The Bag-of-Visual-Words (BoVW) approach has been widely used in recent years for image classification purposes. However, it has crucial limitations regarding optimal feature selection, the clustering technique, the lack of spatial organization of the data and the weighting of visual words. These factors affect the stability of the model and reduce performance. We propose to develop an algorithm based on BoVW for facial expression analysis which goes beyond those limitations. Thus, the visual codebook is built using the k-Means++ method to avoid poor clustering. To exploit reliable low-level features, we search for the best feature detector that avoids locating a large number of keypoints which do not contribute to the classification process. Then, we propose to compute the relative conjunction matrix in order to preserve the spatial order of the data by coding the relationships among visual words. In addition, a weighting scheme that reflects how important a visual word is with respect to a given image is introduced. We speed up the learning process by using the histogram intersection kernel with a Support Vector Machine to learn a discriminative classifier. The efficiency of the proposed algorithm is compared with the standard bag-of-visual-words method and with the bag-of-visual-words method with spatial pyramid. Extensive experiments on the CK+, MMI and JAFFE databases show good average recognition rates. Likewise, the ability to recognize spontaneous and non-basic expressive states is investigated using the DynEmo database.
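The basic pipeline this abstract builds on (a codebook from k-means++ seeding, per-image visual-word histograms, and an SVM with a histogram intersection kernel) can be sketched with scikit-learn on synthetic descriptors. The relative conjunction matrix and the weighting scheme of the paper are omitted; all data below is random stand-in data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-ins for local descriptors extracted from keypoints of 20 images.
train_desc = [rng.normal(size=(50, 16)) for _ in range(20)]
labels = np.array([i % 2 for i in range(20)])

# 1) Codebook built with k-means++ seeding to avoid poor clustering.
codebook = KMeans(n_clusters=8, init="k-means++", n_init=5, random_state=0)
codebook.fit(np.vstack(train_desc))

# 2) Each image becomes a normalized histogram of visual-word assignments.
def bovw_hist(desc):
    words = codebook.predict(desc)
    h = np.bincount(words, minlength=8).astype(float)
    return h / h.sum()

X = np.array([bovw_hist(d) for d in train_desc])

# 3) SVM with a histogram intersection kernel, passed as a precomputed Gram matrix.
def intersection_kernel(A, B):
    return np.array([[np.minimum(a, b).sum() for b in B] for a in A])

clf = SVC(kernel="precomputed").fit(intersection_kernel(X, X), labels)
pred = clf.predict(intersection_kernel(X, X))
print(pred.shape)
```

The intersection kernel is a valid positive semidefinite kernel on nonnegative histograms, which is why it can be supplied to `SVC` as a precomputed Gram matrix.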

Paper Nr: 31
Title:

Ship Detection in Harbour Surveillance based on Large-Scale Data and CNNs

Authors:

Matthijs H. Zwemer, Rob G. J. Wijnhoven and Peter H. N. de With

Abstract: This paper aims at developing a real-time vessel detection and tracking system using surveillance cameras in harbours, with the purpose of improving the performance of current Vessel Tracking Systems (VTS). To this end, we introduce a novel maritime dataset, containing 70,513 ships in 48,966 images, covering 10 camera viewpoints of real-life ship traffic situations. For detection, a Convolutional Neural Network (CNN) detector is trained, based on the Single Shot Detector (SSD) from the literature. This detector is modified and enhanced to support the detection of extreme variations in ship sizes and aspect ratios. The modified SSD detector offers high detection performance by explicitly exploiting the aspect-ratio characteristics of the dataset. The performance of the original SSD detector trained on generic object detection datasets (including ships) is significantly lower, showing the added value of a novel surveillance dataset for ships. Due to the robust detection performance of over 90%, the system is able to accurately detect all types of vessels. Hence, the system is considered a suitable complement to conventional radar detection, leading to a better operational picture for the harbour authorities.

Paper Nr: 33
Title:

The Discriminative Generalized Hough Transform as a Proposal Generator for a Deep Network in Automatic Pedestrian Localization

Authors:

Eric Gabriel, Hauke Schramm and Carsten Meyer

Abstract: Pedestrian detection is one of the most essential and still challenging tasks in computer vision. Among traditional feature- or model-based techniques (e.g., histograms of oriented gradients, deformable part models etc.), deep convolutional networks have recently been applied and significantly advanced the state-of-the-art. While earlier versions (e.g., Fast-RCNN) rely on an explicit proposal generation step, this has been integrated into the deep network pipeline in recent approaches. It is, however, not fully clear if this yields the most efficient way to handle large ranges of object variability (e.g., object size), especially if the amount of training data covering the variability range is limited. We propose an efficient pedestrian detection framework consisting of a proposal generation step based on the Discriminative Generalized Hough Transform and a rejection step based on a deep convolutional network. With a few hundred proposals per (2D) image, our framework achieves state-of-the-art performance compared to traditional approaches on several investigated databases. In this work, we analyze in detail the impact of different components of our framework.

Paper Nr: 36
Title:

Evaluating Method Design Options for Action Classification based on Bags of Visual Words

Authors:

Victoria Manousaki, Konstantinos Papoutsakis and Antonis Argyros

Abstract: The Bags of Visual Words (BoVWs) framework has been applied successfully to several computer vision tasks. In this work we are particularly interested in its application to the problem of action recognition/classification. The key design decisions for a method that follows the BoVWs framework are (a) the visual features to be employed, (b) the size of the codebook used for representing a certain action and (c) the classifier applied to the resulting representation to solve the classification task. We perform several experiments to investigate a variety of options regarding all the aforementioned design parameters. We also propose a new feature type and suggest a method that automatically determines the size of the codebook. The experimental results show that our proposals are competitive with the outcomes of state-of-the-art methods.

Paper Nr: 38
Title:

Leveraging the Spatial Label Structure for Semantic Image Labeling using Random Forests

Authors:

Manuel Wöllhaf, Ronny Hänsch and Olaf Hellwich

Abstract: Data used to train models for semantic segmentation have the same spatial structure as the image data, are mostly densely labeled, and thus contain contextual information such as class geometry and co-occurrence. We aim to exploit this information for structured prediction. Multiple structured label spaces, representing different aspects of context information, are introduced and integrated into the Random Forest framework. The main advantage is the structural subclasses, which carry information about the context of a data point. The output of the applied classification forest is a decomposable posterior probability distribution, which allows substituting the prior with information carried by these subclasses. The experimental evaluation shows results superior to standard Random Forests as well as to a related method of structured prediction.

Paper Nr: 44
Title:

Micro Expression Detection and Recognition from High Speed Cameras using Convolutional Neural Networks

Authors:

Diana Borza, Razvan Itu and Radu Danescu

Abstract: In this paper, we propose a micro-expression detection and recognition framework based on convolutional neural networks. This paper presents the following contributions: the relevant features are learned by a convolutional neural network that uses as input difference images of three equally spaced frames from the video sequence, capturing important motion information. Next, a sliding time window is used to iterate through the video sequence and the output of the network in order to eliminate false positives. The method was trained using images from two publicly available micro-expression databases. The effectiveness of the proposed solution is demonstrated by the experiments we performed, from which a recognition rate of 72.22% was obtained.

Paper Nr: 53
Title:

Building Robust Industrial Applicable Object Detection Models using Transfer Learning and Single Pass Deep Learning Architectures

Authors:

Steven Puttemans, Timothy Callemein and Toon Goedemé

Abstract: The rising trend of deep learning in computer vision and artificial intelligence simply cannot be ignored. On the most diverse tasks, from recognition and detection to segmentation, deep learning is able to obtain state-of-the-art results, reaching top-notch performance. In this paper we explore how deep convolutional neural networks dedicated to the task of object detection can improve our industrial-oriented object detection pipelines, using state-of-the-art open source deep learning frameworks like Darknet. By using a deep learning architecture that integrates region proposals, classification and probability estimation in a single run, we aim at obtaining real-time performance. We focus on drastically reducing the required amount of training data by exploring transfer learning, while still maintaining a high average precision. Furthermore, we apply these algorithms to two industrially relevant applications: one being the detection of promotion boards in eye tracking data, the other the detection and recognition of packages of warehouse products for augmented advertisements.

Paper Nr: 55
Title:

Context Dependent Action Affordances and their Execution using an Ontology of Actions and 3D Geometric Reasoning

Authors:

Simon Reich, Mohamad Aein and Florentin Wörgötter

Abstract: When looking at an object, humans can quickly and efficiently assess which actions are possible given the scene context. This task remains hard for machines. Here we focus on manipulation actions and, in the first part of this study, define an object-action linked ontology for such context dependent affordance analysis. We break down every action into three hierarchical pre-condition layers, starting on top with abstract object relations (which need to be fulfilled) and arriving in three steps at the movement primitives required to execute the action. In the second part of this work, this ontology is linked to actual scenes. First the system looks at the scene and, for any selected object, suggests some actions. One is chosen, and a simple geometrical reasoning scheme fills this action's movement primitives with the specific parameter values, which are then executed by the robot. The viability of this approach is demonstrated by analysing several scenes and a large number of manipulations.

Paper Nr: 64
Title:

Comparing Boosted Cascades to Deep Learning Architectures for Fast and Robust Coconut Tree Detection in Aerial Images

Authors:

Steven Puttemans, Kristof Van Beeck and Toon Goedemé

Abstract: Object detection using a boosted cascade of weak classifiers is a principle that has been used in a variety of applications, ranging from pedestrian detection to fruit counting in orchards, with a high average precision. In this work we prove that both the boosted cascade approach suggested by Viola & Jones and the adapted approach based on integral or aggregate channels by Dollár yield promising results on coconut tree detection in aerial images. However, with the rise of robust deep learning architectures for both detection and classification, and the significant drop in hardware costs, we wonder if it is feasible to apply deep learning to solve the task of fast and robust coconut tree detection and classification in aerial imagery. We examine both classification- and detection-based architectures for this task. By doing so we prove that deep learning is indeed a feasible alternative for robust coconut tree detection with a high average precision in aerial imagery, while paying attention to known issues of the selected architectures.

Paper Nr: 69
Title:

Automatic Query Image Disambiguation for Content-based Image Retrieval

Authors:

Björn Barz and Joachim Denzler

Abstract: Query images presented to content-based image retrieval systems often have various different interpretations, making it difficult to identify the search objective pursued by the user. We propose a technique for overcoming this ambiguity, while keeping the amount of required user interaction at a minimum. To achieve this, the neighborhood of the query image is divided into coherent clusters from which the user may choose the relevant ones. A novel feedback integration technique is then employed to re-rank the entire database with regard to both the user feedback and the original query. We evaluate our approach on the publicly available MIRFLICKR-25K dataset, where it leads to a relative improvement of average precision by 23% over the baseline retrieval, which does not distinguish between different image senses.
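The overall idea (cluster the query's neighborhood into coherent groups, let the user pick the relevant ones, then re-rank the whole database using the feedback) can be illustrated on synthetic features. The averaging-based feedback integration below is a simple placeholder, not the paper's novel technique, and the "user choice" is simulated by selecting the top hit's cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
db = rng.normal(size=(200, 32))               # database feature vectors
query = db[0] + 0.01 * rng.normal(size=32)    # query close to item 0

# 1) Retrieve the query's neighborhood and split it into coherent clusters.
dists = np.linalg.norm(db - query, axis=1)
neighbors = np.argsort(dists)[:50]
clusters = KMeans(n_clusters=4, n_init=5, random_state=0).fit_predict(db[neighbors])

# 2) Simulated user feedback: mark the cluster containing the top hit as relevant.
relevant = neighbors[clusters == clusters[0]]

# 3) Re-rank the entire database against both the original query and the
#    feedback; here a simple average serves as the refined query.
refined = 0.5 * query + 0.5 * db[relevant].mean(axis=0)
ranking = np.argsort(np.linalg.norm(db - refined, axis=1))
print(ranking[:5])
```

The resulting `ranking` is a permutation of all database indices, biased toward the image sense the user selected.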

Paper Nr: 76
Title:

Superpixel-based Road Segmentation for Real-time Systems using CNN

Authors:

Farnoush Zohourian, Borislav Antic, Jan Siegemund, Mirko Meuter and Josef Pauli

Abstract: Convolutional Neural Networks (CNN) have contributed considerable improvements to image segmentation tasks in the field of computer vision. Despite their success, an inherent challenge is the trade-off between accuracy and computational cost. The high computational efforts of large networks operating on the image’s pixel grid make them ineligible for many real-time applications such as various Advanced Driver Assistance Systems (ADAS). In this work, we propose a novel CNN approach, based on the combination of superpixels and high dimensional feature channels, applied to road segmentation. The core idea is to reduce the computational complexity by segmenting the image into homogeneous regions (superpixels) and feeding image descriptors extracted from these regions into a CNN rather than working on the pixel grid directly. To enable the necessary convolutional operations on the irregularly arranged superpixels, we introduce a lattice projection scheme as part of the superpixel creation method, which encodes neighbourhood relations and forces the topology to stay fixed during the segmentation process. Reducing the input to the superpixel domain allows the CNN’s structure to stay small and efficient to compute while keeping the advantage of convolutional layers. The method is generic and can easily be generalized to segmentation tasks other than road segmentation.

Paper Nr: 81
Title:

Co-occurrence Background Model with Hypothesis on Degradation Modification for Robust Object Detection

Authors:

Wenjun Zhou, Shun’ichi Kaneko, Manabu Hashimoto, Yutaka Satoh and Dong Liang

Abstract: This paper presents a prospective background model for robust object detection in severe scenes. This background model uses a novel algorithm, Co-occurrence Pixel-block Pairs (CPB), which extracts the spatiotemporal information of pixels from the background and identifies the state of pixels in the current frame. First, CPB builds a robust background model for each pixel with spatiotemporal information based on a “pixel to block” structure. Then, CPB employs an efficient evaluation strategy, named the correlation dependent decision function, to detect the foreground sensitively. On this basis, a Hypothesis on Degradation Modification (HoD) for CPB is introduced to adapt to dynamic changes in scenes and to reinforce the robustness of CPB against “noise” in real conditions. The proposed model robustly extracts the foreground against changes such as illumination changes and background motion. Experimental results on different challenging datasets prove that our model is effective for object detection.

Paper Nr: 88
Title:

Open Set Logo Detection and Retrieval

Authors:

Andras Tüzkö, Christian Herrmann, Daniel Manger and Jürgen Beyerer

Abstract: Current logo retrieval research focuses on closed set scenarios. We argue that the logo domain is too large for this strategy and requires an open set approach. To foster research in this direction, a large-scale logo dataset, called Logos in the Wild, is collected and released to the public. A typical open set logo retrieval application is, for example, assessing the effectiveness of advertisement in sports event broadcasts. Given a query sample in the shape of a logo image, the task is to find all further occurrences of this logo in a set of images or videos. Currently, common logo retrieval approaches are unsuitable for this task because of their closed world assumption. Thus, an open set logo retrieval method is proposed in this work which allows searching for previously unseen logos using a single query sample. A two-stage concept with separate logo detection and comparison is proposed, where both modules are based on task-specific CNNs. If trained with the Logos in the Wild data, significant performance improvements are observed, especially compared with state-of-the-art closed set approaches.

Paper Nr: 94
Title:

Statistical Measures from Co-occurrence of Codewords for Action Recognition

Authors:

Carlos Caetano, Jefersson A. dos Santos and William Robson Schwartz

Abstract: In this paper, we propose a novel spatiotemporal feature representation based on co-occurrence matrices of codewords, called Co-occurrence of Codewords (CCW), to tackle human action recognition, a significant problem for many real-world applications, such as surveillance, video retrieval and health care. The method captures local relationships among the (densely sampled) codewords through the computation of a set of statistical measures known as Haralick textural features. We apply a classical visual recognition pipeline which involves the extraction of spatiotemporal features and SVM classification. We investigate the proposed representation on three well-known and publicly available datasets for action recognition (KTH, UCF Sports and HMDB51) and show that it outperforms the results achieved by several widely employed spatiotemporal features from the literature encoded by a Bag-of-Words model, with a more compact representation.

Paper Nr: 106
Title:

Combined Correlation Rules to Detect Skin based on Dynamic Color Clustering

Authors:

Rodrigo Augusto Dias Faria and Roberto Hirata Jr.

Abstract: Skin detection plays an important role in a wide range of image processing and computer vision applications. In short, there are three major approaches for skin detection: rule-based, machine learning and hybrid. They differ in terms of accuracy and computational efficiency. Generally, machine learning and hybrid approaches outperform rule-based methods, but require a large and representative training dataset as well as costly classification time, which can be a deal breaker for real-time applications. In this paper, we propose an improvement of a novel rule-based skin detection method that works in the YCbCr color space. Our motivation is based on two hypotheses: (1) the original rule can be reversed, and (2) human skin pixels do not appear isolated, i.e. neighborhood operations are taken into consideration. The method is a combination of correlation rules based on these hypotheses. Such rules evaluate combinations of chrominance Cb, Cr values to identify skin pixels depending on the shape and size of dynamically generated skin color clusters. The method is very efficient in terms of computational effort and is robust in very complex image scenes.
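As a rough illustration of rule-based skin detection in the YCbCr color space, the sketch below applies a commonly cited fixed Cb/Cr box rule. The paper's method instead uses correlation rules over dynamically generated skin color clusters and neighborhood operations, which are not specified in the abstract, so the thresholds here are illustrative only.

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Convert an RGB uint8 image to YCbCr (ITU-R BT.601, full-range offsets)."""
    img = img.astype(np.float64)
    y  = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    cb = 128 - 0.168736 * img[..., 0] - 0.331264 * img[..., 1] + 0.5 * img[..., 2]
    cr = 128 + 0.5 * img[..., 0] - 0.418688 * img[..., 1] - 0.081312 * img[..., 2]
    return y, cb, cr

def skin_mask(img, cb_range=(77, 127), cr_range=(133, 173)):
    """Illustrative fixed Cb/Cr box rule (NOT the paper's dynamic clusters)."""
    _, cb, cr = rgb_to_ycbcr(img)
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))

# A skin-like pixel next to a saturated blue pixel.
img = np.array([[[200, 140, 120], [0, 0, 255]]], dtype=np.uint8)
print(skin_mask(img))  # [[ True False]]
```

Rules of this kind are cheap to evaluate per pixel, which is the efficiency argument the abstract makes for rule-based methods in real-time settings.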

Paper Nr: 137
Title:

Real-time Human Pose Estimation with Convolutional Neural Networks

Authors:

Marko Linna, Juho Kannala and Esa Rahtu

Abstract: In this paper, we present a method for real-time multi-person human pose estimation from video by utilizing convolutional neural networks. Our method is aimed at use-case-specific applications, where good accuracy is essential and variation of the background and poses is limited. This enables us to use a generic network architecture, which is both accurate and fast. We divide the problem into two phases: (1) pre-training and (2) finetuning. In pre-training, the network is trained with highly diverse input data from publicly available datasets, while in finetuning we train with application-specific data, which we record with Kinect. Our method differs from most state-of-the-art methods in that we consider the whole system, including the person detector, the pose estimator and an automatic way to record application-specific training material for finetuning. Our method is considerably faster than many state-of-the-art methods and can be thought of as a replacement for Kinect in restricted environments. It can be used for tasks such as gesture control, games, person tracking, action recognition and action tracking. We achieved an accuracy of 96.8% (PCK@0.2) with application-specific data.
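The PCK@0.2 figure quoted above counts a predicted keypoint as correct when it lies within 0.2 of a person-size reference distance from the ground truth. A minimal sketch; which reference the authors normalize by (e.g. torso length) is an assumption, since the abstract does not say:

```python
import math

def pck(pred, gt, scale, alpha=0.2):
    """Percentage of Correct Keypoints: a predicted joint counts as correct
    when it lies within alpha * scale of the ground-truth joint."""
    hits = sum(1 for (px, py), (gx, gy) in zip(pred, gt)
               if math.hypot(px - gx, py - gy) <= alpha * scale)
    return hits / len(gt)

# Three joints, reference scale 50 px -> threshold 10 px at alpha=0.2:
gt = [(0, 0), (10, 0), (0, 10)]
pred = [(1, 0), (10, 0), (0, 30)]
```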

Paper Nr: 142
Title:

CovP3DJ: Skeleton-parts-based-covariance Descriptor for Human Action Recognition

Authors:

Hany A. El-Ghaish, Amin Shoukry and Mohamed E. Hussein

Abstract: A highly discriminative and computationally efficient descriptor is needed in many computer vision applications involving human action recognition. This paper proposes a hand-crafted skeleton-based descriptor for human action recognition. It is constructed from five fixed-size covariance matrices calculated from strongly related joint coordinates over five body parts (spine, left/right arms, and left/right legs). Since covariance matrices are symmetric, the lower/upper triangular parts of these matrices are concatenated to generate an efficient descriptor. It achieves savings of 78.26% to 80.35% in storage space and of 75% to 90% in processing time (depending on the dataset) relative to techniques adopting a covariance descriptor based on all the skeleton joints. To show the effectiveness of the proposed method, its performance is evaluated on five public datasets: MSR-Action3D, MSRC-12 Kinect Gesture, UTKinect-Action, Florence3D-Action, and NTU RGB+D. The obtained recognition rates on all datasets outperform many existing methods and compete with the current state-of-the-art techniques.
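The per-part construction described above — a covariance matrix over a part's joint coordinates, vectorized as one triangular half — can be sketched as follows, with `frames` holding one body part's concatenated joint coordinates over time (the exact joint grouping is the paper's and is not reproduced here):

```python
def cov_upper_triangle(frames):
    """Covariance descriptor for one body part: the covariance (over time)
    of the part's concatenated joint coordinates, vectorized as the upper
    triangular half including the diagonal. A d-dimensional part yields
    d * (d + 1) / 2 values instead of d * d, which is where the storage
    saving over a full covariance comes from."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / (n - 1)
            for j in range(d)] for i in range(d)]
    return [cov[i][j] for i in range(d) for j in range(i, d)]
```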

Paper Nr: 157
Title:

Towards Large-scale Image Retrieval with a Disk-only Index

Authors:

Daniel Manger, Dieter Willersinn and Jürgen Beyerer

Abstract: Facing ever-growing image databases, the focus of research in content-based image retrieval, where a query image is used to search for those images in a large database that show the same object or scene, has shifted in the last decade. Instead of using local features such as SIFT together with quantization and inverted file indexing schemes, models working with global features and exhaustive search have been proposed to cope with limited main memory and increasing query times. This, however, impairs the capability to find small objects in images with cluttered background. In this paper, we argue that it is worth reconsidering image retrieval with local features, because two crucial ingredients have since become available: large solid-state disks providing dramatically shorter access times, and more discriminative models enhancing the local features, for example by encoding their spatial neighborhood using features from convolutional neural networks, resulting in far fewer random read memory accesses. We show that properly combining both insights makes it possible to keep the index of the database images on disk rather than in main memory, which allows even larger databases on today's hardware. As a proof of concept, we support our arguments with experiments on established public datasets for large-scale image retrieval.

Paper Nr: 164
Title:

An Approach for Skeleton Fitting in Long-Wavelength Infrared Images - First Results for a Robust Head Localisation using Probability Masks

Authors:

Julia Richter, Christian Wiede and Gangolf Hirtz

Abstract: Human skeleton extraction has become a key instrument for motion analysis in the fields of surveillance, entertainment and medical diagnostics. While a vast amount of research has been carried out on skeleton extraction using RGB and depth images, far too little attention has been paid to extraction methods using long-wavelength infrared images. This paper provides an overview of existing approaches and explores their limitations. So far, extant studies have exploited thermal data only for silhouette generation as a preprocessing step. Moreover, they make strong assumptions, such as T-pose initialization. On this basis, we are developing an algorithm that fits the joints of a skeleton model to thermal images without such restrictions. We propose to find the head location as an initial step by using probability masks. These masks are designed to allow robust head localisation in unrestricted settings. For the future algorithm design, we plan to localise the remaining skeleton joints by means of geometrical constraints. At this point, we will also consider sequences in which persons wear thick clothes, which aggravates the extraction procedure. This paper presents the current state of this project and outlines further approaches that have to be investigated to extract the complete skeleton.

Paper Nr: 167
Title:

Anticipating Suspicious Actions using a Small Dataset of Action Templates

Authors:

Renato Baptista, Michel Antunes, Djamila Aouada and Björn Ottersten

Abstract: In this paper, we propose to detect an action as soon as possible and ideally before it is fully completed. The objective is to support the monitoring of surveillance videos for preventing criminal or terrorist attacks. In such a scenario, it is important to have not only high detection and recognition rates but also low time latency for the detection. Our solution is an online adaptive sliding-window approach, which efficiently rejects irrelevant data. Furthermore, we exploit both spatial and temporal information by constructing feature vectors based on temporal blocks. For added efficiency, only partial template actions are considered for the detection. The relationship between template size and latency is experimentally evaluated. We show promising preliminary experimental results using Motion Capture data with a skeleton representation of the human body.

Paper Nr: 186
Title:

Recovering 3D Human Poses and Camera Motions from Deep Sequence

Authors:

Takashi Shimizu, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a novel method for recovering 3D human poses and camera motions from sequential images by using a CNN and an LSTM. Human pose estimation with deep learning has been studied extensively in recent years. However, most existing methods aim to classify 2D human motions in images. Although some methods have recently been proposed for recovering 3D human poses, they only considered single-frame poses, and the sequential properties of human actions were not used efficiently. Furthermore, the existing methods recover only 3D poses relative to the viewpoints. In this paper, we propose a method for recovering 3D human poses and 3D camera motions simultaneously from sequential input images. In our network, a CNN is combined with an LSTM, so that the proposed network can learn the sequential properties of 3D human poses and camera motions efficiently. The effectiveness of the proposed method is evaluated using real images as well as synthetic images.

Paper Nr: 193
Title:

ScaleNet: Scale Invariant Network for Semantic Segmentation in Urban Driving Scenes

Authors:

Mohammad Dawud Ansari, Stephan Krauß, Oliver Wasenmüller and Didier Stricker

Abstract: The scale difference in driving scenarios is one of the essential challenges in semantic scene segmentation. Close objects cover significantly more pixels than far objects. In this paper, we address this challenge with a scale invariant architecture. Within this architecture, we explicitly estimate the depth and adapt the pooling field size accordingly. Our model is compact and can be extended easily to other research domains. Finally, the accuracy of our approach is comparable to the state-of-the-art and superior for scale problems. We evaluate on the widely used automotive dataset Cityscapes as well as a self-recorded dataset.

Paper Nr: 13
Title:

CRN: End-to-end Convolutional Recurrent Network Structure Applied to Vehicle Classification

Authors:

Mohamed Ilyes Lakhal, Sergio Escalera and Hakan Cevikalp

Abstract: Vehicle type classification is considered to be a central part of Intelligent Traffic Systems. In recent years, deep learning methods have emerged as the state-of-the-art in many computer vision tasks. In this paper, we present a novel yet simple deep learning framework for the vehicle type classification problem. We propose an end-to-end trainable system that combines a convolutional neural network for feature extraction with a recurrent neural network as a classifier. The recurrent network structure is used to handle various types of feature inputs, and at the same time allows producing either a single class prediction or a set of class predictions. In order to assess the effectiveness of our solution, we have conducted a set of experiments on two public datasets, obtaining state-of-the-art results. In addition, we also report results on the newly released MIO-TCD dataset.

Paper Nr: 32
Title:

3D Point Cloud Descriptor for Posture Recognition

Authors:

Margarita Khokhlova, Cyrille Migniot and Albert Dipanda

Abstract: This paper introduces a simple yet powerful algorithm for global human posture description based on 3D point cloud data. The proposed algorithm preserves spatial contextual information about a 3D object in a video sequence and can be used as an intermediate step in human-motion-related computer vision applications such as action recognition, gait analysis and human-computer interaction. The proposed descriptor captures the point cloud structure by means of a modified 3D regular grid and the corresponding cells' space occupancy information. The performance of our method was evaluated on the tasks of posture recognition and automatic action segmentation.
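The grid-occupancy idea can be illustrated with a minimal sketch that quantizes a point cloud into a regular grid and reports per-cell occupancy ratios; the paper's modified grid and its normalization are not detailed in the abstract, so this is a generic stand-in:

```python
def occupancy_descriptor(points, grid=(4, 4, 4)):
    """Quantize a 3D point cloud into a regular grid over its bounding box
    and return the fraction of points falling into each cell."""
    xs, ys, zs = zip(*points)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    counts = [0] * (grid[0] * grid[1] * grid[2])
    for p in points:
        idx = []
        for a in range(3):
            span = maxs[a] - mins[a] or 1.0  # guard degenerate axes
            i = int((p[a] - mins[a]) / span * grid[a])
            idx.append(min(i, grid[a] - 1))  # clamp the max-valued point
        counts[(idx[0] * grid[1] + idx[1]) * grid[2] + idx[2]] += 1
    n = len(points)
    return [c / n for c in counts]
```

The resulting fixed-length vector preserves where in space the body mass lies, which is the contextual information a global posture descriptor needs.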

Paper Nr: 34
Title:

Simultaneous Object Classification and Viewpoint Estimation using Deep Multi-task Convolutional Neural Network

Authors:

Ahmed J. Afifi, Olaf Hellwich and Toufique A. Soomro

Abstract: Convolutional Neural Networks (CNNs) have shown impressive performance in many computer vision tasks. Most CNN architectures were proposed to solve a single task. This paper proposes a CNN model to tackle the problems of object classification and viewpoint estimation simultaneously, although these problems are opposite in terms of feature representation: while the object classification task aims to learn viewpoint-invariant features, the viewpoint estimation task requires features that capture the variations of the viewpoint for the same object. This study addresses the issue by introducing a multi-task CNN architecture that performs object classification and viewpoint estimation simultaneously. The first part of the CNN is shared between the two tasks, and the second part consists of two subnetworks that solve each task separately. Synthetic images are used to enlarge the training dataset. The model is evaluated on the PASCAL3D+ dataset, a challenging benchmark for object detection and viewpoint estimation. According to the results, the proposed model performs well as a multi-task model, exploiting the shared layers to feed their features to the different tasks. Moreover, 3D models can be used to render images under different conditions to compensate for the lack of training data and to enhance the training of CNNs.

Paper Nr: 67
Title:

GPU Accelerated ACF Detector

Authors:

Wiebe Van Ranst, Floris De Smedt and Toon Goedemé

Abstract: The field of pedestrian detection has come a long way in recent decades. In terms of accuracy, the current state-of-the-art is held hands down by deep learning methods. In terms of running speed, however, this is not always the case: traditional methods are often still faster than their deep learning counterparts. This is especially true on embedded hardware, as embedded platforms are often used in applications that require real-time performance while at the same time having to make do with a limited amount of resources. In this paper we present a GPU implementation of the ACF pedestrian detector and compare it to a current deep learning approach (YOLO) on both a desktop GPU and the Jetson TX2 embedded GPU platform.

Paper Nr: 84
Title:

One-class Selective Transfer Machine for Personalized Anomalous Facial Expression Detection

Authors:

Hirofumi Fujita, Tetsu Matsukawa and Einoshin Suzuki

Abstract: An anomalous facial expression is a facial expression which scarcely occurs in daily life and conveys cues about an anomalous physical or mental condition. In this paper, we propose a one-class transfer learning method for detecting anomalous facial expressions. In facial expression detection, most articles propose generic models which predict the classes of the samples for all persons. However, people vary in facial morphology, e.g., thick versus thin eyebrows, and such individual differences often cause prediction errors. While a possible solution would be to learn a single-task classifier from samples of the target person only, it will often overfit due to the small sample size of the target person in real applications. To handle individual differences in anomaly detection, we extend Selective Transfer Machine (STM) (Chu et al., 2013), which learns a personalized multi-class classifier by re-weighting samples based on their proximity to the target samples. In contrast to related methods for personalized models of facial expressions, including STM, our method learns a one-class classifier which requires only one-class target and source samples, i.e., normal samples, and thus there is no need to collect anomalous samples, which scarcely occur. Experiments on a public dataset show that our method outperforms generic and single-task models using one-class SVM, as well as a state-of-the-art multi-task learning method.

Paper Nr: 92
Title:

Novel Anomalous Event Detection based on Human-object Interactions

Authors:

Rensso Mora Colque, Carlos Caetano, Victor C. de Melo, Guillermo Camara Chavez and William Robson Schwartz

Abstract: This study proposes a novel approach to anomalous event detection that collects information from a specific context and is flexible enough to work in different scenes (i.e., the camera does not need to be at the same location or in the same scene for the learning and test stages of anomalous event detection), making our approach able to learn normal patterns (i.e., patterns that do not entail an anomaly) from one scene and be employed in another, as long as it is within the same context. For instance, our approach can learn the normal behavior for a context such as the office environment by watching a particular office, and then monitor the behavior in another office, without being constrained by aspects such as camera location, optical flow or trajectories, as required by current works. Our paradigm-shifting approach to anomalous event detection exploits human-object interactions to learn normal behavior patterns from a specific context. Such patterns are afterwards used to detect anomalous events in a different scene. The proof of concept shown in the experimental results demonstrates the viability of two strategies that exploit this novel paradigm to perform anomaly detection.

Paper Nr: 112
Title:

Deep Learning for 3D Shape Classification based on Volumetric Density and Surface Approximation Clues

Authors:

Ludovico Minto, Pietro Zanuttigh and Giampaolo Pagnutti

Abstract: This paper proposes a novel approach for the classification of 3D shapes exploiting surface and volumetric clues inside a deep learning framework. The proposed algorithm uses three different data representations. The first is a set of depth maps obtained by rendering the 3D object. The second is a novel volumetric representation obtained by counting the number of filled voxels along each direction. Finally, NURBS surfaces are fitted over the 3D object and surface curvature parameters are selected as the third representation. All three data representations are fed to a multi-branch Convolutional Neural Network. Each branch processes a different data source and produces a feature vector by using convolutional layers of progressively reduced resolution. The extracted feature vectors are fed to a linear classifier that combines the outputs in order to get the final predictions. Experimental results on the ModelNet dataset show that the proposed approach is able to obtain state-of-the-art performance.

Paper Nr: 120
Title:

A Hybrid Pedestrian Detection System based on Visible Images and LIDAR Data

Authors:

Mohamed El Ansari, Redouan Lahmyed and Alain Tremeau

Abstract: This paper presents a hybrid pedestrian detection system based on 3D LIDAR data and visible images of the same scene. The proposed method consists of two main stages. In the first stage, the 3D LIDAR data are classified to obtain a set of clusters, which are then mapped into the visible image to obtain regions of interest (ROIs). The second stage classifies the ROIs (pedestrian/non-pedestrian) using an SVM as classifier and color-based histograms of oriented gradients (HOG) together with local self-similarity (LSS) as features. The proposed method has been tested on the LIPD dataset and the results demonstrate its effectiveness.

Paper Nr: 144
Title:

Image Quality-aware Deep Networks Ensemble for Efficient Gender Recognition in the Wild

Authors:

Mohamed Selim, Suraj Sundararajan, Alain Pagani and Didier Stricker

Abstract: Gender recognition is an important task in the field of facial image analysis. Gender can be detected using different visual cues, for example gait, physical appearance and, most importantly, the face. Deep learning has been dominating many classification tasks in the past few years. Gender classification is a binary classification problem, usually addressed using the facial image. In this work, we present a deep and compact CNN (GenderCNN) to estimate the gender from a facial image. We also tackle the illumination and blurriness issues that appear in still images and even more so in videos. We use Adaptive Gamma Correction (AGC) to enhance the contrast and thus obtain more details from the facial image, applying it as a pre-processing step for gender classification in still images. For videos, we propose a pipeline that quantifies the blurriness of an image using a blurriness metric (EMBM) and feeds it to the GenderCNN that was trained on faces with similar blurriness. We evaluated our proposed methods on challenging, large and publicly available datasets: the CelebA and IMDB-WIKI still-image datasets, and the McGill and Point and Shoot Challenge (PaSC) video datasets. Experiments show that we outperform, or in some cases match, the state-of-the-art methods.
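Gamma correction of the kind AGC builds on maps each normalized intensity through a power law. The adaptation rule below, which picks the exponent from the image mean, is a hypothetical stand-in: the abstract does not describe how AGC derives its parameters.

```python
def gamma_correct(value, gamma):
    """Power-law correction of an 8-bit intensity; gamma < 1 brightens
    dark regions, gamma > 1 darkens bright ones."""
    return round(255 * (value / 255) ** gamma)

def adaptive_gamma(pixels):
    """Hypothetical mean-based adaptation rule (an assumption, not the
    paper's AGC): darker images get a smaller, brightening exponent."""
    mean = sum(pixels) / len(pixels)
    return max(0.4, min(2.5, mean / 128))
```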

Paper Nr: 148
Title:

Evaluation of Transfer Learning Scenarios in Plankton Image Classification

Authors:

Francisco Caio Maia Rodrigues, Nina S. T. Hirata, Antonio A. Abello, Leandro T. De La Cruz, Rubens M. Lopes and R. Hirata Jr.

Abstract: Automated in situ plankton image classification is a challenging task. To take advantage of recent progress in machine learning techniques, a large amount of labeled data is necessary. However, beyond being time consuming, labeling is a task that may require frequent redoing due to variations in plankton populations as well as image characteristics. Transfer learning, a machine learning technique concerned with transferring knowledge obtained in one data domain to a second, distinct data domain, appears as a potential approach to be employed in this scenario. We use convolutional neural networks, trained on publicly available distinct datasets, to extract features from our plankton image data and then train SVM classifiers to perform the classification. Results show evidence of the effectiveness of transfer learning in real plankton image classification settings.

Paper Nr: 168
Title:

To Know and To Learn - About the Integration of Knowledge Representation and Deep Learning for Fine-Grained Visual Categorization

Authors:

Francesco Setti

Abstract: Fine-grained visual categorization has become a very popular topic in the computer vision community in the last few years. While deep convolutional neural networks have proved extremely effective in object classification and recognition, even when the number of classes becomes very large, they are not as good at handling fine-grained classes, and in particular at extracting subtle differences between subclasses of a common parent class. One way to boost performance on this task is to embed external prior knowledge into standard machine learning approaches. In this paper we review the state of the art in knowledge representation applied to fine-grained object recognition, focusing on methods that use (or could potentially use) convolutional neural networks. We show that many research works have been published in recent years, but most of them make use of knowledge representation in a very naïve (or even unaware) way.

Area 4 - Applications and Services

Full Papers
Paper Nr: 1
Title:

SPICE: Superpixel Classification for Cell Detection and Counting

Authors:

Oman Magaña-Tellez, Michalis Vrigkas, Christophoros Nikou and Ioannis Kakadiaris

Abstract: An algorithm for the localization and counting of cells in histopathological images is presented. The algorithm relies on the presegmentation of an image into a number of superpixels followed by two random forests for classification. The first random forest determines if there are any cells in the superpixels at its input and the second random forest provides the number of cells in the respective superpixel. The algorithm is evaluated on a bone marrow histopathological dataset. We argue that a single random forest is not sufficient to detect all the cells in the image while a cascade of classifiers achieves higher accuracy. The results compare favorably with the state of the art but with a lower computational cost.

Paper Nr: 19
Title:

Robust Remote Heart Rate Determination for E-Rehabilitation - A Method that Overcomes Motion and Intensity Artefacts

Authors:

Christian Wiede, Jingting Sun, Julia Richter and Gangolf Hirtz

Abstract: Due to an increasing demand for post-surgical rehabilitation, the need for e-rehabilitation is continuously rising. Here, continuous monitoring of vital parameters, such as the heart rate, could improve the efficiency assessment of training exercises by measuring a patient's physical condition. This study proposes a robust method to remotely determine a person's heart rate with an RGB camera. In this approach, we used an individual, situation-dependent skin colour determination in combination with accurate tracking. Furthermore, our method was evaluated by means of twelve different scenarios with 117 videos. Altogether, the results show that the method performs accurately and robustly enough for e-rehabilitation applications.
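Remote heart-rate methods of this kind typically end by finding the dominant frequency of a mean skin-colour trace inside the physiologically plausible band. The DFT sketch below covers only that final spectral step; the paper's pipeline additionally performs the skin colour determination and tracking described above:

```python
import math

def dominant_frequency(signal, fps, fmin=0.7, fmax=4.0):
    """Dominant frequency (Hz) of a sampled colour trace, restricted to a
    plausible heart-rate band (0.7-4 Hz, i.e. 42-240 bpm)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    best_f, best_power = 0.0, -1.0
    for k in range(1, n // 2):
        f = k * fps / n
        if fmin <= f <= fmax:
            re = sum(c * math.cos(2 * math.pi * k * t / n)
                     for t, c in enumerate(centered))
            im = sum(c * math.sin(2 * math.pi * k * t / n)
                     for t, c in enumerate(centered))
            power = re * re + im * im
            if power > best_power:
                best_f, best_power = f, power
    return best_f

# Synthetic 1.2 Hz (72 bpm) pulse sampled at 30 fps for 10 s:
trace = [math.sin(2 * math.pi * 1.2 * t / 30) for t in range(300)]
```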

Paper Nr: 114
Title:

An Efficient Group of Pictures Decomposition based Watermarking for Anaglyph 3D Video

Authors:

Dorra Dhaou, Saoussen Ben Jabra and Ezzeddine Zagrouba

Abstract: Due to the rapid growth of 3D technology, 3D video consumption over the internet has proliferated. 3D content protection has thus become an important and challenging problem for many researchers. Watermarking addresses this problem by embedding a signature into the 3D video content. However, only a few works have been proposed for 3D anaglyph content protection. In this paper, a new approach to anaglyph 3D video watermarking is proposed. In fact, the anaglyph technique is considered one of the most widely used techniques for creating 3D perception for both images and videos. The proposed approach is based on GOP decomposition, where the original video is considered as a set of Groups of Pictures (GOPs). Each GOP is divided into three types of images: one reference image and several B and R images. Then, each type of image is marked using a different algorithm based on the blue, red or depth channel. This exploits the advantages of each channel. Experimental results show a high level of invisibility of the proposed approach and robustness against several attacks such as compression, noise, filtering, frame suppression and geometric transformations.
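To make the per-channel embedding idea concrete, here is a deliberately simple least-significant-bit scheme on one colour channel. This is an illustrative stand-in only: the abstract says the blue, red and depth channels are marked with different algorithms but does not specify them.

```python
def embed_bits(channel, bits):
    """Embed signature bits into the least significant bits of a colour
    channel (one 0-255 value per pixel)."""
    out = list(channel)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def extract_bits(channel, n):
    """Recover the first n embedded bits."""
    return [v & 1 for v in channel[:n]]
```

Changing only the lowest bit alters each pixel by at most one intensity level, which is why LSB-style schemes are nearly invisible, though robustness against compression and filtering requires the stronger embedding the paper develops.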

Paper Nr: 136
Title:

Soft-tissue Artefact Assessment and Compensation in Motion Analysis by Combining Motion Capture Data and Ultrasound Depth Measurements

Authors:

Abstract: Accurately determining the hip joint centre is a necessary component of biomechanical human motion analysis, used to measure skeletal parameters and describe human motion. The hip joint centre can be estimated using functional methods based on the relative motion of the femur with respect to the pelvis, using reflective markers attached to the skin surface and tracked by an optical motion capture system; however, this suffers from inaccuracy due to the soft tissue artefact. A key objective in movement analysis is the assessment and correction of this artefact; here we present a non-invasive method to assess and reduce soft tissue artefact effects using optical motion capture data and tissue thickness from ultrasound measurements during flexion, extension and abduction of the hip joint. Results show that the displacement of markers is non-linear and larger in areas closer to the hip joint. The marker displacements depend on the movement type, being relatively larger in abduction. The quantification of soft tissue artefacts is used as the basis for a correction procedure that minimizes their effect on the hip joint centre. Results show that our method for soft tissue artefact assessment and minimization reduces the error in the functional hip joint centre from approximately 13-23 mm to 7-14 mm.

Paper Nr: 189
Title:

Facial Expression Recognition for Traumatic Brain Injured Patients

Authors:

Chaudhary Muhammad Aqdus Ilyas, Mohammad A. Haque, Matthias Rehm, Kamal Nasrollahi and Thomas B. Moeslund

Abstract: In this paper, we investigate the issues associated with facial expression recognition for Traumatic Brain Injured (TBI) patients in a realistic scenario. These patients have restricted or limited muscle movements with reduced facial expressions, along with non-cooperative behavior, impaired reasoning and inappropriate responses. All these factors make automatic understanding of their expressions more complex. While existing facial expression recognition systems have shown high accuracy on data from healthy subjects, their performance is yet to be proved on real TBI patient data under the aforementioned challenges. To deal with this, we devised scenarios for data collection from real TBI patients, collected data that is very challenging to process, devised an effective data preprocessing method so that good-quality faces can be extracted from the patients' facial videos for expression analysis, and finally employed a state-of-the-art deep learning framework to exploit the spatio-temporal information of facial video frames in expression analysis. The experimental results confirm the difficulty of processing real TBI patient data, while showing that better face quality ensures better performance in this case.

Short Papers
Paper Nr: 35
Title:

Automatic Identification of Macular Edema in Optical Coherence Tomography Images

Authors:

Gabriela Samagaio, Aída Estévez, Joaquim de Moura, Jorge Novo, Marcos Ortega and María Isabel Fernández

Abstract: This paper proposes a novel system for the simultaneous identification and characterization of the three types of Macular Edema (ME) in Optical Coherence Tomography (OCT) images. These MEs are clinically defined, by the reference classification of the field, as: Serous Retinal Detachment (SRD), Diffuse Retinal Thickening (DRT) and Cystoid Macular Edema (CME). Our system uses multilevel image thresholding approaches to identify the SRD and CME cases and a learning approach for DRT identification. The system provided promising results, with F-measures of 83.35% and 81.95% for the DRT and CME detections, respectively. It was also efficient in detecting all the SRD cases included in the testing image dataset. The system was able to identify the different types of ME individually on the OCT images, but it was also capable of detecting the three ME cases simultaneously when they appeared merged in the lower retinal layers.
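The F-measures quoted above combine precision and recall into a single detection score; a minimal sketch of how such a score is computed from true positives, false positives and false negatives:

```python
def f_measure(tp, fp, fn):
    """F-measure: the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```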

Paper Nr: 79
Title:

A Wearable Embedded System for Detecting Accidents while Running

Authors:

Vincenzo Carletti, Antonio Greco, Alessia Saggese, Mario Vento and Vincenzo Vigilante

Abstract: Every year 424,000 fatal accidents occur; they are the second leading cause of unintentional death after road traffic injuries. The difference between fatal and non-fatal accidents is often the presence of other people able to promptly provide first aid or call for help. Unfortunately, even during the practice of group activities (e.g. team sports), an accident can happen when a person is alone or out of sight; thus, the availability of devices able to detect whether a serious accident has occurred and consequently raise an alarm to other people is an important issue for people's safety. Starting from these considerations, in this paper we propose a wearable device able to detect accidents occurring during running. The device uses a one-class SVM trained only on normal activity and classifies all unknown situations as anomalies. Then, in order to avoid alarms related to non-dangerous events, the output of the classifier is analyzed by an additional stage responsible for detecting whether the person is unconscious after an abnormal event. If so, an alarm is raised by the system.

Paper Nr: 87
Title:

Wrinkles Individuality Preserving Aged Texture Generation using Multiple Expression Images

Authors:

Pavel A. Savkin, Tsukasa Fukusato, Takuya Kato and Shigeo Morishima

Abstract: Aging of a human face is accompanied by visible changes such as sagging, spots, somberness and wrinkles. Age progression techniques that estimate an aged facial image are required for long-term criminal or missing-person investigations, and also in 3DCG facial animations. This paper focuses on aged facial texture and introduces a novel age progression method based on medical knowledge, which represents the individuality of aged wrinkle shapes and positions. The effectiveness of including expression wrinkles in aged facial image synthesis is confirmed through subjective evaluation.

Paper Nr: 96
Title:

Development of a Computer Interface for People with Disabilities based on Computer Vision

Authors:

Gustavo Scalabrini Sampaio and Maurício Marengoni

Abstract: The growth of the population with disabilities in the world must be accompanied by growing research on and development of tools that help these users with basic computer activities. This paper presents the development of a system that allows the use of personal computers through face movements alone. The system can be used by people with motor disabilities who retain head movement, such as upper-limb amputees and tetraplegics. For the development of the proposed system, the most efficient techniques in previous works were collected and analyzed, and new ones were developed in order to build a system with high performance and precision, supporting the digital and social inclusion of the target public. Tests have shown that the tool is easy to learn, performs well and can be used in everyday computer applications.

Paper Nr: 130
Title:

DETCIC: Detection of Elongated Touching Cells with Inhomogeneous Illumination using a Stack of Conditional Random Fields

Authors:

A. Memariani, C. Nikou, B. T. Endres, E. Bassères, K. W. Garey and I. A. Kakadiaris

Abstract: Automated detection of touching cells in images with inhomogeneous illumination is a challenging problem. A detection framework using a stack of two conditional random fields is proposed to detect touching elongated cells in scanning electron microscopy images with inhomogeneous illumination. The first conditional random field employs shading information to segment the cells, reducing the effect of inhomogeneous illumination. The second conditional random field estimates the cell walls using the estimated cell wall probabilities. The method is evaluated on a dataset of Clostridium difficile cells. Finally, the method is compared with two region-based cell detection methods, CellDetect and DeTEC, improving the F-score by at least 20%.

Paper Nr: 152
Title:

Towards a Pre-diagnose of Surgical Wounds through the Analysis of Visual 3D Reconstructions

Authors:

Neus Muntaner Estarellas, Francisco Bonin-Font, Juan J. Segura-Sampedro, Andres Jiménez Ramírez, Pep L. Negre Carrasco, Miquel Massot Campos, Francesc X. Gonzalez-Argenté and Gabriel Oliver Codina

Abstract: This paper presents a new methodology to pre-diagnose the state of post-surgical abdominal wounds based on visual information. The process consists of four major phases: a) building a dense 3D reconstruction of the abdominal area around the wound, b) selecting an area close to the wound to fit a plane, c) calculating the distance from each point of the 3D model to the plane, and d) analyzing this map of distances to infer whether the wound is inflamed. This method needs to be wrapped in an application to be used by patients in order to save unnecessary visits to the medical center.

Paper Nr: 166
Title:

Real-time Integral Photography Holographic Pyramid using a Game Engine

Authors:

Shohei Anraku, Toshiaki Yamanouchi and Kazuhisa Yanaka

Abstract: A new holographic pyramid system that can display an animation of integral photography images that appear to be floating is developed using a game engine and by writing its shader. An animation of the object, as viewed from the front, rear, left, and right, is displayed on the four surfaces of the pyramid. All animations are autostereoscopic and provide both horizontal and vertical parallax. The user can rotate the object left or right by operating the keyboard. This system can be regarded as an autostereoscopic mixed-reality system because real and virtual objects can coexist in one pyramid.

Paper Nr: 90
Title:

Intelligent Digital Built Heritage Models: An Approach from Image Processing and Building Information Modelling Technology

Authors:

Pedro V. V. de Paiva, Camila K. Cogima, Eloisa Dezen-Kempter, Marco A. G. de Carvalho and Lucas R. Cerqueira

Abstract: Conservation and maintenance of historic buildings have exceptional requirements and need a detailed diagnosis and accurate as-is documentation. This paper reports the use of Unmanned Aerial Vehicle (UAV) imagery to create an Intelligent Digital Built Heritage Model (IDBHM) based on Building Information Modeling (BIM) technology. Our work outlines a model-driven approach based on UAV data acquisition, photogrammetry, post-processing and segmentation of point clouds to partially automate the BIM modeling process. The proposed methodology was applied to a historical building facade located in Brazil. A qualitative and quantitative assessment of the proposed segmentation method was undertaken through comparisons between segmented clusters and as-designed documents, as well as between point clouds and ground control points. An accurate and detailed parametric IDBHM was created from a high-resolution Dense Surface Model (DSM). This model can improve conservation and rehabilitation works. The results demonstrate that the proposed approach is effective in cluster segmentation when compared to the as-designed model.

Paper Nr: 134
Title:

VIOL: Viewpoint Invariant Object Localizator - Viewpoint Invariant Planar Features in Man-Made Environments

Authors:

Marco Filax and Frank Ortmeier

Abstract: Object detection is one of the fundamental issues in computer vision. The established methods rely on different feature descriptors to determine correspondences between significant image points. However, they do not provide reliable results, especially under extreme viewpoint changes. This is because feature descriptors do not account for the projective distortion introduced by an extreme viewpoint change. Different approaches have been proposed to lower this hurdle, e.g., by randomly sampling multiple virtual viewpoints. However, these methods are either computationally intensive or impose strong assumptions about the environment. In this paper, we propose an algorithm to detect corresponding quasi-planar objects in man-made environments. We make use of the observation that these environments typically contain rectangular structures. We exploit the information gathered from a depth sensor to detect planar regions. With these, we remove the projective distortion by transforming the planar patch into a fronto-parallel view. We demonstrate the feasibility and capabilities of our approach in a real-world scenario: a supermarket.

Paper Nr: 161
Title:

Prototyping and Evaluating Sensory Substitution Devices by Spatial Immersion in Virtual Environments

Authors:

Aziliz Guezou-Philippe, Sylvain Huet, Denis Pellerin and Christian Graff

Abstract: Various audio-vision Sensory Substitution Devices (SSDs) are in development to assist people without sight. They all convert optical information extracted from a camera into sound parameters, but are evaluated on different tasks in different contexts. The use of 3D environments is proposed here to compare the advantages and disadvantages not only of software (transcoding) solutions but also of hardware (component) specifics, in various situations and activities. By use of a motion capture system, the whole person, not just a guided avatar, was immersed in virtual places that were modelled and could be replicated at will. We evaluated the ability to hear depth in various tasks: detecting and locating an open window, moving to and crossing an open door. Participants directed the modelled depth-camera with a real pointing device that was either held in the hand or fastened on the head. Mixed effects on response delays were analyzed with a linear model to highlight the respective importance of the pointing device, the target specifics and the individual participants. The results encourage further exploiting our prototyping set-up and testing many solutions by varying, e.g., environments, sensor devices, transcoding rules, and pointing devices, including the use of an eye-tracker.

Area 5 - Motion, Tracking and Stereo Vision

Full Papers
Paper Nr: 21
Title:

GPU Accelerated Probabilistic Latent Sequential Motifs for Activity Analysis

Authors:

Khaja Wasif Mohiuddin, Jagannadan Varadarajan, Rémi Emonet, Jean-Marc Odobez and Pierre Moulin

Abstract: In this paper, we present an optimized GPU-based implementation of Probabilistic Latent Sequential Motifs (PLSM), which was proposed for sequential pattern mining from video sequences. PLSM mines recurrent sequential patterns from documents given as word-time occurrences, and outputs a set of sequential activity motifs and their starting occurrences. PLSM's uniqueness comes from modeling the co-occurrence and temporal order in which the words occur within a temporal window, while also dealing with activities that occur concurrently in the video. However, the expectation-maximization algorithm used in PLSM has a very high time complexity due to complex nested loops, requiring several dimensionality reduction steps before invoking PLSM. In order to truly realize the benefits of the model, we propose two GPU-based implementations of PLSM, GPU-PLSM (sparse and dense). The two implementations differ in whether the entire word-count matrix (dense) or only its non-zero entries (sparse) are considered in inferring the latent motifs. Our implementation achieves impressive 265× and 366× speed-ups for the dense and sparse approaches, respectively, on an NVIDIA GeForce GTX Titan. This speed-up enables us to remove several pre-processing and dimension reduction steps used to generate the input temporal documents, and thus to apply PLSM directly to the input documents. We validate our results through qualitative comparisons of the inferred motifs on two different publicly available datasets. Quantitative comparison on a document-reconstruction-based abnormality measure shows that GPU-PLSM and PLSA+PLSM are strongly correlated.

Paper Nr: 26
Title:

Deep Parts Similarity Learning for Person Re-Identification

Authors:

María José Gómez-Silva, José María Armingol and Arturo de la Escalera

Abstract: Measuring appearance similarity in Person Re-Identification is a challenging task which requires not only the selection of discriminative visual descriptors but also their optimal combination. This paper presents a unified learning framework composed of Deep Convolutional Neural Networks to simultaneously and automatically learn the most salient features for each of nine different body parts and their best weighting to form a person descriptor. Moreover, to cope with cross-view variations, these have been encoded in a Mahalanobis matrix, in an adaptive process also integrated into the learning framework, which takes advantage of the discriminative information given by the dataset labels to analyse the data structure. The effectiveness of the proposed approach, named Deep Parts Similarity Learning (DPSL), has been evaluated and compared with other state-of-the-art approaches on the challenging PRID2011 dataset.

Paper Nr: 59
Title:

Online Multi-target Visual Tracking using a HISP Filter

Authors:

Nathanael L. Baisa

Abstract: We propose a new multi-target visual tracker based on the recently developed Hypothesized and Independent Stochastic Population (HISP) filter. The HISP filter combines advantages of traditional tracking approaches like multiple hypothesis tracking (MHT) and point-process-based approaches like the probability hypothesis density (PHD) filter, and has linear complexity while maintaining track identities. We apply this filter to tracking multiple targets in video sequences acquired under varying environmental conditions and target densities, using a tracking-by-detection approach. In addition, we alleviate the problem of two or more targets having identical labels by taking into account the weight propagated with each confirmed hypothesis. Finally, we carry out extensive experiments on the Multiple Object Tracking 2016 (MOT16) benchmark dataset and find that our tracker significantly outperforms several state-of-the-art trackers in terms of tracking accuracy.

Paper Nr: 71
Title:

A Visual Computing Approach for Estimating the Motility Index in the Frail Elder

Authors:

Chiara Martini, Nicoletta Noceti, Manuela Chessa, Annalisa Barla, Alberto Cella, Gian Andrea Rollandi, Alberto Pilotto, Alessandro Verri and Francesca Odone

Abstract: The accurate estimation of frailty is an important objective to assess overall well-being and to predict the risk of mortality in the elderly. Such evaluation is commonly based on subjective quantities, both from self-reported outcomes and occasional physicians' evaluations, leading to possibly biased results. An objective and continuous frailty screening tool may be more appropriate for routine assessment. In this paper, we present a data-driven method to evaluate one of the main aspects contributing to frailty estimation, i.e. the motility of the subject. To this aim, we define a motility index, estimated following a visual computing approach analysing streams of RGB-D data. We provide an extensive experimental assessment performed on two sets of data acquired in a sensorised facility located within a local hospital. The results are in good agreement with the assessments manually performed by the physicians, nicely showing the potential of our approach.

Paper Nr: 116
Title:

Subtle Motion Analysis and Spotting using the Riesz Pyramid

Authors:

Carlos Andres Arango, Olivier Alata, Rémi Emonet, Anne-Claire Legrand and Hubert Konik

Abstract: Analyzing and temporally spotting motions which are almost invisible to the human eye might reveal interesting information about the world. However, detecting these events is difficult due to their short duration and low intensities. Taking inspiration from video magnification techniques, we design a workflow for analyzing and temporally spotting subtle motions based on the Riesz pyramid. In addition, we propose a filtering and masking scheme that segments motions of interest without producing undesired artifacts or delays. In order to be able to evaluate the spotting accuracy of our method, we introduce our own database containing videos of subtle motions. Experiments are carried out under different types and levels of noise. Finally, we show that our method is able to outperform other state of the art methods in this challenging task.

Paper Nr: 132
Title:

Combining 2D to 2D and 3D to 2D Point Correspondences for Stereo Visual Odometry

Authors:

Stephan Manthe, Adrian Carrio, Frank Neuhaus, Pascual Campoy and Dietrich Paulus

Abstract: Self-localization and motion estimation are requisite skills for autonomous robots. They enable the robot to navigate autonomously without relying on external positioning systems. Autonomous navigation can be achieved by making use of a stereo camera on board the robot. In this work a stereo visual odometry algorithm is developed which uses FAST features in combination with the Rotated-BRIEF descriptor and an approach for feature tracking. For motion estimation we utilize 3D to 2D point correspondences as well as 2D to 2D point correspondences. First we estimate an initial relative pose by decomposing the essential matrix. After that we refine the initial motion estimate by solving an optimization problem that minimizes the reprojection error as well as a cost function based on the epipolar constraint. The second cost function enables us to also take advantage of useful information from 2D to 2D point correspondences. Finally, we evaluate the implemented algorithm on the well-known KITTI and EuRoC datasets.
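As a minimal illustration of the 3D to 2D cost term mentioned above, the sketch below computes a pinhole reprojection error given known intrinsics and pose. The camera matrix and points are arbitrary assumptions; the paper minimizes this jointly with an epipolar-constraint term, which is not shown.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of 3D points X (Nx3) into pixel coordinates."""
    Xc = (R @ X.T).T + t          # world -> camera frame
    x = (K @ Xc.T).T              # apply intrinsics
    return x[:, :2] / x[:, 2:3]   # perspective division

def reprojection_error(K, R, t, X, obs):
    """Mean 3D-to-2D reprojection error (pixels)."""
    return np.linalg.norm(project(K, R, t, X) - obs, axis=1).mean()

# Illustrative intrinsics, identity pose, and two scene points.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.zeros(3)
X = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 4.0]])
obs = project(K, R, t, X)                   # perfect observations
print(reprojection_error(K, R, t, X, obs))  # → 0.0
```

In the actual pipeline the pose (R, t) is the optimization variable and the error is minimized over all correspondences.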

Paper Nr: 177
Title:

Approximate Epipolar Geometry from Six Rotation Invariant Correspondences

Authors:

Dániel Baráth

Abstract: We propose a method for estimating an approximate fundamental matrix from six rotation invariant feature correspondences, exploiting their rotation components, e.g. as provided by SIFT or ORB detectors. The cameras are not calibrated. First, a linear sub-space is calculated from the point coordinates; then the rotations are used assuming orthographic projection. It is demonstrated that combining the proposed method with Graph-cut RANSAC makes it superior to the state of the art in terms of accuracy for tasks with a strict time limit, i.e. essentially those that must run in real time. We tested the method on 203 publicly available real image pairs.

Paper Nr: 178
Title:

Enhancing Correlation Filter based Trackers with Size Adaptivity and Drift Prevention

Authors:

Emre Tunali, Sinan Oz and Mustafa Eral

Abstract: To enhance correlation filter (CF) based trackers with size adaptivity and greater robustness, we propose a new strategy which integrates an external segmentation methodology with CF based trackers in a closed feedback loop. This strategy both enables disclosure of the object size during tracking and allows automatic, non-disruptive online alteration of track models and parameters, yielding better target localization. Consolidating CF based trackers with these properties introduces much more robustness against track-center drift and relaxes the widespread assumption of perfectly centered track initialization. In other words, even if the track window center is given with a certain offset from the center of the target object at track initialization, the proposed methodology achieves target centralization by smoothly aligning the tracker template center with the target center over time. Experimental results indicate that the proposed algorithm increases the performance of CF trackers in terms of accuracy and robustness without disrupting their real-time processing capabilities.
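The CF core that such trackers build on can be sketched with a minimal MOSSE-style filter trained in the Fourier domain. This is background to the abstract, not the paper's method: the segmentation feedback loop is not shown, and the patch size, regularization value, and synthetic data are assumptions.

```python
import numpy as np

def train_mosse(patches, target, lam=1e-3):
    """Closed-form MOSSE-style correlation filter in the Fourier domain:
    H = sum(G . conj(F)) / (sum(F . conj(F)) + lam)."""
    G = np.fft.fft2(target)
    num = np.zeros_like(G)
    den = np.full_like(G, lam)   # lam regularizes near-zero spectra
    for p in patches:
        F = np.fft.fft2(p)
        num += G * np.conj(F)
        den += F * np.conj(F)
    return num / den

def respond(H, patch):
    """Correlation response map; its peak is the estimated target location."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))

# Desired response: a peak at the (known) target position (8, 8).
size = 16
target = np.zeros((size, size))
target[8, 8] = 1.0
rng = np.random.default_rng(2)
patch = rng.normal(size=(size, size))
H = train_mosse([patch], target)
resp = respond(H, patch)
peak = tuple(int(i) for i in np.unravel_index(resp.argmax(), resp.shape))
print(peak)  # → (8, 8)
```

Size adaptivity, as the paper argues, cannot come from this core alone: the filter always responds at a fixed template scale, which is why an external segmentation stage is needed to disclose the object size.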

Paper Nr: 184
Title:

Super-Resolution 3D Reconstruction from Multiple Cameras

Authors:

Tomoaki Nonome, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a novel method for reconstructing the high-resolution 3D structure and texture of a scene. In image processing, it is known that image super-resolution is possible from multiple low-resolution images. In this paper, we extend image super-resolution into 3D space, and show that it is possible to recover high-resolution 3D structure and high-resolution texture of the scene from low-resolution images taken at different viewpoints. Experimental results on real and synthetic images show the efficiency of the proposed method.

Paper Nr: 196
Title:

Pedestrian Detection and Tracking in Thermal Images from Aerial MPEG Videos

Authors:

Ichraf Lahouli, Robby Haelterman, Zied Chtourou, Geert De Cubber and Rabah Attia

Abstract: Video surveillance for security and intelligence purposes has been a precious tool for as long as the technology has been available, but it is computationally heavy. In this paper, we present a fast and efficient framework for pedestrian detection and tracking using thermal images. It is designed for automatic surveillance applications in outdoor environments, such as preventing border intrusions or attacks on sensitive facilities, using image and video processing techniques implemented on board Unmanned Aerial Vehicles (UAVs). The proposed framework exploits raw H.264 compressed video streams with limited computational overhead. Our work is driven by the fact that Motion Vectors (MVs) are an integral part of any video compression technique, by the day-and-night capabilities of thermal sensors, and by the distinctive thermal signature of humans. Six different scenarios were carried out and filmed using a thermal camera in order to simulate suspicious events. The obtained results show the effectiveness of the proposed framework and its low computational requirements, which make it adequate for on-board processing and real-time applications.

Short Papers
Paper Nr: 4
Title:

New Error Measures for Evaluating Algorithms that Estimate the Motion of a Range Camera

Authors:

Boris Bogaerts, Rudi Penne, Bart Ribbens, Seppe Sels and Steve Vanlanduit

Abstract: We compare the classical point-based algorithms for the extrinsic calibration of a range camera to the recent plane-based method. This method does not require any feature detection, and appears to perform well using a small number of planes (minimally 3). In order to evaluate the accuracy of the computed rigid motion we propose two new error metrics that get direct access to the ground truth provided by a mechanism with reliable motion control. Furthermore, these error metrics do not depend on an additional hand-eye calibration between the mechanism and the sensor. By means of our objective measures, we demonstrate that the plane-based method outperforms the point-based methods that operate on 3-D or 2-D point correspondences. In our experiments we used two types of TOF cameras attached to a robot arm, but our evaluation tool applies to other sensors and moving systems.
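To make concrete what "accuracy of the computed rigid motion" means here, the sketch below computes the classical rotation-angle and translation errors between an estimated and a ground-truth pose. These are the standard metrics, not the two new ones the paper proposes; the example poses are arbitrary assumptions.

```python
import numpy as np

def rotation_angle_deg(R):
    """Angle of a rotation matrix, in degrees, via the trace identity."""
    c = (np.trace(R) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Rotation error (degrees) and translation error (units of t)."""
    dR = R_est.T @ R_gt            # residual rotation
    return rotation_angle_deg(dR), np.linalg.norm(t_est - t_gt)

def rot_z(deg):
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0],
                     [np.sin(a),  np.cos(a), 0],
                     [0, 0, 1]])

# Ground truth: 30° about z, 0.5 m forward; estimate is slightly off.
R_gt, t_gt = rot_z(30.0), np.array([0.0, 0.0, 0.5])
R_est, t_est = rot_z(31.0), np.array([0.01, 0.0, 0.5])
e_rot, e_trans = pose_errors(R_est, t_est, R_gt, t_gt)
print(round(e_rot, 3), round(e_trans, 3))  # → 1.0 0.01
```

The paper's contribution is to evaluate such residuals directly against the reliable motion control of the mechanism, without an intermediate hand-eye calibration.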

Paper Nr: 48
Title:

Embedded Navigation and Classification System for Assisting Visually Impaired People

Authors:

Antonio Miguel Batista Dourado and Emerson Carlos Pedrino

Abstract: Loss of vision has a large detrimental impact on a person’s mobility. Every day, visually impaired people (VIPs) face various challenges just to get around in the most diverse environments. Technological solutions, called Electronic Travel Aids, help a VIP with these challenges, giving greater confidence in the task of getting around in unfamiliar surroundings. Thus, this article presents an embedded navigation and classification system for helping VIPs indoors. Using stereo vision, the system is able to detect obstacles and choose safe ways for the VIP to walk around without colliding. A convolutional neural network using a graphics processing unit (GPU) classifies the obstacles. Acoustic feedback is transmitted to the VIP. The article also features a wearable prototype, to which the system hardware is docked for use. Using the system, the prototype could detect and classify obstacles in real time defining free paths, all with battery autonomy of about 6 hours.

Paper Nr: 50
Title:

Simulation-based Optimization of Camera Placement in the Context of Industrial Pose Estimation

Authors:

Troels B. Jørgensen, Thorbjørn M. Iversen, Anders P. Lindvig, Christian Schlette, Dirk Kraft, Thiusius R. Savarimuthu, Jürgen Rossmann and Norbert Krüger

Abstract: In this paper, we optimize the placement of a camera in simulation in order to achieve a high success rate for a pose estimation problem. This is achieved by simulating 2D images from a stereo camera in a virtual scene. The stereo images are then used to generate 3D point clouds based on two different methods, namely a single-shot stereo matching approach and a multi-shot approach using phase shift patterns. After a point cloud is generated, we use a RANSAC-based pose estimation algorithm, which relies on feature matching of local 3D descriptors. The object whose pose we estimate is a tray containing items to be grasped by a robot. The pose estimation is done for different positions of the tray and with different item configurations in the tray, in order to determine the success rate of the pose estimation algorithm for a specific camera placement. Then the camera placement is varied according to different optimization algorithms in order to maximize the success rate. Finally, we evaluate the simulation in a real-world scene, to determine whether the optimal camera position found in simulation matches the real scenario.

Paper Nr: 70
Title:

Object Oriented Structure from Motion: Can a Scribble Help?

Authors:

Rahaf Rahal, Daniel Asmar, Elie Shammas and Bernard Ghanem

Abstract: The concept of anywhere, anytime scanning of 3D objects is very appealing. One promising solution for extracting structure is to rely on a monocular camera to perform what is well known as Structure from Motion (SfM). Despite the significant progress achieved in SfM, the structures obtained are still below the quality of reconstructions obtained through laser scanning, especially when objects are kept as part of their background. This paper looks into the idea of treating points in the scene non-uniformly, in an attempt to give more weight to the objects of interest. The system presented utilizes minimal user interaction, in the form of a scribble, to segment the pertinent objects from different views and focus the reconstruction on them, leading to what we call Object Oriented SfM (OOSfM). We test the effect of OOSfM on the reconstruction of specific objects by formulating the bundle adjustment (BA) step in three novel manners. Our proposed system is tested on several real and synthetic datasets, and results of the different formulations of BA are reported and compared to the conventional (vanilla) SfM pipeline results. Experiments show that keeping the background points actually improves the reconstructed objects of interest.

Paper Nr: 103
Title:

Wearable RGB Camera-based Navigation System for the Visually Impaired

Authors:

Reham Abobeah, Mohamed Hussein, Moataz Abdelwahab and Amin Shoukry

Abstract: This paper proposes a wearable RGB camera-based system for sightless people through which they can easily and independently navigate their surrounding environment. The system uses a single head or chest mounted RGB camera to capture the visual information from the current user’s path, and an auditory system to inform the user about the right direction to follow. This information is obtained through a novel alignment technique which takes as input a visual snippet from the current user’s path and responds with the corresponding location on the training path. Then, assuming that the wearable camera pose reflects the user’s pose, the system corrects the current user’s pose to align with the corresponding pose in the training location. As a result, the user receives periodically an acoustic instruction to assist him in reaching his destination safely. The experiments conducted to test the system, in various collected indoor and outdoor paths, have shown that it satisfies its design specifications in terms of correctly generating the instructions for guiding the visually impaired along these paths, in addition to its ability to detect and correct deviations from the predefined paths.

Paper Nr: 150
Title:

Bee Hive Traffic Monitoring by Tracking Bee Flight Paths

Authors:

Baptiste Magnier, Gaëtan Ekszterowicz, Joseph Laurent, Matthias Rival and François Pfister

Abstract: The number of pollinator insects is in decline in Europe, and this raises concerns about the supply of pollination services to agriculture. Countries with a low number of honeybees are thus more vulnerable to negative shifts in wild pollinator communities. Consequently, the demand for honeybee pollination is higher than ever, and beekeepers are also very concerned about the strength of their colonies. To measure this factor, a very important indicator to take into account is the flight activity at the beehive entrance. A quantitative measure of this activity can be related to the environment, and benefits not only beekeepers but scientists too. In this paper, we present a complete method for measuring this activity, represented by the number of bees going in or out of the beehive. The developed method is divided into three parts: the first consists of bee detection via several image transformations using background subtraction and ellipse approximations. The second tracks bees by predicting their future positions in order to determine whether they are entering or leaving the beehive. The last consists of counting the bees. Finally, the experimental results demonstrate that our system, built with limited resources, can be used to precisely measure the flight activity at the beehive entrance.
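The detection and counting steps of such a pipeline can be sketched in simplified form: a running-average background model flags bee pixels, and a virtual line at the entrance turns tracked vertical positions into in/out counts. This is a toy illustration under assumed parameters (frame size, thresholds, line position), not the paper's implementation, and the ellipse-fitting stage is omitted.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model (alpha is an assumed learning rate)."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25):
    """Pixels differing strongly from the background are foreground (bees)."""
    return np.abs(frame.astype(float) - bg) > thresh

def count_crossings(track_ys, line_y):
    """Count ins/outs for one tracked bee from its vertical positions:
    crossing the entrance line downwards = in, upwards = out."""
    ins = outs = 0
    for y0, y1 in zip(track_ys, track_ys[1:]):
        if y0 < line_y <= y1:
            ins += 1
        elif y1 < line_y <= y0:
            outs += 1
    return ins, outs

# Synthetic 8x8 frame: static background plus one bright "bee" pixel.
bg = np.zeros((8, 8))
frame = bg.copy()
frame[3, 4] = 200.0
mask = foreground_mask(bg, frame)
print(int(mask.sum()))                              # → 1
print(count_crossings([1, 2, 4, 6, 2], line_y=3))   # → (1, 1)
```

The example track dips below the line and comes back up, so it contributes one "in" and one "out" to the traffic count.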

Paper Nr: 154
Title:

Reconstructing Textureless Objects - Image Enhancement for 3D Reconstruction of Weakly-Textured Surfaces

Authors:

Nader H. Aldeeb and Olaf Hellwich

Abstract: Photogrammetric techniques for 3D reconstruction of weakly-textured surfaces are challenging. This paper proposes a new method to enhance image-based 3D reconstruction of weakly-textured surfaces. The idea behind it is to enhance the contrast of images, especially in weakly-textured regions, before feeding them to the reconstruction pipeline. Image contrast is enhanced using a recently proposed approach for noise reduction. The dynamic range of the generated denoised images has to be squeezed into the limited 8-bit range used by standard 3D reconstruction techniques. Squeezing the dynamic range is a very critical process and can lead to information loss, since many levels in the original range will no longer be available in the limited target range. To this end, this paper proposes a new tone-mapping approach based on Contrast Limited Adaptive Histogram Equalization (CLAHE). It amplifies the local contrast adaptively to make effective use of the limited target range. At the same time, it uses a limit to prevent local noise from being amplified. Using our approach leads to a significant improvement of up to 400% in the completeness of the 3D reconstruction.
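The clip-limited equalization that CLAHE builds on can be illustrated with a global, tile-free simplification: the histogram is clipped before building the equalization mapping, so flat regions gain contrast while the clip limit bounds noise amplification. This is a sketch of the underlying idea, not the paper's tone-mapping method, and the clip limit is an assumed parameter.

```python
import numpy as np

def clip_limited_equalize(img, clip_limit=0.01, levels=256):
    """Global contrast-limited histogram equalization (tile-free CLAHE sketch).

    clip_limit is the maximum fraction of pixels allowed per histogram bin;
    the clipped excess is redistributed uniformly over all bins.
    """
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    limit = max(1, int(clip_limit * img.size))
    excess = np.maximum(hist - limit, 0).sum()
    hist = np.minimum(hist, limit) + excess // levels  # redistribute excess
    cdf = np.cumsum(hist).astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    mapping = np.round(cdf * (levels - 1)).astype(np.uint8)
    return mapping[img]

# Low-contrast synthetic image: intensities squeezed into [100, 110].
rng = np.random.default_rng(1)
img = rng.integers(100, 111, size=(64, 64)).astype(np.uint8)
out = clip_limited_equalize(img)
print(int(img.max()) - int(img.min()), int(out.max()) - int(out.min()))
```

Without the clip, plain equalization would stretch the 11 occupied levels across the full 8-bit range, amplifying noise in exactly the weakly-textured regions the paper cares about; the limit tempers that stretch. CLAHE proper applies the same idea per tile with bilinear interpolation between tile mappings.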

Paper Nr: 181
Title:

Countering Bias in Tracking Evaluations

Authors:

Gustav Häger, Michael Felsberg and Fahad Khan

Abstract: Recent years have witnessed a significant leap in visual object tracking performance mainly due to powerful features, sophisticated learning methods and the introduction of benchmark datasets. Despite this significant improvement, the evaluation of state-of-the-art object trackers still relies on the classical intersection over union (IoU) score. In this work, we argue that the object tracking evaluations based on classical IoU score are sub-optimal. As our first contribution, we theoretically prove that the IoU score is biased in the case of large target objects and favors over-estimated target prediction sizes. As our second contribution, we propose a new score that is unbiased with respect to target prediction size. We systematically evaluate our proposed approach on benchmark tracking data with variations in relative target size. Our empirical results clearly suggest that the proposed score is unbiased in general.
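The size bias being argued can be demonstrated with a plain IoU computation: for two predictions with the same center offset from the ground truth, the over-sized one scores higher. The boxes below are illustrative numbers, and the paper's proposed unbiased score is not reproduced here.

```python
def iou(a, b):
    """Intersection-over-union of axis-aligned boxes (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# Two predictions, both offset by (2, 2) from the ground-truth center:
# one the correct size, one over-sized by one pixel on each side.
gt = (0, 0, 10, 10)
same_size = (2, 2, 12, 12)     # 10x10, shifted
oversized = (1, 1, 13, 13)     # 12x12, shifted by the same amount
print(round(iou(gt, same_size), 3), round(iou(gt, oversized), 3))
# → 0.471 0.497
```

The over-sized prediction is rewarded despite being a strictly worse estimate of the target extent, which is the bias the paper proves and corrects.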

Paper Nr: 3
Title:

Omnidirectional Visual Odometry for Flying Robots using Low-power Hardware

Authors:

Simon Reich, Maurice Seer, Lars Berscheid, Florentin Wörgötter and Jan-Matthias Braun

Abstract: Currently, flying robotic systems are in development for package delivery, aerial exploration in catastrophe areas, and maintenance tasks. While many flying robots are used in connection with powerful, stationary computing systems, the challenge for autonomous devices, especially in indoor rescue or rural missions, lies in the need to do all processing internally on low-power hardware. Furthermore, the device cannot rely on a well-ordered or marked environment. These requirements make computer vision an important and challenging task for such systems. To cope with the combined problems of low frame rates and high movement rates of the aerial device, a hyperbolic mirror is mounted on top of a quadrocopter, recording omnidirectional images that can capture features during fast pose changes. The viability of this approach is demonstrated by analysing several scenes. We present a novel autonomous robot which performs all computations online on low-power embedded hardware and is therefore truly autonomous. Furthermore, we introduce several novel algorithms of low computational complexity, which allow us to do without external resources.

Paper Nr: 52
Title:

Efficient Dense Disparity Map Reconstruction using Sparse Measurements

Authors:

Oussama Zeglazi, Mohammed Rziza, Aouatif Amine and Cédric Demonceaux

Abstract: In this paper, we propose a new stereo matching algorithm able to efficiently reconstruct a dense disparity map from a few sparse disparity measurements. The algorithm is initialized by sampling the reference image using the Simple Linear Iterative Clustering (SLIC) superpixel method. Then, a sparse disparity map is generated only for the obtained boundary pixels. The reconstruction of the entire disparity map is obtained through a scanline propagation method. Outliers are effectively removed using an adaptive vertical median filter. Experiments conducted on the standard and the new Middlebury datasets show that the proposed method produces high-quality dense disparity results.
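The densification step can be sketched with a nearest-neighbor propagation along each scanline: every pixel takes the disparity of the closest valid measurement in its row. This is a simplified stand-in for the paper's scanline propagation (the SLIC sampling, the actual propagation rule, and the adaptive vertical median filter are not reproduced), with an assumed invalid-value marker.

```python
import numpy as np

def scanline_propagate(sparse, invalid=-1):
    """Fill each row by propagating the nearest valid disparity
    left/right along the scanline (nearest-neighbor sketch)."""
    dense = sparse.astype(float).copy()
    for row in dense:
        valid = np.flatnonzero(row != invalid)
        if valid.size == 0:
            continue  # nothing to propagate in this row
        cols = np.arange(row.size)
        # index of the nearest valid column for every column
        nearest = valid[np.argmin(np.abs(cols[:, None] - valid[None, :]), axis=1)]
        row[:] = row[nearest]
    return dense

# 3x6 sparse map: two boundary-pixel disparities in row 0, one in row 1.
sparse = np.full((3, 6), -1.0)
sparse[0, 1], sparse[0, 4] = 10.0, 12.0
sparse[1, 0] = 8.0
dense = scanline_propagate(sparse)
print(dense[0].tolist())  # → [10.0, 10.0, 10.0, 12.0, 12.0, 12.0]
```

A subsequent vertical median filter, as in the paper, would then smooth out rows (like the still-invalid third row here) whose propagated values disagree with their vertical neighbors.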

Paper Nr: 89
Title:

Road Surface Scanning using Stereo Cameras for Motorcycles

Authors:

Joerg Deigmoeller, Nils Einecke, Oliver Fuchs and Herbert Janssen

Abstract: Active and semi-active suspension systems for vehicles have become quite popular in recent years, as they allow for a smoother and safer ride compared to conventional suspension systems. The performance of an active or semi-active suspension system can be further improved if the road condition in front of the vehicle is known. Currently, only a few luxury cars combine fully active suspension with stereo cameras for such predictive adaptation. However, we are not aware of any existing system for motorcycles. In this work, we present an algorithm that can cope with the rolling movement of a motorcycle. In addition, it can robustly reconstruct the road profile within a single time step and does not require temporal integration, which allows real-time processing up to very high speeds at a precision on the order of millimeters. The complete system has been successfully tested on a German highway, and a precise road laser scan has been used for evaluation.

Paper Nr: 182
Title:

Jointly Optical Flow and Occlusion Estimation for Images with Large Displacements

Authors:

Vanel Lazcano, Luis Garrido and Coloma Ballester

Abstract: This paper deals with the motion estimation of objects in a video sequence, a problem known as optical flow estimation. Traditional models fail in the presence of occlusions and non-uniform illumination. To tackle these problems, we propose a variational model to jointly estimate optical flow and occlusions. The proposed model is able to deal with the usual drawback of variational methods: displacements of objects in the scene that are larger than the object itself. The addition of a term that balances gradients and intensities increases the robustness of the proposed model to illumination changes. The inclusion of a supplementary matching, obtained by exhaustive search at specific locations, helps to follow large displacements.