OCVISIGRAPP 2016 Abstracts

Short Papers

Paper Nr:	1
Title:	A Study on Fast Ellipse Detection Approach for Large-size Document Images
Authors:	Chinthaka Premachandra, H. Waruna H. Premachandra, Chandana Dinesh Parape and Kawanaka Hiroharu
Abstract:	This paper presents a fast ellipse detection approach for large-size document images that combines parallel-line image scanning with the Hough Transform (HT). Objects in images are generally detected from their geometrical information by raster scanning, in which the image is scanned from the top-left pixel to the bottom-right pixel. For large-size images, however, scanning every pixel requires considerable processing time. In this paper, an object detection approach for large-size images is proposed that does not scan all the pixels in the image. The problem addressed is the fast detection of ellipses in large-size document images. Only the pixels on parallel vertical lines separated by a constant gap are scanned; if an object larger than a certain size is found while scanning, the existence of an ellipse is assumed. Ellipse detection is then conducted by applying the HT only to a defined local image area over that object. With this approach, processing time can be dramatically reduced by skipping the detection of some undesired objects and by reducing the image area used for ellipse detection.
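The sparse scanning step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `candidate_regions`, the parameters `gap`, `min_run` and `margin`, and the way the local window is sized are all assumptions made for illustration, and the Hough Transform step that would run inside each returned window is omitted.

```python
from typing import List, Sequence, Tuple

# (top, left, bottom, right), all inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def candidate_regions(image: Sequence[Sequence[int]],
                      gap: int, min_run: int, margin: int) -> List[Box]:
    """Scan only every `gap`-th column of a binary image.

    A vertical run of at least `min_run` foreground pixels is treated as a
    possible ellipse, and a local window around it is returned. The window
    geometry (run height plus `margin`, roughly square) is an assumption.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    boxes: List[Box] = []
    for x in range(0, width, gap):          # parallel vertical scan lines
        y = 0
        while y < height:
            if not image[y][x]:
                y += 1
                continue
            start = y                        # first pixel of a foreground run
            while y < height and image[y][x]:
                y += 1
            run = y - start                  # run length along the scan line
            if run >= min_run:               # large enough: assume an ellipse
                half = run // 2 + margin
                boxes.append((max(0, start - margin),
                              max(0, x - half),
                              min(height - 1, y - 1 + margin),
                              min(width - 1, x + half)))
    return boxes
```

With `gap=4`, only one column in four is visited, so objects that fall entirely between scan lines or that produce short runs are never passed to the ellipse detector. An object crossed by several scan lines yields several overlapping windows; merging them is left out of this sketch.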

Paper Nr:	2
Title:	Grouping of Motion Signals from Sparse Representations
Authors:	Toshiro Kubota
Abstract:	Human vision can recognize a wide variety of objects with great accuracy from minimal motion signal information such as frame differences and point-light displays. Replicating this capability on a computer has been difficult. However, the utility of such an artificial system would be immense, as it would bring portable, cost-effective solutions to surveillance, gaming, and robotics, to name a few. Thus, it is important from both theoretical and practical perspectives to design an algorithm that can 1) extract sparse representations of motion signals, 2) group them into coherent spatial-temporal patterns, and 3) interpret the underlying activities. Most currently available approaches require special sensors such as range or stereo sensors, background models, and/or foreground models such as pedestrians and cars. These requirements simplify the problem but limit the applicability of the approaches tremendously. In this open communication, we outline our recent efforts toward the first two goals without any special hardware or background model. To extract sparse representations from video frames, we calculate motion signals by frame differences, reduce the motion signals of each frame into a sparse dot pattern by subsampling them in 3x3 windows, and finally approximate the dot pattern with skeletons using the following graph algorithm. First, the dot pattern is clustered into connected components. For each component, a Delaunay triangulated graph (G) is derived. From G, edges that are longer than the block size (3 pixels) are removed. Then, the skeleton of the component is derived as the longest shortest path in G. To test whether the reduction maintained important information, we recorded 5 movies with various farm animals (cat, chicken, dog, geese, and llama) and 2 movies with humans imitating animals. We reduced each movie into the sparse representation and presented it to human volunteers, who were asked to pick an animal (including human) from a list of 12. The recognition rate of the sparse representation was 76% (n=30), while the recognition rate of the frame differences was 84% (n=31). To group the skeletons within and across frames, we simultaneously estimate the optimum rigid transformation, the intra-frame grouping, and the inter-frame correspondence by formulating the problem as an Expectation-Maximization problem. Our preliminary results indicate that the approach is highly accurate and robust against noise and clutter. In this talk, we will present our algorithms and their results, and propose a future research direction for recognizing animals and humans in the sparse representation without any explicit foreground models.
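The extraction steps in this abstract (frame differences, 3x3 subsampling, edge pruning, longest shortest path) can be sketched in plain Python as follows. This is an illustrative reading, not the author's code: the function names `sparse_dots` and `skeleton`, the `threshold` parameter, the choice of one dot per active window, and passing the Delaunay edges in as a precomputed list are all assumptions, and the clustering into connected components is not shown.

```python
import heapq
from math import hypot


def sparse_dots(prev, curr, threshold, block=3):
    """Return one dot (x, y) per block x block window containing motion.

    Motion is taken to be an absolute frame difference above `threshold`;
    both the threshold and the dot placement are assumptions.
    """
    h, w = len(curr), len(curr[0])
    dots = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            if any(abs(curr[y][x] - prev[y][x]) > threshold
                   for y in range(by, min(by + block, h))
                   for x in range(bx, min(bx + block, w))):
                dots.append((bx + block // 2, by + block // 2))
    return dots


def _dijkstra(adj, src):
    """Shortest path lengths and parent links from `src`."""
    best, parent, heap = {src: 0.0}, {src: None}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue                      # stale heap entry
        for v, w in adj[u]:
            if d + w < best.get(v, float("inf")):
                best[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return best, parent


def skeleton(points, edges, max_len=3.0):
    """Longest shortest path in the graph after removing long edges.

    `edges` is assumed to hold the Delaunay edges of one component,
    given as index pairs into `points`.
    """
    adj = {i: [] for i in range(len(points))}
    for a, b in edges:
        d = hypot(points[a][0] - points[b][0], points[a][1] - points[b][1])
        if d <= max_len:                  # drop edges longer than the block size
            adj[a].append((b, d))
            adj[b].append((a, d))
    best_len, best_path = -1.0, []
    for src in adj:                       # all-pairs via repeated Dijkstra
        dist, parent = _dijkstra(adj, src)
        far = max(dist, key=dist.get)
        if dist[far] > best_len:
            path, node = [], far
            while node is not None:       # walk parent links back to src
                path.append(node)
                node = parent[node]
            best_len, best_path = dist[far], path[::-1]
    return best_path
```

Running Dijkstra from every node is quadratic or worse in the number of dots, which is acceptable for the small per-component graphs this sketch targets; a larger graph would call for a different strategy.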