Abstract: |
In classification tasks, robustness against various image transformations remains a crucial property of Convolutional Neural Networks (CNNs). It can be acquired through data augmentation, but this comes at the cost of increased training time and network size. Consequently, other ways to endow CNNs with invariance to various transformations -- and particularly to rotations -- are an active field of study.
It is common to find that the filters in the first layers of CNNs contain rotated copies of the same filter (e.g., identical edge detectors in several orientations). We propose a network whose first layer contains a bank of learnable steerable filters. The network learns a single basis filter and generates an ensemble of rotated copies, organized in increasing order of orientation. Each filter is then activated by edges aligned with its orientation. This methodology allows the network to capture the angular geometric properties of the input data.
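As an illustration only, a minimal PyTorch-style sketch of such a first layer is given below. The module name RotatedFilterBank, the parameter n_orientations, and the use of affine grid resampling to rotate the basis filter are our assumptions for the sketch, not details taken from the paper.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RotatedFilterBank(nn.Module):
        # Learns a single basis filter and expands it at run time into
        # n_orientations rotated copies, ordered by increasing orientation.
        def __init__(self, in_channels=1, n_orientations=8, kernel_size=9):
            super().__init__()
            self.n_orientations = n_orientations
            self.basis = nn.Parameter(
                torch.randn(1, in_channels, kernel_size, kernel_size) * 0.1)

        def rotated_copies(self):
            copies = []
            for i in range(self.n_orientations):
                angle = 2.0 * math.pi * i / self.n_orientations
                cos, sin = math.cos(angle), math.sin(angle)
                # 2x3 affine matrix that rotates the sampling grid by `angle`
                theta = torch.tensor([[cos, -sin, 0.0],
                                      [sin,  cos, 0.0]],
                                     dtype=self.basis.dtype).unsqueeze(0)
                grid = F.affine_grid(theta, list(self.basis.shape), align_corners=False)
                copies.append(F.grid_sample(self.basis, grid, align_corners=False))
            # (n_orientations, in_channels, k, k), in increasing angular order
            return torch.cat(copies, dim=0)

        def forward(self, x):
            # one output channel per orientation
            k = self.basis.shape[-1]
            return F.conv2d(x, self.rotated_copies(), padding=k // 2)

Because every rotated copy is resampled from the same learnable tensor, gradients from all orientations flow back into the single basis filter.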
The filter bank can thus be seen as a decomposition of the input into oriented features. These features are then aligned to a vertical reference, producing a feature space that is covariant with the input rotation: the filter bank becomes a roto-translational feature space in which rotations of the input are encoded as translations along the depth of the feature space. We then apply a shared-weight predictor that scans each translation (hence each orientation) and outputs a set of class probabilities for each one. This output carries the class information in the predictor's probabilities and the angle information in the position along the translations. The position of the maximum probability corresponds to the angle of the input example; hence, the angle is obtained without angle labels in the training set.
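A minimal sketch of such a shared-weight predictor follows, under our own assumptions: the class name OrientationScanner, the flattened per-orientation feature vector, and max-pooling over orientations as the aggregation rule are illustrative choices, not details confirmed by the paper.

    import torch.nn as nn

    class OrientationScanner(nn.Module):
        # One predictor whose weights are shared across every orientation
        # slice of the roto-translational feature space.
        def __init__(self, feat_dim, n_classes, n_orientations=8):
            super().__init__()
            self.n_orientations = n_orientations
            self.predictor = nn.Linear(feat_dim, n_classes)

        def forward(self, feats):
            # feats: (batch, n_orientations, feat_dim), one row per orientation
            probs = self.predictor(feats).softmax(dim=-1)   # (batch, n_ori, n_classes)
            # class score: best response over all orientations (rotation invariant)
            class_probs, _ = probs.max(dim=1)
            # angle: orientation index of the strongest response, mapped to degrees
            best_idx = probs.max(dim=-1).values.argmax(dim=1)
            angle = best_idx.float() * (360.0 / self.n_orientations)
            return class_probs, angle

The same Linear weights score every orientation, so rotating the input only shifts which slice responds most strongly; the class decision is unchanged while the winning slice index yields the angle estimate.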
The prediction model we propose shares its weights across all translations, yielding a compact network capable of class and angle inference with rotation-invariant properties. Such properties are best tested when the network is trained on objects in a single orientation (usually upright) and validated on randomly oriented examples. Hence, we train the network with upright examples and validate it with randomly rotated ones.
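This train-upright / test-rotated protocol could be set up as in the sketch below, using torchvision on MNIST; the exact transforms and rotation range are assumptions on our part rather than the paper's configuration.

    from torchvision import datasets, transforms

    # Train on upright digits, evaluate on randomly rotated ones to probe
    # rotation invariance.
    train_tf = transforms.ToTensor()
    test_tf = transforms.Compose([
        transforms.RandomRotation(degrees=180),   # rotation drawn from [-180, 180]
        transforms.ToTensor(),
    ])

    train_set = datasets.MNIST("data", train=True,  download=True, transform=train_tf)
    test_set  = datasets.MNIST("data", train=False, download=True, transform=test_tf)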
With this methodology, we match or outperform state-of-the-art results on the MNIST and CIFAR-10 datasets. On MNIST, we obtain a 0.93% error rate with 42k trainable parameters, reaching the state-of-the-art error rate with at least 50% fewer parameters than other approaches. On CIFAR-10 with randomly rotated validation, we achieve a 36.41% error rate, outperforming the 55.88% error rate of previous approaches, while using 73k trainable parameters instead of their 130k. In all cases, we can also predict the angle of the classified object.
In conclusion, we obtain competitive state-of-the-art error rates while keeping a low-footprint network in terms of trainable parameters, and our network can predict angles without angle labels in the training set. Smaller networks allow faster training, deployment on embedded devices, and a reduction in the energy costs associated with CNNs. |