Abstract: |
Dimensionality reduction (DR) methods aim to map high-dimensional datasets to 2D scatterplots for visual exploration. Such scatterplots are used to reason about the cluster structure of the data, so creating well-separated visual clusters from existing data clusters is an important requirement of DR methods. Many DR methods excel in speed, implementation simplicity, ease of use, stability, and out-of-sample capabilities, but produce suboptimal cluster separation. Recently, Sharpened DR (SDR) was proposed to generically help such methods by sharpening the data distribution prior to the DR step. However, SDR has prohibitive computational costs for large datasets. We present SDR-NNP, a method that uses deep learning to keep the attractive sharpening property of SDR while making it scalable, easy to use, and capable of out-of-sample projection. We demonstrate SDR-NNP on seven datasets, applied to three DR methods, using an extensive exploration of its parameter space. Our results show that SDR-NNP consistently produces projections with clear cluster separation, assessed both visually and by four quality metrics, at a fraction of the computational cost of SDR. We show the added value of SDR-NNP in a concrete use case involving the labeling of astronomical data. |