Abstract: |
Words in a corpus carry features and information, and visualizing them can improve users' understanding of that corpus. A lexical unit in a text corpus may consist of a single word, or it may be a combination of words that together function as a single word; the latter is referred to as a multiword expression. Visually analyzing both single words and multiword expressions can yield more accurate results and more information than analyzing single words alone. An interactive visualization is useful for analyzing multiword expressions because it can provide the following functions of interest to linguistics scholars: (1) showing combinatory POS patterns in hierarchical form, (2) exploring results according to the POS pattern, and (3) searching the source corpus to verify analysis results. We therefore propose PreechVis (http://202.30.24.167:3010/PreechVisMWE), an interactive visualization tool that provides all of the functions required for analyzing multiword expressions. As the corpus, we used 957 speeches by 43 U.S. Presidents, from George Washington to Barack Obama. PreechVis consists of two views. The first view combines a Sunburst and RadVis. The circular Sunburst presents the POS tags and their combination patterns for each n-gram. In RadVis, the Presidents are positioned according to their frequency values, and when a President is selected, the corresponding frequency values are displayed on the Sunburst to aid the user's understanding. In the second view, the user can examine and verify the details of the results using the Wordcloud. The two views are synchronized with each other and update according to the selected grams, issues, and Presidents. Through experiments and case studies on the U.S. presidential speech data, we verified the effectiveness and usability of PreechVis. |
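The abstract states only that RadVis positions the Presidents according to their frequency values. The sketch below shows how such a placement typically works, assuming the standard RadViz spring model: each dimension gets an anchor evenly spaced on a unit circle, each item's frequencies are min-max normalized per dimension, and the item lands at the weighted average of the anchor positions. What the anchors represent in PreechVis (for example, POS patterns or issues), the function names, and the sample data are hypothetical, not the tool's actual implementation.

```python
import math
from typing import Dict, List, Tuple


def anchor_positions(n: int) -> List[Tuple[float, float]]:
    """Place n dimension anchors evenly on the unit circle, starting at angle 0."""
    return [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n)) for j in range(n)]


def radviz(freqs: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each item's frequency vector to a 2-D point inside the unit circle.

    Assumption: standard RadViz; anchors are hypothetical dimensions
    (e.g. POS patterns), not confirmed PreechVis internals.
    """
    if not freqs:
        return {}
    names = list(freqs)
    dims = len(freqs[names[0]])
    # Min-max normalize each dimension so no single raw count dominates.
    lo = [min(freqs[n][j] for n in names) for j in range(dims)]
    hi = [max(freqs[n][j] for n in names) for j in range(dims)]
    anchors = anchor_positions(dims)
    out: Dict[str, Tuple[float, float]] = {}
    for name in names:
        w = [
            (freqs[name][j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
            for j in range(dims)
        ]
        total = sum(w)
        if total == 0:
            # No anchor pulls on this item, so place it at the center.
            out[name] = (0.0, 0.0)
            continue
        # Weighted average of anchor coordinates: the "spring" equilibrium.
        x = sum(wj * ax for wj, (ax, _) in zip(w, anchors)) / total
        y = sum(wj * ay for wj, (_, ay) in zip(w, anchors)) / total
        out[name] = (x, y)
    return out


if __name__ == "__main__":
    # Hypothetical counts for three anchor dimensions; not data from the paper.
    sample = {"President A": [10, 0, 0], "President B": [0, 5, 5], "President C": [3, 3, 3]}
    for who, (x, y) in radviz(sample).items():
        print(f"{who}: ({x:.2f}, {y:.2f})")
```

An item dominated by one dimension sits on that anchor, while an item with balanced frequencies is pulled toward the center. This is why RadViz-style layouts let a user see at a glance which patterns characterize each President.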