Labeled topic detection of open source software from mining mass textual project profiles

Open source software has become an indispensable basis for both individual and industrial software engineering. Open source communities use various labeling mechanisms, such as categories, keywords, and tags, to annotate projects and facilitate software discovery. However, because much software carries few or no labels, and because existing labels come from different ontology spaces, retrieving potentially topic-relevant software remains difficult. This paper highlights the valuable semantic information in project descriptions and labels and proposes labeled software topic detection (LSTD), a hybrid approach that combines topic models and ranking mechanisms to detect and enrich the topics of software by mining large numbers of textual software profiles. The detected topics can be used for software categorization and tag recommendation. LSTD uses labeled LDA to capture the semantic correlations between labels and descriptions and then constructs a label-based topic-word matrix. Based on this matrix and the generality of labels, LSTD applies a simple yet efficient algorithm to detect the latent topics of software, expressed as relevant and popular labels. Comprehensive evaluations on large-scale datasets from representative open source communities validate the effectiveness of LSTD.
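
To make the two-stage idea concrete, the following is a minimal sketch under stated assumptions, not the method evaluated in the paper. It replaces labeled LDA with plain co-occurrence counting to estimate a label-word matrix, and it assumes that the generality of a label can be approximated by the number of projects carrying it. The function names, the scoring formula (relevance multiplied by the logarithm of generality), and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_label_word_matrix(projects):
    """Estimate p(word | label) from co-occurrence counts.

    Assumption: a stand-in for labeled LDA. Every word of a description
    is credited to each label of that project, then normalized per label.
    """
    counts = defaultdict(Counter)
    for description, labels in projects:
        words = description.lower().split()
        for label in labels:
            counts[label].update(words)
    matrix = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        matrix[label] = {w: c / total for w, c in word_counts.items()}
    return matrix


def label_generality(projects):
    """Assumption: generality = number of projects that carry the label."""
    return Counter(label for _, labels in projects for label in labels)


def detect_topics(description, matrix, generality, top_k=3):
    """Rank labels for an unlabeled description by relevance and popularity."""
    words = description.lower().split()
    scores = {}
    for label, word_probs in matrix.items():
        # Relevance: total probability mass the label assigns to the words.
        relevance = sum(word_probs.get(w, 0.0) for w in words)
        if relevance > 0:
            # Hypothetical combination: damp raw popularity with a logarithm.
            scores[label] = relevance * math.log(1 + generality[label])
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy profiles: (description, labels). Purely illustrative data.
projects = [
    ("fast http web server", ["web", "server"]),
    ("web framework for http apis", ["web"]),
    ("relational database storage engine", ["database"]),
]
matrix = build_label_word_matrix(projects)
generality = label_generality(projects)
suggested = detect_topics("lightweight http server", matrix, generality)
```

Here `suggested` holds the ranked labels for a project that has a description but no labels, which corresponds to the tag recommendation use case; labels that share no vocabulary with the description are left out of the ranking.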