Recovering a Balanced Overview of Topics in a Software Domain

Domain analysis is a crucial step in the development of product lines, and in software reuse in general, in which domain experts try to identify the commonalities and variabilities between different products of a particular domain. This identification is challenging, since it requires significant manual analysis of requirements, design documents, and source code. To support domain analysts, this paper proposes using topic modeling techniques to automatically identify common and unique concepts (topics) from the source code of different software products in a domain. An empirical case study of 19 projects, spread across the domains of web browsers and operating systems (totaling over 39 MLOC), shows that our approach is able to identify commonalities and variabilities at different levels of granularity (sub-domain and domain). In addition, we show how the commonalities are evenly spread across all projects of the domain.