Trend analysis model: trend consists of temporal words, topics, and timestamps

This paper presents a topic model that identifies interpretable low-dimensional components in time-stamped data in order to capture the evolution of trends. Unlike other models for time-stamped data, our proposal, the trend analysis model (TAM), focuses on the difference between temporal words and other words in each document to detect topic evolution over time. To handle this difference, TAM introduces a latent trend class variable for each document and a latent switch variable for each token. Each trend class has a probability distribution over temporal words, a probability distribution over topics, and a continuous distribution over time, and each topic is responsible for generating words. The switch variable uses a document-specific probability distribution to decide which of these sources generates the word in each token. Accordingly, TAM can explain which topic co-occurrence pattern will appear at any given time, and it represents documents with similar content and timestamps as sharing the same trend class. TAM thus projects documents onto a latent space of trend dimensionality, which allows us to predict the temporal evolution of words and topics in document collections. Experiments on various data sets show that the proposed model captures interpretable low-dimensional sets of topics and timestamps, takes advantage of previous models, and is useful as a generative model for analyzing the evolution of trends.
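
The generative story summarized in the abstract can be illustrated with a small sampling sketch. This is a simplified illustration under stated assumptions, not the paper's specification: the two-way switch (temporal word versus topic word), the Beta distribution over a normalized timestamp, the single scalar `switch_prob` per document, and all names and toy parameters (`generate_document`, `TRENDS`, `TOPICS`, `TREND_PRIOR`) are hypothetical choices made only for this example.

```python
import random

# Toy parameters, invented for illustration only.
TOPICS = [
    {"election": 0.6, "vote": 0.4},   # topic 0: distribution over words
    {"match": 0.7, "goal": 0.3},      # topic 1: distribution over words
]
TRENDS = [
    {
        "temporal_words": {"olympics": 0.8, "medal": 0.2},
        "topic_weights": [0.1, 0.9],  # distribution over topics
        "time_beta": (2.0, 5.0),      # assumed Beta(a, b) over time in [0, 1]
    },
    {
        "temporal_words": {"primary": 0.5, "debate": 0.5},
        "topic_weights": [0.9, 0.1],
        "time_beta": (5.0, 2.0),
    },
]
TREND_PRIOR = [0.5, 0.5]


def generate_document(trend_prior, trends, topics, switch_prob, n_tokens, rng):
    """Sample one document under a simplified, TAM-like generative story."""
    # 1. Draw the document's latent trend class.
    trend_id = rng.choices(range(len(trends)), weights=trend_prior)[0]
    trend = trends[trend_id]

    # 2. Draw the timestamp from the trend's continuous distribution over time
    #    (the Beta form is an assumption of this sketch).
    a, b = trend["time_beta"]
    timestamp = rng.betavariate(a, b)

    tokens = []
    for _ in range(n_tokens):
        # 3. The per-token switch, governed by a document-specific
        #    probability, picks the source of the word.
        if rng.random() < switch_prob:
            # Temporal word drawn directly from the trend class.
            words, weights = zip(*trend["temporal_words"].items())
            source = "temporal"
        else:
            # Otherwise pick a topic from the trend, then a word from the topic.
            topic_id = rng.choices(range(len(topics)),
                                   weights=trend["topic_weights"])[0]
            words, weights = zip(*topics[topic_id].items())
            source = "topic"
        word = rng.choices(words, weights=weights)[0]
        tokens.append((word, source))
    return trend_id, timestamp, tokens


if __name__ == "__main__":
    rng = random.Random(0)
    trend_id, timestamp, tokens = generate_document(
        TREND_PRIOR, TRENDS, TOPICS, switch_prob=0.3, n_tokens=10, rng=rng
    )
    print(trend_id, round(timestamp, 3), tokens)
```

In this sketch, documents drawn from the same trend class share both a timestamp distribution and a topic mixture, which mirrors the abstract's claim that documents of similar content and timestamp share a trend class. Inference, that is, recovering the latent trend and switch assignments from observed documents, is not shown.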