Journal article

Choosing the Number of Topics in LDA Models - A Monte Carlo Comparison of Selection Criteria


Authors: Bystrov, Victor; Naboka-Krell, Viktoriia; Staszewska-Bystrova, Anna; Winker, Peter

Year of publication: 2024

Journal: Journal of Machine Learning Research

Volume: 25

ISSN: 1532-4435

URL: https://jmlr.org/papers/volume25/23-0188/23-0188.pdf

Publisher: Journal of Machine Learning Research


Abstract
Selecting the number of topics in Latent Dirichlet Allocation (LDA) models is considered a difficult task, for which various approaches have been proposed. In this paper, the performance of the recently developed singular Bayesian information criterion (sBIC) is evaluated and compared to the performance of alternative model selection criteria. The sBIC is a generalization of the standard BIC that can be applied to singular statistical models. The comparison is based on Monte Carlo simulations and carried out for several alternative settings, varying with respect to the number of topics, the number of documents and the size of documents in the corpora. Performance is measured using different criteria which take into account not only whether the correct number of topics is selected, but also whether the relevant topics from the considered data generating processes (DGPs) are revealed. Practical recommendations for LDA model selection in applications are derived.
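The abstract describes Monte Carlo experiments in which corpora are drawn from known LDA data generating processes and candidate models are compared with information criteria. As a rough illustration only, the Python sketch below draws one corpus from the standard LDA generative process and scores it with the standard (regular-model) BIC, $-2\log L + k\log n$. It is not the authors' simulation design. The function names, the symmetric Dirichlet hyperparameters `alpha` and `eta`, their default values, and the parameter count used in the usage example are all hypothetical assumptions. The sBIC studied in the paper replaces the fixed $k\log n$ penalty with one that accounts for the singular structure of the model and is not reproduced here.

```python
import numpy as np


def simulate_lda_corpus(n_topics, n_docs, doc_len, vocab_size,
                        alpha=0.5, eta=0.1, seed=0):
    """Draw a synthetic corpus from the LDA generative process.

    alpha and eta are symmetric Dirichlet hyperparameters (hypothetical
    illustrative defaults, not values from the paper).
    Returns (docs, phi, theta).
    """
    rng = np.random.default_rng(seed)
    # phi[k] is topic k's distribution over the vocabulary.
    phi = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)
    # theta[d] is document d's mixture over topics.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)
    docs = []
    for d in range(n_docs):
        # Draw a topic for each token, then a word from that topic.
        z = rng.choice(n_topics, size=doc_len, p=theta[d])
        words = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        docs.append(words)
    return docs, phi, theta


def corpus_log_likelihood(docs, phi, theta):
    """Log-likelihood of the corpus given topic and document parameters."""
    total = 0.0
    for d, words in enumerate(docs):
        # p(w | theta_d, phi) = sum_k theta[d, k] * phi[k, w] for each token.
        token_probs = theta[d] @ phi[:, words]
        total += np.log(token_probs).sum()
    return float(total)


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC for regular models; smaller values are preferred."""
    return -2.0 * log_likelihood + n_params * np.log(n_obs)


if __name__ == "__main__":
    K, D, N, V = 3, 20, 50, 30
    docs, phi, theta = simulate_lda_corpus(K, D, N, V)
    ll = corpus_log_likelihood(docs, phi, theta)
    # Hypothetical free-parameter count: K*(V-1) topic-word plus
    # D*(K-1) document-topic probabilities.
    n_params = K * (V - 1) + D * (K - 1)
    print(bic(ll, n_params, n_obs=D * N))
```

In a full Monte Carlo comparison this step would be repeated over many simulated corpora and candidate numbers of topics, with each criterion's choice recorded. Here the log-likelihood is evaluated at the true parameters purely to keep the example short. In practice it would come from fitted models.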



Citation styles

Harvard citation style: Bystrov, V., Naboka-Krell, V., Staszewska-Bystrova, A. and Winker, P. (2024) Choosing the Number of Topics in LDA Models - A Monte Carlo Comparison of Selection Criteria, Journal of Machine Learning Research, 25, Article 79. https://jmlr.org/papers/volume25/23-0188/23-0188.pdf

APA citation style: Bystrov, V., Naboka-Krell, V., Staszewska-Bystrova, A., & Winker, P. (2024). Choosing the Number of Topics in LDA Models - A Monte Carlo Comparison of Selection Criteria. Journal of Machine Learning Research, 25, Article 79. https://jmlr.org/papers/volume25/23-0188/23-0188.pdf






Last updated 2025-05-21 at 18:07