the 3rd avenue: Daenekindt, S. and Huisman, J. (2020) Mapping the scattered field of research on higher education: A correlated topic model of 17,000 articles, 1991–2018, Higher Education,

2020/04/02


Preprocessing for the topic model

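Excluded abstracts of 50 words or fewer, because topic models perform poorly on short texts.
Lowercased the text and removed punctuation, numbers, stop words (e.g., the, and), and very frequent terms (e.g., higher education, results, article), since these do not help identify topics.
Standardized spelling (UK → US).
Applied Porter's word-stemming algorithm to reduce words to their stems (argue, argued, argues → argu).
Removed infrequent words (frequency of 1% or less).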
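The per-abstract steps listed above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the stop-word list, the frequent-term list beyond the three named examples, and the UK → US map (`UK_TO_US`) are hypothetical placeholders, the 50-word cutoff is assumed to count raw whitespace-separated words, and Porter stemming is not implemented here but left as a pluggable `stem` function (default: no-op). In R, stm's `textProcessor` appears to bundle several of these steps (lowercasing, stop-word, number and punctuation removal, stemming).

```python
import re
from collections.abc import Callable

# Illustrative word lists (assumptions): the notes give only a few examples.
STOP_WORDS = {"the", "and", "of", "a", "an", "in", "to", "is", "for", "on", "with", "this"}
FREQUENT_PHRASES = ["higher education"]  # multi-word term, removed before tokenizing
FREQUENT_WORDS = {"results", "article"}
# Hypothetical UK -> US spelling map; a real pipeline would need a full dictionary.
UK_TO_US = {"organisation": "organization", "behaviour": "behavior", "programme": "program"}


def keep_abstract(text: str, min_words: int = 50) -> bool:
    """Keep only abstracts with more than `min_words` whitespace-separated words."""
    return len(text.split()) > min_words


def clean_abstract(text: str, stem: Callable[[str], str] = lambda w: w) -> list[str]:
    """Lowercase, strip punctuation/digits, map UK -> US, drop stop/frequent words, stem."""
    text = text.lower()
    for phrase in FREQUENT_PHRASES:
        text = text.replace(phrase, " ")
    # Replace everything except ASCII letters and whitespace (punctuation, digits,
    # and, as a simplification, non-ASCII letters) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [UK_TO_US.get(tok, tok) for tok in text.split()]
    tokens = [t for t in tokens if t not in STOP_WORDS and t not in FREQUENT_WORDS]
    # The default `stem` is a no-op placeholder, NOT Porter's algorithm.
    return [stem(t) for t in tokens]


sample = "The Organisation of Higher Education, 2019: results and behaviour!"
print(clean_abstract(sample))  # ['organization', 'behavior']
```

If NLTK is installed, passing `stem=PorterStemmer().stem` (from `nltk.stem`) should plug in its Porter implementation; that variant has not been checked against the paper's setup. Spelling is standardized before stemming so that, for example, "behaviour" and "behavior" end up as the same stem.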
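The notes do not say what the 1% threshold is relative to. The sketch below assumes it is document frequency (the share of abstracts containing a term) and drops terms at or below that share. In R, stm's `prepDocuments` appears to offer a comparable `lower.thresh` argument, expressed as a document count rather than a share.

```python
from collections import Counter


def prune_rare_terms(docs: list[list[str]], max_share: float = 0.01) -> list[list[str]]:
    """Remove terms whose document frequency is at most `max_share` of all documents."""
    n_docs = len(docs)
    # Count each term once per document (document frequency, not raw counts).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    keep = {term for term, df in doc_freq.items() if df / n_docs > max_share}
    return [[t for t in doc if t in keep] for doc in docs]


docs = [["common"] for _ in range(200)]
docs[0].append("rare")  # in 2 of 200 documents = 1%   -> removed
docs[1].append("rare")
for i in (2, 3, 4):
    docs[i].append("kept")  # in 3 of 200 documents = 1.5% -> kept
pruned = prune_rare_terms(docs)
print(pruned[0], pruned[2])  # ['common'] ['common', 'kept']
```

Counting each term once per document keeps a single abstract that repeats a rare word many times from rescuing it.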

Estimated the correlated topic model with the R package stm.
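For reference, the generative process of a correlated topic model in standard textbook notation (not taken from the paper) is sketched below, for document $d$ with $N_d$ words, $K$ topics, and topic-word distributions $\boldsymbol{\beta}_k$. Each document's topic proportions $\boldsymbol{\theta}_d$ are drawn from a logistic-normal distribution. The off-diagonal entries of $\boldsymbol{\Sigma}$ let topics co-occur more or less often than chance, which LDA's Dirichlet prior cannot express.

```latex
\begin{aligned}
\boldsymbol{\eta}_d &\sim \mathcal{N}_K(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \\
\theta_{d,k} &= \frac{\exp(\eta_{d,k})}{\sum_{j=1}^{K} \exp(\eta_{d,j})}, \quad k = 1, \dots, K \\
z_{d,n} &\sim \operatorname{Categorical}(\boldsymbol{\theta}_d), \quad n = 1, \dots, N_d \\
w_{d,n} &\sim \operatorname{Categorical}(\boldsymbol{\beta}_{z_{d,n}})
\end{aligned}
```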