Apply different topic modeling approaches to political blogs to compare the performance of diverse methods.

Introduction

Types of Models Compared

General LDA (R package)
Supervised LDA (David M. Blei, Jon D. McAuliffe)
Relational Topic Model (Jonathan Chang, David M. Blei)
Topic Link Block Model (Derek Owens-Oas)
Poisson Factor Modeling (Beta Negative Binomial Process Topic Model)
Dynamic Text Network Model (Teague Henry, David Banks et al.)

Key Values in Cleaned Blog Posts

After preprocessing the text extracted from the blog posts, each post has the following fields:

dates: string of the post date in mm/dd/yy format
domains: string naming the blog website where the post was found (with “www.” removed)
links: string of the other websites that occur in the post as hyperlinks (sorted alphabetically)
words: words filtered from the raw text of the blog posts (using TF-IDF variance thresholding)
rawText: original content of the blog posts (short posts and duplicate posts removed)
words_stem: words stemmed with the Hunspell stemmer (e.g., apples -> apple)
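The `words` field relies on TF-IDF variance thresholding. The exact TF-IDF weighting and threshold used in this project are not stated, so the following is a minimal sketch under assumptions: term frequency is normalized by document length, IDF is `log(N / df)`, and a term is kept only if its TF-IDF variance across documents exceeds a chosen `threshold` (a hypothetical parameter). Terms that appear in every document get IDF 0, and therefore zero variance, so they are filtered out.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, rows), where rows[d][j] is the TF-IDF of vocab[j] in docs[d].

    Assumed weighting: tf = count / doc length, idf = log(N / df).
    """
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(w for doc in docs for w in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        rows.append([
            (counts[w] / total) * math.log(n_docs / df[w]) if total else 0.0
            for w in vocab
        ])
    return vocab, rows

def variance_filter(docs, threshold):
    """Keep only terms whose TF-IDF variance across documents exceeds threshold."""
    vocab, rows = tfidf_matrix(docs)
    n_docs = len(rows)
    keep = set()
    for j, w in enumerate(vocab):
        col = [row[j] for row in rows]
        mean = sum(col) / n_docs
        var = sum((x - mean) ** 2 for x in col) / n_docs
        if var > threshold:
            keep.add(w)
    return [[w for w in doc if w in keep] for doc in docs]
```

For example, with the documents `[["tax", "vote", "tax"], ["vote", "war"], ["vote", "economy"]]` and `threshold=0.0`, "vote" occurs in every post and is dropped, while the more discriminative terms survive.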

Analysis via Several Topic Modeling Methods

General LDA

The general LDA model is fit via collapsed Gibbs sampling, using the R package “Collapsed Gibbs Sampling Methods for Topic Models”.
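To illustrate what collapsed Gibbs sampling does, here is a minimal pure-Python sketch of the standard sampler. It is not the R package's implementation. Documents are assumed to be lists of integer word ids, and `alpha`, `beta`, and `n_iter` are illustrative defaults rather than the settings used in this project. Each sweep removes one token's topic assignment from the count tables and resamples it from the full conditional $p(z_{d,n} = k \mid \cdot) \propto (n_{d,k} + \alpha)\,\frac{n_{k,w} + \beta}{n_k + V\beta}$.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count tables after n_iter sweeps.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    # random initial topic assignment for every token
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # remove the current token from the counts
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # full conditional p(z = j | all other assignments), up to a constant
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # add the token back under its new topic
                z[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word
```

Normalizing a row of `doc_topic` (after adding `alpha`) gives an estimate of that document's topic proportions $\theta_d$.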

Supervised LDA

Here, the blog site (the domains field) is used as the document label.
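In supervised LDA, each document's response depends on its empirical topic frequencies $\bar{z}_d$, the fraction of its words assigned to each topic, through a generalized linear model. Since the blog site is a categorical label, the following sketch assumes a softmax link, as in the multi-class extension of sLDA. The coefficient matrix `eta` is hypothetical and would be learned during fitting. The sketch covers only the response part of the model.

```python
import math

def empirical_topic_freqs(assignments, n_topics):
    """zbar_d: fraction of the document's words assigned to each topic."""
    counts = [0] * n_topics
    for k in assignments:
        counts[k] += 1
    n = len(assignments)
    return [c / n for c in counts]

def label_probs(zbar, eta):
    """Softmax over labels: p(y = c) is proportional to exp(eta[c] . zbar).

    eta is a hypothetical (n_labels x n_topics) coefficient matrix (assumed softmax link).
    """
    scores = [sum(e * z for e, z in zip(row, zbar)) for row in eta]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]
```

For example, a document whose words are mostly assigned to topic 0 gets the higher probability for the label whose coefficients favor topic 0.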

Relational Topic Model

RTM models each link between two documents as a binary random variable conditioned on their text. The model can predict links between documents as well as words within them. Inference is based on a variational EM algorithm.

For each document $d$:
1. Draw topic proportions $\theta_d \mid \alpha \sim \text{Dir}(\alpha)$.
2. For each word $w_{d,n}$:
   - Draw assignment $z_{d,n} \mid \theta_d \sim \text{Mult}(\theta_d)$.
   - Draw word $w_{d,n} \mid z_{d,n}, \beta_{1:K} \sim \text{Mult}(\beta_{z_{d,n}})$.

For each pair of documents $d, d'$:
- Draw binary link indicator $y_{d,d'} \mid \mathbf{z}_d, \mathbf{z}_{d'} \sim \psi(\cdot \mid \mathbf{z}_d, \mathbf{z}_{d'})$.
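The generative process above can be simulated directly. This sketch assumes the logistic link function from the original RTM paper, $\psi_\sigma(y = 1) = \sigma\big(\eta^\top(\bar{z}_d \circ \bar{z}_{d'}) + \nu\big)$, where $\bar{z}_d$ holds the empirical topic frequencies of document $d$. The parameters `eta` and `nu`, the fixed document length, and the treatment of pairs as unordered are all illustrative assumptions. It is meant to clarify the model and is not the variational EM fitting procedure.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def generate_rtm(n_docs, doc_len, alpha, beta, eta, nu, seed=0):
    """Simulate the RTM generative process with a logistic link function.

    alpha: Dirichlet parameter (length K); beta: K topic-word distributions over V words;
    eta (length K) and nu: hypothetical link-function parameters.
    """
    rng = random.Random(seed)
    K, V = len(alpha), len(beta[0])
    docs, zbars = [], []
    for _ in range(n_docs):
        theta = sample_dirichlet(alpha, rng)                       # theta_d ~ Dir(alpha)
        z = rng.choices(range(K), weights=theta, k=doc_len)       # z_{d,n} ~ Mult(theta_d)
        words = [rng.choices(range(V), weights=beta[k])[0] for k in z]  # w_{d,n} ~ Mult(beta_z)
        docs.append(words)
        zbars.append([z.count(k) / doc_len for k in range(K)])
    links = {}
    for d in range(n_docs):
        for d2 in range(d + 1, n_docs):
            # psi: sigmoid(eta . (zbar_d * zbar_d') + nu); symmetric in d and d'
            score = sum(e * a * b for e, a, b in zip(eta, zbars[d], zbars[d2])) + nu
            p = 1.0 / (1.0 + math.exp(-score))
            links[(d, d2)] = int(rng.random() < p)
    return docs, links
```

Because the link score depends on the elementwise product $\bar{z}_d \circ \bar{z}_{d'}$, two documents are likely to be linked only when they use the same topics, which is what lets RTM predict links from text.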

We compare the link-prediction performance of RTM with that of LDA. The plot below shows the predicted link probabilities from RTM against those from LDA for each document in a sample of 100, along with the topics most expressed by the cited document.
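The comparison can be sketched as follows, under assumptions not stated in the source. Each model is assumed to supply per-document topic proportions (`theta_rtm`, `theta_lda`), each pair is scored with a logistic link whose `(eta, nu)` parameters are hypothetical and supplied separately for each model, and the "most expressed topics" of a cited document are taken as its largest entries of RTM's topic proportions. The actual procedure behind the plot may differ.

```python
import math
import random

def link_prob(theta_a, theta_b, eta, nu):
    """Logistic link on the elementwise product of two topic-proportion vectors."""
    score = sum(e * a * b for e, a, b in zip(eta, theta_a, theta_b)) + nu
    return 1.0 / (1.0 + math.exp(-score))

def top_topics(theta, n=3):
    """Indices of the n topics with the largest proportion, largest first."""
    return sorted(range(len(theta)), key=lambda k: theta[k], reverse=True)[:n]

def compare_models(pairs, theta_rtm, theta_lda, rtm_params, lda_params,
                   sample_size=100, seed=0):
    """For a random sample of (citing, cited) pairs, return
    (p_rtm, p_lda, top topics of the cited document) triples.

    rtm_params and lda_params are hypothetical (eta, nu) tuples, one per model.
    """
    rng = random.Random(seed)
    sample = rng.sample(pairs, min(sample_size, len(pairs)))
    return [
        (link_prob(theta_rtm[d], theta_rtm[c], *rtm_params),
         link_prob(theta_lda[d], theta_lda[c], *lda_params),
         top_topics(theta_rtm[c]))
        for d, c in sample
    ]
```

Plotting the first element of each triple against the second reproduces the kind of RTM-versus-LDA scatter described above; points above the diagonal are pairs to which RTM assigns a higher link probability.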

Researcher: Qiuyi Wu

Research on Public Policy Blogs