quanteda (official quanteda tutorial in English)
This tutorial introduces the Wordfish method. Unlike the Wordscores method, Wordfish is based on an unsupervised learning approach. As with Wordscores, the quanteda team in particular has put considerable effort into providing a very good R implementation. For this reason, I reproduce the official quanteda tutorial here. You can find the original source here.
Wordfish is a Poisson scaling model of one-dimensional document positions (Slapin and Proksch 2008). Like Wordscores, it scales documents, but it does not require reference scores or reference texts. Because it is unsupervised, it estimates the positions of documents solely from the observed word frequencies.
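For reference, the model described by Slapin and Proksch (2008) can be written as follows. The symbol names theta, beta and psi match the labels that appear in the model summary further down; the indices i (documents) and j (words) are notation chosen here for illustration.

```latex
y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad
\log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \, \theta_i
```

Here y_ij is the count of word j in document i, alpha_i is a document fixed effect (roughly, document length), psi_j is a word fixed effect (roughly, overall word frequency), beta_j measures how strongly word j discriminates between positions, and theta_i is the estimated position of document i.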
library(quanteda)
## Package version: 3.0.0
## Unicode version: 10.0
## ICU version: 61.1
## Parallel computing: 4 of 4 threads used.
## See https://quanteda.io for tutorials and examples.
library(quanteda.textmodels)
library(quanteda.textstats)
library(quanteda.corpora) # Installation instructions: https://github.com/quanteda/quanteda.corpora
library(quanteda.textplots) # For more information, see: https://quanteda.io/articles/pkgdown/examples/plotting.html
In this example, we show how to apply Wordfish to the Irish budget speeches from 2010. First, we create a document-feature matrix. Afterwards, we run Wordfish.
dfmat_irish <- data_corpus_irishbudget2010 %>%
  tokens(remove_numbers = TRUE, remove_punct = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = stopwords("english")) %>%
  tokens_tolower() %>%
  dfm()
tmod_wf <- textmodel_wordfish(dfmat_irish, dir = c(6, 5))
summary(tmod_wf)
##
## Call:
## textmodel_wordfish.dfm(x = dfmat_irish, dir = c(6, 5))
##
## Estimated Document Positions:
##                              theta      se
## Lenihan, Brian (FF)        1.71694 0.02303
## Bruton, Richard (FG)      -0.43672 0.03226
## Burton, Joan (LAB)        -0.99597 0.01819
## Morgan, Arthur (SF)        0.07786 0.02935
## Cowen, Brian (FF)          1.92504 0.02509
## Kenny, Enda (FG)          -0.81200 0.02604
## ODonnell, Kieran (FG)     -0.30679 0.04668
## Gilmore, Eamon (LAB)      -0.37496 0.03247
## Higgins, Michael (LAB)    -1.20318 0.03647
## Quinn, Ruairi (LAB)       -1.23489 0.03503
## Gormley, John (Green)      0.96884 0.07982
## Ryan, Eamon (Green)        0.15616 0.06407
## Cuffe, Ciaran (Green)      0.57038 0.07298
## OCaolain, Caoimhghin (SF) -0.05072 0.03722
##
## Estimated Feature Scores:
##      presented supplementary  budget house  last   april    said   work    way
## beta    0.3174         1.100 0.06874 0.060 0.279 -0.2068 -0.9513 0.5502 0.3246
## psi    -1.8172        -1.159 2.68241 1.014 0.944 -0.5962 -0.4688 1.0693 1.3726
##       period severe economic distress  today    can  report notwithstanding
## beta  0.5856  1.337   0.5018    1.698 0.1437 0.3711  0.7066           1.698
## psi  -0.2549 -2.140   1.5044   -4.264 0.8054 1.5106 -0.3314          -4.264
##      difficulties   past  eight months    now    road recovery  enormous
## beta        1.248 0.5386  1.698 0.7393 0.3534  0.1302   0.4230 -0.005133
## psi        -1.439 0.8689 -4.264 0.2048 1.5396 -0.1096   0.8337 -1.086163
##       benefit    main political parties    share
## beta -0.02207  0.9451   -0.3507  0.5503 -0.07773
## psi   1.34103 -0.9040   -0.4458 -0.1179 -0.17386
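The dir = c(6, 5) argument in the call above fixes the direction of the scale. A one-dimensional scale is only identified up to its sign, so the model is oriented such that document 6 (Kenny) receives a smaller theta than document 5 (Cowen). The following base-R sketch illustrates the idea on theta values copied from the output above; orient_scale() is a hypothetical helper written for this illustration, not a quanteda function.

```r
# Theta values copied from the summary output above (first six documents).
theta <- c(Lenihan = 1.71694, Bruton = -0.43672, Burton = -0.99597,
           Morgan = 0.07786, Cowen = 1.92504, Kenny = -0.81200)

# Hypothetical helper: flip the sign of the whole scale if needed so that
# theta[dir[1]] < theta[dir[2]].
orient_scale <- function(theta, dir) {
  if (theta[dir[1]] > theta[dir[2]]) -theta else theta
}

oriented <- orient_scale(theta, dir = c(6, 5))  # already oriented: unchanged
flipped  <- orient_scale(theta, dir = c(5, 6))  # every position changes sign

print(rbind(oriented, flipped))
```

Swapping the two indices mirrors the scale without changing the distances between documents, so the choice of dir affects only which end is labelled positive.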
We can plot the results of a fitted scaling model using textplot_scale1d().
textplot_scale1d(tmod_wf)
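As a rough guide to reading such a plot, the point estimates and standard errors from the summary above can be turned into approximate 95% intervals. This base-R sketch assumes a normal approximation (theta plus or minus 1.96 times se), which is an assumption made here for illustration and not a statement about how textplot_scale1d() computes what it draws. Only three speakers are copied in to keep it short.

```r
# Estimates copied from the summary output above.
est <- data.frame(
  doc   = c("Cowen, Brian (FF)", "Morgan, Arthur (SF)", "Quinn, Ruairi (LAB)"),
  theta = c(1.92504, 0.07786, -1.23489),
  se    = c(0.02509, 0.02935, 0.03503)
)

# Approximate 95% interval under a normal approximation (assumption).
est$lower <- est$theta - 1.96 * est$se
est$upper <- est$theta + 1.96 * est$se

# Sort from the lowest to the highest estimated position.
est <- est[order(est$theta), ]
print(est, row.names = FALSE)
```

Intervals that do not overlap suggest that two speakers occupy distinguishable positions on the estimated dimension.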
textplot_scale1d() can also plot scores by a grouping variable, in this case the party affiliation of the speakers.
textplot_scale1d(tmod_wf, groups = dfmat_irish$party)
Finally, we can plot the estimated word positions and highlight certain features.
textplot_scale1d(tmod_wf, margin = "features",
                 highlighted = c("government", "global", "children",
                                 "bank", "economy", "the", "citizenship",
                                 "productivity", "deficit"))
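To see which words pull documents toward either end of the scale, the feature scores can also be ranked directly. The sketch below uses a handful of beta and psi values copied from the summary output above. With the fitted model, the full vectors should be available as tmod_wf$beta, tmod_wf$psi and tmod_wf$features, though that is worth checking against the installed quanteda.textmodels version.

```r
# Feature scores copied from the summary output above.
feat <- data.frame(
  feature = c("supplementary", "budget", "said", "severe", "political", "share"),
  beta    = c(1.100, 0.06874, -0.9513, 1.337, -0.3507, -0.07773),
  psi     = c(-1.159, 2.68241, -0.4688, -2.140, -0.4458, -0.17386)
)

# Rank by beta: large positive or negative values mark words that
# discriminate between the two ends of the scale.
feat <- feat[order(feat$beta), ]
print(feat, row.names = FALSE)

# A high psi with beta near zero marks a frequent but uninformative word.
most_frequent <- as.character(feat$feature[which.max(feat$psi)])
print(most_frequent)
```

In this small subset, "said" and "severe" sit at opposite ends of the beta ranking, while "budget" has the highest psi and a beta close to zero, as one would expect of a word that every speaker in a budget debate uses.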