quanteda
(Official quanteda tutorial in English)
This tutorial introduces the Wordscores method. Like Naive Bayes, it is a supervised method that learns from texts whose values are already known. Wordscores was developed largely by Ken Benoit and is implemented in quanteda. For that reason, I reproduce the official quanteda tutorial here. You can find the original source here.
Wordscores is a scaling model for estimating the positions (mostly of political actors) for dimensions that are specified a priori. Wordscores was introduced in Laver, Benoit and Garry (2003) and is widely used among political scientists.
library(quanteda)
## Package version: 3.0.0
## Unicode version: 10.0
## ICU version: 61.1
## Parallel computing: 4 of 4 threads used.
## See https://quanteda.io for tutorials and examples.
library(quanteda.textmodels)
library(quanteda.textstats)
library(quanteda.corpora) # installation instructions: https://github.com/quanteda/quanteda.corpora
library(quanteda.textplots) # for more information see: https://quanteda.io/articles/pkgdown/examples/plotting.html
Training a Wordscores model requires reference scores for texts whose policy positions on well-defined a priori dimensions are “known”. Afterwards, Wordscores estimates the positions for the remaining “virgin” texts.
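To make the two steps concrete, here is a minimal base-R sketch of the Wordscores calculation as described in Laver, Benoit and Garry (2003), without any smoothing. The toy counts, the document names and the reference scores are hypothetical, and this is an illustration of the idea rather than the code quanteda runs internally.

```r
# Toy word counts: rows are documents, columns are words (hypothetical data)
counts <- rbind(
  ref_left  = c(tax = 2, welfare = 6, market = 2),
  ref_right = c(tax = 6, welfare = 1, market = 3),
  virgin    = c(tax = 3, welfare = 3, market = 4)
)

# "Known" positions of the two reference texts (hypothetical values)
ref_scores <- c(ref_left = -1, ref_right = 1)

# Relative frequency of each word within each document
rel_freq <- counts / rowSums(counts)

# For each word: probability of reading each reference text, given the word
ref_freq <- rel_freq[names(ref_scores), , drop = FALSE]
p_ref_given_word <- sweep(ref_freq, 2, colSums(ref_freq), "/")

# Word score: expected reference position given the word
word_scores <- colSums(p_ref_given_word * ref_scores)

# Virgin text score: mean word score, weighted by the text's word frequencies
virgin_score <- sum(rel_freq["virgin", ] * word_scores)

print(round(word_scores, 3))
print(round(virgin_score, 3))
```

Words used mostly by the right-hand reference text end up with positive scores, and the virgin text lands between the two reference positions.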
We use manifestos of the 2013 and 2017 German federal elections. For the 2013 elections we assign the average expert evaluations from the 2014 Chapel Hill Expert Survey for the five major parties, and predict the party positions for the 2017 manifestos.
corp_ger <- download(url = "https://www.dropbox.com/s/uysdoep4unfz3zp/data_corpus_germanifestos.rds?dl=1")
summary(corp_ger)
## Corpus consisting of 12 documents, showing 12 documents:
##
##          Text Types Tokens Sentences year   party ref_score
##      AfD 2013   450    944        43 2013     AfD        NA
##  CDU-CSU 2013  7615  46535      2527 2013 CDU-CSU      5.92
##      FDP 2013  7953  42298      2375 2013     FDP      6.53
##   Gruene 2013 13839  93595      5126 2013  Gruene      3.61
##    Linke 2013  8451  43382      1850 2013   Linke      1.23
##      SPD 2013  8360  47348      2532 2013     SPD      3.76
##      AfD 2017  5947  18754       715 2017     AfD        NA
##  CDU-CSU 2017  4890  21510      1256 2017 CDU-CSU        NA
##      FDP 2017  8676  37609      1925 2017     FDP        NA
##   Gruene 2017 13353  72645      3220 2017  Gruene        NA
##    Linke 2017 11830  65728      2755 2017   Linke        NA
##      SPD 2017  8400  41938      2401 2017     SPD        NA
Now we can apply the Wordscores algorithm to a document-feature matrix.
# create a document-feature matrix
# note: if `%>%` is not found in your setup, load magrittr or use the native pipe |>
dfmat_ger <- corp_ger %>%
  tokens(remove_numbers = TRUE, remove_punct = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = stopwords("de")) %>%
  tokens_tolower() %>%
  dfm()
# apply Wordscores algorithm to document-feature matrix
tmod_ws <- textmodel_wordscores(dfmat_ger, y = corp_ger$ref_score, smooth = 1)
summary(tmod_ws)
##
## Call:
## textmodel_wordscores.dfm(x = dfmat_ger, y = corp_ger$ref_score,
## smooth = 1)
##
## Reference Document Statistics:
##              score total min  max    mean median
## AfD 2013        NA   455   0   23 0.01105      0
## CDU-CSU 2013  5.92 22854   0  245 0.55495      0
## FDP 2013      6.53 20497   0  186 0.49772      0
## Gruene 2013   3.61 45244   0  398 1.09864      0
## Linke 2013    1.23 20794   0  234 0.50493      0
## SPD 2013      3.76 22928   0  214 0.55675      0
## AfD 2017        NA  9647   0  108 0.23425      0
## CDU-CSU 2017    NA 10624   0  136 0.25798      0
## FDP 2017        NA 19214   0  261 0.46656      0
## Gruene 2017     NA 40828   0 1086 0.99140      0
## Linke 2017      NA 33004   0  788 0.80142      0
## SPD 2017        NA 20688   0  186 0.50236      0
##
## Wordscores:
## (showing first 30 elements)
##           alternative           deutschland          wahlprogramm
##                 3.291                 4.740                 3.295
##       währungspolitik               fordern             geordnete
##                 4.529                 3.255                 4.240
##             auflösung euro-währungsgebietes               braucht
##                 3.336                 4.240                 4.153
##                  euro               ländern               schadet
##                 3.329                 4.227                 3.911
##      wiedereinführung            nationaler             währungen
##                 4.463                 4.577                 4.240
##             schaffung             kleinerer            stabilerer
##                 4.288                 4.425                 4.240
##      währungsverbünde                    dm                  darf
##                 4.240                 4.240                 3.870
##                  tabu              änderung          europäischen
##                 4.158                 4.226                 4.358
##              verträge                 staat           ausscheiden
##                 3.553                 4.791                 3.697
##           ermöglichen                  volk          demokratisch
##                 4.354                 4.240                 2.271
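Several words in the output above share the same score of 4.240. One plausible reading is that these words do not occur in any scored reference text and only receive a score because of `smooth = 1`. The base-R sketch below illustrates that effect under the assumption that smoothing adds a constant to every count in the reference texts before scores are computed. The helper `score_words()` and the toy data are hypothetical and are not part of quanteda.

```r
# Toy counts for two reference texts; "unseen" occurs in neither (hypothetical data)
ref_counts <- rbind(
  ref_left  = c(tax = 2, welfare = 6, unseen = 0),
  ref_right = c(tax = 6, welfare = 2, unseen = 0)
)
ref_scores <- c(ref_left = -1, ref_right = 1)

# Hypothetical helper: word scores with optional additive smoothing
score_words <- function(counts, scores, smooth = 0) {
  counts <- counts + smooth                      # add the constant to every count
  rel_freq <- counts / rowSums(counts)           # word frequencies per reference text
  p_ref_given_word <- sweep(rel_freq, 2, colSums(rel_freq), "/")
  colSums(p_ref_given_word * scores)             # expected reference score per word
}

unsmoothed <- score_words(ref_counts, ref_scores)
smoothed   <- score_words(ref_counts, ref_scores, smooth = 1)

print(unsmoothed)  # the score for "unseen" is undefined (0 / 0)
print(smoothed)    # every word now has a finite score
```

In this sketch, all words that are absent from the reference texts receive one and the same smoothed score, which would match the repeated value in the summary.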
Next, we predict positions for all documents in the document-feature matrix, including the unknown virgin texts.
pred_ws <- predict(tmod_ws, se.fit = TRUE, newdata = dfmat_ger)
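With `se.fit = TRUE`, the prediction carries standard errors alongside the point estimates. As far as I know, it is returned as a list with the elements `fit` and `se.fit`. The sketch below shows how approximate 95% confidence intervals could be derived from such a list. The object `pred_toy` and its numbers are hypothetical stand-ins, not results from the manifesto data.

```r
# Hypothetical stand-in for the list returned by predict(..., se.fit = TRUE)
pred_toy <- list(
  fit    = c(doc_a = 3.2, doc_b = 4.1),  # point estimates (made-up values)
  se.fit = c(0.05, 0.10)                 # standard errors (made-up values)
)

# Normal-approximation 95% interval: estimate +/- z * standard error
z <- qnorm(0.975)
ci <- data.frame(
  estimate = pred_toy$fit,
  lower    = pred_toy$fit - z * pred_toy$se.fit,
  upper    = pred_toy$fit + z * pred_toy$se.fit
)

print(round(ci, 3))
```

The same pattern should work with `pred_ws$fit` and `pred_ws$se.fit`, assuming the returned object has that structure.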
Finally, we can plot the fitted scaling model using the textplot_scale1d() function from quanteda.textplots.
textplot_scale1d(pred_ws)