quanteda: Review and Preview
This tutorial is mostly a review and, to a small extent, a preview of tutorials still to come.
In the following sections we walk through all the steps a text analysis requires, using the quanteda package. The goal of this tutorial is to give you a solid understanding of which steps a text analysis actually involves. The deliberate repetition is also meant to make you more confident and practiced in working with quanteda and with text analysis.
quanteda
The quanteda package is a text analysis package for R. It covers nearly everything needed to carry out a text analysis. quanteda also comes with clear, extensive documentation and a large number of tutorials.
In this tutorial we concentrate on the most important steps in preparing a text analysis and learn how to use the quanteda documentation for individual functions.
The required reading for this session, Welbers, van Atteveldt and Benoit (2017), also covers all the steps we are about to discuss.
As a reminder: you have to load the quanteda package and its companion packages in every R session, but you only need to install them once beforehand:
# install.packages("quanteda") # Install the package first if you have not done so yet.
# install.packages("quanteda.textplots")
# install.packages("quanteda.textstats")
library(quanteda)
library(quanteda.textplots)
library(quanteda.textstats)
quanteda
Below we go through the classic steps of a text analysis:
Import texts and create a corpus
Create, clean and filter the document-feature matrix (the most important step: this is where the quality of the subsequent analysis is decided)
Analysis (covered only briefly here, since we will discuss individual methods in detail over the course of the seminar)
The first step of any text analysis is to read the texts to be analyzed into R. Texts are stored in many formats: plain txt files, csv files, html files and, of course, PDF. In our experience, plain txt files cause the fewest problems. Instructions for reading in txt files can be found in the second tutorial for this session (title: "Erste Schritte mit quanteda"). For use with quanteda, text data is most conveniently read in with the readtext package and the readr package.
# install.packages("readtext") # Install the package first if you have not done so yet.
# install.packages("readr")
# install.packages("tidyverse")
library(readtext)
library(readr)
library(tidyverse)
For simplicity we use a csv file that is available online. Reading it in works exactly like reading a csv file stored on your computer (in that case, use the local file path instead).
The file we import contains the State of the Union addresses of US presidents; each document (i.e. each row of the csv file) is one paragraph of a speech. The data are imported as a data.frame (more precisely, a tibble).
sotu_url <- "https://bit.ly/2QoqUQS"
sotu_data <- read_csv(sotu_url)
head(sotu_data) ## view first 6 rows
## # A tibble: 6 x 5
## paragraph date President Party text
## <dbl> <date> <chr> <chr> <chr>
## 1 1 1790-01-08 George Washi… Other I embrace with great satisfaction th…
## 2 2 1790-01-08 George Washi… Other In resuming your consultations for t…
## 3 3 1790-01-08 George Washi… Other Among the many interesting objects w…
## 4 4 1790-01-08 George Washi… Other A free people ought not only to be a…
## 5 5 1790-01-08 George Washi… Other The proper establishment of the troo…
## 6 6 1790-01-08 George Washi… Other There was reason to hope that the pa…
Now we can create a quanteda corpus with the corpus function. As always, if you want to learn more about a function and its options, consult its help page with ?corpus.
For our example we have to tell quanteda that we are using a data.frame and which column of the data set contains the text:
sotu_corpus <- corpus(sotu_data, text_field = "text")
sotu_corpus
## Corpus consisting of 23,469 documents and 4 docvars.
## text1 :
## "I embrace with great satisfaction the opportunity which now ..."
##
## text2 :
## "In resuming your consultations for the general good you can ..."
##
## text3 :
## "Among the many interesting objects which will engage your at..."
##
## text4 :
## "A free people ought not only to be armed, but disciplined; t..."
##
## text5 :
## "The proper establishment of the troops which may be deemed i..."
##
## text6 :
## "There was reason to hope that the pacific measures adopted w..."
##
## [ reached max_ndoc ... 23,463 more documents ]
Instead of a csv file, other file formats can of course be imported as well, for example .txt, .pdf or .docx files.
url <- "https://github.com/phimeyer/phimeyer.github.io/raw/master/_teaching/files.zip"
texts <- readtext(url)
texts
## readtext object consisting of 2 documents and 0 docvars.
## # Description: df[,2] [2 × 2]
## doc_id text
## <chr> <chr>
## 1 document.docx "\"This is a \"..."
## 2 pdf_file.pdf "\"This is a \"..."
As you can see, the files were automatically downloaded, unpacked and converted to plain text.
We can of course also read texts from our local drive (in most cases this is the "normal" route, since we usually download the documents first). All we need to do is supply the path:
texts <- readtext("c:/pfad/zu/unseren/dateien")
texts <- readtext("/Benutzer/ich/Dokumente/dateien")
Most text analyses are based on the frequencies of words in documents. This is known as the bag-of-words assumption: texts are treated as bags containing individual words. Word order is ignored, and punctuation, stopwords and some very frequent words are typically removed in the first processing steps. Although this discards a lot of relevant information, the approach has proven powerful and efficient.
The standard format for representing a bag of words is a document-term matrix (DTM): a matrix in which rows are documents and columns represent words. We first create a small example DTM from a few lines of text, using the tokens function and the dfm function from quanteda. dfm stands for document-feature matrix:
text <- c(d1 = "Die Leibniz Universität Hannover macht gute Lehre.",
d2 = "Mit mehr Katzen wäre die Lehre noch besser!",
d3 = "Warum denn eine Katze? Ein Hund ist viel zutraulicher.",
d4 = "Wir wollen viele Katzen!")
text_tokens <- tokens(text)
dtm <- dfm(text_tokens)
dtm
## Document-feature matrix of: 4 documents, 28 features (70.54% sparse) and 0 docvars.
## features
## docs die leibniz universität hannover macht gute lehre . mit mehr
## d1 1 1 1 1 1 1 1 1 0 0
## d2 1 0 0 0 0 0 1 0 1 1
## d3 0 0 0 0 0 0 0 1 0 0
## d4 0 0 0 0 0 0 0 0 0 0
## [ reached max_nfeat ... 18 more features ]
With this matrix format we can now run analyses, such as analyzing different frames relating to cats, or computing the similarity between the third sentence and the first two sentences.
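As a minimal sketch of the similarity computation mentioned above: the example below rebuilds the small DTM and compares the documents with textstat_simil from quanteda.textstats. The choice of cosine similarity is our assumption; the tutorial does not name a particular measure.

```r
library(quanteda)
library(quanteda.textstats)

# Same four example sentences as in the tutorial
text <- c(d1 = "Die Leibniz Universität Hannover macht gute Lehre.",
          d2 = "Mit mehr Katzen wäre die Lehre noch besser!",
          d3 = "Warum denn eine Katze? Ein Hund ist viel zutraulicher.",
          d4 = "Wir wollen viele Katzen!")
dtm <- dfm(tokens(text))

# Cosine similarity between all pairs of documents (rows of the DTM)
sim <- as.matrix(textstat_simil(dtm, method = "cosine", margin = "documents"))

# Similarity of the third sentence to the first two
sim["d3", c("d1", "d2")]
```

Without any cleaning, d3 and d1 are only "similar" because both end in a full stop, while d3 and d2 share no token at all even though both are about cats. This is one reason why the pre-processing steps discussed next matter.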
Converting a text directly into a DFM is very simplistic. Note, for example, that the words "Katzen" and "Katze" get separate columns. To the matrix, "Katzen" and "Katze" are just as different as "Hannover" and "Lehre". That is a problem, because we are more interested in the fact that the sentences are about cats than in the specific word form; we want to analyze the content. In addition, text analyses tend to be most informative when less interesting words such as "die" or rare words such as "Universität" are ignored.
To achieve this, we have to add further processing steps. In the next example we create a DFM. Beforehand, we convert the texts into tokens, transform all letters to lowercase, and drop stopwords (tokens_remove(pattern = stopwords("de"))) and punctuation.
We also create a second DFM from our State of the Union addresses (see above) and additionally apply stemming there (stemming works less well for German, which is why our example sentences are not suited for it). Put simply, stemming reduces words to their stem. Some parts at the ends of words are removed, so that different forms of the same word are no longer distinguished, for example singular versus plural ("gun" or "gun-s") and different verb forms ("walk", "walk-ing", "walk-s").
# Example 1: our four sentences
text_tokens <- tokens(text, remove_punct=T) %>%
tokens_remove(pattern = stopwords("de")) %>%
tokens_tolower()
sätze_dtm <- dfm(text_tokens)
sätze_dtm
## Document-feature matrix of: 4 documents, 15 features (71.67% sparse) and 0 docvars.
## features
## docs leibniz universität hannover macht gute lehre mehr katzen wäre besser
## d1 1 1 1 1 1 1 0 0 0 0
## d2 0 0 0 0 0 1 1 1 1 1
## d3 0 0 0 0 0 0 0 0 0 0
## d4 0 0 0 0 0 0 0 1 0 0
## [ reached max_nfeat ... 5 more features ]
# Example 2: SOTU
sotu_tokens <- tokens(sotu_corpus, remove_punct=T) %>%
tokens_remove(pattern = stopwords("en")) %>%
tokens_tolower() %>%
tokens_wordstem()
sotu_dfm <- dfm(sotu_tokens)
sotu_dfm
## Document-feature matrix of: 23,469 documents, 20,201 features (99.82% sparse) and 4 docvars.
## features
## docs embrac great satisfact opportun now present congratul favor prospect
## text1 1 1 1 1 1 2 1 1 1
## text2 0 0 0 0 0 1 0 0 0
## text3 0 0 0 0 0 0 0 0 0
## text4 0 0 0 0 0 0 0 0 0
## text5 0 0 0 0 0 0 0 0 0
## text6 0 0 0 0 0 0 0 0 0
## features
## docs public
## text1 1
## text2 0
## text3 0
## text4 0
## text5 0
## text6 0
## [ reached max_ndoc ... 23,463 more documents, reached max_nfeat ... 20,191 more features ]
# Example 2: SOTU, alternative notation
sotu_dfm <- sotu_corpus %>%
tokens(remove_punct=T) %>%
tokens_remove(pattern = stopwords("en")) %>%
tokens_tolower() %>%
tokens_wordstem() %>%
dfm()
sotu_dfm
## Document-feature matrix of: 23,469 documents, 20,201 features (99.82% sparse) and 4 docvars.
## features
## docs embrac great satisfact opportun now present congratul favor prospect
## text1 1 1 1 1 1 2 1 1 1
## text2 0 0 0 0 0 1 0 0 0
## text3 0 0 0 0 0 0 0 0 0
## text4 0 0 0 0 0 0 0 0 0
## text5 0 0 0 0 0 0 0 0 0
## text6 0 0 0 0 0 0 0 0 0
## features
## docs public
## text1 1
## text2 0
## text3 0
## text4 0
## text5 0
## text6 0
## [ reached max_ndoc ... 23,463 more documents, reached max_nfeat ... 20,191 more features ]
The tokens_tolower function converts all tokens to lowercase, and tokens_wordstem applies stemming. To skip one of these steps, simply leave the function out of the chain.
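To see what these two functions do in isolation, here is a small sketch on a made-up English sentence (the sentence is ours, not part of the tutorial data). It assumes the English Snowball stemmer that tokens_wordstem uses when language = "english" is given.

```r
library(quanteda)

# Hypothetical toy input, chosen to mirror the "walk"/"gun" examples above
toks <- tokens("Walk walking walks Gun guns")

# Step 1: lowercase every token
toks_lower <- tokens_tolower(toks)

# Step 2: reduce every token to its stem
toks_stem <- tokens_wordstem(toks_lower, language = "english")

# Flatten to a plain character vector for inspection
stems <- unlist(as.list(toks_stem), use.names = FALSE)
stems
```

All verb forms collapse to "walk" and both noun forms to "gun", so they end up in the same DFM column.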
By the way, you do not have to write the commands as a pipeline (%>%). Alternatively, you can run the functions one at a time, reassigning the object at each step:
sotu_tokens <- tokens(sotu_corpus, remove_punct=T)
sotu_tokens <- tokens_remove(sotu_tokens, pattern = stopwords("en"))
sotu_tokens <- tokens_tolower(sotu_tokens)
sotu_tokens <- tokens_wordstem(sotu_tokens)
sotu_dfm <- dfm(sotu_tokens)
sotu_dfm
## Document-feature matrix of: 23,469 documents, 20,201 features (99.82% sparse) and 4 docvars.
## features
## docs embrac great satisfact opportun now present congratul favor prospect
## text1 1 1 1 1 1 2 1 1 1
## text2 0 0 0 0 0 1 0 0 0
## text3 0 0 0 0 0 0 0 0 0
## text4 0 0 0 0 0 0 0 0 0
## text5 0 0 0 0 0 0 0 0 0
## text6 0 0 0 0 0 0 0 0 0
## features
## docs public
## text1 1
## text2 0
## text3 0
## text4 0
## text5 0
## text6 0
## [ reached max_ndoc ... 23,463 more documents, reached max_nfeat ... 20,191 more features ]
The tokens_remove function is a bit trickier. If you look at its documentation (?tokens_remove), you will see that its pattern argument can be used to drop user-defined features or patterns. In this case we used another function, stopwords(), to supply a list of stopwords to remove.
stopwords('de')
## [1] "aber" "alle" "allem" "allen" "aller" "alles"
## [7] "als" "also" "am" "an" "ander" "andere"
## [13] "anderem" "anderen" "anderer" "anderes" "anderm" "andern"
## [19] "anderr" "anders" "auch" "auf" "aus" "bei"
## [25] "bin" "bis" "bist" "da" "damit" "dann"
## [31] "der" "den" "des" "dem" "die" "das"
## [37] "daß" "derselbe" "derselben" "denselben" "desselben" "demselben"
## [43] "dieselbe" "dieselben" "dasselbe" "dazu" "dein" "deine"
## [49] "deinem" "deinen" "deiner" "deines" "denn" "derer"
## [55] "dessen" "dich" "dir" "du" "dies" "diese"
## [61] "diesem" "diesen" "dieser" "dieses" "doch" "dort"
## [67] "durch" "ein" "eine" "einem" "einen" "einer"
## [73] "eines" "einig" "einige" "einigem" "einigen" "einiger"
## [79] "einiges" "einmal" "er" "ihn" "ihm" "es"
## [85] "etwas" "euer" "eure" "eurem" "euren" "eurer"
## [91] "eures" "für" "gegen" "gewesen" "hab" "habe"
## [97] "haben" "hat" "hatte" "hatten" "hier" "hin"
## [103] "hinter" "ich" "mich" "mir" "ihr" "ihre"
## [109] "ihrem" "ihren" "ihrer" "ihres" "euch" "im"
## [115] "in" "indem" "ins" "ist" "jede" "jedem"
## [121] "jeden" "jeder" "jedes" "jene" "jenem" "jenen"
## [127] "jener" "jenes" "jetzt" "kann" "kein" "keine"
## [133] "keinem" "keinen" "keiner" "keines" "können" "könnte"
## [139] "machen" "man" "manche" "manchem" "manchen" "mancher"
## [145] "manches" "mein" "meine" "meinem" "meinen" "meiner"
## [151] "meines" "mit" "muss" "musste" "nach" "nicht"
## [157] "nichts" "noch" "nun" "nur" "ob" "oder"
## [163] "ohne" "sehr" "sein" "seine" "seinem" "seinen"
## [169] "seiner" "seines" "selbst" "sich" "sie" "ihnen"
## [175] "sind" "so" "solche" "solchem" "solchen" "solcher"
## [181] "solches" "soll" "sollte" "sondern" "sonst" "über"
## [187] "um" "und" "uns" "unse" "unsem" "unsen"
## [193] "unser" "unses" "unter" "viel" "vom" "von"
## [199] "vor" "während" "war" "waren" "warst" "was"
## [205] "weg" "weil" "weiter" "welche" "welchem" "welchen"
## [211] "welcher" "welches" "wenn" "werde" "werden" "wie"
## [217] "wieder" "will" "wir" "wird" "wirst" "wo"
## [223] "wollen" "wollte" "würde" "würden" "zu" "zum"
## [229] "zur" "zwar" "zwischen"
stopwords('en')
## [1] "i" "me" "my" "myself" "we"
## [6] "our" "ours" "ourselves" "you" "your"
## [11] "yours" "yourself" "yourselves" "he" "him"
## [16] "his" "himself" "she" "her" "hers"
## [21] "herself" "it" "its" "itself" "they"
## [26] "them" "their" "theirs" "themselves" "what"
## [31] "which" "who" "whom" "this" "that"
## [36] "these" "those" "am" "is" "are"
## [41] "was" "were" "be" "been" "being"
## [46] "have" "has" "had" "having" "do"
## [51] "does" "did" "doing" "would" "should"
## [56] "could" "ought" "i'm" "you're" "he's"
## [61] "she's" "it's" "we're" "they're" "i've"
## [66] "you've" "we've" "they've" "i'd" "you'd"
## [71] "he'd" "she'd" "we'd" "they'd" "i'll"
## [76] "you'll" "he'll" "she'll" "we'll" "they'll"
## [81] "isn't" "aren't" "wasn't" "weren't" "hasn't"
## [86] "haven't" "hadn't" "doesn't" "don't" "didn't"
## [91] "won't" "wouldn't" "shan't" "shouldn't" "can't"
## [96] "cannot" "couldn't" "mustn't" "let's" "that's"
## [101] "who's" "what's" "here's" "there's" "when's"
## [106] "where's" "why's" "how's" "a" "an"
## [111] "the" "and" "but" "if" "or"
## [116] "because" "as" "until" "while" "of"
## [121] "at" "by" "for" "with" "about"
## [126] "against" "between" "into" "through" "during"
## [131] "before" "after" "above" "below" "to"
## [136] "from" "up" "down" "in" "out"
## [141] "on" "off" "over" "under" "again"
## [146] "further" "then" "once" "here" "there"
## [151] "when" "where" "why" "how" "all"
## [156] "any" "both" "each" "few" "more"
## [161] "most" "other" "some" "such" "no"
## [166] "nor" "not" "only" "own" "same"
## [171] "so" "than" "too" "very" "will"
This word list is passed to the pattern argument of tokens_remove() so that these words are dropped.
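Because pattern accepts any character vector, the stopword list can be extended or replaced by your own patterns. The sketch below is an illustration under our own assumptions: the example sentence and the extra stopword "viele" are made up, and the glob pattern "katze*" is just one possible user-defined pattern.

```r
library(quanteda)

# Hypothetical example sentence
toks <- tokens("Wir wollen viele Katzen und eine Katze!", remove_punct = TRUE)

# Extend the German stopword list with a custom word (assumption: "viele"
# is uninformative for our purposes)
custom_stop <- c(stopwords("de"), "viele")
toks_clean <- tokens_remove(toks, pattern = custom_stop)
kept_clean <- unlist(as.list(toks_clean), use.names = FALSE)

# Remove everything matching a glob pattern instead
toks_glob <- tokens_remove(toks, pattern = "katze*", valuetype = "glob")
kept_glob <- unlist(as.list(toks_glob), use.names = FALSE)

kept_clean
kept_glob
```

Matching is case-insensitive by default, which is why "Wir" is removed by the lowercase stopword "wir" and "Katzen" is caught by "katze*".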
There are further pre-processing techniques. For details, see Welbers, van Atteveldt and Benoit (2017).
Our DFM of the State of the Union addresses has 23,469 documents and 20,201 features (i.e. terms/words). Depending on the type of analysis we want to run, we may not need that many words. Fortunately, many of these roughly 20,000 features are not very informative, and removing uninformative words typically improves the results.
We can use the dfm_trim function to remove certain words. In this example we remove all words that occur fewer than 10 times in total (i.e. the column sum in the DFM).
dtm <- dfm_trim(sotu_dfm, min_termfreq = 10)
dtm
## Document-feature matrix of: 23,469 documents, 5,299 features (99.36% sparse) and 4 docvars.
## features
## docs embrac great satisfact opportun now present congratul favor prospect
## text1 1 1 1 1 1 2 1 1 1
## text2 0 0 0 0 0 1 0 0 0
## text3 0 0 0 0 0 0 0 0 0
## text4 0 0 0 0 0 0 0 0 0
## text5 0 0 0 0 0 0 0 0 0
## text6 0 0 0 0 0 0 0 0 0
## features
## docs public
## text1 1
## text2 0
## text3 0
## text4 0
## text5 0
## text6 0
## [ reached max_ndoc ... 23,463 more documents, reached max_nfeat ... 5,289 more features ]
So now we have only about 5,000 words left. See ?dfm_trim for more information and filtering options!
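As a small sketch of two of those other options, the example below trims by document frequency rather than total term frequency. The toy documents and the thresholds are our own assumptions, chosen only so that the effect is easy to follow.

```r
library(quanteda)

# Hypothetical toy documents
toy <- c(d1 = "apple banana apple",
         d2 = "apple cherry",
         d3 = "apple banana date")
toy_dfm <- dfm(tokens(toy))

# Keep only terms that occur in at least 2 documents
common <- dfm_trim(toy_dfm, min_docfreq = 2)

# Drop terms that occur in more than 80% of all documents
not_everywhere <- dfm_trim(toy_dfm, max_docfreq = 0.8, docfreq_type = "prop")

featnames(common)
featnames(not_everywhere)
```

The first call keeps "apple" and "banana"; the second drops "apple", which appears in every document, and keeps the rest.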
You have already met a few analysis methods. As a review, I will briefly present these and a few more.
Let's look at the most frequent words in the corpus using a word cloud:
textplot_wordcloud(sotu_dfm, max_words = 50) # top 50 words
textplot_wordcloud(sotu_dfm, max_words = 50, color = c('blue','red')) # change the colors
textstat_frequency(sotu_dfm, n = 10) # frequencies
## feature frequency rank docfreq group
## 1 state 9234 1 5580 all
## 2 govern 8581 2 5553 all
## 3 year 7250 3 4934 all
## 4 nation 6733 4 4847 all
## 5 congress 5689 5 4494 all
## 6 unit 5223 6 3715 all
## 7 can 4731 7 3628 all
## 8 countri 4664 8 3612 all
## 9 peopl 4477 9 3388 all
## 10 upon 4168 10 3004 all
We can also examine only part of our corpus, for example only the Obama speeches. To subset a DFM we can use the dfm_subset function or, as shown here, index its rows with a logical vector.
docvars(sotu_dfm) returns a data.frame with the document variables. docvars(sotu_dfm)$President returns the character vector with the presidents' names. With docvars(sotu_dfm)$President == 'Barack Obama' we test, for every document, whether the president was Obama. To make this explicit, we store the resulting logical vector, which is TRUE for the matching documents, as is_obama. We then use it to select those rows from the DFM.
is_obama <- docvars(sotu_dfm)$President == 'Barack Obama'
obama_dtm <- sotu_dfm[is_obama,]
textplot_wordcloud(obama_dtm) # this time without a max_words limit
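The same kind of selection can also be written with dfm_subset, which evaluates a condition directly on the document variables. The sketch below uses a made-up three-document corpus (texts and docvars are our own assumptions) and checks that both approaches select the same documents.

```r
library(quanteda)

# Hypothetical toy corpus with a President document variable
toy_corpus <- corpus(
  c(t1 = "jobs and colleges", t2 = "army and navy", t3 = "jobs for the people"),
  docvars = data.frame(President = c("Barack Obama", "George Washington", "Barack Obama"))
)
toy_dfm <- dfm(tokens(toy_corpus))

# Variant 1: logical row index, as in the tutorial
by_index <- toy_dfm[docvars(toy_dfm)$President == "Barack Obama", ]

# Variant 2: dfm_subset with a condition on the docvars
by_subset <- dfm_subset(toy_dfm, President == "Barack Obama")

docnames(by_index)
docnames(by_subset)
```

Which variant to use is mostly a matter of taste; the logical vector is handy when it is needed again later, as in the keyness comparison.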
Here we (again) use a comparison to obtain the is_obama vector. We then pass it to the textstat_keyness function, which identifies words that are over-represented in one group of documents, to indicate that we want to compare the Obama documents (where is_obama is TRUE) with all other documents (where is_obama is FALSE).
is_obama <- docvars(sotu_dfm)$President == 'Barack Obama'
sotus_obama_all <- textstat_keyness(sotu_dfm, is_obama)
head(sotus_obama_all, 20) ## view first 20 results
## feature chi2 p n_target n_reference
## 1 job 1933.3089 0 203 626
## 2 get 1204.4340 0 121 355
## 3 kid 609.4350 0 31 34
## 4 colleg 578.4565 0 57 160
## 5 tonight 496.7241 0 82 400
## 6 know 477.8215 0 108 695
## 7 small-busi 474.9019 0 14 3
## 8 america 439.8990 0 170 1644
## 9 afghan 422.3037 0 17 11
## 10 republican 395.0948 0 41 121
## 11 ceo 358.8193 0 9 0
## 12 cori 358.8193 0 9 0
## 13 american 355.8810 0 245 3385
## 14 laughter 348.2707 0 28 60
## 15 student 336.6490 0 41 144
## 16 clean 336.5539 0 35 103
## 17 innov 334.6970 0 27 58
## 18 trillion 320.6739 0 16 16
## 19 folk 314.4029 0 14 11
## 20 biden 314.1668 0 8 0
We can visualize the results with the textplot_keyness function:
textplot_keyness(sotus_obama_all)
A keyword-in-context listing shows a given keyword in the context in which it is used. In general, this is a good way to get to know a corpus. Because a DFM only knows word frequencies, the kwic function needs the tokens object created from the corpus, which preserves word order.
sotu_tokens <- tokens(sotu_corpus)
sotus_kwic <- kwic(sotu_tokens, 'freedom', window = 5)
head(sotus_kwic, 10) ## only view first 10 results
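To make the structure of a kwic result easier to see, here is a minimal sketch on two made-up sentences (our own assumption, not taken from the speeches). It shows that the result behaves like a data.frame with one row per match.

```r
library(quanteda)

# Hypothetical toy texts
toy <- c(a = "We will defend freedom at home and abroad.",
         b = "Freedom is not free.")

# Two tokens of context on each side of every match
kw <- kwic(tokens(toy), pattern = "freedom", window = 2)
kw

# One row per match; matching is case-insensitive by default
n_matches <- nrow(kw)
keywords <- kw$keyword
```

Note that "free" in the second text is not matched, because the pattern is the whole word "freedom" and not a wildcard such as "free*".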
The kwic function can also be used to focus an analysis on a specific search term. We can use its output to create a new DFM, so that only the words inside the displayed window are included in the matrix. The following code creates a DFM containing only the words that occur within 10 words before or after the term terror* (terrorism, terrorist, terror, etc.):
sotus_terror <- kwic(sotu_tokens, 'terror*', window = 10)
sotus_terror_corp <- corpus(sotus_terror)
sotus_terror_tokens <- tokens(sotus_terror_corp, remove_punct=T) %>%
tokens_remove(pattern = stopwords("en")) %>%
tokens_tolower() %>%
tokens_wordstem()
sotus_terror_dtm <- dfm(sotus_terror_tokens)
textplot_wordcloud(sotus_terror_dtm)
What would this chain of commands look like as a single pipeline? Give it a try!
With quanteda we can easily run a dictionary search/analysis. We will go into more detail in session 6. For now, the following example shows how to bring dictionary terms in while building a DFM. Don't forget that dictionaries are only meaningful if they were built on a theoretical basis and checked for validity.
sotu_tokens <- tokens(sotu_corpus, remove_punct=T) %>%
tokens_remove(pattern = stopwords("en")) %>%
tokens_tolower() # no stemming, for demonstration purposes
sotu_dfm <- dfm(sotu_tokens)
sotu_dfm
## Document-feature matrix of: 23,469 documents, 31,302 features (99.88% sparse) and 4 docvars.
## features
## docs embrace great satisfaction opportunity now presents congratulating
## text1 1 1 1 1 1 1 1
## text2 0 0 0 0 0 0 0
## text3 0 0 0 0 0 0 0
## text4 0 0 0 0 0 0 0
## text5 0 0 0 0 0 0 0
## text6 0 0 0 0 0 0 0
## features
## docs present favorable prospects
## text1 1 1 1
## text2 1 0 0
## text3 0 0 0
## text4 0 0 0
## text5 0 0 0
## text6 0 0 0
## [ reached max_ndoc ... 23,463 more documents, reached max_nfeat ... 31,292 more features ]
sotus_dict <- dictionary(list(terrorism = c("terror*", "bomb*","violence"),
economy = c("economy", "tax", "job"),
military = c("army","navy","milit*","airforce","soldier*"),
freedom = c("freedom","liberty", "democra*")))
sotus_dict_dtm <- dfm_lookup(sotu_dfm, dictionary = sotus_dict)
sotus_dict_dtm
## Document-feature matrix of: 23,469 documents, 4 features (94.63% sparse) and 4 docvars.
## features
## docs terrorism economy military freedom
## text1 0 0 0 0
## text2 0 0 0 0
## text3 0 0 0 0
## text4 0 0 1 0
## text5 0 1 1 0
## text6 0 0 0 0
## [ reached max_ndoc ... 23,463 more documents ]
On this basis we can now run our analyses.
keyness_obama <- textstat_keyness(sotus_dict_dtm, docvars(sotus_dict_dtm)$President == "Barack Obama")
textplot_keyness(keyness_obama)
As you can easily see, this dictionary is not very informative. That is fine, since it only serves demonstration purposes here.
We can also convert the DFM to a data frame, which gives us the count of each concept per document (which could then, for example, be matched with survey data).
sotus_dict_df <- convert(sotus_dict_dtm, to="data.frame")
head(sotus_dict_df)
## doc_id terrorism economy military freedom
## 1 text1 0 0 0 0
## 2 text2 0 0 0 0
## 3 text3 0 0 0 0
## 4 text4 0 0 1 0
## 5 text5 0 1 1 0
## 6 text6 0 0 0 0
A good dictionary is one where all documents that match the dictionary actually deal with or contain the intended concept.
To check this, we can manually code a sample of documents and compare the results with the dictionary hits. More on this in session 6!
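A rough sketch of such a comparison in base R is shown below. Both vectors are invented for illustration: manual stands for a hypothetical hand coding of eight sampled documents, dict_hit for whether the dictionary matched each of them. Precision and recall are one common way to summarize the agreement; the tutorial itself does not prescribe a measure.

```r
# Hypothetical hand coding: does the document really deal with the concept?
manual   <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
# Hypothetical dictionary result: did the dictionary match the document?
dict_hit <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)

tp <- sum(dict_hit & manual)    # matched and truly about the concept
fp <- sum(dict_hit & !manual)   # matched but not about the concept
fn <- sum(!dict_hit & manual)   # about the concept but missed

precision <- tp / (tp + fp)     # share of matches that are correct
recall    <- tp / (tp + fn)     # share of relevant documents that were found

c(precision = precision, recall = recall)
```

In this invented sample the dictionary reaches a precision of 0.6 and a recall of 0.75; low precision corresponds to the problem described above, namely matches that are not actually about the concept.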
We can also apply the keyword-in-context function to a dictionary entry to quickly inspect a set of matches and see whether they make sense:
kwic(tokens(sotu_corpus), sotus_dict$terrorism)
## Keyword-in-context with 501 matches.
## [text129, 109] produced symptoms of riot and | violence |
## [text220, 31] our citizens from injustice and | violence |
## [text223, 102] , ambition, avarice and | violence |
## [text294, 73] our pacific policy, the | violence |
## [text408, 7] from these unpleasant views of | violence |
## [text601, 62] by adverse weather of unusual | violence |
## [text607, 36] that they would experience whatever | violence |
## [text631, 3] This increased | violence |
## [text735, 80] , will henceforth lose their | terror |
## [text1328, 108] of honest commerce seized with | violence |
## [text1433, 236] has disarmed revolution of its | terrors |
## [text1446, 31] , it would be doing | violence |
## [text1579, 46] any recurrence of a similar | violence |
## [text1614, 86] which it has spread its | terrors |
## [text1791, 95] , she would but add | violence |
## [text1803, 73] the land with anarchy and | violence |
## [text1810, 55] has wantonly produced, the | violence |
## [text1947, 11] for losses sustained at the | bombardment |
## [text2097, 137] be readily executed without doing | violence |
## [text2136, 265] the commission of acts of | violence |
## [text2159, 73] , and all but produced | violence |
## [text2174, 363] for their security against external | violence |
## [text2288, 152] of adverse circumstances or the | violence |
## [text2394, 119] against invasion from without and | violence |
## [text2448, 106] commission of any acts of | violence |
## [text2820, 36] sure safeguard against force and | violence |
## [text3057, 184] military despotism or of popular | violence |
## [text3087, 62] are a source of constant | terror |
## [text3403, 198] , effectually, to prevent | violence |
## [text3404, 90] but treated with rudeness and | violence |
## [text3406, 141] a time specified he would | bombard |
## [text3406, 301] to protest against the contemplated | bombardment |
## [text3407, 26] for" a resort to | violence |
## [text3505, 63] against either invasion or domestic | violence |
## [text3527, 16] . It was attacked with | violence |
## [text3527, 176] reactionary effect of their own | violence |
## [text3542, 133] be only aggravated by their | violence |
## [text3561, 135] gravity of the acts of | violence |
## [text3561, 165] been seemingly filled with extreme | violence |
## [text3562, 35] . But incidents of actual | violence |
## [text3565, 74] have legislated otherwise without doing | violence |
## [text3567, 114] by improper influences, by | violence |
## [text3609, 13] recurrence of scenes of lawless | violence |
## [text3669, 127] be interrupted by fraud or | violence |
## [text3770, 72] foreign residents, against lawless | violence |
## [text3775, 120] A state of anarchy and | violence |
## [text3781, 49] obstructed or closed by lawless | violence |
## [text3781, 102] their progress and to lawless | violence |
## [text3833, 249] step further and attempt by | violence |
## [text3872, 131] A state of lawlessness and | violence |
## [text4209, 89] servile insurrection or tendency to | violence |
## [text4211, 56] protected against invasion and domestic | violence |
## [text4212, 27] element against whose hostility and | violence |
## [text4388, 29] general insecurity, by the | terror |
## [text4406, 81] occasionally committed acts of barbarous | violence |
## [text4545, 200] opposition to all strife, | violence |
## [text4562, 73] the elective franchise has by | violence |
## [text4672, 138] the sufferers by this lawless | violence |
## [text4690, 39] for opinion's sake, personal | violence |
## [text4795, 138] committed deeds of blood and | violence |
## [text4986, 32] to aid in suppressing domestic | violence |
## [text4987, 29] determination, by acts of | violence |
## [text4987, 102] enough were committed to spread | terror |
## [text4989, 364] protect the State against domestic | violence |
## [text4990, 239] , and to do no | violence |
## [text4990, 259] in ignoring the existence of | violence |
## [text4990, 338] ? They can not. | Violence |
## [text5142, 79] been exempt from acts of | violence |
## [text5195, 137] every instance of lawlessness and | violence |
## [text5243, 108] in the suppression of domestic | violence |
## [text5322, 103] first to some acts of | violence |
## [text5461, 55] large extent in acts of | violence |
## [text5697, 24] United States against" domestic | violence |
## [text6032, 9] Puritan, Amphitrite, and | Terror |
## [text6177, 9] the double-turreted monitors Puritan, | Terror |
## [text6381, 44] upon Chinese laborers and domestic | violence |
## [text6483, 29] Territories, and acts of | violence |
## [text6483, 59] Alaska. Much of this | violence |
## [text6590, 41] been the cause of constant | terror |
## [text6981, 61] to protect him from anticipated | violence |
## [text7009, 66] locations would result in much | violence |
## [text7104, 135] and to persecutions and personal | violence |
## [text7152, 92] secure minority control, while | violence |
## [text7155, 69] the product of fraud or | violence |
## [text7156, 35] be supplanted by intimidation and | violence |
## [text7181, 91] as an outbreak of mob | violence |
## [text7244, 99] happily free from incidents of | violence |
## [text7370, 142] reputation by many crimes of | violence |
## [text7456, 6] Neither Indian outbreaks nor domestic | violence |
## [text7457, 48] protect their citizens from domestic | violence |
## [text7489, 28] , and the coast-defense monitors | Terror |
## [text7508, 165] the mad scramble, the | violence |
## [text7739, 102] strike, accompanied by much | violence |
## [text8210, 39] May 30 Commodore Schley's squadron | bombarded |
## [text8224, 23] perilous undertakings in blockade and | bombardment |
## [text8547, 170] an era of misery and | violence |
## [text8606, 46] upon, which stopped the | bombardment |
## [text8833, 15] that to strike with ignorant | violence |
## [text8837, 129] body politic of crimes of | violence |
## [text8961, 117] and for the suppression of | violence |
## [text9126, 187] a time of disorder and | violence |
## [text9126, 441] corporation. Of course any | violence |
## [text9126, 528] circumstances the right to commit | violence |
## [text9223, 142] . The peace of tyrannous | terror |
## [text9262, 187] to perform deeds of murderous | violence |
## [text9275, 118] ; ( 5 ) the | bombardment |
## [text9361, 125] life or property by mob | violence |
## [text9362, 99] will enjoin any resort to | violence |
## [text9368, 38] epidemic of lynching and mob | violence |
## [text9374, 414] the form of that brutal | violence |
## [text9376, 472] burdens of class hatred, | violence |
## [text9377, 71] agitator on a platform of | violence |
## [text9421, 258] time perform acts of lawless | violence |
## [text9441, 91] for the insurrectionary or international | violence |
## [text9497, 102] agitator who incites to brutal | violence |
## [text9635, 2] The | violence |
## [text9833, 101] that much of the lawless | violence |
## [text9952, 108] United States, freedom from | violence |
## [text10314, 89] other. Action to suppress | violence |
## [text10554, 41] devastation of property, the | bombardment |
## [text10922, 55] -a peace secure against the | violence |
## [text10929, 300] in saving from the German | terror |
## [text10970, 137] unrest which manifest themselves in | violence |
## [text10975, 98] , with its blood and | terror |
## [text11289, 15] dealing with other countries by | terror |
## [text11423, 82] should be protected from all | violence |
## [text11423, 99] labor. Those who do | violence |
## [text11500, 103] crime of lynching. Although | violence |
## [text11601, 189] years these acts of unlawful | violence |
## [text11997, 32] system and would cushion the | violence |
## [text12376, 123] prevailing mental attitude with the | terror |
## [text12622, 54] with the crash of a | bomb |
## [text12658, 45] This includes 45,000 combat planes- | bombers |
## [text12658, 48] combat planes- bombers, dive | bombers |
## [text12680, 22] " suicide" squadrons of | bombing |
## [text12680, 34] only in the hope of | terrorizing |
## [text12689, 27] half long years have withstood | bombs |
## [text12703, 72] own home islands, and | bomb |
## [text12710, 28] day out our forces are | bombing |
## [text12716, 30] as they did when they | bombed |
## [text12748, 49] had to continue work through | bombings |
## [text12784, 34] trickery, deceit, or | violence |
## [text12933, 44] having been crushed by the | terror |
## [text12941, 78] frontiers of Germany and dropping | bombs |
## [text12949, 13] of the Nazi-Fascist reign of | terror |
## [text12978, 59] same time, despite ferocious | bombardment |
## [text13001, 17] from which our Super fortresses | bomb |
## [text13037, 29] ammunition, cotton duck, | bombs |
## [text13038, 4] Navy production of | bombardment |
## [text13095, 44] having been crushed by the | terror |
## [text13133, 13] of the Nazi-Fascist reign of | terror |
## [text13152, 13] of the Nazi-Fascist reign of | terror |
## [text13155, 33] the end of the Nazi-Fascist | terror |
## [text13239, 34] and industrial facilities had been | bombed |
## [text14196, 60] the enemy's supply of atomic | bombs |
## [text14288, 75] the shadow of the atomic | bomb |
## [text14343, 41] unlimited; a world where | terror |
## [text14381, 15] imperialism have reached heights of | violence |
## [text14386, 15] sole possessor of the atomic | bomb |
## [text14631, 37] services in many places where | violence |
## [text14658, 35] this government by force or | violence |
## [text14911, 15] in emphasis from reliance on | violence |
## [text14911, 20] violence and the threat of | violence |
## [text15116, 34] world but an Age of | Terror |
## [text15130, 22] our striking power, our | bombers |
## [text15133, 75] power of our increasingly efficient | bombers |
## [text15273, 25] War II.We are buying certain | bombers |
## [text15420, 15] , unmatched today in manned | bombers |
## [text15519, 13] of China during the all-out | bombardment |
## [text15538, 37] our B-58 Medium Range Jet | Bomber |
## [text15538, 44] our B-52 Long Range Jet | Bomber |
## [text15544, 98] in panic. The" | bomber |
## [text15723, 31] of science instead of its | terrors |
## [text15799, 65] percent the number of manned | bombers |
## [text15822, 26] is increasing his tactics of | terror |
## [text16034, 8] has often brought pain and | violence |
## [text16341, 37] the South. Attack and | terror |
## [text16347, 34] Vietnamese allies have dropped no | bombs |
## [text16350, 45] it in free elections without | violence |
## [text16350, 48] elections without violence, without | terror |
## [text16361, 42] they can amidst the uncertain | terrors |
## [text16461, 41] control the crime and the | violence |
## [text16492, 52] reject the fool's gold of | violence |
## [text16532, 28] the use of force and | terror |
## [text16537, 30] And this means reducing the | terrorism |
## [text16542, 80] were spending on bullets and | bombs |
## [text16569, 72] under the constant threat of | violence |
## [text16572, 26] must be answered before the | bombing |
## [text16573, 23] which said: - The | bombing |
## [text16604, 39] times the national average - | Violence |
## [text16649, 44] frustrations into achievements. But | violence |
## [text16650, 11] by attacking the causes of | violence |
## [text16652, 9] disorder and those who preach | violence |
## [text16857, 50] deafened by noise, and | terrorized |
## [text17027, 61] frustration that led finally to | violence |
## [text17095, 8] recover from the turmoil and | violence |
## [text17840, 21] the tide of crime and | violence |
## [text17844, 44] , the riots, urban | terrorism |
## [text17847, 37] hijacking, kidnapping, or | bombing |
## [text17998, 108] to protect ourselves from the | violence |
## [text18351, 66] launching submarine; the B-1 | bomber |
## [text18465, 33] , to resolve conflicts without | violence |
## [text19290, 7] extend the effectiveness of our | bomber |
## [text19678, 18] of legislation to decrease domestic | violence |
## [text20109, 17] captive, innocent victims of | terrorism |
## [text20109, 49] two acts one of international | terrorism |
## [text20114, 75] in condemning this act of | violence |
## [text20146, 13] have no outlet except through | violence |
## [text20423, 7] created an office on domestic | violence |
## [text20423, 19] agencies that now have domestic | violence |
## [text20428, 33] help the victims of domestic | violence |
## [text20788, 43] region of both repression and | terrorism |
## [text20991, 45] Toward those who would export | terrorism |
## [text20997, 91] peaceful change or disorder and | violence |
## [text21139, 90] like sexual abuse and family | violence |
## [text21150, 102] peace in Lebanon by state-sponsored | terrorism |
## [text21150, 132] legislative proposals to help combat | terrorism |
## [text21238, 35] and provides bases for Communist | terrorists |
## [text21252, 31] Seeing another girl freeze in | terror |
## [text21259, 98] increase in espionage and state | terror |
## [text21265, 160] from the prison of nuclear | terror |
## [text21288, 247] nor will we yield to | terrorist |
## [text21309, 81] of both totalitarianism and nuclear | terror |
## [text21349, 22] a future free of nuclear | terror |
## [text21457, 52] appalled at the recent mail | bombings |
## [text21484, 54] through tragic and despicable environmental | terrorism |
## [text21490, 44] been deeply concerned by the | violence |
## [text21490, 129] and a move away from | violence |
## [text21547, 14] 35 years, our strategic | bombers |
## [text21549, 88] substantial portion of our strategic | bombers |
## [text21564, 92] of the B - 2 | bombers |
## [text21651, 16] against the violent crime which | terrorizes |
## [text21685, 47] . There's still too much | violence |
## [text21726, 63] work together to stop the | violence |
## [text21733, 46] cripple the world's cities with | terror |
## [text21734, 8] , we secured indictments against | terrorists |
## [text21748, 10] take serious steps to reduce | violence |
## [text21752, 101] this campaign to reduce gun | violence |
## [text21754, 8] Americans, the problem of | violence |
## [text21754, 69] penalties for those who choose | violence |
## [text21754, 134] which has been filled by | violence |
## [text21772, 20] in the United States of | terrorist |
## [text21772, 64] a global effort to combat | terrorism |
## [text21772, 75] future to be marred by | terror |
## [text21839, 54] destroying the missiles and the | bombers |
## [text21841, 15] strengthen our hand in combating | terrorists |
## [text21841, 29] . As the cowards who | bombed |
## [text21841, 42] this country will hunt down | terrorists |
## [text21842, 7] this week, another horrendous | terrorist |
## [text21842, 67] go forward. But the | terrorists |
## [text21851, 55] incessant, repetitive, mindless | violence |
## [text21873, 23] children now tell their parents | violence |
## [text21901, 26] the deadly scourge of domestic | violence |
## [text21924, 7] , to reduce crime and | violence |
## [text21931, 17] . Think of them: | terrorism |
## [text21939, 34] can intensify the fight against | terrorists |
## [text21939, 54] proposed after the Oklahoma City | bombing |
## [text22036, 92] drug traffickers and to stop | terrorists |
## [text22037, 40] will help us to fight | terrorism |
## [text22040, 58] conflicts that fuel fanaticism and | terror |
## [text22042, 20] in ugly words and awful | violence |
## [text22042, 26] , in burned churches and | bombed |
## [text22052, 74] axis of new threats from | terrorists |
## [text22105, 19] and the outlaw states, | terrorists |
## [text22107, 36] a weapon of war and | terror |
## [text22135, 16] to destroy its weapons of | terror |
## [text22152, 33] woman called" the stark | terror |
## [text22215, 24] dangers from outlaw nations and | terrorism |
## [text22215, 50] Usama bin Ladin's network of | terror |
## [text22215, 53] network of terror. The | bombing |
## [text22216, 6] We must work to keep | terrorists |
## [text22219, 51] Force, flew a B-1B | bomber |
## [text22224, 26] but still held back by | violence |
## [text22228, 21] we pursue peace, fight | terrorism |
## [text22233, 13] lost their lives to school | violence |
## [text22241, 48] closed either. Discrimination or | violence |
## [text22259, 16] march of technology from giving | terrorists |
## [text22259, 54] can also make weapons of | terror |
## [text22319, 13] worries about the impact of | violence |
## [text22339, 6] When Slobodan Milosevic unleashed his | terror |
## [text22341, 45] , the narcotraffickers and the | terrorists |
## [text22366, 97] fight teen pregnancy, prevent | violence |
## [text22371, 34] ask you to reauthorize the | Violence |
## [text22410, 59] homelessness and addiction and domestic | violence |
## [text22436, 29] certain. They range from | terrorists |
## [text22436, 33] from terrorists who threaten with | bombs |
## [text22449, 8] Afghanistan are now allies against | terror |
## [text22451, 16] coalition partners, hundreds of | terrorists |
## [text22451, 28] tens of thousands of trained | terrorists |
## [text22451, 65] so long as nations harbor | terrorists |
## [text22452, 10] to prevent regimes that sponsor | terror |
## [text22453, 4] Our war on | terror |
## [text22453, 50] we stop now, leaving | terror |
## [text22453, 54] leaving terror camps intact and | terrorist |
## [text22459, 49] the world of thousands of | terrorists |
## [text22459, 53] of terrorists, destroyed Afghanistan's | terrorist |
## [text22460, 12] our Embassy in Kabul. | Terrorists |
## [text22460, 25] at Guantanamo Bay. And | terrorist |
## [text22462, 64] are winning the war on | terror |
## [text22465, 18] there, our war against | terror |
## [text22466, 26] , we will shut down | terrorist |
## [text22466, 30] down terrorist camps, disrupt | terrorist |
## [text22466, 35] terrorist plans, and bring | terrorists |
## [text22466, 46] , we must prevent the | terrorists |
## [text22467, 6] Our military has put the | terror |
## [text22467, 27] a dozen countries. A | terrorist |
## [text22468, 34] armed forces to go after | terrorist |
## [text22468, 56] the Bosnian Government, seized | terrorists |
## [text22468, 61] terrorists who were plotting to | bomb |
## [text22468, 83] weapons and the establishment of | terrorist |
## [text22469, 14] our call and eliminate the | terrorist |
## [text22469, 36] is now cracking down on | terror |
## [text22469, 58] timid in the face of | terror |
## [text22471, 8] pursues these weapons and exports | terror |
## [text22472, 12] toward America and to support | terror |
## [text22473, 6] States like these and their | terrorist |
## [text22473, 45] could provide these arms to | terrorists |
## [text22474, 10] with our coalition to deny | terrorists |
## [text22494, 68] applied to our war against | terrorism |
## [text22504, 46] live free from poverty and | violence |
## [text22506, 48] world beyond the war on | terror |
## [text22507, 80] demonstrate that the forces of | terror |
## [text22515, 21] countries have uncovered and stopped | terrorist |
## [text22516, 4] Our war against | terror |
## [text22516, 68] not permit the triumph of | violence |
## [text22517, 10] danger in the war on | terror |
## [text22517, 45] such weapons for blackmail, | terror |
## [text22517, 60] or sell those weapons to | terrorist |
## [text22527, 30] job. After recession, | terrorist |
## [text22555, 33] the manmade evil of international | terrorism |
## [text22556, 16] news about the war on | terror |
## [text22557, 46] Persian Gulf who planned the | bombings |
## [text22557, 102] , more than 3,000 suspected | terrorists |
## [text22558, 4] We have the | terrorists |
## [text22558, 21] One by one, the | terrorists |
## [text22561, 23] to track and disrupt the | terrorists |
## [text22561, 68] of Defense to develop a | Terrorist |
## [text22563, 24] gain the ultimate weapons of | terror |
## [text22566, 29] mass destruction, and supports | terror |
## [text22568, 41] aggression, with ties to | terrorism |
## [text22576, 41] of enriching uranium for a | bomb |
## [text22578, 68] Saddam Hussein aids and protects | terrorists |
## [text22578, 90] of his hidden weapons to | terrorists |
## [text22579, 27] lethal viruses, and shadowy | terrorist |
## [text22580, 17] imminent. Since when have | terrorists |
## [text22583, 90] , and its links to | terrorist |
## [text22593, 7] part of the offensive against | terror |
## [text22593, 19] regimes that harbor and support | terrorists |
## [text22594, 20] in their power to spread | violence |
## [text22596, 62] , confront the allies of | terror |
## [text22602, 24] world in the war on | terror |
## [text22603, 12] and intelligence officers are tracking | terrorist |
## [text22605, 34] to the dangerous illusion that | terrorists |
## [text22607, 80] , and Baghdad. The | terrorists |
## [text22608, 53] better share information to track | terrorists |
## [text22608, 97] even more important for hunting | terrorists |
## [text22609, 15] expire next year. The | terrorist |
## [text22610, 8] on the offensive against the | terrorists |
## [text22610, 51] brought the capture of the | terrorist |
## [text22610, 126] , we will bring these | terrorists |
## [text22611, 109] free and proud and fighting | terror |
## [text22613, 40] killers, joined by foreign | terrorists |
## [text22620, 45] and win the war on | terror |
## [text22621, 19] at all. They view | terrorism |
## [text22621, 71] was not settled. The | terrorists |
## [text22621, 110] with legal papers. The | terrorists |
## [text22627, 25] have come through recession and | terrorist |
## [text22665, 101] that respects women and rejects | violence |
## [text22666, 12] commitment of the war on | terror |
## [text22667, 112] captured or detained Al Qaida | terrorists |
## [text22678, 97] border to drug dealers and | terrorists |
## [text22696, 41] focused the FBI on preventing | terrorism |
## [text22696, 52] intelligence agencies, broken up | terror |
## [text22697, 30] continuing. The Al Qaida | terror |
## [text22697, 58] governments that sponsor and harbor | terrorists |
## [text22697, 91] is still the target of | terrorists |
## [text22698, 47] be the recruiting grounds for | terror |
## [text22698, 51] for terror, and that | terror |
## [text22698, 74] the rise of tyranny and | terror |
## [text22698, 97] and that is why the | terrorist |
## [text22701, 23] to break old patterns of | violence |
## [text22701, 73] help the Palestinian people end | terror |
## [text22702, 29] fight the common threat of | terror |
## [text22703, 18] regimes that continue to harbor | terrorists |
## [text22703, 39] Lebanon to be used by | terrorists |
## [text22703, 75] to end all support for | terror |
## [text22703, 93] world's primary state sponsor of | terror |
## [text22703, 141] and end its support for | terror |
## [text22704, 35] front in the war on | terror |
## [text22704, 41] , which is why the | terrorists |
## [text22704, 58] women in uniform are fighting | terrorists |
## [text22704, 89] ally in the war on | terror |
## [text22708, 2] The | terrorists |
## [text22708, 19] attack it. Yet the | terrorists |
## [text22708, 36] is seeing that the car | bombers |
## [text22711, 83] because that would embolden the | terrorists |
## [text22713, 55] be on the frontline against | terror |
## [text22717, 76] our country. Dictatorships shelter | terrorists |
## [text22717, 112] and join the fight against | terror |
## [text22718, 60] And third, we're striking | terrorist |
## [text22719, 54] comrade killed by a roadside | bomb |
## [text22721, 107] and despair are sources of | terrorism |
## [text22722, 33] been kept informed. The | terrorist |
## [text22722, 39] surveillance program has helped prevent | terrorist |
## [text22731, 46] faith into an ideology of | terror |
## [text22731, 50] of terror and death. | Terrorists |
## [text22732, 35] challenge us directly, the | terrorists |
## [text22732, 65] a bound captive, the | terrorists |
## [text22734, 60] remain on the offensive against | terror |
## [text22735, 19] a National Assembly are fighting | terror |
## [text22736, 66] been relentless in shutting off | terrorist |
## [text22741, 4] Our offensive against | terror |
## [text22741, 19] only way to defeat the | terrorists |
## [text22742, 53] Israel, disarm, reject | terrorism |
## [text22743, 32] regime in that country sponsors | terrorists |
## [text22746, 10] remain on the offensive against | terrorism |
## [text22747, 83] - I have authorized a | terrorist |
## [text22748, 10] - from the disruption of | terror |
## [text22791, 5] Every success against the | terrorists |
## [text22793, 4] The war on | terror |
## [text22793, 71] council on the war on | terror |
## [text22810, 79] drug smugglers and criminals and | terrorists |
## [text22811, 47] to hostile regimes and to | terrorists |
## [text22817, 42] felt the sorrow that the | terrorists |
## [text22817, 84] a glimpse of what the | terrorists |
## [text22818, 48] to win the war on | terror |
## [text22819, 45] long over. For the | terrorists |
## [text22820, 70] broke up a Southeast Asian | terror |
## [text22820, 143] their lives to finding the | terrorists |
## [text22821, 6] In the mind of the | terrorists |
## [text22821, 89] , instruct with bullets and | bombs |
## [text22822, 36] country. By killing and | terrorizing |
## [text22822, 79] this warning from the late | terrorist |
## [text22823, 74] which is funding and arming | terrorists |
## [text22826, 58] kill us. What every | terrorist |
## [text22827, 72] people of Afghanistan defied the | terrorists |
## [text22828, 43] the Cedar Revolution. Hizballah | terrorists |
## [text22830, 66] ally in the war on | terror |
## [text22831, 17] Government must stop the sectarian | violence |
## [text22831, 97] city by chasing down the | terrorists |
## [text22831, 115] Province, where Al Qaida | terrorists |
## [text22831, 144] with orders to find the | terrorists |
## [text22833, 52] regime. A contagion of | violence |
## [text22853, 28] of America that is confronting | violence |
## [text22853, 30] that is confronting violence and | terror |
## [text22856, 104] where they will fight the | terrorists |
## [text22884, 71] and Madrid ripped apart by | bombs |
## [text22884, 109] of liberty is opposed by | terrorists |
## [text22885, 13] taken the fight to these | terrorists |
## [text22886, 15] the 21st century. The | terrorists |
## [text22886, 33] Yet in this war on | terror |
## [text22886, 64] their own destinies will reject | terror |
## [text22886, 77] And that is why the | terrorists |
## [text22887, 5] In Iraq, the | terrorists |
## [text22887, 94] strongholds, and deny the | terrorists |
## [text22888, 47] neighborhoods, clearing out the | terrorists |
## [text22889, 64] citizens who are fighting the | terrorists |
## [text22890, 42] many said that containing the | violence |
## [text22890, 51] A year later, high-profile | terrorist |
## [text22893, 50] working, but among the | terrorists |
## [text22897, 57] and a marked increase in | violence |
## [text22898, 117] and the pain of sectarian | violence |
## [text22899, 72] , a partner in fighting | terror |
## [text22900, 17] strengthen Iran, and give | terrorists |
## [text22901, 31] President who recognizes that confronting | terror |
## [text22902, 57] in Iraq, supporting Hizballah | terrorists |
## [text22903, 87] , cease your support for | terror |
## [text22904, 115] and night to stop the | terrorists |
## [text22905, 37] is the ability to monitor | terrorist |
## [text22905, 50] need to know who the | terrorists |
## [text22905, 103] , our ability to track | terrorist |
## [text22977, 33] because I will not allow | terrorists |
## [text22980, 59] and certain justice for captured | terrorists |
## [text22982, 36] the 21st century - from | terrorism |
## [text22997, 133] fall into the hands of | terrorists |
## [text23065, 22] renewed our focus on the | terrorists |
## [text23165, 28] combat patrols have ended, | violence |
## [text23166, 47] try to inspire acts of | violence |
## [text23170, 61] fall into the hands of | terrorists |
## [text23204, 77] , who worked on a | bomber |
## [text23281, 55] . We will stand against | violence |
## [text23299, 39] from the fear of domestic | violence |
## [text23299, 47] the Senate passed the" | Violence |
## [text23302, 99] detention, and prosecution of | terrorists |
## [text23361, 99] who take the fight to | terrorists |
## [text23361, 126] take direct action against those | terrorists |
## [text23377, 49] debated how to reduce gun | violence |
## [text23379, 29] been torn apart by gun | violence |
## [text23379, 90] communities ripped open by gun | violence |
## [text23380, 10] prevent every senseless act of | violence |
## [text23393, 84] not: our resolve that | terrorists |
## [text23394, 34] Bay. Because we counter | terrorism |
## [text23450, 10] for the lives that gun | violence |
## [text23452, 77] that rejects the agenda of | terrorist |
## [text23453, 102] fought, not those that | terrorists |
## [text23454, 9] we actively and aggressively pursue | terrorist |
## [text23457, 27] future free of dictatorship, | terror |
## [text23458, 76] Iran is not building a | bomb |
## [text23459, 20] eyed about Iran's support for | terrorist |
## [text23459, 81] is not building a nuclear | bomb |
## [text23467, 19] killed by a massive roadside | bomb |
##
## .
## at sea, we have
## have been so long unrestrained
## and injustice of others may
## and wrong, I congratulate
## and continuance and such the
## might be committed on the
## is best explained by the
## . Fortifications in those quarters
## , and even plundered under
## . Not withstanding the strong
## to my feelings were I
## and protect our citizens in
## . Not with standing this
## to injustice, and could
## .
## of which it has been
## of Antwerp have been presented
## to public opinion.
## on the members thereof,
## and bloodshed. The imprudent
## and internal dissensions, and
## of unmerited denunciation. The
## from within. The rest
## or the manifestation of a
## . It is a subject
## . The law is the
## and annoyance to the inhabitants
## and bloodshed. The American
## those who sought to recover
## the town. By this
## . No steps of any
## and destruction of property and
## , like all other local
## on the false or delusive
## on the subject, awakened
## and unconstitutional action. A
## have been magnified partly by
## , when the whole amount
## or of organized obstruction of
## to another great principle of
## , or by fraud.
## in this quarter so imminent
## ."
## . Heretofore a seizure of
## prevails throughout that distant frontier
## , and in protecting the
## .
## to carry these doctrines into
## prevails on that distant frontier
## or cruelty has marked the
## . The constitutional obligation of
## it is to be protected
## of confiscation, and the
## upon emigrants and our frontier
## , and war, and
## and intimidation been denied to
## ,
## or threats toward persons entertaining
## ; but the prosecution and
## in that State. This
## and intimidation, to deprive
## among those whose political action
## . As Congress is now
## either to individuals or to
## and bloodshed in resistance to
## has been rampant in some
## by citizens of one Republic
## toward them, is required
## were able, by the
## on the part of some
## or intimidation. It has
## " this Government would be
## have been launched on the
## , and Amphitrite, contracted
## there. In both cases
## against those people, beyond
## can be traced to race
## to the settlers of Arizona
## at the hands of Terry
## and bloodshed, but happily
## of the most extreme character
## completes the shortcomings of fraud
## . The magistrate is then
## . If the proposed law
## against foreigners has assumed the
## .
## , a large per cent
## have called the Army into
## , lead to the suggestion
## , Puritan, Amphitrite,
## , and the fraudulent occupation
## and dangerous disturbance, with
## the forts guarding the mouth
## , and more than 50,000
## worse than any which has
## and lessened the rifle fire
## at the interests of one
## . Great corporations exist only
## against them.
## all other questions sink into
## , brutality, or corruption
## upon these, whether capitalists
## , the peace of craven
## . Such conduct is just
## of ports, cities,
## there should be no impairment
## or intimidation, especially by
## that springs up, now
## which invites lynch law.
## , and demagogy is such
## and hypocrisy. Whenever such
## against some class of foreigners
## which has hitherto been so
## . Everything that can be
## of the crusade for this
## and cruelty exhibited in lynchings
## and due process of law
## and restore tranquillity throughout the
## of defenseless cities, the
## of irresponsible monarchs and ambitious
## and whom we must not
## throughout the world bid us
## , is a painful object
## and force, and is
## and supported in the peaceable
## to them should be punished
## of this kind has very
## had been diminishing. In
## of liquidation in industry and
## and despair of five years
## .
## , dive bombers, pursuit
## , pursuit planes. The
## planes, they will do
## our people and disrupting our
## and starvation and have whipped
## them constantly from the air
## the enemy and meeting him
## Warsaw, and Rotterdam,
## and blackouts. And they
## , can stop them now
## of Nazi domination, the
## on the war industries of
## in Europe.
## from the air, built
## Tokyo itself and will continue
## , tires, tanks,
## ammunition is hampered by manpower
## of Nazi domination, the
## in Europe.
## in Europe.
## in Europe, and also
## into ruins. In addition
## .
## .
## and slavery are deliberately administered
## unmatched elsewhere - and the
## . That was a great
## threatened. It is the
## be treated as having,
## and the threat of violence
## to reliance on division,
## .
## would immediately be on their
## . One encouraging fact evidencing
## that cost their weight in
## , has taken on new
## of Quemoy restrained the Communist
## or our B-52 Long Range
## can carry more explosive power
## gap" of several years
## ." Specifically, I
## standing ready on a 15
## - where our own efforts
## . It is not yet
## increased, spurred and encouraged
## in North Vietnam.
## , without terror, and
## , and without fear.
## of war.
## that tear the fabric of
## .
## to settle political questions.
## and the armed attacks which
## and spend it on schools
## . - A President,
## is stopped.
## would stop immediately if talks
## has shown its face in
## will never bring progress.
## and only where there is
## must know that local authorities
## by crime?
## and to the worst civil
## of recent years, as
## which rose in the 1960s
## and burnings of the 1960s
## .
## of nature in the form
## , with its superior capability
## , and to proclaim in
## force with the addition of
## and provide shelters for battered
## and anarchy. Also at
## and one of military aggression-present
## , which is shocking and
## . But when peoples and
## within HHS to coordinate the
## relief programs, and to
## . Congress should pass a
## . We have respected ideological
## and subversion in the Caribbean
## . That's why we've laid
## .
## . We have seen this
## . And I will be
## attacking neighboring states. Support
## before an out-of-control school bus
## remains great. This is
## . America met one historic
## blackmail.
## .
## . Reduction of strategic offensive
## across this country. Every
## , he is dead wrong
## in the Baltics, and
## .
## stand down. No longer
## to primarily conventional use.
## . We will cancel the
## our people and which tears
## and not enough hope in
## that explodes our emergency rooms
## . As the world's greatest
## and sanctions against those who
## and prevent crime, beginning
## . I say to you
## is an American problem.
## , let us also remember
## and drugs and gangs.
## organizations that threaten to disrupt
## . We cannot permit the
## and fear and paralysis.
## that carry 9,000 nuclear warheads
## , whether they strike at
## the World Trade Center found
## and bring them to justice
## act in Israel killed 19
## represent the past, not
## and irresponsible conduct that permeates
## must never return; in
## in our country. And
## we have to reduce the
## , the spread of weapons
## and organized criminals at home
## , now. We can
## before they act and hold
## . We have no more
## . We are the world's
## , in burned churches and
## buildings. We must fight
## , international criminals, and
## , and organized criminals seeking
## . The Biological Weapons Convention
## and the missiles to deliver
## of penniless, helpless old
## . We will defend our
## . The bombing of our
## of our Embassies in Kenya
## from disrupting computer networks.
## over Iraq as we attacked
## and disease. We must
## , increase our strength,
## , I ask you to
## because of race or religion
## and potentially hostile nations the
## easier to conceal and easier
## in the media on their
## on Kosovo, Captain John
## and the organized criminals who
## among young people, promote
## Against Women Act.
## , to provide a hot
## who threaten with bombs to
## to tyrants in rogue nations
## . We'll be partners in
## have been arrested. Yet
## are still at large.
## , freedom is at risk
## from threatening America or our
## is well begun, but
## camps intact and terrorist states
## states unchecked, our sense
## , destroyed Afghanistan's terrorist training
## training camps, saved a
## who once occupied Afghanistan now
## leaders who urged followers to
## . The men and women
## is only beginning. Most
## camps, disrupt terrorist plans
## plans, and bring terrorists
## to justice. And second
## and regimes who seek chemical
## training camps of Afghanistan out
## underworld, including groups like
## cells that have executed an
## who were plotting to bomb
## our Embassy. Our Navy
## camps in Somalia.
## parasites who threaten their countries
## , and I admire the
## . And make no mistake
## , while an unelected few
## . The Iraqi regime has
## allies constitute an axis of
## , giving them the means
## and their state sponsors the
## .
## . No people on Earth
## .
## cannot stop the momentum of
## conspiracies targeting the Embassy in
## is a contest of will
## in the affairs of men
## , the gravest danger facing
## , and mass murder.
## allies, who would use
## attacks, corporate scandals,
## .
## . There's never a day
## of our embassies in east
## have been arrested in many
## on the run. We're
## are learning the meaning of
## . The FBI is improving
## Threat Integration Center, to
## . Once again, this
## . We also see Iranian
## , with great potential wealth
## . The British Government has
## , including members of Al
## or help them develop their
## networks are not easily contained
## and tyrants announced their intentions
## groups.
## , we are also confronting
## and could supply them with
## and fear. They are
## , and expect a higher
## . By bringing hope to
## threats; analysts are examining
## are not plotting and outlaw
## continue to plot against America
## , to disrupt their cells
## .
## threat will not expire on
## who started this war.
## Hambali, who was a
## to justice.
## , and America is honored
## , are a serious,
## .
## more as a crime,
## were still training and plotting
## and their supporters declared war
## attack and corporate scandals and
## . Taking on gang life
## , and I thank the
## . In the next 4
## .
## , begun to reform our
## cells across the country,
## network that attacked our country
## , but their number has
## who want to kill many
## , and that terror will
## will stalk America and other
## and replace hatred with hope
## Zarqawi recently declared war on
## and failure. Tomorrow morning
## and build the institutions of
## , while we encourage a
## and pursue weapons of mass
## who seek to destroy every
## and open the door to
## , pursuing nuclear weapons while
## . And to the Iranian
## , which is why the
## have chosen to make a
## in Iraq so we do
## , inspire democratic reformers from
## and insurgents are violently opposed
## ' most powerful myth is
## and assassins are not only
## and make them believe they
## . She wrote,"
## , and feed resentment and
## . Every step toward freedom
## targets while we train Iraqi
## . And those who know
## and organized crime and human
## surveillance program has helped prevent
## attacks. It remains essential
## and death. Terrorists like
## like bin Laden are serious
## have chosen the weapon of
## hope these horrors will break
## networks. We have killed
## while building the institutions of
## infiltration, clearing out insurgent
## involves more than military action
## is to defeat their dark
## , and work for lasting
## in the Palestinian territories and
## here at home. The
## surveillance program to aggressively pursue
## networks, to victory in
## is a reminder of the
## we fight today is a
## , made up of leaders
## . We'll enforce our immigration
## who could cause huge disruptions
## can cause. We've had
## intend for us, unless
## , we must take the
## , life since 9/
## cell grooming operatives for attacks
## and stopping them.
## , this war began well
## , and promise paradise for
## Americans, they want to
## Zarqawi:" We will
## like Hizballah, a group
## fears most is human freedom
## and elected a democratic legislature
## , with support from Syria
## .
## in its capital. But
## , insurgents, and the
## have gathered and local forces
## and clear them out.
## could spill out across the
## and terror and fighting drug
## and fighting drug traffickers.
## and train the Afghan Army
## . On a clear September
## and extremists, evil men
## and extremists. We will
## oppose every principle of humanity
## , there is one thing
## and refuse to live in
## are fighting to deny this
## and extremists are fighting to
## sanctuary anywhere in the country
## , and staying behind to
## . The Government in Baghdad
## was impossible. A year
## attacks are down, civilian
## there is no doubt.
## ." Members of Congress
## , reconciliation is taking place
## , and a source of
## a base from which to
## is essential to achieving a
## in Lebanon, and backing
## abroad. But above all
## from carrying out their plans
## communications. To protect America
## are talking to, what
## threats would be weakened and
## to plot against the American
## . Because living our values
## to nuclear proliferation, from
## .
## who threaten our Nation.
## is down, and a
## within our borders, we
## .
## assembly line, was part
## and intimidation. We will
## . Today the Senate passed
## Against Women's Act" that
## remains consistent with our laws
## , as we have in
## who pose the gravest threat
## . But this time is
## . They deserve a vote
## , they deserve a simple
## in this country. In
## do not launch attacks against
## not just through intelligence and
## steals from us each day
## networks. Here at home
## prefer from us: large-scale
## networks through more targeted efforts
## , and fear. As
## . And with our allies
## organizations like Hizballah, which
## . If John F.
## in Afghanistan. His comrades