Unstructured text
Readings due before class on Monday, March 18, 2019
Required
- Chapter 1: Tidytext format and Chapter 3: Analyzing word and document frequency: tf-idf in Julia Silge and David Robinson, Text Mining with R (Sebastopol, California: O’Reilly Media, 2017), https://www.tidytextmining.com/.
Recommended
Chapter 6: Topic modeling in Julia Silge and David Robinson, Text Mining with R.
Ken Benoit, Kohei Watanabe, and Stefan Müller, “quanteda”
quanteda is my favorite R-based text package. It is phenomenal, and I highly recommend it if you want to learn more about text analysis, including the related packages readtext and spacyr. I would also highly recommend Ken Benoit and Pablo Barbera’s 2018 LSE course on Quantitative Text Analysis.
Julia Silge, “Training, Evaluating, and Interpreting Topic Models”
Julia’s blog post covers the stm package, my favorite topic modeling package, by Molly Roberts, Brandon Stewart, and Dustin Tingley (among others). If you want to learn more, see their great website.
Julia Silge, “Text Classification with tidy principles”
This blog post teaches how to set up text classification problems, i.e., supervised machine learning.