Data gathered for my published research are archived at the Harvard Dataverse. I also rely on data from the Canadian Opinion Research Archive, the American National Election Studies, and the General Social Survey.
The Lexicoder Sentiment Dictionary (LSD)
The LSD is a bag-of-words dictionary designed for the automated coding of sentiment in news coverage, legislative speech and other text. It is discussed and tested in detail in Young and Soroka 2012 and is freely available for academic use.
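To illustrate what bag-of-words sentiment coding involves, here is a minimal Python sketch. It is not the Lexicoder or quanteda implementation: the word lists are a hypothetical toy dictionary rather than actual LSD entries, the function names are invented for illustration, and the "net tone" measure (positive minus negative matches, divided by total words) is one common summary, assumed here for the example.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; these are NOT the LSD entries.
TOY_DICTIONARY = {
    "positive": {"good", "gain", "improve", "strong"},
    "negative": {"bad", "loss", "crisis", "weak"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def code_sentiment(text, dictionary=TOY_DICTIONARY):
    """Count dictionary matches per category, ignoring word order (bag of words)."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    # Assumed summary measure: (positive - negative) / total words.
    net_tone = (counts["positive"] - counts["negative"]) / total if total else 0.0
    return {
        "positive": counts["positive"],
        "negative": counts["negative"],
        "tokens": total,
        "net_tone": net_tone,
    }


result = code_sentiment("The economy is strong, but the crisis left a weak recovery.")
print(result)
```

In this example sentence, one word matches the positive list and two match the negative list, so the net tone is slightly negative. A real application would substitute the full dictionary and its pre-processing steps.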
The LSD was initially designed to be run in Lexicoder, Java-based software for automated content analysis designed by Mark Daku, Lori Young, and me. The software is no longer updated, but the dictionary is still widely used. It is included directly in the quanteda package for R. We strongly recommend using the quanteda implementation of the dictionary.
Those who would prefer to implement the dictionary outside of quanteda can look over the user agreement and then download the dictionary.
Citation: Young, L. and Soroka, S. 2012. Lexicoder Sentiment Dictionary.
Much of the research using the LSD is available through this search. The original pre-processors designed to improve the performance of the LSD have been adapted for implementation in R by Emily Luxon (University of Michigan at Dearborn) and can be downloaded as a single R script here. A French-language version of the LSD, produced by Duval and Petry, is also available here.
Topic Dictionaries
The Lexicoder Topic Dictionaries were aimed at capturing topics in news content, legislative debates, and policy documents. These were preliminary dictionaries, designed to capture the Major Topic codes from the comparative Policy Agendas project in English, Dutch, and Hebrew. These dictionaries were developed in conjunction with the INFOPOL project.
We are not continuing with the development of these dictionaries, but they have served as a useful starting point for researchers interested in capturing topics. The draft versions of the English and Dutch dictionaries can be downloaded here.
Citation: Albaugh, Quinn, Julie Sevenans and Stuart Soroka. 2013. Lexicoder Topic Dictionaries, June 2013 versions, McGill University, Montreal, Canada.
The Discrete Emotions Dictionary (DED)
Some colleagues at the University of Michigan and I have taken a first shot at a dictionary designed to identify discrete emotions in political text. The dictionary may be useful on its own, or as a first step in more complex automated analyses. The paper and dictionary have been deposited at the OSF, and are available here.
Citation: Fioroni, Sarah, Ariel Hasell, Stuart Soroka and Brian Weeks. 2021. “Constructing a Dictionary for the Automated Identification of Discrete Emotions in News Content.” OSF Preprint, DOI 10.17605/OSF.IO/CBM9E.