(NLP) How did the US presidential speeches change over time?
0. Data & Tools
Data
- 43 presidents
- 968 speeches from Miller Center
Tools
- Scrapy, Spacy
- NLTK, Gensim, TextBlob
- Sklearn
1. Topic Modeling and Sentiment Analysis
1-0. LDA Topic Models with 8 Topics
*Bolded topics are more frequent than the others
- topic 1: shall, service, may, examination
- topic 2: men, world, people, great
- topic 3: states, government, united, congress (nation topics)
- topic 4: president, would, mr, general
- topic 5: states, slavery, would, south
- topic 6: president, people, mr, general (title topics)
- topic 7: government, upon, would, country
- topic 8: people, us, world, must, america
1-1. Vocabulary Evolution
The following picture shows what vocabularies were frequently used in each period.
From 1780 to 1990, words such as states, government, and united appeared the most while vocabularies that are more related to ordinary people including people, us, American started to replace them since the 20th century.
I plotted bayesian probabilities for states, government, and people based on LDA topic modeling. It turned out that the probability of the word “people” started to pick up around 1850s after Abraham Lincoln delivered the Gettysburg speech. Succeeding presidents used “people” more often in their speeches after Abraham Lincoln.
1-2. Sentiment Analysis
I used the TextBlob library to observe how sentiment index changed from 1900 to 2016 as follow. The higher the index is, the more positive words are used in the presidential speeches.
Then, I classified periods based on the degree of sentiment as follow. Sentiment bottomed during the World War 2, the early 1980’s energy cisis, and the 2008 financial crisis. On the contrary, the speech sentiment peaked during the post World War 2 and the dotcom bubble periods. During the bad times, negative words such as “bankruptcy, competitive, threat, cost, inflation” appear a lot in the speeches while positive words including “opportunity, right, future” are widely used for the good times as follow.
2. Clusters
In order to find clusters among presidents, I used Countervectorizer and Gensim to vectorize word frequencies of each president. After vectorizing word frequencies, I applied 2-dimensional PCA to visualize each president’s vector and normalize it to find cosine similarities among presidents.
2-0. K-Means with 8
The above plot is the normalized 2-dimensional PCA result. Richard Nixon turned out to be the only president who has his own cluster.
2-1. Clusters by Party
The above plot shows the difference between democratic presidents and conservative presidents. There is no significant difference between these two parties showing that the agenda for each party has evolved over time. I then analyzed discovered specific vocabularies that were mostly used by each party. The words “free, duty, Vietnam” appeared more frequently in the democratic presidential speeches and “business, tax, Iraq, security” were dominated by conservative presidents.
2-2. Clusters by Period
This plot shows how presidential speeches evolved over time. I was able to detect significant clusters among presidents before 1850s and after 1950s. Especially, vocabularies in the US presidential speeches started to converge on the right below side after 1950s except for Donald Trump. Donald Trump’s speech is more similar to the presidents before 1850’s and is most similar to the transcripts of Warren Harding, followed by Dwight Eisenhower and Woodrow Wilson.
3. Conclusion
According to the LDA topic models, nation-related vobabularies such as “government, “nation” appeared less after 1850s. In addition, presidential speeches tend to be depressed during the economic recessions while more positive words are used after the post World War 2 period and before the dotcom bubble collapsed. I then applied 2-dimensional PCA to find clusters among presidents and found that there is no strict distinction between parties, implying that parties’ agenda have evolved over time. Furthermore, presidential speeches tend to converge after 1950s regardless of the party register except for Donald Trump, whose speech is more similar to presidents such as Warren Harding, Dwight Eisenhower, and Woodrow Wilson.