
Analyzing the Twitter data of the 20th German Parliament to identify patterns in retweeting behavior within and between political parties


lucashoeft/Content-Sharing


Content-Sharing

Analyzing Twitter data of the 20th German Parliament (also known as "Bundestag") to identify patterns in retweeting behavior.

The data from Twitter was collected in May 2022 via the official Twitter-API for researchers.

Find the results of the research in Content_Sharing_Results.pdf.

data

The most important tables (stored as CSV files in the data folder) are mdb_list, user_list, tweet_list and user_friendships.

  • mdb_list contains all politicians who are or were part of the 20th German Parliament.
  • user_list contains all politicians who have a Twitter account (even if the profile is not actively used). The additional account information was downloaded via the Twitter API. The column api_call logs the time at which the request was made.
  • tweet_list contains up to 3200 of the latest tweets per user, including retweets, quotes and replies. To simplify scraping, the tweets of each user were stored in individual files. To work with all tweets, merge the files into one table.
  • user_friendships contains all "friendships" between the users. Because Twitter allows uni-directional and bi-directional "friendships", each pair of users appears twice in the list, once per direction (e.g. X;Y;TRUE and Y;X;FALSE means the "friendship" is uni-directional).
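As a minimal sketch of working with these tables, the per-user tweet files can be merged and a pair of friendship rows can be classified roughly as follows. The semicolon delimiter is an assumption taken from the X;Y;TRUE example, and the function names merge_tweet_files and friendship_type are hypothetical; they are not part of the repository.

```python
import csv
from pathlib import Path


def merge_tweet_files(folder, delimiter=";"):
    """Concatenate all per-user tweet files in `folder` into one list of rows.

    Assumes the files follow the tweet_list_*.csv naming pattern and share
    the same header; the delimiter is an assumption, adjust it if needed.
    """
    rows = []
    for path in sorted(Path(folder).glob("tweet_list_*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle, delimiter=delimiter))
    return rows


def friendship_type(flag_xy, flag_yx):
    """Classify a user pair from the two boolean flags of its two rows."""
    if flag_xy and flag_yx:
        return "bi-directional"
    if flag_xy or flag_yx:
        return "uni-directional"
    return "none"
```

For the example above, friendship_type(True, False) yields "uni-directional".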

Entity Relationship Diagram

data-collection

This folder contains all scripts for the data collection and transformation. The config.py script is needed to load the environment variables required to access the Twitter API. The scripts are listed below in the order in which they need to be executed; running them in a different order may produce errors in the output files.

Working with Twitter API

  • Create a Twitter account and get access to the API v2 and v1.1.
  • Create a .env file in the root directory with the keys API_KEY, API_SECRET_KEY, ACCESS_TOKEN, ACCESS_SECRET and BEARER_TOKEN.
  • Run the script config.py to load the environment variables from the .env file.
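The following is a standard-library sketch of what loading and checking the .env file could look like. It is not the repository's config.py, which may well rely on a dedicated package; parse_env and missing_keys are hypothetical helper names, and only the five key names are taken from the list above.

```python
REQUIRED_KEYS = (
    "API_KEY",
    "API_SECRET_KEY",
    "ACCESS_TOKEN",
    "ACCESS_SECRET",
    "BEARER_TOKEN",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling missing_keys(parse_env(open(".env").read())) before any API request makes an incomplete .env file visible early.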

Preprocessing

  • mdb_collection: Reads the XML file which was downloaded from the Bundestag website and transforms it into a table of politicians that were/are part of the 20th German Parliament. The results are stored in mdb_list.csv.
  • mdb_twitter_collection.py: Loads the information about the Twitter usernames of the politicians from twitter_usernames.csv and appends it to the results of mdb_list.csv. They are joined via the bundestag_id. The result is stored in mdb_twitter_list.csv.
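The join via bundestag_id can be sketched as below. This assumes that twitter_usernames.csv has the columns bundestag_id and twitter_handle and that politicians without a username are kept with an empty handle; both points are assumptions, and append_twitter_handles is a hypothetical name.

```python
def append_twitter_handles(mdb_rows, username_rows):
    """Left-join Twitter usernames onto the politician rows via bundestag_id."""
    handles = {row["bundestag_id"]: row["twitter_handle"] for row in username_rows}
    return [
        # Politicians without a known username get an empty handle.
        {**row, "twitter_handle": handles.get(row["bundestag_id"], "")}
        for row in mdb_rows
    ]
```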

Accessing Twitter API

  • user_collection.py: Reads mdb_twitter_list.csv and enriches the table with the Twitter profile information. The API is accessed via the twitter_handle (Twitter username). The results are stored in user_list.csv and only contain entries of politicians with a Twitter account.
  • tweet_collection.py: Reads user_list.csv to get the twitter_id of each account and then checks in the folder tweet_list whether the tweets of that account were already downloaded. If not, it downloads up to 3200 of the account's latest tweets into one file named tweet_list_TWITTER_ID_TWITTER_HANDLE.csv.
  • user_friendship_collection.py: Reads the files user_list.csv and user_friendships.csv. Then it goes through the account list and checks whether the friendship between each pair of users was already checked. A friendship exists if the two users follow each other or if only one account follows the other. If a pair was not checked before, the script calls the API and stores the result directly in user_friendships.csv.
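The "already downloaded" check in tweet_collection.py can be illustrated with a small sketch. Only the file name format comes from the description above; the helper names and the twitter_id and twitter_handle dictionary keys are assumptions about how user_list.csv rows are represented.

```python
from pathlib import Path


def tweet_file_name(twitter_id, twitter_handle):
    """Build the file name in the format tweet_list_TWITTER_ID_TWITTER_HANDLE.csv."""
    return f"tweet_list_{twitter_id}_{twitter_handle}.csv"


def accounts_to_download(users, folder):
    """Return the users whose tweet file does not exist in `folder` yet."""
    folder = Path(folder)
    return [
        user
        for user in users
        if not (folder / tweet_file_name(user["twitter_id"], user["twitter_handle"])).exists()
    ]
```

Skipping existing files this way lets an interrupted run be restarted without requesting the same account twice.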

Transformations

  • retweeter_collection.py: Checks all tweet files in the folder tweet_list and determines which retweets/quotes refer to tweets that are also in this collection. In other words, it finds out which politician quoted/retweeted a tweet of another politician. The results are then stored in retweet_list.csv and quote_list.csv. Users can also retweet/quote their own tweets, so such self-references appear in the results as well.
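The matching step can be sketched as a lookup from tweet ID to author. The field names id, author_id, referenced_type and referenced_id are hypothetical, as is the function name; the type values "retweeted" and "quoted" mirror those used by the Twitter API v2 for referenced tweets. This is an illustration of the idea, not the script's actual code.

```python
def find_internal_references(tweets):
    """Split retweets and quotes that point at tweets inside the collection."""
    author_by_tweet = {tweet["id"]: tweet["author_id"] for tweet in tweets}
    retweets, quotes = [], []
    for tweet in tweets:
        ref_id = tweet.get("referenced_id")
        if ref_id not in author_by_tweet:
            continue  # original tweet, or a reference to a tweet outside the data
        edge = {
            "source_user": tweet["author_id"],
            "target_user": author_by_tweet[ref_id],
            "tweet_id": tweet["id"],
            "referenced_id": ref_id,
        }
        if tweet.get("referenced_type") == "retweeted":
            retweets.append(edge)
        elif tweet.get("referenced_type") == "quoted":
            quotes.append(edge)
    return retweets, quotes
```

Self-retweets and self-quotes are kept; they show up as edges whose source_user equals target_user.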

data-preparation

This folder contains the scripts and Jupyter notebooks that prepare the data for Gephi, the tool used to visualize the graphs/networks.
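One possible shape of such a preparation step is an aggregation of retweets into a weighted edge table. Gephi's spreadsheet import reads edge tables with Source and Target columns and an optional Weight column; everything else here (the function names and the source_user and target_user fields) is a hypothetical illustration, not the notebooks' actual code.

```python
import csv
import io
from collections import Counter


def build_edge_rows(retweets):
    """Count retweets per (source, target) pair and return Gephi-style edge rows."""
    weights = Counter((r["source_user"], r["target_user"]) for r in retweets)
    return [
        {"Source": source, "Target": target, "Weight": weight}
        for (source, target), weight in sorted(weights.items())
    ]


def edges_to_csv(rows):
    """Serialize edge rows to CSV text with a Source,Target,Weight header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Source", "Target", "Weight"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```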

data-analysis

This folder contains the Jupyter notebooks for analyzing the collected data. A summary of the results can be found in the PDF file Content_Sharing_Results.pdf.
