Analyzing Twitter data of the 20th German Parliament (the "Bundestag") to identify patterns in retweeting behavior. The data was collected in May 2022 via the official Twitter API for researchers. The results of the research can be found in Content_Sharing_Results.pdf.
The most important tables (stored as CSV files in the `data` folder) are `mdb_list`, `user_list`, `tweet_list` and `user_friendships`.
- `mdb_list` contains all politicians who are or were part of the 20th German Parliament.
- `user_list` contains all politicians who have a Twitter account (even if the profile is not actively used). The additional account information was downloaded via the Twitter API; the column `api_call` logs when the request was made.
- `tweet_list` contains up to the 3,200 latest tweets of each user, including retweets, quotes and replies. For easier scraping, the tweets of each user were stored in individual files; to work with all tweets, the files can be merged into one table.
- `user_friendships` contains all "friendships" between the users. As Twitter allows uni-directional and bi-directional "friendships", two users always appear twice in the list (e.g. `X;Y;TRUE` and `Y;X;FALSE` means the "friendship" is uni-directional).
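The double listing described above can be collapsed into one direction label per pair. A minimal sketch, assuming semicolon-separated columns named `user_a`, `user_b` and `follows` (the real header may differ):

```python
import csv
import io

# Miniature example in the user_friendships format: each pair appears
# twice, and the boolean says whether that direction of the "friendship"
# holds (X follows Y, but Y does not follow X back).
SAMPLE = """user_a;user_b;follows
X;Y;TRUE
Y;X;FALSE
"""

def friendship_directions(fh):
    """Return {frozenset({a, b}): 'bi' | 'uni'} from the double-listed rows."""
    pairs = {}
    for row in csv.DictReader(fh, delimiter=";"):
        key = frozenset((row["user_a"], row["user_b"]))
        pairs.setdefault(key, []).append(row["follows"] == "TRUE")
    # Keep pairs with at least one TRUE; both TRUE means bi-directional.
    return {k: "bi" if all(v) else "uni" for k, v in pairs.items() if any(v)}

directions = friendship_directions(io.StringIO(SAMPLE))
print(directions)
```

With the `X;Y;TRUE` / `Y;X;FALSE` example from above, the pair is classified as uni-directional.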
This folder contains all scripts for the data collection and transformation. The `config.py` script loads the environment variables needed to access the Twitter API. The files are listed in the order in which they need to be executed; otherwise the output files may contain errors.
- Create a Twitter account and get access to the API v2 and v1.1.
- Create a `.env` file in the root directory with the keys `API_KEY`, `API_SECRET_KEY`, `ACCESS_TOKEN`, `ACCESS_SECRET` and `BEARER_TOKEN`.
- Run the script `config.py` to load the environment variables from the `.env` file.
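The repository's `config.py` is not reproduced here; a minimal sketch of what such a loader typically does (plain stdlib, no `python-dotenv`; the five key names come from the step above, everything else is an assumption):

```python
import os

# The five keys the .env file is expected to provide (see the setup steps).
REQUIRED_KEYS = ["API_KEY", "API_SECRET_KEY", "ACCESS_TOKEN",
                 "ACCESS_SECRET", "BEARER_TOKEN"]

def load_env(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments and malformed lines
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if k not in os.environ]
    if missing:
        raise RuntimeError(f"Missing keys in {path}: {missing}")
```

After `load_env()` runs, the collection scripts can read the credentials from `os.environ`.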
`mdb_collection`: Reads the XML file downloaded from the Bundestag website and transforms it into a table of politicians who are or were part of the 20th German Parliament. The results are stored in `mdb_list.csv`.

`mdb_twitter_collection.py`: Loads the Twitter usernames of the politicians from `twitter_usernames.csv` and appends them to the results of `mdb_list.csv`; the two tables are joined via the `bundestag_id`. The result is stored in `mdb_twitter_list.csv`.
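The join on `bundestag_id` described above can be sketched with pandas (the miniature tables and the left join are assumptions for illustration; the real script may join differently):

```python
import pandas as pd

# Hypothetical miniature versions of the two inputs.
mdb_list = pd.DataFrame({
    "bundestag_id": [1, 2, 3],
    "name": ["A", "B", "C"],
})
twitter_usernames = pd.DataFrame({
    "bundestag_id": [1, 3],
    "twitter_handle": ["a_mdb", "c_mdb"],
})

# A left merge keeps politicians without a known Twitter handle
# (their twitter_handle column is NaN).
mdb_twitter_list = mdb_list.merge(twitter_usernames, on="bundestag_id", how="left")
print(mdb_twitter_list)
```

Politicians without a handle can later be filtered out, which matches the note below that `user_list.csv` only contains entries for politicians with a Twitter account.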
`user_collection.py`: Reads `mdb_twitter_list.csv` and enriches the table with the Twitter profile information. The API is accessed via the `twitter_handle` (Twitter username). The results are stored in `user_list.csv` and only contain entries for politicians with a Twitter account.

`tweet_collection.py`: Reads `user_list.csv` to get the `twitter_id`s of the accounts and then checks in the folder `tweet_list` whether the tweets of a specific account were already downloaded. If not, it downloads the last 3,200 tweets of the account and stores them in one file named in the format `tweet_list_TWITTER_ID_TWITTER_HANDLE.csv`.

`user_friendship_collection.py`: Reads the files `user_list.csv` and `user_friendships.csv`. It then goes through the account list and checks whether the friendship between two users was already checked. "Friendship" means whether the two users follow each other or only one account follows the other. If the friendship was not checked before, the script calls the API and stores the result directly in `user_friendships.csv`.
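The "already downloaded" check in `tweet_collection.py` can be sketched by scanning the `tweet_list` folder for the `tweet_list_TWITTER_ID_TWITTER_HANDLE.csv` filename pattern (the helper name is hypothetical; the real script may track this differently):

```python
from pathlib import Path

def downloaded_ids(folder="tweet_list"):
    """Return the set of Twitter IDs that already have a tweet file."""
    ids = set()
    for path in Path(folder).glob("tweet_list_*.csv"):
        # Filename format: tweet_list_<TWITTER_ID>_<TWITTER_HANDLE>.csv,
        # so the ID is the third underscore-separated part of the stem.
        parts = path.stem.split("_")
        if len(parts) >= 4:
            ids.add(parts[2])
    return ids
```

An account whose `twitter_id` appears in this set can be skipped, so an interrupted run can be resumed without re-downloading.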
`retweeter_collection.py`: Checks all tweet files in the folder `tweet_list` and determines which retweets/quotes refer to tweets in this list. In other words, it finds out which politician quoted or retweeted a tweet of another politician. The results are stored in `retweet_list.csv` and `quote_list.csv`. Users can also retweet/quote their own tweets.
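Matching retweets/quotes back to tweets in the collection, as `retweeter_collection.py` does, can be sketched as a lookup by referenced tweet ID (the column names `tweet_id`, `author` and `referenced_tweet_id` are assumptions, not the script's actual schema):

```python
def politician_retweets(tweets):
    """tweets: list of dicts with tweet_id, author and, for retweets/quotes,
    referenced_tweet_id. Returns (retweeter, original_author) pairs for
    retweets whose referenced tweet is itself in the collection."""
    by_id = {t["tweet_id"]: t["author"] for t in tweets}
    pairs = []
    for t in tweets:
        ref = t.get("referenced_tweet_id")
        if ref in by_id:  # referenced tweet belongs to a collected politician
            pairs.append((t["author"], by_id[ref]))
    return pairs

# Retweets of tweets outside the collection are dropped; self-retweets
# (author == original author) are kept, matching the note above.
retweets = politician_retweets([
    {"tweet_id": "1", "author": "mdb_a"},
    {"tweet_id": "2", "author": "mdb_b", "referenced_tweet_id": "1"},
])
print(retweets)  # -> [('mdb_b', 'mdb_a')]
```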
This folder contains the scripts and Jupyter notebooks for preparing the data for Gephi, the tool used to visualize the graphs/networks.
This folder contains the Jupyter notebooks for analyzing the collected data. A summary of the results can be found in the PDF file Content_Sharing_Results.pdf.