This repository contains code for my thesis on detecting and quantifying gender bias in Dutch word embeddings. Using LSTM and Transformer models trained on the SoNaR-corpus, it analyzes bias evolution and localization with SVM-derived gender subspaces.

KleinJonasUVT/biasintransformers

Quantifying Gender Bias in Dutch Word Embeddings

This repository contains the code and analysis for my Data Science & Society thesis on detecting and quantifying gender bias in Dutch word embeddings. The project trains LSTM and Transformer models on the SoNaR-corpus and tracks how gender is represented in their embeddings. SVM-derived gender subspaces are used to analyze where bias is localized and how it evolves over time.

Data Pipeline

Contents

  • Data Preprocessing: Scripts for preparing the SoNaR-corpus, including tokenization and cleaning.
  • Model Training: Implementation of LSTM and Transformer models for creating Dutch word embeddings.
  • Bias Detection: Classifiers and SVM tools to identify and quantify gender bias.
  • Analysis: Examination of how gender bias evolves and where it is localized in the embeddings.
  • Evaluation: Visualizations and results documenting embedding behaviors and gender localization.
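
To illustrate the idea behind an SVM-derived gender direction, here is a minimal, dependency-free sketch. It is not the thesis code: the toy 3-dimensional embeddings, the word lists, the function names (`train_linear_svm`, `gender_direction`, `gender_projection`) and the hyperparameters are all assumptions made for illustration. A linear SVM is fitted to separate male-labelled from female-labelled word vectors, and its normalized weight vector is treated as a one-dimensional gender subspace onto which other words can be projected.

```python
import math

# Hypothetical 3-dimensional toy embeddings; real vectors would come from the
# trained LSTM or Transformer models.
TOY_EMBEDDINGS = {
    "man": [0.9, 0.1, 0.3],
    "vader": [0.8, 0.2, -0.1],
    "hij": [1.0, -0.1, 0.0],
    "vrouw": [-0.9, 0.1, 0.3],
    "moeder": [-0.8, 0.2, -0.1],
    "zij": [-1.0, -0.1, 0.0],
    "ingenieur": [0.5, 0.4, -0.2],
    "verpleegkundige": [-0.4, 0.5, 0.2],
    "tafel": [0.0, 0.7, 0.1],
}
MALE_WORDS = ["man", "vader", "hij"]        # illustrative seed words
FEMALE_WORDS = ["vrouw", "moeder", "zij"]   # illustrative seed words


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM by full-batch subgradient descent on the
    L2-regularized hinge loss. Labels in y must be +1 or -1."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only points inside the margin contribute to the hinge subgradient.
            if yi * (dot(w, xi) + b) < 1:
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def gender_direction(embeddings, male_words, female_words):
    """Return the unit-length SVM weight vector (male = +1, female = -1)."""
    X = [embeddings[w] for w in male_words + female_words]
    y = [1] * len(male_words) + [-1] * len(female_words)
    w, _ = train_linear_svm(X, y)
    norm = math.sqrt(dot(w, w))
    return [wj / norm for wj in w]


def gender_projection(vector, direction):
    """Signed projection: positive leans male, negative leans female."""
    return dot(vector, direction)


direction = gender_direction(TOY_EMBEDDINGS, MALE_WORDS, FEMALE_WORDS)
for word in ["ingenieur", "verpleegkundige", "tafel"]:
    print(word, round(gender_projection(TOY_EMBEDDINGS[word], direction), 3))
```

The hand-written trainer only keeps the sketch free of dependencies; in practice an existing implementation such as scikit-learn's `LinearSVC` would be the more likely choice, with the fitted weight vector playing the same role. The sign convention (male positive, female negative) is arbitrary.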

Repository structure

File (in code folder)          Description
bert.ipynb                     First experimental code with BERT (not the final script)
corpus_to_azure.py             Script to upload parts of the local corpus to Azure
data_exploration_lemma.ipynb   Data exploration at the lemma level (incomplete)
data_exploration.ipynb         Exploratory Data Analysis (EDA) on the corpus
data_sentences.py              Script handling sentence-level data processing
visualize_results.ipynb        Visualization of model training results
