Quantifying Gender Bias in Dutch Word Embeddings

This repository contains the code and analysis for my Data Science & Society thesis on detecting and quantifying gender bias in Dutch word embeddings. The project uses LSTM and Transformer models to track how gender is represented in embeddings, and uses SVM-derived gender subspaces to analyze where bias is localized and how it evolves over time. The research uses the SoNaR corpus.

Contents

- Data Preprocessing: Scripts for preparing the SoNaR corpus, including tokenization and cleaning.
- Model Training: Implementation of LSTM and Transformer models for creating Dutch word embeddings.
- Bias Detection: Classifiers and SVM tools to identify and quantify gender bias.
- Analysis: Evolution and localization of gender bias in the embeddings.
- Evaluation: Visualizations and results documenting embedding behavior and gender localization.
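
To make the idea of an SVM-derived gender direction concrete, here is a minimal sketch, not the code from this repository. It assumes that embeddings are available as NumPy arrays with a binary gender label per word, and that the "bias" of a vector can be read off as its projection onto the unit normal of a linear SVM. The synthetic vectors, the small hinge-loss trainer (a stand-in for a library SVM), and all names such as `fit_linear_svm` and `true_axis` are hypothetical and chosen for illustration only.

```python
import numpy as np


def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear soft-margin SVM by full-batch subgradient descent.

    X: (n, d) array of vectors; y: (n,) array of labels in {-1, +1}.
    Returns the weight vector w (normal of the separating hyperplane) and bias b.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points that violate the margin
        # Subgradient of lam/2 * ||w||^2 + mean hinge loss
        grad_w = lam * w - (X[active].T @ y[active]) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic stand-in for word embeddings (assumption: real embeddings
# would come from the trained models, which are not shown here).
rng = np.random.default_rng(0)
dim = 8
true_axis = np.zeros(dim)
true_axis[0] = 1.0  # hypothetical "gender" axis planted in the toy data

female = true_axis + 0.05 * rng.normal(size=(20, dim))
male = -true_axis + 0.05 * rng.normal(size=(20, dim))
X = np.vstack([female, male])
y = np.concatenate([np.ones(20), -np.ones(20)])

w, b = fit_linear_svm(X, y)

# The unit normal of the hyperplane serves as a one-dimensional gender subspace.
direction = w / np.linalg.norm(w)

# Projection onto that direction: positive leans "female", negative "male"
# under the labeling used above.
scores = X @ direction
```

Repeating this fit on embeddings taken from different layers or training checkpoints would be one way to compare how well gender can be separated in each, though whether the thesis does exactly this is not stated here.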

Repository structure

| File (in `code` folder) | Description |
| --- | --- |
| `bert.ipynb` | First experimental code with BERT, not the final script |
| `corpus_to_azure.py` | Script to upload parts of the local corpus to Azure |
| `data_exploration_lemma.ipynb` | Data exploration at the lemma level (incomplete) |
| `data_exploration.ipynb` | Exploratory Data Analysis (EDA) on the corpus |
| `data_sentences.py` | Script handling sentence-level data processing |
| `visualize_results.ipynb` | Visualization of model training results |
