This repository contains the code written for my Bachelor's thesis project "Automatic extraction of semantic relations in German compounds".
The packages I used for this project can be found in requirements.txt.
The amrlib model "model_parse_xfm_bart_large-v0_1_0", which is required for parsing the AMRs, is not part of this repository and has to be downloaded separately and moved into place.
It can be found here:
The model should be extracted into the directory amrlib/data, and the extracted folder should be renamed to model_stog.
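
The extract-and-rename step can also be scripted. The following is a minimal sketch, assuming the download is a tar archive with a single top-level folder; the function name install_stog_model and the archive path in the usage comment are made up for illustration and are not part of this repository.

```python
import tarfile
from pathlib import Path


def install_stog_model(archive_path, data_dir="amrlib/data", target_name="model_stog"):
    """Extract a downloaded model archive into data_dir and rename it to target_name.

    Hypothetical helper: assumes the archive contains exactly one top-level folder.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / target_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    with tarfile.open(archive_path) as archive:
        # Collect the top-level folder name(s) used inside the archive.
        top_level = {
            Path(member.name).parts[0]
            for member in archive.getmembers()
            if Path(member.name).parts
        }
        if len(top_level) != 1:
            raise ValueError("expected exactly one top-level folder in the archive")
        archive.extractall(data_dir)

    # Rename the extracted folder so that it ends up as data_dir/target_name.
    extracted = data_dir / top_level.pop()
    extracted.rename(target)
    return target


# Usage (the archive file name is an assumption):
# install_stog_model("model_parse_xfm_bart_large-v0_1_0.tar.gz")
```

Only extract archives from a source you trust, since tar files can contain arbitrary paths.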
The Ghost-NN data set was used for this project. It contains German noun compounds and their semantic relations.
The Python module duden was used to extract the meanings of the compounds from the Duden online dictionary.
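
As a rough sketch of what such a lookup can look like: the code below assumes that duden.get returns a word object (or None if there is no entry) and that its meaning_overview attribute holds a possibly nested list of strings. Both assumptions should be checked against the installed version of the package. The helper names flatten_meanings and lookup_meanings are made up for this example.

```python
def flatten_meanings(meanings):
    """Flatten a possibly nested list of meaning strings into a flat list."""
    if meanings is None:
        return []
    if isinstance(meanings, str):
        return [meanings]
    flat = []
    for item in meanings:
        flat.extend(flatten_meanings(item))
    return flat


def lookup_meanings(compound):
    """Look up a compound in the Duden online dictionary.

    Returns an empty list if no entry is found. Needs network access.
    """
    import duden  # third-party package, imported lazily

    word = duden.get(compound)
    if word is None:
        return []
    return flatten_meanings(word.meaning_overview)
```

A call such as lookup_meanings("Haustür") would then return the meaning descriptions as a flat list of strings, provided the word is passed in the form that the Duden lookup expects.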
For the extraction of the meanings from Duden needs to be executed.
I analyzed the results of 50 random compounds of the data set. I generated the data set for the analysis in
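
Drawing a random subset like this can be made reproducible with a fixed seed. This is only a sketch: the function name sample_compounds, the seed value, and the example compounds are assumptions and do not reflect how the analysis set in this repository was actually drawn.

```python
import random


def sample_compounds(compounds, k=50, seed=0):
    """Draw k distinct compounds at random, reproducibly for a fixed seed."""
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    return rng.sample(list(compounds), k)


# Usage with a made-up list of compounds:
# subset = sample_compounds(["Haustür", "Apfelbaum", "Kinderbuch"], k=2)
```

Using the same seed and the same input order yields the same 50 compounds on every run.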
The German meaning descriptions were translated into English with the DeepL Python API.
This is implemented in
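
A minimal sketch of the translation step with the deepl package is shown below. It assumes a deepl.Translator object created with a valid authentication key; the function name translate_meanings and the choice of "EN-US" as target variant are assumptions made for this example.

```python
def translate_meanings(meanings, translator, target_lang="EN-US"):
    """Translate German meaning descriptions into English.

    translator is expected to behave like deepl.Translator, i.e. to offer
    translate_text(text, source_lang=..., target_lang=...) and to return
    an object with a .text attribute.
    """
    translated = []
    for text in meanings:
        result = translator.translate_text(
            text, source_lang="DE", target_lang=target_lang
        )
        translated.append(result.text)
    return translated


# Usage (needs the deepl package and a DeepL API key):
# import deepl
# translator = deepl.Translator("YOUR_AUTH_KEY")
# english = translate_meanings(["Tür eines Hauses"], translator)
```

Passing the translator in as an argument keeps the API key out of the function and makes the step easy to try out with a stand-in object.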
The AMRs of these translations were parsed with amrlib.
To do this, has to be executed.
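
For orientation, parsing sentences with amrlib roughly follows the pattern below. This is a sketch, assuming the model has been placed in amrlib/data as model_stog as described above; the function name parse_amrs is made up for this example.

```python
def parse_amrs(sentences, stog=None):
    """Parse English sentences into AMR graphs (strings in PENMAN notation).

    If no model is passed in, the default sentence-to-graph model of amrlib
    is loaded, which expects the model files under amrlib/data/model_stog.
    """
    if stog is None:
        import amrlib  # third-party package, imported lazily

        stog = amrlib.load_stog_model()
    return stog.parse_sents(list(sentences))


# Usage (loads the model, which can take a while):
# graphs = parse_amrs(["The door of a house."])
# print(graphs[0])
```

Loading the model once and passing it in via the stog argument avoids reloading it for every batch of sentences.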
A data set with features was generated in which was then used for multiclass classification in compound_relation_classification_with_sklearn.ipynb.
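
Which features were extracted is not listed here. Purely as an illustration of how an AMR string can be turned into numeric features, the sketch below counts the role labels (such as :ARG0 or :mod) in a graph. This is a hypothetical feature, and the function name amr_role_counts is made up; it is not necessarily what the code in this repository does.

```python
import re
from collections import Counter

# A role label is a colon followed by a name, preceded by whitespace.
ROLE_PATTERN = re.compile(r"(?<=\s):[A-Za-z][A-Za-z0-9-]*")


def amr_role_counts(amr):
    """Count how often each role label occurs in an AMR string."""
    # Drop metadata lines such as "# ::snt ...".
    lines = [line for line in amr.splitlines() if not line.lstrip().startswith("#")]
    # Blank out quoted string literals so that colons inside them are ignored.
    body = re.sub(r'"[^"]*"', '""', " ".join(lines))
    return Counter(ROLE_PATTERN.findall(body))


example = """# ::snt The boy wants to go.
(w / want-01
      :ARG0 (b / boy)
      :ARG1 (g / go-02
            :ARG0 b))"""
print(amr_role_counts(example))
```

Counts like these can be written to one row per compound, for example with a fixed column per role label, before being passed to a classifier.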
Finally, multiclass classification with a pretrained transformer model was performed in compound_classification_transformers.ipynb.