A Python tool for extracting medical concepts from text and mapping them to the SNOMED-CT ontology hierarchy.
This repository provides functionality to:
- Extract medical concepts from text documents
- Map extracted concepts to SNOMED-CT ontology
- Analyze concept distributions across different hierarchical levels
- Visualize concept relationships and distributions
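The hierarchical-level analysis listed above can be illustrated on a toy is-a hierarchy. This is only a minimal sketch: the hierarchy, the concept names, and the functions `depth` and `level_distribution` are hypothetical and are not taken from this repository or from SNOMED-CT. It also assumes each concept has a single parent, whereas SNOMED-CT allows several.

```python
from collections import Counter

# Hypothetical toy is-a hierarchy: child -> parent (None marks the root).
# These names are illustrative only, not real SNOMED-CT content.
TOY_PARENTS = {
    "root": None,
    "disorder": "root",
    "substance": "root",
    "sleep disorder": "disorder",
    "narcolepsy": "sleep disorder",
    "receptor": "substance",
}


def depth(concept, parents):
    """Count the is-a edges between a concept and the root."""
    edges = 0
    while parents[concept] is not None:
        concept = parents[concept]
        edges += 1
    return edges


def level_distribution(concepts, parents):
    """Count how many extracted concepts fall on each hierarchy level."""
    return Counter(depth(c, parents) for c in concepts)


# Pretend these concepts were extracted from a document (duplicates allowed).
extracted = ["narcolepsy", "receptor", "sleep disorder", "narcolepsy"]
distribution = level_distribution(extracted, TOY_PARENTS)

for level in sorted(distribution):
    print(f"level {level}: {distribution[level]} concept(s)")
```

With a real ontology, where a concept can have several parents, a level would need a chosen definition such as the shortest or longest path to the root.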
Install the following packages with their specific versions:
- Apply for UMLS access at the UMLS website
- Download the required UMLS and SNOMED-CT files
- Follow the setup instructions in the pymedtermino2 documentation
- Clone this repository:
git clone https://github.com/yourusername/snomed-ontology-parser.git
cd snomed-ontology-parser
- Create and activate the conda environment:
conda env create -f environment.yml
conda activate snomed-parser
- Place your UMLS data files in the ./data directory
- Run the Jupyter notebook:
jupyter notebook concept_distribution.ipynb
Note: If you encounter a locked pym.sqlite file error, use the provided script:
bash remove_sql_lock.sh
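Before running the script, it can help to confirm that the database really is locked. The sketch below uses only Python's standard `sqlite3` module. It does not reproduce what remove_sql_lock.sh does, and the helper name `is_locked` and the example path are assumptions made for illustration.

```python
import sqlite3


def is_locked(db_path):
    """Return True if a write lock on the SQLite file cannot be taken right now.

    Hypothetical helper for illustration. Note that sqlite3.connect creates
    the file if it does not exist yet.
    """
    # timeout=0 makes SQLite fail immediately instead of waiting for the lock.
    # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves.
    conn = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # tries to acquire a write lock
        conn.execute("ROLLBACK")         # release it again without changes
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc):
            return True
        raise  # some other problem, e.g. the file cannot be opened
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumed location of the database; adjust to your setup.
    print(is_locked("pym.sqlite"))
```

If this reports a lock, make sure no other notebook kernel or Python process still has the database open before you remove the lock.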
To look up specific concept IDs, you can use the SNOMED CT Browser. Note: the browser may use an older version of the SNOMED ontology, so its results can differ from your local data.
Input text:
Alterations in the hypocretin receptor 2 and preprohypocretin genes produce narcolepsy in some animals.
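As a rough illustration of what extracting concepts from this sentence involves, the sketch below does a plain dictionary lookup. It is not the repository's extractor: the lexicon, the `TOY-` identifiers (placeholders, not SNOMED-CT concept IDs), and the function `extract_concepts` are all assumptions for illustration.

```python
import re

# Hypothetical lexicon mapping surface terms to placeholder identifiers.
# These are NOT real SNOMED-CT concept IDs.
TOY_LEXICON = {
    "hypocretin receptor 2": "TOY-0001",
    "narcolepsy": "TOY-0002",
}


def extract_concepts(text, lexicon):
    """Return (start offset, term, identifier) for each whole-word match."""
    lowered = text.lower()
    matches = []
    for term, concept_id in lexicon.items():
        # \b keeps a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, lowered):
            matches.append((m.start(), term, concept_id))
    return sorted(matches)  # order by position in the text


text = (
    "Alterations in the hypocretin receptor 2 and preprohypocretin genes "
    "produce narcolepsy in some animals."
)
matches = extract_concepts(text, TOY_LEXICON)
for start, term, concept_id in matches:
    print(start, term, concept_id)
```

Exact string matching misses synonyms, inflections, and abbreviations, so it is only a starting point for this kind of mapping.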
snomed-ontology-parser/
├── src/
│ ├── main.py # Main application entry point
│ ├── concept_extractor.py # Extracts medical concepts from text
│ ├── concept_analyzer.py # Analyzes concept distributions
│ └── data_loader.py # Handles UMLS data loading
├── data/ # Directory for UMLS data files
├── concept_distribution.ipynb
├── environment.yml
└── remove_sql_lock.sh