My master's thesis, written as part of the computer science programme at Jagiellonian University.
In recent years, the speed of DNA sequencing methods has increased dramatically, making it possible to sequence the DNA of complex organisms, such as humans, quickly. However, this leads to increasing demand for disk storage, as databases containing such data can easily reach dozens of terabytes. In his article "Context binning, model clustering and adaptivity for data compression of genetic data", Jarek Duda proposes promising compression techniques that should help build a compressor better than the current state of the art. This thesis describes a compressor built to evaluate those techniques, tests it on real-world data, and compares it with other genetic data compression tools.
The PDF file can be downloaded from the GitHub Releases page.
To build the PDF yourself, make sure you have Inkscape and a LaTeX distribution installed on your system, then run:
make
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.