Department of Computer Science - Bachelor's Thesis Details

First name:	Viktor Samuel
Surname:	Podhradský
Title:	Statistical model for structural prediction of short tandem repeats employing population data
Supervisor:	Andrej Baláž, PhD.
Year:	2025
Keywords:	Short Tandem Repeats (STRs), Expectation Maximization algorithm (EM), Expansion Diseases
Abstract:	Short Tandem Repeats (STRs) are highly polymorphic regions of the human genome, critical for genetic identity, population studies, and as causative factors in a growing number of repeat expansion diseases. Accurate STR genotyping presents substantial computational challenges due to the repetitive nature of these regions. This thesis introduces a novel statistical model for STR structure prediction based on the Expectation Maximization (EM) algorithm. The model enhances genotyping accuracy by integrating population allele data, a reference genome, and NGS read alignments to candidate alleles through probabilistic refinement. Evaluation against established tools using Genome in a Bottle reference data demonstrated the EM-based model's superior performance in minimizing error magnitudes, notably achieving the lowest Mean Absolute Error and Root Mean Squared Error, and accurately identifying pathogenic expansions. This work contributes a robust probabilistic methodology for STR analysis.

Bachelor's thesis files:

STR-structure-prediction-main.zip

Defense presentation files:

podhradsky_bc_prezentacia.pdf