First name: Viktor Samuel
Surname: Podhradský
Title: Statistical model for structural prediction of short tandem repeats employing population data
Supervisor: Andrej Baláž, PhD.
Year: 2025
Keywords: Short Tandem Repeats (STRs), Expectation Maximization algorithm (EM), Expansion Diseases
Abstract: Short Tandem Repeats (STRs) are highly polymorphic regions of the human genome. They are important for genetic identification and population studies, and they are causative factors in a growing number of repeat expansion diseases. Because STRs are repetitive, genotyping them accurately poses substantial computational challenges. This thesis introduces a novel statistical model for STR structure prediction based on the Expectation Maximization (EM) algorithm. The model improves genotyping accuracy by combining population allele data, a reference genome, and alignments of NGS reads to candidate alleles, which it refines probabilistically. In an evaluation against established tools on Genome in a Bottle reference data, the EM-based model achieved the lowest Mean Absolute Error and Root Mean Squared Error and accurately identified pathogenic expansions. This work contributes a robust probabilistic methodology for STR analysis.

Bachelor's thesis files:

STR-structure-prediction-main.zip
bc_text.pdf

Defense presentation files:

podhradsky_bc_prezentacia.pdf
