Abstract: | Short Tandem Repeats (STRs) are highly polymorphic regions of the human genome. They are critical for genetic identification and population studies, and they are causative factors in a growing number of repeat expansion diseases. Because of their repetitive nature, accurate STR genotyping presents substantial computational challenges.
This thesis introduces a novel statistical model for STR structure prediction based on the Expectation-Maximization (EM) algorithm. The model improves genotyping accuracy through probabilistic refinement that integrates population allele data, a reference genome, and alignments of NGS reads to candidate alleles. In an evaluation against established tools on Genome in a Bottle reference data, the EM-based model produced the smallest error magnitudes, achieving the lowest Mean Absolute Error and Root Mean Squared Error, and accurately identified pathogenic expansions. This work contributes a robust probabilistic methodology for STR analysis.
|
---|