First name:Marian
Surname:Rajnoha
Title:Compressing neural networks using double block sparse factorization
Supervisor:Mgr. Vladimír Boľa, PhD.
Year:2026
Keywords:deep learning, neural networks, structured pruning, block sparsity, structured sparsity, large language models, model compression
Abstract:Neural networks are a powerful tool, albeit computationally costly at scale. Various compression methods have been explored in research, some of which are based on pruning, the removal of redundant connections. We propose and analyze algorithmic modifications to double sparse factorization (DSF), an existing pruning algorithm. We enforce structured, block-like patterns in the weights of the resulting model, aiming to accelerate it on modern hardware and reduce its storage requirements. Our experiments show that, even under these constraints, language models largely preserve their reasoning and comprehension abilities across several settings, with only minor degradation in accuracy metrics.

Bachelor's thesis files:

main_MR.pdf

Defense presentation files:
