Meno:Jitka
Priezvisko:Muravská
Názov:Permuting Matrix Columns for Improved Block-Sparse Pruning
Vedúci:Mgr. Vladimír Boľa, PhD.
Rok:2026
Kµúčové slová:block pruning, linear sum assignment, TETRIS, permutation
Abstrakt:Modern transformer models, from large language models to vision transformers, achieve impressive task performance but require substantial compute and memory at inference. Pruning reduces this cost by zeroing out a fraction of its weights, ideally at little or no loss in task quality. Block pruning makes the resulting sparsity pattern hardwarefriendly, but at a cost: grouping weights into blocks before pruning can trap important weights inside otherwise-low-value blocks. The TETRIS algorithm of [16] addresses this by permuting the elements along the dimensions of the weight matrix before pruning, using a greedy local search to find good permutations. We propose Global Assignment TETRIS (GA-TETRIS), a variant that replaces the local search with an exact solution to a minimum-weight perfect matching in a bipartite graph problem at each iteration, augmented with noise injection and random-swap refinement to escape local optima. We evaluate GA-TETRIS against the Original TETRIS, three simpler heuristics (Sort-by-Norm, Random-Swaps, and Random-Swaps with sorted initialization), and the no-permutation Block-Wanda baseline on two Vision Transformers (0.9M and 13.4M parameters) and SmolLM2-360M. On the per-layer pruning objective, GATETRIS achieves the highest score improvement at wide block widths across all three models. On whole-model evaluation, performance collapses rapidly with growing block width on every model; only the Larger ViT at block 1 × 2 preserves substantial accuracy. In that single regime, GA-TETRIS is the best algorithm by 1.85 percentage points over the next best. On SmolLM2 at the same block width, however, the simple Sort-by-Norm heuristic produces the lowest perplexity — though still roughly seven times worse than the unpruned baseline — suggesting that the best algorithm depends on both block width and model.

Súbory diplomovej práce:

main.pdf
code.zip

Súbory prezentácie na obhajobe:

Upravi»