Title: Distilling the Knowledge of SlovakBERT
Supervisor: prof. Ing. Igor Farkaš, Dr.
Keywords: knowledge distillation, BERT, Slovak language
Abstract: Deep neural networks have become a successful approach to natural language modeling. Model accuracy tends to increase with network size, so language models require ever more memory to train and use. Knowledge distillation reduces the size of a neural network while retaining almost all of its performance, and it has therefore come to the forefront of the research community. In this work, we study language-specific knowledge distillation and show that it is a viable technique for reducing model size while maintaining almost all of the model's accuracy. We evaluate the distilled models on four language understanding tasks, two of which, STS and BoolQ, were machine-translated into Slovak. In addition, we show that averaging the logits and hidden states of multiple teachers that have seen the same training data gives the student model no advantage. Our distilled models retain 91% to 99% of the original model's accuracy while having 46% fewer parameters.
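To make the multi-teacher setup mentioned above concrete, here is a minimal, non-authoritative sketch of averaging teacher outputs before distillation. It is not the thesis's implementation. The following choices are assumptions made for illustration:

- The soft-target loss is a temperature-scaled KL divergence multiplied by T^2, in the style of Hinton et al.
- The hidden-state loss is a mean squared error.
- The temperature value is arbitrary.
- The function names `softmax`, `elementwise_mean`, `kd_loss` and `hidden_state_loss` are hypothetical.

Real training would operate on batched tensors rather than plain Python lists.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stabilised)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def elementwise_mean(vectors):
    """Average several equally long vectors element by element (one per teacher)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kd_loss(student_logits, teacher_logits_list, temperature=2.0):
    """Assumed soft-target loss: KL(teacher || student) on softened distributions,
    where the teacher distribution comes from the averaged teacher logits."""
    teacher_logits = elementwise_mean(teacher_logits_list)
    p = softmax(teacher_logits, temperature)  # averaged-teacher soft targets
    q = softmax(student_logits, temperature)  # student soft predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Multiplying by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


def hidden_state_loss(student_hidden, teacher_hidden_list):
    """Assumed hidden-state loss: MSE between the student's hidden vector
    and the element-wise mean of the teachers' hidden vectors."""
    target = elementwise_mean(teacher_hidden_list)
    return sum((s - t) ** 2 for s, t in zip(student_hidden, target)) / len(target)


if __name__ == "__main__":
    # Two teachers whose averaged logits are [1, 1, 1]; a student with [0, 0, 0]
    # produces the same (uniform) softmax, so the soft-target loss is zero.
    print(kd_loss([0.0, 0.0, 0.0], [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]]))  # 0.0
```

The example also shows a side effect of averaging: two teachers with quite different individual predictions can cancel out into a flat target. This is one plausible intuition for why averaging teachers trained on the same data need not help the student. It is offered here as an illustration, not as the thesis's explanation.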

Master's thesis files:

Distilling the Knowledge of SlovakBERT.pdf

Thesis defence presentation files:

Distilling the knowledge of SlovakBERT.pdf