First name:Patrícia
Surname:Vnenčáková
Title:Evaluating Large Language Models on Computational Thinking Tasks
Supervisor:Mgr. Marek Šuppa
Year:2026
Keywords:large language models, Bebras challenge, computational thinking, reasoning tasks, multimodal input, model evaluation, Slovak language
Abstract:Large language models are systems based on deep neural networks that are capable of understanding and generating human-like text. Beyond language processing, they are increasingly used to solve logical and reasoning tasks, although their true capabilities in this area are still being actively studied. The goal of this thesis is to evaluate their ability to solve reasoning-based tasks from the Slovak Bebras challenge, a competition focused on computational thinking. We created a database of 822 tasks from the competition, tested six language models on them, and compared their performance in terms of accuracy, execution time, and computational cost. The results showed significant differences between the evaluated models, with the most advanced models achieving substantially better performance than smaller or more efficient variants. We also observed that models perform best on purely textual tasks, while the presence of visual elements, especially multiple images, leads to a noticeable decrease in performance. The analysis across competition categories showed that task modality has a stronger impact on performance than the nominal difficulty level of the tasks. A comparison with student results demonstrated that the strongest models can outperform average students, while weaker models often achieve results comparable to those of lower-performing students. Finally, we analyzed different input representations and found that representing tasks as images does not reduce computational cost; instead, it increases token usage without consistent improvements in accuracy.

Thesis files:

Vnencakova_diploma_thesis.pdf

Defense presentation files:
