Title: Consistency and Fault-Tolerance in Data Warehouses
Supervisor: Mgr. András Varga, PhD.
Keywords: analytic database, data warehouse, big data, fault-tolerance, data consistency
Abstract: Distributed data warehouses play an essential role in big data processing and analytics. However, their increased exposure to failures undermines the business requirement for consistent and available data. In this thesis, we consider various designs of fault-tolerant data warehouse systems and their resumption strategies. We implement a distributed data warehouse using big data tools such as Hive and Spark to test the efficiency of the error-handling methods in a big data environment. In particular, we evaluate the performance of the optimized Dependency Analysis method against the naive resumption approach. Our test results suggest that Dependency Analysis can improve performance considerably and remains efficient even in a distributed setting.

Bachelor's thesis files:


Defense presentation files: