Some details
Our client is a global alliance of healthcare and technology companies built around the idea of digitizing different kinds of health data and using AI to derive insights from it. Over a long period of time, the client had accumulated a tremendous amount of data that was scattered across multiple locations and was not being used for business insights or decision-making.
This created a need to build centralized data repositories that would support data exploration and analytics workloads and also serve as data sources for other repositories.
A Data Lake (DL) and a Data Warehouse (DWH) were built as a pair of company-wide reference data repositories. The DL serves as centralized storage for unstructured and semi-structured information, as well as for “raw” structured data from individual products. The DWH, by contrast, serves as a repository of well-prepared, trusted datasets for data analysis and self-service analytics.