
Overcoming the challenge of high data growth and data silos

Blog by Jesper Lowgren, Principal Enterprise Architect, and Dr Haibo Yang, Principal Data Architect at DXC Technology.

Over the past two decades, organisations have experienced rapid growth in the volume, types and sources of their data. Many have struggled to keep up with this growth and to make good use of the data they collect. One common challenge we observe across many industries is an unintended consequence of introducing new software applications and systems: the creation of more data silos.
 
Data collected from different systems and proprietary applications is often not directly compatible across sources and requires complex logic to integrate and make sense of. Data silos are becoming costly for modern organisations and can hinder digital transformation, automation, reporting, analytics and AI use cases.
 
Data warehouses and data lakes are the two typical solutions modern enterprises use to break down data silos.
 
On the one hand, an enterprise data warehouse is often expensive to build and maintain, but it enables the integration, storage, governance and processing of structured data collected from different relational sources such as CRM and ERP systems.
 
On the other hand, a data lake focuses on semi-structured and unstructured data, including documents and media files. Despite its lower cost, a data lake often degrades into a "data swamp": as more and more ungoverned documents and files are dumped into it, the validity and reliability of the stored data become questionable over time.
 
Microsoft Fabric, powered by OneLake, is considered a game-changer because of its low-code approach to building open-format lakehouses. A lakehouse essentially combines the advantages of a data warehouse and a data lake: it brings the rigour and performance of data warehousing while retaining the cost efficiency and scalability of lake storage.


With Microsoft Fabric, organisations can connect, analyse and govern all types of data through lakehouses without having to choose between warehouses and lakes. Joining and correlating structured, semi-structured and unstructured datasets can generate holistic insights into business operations and opportunities for optimisation at all levels.
 
OneLake is a core component of Microsoft Fabric: a single, unified, logical lake in which the whole organisation can store and share different types of data, and on top of which lakehouses are built. Just as OneDrive does for documents and files, OneLake can be browsed in Windows File Explorer, letting users directly access and collaborate on any data they have permission to access.
 
OneLake has three key features:
•    One Copy of data, using shortcuts instead of duplicates, to reduce cost and boost productivity;
•    One Security, with consistent settings enforced across all tools, to ensure compliance and prevent data leakage and privacy breaches;
•    One Data Hub, centralising data discovery and management, to democratise data assets and build confidence in data quality.
 
Before OneLake was introduced, the technical barriers preventing business users from directly consuming and interacting with data lakehouses had been high and entrenched for years. OneLake integrates data security, governance and democratisation seamlessly in one solution. By contrast, comparable non-Microsoft data platforms may still have a long way to go to match the OneLake experience.