From data lakes to data mesh: Rethinking platform modernization
- Posted on October 13, 2020
This article was originally written by Avanade alum Dael Williamson.
Most clients I work with understand the importance of data and are either interested in or actively planning their next-generation data platform. Many recognize that legacy systems are overwhelmed by varied data streams and that tightly coupled data pipelines have limited their ability to respond at high velocity, which is particularly important given recent market disruptions.
I see many businesses that have worked tirelessly to centralize diverse, ever-multiplying datasets. Their data engineers often struggle to transform mountains of data they don’t understand into information that analysts and data scientists can use. As a result, key data talent, already in severe market shortage, has been left to develop highly specialized, platform-specific skills and workarounds, with limited ability to design highly scalable, domain-specific products.
Despite it all, investments in data grow
Despite the challenges, investing in data to improve performance and competitive advantage is a strategic priority for 80% of CEO respondents. In fact, according to IDC, 67% of enterprises prioritize creating a data management capability that turns internal data into insights while maintaining and refining data sets and data processes.
In certain sectors, we’re seeing unprecedented growth in data and AI investments. That’s why the evolution of data architectures is so fascinating. How data architectures have been designed can reveal gaps and failure modes that inhibit data and AI investments. It can also uncover potential solutions. While interesting to explore, I have learned data architecture may not be the most welcome topic of discussion at dinner time, at least until my daughter is in first grade.
Making the most of the data evolution
Throughout the 1990s and the first decade of this century, when data warehousing was at the core of data architecture, many organizations tackled integration by building a single, centralized data model meant to fit all data shared across the enterprise. In time, the limitations of the data warehouse became clear: it offered only marginal capability to work with the full range of data types and database systems.
In response to those limits, data lake architecture emerged, bringing new capabilities for handling unstructured and semi-structured data, along with new challenges: governance, security, and the risk of becoming a data swamp.
Next in line, data hubs gained traction. Business domains within the organization began leveraging new features for data storage, harmonization, indexing, processing, governance, metadata, search, and exploration to build AI-driven use cases efficiently. Several factors, including data sovereignty and privacy regulations, also contributed to the hub concept, while the explosion of digital and mobile data accelerated its adoption.
Moving from monolithic to cloud distributed data mesh
A cloud distributed data mesh supports a growing movement to decentralize data. Data mesh organizes data around business domains and treats “data as a product,” enabling each domain to own its data pipelines. This differs from traditional monolithic data infrastructures, which tightly couple, and often slow down, the ingestion, storage, transformation, and consumption of data through one central data lake or hub.
Data mesh presents new opportunities for each business domain and organization to improve the speed and accuracy with which data is modeled, governed, and abstracted from the underlying platform. Data mesh also enables digital capabilities such as digital twins, the internet of things, and edge cloud computing. This makes it possible to connect products faster, from design through manufacturing and supply chain, to drive revenue growth and cost efficiency.
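To make the “data-as-a-product” idea above concrete, here is a minimal sketch of domain-owned data products registered in a shared discovery catalog. All names (`DataProduct`, `MeshCatalog`, the `risk` and `sales` domains) are hypothetical illustrations, not an Avanade or Azure API; the point is only that each domain publishes its own product with a schema and a serving interface, rather than pushing raw data into one central lake.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch: each domain team owns its pipeline and publishes
# a "data product" with a declared schema and a serving function.

@dataclass
class DataProduct:
    domain: str                      # owning business domain
    name: str                        # product name within the domain
    schema: List[str]                # published output columns
    serve: Callable[[], List[dict]]  # domain-owned pipeline output

class MeshCatalog:
    """Lightweight discovery layer over domain-owned products."""
    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Products are addressed by a domain-qualified name.
        self._products[f"{product.domain}.{product.name}"] = product

    def get(self, qualified_name: str) -> DataProduct:
        return self._products[qualified_name]

# Two domains each publish a product into the shared catalog.
catalog = MeshCatalog()
catalog.register(DataProduct(
    domain="risk", name="daily_exposure",
    schema=["trade_id", "exposure"],
    serve=lambda: [{"trade_id": "T1", "exposure": 1_250_000}],
))
catalog.register(DataProduct(
    domain="sales", name="orders",
    schema=["order_id", "amount"],
    serve=lambda: [{"order_id": "O1", "amount": 99.0}],
))

# A consumer discovers and reads a product directly from the owning
# domain, with no central lake in the path.
rows = catalog.get("risk.daily_exposure").serve()
```

The design choice worth noting is that the catalog holds only discovery metadata; the data itself stays with, and is served by, the owning domain.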
Helping a client evolve to a next generation of platform modernization
Avanade partnered with one client in financial services to help define its data modernization journey. The client needed support in two key initiatives: (1) defining the roadmap for modernizing a legacy platform that handles risk management across several capital-markets domains and markets; and (2) assessing and optimizing the modern data platform it was building on Azure.
We were chosen for our Microsoft experience and our expertise in shaping use-case-led data platform modernization roadmaps. As part of our Data Value Workshop assessment process, we created a phased approach to:
- Introduce an Azure domain-agnostic data platform as an extension of the client’s existing ecosystem
- Productize its data and modeling into domains that reflect the way the data flows (rather than the way the business is organized)
- Modernize tools and platforms so data consumers work directly from the cloud, and eventually reverse engineer the logic from the on-premises legacy data warehouse into the cloud
- Incrementally redevelop pipelines using modern data movement methods to ingest directly from the data sources
- Eventually retire the legacy monolith system
At Avanade we call this our “Cloud Data Co-Existence Extension Pattern”.
In addition, drawing on our understanding of the client’s existing business and system landscape, we used our unified analytics modeling framework to reshape the data model in line with the data mesh concept, which the client was keen to experiment with.
As part of this engagement, we not only recommended a solution but also produced a comprehensive list of platform services suited to various capabilities, aligned to the company’s strategic IT vision and its use cases, with the pros and cons of each and a cost profile to guide choice, timing, and usage. We also helped the client make the system more secure and robust.
Where to go from here?
Like any modernization approach, the data mesh is one arrow in the quiver and should only be pointed at certain targets. Rather than assuming every organization needs a data mesh, we work with each client to identify the best design and approach for building its next generation of data platform.
We help clients with a domain-driven design that leverages our experience to create high-value use cases and develop a governance plan for distributed domains and data. Most of all, we can help entire data teams with an easier way to manage the growing needs of their organization, from fielding the never-ending stream of ad hoc queries, to wrangling disparate data sources, to creating the greatest value at scale for the organization.
Find out more about how to power your data science teams with data and advanced analytics and help your business thrive in a changing world.