Data Mesh: Rethinking Enterprise Data Architecture
In an era of self-service business intelligence, almost every company is trying to establish itself as a data-driven organization. Most recognize the many benefits that data offers for making sound decisions. The ability to give customers a premium, hyper-personalized experience while reducing costs and capital outlay is especially appealing. However, organizations continue to face considerable complexity as they move toward a data-driven approach and try to realize its full potential.
Migrating from legacy systems, shedding legacy culture, and prioritizing data management among constantly competing business needs are all valid constraints, but the architecture of the data platform itself often proves to be a major obstacle. Siloed data warehouses and data lake architectures limit real-time data streaming capabilities, undermining the organization’s goals for scalability and democratization. Fortunately, the data mesh, a new architectural paradigm that is making waves, can breathe new life into your data goals. Let’s take a closer look at data mesh and how it can transform big data management.
What Is a Data Mesh?
A data mesh refers to the concept of breaking up monolithic data lakes and silos into smaller, more decentralized parts. Much like the shift from monolithic applications to microservices in software development, a data mesh can be described as the data-centric counterpart of microservices.
The term was first defined by ThoughtWorks consultant Zhamak Dehghani as a type of data platform architecture designed to address the ubiquitous nature of data within an organization through a self-service domain-oriented structure.
As a new organizational and architectural concept, data mesh challenges the traditional view that big data must be centralized to realize its analytical potential. According to that view, data cannot deliver its true value unless it is all stored in one place and centrally managed. Data mesh argues the opposite: big data can drive innovation only when it is distributed among the owners of domain-specific data, who offer it as a product.
To facilitate this, a new form of federated governance, supported by automation, needs to be introduced. Data democratization is the central premise of data mesh, and it cannot be achieved without decentralization, interoperability, and a focus on the data consumer experience. As an architectural paradigm, data mesh has the potential to enable large-scale analytics by providing fast access to a rapidly growing set of distributed domains. This is especially valuable in consumption-heavy scenarios such as analytics, machine learning, and the development and deployment of data-centric applications.
At its core, data mesh seeks to address the shortcomings associated with traditional platform architectures that have led to the creation of centralized data lakes or warehouses. In contrast to the monolithic data processing infrastructure, where data consumption, storage, processing, and output are limited to a central data lake, data meshes support the distribution of data to specific domains. The product-based data approach allows owners of different domains to independently manage their data pipelines.
The fabric that connects these domains and their associated datasets serves as an interoperability layer that maintains consistent syntax and data standards. The various pockets of data are interwoven and held together by this web, that is, the mesh.
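The relationship between domain-owned data and the interoperability layer can be illustrated with a small sketch. It is not a reference implementation: the `DataProduct` and `Mesh` classes, their fields, and the example domains are all hypothetical names chosen for illustration. The point it tries to show is that the mesh only catalogs what each domain publishes, while the data stays with its owner.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataProduct:
    """A dataset that a domain publishes for others to consume (hypothetical shape)."""
    domain: str                      # owning domain, e.g. "sales"
    name: str                        # product name, unique within the domain
    schema: Dict[str, str]           # column name -> type name, the shared contract
    read: Callable[[], List[dict]]   # domain-owned function that returns the rows


@dataclass
class Mesh:
    """Thin interoperability layer: it catalogs products but never stores their data."""
    catalog: Dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> str:
        key = f"{product.domain}.{product.name}"
        if key in self.catalog:
            raise ValueError(f"{key} is already published")
        self.catalog[key] = product
        return key

    def discover(self, column: str) -> List[str]:
        """Return the keys of all products whose schema exposes the given column."""
        return sorted(k for k, p in self.catalog.items() if column in p.schema)

    def read(self, key: str) -> List[dict]:
        # Delegates to the owning domain; the mesh holds no copy of the data.
        return self.catalog[key].read()


mesh = Mesh()
mesh.publish(DataProduct("sales", "orders",
                         {"order_id": "int", "customer_id": "int", "total": "float"},
                         lambda: [{"order_id": 1, "customer_id": 7, "total": 42.5}]))
mesh.publish(DataProduct("support", "tickets",
                         {"ticket_id": "int", "customer_id": "int"},
                         lambda: [{"ticket_id": 9, "customer_id": 7}]))

print(mesh.discover("customer_id"))   # ['sales.orders', 'support.tickets']
```

A consumer in any domain can find every product that exposes `customer_id` without knowing which team owns it, which is the kind of cross-domain discoverability the mesh is meant to provide.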
Problems that Data Mesh Seeks to Fix
As mentioned earlier, the limitations of traditional data architecture have proven to be a major stumbling block for organizations trying to use the data at their disposal to achieve tangible gains in their business processes and practices. The real struggle lies in transforming mounds of data into astute, actionable insights.
Data mesh addresses these concerns by fixing the following glaring gaps in the traditional approach to big data management:
- Monolithic platforms can’t keep up: Monolithic data platforms such as warehouses and lakes often lack the diversity of data sources and domain-specific structures needed to generate valuable insights from mounting data chunks. As a result, crucial domain-specific knowledge gets lost in these centralized platforms. This hinders the ability of data engineers to create meaningful correlations between different data points to create accurate analyses that represent the reality of operations.
- The data pipeline creates a bottleneck: In traditional formats, the data pipeline creates bottlenecks by separating the processes of data ingestion, transformation, and delivery. In essence, different departments working with different datasets work without mutual collaboration. Blocks of data are passed from one team to another without any meaningful integration or transformation.
- Data professionals work toward different goals: Hyper-specialized data engineers, source owners, and consumers often work toward different goals because each approaches the data from a completely different perspective. This frequently produces counterproductive results. The root cause of the inefficiency is a lack of domain expertise for mapping analytics onto the fundamentals of the business.
3 Key Components of Data Mesh
A data mesh requires several elements to operate seamlessly: data infrastructure, data sources, and domain-oriented pipelines. Each of these elements is essential for ensuring universal interoperability, observability, and governance, as well as for upholding domain-agnostic standards in a data mesh architecture.
The following key components play a crucial role in helping data mesh meet those standards:
- Domain-oriented data owners and pipelines: A data mesh federates data ownership among domain owners, who are responsible for offering their data as a product and for facilitating communication between the different locations across which data is distributed. While every domain owns and manages its own extract, transform, load (ETL) pipeline, a common set of capabilities is applied across domains to facilitate storage, cataloging, and access to raw data. Once data has been served to a given domain and duly transformed, the domain owners can use it for operational or analytical needs.
- Self-service capabilities: A key concern with a domain-centric approach to data management is that every domain ends up duplicating the effort of maintaining pipelines and infrastructure. To counter this, a data mesh gathers the shared capabilities into a central, domain-agnostic data infrastructure platform that takes care of the underlying pipeline infrastructure. Each domain then uses the components it needs to run its own ETL pipelines, which gives it both the support and the autonomy it requires. This self-service model lets domain owners focus on their specific data use cases.
- Interoperability and standardization of communication: Each domain is supported by a set of underlying universal data standards that pave the way for collaboration where needed. This is important because the same raw and transformed data will inevitably be valuable to multiple domains. Standardizing data features such as governance, discoverability, formatting, and metadata specifications enables cross-domain collaboration.
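As a rough illustration of how these three components might fit together, the sketch below shows a shared ETL runner that a platform team could offer, into which each domain plugs its own extract, transform, and load steps. Everything here is an assumption made for the example: the `run_pipeline` function, the `REQUIRED_METADATA` keys, and the "sales" domain are hypothetical and do not come from any particular data mesh product.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical mesh-wide standard: every published product must carry these metadata keys.
REQUIRED_METADATA = ("owner", "description", "format", "schema_version")

Row = Dict[str, object]


def missing_metadata(metadata: Dict[str, str]) -> List[str]:
    """Return the required metadata keys that are absent or empty."""
    return [k for k in REQUIRED_METADATA if not metadata.get(k)]


def run_pipeline(extract: Callable[[], Iterable[Row]],
                 transform: Callable[[Row], Row],
                 load: Callable[[List[Row]], None],
                 metadata: Dict[str, str]) -> int:
    """Shared ETL runner offered by the platform; each domain supplies only its own steps."""
    gaps = missing_metadata(metadata)
    if gaps:
        raise ValueError(f"metadata missing required keys: {gaps}")
    rows = [transform(row) for row in extract()]
    load(rows)
    return len(rows)


# A hypothetical "sales" domain plugs its own logic into the shared runner.
warehouse: List[Row] = []
count = run_pipeline(
    extract=lambda: [{"amount_cents": 1250}, {"amount_cents": 399}],
    transform=lambda r: {"amount": r["amount_cents"] / 100},
    load=warehouse.extend,
    metadata={"owner": "sales", "description": "Order totals",
              "format": "json", "schema_version": "1"},
)
print(count, warehouse)
```

The domain keeps ownership of its pipeline logic, the runner is written once and reused by every domain, and the metadata check enforces the shared standard before anything is loaded.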
Why Use a Data Mesh?
To date, most companies have relied on a single data warehouse or data lake as part of their big data infrastructure to meet their business intelligence needs. Such solutions are deployed, managed, and maintained by a small group of specialists who are often weighed down by technical debt. The result is overloaded data teams struggling to keep up with growing business demand, a disconnect between data producers and consumers, and increasing impatience among data consumers.
In contrast, distributed architectures like data mesh combine the strengths of both approaches, centralized databases and distributed data domains with independent pipelines, to create a more viable and scalable alternative. Data mesh addresses many of the shortcomings of data lakes by increasing the flexibility and autonomy of data ownership. Because the burden no longer falls on a small number of experts, the scope for data experimentation and innovation broadens.
At the same time, the self-serve infrastructure as a platform opens up avenues for a far more universal yet automated approach toward data standardization as well as data collection and sharing. Overall, the benefits of data mesh represent a decisive competitive advantage over traditional data architectures.
To Mesh or Not to Mesh – Is It the Right Choice for You?
Given these upsides, it might seem that any organization would want to adopt a data mesh architecture for big data management. But is it the right choice for you?
An easy way to find out is to first determine your data mesh score based on data quality, the number of data domains, data teams and their sizes, data engineering bottlenecks, and data governance practices. The higher the score, the more complex the data infrastructure requirements and the greater the need for a data mesh.
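The article does not prescribe a formula for this score, so the following is only one possible way to sketch it. It assumes each of the factors listed above is self-rated from 1 (little pressure) to 5 (a lot of pressure) and that all factors carry equal weight. The factor names, the scale, and the weighting are all assumptions made for illustration.

```python
from typing import Dict

# Factors named in the text; the 1-5 scale and equal weighting are assumptions.
FACTORS = (
    "data_quality_issues",
    "number_of_domains",
    "team_count_and_size",
    "engineering_bottlenecks",
    "governance_complexity",
)


def data_mesh_score(ratings: Dict[str, int]) -> int:
    """Sum self-assessed ratings (1 = low pressure, 5 = high pressure) over all factors."""
    for factor in FACTORS:
        if factor not in ratings:
            raise KeyError(f"missing rating for {factor}")
        if not 1 <= ratings[factor] <= 5:
            raise ValueError(f"{factor} must be rated from 1 to 5")
    return sum(ratings[f] for f in FACTORS)


example = {
    "data_quality_issues": 3,
    "number_of_domains": 5,
    "team_count_and_size": 4,
    "engineering_bottlenecks": 4,
    "governance_complexity": 2,
}
print(data_mesh_score(example))  # 18 out of a possible 25
```

Under these assumptions, a total near the top of the range would suggest that a data mesh deserves serious evaluation, while a low total would suggest that a centralized platform may still be sufficient.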