Data Mesh

Name: Data Mesh
Author: Zhamak Dehghani

We're at an inflection point in data, where our data management solutions no longer match the complexity of organizations, the proliferation of data sources, and the scope of our aspirations to get value from data with AI and analytics. In this practical book, author Zhamak Dehghani introduces data mesh, a decentralized sociotechnical paradigm drawn from modern distributed architecture that provides a new approach to sourcing, sharing, accessing, and managing analytical data at scale.

Dehghani guides practitioners, architects, technical leaders, and decision makers on their journey from traditional big data architecture to a distributed and multidimensional approach to analytical data management. Data mesh treats data as a product, considers domains as a primary concern, applies platform thinking to create self-serve data infrastructure, and introduces a federated computational model of data governance.

Get a complete introduction to data mesh principles and its constituents
Design a data mesh architecture
Guide a d...

20 highlights

data-products data-mesh data-engineering use-case diagram critical-insight technical-challenge check-list

Highlights & Annotations

For data to be a product, it adheres to a set of rules and exhibits a set of traits that make it fit right in the intersection of Cagan’s usable, feasible, and valuable Venn diagram. For data to be a product, it must be valuable to its users, on its own and in cooperation with other data products. It must demonstrate empathy for its users and be accountable for its usability and integrity.

Ref. E2BF-A

Data as a product expects that the analytical data provided by the domains is treated as a product, and the consumers of that data should be treated as customers—happy and pleased. Furthermore, data as a product underpins the case for data mesh, unlocking the value of an organization’s data by dramatically increasing the potential for serendipitous and intentional use.

Ref. 123F-B

Compared to past paradigms, data as a product inverts the model of responsibility. In data lake or data warehousing architectures the accountability of creating data with quality and integrity resides downstream from the source and remains with the centralized data team. Data mesh shifts this responsibility close to the source of the data. This transition is not unique to data mesh; in fact, over the last decade we have seen the trend of shift left with testing and operations, on the basis that addressing problems is cheaper and more effective when done close to the source.

Ref. B955-C

In his book INSPIRED, Marty Cagan, a prominent thought leader in product development and management, provides convincing evidence on how successful products have three common characteristics: they are feasible, valuable, and usable. Data as a product principle defines a new concept, called data product, that embodies standardized characteristics to make data valuable and usable. Figure 3-1 demonstrates this point visually. This chapter introduces these characteristics. Chapter 4 describes how to make building data products feasible.

Ref. 9E91-D

I go as far as saying that what gets shared on a mesh is not merely data; it is a data product.

Ref. 97D6-E

Data as a product is about applying product thinking to how data is modeled and shared. This is not to be confused with product selling.

Ref. B43F-F

Applying the magic ingredient of product thinking to internal technology begins with establishing empathy with internal consumers (i.e., fellow developers), collaborating with them on designing the experience, gathering usage metrics, and continuously improving the internal technical solutions over time to maintain ease of use. Strong digital organizations allocate substantial resources and attention to building internal tools that are valuable to the developers and ultimately the business.

Ref. ABF8-G

Curiously, the magical ingredient of empathy, treating data as a product and its users as customers, has been missing in big data solutions. Operational teams still perceive their data as a byproduct of running the business, leaving it to someone else, e.g., the data team, to pick it up and recycle it into products. In contrast, data mesh domain teams apply product thinking with similar rigor to their data, striving for the best data user experience.

Ref. 22AB-H

With an informed understanding of these use cases and what information the other teams need, the media player domain provides two different types of data as its products to the rest of the organization: near-real-time play events exposed as infinite event logs, and aggregated play sessions exposed as serialized files on an object store.

Ref. 2356-I
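
To make the relationship between the two media player data products concrete, here is a minimal sketch of how a raw play-event log could be rolled up into aggregated play sessions. Everything in it is an assumption for illustration: the field names (`listener_id`, `track_id`, `timestamp`), the 30-minute inactivity gap, and the `sessionize` helper are hypothetical and do not come from the book.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PlayEvent:
    """One entry of the (hypothetical) near-real-time play event log."""
    listener_id: str
    track_id: str
    timestamp: float  # seconds since epoch


@dataclass(frozen=True)
class PlaySession:
    """One aggregated play session derived from the event log."""
    listener_id: str
    start: float
    end: float
    track_ids: Tuple[str, ...]


def _close(listener_id: str, events: List[PlayEvent]) -> PlaySession:
    # Collapse a run of consecutive events into a single session record.
    return PlaySession(
        listener_id=listener_id,
        start=events[0].timestamp,
        end=events[-1].timestamp,
        track_ids=tuple(e.track_id for e in events),
    )


def sessionize(events: Iterable[PlayEvent], gap_seconds: float = 1800.0) -> List[PlaySession]:
    """Group events per listener; a new session starts after `gap_seconds` of silence."""
    sessions: List[PlaySession] = []
    ordered = sorted(events, key=lambda e: (e.listener_id, e.timestamp))
    for listener_id, group in groupby(ordered, key=lambda e: e.listener_id):
        current: List[PlayEvent] = []
        for event in group:
            if current and event.timestamp - current[-1].timestamp > gap_seconds:
                sessions.append(_close(listener_id, current))
                current = []
            current.append(event)
        if current:
            sessions.append(_close(listener_id, current))
    return sessions


log = [
    PlayEvent("a", "t1", 0.0),
    PlayEvent("a", "t2", 100.0),
    PlayEvent("a", "t3", 5000.0),  # more than 30 minutes later: new session
    PlayEvent("b", "t9", 50.0),
]
sessions = sessionize(log)
```

The event log stays untouched as the high-resolution source, while the sessions are a derived, lower-resolution view. In a real mesh the sessions would be serialized to files on an object store; that step is left out here.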

s the media player domain data products.

Ref. 6B46-J

As you can imagine, you can adopt the majority of the product ownership techniques to data. However, there is something unique about data. The difference between data product ownership and other types of products lies in the unbounded nature of data use cases, the ways in which particular data can be combined with other data and ultimately turned into insights and actions. At any point in time, data product owners are aware or can plan for what is known today as viable use cases of their data, while there remains a large portion of unknown future use cases for the data produced today, perhaps beyond their imagination.

Ref. AB81-K

They can continue to be used, transformed, and reinterpreted by data users of the future. Source-aligned data products need to balance the immediate known use cases and the unknown ones. They have no choice but to strive to model the reality of the business, as closely as possible, in their data, without too much assumption in how it will be used. For example, capturing all the play events as an infinite high-resolution log is a safe choice. It opens the spectrum of future users who can build other transformations and infer new insights from the data that is captured today.

Ref. BA05-L

Baseline Usability Attributes of a Data Product

Ref. 09E1-M

No one will use a product that they can’t trust. So what does it mean to trust a data product, and more importantly what does it take to trust? To unpack this, I like to use the concept of trust offered by Rachel Botsman: the bridge between the known and the unknown. A data product needs to close the gap between what data users know confidently about the data, and what they don’t know but need to know to trust it. While the prior characteristics like understandability and discoverability close this gap to a degree, it takes a lot more to trust the data for use.

Ref. C56B-N

However, a sense of discomfort arises when they go deeper into what it actually takes to implement the transformation toward data mesh. I found in my conversations with data mesh early implementers that while they verbalize the principles and their intention to implement them, the implementations remain heavily influenced by the familiar techniques of the past.

Ref. 46C5-O

Today, in the absence of these data-as-a-product practices, data lineage remains a vital ingredient for establishing trust. Data users have been left with no choice but to assume data is untrustworthy and requires a detective investigation through its lineage before it can be trusted. This lack of trust is the result of the wide gap between data providers and data users due to the data providers’ lack of visibility to the users and their needs, lack of long-term accountability for data, and the absence of computational guarantees.

Ref. 5929-P

Data mesh shifts from this dual mode of data versus code to data and code as one architectural unit, a single deployable unit that is structurally complete to do its job, providing the high-quality data of a particular domain. One doesn’t exist without the other.

Ref. A635-Q
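
One way to picture "data and code as one architectural unit" is a single object that owns its transformation logic, its quality checks, and the data it serves. The sketch below is only an illustration under that reading: the `DataProduct` class, its `refresh` and `read` methods, and the example checks are hypothetical names, not an API defined by the book.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class DataProduct:
    """Hypothetical unit that bundles transformation code with the data it serves."""
    name: str
    transform: Callable[[List[Record]], List[Record]]
    checks: List[Callable[[List[Record]], bool]] = field(default_factory=list)
    _served: List[Record] = field(default_factory=list, init=False, repr=False)

    def refresh(self, source_records: List[Record]) -> None:
        # Run the product's own code, then publish only if every check passes.
        candidate = self.transform(source_records)
        failed = [check.__name__ for check in self.checks if not check(candidate)]
        if failed:
            raise ValueError(f"{self.name}: failed checks {failed}")
        self._served = candidate

    def read(self) -> List[Record]:
        # Hand out a copy so consumers cannot mutate the served data.
        return list(self._served)


def keep_completed_plays(records: List[Record]) -> List[Record]:
    return [r for r in records if r.get("completed")]


def every_record_has_track(records: List[Record]) -> bool:
    return all("track_id" in r for r in records)


play_sessions = DataProduct("play_sessions", keep_completed_plays, [every_record_has_track])
play_sessions.refresh([
    {"track_id": "t1", "completed": True},
    {"track_id": "t2", "completed": False},
])
```

Because the checks run inside `refresh`, data that fails them never replaces what consumers already read, which is one simple way to keep accountability for quality at the source.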

The principle of data as a product is a response to the data siloing challenge that may arise from the distribution of data ownership to domains. It is also a shift in the data culture toward data accountability and data trust at the point of origin. The ultimate goal is to make data simply usable.

Ref. 3D1A-R

The chapter explained eight nonnegotiable baseline usability characteristics of data products including discoverability, addressability, understandability, trustworthiness, native accessibility, interoperability, independently valuable, and security.

Ref. E598-S
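
The eight characteristics listed above lend themselves to a simple checklist. The following sketch records, per characteristic, a short note of evidence and reports which ones are still uncovered. The `Attribute` enum mirrors the list in the highlight; the `missing_attributes` helper and the evidence strings are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Attribute(Enum):
    """The eight baseline usability characteristics named in the highlight."""
    DISCOVERABLE = "discoverable"
    ADDRESSABLE = "addressable"
    UNDERSTANDABLE = "understandable"
    TRUSTWORTHY = "trustworthy"
    NATIVELY_ACCESSIBLE = "natively accessible"
    INTEROPERABLE = "interoperable"
    VALUABLE_ON_ITS_OWN = "valuable on its own"
    SECURE = "secure"


def missing_attributes(evidence: Dict[Attribute, str]) -> List[Attribute]:
    """Return every characteristic that has no (non-blank) evidence recorded."""
    return [a for a in Attribute if not evidence.get(a, "").strip()]


# Hypothetical evidence for a play-events data product.
evidence = {
    Attribute.DISCOVERABLE: "registered with owner and description",
    Attribute.ADDRESSABLE: "stable, unique address published",
    Attribute.SECURE: "   ",  # blank notes do not count as evidence
}
gaps = missing_attributes(evidence)
```

Since the book calls these characteristics nonnegotiable, a check like this would more naturally gate publication than serve as an optional report.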

Data mesh’s main difference with both approaches is that it shifts discoverability left. Data discoverability, understandability, etc., starts with the data product itself, when the data product is created and throughout its life cycle. It’s the responsibility of a data product to share the information needed to make itself discoverable, understandable, trustworthy, and explorable.

Ref. 8BD9-T
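
Shifting discoverability left can be read as: each data product publishes its own description, and any search or catalog feature only indexes what the products report about themselves. The sketch below illustrates that idea under stated assumptions. The `Descriptor` fields, the example addresses, and the `search` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Descriptor:
    """Self-description a data product publishes about itself (hypothetical shape)."""
    address: str
    owner: str
    description: str
    schema: Dict[str, str]  # column name -> type
    freshness_minutes: int


def search(descriptors: List[Descriptor], term: str) -> List[str]:
    """Return addresses of products whose description or column names mention `term`."""
    needle = term.lower()
    return [
        d.address
        for d in descriptors
        if needle in d.description.lower() or needle in " ".join(d.schema).lower()
    ]


published = [
    Descriptor(
        address="media-player/play-events",
        owner="media player domain",
        description="Near-real-time play events as an append-only log",
        schema={"listener_id": "string", "track_id": "string", "timestamp": "double"},
        freshness_minutes=1,
    ),
    Descriptor(
        address="media-player/play-sessions",
        owner="media player domain",
        description="Aggregated play sessions per listener",
        schema={"listener_id": "string", "start": "double", "end": "double"},
        freshness_minutes=60,
    ),
]
hits = search(published, "sessions")
```

The search function holds no knowledge of its own. Everything it can find originates with the data products, which is the point of the highlighted passage.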