Why Data Mesh? Ft. Ben Stopford
Confluent
https://cnfl.io/podcast-episode-203 | With experience in data infrastructure and distributed data technologies, author of the book “Designing Event-Driven Systems” Ben Stopford (Lead Technologist, Office of the CTO, Confluent) explains the data mesh paradigm, differences between traditional data warehouses and microservices, as well as how you can get started with data mesh.
Highlights & Annotations
to get the principles to come in, but the first two principles are effectively sociotechnical. So they’re about domain ownership. You should get counterparty information or customer information from the system that originates it. That way, it’s most likely not to be broken, basically. So if there’s a data feed team in my bank, I should be getting my data feed data from them, not from the data warehouse team? Yeah, or the finance team that happened to be your mates, and they’ve got the stuff,
Ref. F1A8-A
don’t really see it as part of your job so much. It’s like your primary reason for existing is to be a trading system and help these traders, not to disseminate information to the organization. So data as a product is about trying to reshape those priorities and say, “Look, being a data product in an organization these days is so important to the overall operation of the
Ref. 9B8E-E
the customer information, I need to be able to get access to it immediately. I don’t have to ask anyone. I don’t have to raise a request, or maybe you have to ask permission to see the data or something, but basically, it’s all automated for me, and I can get hold of or access the whole dataset. The second one is governance, which is a bit like adding unit testing in agile. It’s like you don’t have to do it, but if you don’t do it, you’re probably going to make a mess of it. If you add it, you’ve got a much better chance of succeeding. Right. Okay. So what is it? What’s the governance piece?
Ref. 3C57-G
Governance is actually… Yeah. It’s like you can do a whole podcast on that, but effectively, it’s a number of tools which will help you in an, actually, often, imperfect way, but they help you better manage data in the organization. So things that governance tools tend to do include answering questions like, “How do I find this piece of data? Okay. There’s a problem with this piece of data,
Ref. E924-H
If I want to change this field in a non-backward-compatible way, how do I do it without it becoming some project I have to hire five people to go and execute on because we don’t know who consumes the data and how we’re going to notify them, and ask them if it’s okay? Are they actually going to change their side of things at any point in time soon? So we’re going to chase that down. So just from a practical sense, it’s just like an operation of a large distributed data system, which is what a company ends up being. Doing
Ref. 5B37-K
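The concern in this highlight, changing a field without knowing whether it breaks consumers or who those consumers are, can be sketched in a few lines of Python. This is a simplified illustration, not any real schema registry's API: the dictionary-based schema format, the `breaking_changes` and `teams_to_notify` helpers, and the `consumers` mapping are all hypothetical. The compatibility rule used here (an added field needs a default, an existing field must keep its type) is only loosely modeled on the kind of checks schema registries perform.

```python
from typing import Dict, List

Schema = Dict[str, dict]  # hypothetical format: field name -> {"type": ..., "default": ...}


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """Return reasons why `new` could not be used to read data written with `old`."""
    problems = []
    for name, spec in new.items():
        if name not in old:
            # A reader on the new schema needs a default to fill in for old records.
            if "default" not in spec:
                problems.append(f"added field '{name}' has no default")
        elif old[name]["type"] != spec["type"]:
            problems.append(
                f"field '{name}' changed type {old[name]['type']} -> {spec['type']}"
            )
    return problems


def teams_to_notify(topic: str, consumers: Dict[str, List[str]]) -> List[str]:
    """Look up which teams read a topic (assumes consumption is recorded somewhere)."""
    return sorted(consumers.get(topic, []))


old = {"id": {"type": "string"}, "balance": {"type": "double"}}
new_ok = {**old, "email": {"type": "string", "default": ""}}
new_bad = {"id": {"type": "string"}, "balance": {"type": "string"}, "region": {"type": "string"}}

consumers = {"customers": ["marketing", "billing"]}

ok_report = breaking_changes(old, new_ok)
bad_report = breaking_changes(old, new_bad)
notify = teams_to_notify("customers", consumers) if bad_report else []
```

With the consumer list recorded up front, an incompatible change turns into a lookup plus a notification rather than a project to chase down unknown readers.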
which we’ll do in Confluent Cloud, which will do it efficiently, store that at a low cost for you. Then, you’ve got the discovery piece, which is what you asked about, which is covered by, basically, our data governance features. So what those allow you to do, a concrete example, is you basically have… If you use event streams to build a data mesh, the simplest approach is basically the data… Each data product is represented as an event stream. It’s just the
Ref. 7E4B-L
simplest way to do it. So if I have customer data, I would have a customer event stream in Confluent Cloud with maybe infinite storage enabled. It was enabled anyway, but you can store the whole dataset, and then you can provide that data to anyone once it’s… So how do they get hold of it? Well, you would tag that data, that topic, that event stream in Confluent Cloud as being customer
Ref. BF3D-X
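As a rough illustration of "each data product is an event stream that stores the whole dataset", here is a minimal in-memory sketch in Python. It does not use Kafka or Confluent Cloud; the `EventStream` class, the `materialize` helper, and the sample customer records are all assumptions made for the example. The point it tries to show is that when the full history is retained, a new consumer can read from offset 0 and rebuild its own view of the data.

```python
from typing import Any, Dict, List


class EventStream:
    """Hypothetical append-only log standing in for a topic with unlimited retention."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._log: List[Dict[str, Any]] = []  # nothing is ever expired

    def append(self, event: Dict[str, Any]) -> int:
        """Add an event and return its offset."""
        self._log.append(event)
        return len(self._log) - 1

    def read(self, from_offset: int = 0) -> List[Dict[str, Any]]:
        """Return every event from the given offset to the end of the log."""
        return list(self._log[from_offset:])


def materialize(events: List[Dict[str, Any]]) -> Dict[Any, Dict[str, Any]]:
    """Fold events into a latest-value-per-id table, as a late-joining consumer might."""
    table: Dict[Any, Dict[str, Any]] = {}
    for event in events:
        table[event["id"]] = event
    return table


customers = EventStream("customers")
customers.append({"id": 1, "name": "Ada"})
customers.append({"id": 2, "name": "Grace"})
customers.append({"id": 1, "name": "Ada L."})  # later update to customer 1

# A consumer that joins late still sees the whole dataset by starting at offset 0.
snapshot = materialize(customers.read())
latest_only = customers.read(from_offset=2)
```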
data, and then people… and you actually also might tag it with other things, like you might tag several sub-elements, which is other datasets that it might contain, so like secondary dimensions. If you’re tagging data in the cloud product, then you can do it with different dimensions, and you can also have an ability to search that pretty easily. You just put it into the search box and search away, and you will be able to find this tagged information. Then,
Ref. C05B-E
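The tag-and-search idea described here can be sketched as a tiny in-memory catalog in Python. This is not the Confluent Cloud governance API; `DataProduct`, `Catalog`, the topic names, and the tag names are hypothetical and exist only to show how tagging along several dimensions makes a data product discoverable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DataProduct:
    topic: str                      # the event stream backing the product
    owner: str                      # the domain team that originates the data
    tags: Set[str] = field(default_factory=set)


class Catalog:
    """Hypothetical catalog: register data products, tag them, search by tag."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.topic] = product

    def tag(self, topic: str, *tags: str) -> None:
        self._products[topic].tags.update(tags)

    def search(self, tag: str) -> List[str]:
        """Return the topics carrying the given tag, in alphabetical order."""
        return sorted(p.topic for p in self._products.values() if tag in p.tags)


catalog = Catalog()
catalog.register(DataProduct(topic="customers", owner="crm-team"))
catalog.register(DataProduct(topic="trades", owner="trading-team"))

# Primary tag plus secondary dimensions for sub-elements the stream contains.
catalog.tag("customers", "customer", "address", "pii")
catalog.tag("trades", "trade", "counterparty")

customer_hits = catalog.search("customer")
pii_hits = catalog.search("pii")
missing = catalog.search("payments")
```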
you can basically pull up a lineage graph, which will show you how that data flows inside the mesh. So you can see, for example, if it’s recombining with something in a derivative data product, but the easiest way to do a data mesh is actually not to have derivative data products. You just have like a… You keep the architecture as simple as you can. It’s like the Law of Demeter… So you stand with the primary resources?
Ref. DF0E-L
Yeah. It’s like the Law of Demeter for the data, if you’ve ever come across the Law of Demeter. I can’t remember it. Remind me. Basically, it’s like the chained calls. So like you don’t have like “foo.bar.whatever.” Those very long call chains because it creates… If you have a very long chain, like eight different object calls, what you’re really doing is creating a very tight coupling down to this method. You’re
Ref. 0F60-R
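A small Python sketch may make the Law of Demeter point concrete. The `Order`, `Customer`, and `Address` classes are invented for illustration. The chained access couples the caller to the internal structure of all three classes, while the delegating method couples it only to `Order`.

```python
from dataclasses import dataclass


@dataclass
class Address:
    city: str


@dataclass
class Customer:
    name: str
    address: Address

    def city(self) -> str:
        # Customer hides how its address is stored.
        return self.address.city


@dataclass
class Order:
    customer: Customer

    def shipping_city(self) -> str:
        # Order only talks to its immediate collaborator.
        return self.customer.city()


order = Order(Customer("Ada", Address("London")))

# Long call chain: breaks if Customer or Address is restructured.
chained = order.customer.address.city

# Demeter-style: the caller depends on a single method of Order.
direct = order.shipping_city()
```

The data mesh analogy in the conversation is that consuming a derivative of a derivative of a data product is the equivalent of the long chain: the consumer ends up coupled to every intermediate hop.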
data meshes or something about data meshes? Is this well-suited to event streaming, or is this an any-database, any-architecture kind of approach? No. I think there aren’t that many ways to do a data mesh, and event streaming is definitely the primary one. No doubt about that. I mean, you’ve got a many-to-many architecture, so you need… The primary functional principles that a data mesh has to provide is you need to be able to get
Ref. 6DE7-T
They don’t need all the data that the organization has, so they’re not… They have some small subsets of data that they’re actually interested in, and because the mesh provides them data on demand, yeah, they don’t really have this… They don’t feel responsible for taking all of the data. So that point is worth drilling into, I think, because it’s a little bit subtle, but if you think about it in the traditional world of flat file transfers or enterprise messaging,
Ref. 831C-X
as datasets change. You have to manage the life span of that data and all that sort of stuff, the life cycle of that data and so forth. Yeah. You know what it reminds me of? We had Gerard Klijs on the show a couple of weeks back talking about GraphQL. Okay. And how you can build up queries that just take the fields you need. Yeah. As a supplier of that, you can audit which fields are actually in use and
Ref. 20A5-D