Designing Data-Intensive Applications
Kleppmann, Martin
Highlights & Annotations
Databases have traditionally not supported this kind of notification mechanism very well: relational
Ref. 2001-B
Whether message loss is acceptable depends very much on the application. For example, with sensor
Ref. E59D-C
book has been that for any given problem, there are several solutions, all
Ref. 2299-D
Thus, the most appropriate choice of software tool also depends on the circumstances. Every piece of
Ref. 632E-E
Faced with this profusion of alternatives, the first challenge is then to figure out the mapping
Ref. 08B5-F
full-text search index in
Ref. D4E4-G
more sophisticated search
Ref. 327E-H
When copies of the same data need to be maintained in several storage systems in order to satisfy
Ref. 9465-I
For example, you might arrange for data to first be written to a system of record database,
Ref. FDC0-J
If it is possible for you to funnel all user input through a single system that decides on an
Ref. E861-K
In most cases, constructing a totally ordered log requires all events to pass through a single
Ref. CAB7-L
The principle of deterministic functions with well-defined inputs and outputs is not only good for
Ref. B702-M
Storing data is normally quite straightforward if you don’t have to worry about how it is going to
Ref. D537-N
discuss what you can do with the stream once you have it
Ref. 58F9-O
Different applications have different requirements, and the best choice of technology for one use case may well be different from the best choice for another use case. It therefore seems likely that in the foreseeable future, relational databases will continue to be used alongside a broad variety of nonrelational datastores—an idea that is sometimes called polyglot persistence [3].
Ref. DD38-P
No matter how the state changes, there was always a sequence of events that caused those changes.
Ref. 092D-Q
Transaction logs record all the changes made to the database. High-speed appends are the only way to
Ref. C19C-R
Immutability in databases is an old idea. For example, accountants have been using immutability for
Ref. 49CB-S
If a mistake is made, accountants don’t erase or change the incorrect transaction in the
Ref. 34D4-T
Although such auditability is particularly important in financial systems, it is also beneficial for
Ref. 9C9B-U
Immutable events also capture more information than just the current state. For example, on a
Ref. C19E-V
Moreover, by separating mutable state from the immutable event log, you can derive several different
Ref. 82F4-W
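Note: the idea in the highlights above, that an immutable event log can feed several different derived views, could be sketched roughly as follows. The shopping-cart events, their field names, and both view functions are hypothetical and chosen only for illustration; they do not come from the book.

```python
from collections import Counter

# Hypothetical append-only event log; event types and fields are assumptions.
events = [
    {"type": "item_added", "user": "alice", "item": "book"},
    {"type": "item_added", "user": "alice", "item": "pen"},
    {"type": "item_removed", "user": "alice", "item": "book"},
    {"type": "item_added", "user": "bob", "item": "book"},
]


def current_carts(log):
    """View 1: current cart contents per user, derived by replaying the log."""
    carts = {}
    for e in log:
        cart = carts.setdefault(e["user"], set())
        if e["type"] == "item_added":
            cart.add(e["item"])
        elif e["type"] == "item_removed":
            cart.discard(e["item"])
    return carts


def removal_counts(log):
    """View 2: how often each item was removed from a cart.

    This information is lost if only the current state is stored.
    """
    return Counter(e["item"] for e in log if e["type"] == "item_removed")
```

Both views read the same log and neither modifies it, so a third view could be added later by replaying the events again.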
would make sense for many other
Ref. 5B6E-X
Having an explicit translation step from an event log to a database makes it easier to evolve your
Ref. A36B-Y
The traditional approach to database and schema design is based on the fallacy that data must be
Ref. 0F62-Z
The biggest downside of event sourcing and change data capture is that the consumers of the event
Ref. C6DC-B
somewhere: typically, a database. The
Ref. ECBD-E
support for change streams”).
Ref. 7AE0-F
Taken together, the write path and the
Ref. 2EDE-J
read path encompass the whole journey of the data, from the
Ref. 0C2D-K
Some people assert that we should
Ref. 0953-L
data corruption that can occur.
Ref. 7425-N
deriving an alternative view onto some dataset so that you can
Ref. 01DA-O
for example, users of real estate
Ref. 8AF9-Q
Conventional search engines first index the documents and then run queries over the index. By
Ref. A6E2-R
There is no point in looking at the system clock of the machine
Ref. 9AD8-S
This approach has the advantage of being simple, and it is reasonable if the delay between event
Ref. 965D-T
For example, say a user
Ref. 6C83-U
As humans, we are able to cope with such discontinuities, but stream
Ref. FA18-V
Confusing event time and processing time leads to bad data
Ref. 85DD-W
tricky problem when defining windows in terms of event time is that you can never be sure when you
Ref. 977B-X
You can time out and declare a window ready after you have not seen any new events for a while, but
Ref. F62E-Y
Ignore the straggler events, as they are probably a small percentage of events in normal
Ref. A9BB-Z
However, the clock on a user-controlled
Ref. 6B58-A
To adjust for incorrect device clocks, one approach is to log three timestamps
Ref. 9020-B
By subtracting the second timestamp from the third, you can estimate the offset between the device
Ref. 5D2D-C
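Note: a minimal sketch of the three-timestamp adjustment. It assumes the three timestamps are (1) the event time according to the device clock, (2) the send time according to the device clock, and (3) the receive time according to the server clock, and that network delay is negligible. The function and parameter names are hypothetical.

```python
def estimate_event_time(event_device_ts, sent_device_ts, received_server_ts):
    """Estimate when an event really happened, in server-clock terms.

    All arguments are timestamps in seconds. Assumes network delay is
    negligible compared to the precision that is needed.
    """
    # Third timestamp minus second timestamp: how far the device clock
    # is behind (positive) or ahead of (negative) the server clock.
    offset = received_server_ts - sent_device_ts
    # Apply that offset to the event timestamp recorded on the device.
    return event_device_ts + offset
```

For example, if the device clock runs 4000 seconds behind the server, an event stamped 1000 on the device and sent at device time 1060 (received at server time 5060) would be estimated to have occurred at server time 5000.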
You could implement a 1-minute tumbling window by taking each event timestamp and
Ref. A044-D
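Note: the highlight above is cut off, so the sketch below assumes the usual approach of rounding each event timestamp down to the start of its minute. Timestamps are assumed to be integer seconds; the function names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # 1-minute tumbling window


def window_start(ts):
    """Round a timestamp (integer seconds) down to the start of its window."""
    return ts - (ts % WINDOW_SECONDS)


def count_per_window(timestamps):
    """Count events per tumbling window, keyed by window start time."""
    counts = defaultdict(int)
    for ts in timestamps:
        counts[window_start(ts)] += 1
    return dict(counts)
```

Because every timestamp maps to exactly one window start, each event belongs to exactly one window, which is the defining property of a tumbling window.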
Unlike the other window types, a session window has no fixed duration. Instead, it is defined by
Ref. 6F49-E
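Note: the highlight above is cut off. The sketch below assumes a session window groups one user's events that occur close together in time and closes once the user has been inactive for longer than some gap. The function name and the gap parameter are hypothetical.

```python
def sessionize(timestamps, gap_seconds):
    """Split one user's event timestamps into sessions.

    A new session starts whenever the time since the previous event
    exceeds gap_seconds, so sessions have no fixed duration.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap_seconds:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # inactivity gap exceeded: new session
    return sessions
```

With a 30-minute gap (1800 seconds), events at 0, 10, and 20 seconds form one session, and events at 2000 and 2010 seconds form a second one.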
can choose a suitable window for the join—for
Ref. 24B1-F
Note that embedding the details of the search in the click event is not equivalent to joining the
Ref. 65D3-G
To implement this type of join, a stream processor needs to maintain state: for example, all the
Ref. 1AE0-H
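Note: the highlight above is cut off, so this is only a rough sketch of the kind of state such a join might keep. It assumes search and click events are joined on a session ID, that state is kept for a one-hour window, and that events arrive roughly in timestamp order. The class, its methods, and the window length are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # assumed join window of one hour


class SearchClickJoiner:
    """Joins search and click events that share a session ID within the window."""

    def __init__(self):
        self.searches = defaultdict(list)  # session_id -> [(ts, query)]
        self.clicks = defaultdict(list)    # session_id -> [(ts, url)]

    def _expire(self, index, now):
        # Drop buffered events that have fallen out of the join window.
        for sid in list(index):
            index[sid] = [(ts, v) for ts, v in index[sid]
                          if now - ts <= WINDOW_SECONDS]
            if not index[sid]:
                del index[sid]

    def on_search(self, session_id, ts, query):
        """Buffer a search and return (query, url) pairs for matching clicks."""
        self._expire(self.clicks, ts)
        self.searches[session_id].append((ts, query))
        return [(query, url) for _, url in self.clicks.get(session_id, [])]

    def on_click(self, session_id, ts, url):
        """Buffer a click and return (query, url) pairs for matching searches."""
        self._expire(self.searches, ts)
        self.clicks[session_id].append((ts, url))
        return [(q, url) for _, q in self.searches.get(session_id, [])]
```

Each incoming event is checked against the buffered events of the other stream, so a match is emitted regardless of which side arrives first. A search with no matching click simply expires from the buffer.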