Designing Data-Intensive Applications

Kleppmann, Martin

60 highlights
critical-insight on-architecture writing-quotable prose-restructure-reuse re-read:power-insights research-streaming-technical-challenges

Highlights & Annotations

Databases have traditionally not supported this kind of notification mechanism very well: relational

Ref. 2001-B

Whether message loss is acceptable depends very much on the application. For example, with sensor

Ref. E59D-C

book has been that for any given problem, there are several solutions, all

Ref. 2299-D

Thus, the most appropriate choice of software tool also depends on the circumstances. Every piece of

Ref. 632E-E

Faced with this profusion of alternatives, the first challenge is then to figure out the mapping

Ref. 08B5-F

full-text search index in

Ref. D4E4-G

more sophisticated search

Ref. 327E-H

When copies of the same data need to be maintained in several storage systems in order to satisfy

Ref. 9465-I

For example, you might arrange for data to first be written to a system of record database,

Ref. FDC0-J

If it is possible for you to funnel all user input through a single system that decides on an

Ref. E861-K

In most cases, constructing a totally ordered log requires all events to pass through a single

Ref. CAB7-L

The principle of deterministic functions with well-defined inputs and outputs is not only good for

Ref. B702-M

Storing data is normally quite straightforward if you don’t have to worry about how it is going to

Ref. D537-N

discuss what you can do with the stream once you have it

Ref. 58F9-O

Different applications have different requirements, and the best choice of technology for one use case may well be different from the best choice for another use case. It therefore seems likely that in the foreseeable future, relational databases will continue to be used alongside a broad variety of nonrelational datastores—an idea that is sometimes called polyglot persistence [3].

Ref. DD38-P

No matter how the state changes, there was always a sequence of events that caused those changes.

Ref. 092D-Q

Transaction logs record all the changes made to the database. High-speed appends are the only way to

Ref. C19C-R

Immutability in databases is an old idea. For example, accountants have been using immutability for

Ref. 49CB-S

If a mistake is made, accountants don’t erase or change the incorrect transaction in the

Ref. 34D4-T

Although such auditability is particularly important in financial systems, it is also beneficial for

Ref. 9C9B-U

Immutable events also capture more information than just the current state. For example, on a

Ref. C19E-V

Moreover, by separating mutable state from the immutable event log, you can derive several different

Ref. 82F4-W
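
Note (illustrative sketch, not taken from the book): the highlights above on immutable events and deriving views from the log can be made concrete with a toy example. The event shape, the shopping-cart scenario, and the function names `current_cart` and `removal_counts` are all assumptions chosen for illustration. The point is only that one append-only log can feed more than one derived representation.

```python
from collections import Counter

# Hypothetical append-only event log: (action, item) tuples, never modified.
events = [
    ("added", "book"),
    ("added", "pen"),
    ("removed", "pen"),
]

def current_cart(log):
    """Derived view 1: the current state, obtained by replaying the log."""
    cart = Counter()
    for action, item in log:
        cart[item] += 1 if action == "added" else -1
    return {item: n for item, n in cart.items() if n > 0}

def removal_counts(log):
    """Derived view 2: how often each item was removed.

    This information is lost if only the current state is stored.
    """
    return dict(Counter(item for action, item in log if action == "removed"))
```

Both functions read the same log and neither mutates it, so a new view can be added later by replaying the existing events.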

would make sense for many other

Ref. 5B6E-X

Having an explicit translation step from an event log to a database makes it easier to evolve your

Ref. A36B-Y

The traditional approach to database and schema design is based on the fallacy that data must be

Ref. 0F62-Z

The biggest downside of event sourcing and change data capture is that the consumers of the event

Ref. C6DC-B

somewhere: typically, a database. The

Ref. ECBD-E

support for change streams”).

Ref. 7AE0-F

Taken together, the write path and the

Ref. 2EDE-J

read path encompass the whole journey of the data, from the

Ref. 0C2D-K

Some people assert that we should

Ref. 0953-L

data corruption that can occur.

Ref. 7425-N

deriving an alternative view onto some dataset so that you can

Ref. 01DA-O

for example, users of real estate

Ref. 8AF9-Q

Conventional search engines first index the documents and then run queries over the index. By

Ref. A6E2-R

There is no point in looking at the system clock of the machine

Ref. 9AD8-S

This approach has the advantage of being simple, and it is reasonable if the delay between event

Ref. 965D-T

For example, say a user

Ref. 6C83-U

As humans, we are able to cope with such discontinuities, but stream

Ref. FA18-V

Confusing event time and processing time leads to bad data

Ref. 85DD-W

tricky problem when defining windows in terms of event time is that you can never be sure when you

Ref. 977B-X

You can time out and declare a window ready after you have not seen any new events for a while, but

Ref. F62E-Y

Ignore the straggler events, as they are probably a small percentage of events in normal

Ref. A9BB-Z

However, the clock on a user-controlled

Ref. 6B58-A

To adjust for incorrect device clocks, one approach is to log three timestamps

Ref. 9020-B

By subtracting the second timestamp from the third, you can estimate the offset between the device

Ref. 5D2D-C
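
Note (illustrative sketch): the highlight above is cut off, so the details here are assumptions. It assumes the three timestamps are (1) when the event occurred according to the device clock, (2) when it was sent according to the device clock, and (3) when it was received according to the server clock, and that network delay is negligible compared to the required accuracy. The function name `corrected_event_time` is hypothetical.

```python
def corrected_event_time(event_device: float,
                         sent_device: float,
                         received_server: float) -> float:
    """Estimate when an event really occurred, in server-clock terms.

    All arguments are Unix timestamps in seconds.
    """
    # Third timestamp minus second: estimated device-to-server clock offset
    # (assumes the network delay between send and receive is negligible).
    offset = received_server - sent_device
    # Apply the same offset to the device's record of when the event happened.
    return event_device + offset
```

For example, if the device clock is 4000 seconds behind the server and the event was sent 10 seconds after it occurred, the corrected time lands 10 seconds before the server's receive time.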

You could implement a 1-minute tumbling window by taking each event timestamp and

Ref. A044-D
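
Note (illustrative sketch): the highlight above is truncated. One common way to assign events to a 1-minute tumbling window, assumed here, is to round each event timestamp down to the start of its minute and use that value as the window key. The names `window_start` and `count_per_window` are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # 1-minute tumbling window

def window_start(event_ts: int, size: int = WINDOW_SECONDS) -> int:
    """Round a Unix timestamp (in seconds) down to the start of its window."""
    return event_ts - (event_ts % size)

def count_per_window(timestamps):
    """Count events per window, keyed by the window's start timestamp."""
    counts = defaultdict(int)
    for ts in timestamps:
        counts[window_start(ts)] += 1
    return dict(counts)
```

Because every timestamp maps to exactly one window start, windows never overlap, which is what distinguishes a tumbling window from a hopping or sliding one.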

Unlike the other window types, a session window has no fixed duration. Instead, it is defined by

Ref. 6F49-E
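
Note (illustrative sketch): the highlight above is truncated. This sketch assumes a session window groups one user's events that occur close together in time and closes after a period of inactivity. The 30-minute gap and the name `sessionize` are assumptions for illustration.

```python
def sessionize(timestamps, gap=1800):
    """Group one user's event timestamps (seconds) into sessions.

    A new session starts whenever the time since the previous event
    exceeds `gap` seconds (hypothetical default: 30 minutes).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the inactivity gap
        else:
            sessions.append([ts])    # inactivity exceeded: open a new session
    return sessions
```

Unlike the fixed-size windows, the length of each session here depends entirely on the data: a session lasts as long as events keep arriving within the gap.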

can choose a suitable window for the join—for

Ref. 24B1-F

Note that embedding the details of the search in the click event is not equivalent to joining the

Ref. 65D3-G

To implement this type of join, a stream processor needs to maintain state: for example, all the

Ref. 1AE0-H
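
Note (illustrative sketch): the highlight above is truncated, so what state is kept is an assumption here. This sketch assumes a join between search events and click events, keeps recent searches indexed by session ID, and uses a one-hour join window. The class `WindowedJoin`, its method names, and the event fields are hypothetical, and events are assumed to arrive roughly in timestamp order.

```python
from collections import defaultdict

class WindowedJoin:
    """Join click events to earlier search events from the same session."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        # State the processor must maintain: session_id -> [(timestamp, query)]
        self.searches = defaultdict(list)

    def on_search(self, session_id, ts, query):
        """Remember the search so that later clicks can be joined to it."""
        self.searches[session_id].append((ts, query))

    def on_click(self, session_id, ts, url):
        """Return (query, url) pairs for searches within the join window."""
        # Drop searches that are too old to join with this or any later click.
        recent = [(t, q) for t, q in self.searches.get(session_id, [])
                  if ts - t <= self.window]
        if recent:
            self.searches[session_id] = recent
        else:
            self.searches.pop(session_id, None)
        # Only searches at or before the click time count as matches.
        return [(q, url) for t, q in recent if t <= ts]
```

A real stream processor would also need to expire state for sessions that never receive a click and to handle searches that never lead to one; both are left out of this sketch.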