The Myth of Artificial Intelligence
Erik J. Larson
Highlights & Annotations
The myth of artificial intelligence is that its arrival is inevitable, and only a matter of time—that we have already embarked on the path that will lead to human-level AI, and then superintelligence. We have not.
Ref. 2EA0-A
The path exists only in our imaginations. Yet the inevitability of AI is so ingrained in popular discussion—promoted by media pundits, thought leaders like Elon Musk, and even many AI scientists (though certainly not all)—that arguing against it is often taken as a form of Luddism, or at the very least a shortsighted view of the future of technology and a dangerous failure to prepare for
Ref. 8D19-B
And here we should say it directly: all evidence suggests that human and machine intelligence are radically different. The myth of AI insists that the differences are only temporary, and that more powerful systems will eventually erase them. Futurists like Ray Kurzweil and philosopher Nick Bostrom, prominent purveyors of the myth, talk not only as if human-level AI were inevitable, but as if, soon after its arrival, superintelligent machines would leave us far behind.
Ref. C1B3-C
As we successfully apply simpler, narrow versions of intelligence that benefit from faster computers and lots of data, we are not making incremental progress, but rather picking low-hanging fruit. The jump to general “common sense” is completely different, and there’s no known path from the one to the other. No algorithm exists for general intelligence. And we have good reason to be skeptical that such an algorithm will emerge through further efforts on deep learning systems or any other approach popular today. Much more likely, it will require a major scientific breakthrough, and no one currently has the slightest idea what such a breakthrough would even look like, let alone the details of getting
Ref. B48A-D
Who should read this book? Certainly, anyone who is excited about AI but wonders why it is always ten or twenty years away should. There is a scientific reason for this, which I explain.
Ref. 1D29-E
inference that contribute to understanding. Machine learning is only induction (as will be discussed in Chapter 11), and so researchers in the field should be more skeptical than they typically are about its prospects for artificial general intelligence.
Ref. 590D-F
They quote AI pioneer Yoshua Bengio’s observation that deep neural networks “tend to learn statistical regularities in the dataset rather than higher-level abstract concepts.”8
Ref. F879-G
Because many examples are required to boost learning (in the case of Go, the example games run into the millions), the systems are glorified enumerative induction engines, guided by the formation of hypotheses within the constraints of the game features and rules of play. The worlds are closed by rules and they are regular—it’s a kind of bell-curve world where the best moves are the most frequent ones leading to wins. This isn’t the real world that artificial general intelligence must master, which sits outside human-engineered games and research facilities. The difference means everything.
Ref. D7AD-H
Thinking in the real world depends on the sensitive detection of abnormality, or exceptions. A busy city street, for example, is full of exceptions. This is one reason we don’t have robots strolling around Manhattan (or, for another reason related to exceptions, conversing with human beings). A Manhattan robot would quickly fall over, cause a traffic jam by inadvisably venturing onto the street, bump into people, or worse. Manhattan isn’t Atari or Go—and it’s not a scaled-up version of it, either. A deep learning “brain” would be (and is) a severe liability in the real world, as is any inductive system standing in for genuine intelligence. If we could instruct Russell’s turkey that it was playing the “game” of avoiding becoming dinner, it might learn how to make itself scarce on Christmas Eve. But then it wouldn’t be a good inductivist turkey; it would have prior knowledge, supplied by humans.
Ref. 7402-I
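Annotation: Russell's inductive turkey can be made literal as a few lines of code. The sketch below is my own illustration (the feeding history and confidence function are invented, not from the book): the turkey is a pure enumerative induction engine, estimating the probability of being fed tomorrow from nothing but the frequency of past feedings.

```python
def turkey_confidence(history):
    """Estimate P(fed tomorrow) as the fraction of past days with feeding.

    Pure enumerative induction: a summary of observed frequencies,
    with no prior knowledge of the "game" being played.
    """
    return sum(history) / len(history)

# 99 days of feeding: inductive confidence is at its maximum...
history = [1] * 99
assert turkey_confidence(history) == 1.0

# ...on the very day the regularity breaks. Even after the break,
# counting barely registers it: one bad day out of a hundred.
history.append(0)
final = turkey_confidence(history)  # 0.99
```

Note that nothing in the counter encodes the fact that avoiding dinner is the point; that prior knowledge would have to be supplied from outside, by humans.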
Getting a misclassified photo on Facebook or a boring movie recommendation on Netflix may not get us into much trouble with reliance on data-driven induction, but driverless cars and other critical technologies certainly can. A growing number of AI scientists understand the issue. Oren Etzioni, head of the Allen Institute for Artificial Intelligence, calls machine learning and big data “high-capacity statistical models.”9 That’s impressive computer science, but it’s not general intelligence. Intelligent minds bring understanding to data, and can connect dots that lead to an appreciation of failure points and abnormalities. Data and data analysis aren’t enough.
Ref. F20E-J
In an illuminating critique of induction as used for financial forecasting, former stock trader Nassim Nicholas Taleb divides statistical prediction problems into four quadrants, with the variables being, first, whether the decision to be made is simple (binary) or complex, and second, whether the randomness involved is “mediocre” or extreme. Problems in the first quadrant call for simple decisions regarding a thin-tailed probability distribution. Outcomes are relatively easy to predict statistically, and anomalous events have small impact when they happen. Second-quadrant problems are easy to predict, but when the unexpected happens it has large consequences.
Ref. B32E-K
Third-quadrant problems involve complex decisions, but manageable consequences. Then there are the “turkey” problems, in the fourth quadrant. They involve complex decisions coupled with fat-tailed probability distributions, and high-impact consequences. Think stock market crashes. Taleb fingers overconfidence in induction as a key factor in exacerbating the impact of these events. It’s not just that our inductive methods don’t work; it’s that when we rely on them we fail to make use of better approaches, with potentially catastrophic consequences. In effect, we get locked into machine thinking, when analyzing the past is of no help. This is one reason that inductive superintelligence will generate stupid outcomes. As Taleb quips,
Ref. 7221-L
it is important to know how “not to become a turkey.”
Ref. 0D27-M
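Annotation: the thin-tailed versus fat-tailed distinction behind Taleb's quadrants is easy to demonstrate numerically. The sketch below is my own illustration (the specific distributions, a Gaussian and a Pareto with shape just above 1, are my choices, not Taleb's data): in the thin-tailed world no single observation matters much; in the fat-tailed world one extreme event can dominate everything seen so far.

```python
import random

random.seed(0)

# "Mediocre" randomness: a thin-tailed (Gaussian) world.
thin = [random.gauss(100, 10) for _ in range(10_000)]

# Extreme randomness: a fat-tailed (Pareto, shape 1.1) world.
fat = [random.paretovariate(1.1) for _ in range(10_000)]

def max_share(xs):
    """Fraction of the total accounted for by the single largest observation."""
    return max(xs) / sum(xs)

# Thin-tailed: the largest draw is a negligible sliver of the total.
# Fat-tailed: the largest draw carries a large share -- the "turkey" regime,
# where induction over the past badly misjudges the next extreme.
```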
Turing Award winner Judea Pearl, a noted computer scientist whose life’s work has been to develop effective computational methods for causal reasoning, argues in his 2018 book The Book of Why that machine learning can never supply real understanding, because the analysis of data does not bridge to knowledge of the causal structure of the real world, which is essential for intelligence. The “ladder of causation,” as he calls it, steps up from associating data points (seeing and observing) to intervening in the world (doing), which requires knowledge of causes. Then it moves to counterfactual
Ref. F857-N
Pearl here does us a favor by connecting observations and data.12 He also points out that movement up this ladder involves different types of thinking (more specifically, inference). Associating doesn’t “scale” to causal thinking or imaginings. We can recast the problem of scaling from artificial intelligence to artificial general intelligence as precisely the problem of discovering new theories to enable climbing this ladder (or, in the present framework, of moving from induction to other more powerful types of inference).
Ref. 315B-O
Your parents, or your partner or a friend, may have accused you of lacking common sense, but take heart: you have much more than any AI system, by far. As Turing well knew, common sense is what enables two people to engage in ordinary conversation.
Ref. 9F39-P
Stuart Russell begins his list of “Conceptual Breakthroughs to Come” with the as-yet mysterious “language and common sense.”15 Pearl, too, acknowledges language understanding as unsolved (and offers his own “mini-Turing test,” which requires understanding of causation).16
Ref. F15D-Q
So, to make progress in AI, we must look past induction. (If you’re on the association rung of a metaphorical ladder, look up.) Let’s do this next—or at least make a start. On our way to the necessity of abductive inference, we should first get into specifics; in particular, machine learning and its input source, big data.
Ref. 325E-R
Learning is “improving performance based on experience.”1 Machine learning is getting computers to improve their performance based on experience.
Ref. D5A0-S
Machine learning, in other words, is computational treatment of induction—acquiring knowledge from experience. Machine learning is just automated induction, so we shouldn’t be surprised that troubles with inductive inference spell troubles for machine learning. Fleshing out these unavoidable troubles is the point of this chapter.
Ref. D223-T
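Annotation: "improving performance based on experience" by automated induction can be shown in miniature. In this sketch of mine (the hidden bias value is invented for illustration), a learner estimates a coin's bias purely by counting flips. The estimate homes in on the hidden value as experience accumulates, yet it is never anything more than a summary of the observed sample, which is exactly where the troubles with inductive inference enter.

```python
import random

random.seed(1)

TRUE_BIAS = 0.7  # a hidden regularity in the "world" (value invented here)

# Induce the bias purely by counting: the running estimate after each flip.
flips = [random.random() < TRUE_BIAS for _ in range(10_000)]
heads = 0
estimates = []
for i, flip in enumerate(flips, 1):
    heads += flip
    estimates.append(heads / i)

# With experience, the estimate converges toward the hidden bias --
# but only for as long as the regularity itself holds.
final_estimate = estimates[-1]
```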
virtue of requiring significantly less data preparation, since labels aren’t added to training data by humans. But as a direct consequence of this loss of a human “signal,” unsupervised systems lag far behind their supervised cousins on real-world tasks.
Ref. 628F-U
Machine learning, viewed conceptually and mathematically, is intrinsically a simulation.
Ref. FEE0-V
data-intensive problem, and if there’s some possible machine learning treatment of it they deem it to be “well-defined.” They assume that some function can simulate a behavior in the real world or actual system. The actual system is assumed to have a hidden pattern that gives rise to the output observable in the data. The task is not to glean the actual hidden pattern directly—which would require understanding more than the data—but rather to simulate the hidden pattern by analyzing its “footprints” in data. This distinction is important.
Ref. 5170-W
The result of training the system is the generation of f as a model or theory of the behavior in the data.
Ref. 14E2-X
Machine learning is inherently the simulation of a process that is too complicated or unknowable, in the sense that ready-made programming rules aren’t available, or that would take too much human effort to get right.
Ref. 9258-Y
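Annotation: the "simulation" point can be made concrete with the simplest possible learner. In this sketch (mine, not the book's; the hidden pattern y = 3x + 2 and the noise level are invented), training generates f, a least-squares line fit to noisy "footprints" of a hidden pattern. The model reproduces the pattern's outputs without ever containing the pattern itself.

```python
import random

random.seed(0)

# Hidden real-world pattern (unknown to the learner): y = 3x + 2.
def hidden_pattern(x):
    return 3 * x + 2

# The learner sees only noisy "footprints" of the pattern in data.
data = [(x, hidden_pattern(x) + random.gauss(0, 0.1)) for x in range(20)]

# Training: generate f (a least-squares line) as a model of the behavior
# in the data -- a simulation of the hidden pattern, not the pattern itself.
n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def f(x):
    """The trained model: simulates hidden_pattern within the data's range."""
    return slope * x + intercept
```

Within the training range f tracks the hidden pattern closely, but nothing in the fitted coefficients guarantees anything about inputs the dataset never sampled.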
Patterns sometimes emerge from data after unsupervised learning reveals them. But the humans identify…
Ref. 6240-Z
the algorithm doesn’t know to look for it. If it did, that would be…
Ref. 41E5-A
Most of us know about functions from math class, and the classic example is arithmetic: 2 + 2 = 4 is an equation whose operator, the plus sign for addition, is technically a function. Functions return unique answers given their input: thus the addition…
Ref. D4C2-B
Early AI scientists assumed many problems in the real world could be solved by supplying rules amounting to functions with known outputs, as with addition. It turned out, however, that most problems that count as interesting to AI researchers have…
Ref. 0C86-C
Hence, we now have machine learning, which seeks to approximate or…
Ref. 31BF-D
This “fakeness” of machine learning goes unnoticed when system performance is notably close to a human’s, or better. But the simulative nature of machine learning gets exposed quickly when…
Ref. CE70-E
This fact is of enormous importance, and gets obscured too often in discussions…
Ref. 37D4-F
Here’s another fact: the limits of a machine learning system’s world are precisely established by the dataset given to it in training. The real world generates datasets all day long, twenty-four hours a day, seven days a week, perpetually. Thus any given dataset is only a very small time slice representing, at best, partial evidence of the behavior of real-world systems. This is one reason why the long tail of unlikely…
Ref. 047F-G
This is enormously important for discussions of deep learning and artificial general intelligence, and it raises a number of troubling considerations about how, when, and to what extent we should trust systems that technically don’t understand the phenomenon…
Ref. 5136-H
There are at least two problems with machine learning as a potential path to general intelligence. One, already touched on, is that learning can succeed, at least for a while, without any understanding. A trained system can predict outcomes, seemingly understanding a problem, until an unexpected change or event renders the simulation worthless. In fact, simulations that fail, as they so often do, can be even worse than worthless: think of using machine learning in driving, and having the…
Ref. 8FB6-I
Conversation switches topics. Stocks follow an upward trend, then some exogenous event like a corporate restructuring, an earthquake, or geopolitical instability sends them downward. Joe may love conservative bloggers until the day his friend Lewis suggests a left-leaning zine, which his personalized news feed has all but screened out and hidden from him. Mary may love horses until Sally, her own horse, dies, and then move on to pursue a passion for Zen. And so on. Machine learning is really a misnomer, since systems are not learning in the sense that we do, by gaining an increasingly deep and robust…
Ref. 6C96-J
Common sense goes a long way toward understanding the limitations of machine learning: it tells us life is unpredictable. Thus the truly damning critique of…
Ref. 2618-K
But all machine learning is a time-slice of the past; when the future is open-ended and changes are desired, systems must be retrained. Machine learning can only trail behind our flux of experience, simulating (we hope)…
Ref. FE16-L
The simulative nature of machine learning also helps explain why it’s perpetually stuck on narrowly defined applications, showing little or no progress…
Ref. FEDF-M
Systems must be largely redesigned and ported to solve other problems, even when similar. Calling such systems learners is ironic, because the meaning of the word learn for humans essentially involves escaping narrow performances…
Ref. 1157-N
But chess-playing systems don’t play the more complex game of Go. Go systems don’t even play chess. Even the much-touted Atari system by Google’s DeepMind generalizes only across different Atari games, and…
Ref. 0B6C-O
The only games it played well were those with strict parameters. The most powerful learning systems are much more narrow and brittle than we might suppose. This makes sense, though, because the systems are just simulations. What else should we expect? The problems with induction noted above stem not from experience per se, but from the attempt to ground knowledge and…
Ref. 2788-P
We should not be surprised, then, that all the problems of induction bedevil machine learning and data-centric approaches to AI. Data are just observed facts, stored in computers for accessibility. And observed facts, no matter how much we…
Ref. 4A01-Q
the relatively recent availability of massive amounts of data, which at least initially were thought to empower AI systems with previously unavailable “smarts” and insight. In a sense, this is true, but not in the sense necessary…
Ref. EA2B-R
the business-analytics firm SAS, quickly invented a new executive title: Vice President of Big Data. Hype,
Ref. 76BE-S
thought that big data itself was responsible for better results, but as machine learning approaches took off, researchers started crediting the algorithms.
Ref. 55F1-T
Deep learning and other machine learning and statistical techniques resulted in obvious improvements. But the algorithms’ performance was tied to the larger datasets.
Ref. B912-U
In truth, it was because there was, initially, a hodgepodge of older statistical techniques in use for data science and machine learning in AI that the sought-after insights emerging from big data were mistakenly pinned to the data volume itself.
Ref. 4347-V
This was a ridiculous proposition from the start; data points are facts and, again, can’t become insightful themselves. Although this has become apparent only in the rearview mirror, the early deep learning successes on visual object recognition, in the ImageNet competitions, signaled the beginning of a transfer of zeal
Ref. B7CA-W
Thus big data has peaked, and now seems to be receding from popular discussion almost as quickly as it appeared. The focus on deep learning makes sense, because after all, the algorithms rather than just the data are responsible for trouncing human champions at Go, mastering Atari games, driving cars, and the rest.
Ref. A878-X
The immediate problem is that machine learning is inherently data-driven. I’ve made this point above; in what follows, I will make it more precisely.
Ref. 249A-Y
Data-driven methods generally suffer from what we might call an empirical constraint.
Ref. 1668-Z
particular problem typically start by identifying syntactic features, or evidence, in datasets that help learning algorithms home in on the desired output. Feature engineering is essentially a skill, and big money is paid to engineers and specialists with a knack for identifying useful features (and also the talent to tune parameters in the algorithm, another step in successful training). Once identified, features are extracted during training, test, and production phases, purely computationally. The purely computational constraint is the crux.
Ref. D9FB-A
Alas, the human-supplied feature cannot be added to other photos not prepared this way, so the feature is not syntactically extractable, and is therefore useless. This is the germ of the problem. It means that features useful for machine learning must always be in the data, and no clues can be provided by humans that can’t also be exploited by the machine “in the wild” when testing the system or after it is released for use.
Ref. F272-B
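Annotation: the empirical constraint can be restated in code. In this sketch of mine (the feature, the hint field, and the example inputs are all invented), a feature is usable only if it can be computed syntactically from the raw input alone; a human-supplied annotation attached to training examples is useless, because inputs "in the wild" simply will not carry it.

```python
def syntactic_feature(pixels):
    """Computable from the raw input alone: mean pixel brightness."""
    return sum(pixels) / len(pixels)

def human_hint(example):
    """Present only when a person annotated the example by hand."""
    return example.get("contains_horse_hint")

# Training data can carry a human "signal"...
train_example = {"pixels": [30, 40, 200], "contains_horse_hint": True}

# ...but a production input arrives with no human in the loop.
production_input = {"pixels": [31, 42, 198]}

brightness = syntactic_feature(production_input["pixels"])  # always available
hint = human_hint(production_input)  # None: the hint simply isn't there
```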
a horse in it, and the output is a label: HORSE. The machine learning system (“learner”) thus receives labeled or tagged pictures of horses as input-output pairs, and the learning task is to simulate the tagging of images so that only horse images receive the HORSE label. Training is continued until the learning produces a model—which is a statistical bit of code representing the probability of a horse given the input—that meets an accuracy requirement (or doesn’t).
Ref. 420F-C
label new, previously unseen images. This is the production phase. A feedback loop is often part of production, where mislabeled horse images can be corrected by a human and sent back to the learner to retrain.
Ref. D7FB-D
This can go on indefinitely, although the accuracy improvements will taper off at some point. User interaction on Facebook is an example of a feedback loop: when you click on a piece of content, or tag a photo as a friend, you send data back to Facebook’s deep learning–based training system, which perpetually analyzes and modifies your click stream to keep modifying
Ref. 4502-E
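Annotation: the train / production / feedback structure described here can be sketched end to end. Everything below is a toy of my own devising (a single brightness threshold standing in for the model, invented example values): the "model" really is just a statistical bit of code, and the feedback loop is a human correction sent back for retraining.

```python
def train(examples):
    """Fit a threshold midway between the two labeled groups' means."""
    horse = [x for x, label in examples if label == "HORSE"]
    other = [x for x, label in examples if label != "HORSE"]
    return (sum(horse) / len(horse) + sum(other) / len(other)) / 2

def predict(model, x):
    return "HORSE" if x > model else "NOT_HORSE"

# Training phase: labeled input-output pairs.
examples = [(0.9, "HORSE"), (0.8, "HORSE"), (0.2, "NOT_HORSE")]
model = train(examples)

# Production phase: a mislabeled image is corrected by a human and
# sent back to the learner for retraining -- the feedback loop.
new_x = 0.45
if predict(model, new_x) != "HORSE":  # human says this one IS a horse
    examples.append((new_x, "HORSE"))
    model = train(examples)  # the threshold shifts; the loop can run forever
```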
The empirical constraint is a problem for machine learning because all the additional information you might want to supply to the learner can’t be used.
Ref. 73FE-F
subsystem must handle it, which introduces an error rate—note that co-reference is a much more difficult problem than named entity recognition.
Ref. EE93-G
decided to name the operating system ‘Blue Box.’ ” By context, “Blue Box” is used as the product, not the company, but the named entity recognition system cannot use this contextual information during training. Why? Because it then can’t extract it purely by syntax alone, from its input during production.
Ref. FB7D-H
Though named entity recognition is a relatively simple task in natural language processing, even here we see the inherent limitations of purely data-driven approaches. A mention of Blue Box in a post about the product easily becomes a false positive, and gets labeled as about the company.
Ref. A57F-I
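Annotation: the Blue Box false positive follows directly from syntax-only matching. In this sketch (illustration mine; the gazetteer and tagging function are invented, not any real NER system), the recognizer can act only on surface strings at production time, so the contextual fact that this mention names the product, not the company, is invisible to it.

```python
KNOWN_COMPANIES = {"Blue Box"}  # hypothetical gazetteer entry

def tag_companies(text):
    """Label every surface-string match -- the only evidence syntax provides."""
    return [name for name in KNOWN_COMPANIES if name in text]

post = "We decided to name the operating system 'Blue Box.'"
tags = tag_companies(post)  # ["Blue Box"] -- a false positive: it's the product
```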
Like the empirical constraint, this is again a straightforward consequence of—really, a restatement of—the enumerative basis of inductive inference.
Ref. B5C6-J
Ironically, the value of big data for machine learning is actually an expression of the assumption: more is better.
Ref. 43C6-K
Machine learning systems are sophisticated counting machines.
Ref. 709A-L
The frequency assumption comes into play because, in general, the greater the frequency of hits on this feature, the more useful it is for training. In data science, this is necessary; if the features in data are just random, nothing can be learned (recall the earlier discussion of this).
Ref. 3BB9-M
The frequency assumption explains “filter bubbles” in personalized content online, as well. Someone who despises right-leaning politics eventually receives only left-leaning opinions and other news content. The deep learning–based system controlling this outcome is actually just training a model that, over time, recognizes the patterns of the news you like. It counts up your clicks and starts giving you more of the same.
Ref. 9629-N
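Annotation: the filter-bubble mechanism described here is, at bottom, a counter, and can be written as one. This sketch is my own (the topics and click stream are invented): the "personalization model" tallies past clicks and recommends whatever is most frequent, which is the frequency assumption made literal.

```python
from collections import Counter

clicks = Counter()  # the entire "model": a frequency table of past behavior

def record_click(topic):
    clicks[topic] += 1

def recommend():
    """More of the same: the highest-frequency topic wins."""
    return clicks.most_common(1)[0][0]

# A user's click stream trains the counter...
for topic in ["left", "left", "left", "right", "cooking"]:
    record_click(topic)

# ...and the recommendation is frequency, not understanding.
top = recommend()  # "left"
```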
The learning algorithm isn’t in the knowledge business to start with, so the example is just another sequence of words. Sarcasm isn’t a word-based feature, and neither is it as frequent as literal meaning. Machine learning is notoriously obtuse about such language phenomena—much to the chagrin of companies like Google. It would love to detect sarcasm when targeting ads. For example, if “Get me some sunscreen!” is a sarcastic comment by someone posting about a blizzard, a context-sensitive ad placement system should try serving up ads for battery-heated socks, instead.
Ref. DCD5-O
Fundamentally, the underlying theory of inference is at the heart of the problem. Induction requires intelligence to arise from data analysis, but intelligence is brought to the analysis of data as a prior and necessary step. We can always hope that advances in feature engineering or algorithm design will lead to a more complete theory of computational inference in the future. But we should be profoundly skeptical. It is precisely the empirical constraint and the frequency assumption that limit the scope and effectiveness of detectable features—which are, after all, in the data to be syntactically analyzed. This is another way of saying what philosophers and scientists of every stripe have learned long ago: Induction is not enough.
Ref. BA5F-P
Saturation occurs when adding more data—more examples—to a learning algorithm (or a statistical technique) adds nothing to the performance of the systems. Training can’t go on forever returning higher and higher accuracy on some problem. Eventually, adding more data ceases to boost performance. Successful systems reach an acceptable accuracy prior to saturation; if they don’t, then the problem can’t be solved using machine learning. A saturated model is final, and won’t improve any more by adding more data. It might even get worse in some cases, although the reasons are too technical to be explained here.
Ref. 8945-Q
Peter Norvig, Director of Research at Google, let slip in The Atlantic back in 2013 his worries about saturation: “We could draw this curve: as we gain more data, how much better does our system get?” he asked. “And the answer is, it’s still improving—but we are getting to the point where we get less benefit than we did in the past.”13
Ref. 536C-R
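Annotation: the curve Norvig describes can be caricatured in a few lines. The model below is purely illustrative and mine (the ceiling, scale, and functional form are invented, not measured): accuracy approaches a ceiling, so each additional order of magnitude of data buys less improvement than the last.

```python
import math

def accuracy(n_examples, ceiling=0.95, scale=10.0):
    """Hypothetical learning curve: a ceiling minus a shrinking error term."""
    return ceiling - scale / math.sqrt(n_examples + 100)

# Gain in accuracy from each tenfold increase in training data.
gains = [accuracy(10 ** (k + 1)) - accuracy(10 ** k) for k in range(2, 6)]

# Every tenfold increase yields a smaller gain than the one before,
# and no amount of data pushes the model past its ceiling: saturation.
```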
The models are saturating, as Norvig predicted. New approaches will no doubt be required. Such considerations are one reason why so-called scaling from initial successes to full-blown ones is naive and simplistic. Systems don’t scale indefinitely. Machine learning—deep learning—isn’t a silver bullet.
Ref. 9A70-S