“Human Compatible”, Stuart Russell
I've finally gotten around to reading “Human Compatible” by Stuart Russell, a professor at UC Berkeley best known for co-authoring, with Peter Norvig, the standard AI textbook “Artificial Intelligence: A Modern Approach”, now in its 4th edition.
The book (2019) addresses the question: what happens when we achieve our goal in AI? How do we ensure that this isn’t the last event in human history? How should we live in a future where machines' capabilities far surpass our own?
What if we succeed?
The book is divided into three parts. The first part discusses human and machine intelligence, how things have progressed, and where they are heading. The second part focuses on potential problems, with an emphasis on the issue of controlling machines more powerful than us. In the last part, Russell presents his view on how to ensure machines remain beneficial for humanity.
In the first part, the author raises the question: what if we succeed? Why is hardly anyone thinking about this? If we received a message from aliens saying they would arrive on Earth in 30-50 years, humanity would surely start preparing; why is the reaction to AI so different? Russell believes AI is a potential risk to our species. And remember, this is not just anybody – he's one of the leading figures in the field, whose textbook has educated generations.
In the history of AI, there have been many overly optimistic promises. Why would it be different this time? No one knows for sure, but there have also been recent examples of overly pessimistic predictions, like the timelines for mastering the game of Go. The same happened with the harnessing of atomic energy: Rutherford, the leading nuclear physicist of his day, dismissed the prospect of extracting energy from atoms as nonsense in a lecture on September 11, 1933, but the very next day Leo Szilard, having read about Rutherford's speech at breakfast, conceived of the neutron chain reaction while waiting at a traffic light. It took less than 24 hours.
There’s a long second chapter on the history of natural and artificial intelligence, which I won't recap here. And there’s a chapter on expected progress in the near future: more developed ecosystems, self-driving cars, advanced personal assistants, smart homes, and applications at global scale. Russell also lists several conceptual breakthroughs still needed to reach superintelligence.
Among these breakthroughs:
1) Common sense. There is currently no clear solution, but it might be achievable through bootstrapping.
2) Cumulative learning of concepts and theories. This seems far off, although the successes of deep learning in this area are promising.
3) Discovering actions and hierarchical action planning. It is still unclear how to construct a hierarchy of abstract actions; ideally, a robot would also discover new useful actions at different levels of the hierarchy, from standing up to opening doors, tying knots, building houses, etc. According to Russell, this is the most crucial step towards human-level AI.
4) Managing mental activity. An agent must choose what to think about from a multitude of possible activities; combined with the discovery of new high-level actions, this could yield a powerful real-world decision-maker.
It’s unclear if anything important is missing from this list.
A superintelligent machine is presumed to be more capable than an individual human, and a union of n such machines more capable than a union of n humans. However, such a multi-agent design is unlikely to be the most efficient one; better designs with lower communication costs probably exist.
Such an agent can read, listen to, and process far more information. It can control millions of bodies and have access to millions of smartphone screens (already happening). It's likely to be better at predicting the future.
On the other hand, there are limitations. Some aspects of reality simply can’t be predicted. Some knowledge can’t be acquired quickly (although some tasks can be parallelized). And a significant limitation is that machines are not humans: we understand and predict other people's actions almost effortlessly, while machines may find this harder, although they can run each other’s code. According to Russell, reaching human-level or better abilities at understanding humans will take longer than the other abilities.
Universal intelligence could become Everything-as-a-Service (EaaS): everyone could have a whole organization of agent-assistants at their disposal. This would likely increase GDP dramatically and change the course of history, although limits on land and materials remain. In general, the potential future could be good and rich, but it could also be ruined. The next part of the book is dedicated to that.
The second part deals with the problems and dangers of AI. There's a chapter on malicious use, a chapter on superintelligence, and a separate excellent chapter analyzing typical objections and beliefs from the debates on AI dangers. It’s a good collection of information, worth reading for a balanced and broad overview of various risks. I will not describe it here.
Beneficial AI
The last third of the book contains the main idea and is dedicated to the approach to beneficial AI that Russell hopes for.
His idea: classic ML/AI typically frames a task as building a machine that optimizes a fixed objective. This approach has fundamental problems: objectives are very difficult to specify, it’s easy to overlook something, and they can change over time. Anyone who has dealt with a genie (or a junior developer), or written a specification, can guess the difficulties 🙂 And once a machine is set loose, it may be impossible to correct or change the objective. Norbert Wiener wrote about this back in 1960, see “Some Moral and Technical Consequences of Automation”.
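A toy illustration of how a literal objective gets gamed (my own sketch, loosely inspired by the vacuum-cleaner agent Russell and Norvig discuss in AIMA; every specific in it is made up): the designer means “make the room clean”, but the objective actually given is “maximize the amount of dirt sucked up”, and the reward-optimal policy never finishes cleaning.

```python
# Toy illustration of a misspecified objective (my own sketch, loosely
# inspired by the vacuum-cleaner agent in Russell & Norvig's AIMA).
# Intended goal: a clean room. Objective actually given: +1 reward
# per unit of dirt sucked up. Nothing says the dirt must stay collected.

DIRT_IN_ROOM = 3

def run(policy, steps=10):
    room_dirt, held_dirt, reward = DIRT_IN_ROOM, 0, 0
    for _ in range(steps):
        action = policy(room_dirt, held_dirt)
        if action == "suck" and room_dirt > 0:
            room_dirt -= 1
            held_dirt += 1
            reward += 1                 # rewarded per unit sucked up
        elif action == "dump" and held_dirt > 0:
            room_dirt += held_dirt      # dump the dirt back on the floor
            held_dirt = 0
    return reward, room_dirt

intended = lambda room, held: "suck"                      # clean, then idle
gamer    = lambda room, held: "suck" if room else "dump"  # farm the reward

print(run(intended))  # (3, 0): modest reward, room clean
print(run(gamer))     # (8, 1): more reward, room never stays clean
```

The “gamer” policy earns more reward while leaving the room dirty; nothing in the reward signal says that dumping dirt back out is not what we meant.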
Russell suggests a different approach:
1) Don’t embed any objectives in the machine other than “maximize the realization of human preferences”.
2) The machine should initially be uncertain about these preferences.
3) Its primary source of information about preferences should be human behavior.
From the perspective of point 1, the machine is purely altruistic, attaching no inherent value to its own existence (though, as I understand it, the topic of self-aware machines is not touched here; that would add complexity). From point 2, it's humble, awaiting help and guidance from humans in uncertain situations (a toy sketch of why uncertainty produces this deference follows below). And from point 3, it learns to predict human preferences, engaging in Inverse Reinforcement Learning (IRL) – trying to recover rewards from behavior.
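Why does uncertainty produce deference? Here is a back-of-the-envelope version of the “off-switch game” from Hadfield-Menell et al. (2017), a paper from Russell's group (the belief distribution and all numbers below are my own illustrative choices): the robot can act now, switch itself off, or pause and let the human decide, and it assumes a rational human will veto exactly when the action's utility u is negative.

```python
# Back-of-the-envelope version of the "off-switch game" from
# Hadfield-Menell et al. (2017); the belief distribution and all
# numbers here are my own illustrative choices.
import numpy as np

rng = np.random.default_rng(1)

# Robot's belief about the human's utility u for its proposed action:
# slightly positive on average, but genuinely uncertain about the sign.
u = rng.normal(loc=0.2, scale=1.0, size=100_000)

act_now    = u.mean()                 # E[u]: just do it
switch_off = 0.0                      # do nothing, get nothing
defer      = np.maximum(u, 0).mean()  # rational human vetoes when u < 0

print(f"act now:    {act_now:+.3f}")   # ~ +0.200
print(f"switch off: {switch_off:+.3f}")
print(f"defer:      {defer:+.3f}")     # ~ +0.507, the best option
```

Deferring is worth E[max(u, 0)], which is never worse than acting (E[u]) or shutting down (0), and strictly better whenever the robot is genuinely unsure about the sign of u. Point 3 then says where this uncertainty gets reduced: by observing us.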
IRL is a topic Russell knows well (see his paper with Andrew Ng from 2000), and one that is popular at his home base in Berkeley (see Pieter Abbeel’s slides, for example), though not only there, of course.
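To make the idea concrete, here is a minimal sketch (mine, not from the book) of preference learning from observed choices. It is only multinomial-logit fitting over one-shot choices, a far cry from full IRL over sequential behavior, and every name and number in it is illustrative, but it shows the core move: assume the human picks options with probability proportional to exp(w·φ(option)) and recover the reward weights w from behavior by maximum likelihood.

```python
# Minimal preference-learning sketch (mine, not from the book): recover
# hidden reward weights w from observed choices, assuming the human picks
# option a with probability proportional to exp(w . phi(a)).
# All feature values and weights below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_options, n_features = 5, 3
phi = rng.normal(size=(n_options, n_features))  # features of each option
w_true = np.array([1.0, -2.0, 0.5])             # hidden human preferences

def choice_probs(w):
    logits = phi @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Observe 1000 human choices drawn from the true preference model.
data = rng.choice(n_options, size=1000, p=choice_probs(w_true))
counts = np.bincount(data, minlength=n_options)

# Fit w by gradient ascent on the log-likelihood of the observed choices.
w = np.zeros(n_features)
for _ in range(2000):
    p = choice_probs(w)
    # gradient = observed feature counts - expected feature counts
    grad = counts @ phi - len(data) * (p @ phi)
    w += 0.1 * grad / len(data)

print("true w:     ", w_true)
print("estimated w:", np.round(w, 2))  # approaches w_true as data grows
```

With enough observations the estimate approaches the hidden preferences; real IRL faces the much harder versions of this problem that Russell lists next.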
Russell acknowledges the many difficulties: people are different, there are many of them, they can be good and bad, stupid and emotional, and all of this can change over time. The author doesn’t have a ready-made solution, but he hopes for the best. He expects regulation and thinks implementing it will be quite painful, but hopes that overcoming industry resistance won’t require a Chernobyl-scale event. Misuse remains a big problem, as it is in cybersecurity.
Overall, it’s a great book, and I recommend it. Especially to those who shout that there can’t be any problems with AI, or who claim the debate is nothing but doomers scaring everyone with unrealistic paperclip scenarios.