"Building Machines That Learn and Think Like People", 7 years later

Oct 13, 2023

In the post about GPT-4V, I mentioned the paper “Building Machines That Learn and Think Like People” (https://arxiv.org/abs/1604.00289) by Brenden Lake, Tomer Ullman, Joshua Tenenbaum, and Samuel Gershman. In that seminal paper the authors suggested that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it.

Related post: “GPT-4V is coming!” by Grigory Sapunov, October 4, 2023.

The paper included a famous example of faulty image captioning (Figure 6 of the original paper, reproduced below): lacking a proper causal model, the captioning model identified the key objects in a scene correctly, yet it failed to understand the physical forces at work, the mental states of the people, or the causal relationships between the objects. The captions were funny:

The original Figure 6 from the paper: Perceiving scenes without intuitive physics, intuitive psychology, compositionality, and causality.

The original captions seven years ago were generated by the NeuralTalk2 image captioning model by Andrej Karpathy (https://github.com/karpathy/neuraltalk2). With GPT-4V finally available, it would be strange not to try this again with the new capabilities at hand.

The new captions by GPT-4V are really good! My prompt was just “Describe the image”.

The first image was a Wild West scene:

GPT-4V produced the following description:

“The image portrays a scene reminiscent of the Wild West. A dirt-covered street is flanked by wooden buildings, including a two-story structure with a sign reading "Last Chance Saloon." On the street, a man wearing a hat is riding a brown horse at a gallop. Unusually, he appears to be dragging another man behind the horse using a rope or lasso. This person is being pulled along the ground, creating a trail of dust behind him. There are bystanders observing the scene from the sidewalks and porches of the buildings. In the background, there are mountains, adding to the rugged setting of the scene.”

I have nothing to add. Seven years ago the NeuralTalk2 model gave the description “a woman riding a horse on a dirt road”.

The second image was an unusual situation with a plane:

What GPT-4V produced:

“The image captures a highly dramatic scene. A large airplane with propellers is seen flying extremely close to a bridge. The plane is at such a low altitude that its wing is almost touching the bridge's surface. The fuselage of the plane is oriented diagonally, suggesting that it might be in the midst of a descent or crash. The plane's propellers are clearly visible, indicating it's a propeller-driven aircraft. On the bridge, there is a taxi or a yellow car moving, seemingly unaware or unable to react to the looming plane. In the background, there are tall buildings and city structures, indicating this event is occurring in an urban area. The overall atmosphere of the image is intense, suggesting a moment of emergency or unexpected occurrence.”

The only thing I disagree with is the claim about the yellow taxi: its driver is most likely well aware of the situation and has made an emergency stop (the brake lights are on, and the car is tilted to the right). But anyway, the description is good. Remember, the caption seven years ago was “an airplane is parked on the tarmac at an airport”.

The final image contains a natural disaster scene:

GPT-4V was very wordy:

“The image depicts a distressing situation. Three men are seen wading through turbulent waters that appear to have flooded a street or coastal area. The man in the front is wearing a black shirt and orange shorts, and he seems to be leading the way, with a focused and urgent expression on his face. The second man, wearing a blue jacket, is carrying a large, orange lifejacket or flotation device on his back. The third man, dressed in a yellow raincoat, follows closely behind them.
Adding to the intensity of the scene, a white house is shown on its side, partially submerged in the water, suggesting it has been uprooted or knocked over by the force of the waters. Its windows and the facade suggest severe damage. There are also other structures and debris floating in the water, including parts of a wooden structure and a vehicle, further indicating the devastation of the area.
The overall atmosphere of the image conveys a sense of urgency, danger, and the destructive power of natural disasters. The cloudy and misty background adds to the sense of chaos and unpredictability.”

That is a long way from “a group of people standing on top of a beach” seven years ago… The progress is truly mind-blowing.

It would be interesting to have a more thorough reassessment of the statements from the original paper.

Discussion about this post

throwaway

Oct 16, 2023 (edited)

Attempting to build thinking machines is the work of a fool because it incentivizes great societal evils, while disincentivizing known stable systems. The consequences of such are collapse/extinction.

Before you think this is some doomer who isn't speaking from a place of sound/rational education or knowledge, I'd suggest you thoroughly review how our economic and political systems actually work historically and what happens every time there are too many people given a finite set of resources.

A more recent example based on history would be depicted in the movie Conspiracy (2001). The dialogue was taken from minutes of that specific event. This is how horrible things happen, behind closed doors without your participation unless you are privileged to be there. Now expand the consequences of that to every mid-level worker and below in the world instead of just refugees, with the only people who benefit being the power elite, and you have a relatively straightforward model to work from.

Next how do you manage economic calculation? The impacts of such a drastic shift in the factor markets must be managed somehow to avert unrest, but this too is an intractable problem.

The subjective theory of value, as framed by Carl Menger, pretty much says that no discrete value can be placed on goods, because value is relative to each individual's needs, psychology, personal resources, and the availability (supply/demand) of primary and indirect goods.

If most workers are no longer needed, how do they pay for anything? UBI fails because it runs directly into economic calculation issues. Shortages occur as a side effect, producers leave the market, the inefficiencies propagate towards collapse in the absence of rational pricing (this is covered in blinding detail by Mises in his 1950s essays on Socialism, and why it structurally fails). I'm glossing over a lot of details, but they are there for you to review.

Exchange and distribution of labor only work efficiently in functioning marketplaces; without proper price discovery the system fails. Notably it requires price discovery and the core elements of money, where you have a stable medium of exchange, unit of account, and store of value.

These all get incredibly warped with concentration in fewer hands dictating chaotic dynamics until a cascade failure occurs. Most people are consumers, not producers in the goods sense. Consumers make money through factor markets which become largely obsolete, and only available to those who have inherent attributes weighted towards intelligence. What do you do with all those people who no longer can work for basic subsistence?

Notably, inflationary pressures fight with deflationary forces on the store of value in fiat, which leads to a chaotic unstable system over time. This eventually self-sustains towards either hyper-inflation (and currency failure) or deflation; you ride a knife's edge until you miscalculate (at the policy level) because these are lagging indicators, hysteresis at work. Both increasingly interfere in markets over time as producers stop producing when they cannot guarantee profit. This makes any kind of economic calculation impossible.

In hyper-inflation, production halts because profit cannot be guaranteed, systems we depend on fail, Malthus's law of population reasserts itself and we're all dead, as we fight over the limited resources.

In deflation, production is reduced, prices fall, production halts because profit cannot meet minimum levels, systems we depend on fail, Malthus's law of population reasserts itself and we're all dead, same thing.

If you think this cannot happen, look at what happened with baby formula during the pandemic. All the dominoes depend on production and forward consumption which cannot be calculated under these new environments.

These systems that you are seeking to directly impact/interfere in are societal level systems. Unlike small systems, you can't just wake up one day as an individual and direct everyone to do something radically different and expect them all to follow. You are fighting against friction and momentum.

This research is putting fuel into a runaway train that's going to take humanity over a cliff into oblivion because we have no functioning societal systems that can possibly work for the environment being created.

The populist majority with nothing will ensure those who have concentration of resources (with everything) are removed. Societal conflict increases and devolves into barbarism. MAD doctrine probably ceases to work, and you have psychopaths making decisions that determine everyone's future.

There are so many uneducated supporters of this type of outcome (destructionists) seeking this today, claiming these things don't matter in the big picture from a fallacious place of survivor bias.

What you do matters, contributing towards this goal in any form matters. The legacy of this area of study with the lack of controls over it will be tragic assuming anyone actually survives this. There's a reasonable likelihood no one will survive as once cascade failures start, it's already past a point of no return; and that includes the next generation. Your kids.

Whatever happened to people actually retaining their core values, such as following the social contract, and preparing the way for the next generation to thrive? All that makes any impact or headlines today is irrational magical thinking that doesn't solve issues, and just creates more insidious issues that can't be solved.

What a travesty of circumstances we find ourselves in, and the people involved in these areas of study aren't asking "Should we be doing this?"; they are asking "Can we?", and damn the consequences. You don't see people quitting their jobs and leaving the sector over the existential threat these systems pose; you see business people seeking permission to eliminate workers' jobs, appeasing policy makers despite being in a position where they reasonably should know better.

AGI doesn't even need to occur for any of the above to happen. All that needs to happen is sufficient interference in critical systems that lead to cascade failures.

Many business people consider the perfect ideal corporation to be a small board of directors, where all the work and cost is run by cheap machines and wealth is retained internally for shareholders, for all time. Except they don't recognize that it all fails once you reach a certain point of integration, as always happens on an S curve in business.

Profits are front-loaded, and towards the end there is nothing left in the tank when you need it to stave off disaster. That unfortunately is how oligopoly works: you effectively hold the world/market hostage by making yourself too big to fail.

Collapse will still occur if there is no one that can bail you out, and supporting things that have a high chance of ending in evil outcomes taints the people involved forever, regardless of what excuse they might tell themselves when all is said and done and it's too late to do anything because it was ignored at the time it could have made a difference.

Gonzo ML