Google’s Chain of Thought Prompting May Boost Today’s Best Language Models

Google announced groundbreaking natural language processing research called Chain of Thought Prompting that elevates the state of the art of advanced language models like PaLM and LaMDA to what researchers call a remarkable level.

The fact that Chain of Thought Prompting can improve PaLM and LaMDA by such significant margins is a big deal.

LaMDA and PaLM

The research conducted experiments using two language models, Language Model for Dialogue Applications (LaMDA) and Pathways Language Model (PaLM).

LaMDA is a conversation-oriented model, like a chatbot, but it can also be used for many other applications that require sustained dialogue.

PaLM is a model that follows what Google calls the Pathways AI architecture, in which a language model is trained to learn how to solve problems.

It used to be that machine learning models were trained to solve one type of problem, and they were set loose to do that one thing really well. But to do anything else, Google would have to train a new model.

The Pathways AI architecture is a way to create a model that can solve problems it may not have seen before.

As stated in Google’s PaLM explainer:

“…we would like to train a model that can not only handle many separate tasks, but also draw upon and combine its existing skills to learn new tasks faster and more effectively.”

What Chain of Thought Prompting Does

The research paper lists three important breakthroughs for chain of thought reasoning:

  1. It allows language models to break down complex multi-step problems into a sequence of intermediate steps.
  2. The chain of thought lets engineers peek into the process and, when things go wrong, identify where the reasoning failed and fix it.
  3. It can solve math word problems, perform commonsense reasoning, and, according to the research paper, can (in principle) solve any word-based problem a human can.

Multi-step reasoning tasks

The research gives an example of a multi-step reasoning task on which language models are tested:

“Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?

A: The cafeteria originally had 23 apples. They used 20 to make lunch. So they had 23 – 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9.”
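The worked example above can be assembled into a few-shot prompt programmatically. The sketch below is illustrative only; the `build_prompt` helper and the follow-up question are my own, not from the paper. It prepends a worked exemplar, whose answer spells out the intermediate steps, to a new question, which encourages the model to reason step by step.

```python
# Minimal sketch of a chain-of-thought few-shot prompt.
# The exemplar's answer spells out intermediate steps, so the model
# is nudged toward producing step-by-step reasoning for the new question.

EXEMPLAR = (
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch "
    "and bought 6 more, how many apples do they have?\n"
    "A: The cafeteria originally had 23 apples. They used 20 to make "
    "lunch. So they had 23 - 20 = 3. They bought 6 more apples, "
    "so they have 3 + 6 = 9. The answer is 9."
)

def build_prompt(question: str) -> str:
    """Prepend the worked exemplar to a new question."""
    return f"{EXEMPLAR}\n\nQ: {question}\nA:"

prompt = build_prompt(
    "Roger has 5 tennis balls. He buys 2 more cans of 3 tennis "
    "balls each. How many tennis balls does he have now?"
)
print(prompt)
```

The prompt ends with a bare `A:` so the model's completion supplies the reasoning chain and final answer for the new question.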

PaLM is a state-of-the-art language model that is part of the Pathways AI architecture. It’s so advanced it can explain why a joke is funny.

Yet as advanced as PaLM is, the researchers say Chain of Thought Prompting improves these models even further, and that’s what makes this new research so worth considering.
Google explains it like this:

“Chain-of-thought reasoning allows models to break down complex problems into intermediate steps that are solved individually.

Moreover, the linguistic nature of the chain of thought makes it applicable to any task that a person might solve via language.”

The research paper goes on to note that standard prompting does not really improve as the scale of the model increases.

However, with this new approach, scale has a significant and noticeable positive impact on model performance.


Chain of Thought Prompting was tested on LaMDA and PaLM, using two math word problem datasets.

These datasets are used by researchers as a way to compare results on similar problems for different language models.

Below are images of graphs showing the results of using Chain of Thought Prompting on LaMDA.

Scaling LaMDA on the MultiArith dataset yields only a modest improvement. But LaMDA scores significantly higher when scaled with Chain of Thought Prompting.

Results on the GSM8K dataset demonstrated modest improvement.

It’s a different story with the PaLM language model.

Chain of Thought and PaLM

As can be seen in the graph above, the gains from scaling PaLM with Chain of Thought Prompting are huge for both datasets (MultiArith and GSM8K).

The researchers call the results remarkable and a new state of the art:

“On the GSM8K dataset of math word problems, PaLM shows remarkable performance when scaled to 540B parameters.

…combining chain-of-thought prompting with the 540B parameter PaLM model leads to new state-of-the-art performance of 58%, surpassing the prior state of the art of 55% achieved by fine-tuning GPT-3 175B on a large training set and then ranking potential solutions via a specially trained verifier.

Moreover, follow-up work on self-consistency shows that the performance of chain-of-thought prompting can be improved further by taking the majority vote of a broad set of generated reasoning processes, which results in 74% accuracy on GSM8K.”
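The self-consistency idea mentioned in that quote can be sketched in a few lines: sample several reasoning chains, pull out each chain’s final answer, and keep the answer most chains agree on. Everything below is illustrative; the sampled chains are made up, and `extract_answer` and `majority_vote` are hypothetical helpers, not the paper’s code.

```python
# Sketch of self-consistency: sample several reasoning chains,
# extract each chain's final answer, and take the majority vote.
# `sampled_chains` stands in for outputs sampled from a language model.
import re
from collections import Counter
from typing import Optional

def extract_answer(chain: str) -> Optional[str]:
    """Pull the final number after 'The answer is'."""
    match = re.search(r"The answer is (-?\d+)", chain)
    return match.group(1) if match else None

def majority_vote(chains: list) -> str:
    """Return the most common final answer across reasoning chains."""
    answers = [a for a in (extract_answer(c) for c in chains) if a]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical sampled outputs for the cafeteria question above;
# one chain makes an arithmetic slip and is outvoted.
sampled_chains = [
    "They had 23 - 20 = 3, then 3 + 6 = 9. The answer is 9.",
    "23 + 6 = 29, minus the 20 used is 9. The answer is 9.",
    "They used 20 of 23, leaving 3; 3 + 6 = 8. The answer is 8.",
]
print(majority_vote(sampled_chains))  # prints "9"
```

The intuition is that incorrect reasoning paths tend to disagree with each other, while correct paths converge on the same answer, so the majority vote filters out stray mistakes.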


The conclusion of a research paper is one of the most important parts for checking whether the research advances the state of the art, is a dead end, or needs more study.

The conclusion section of Google’s research paper strikes a strongly positive note.

It notes:

“We explored chain-of-thought prompting as a simple and broadly applicable method for improving reasoning in language models.

Through experiments on arithmetic, symbolic, and commonsense reasoning, we find that chain-of-thought reasoning is an emergent property of model scale, allowing sufficiently large language models to perform reasoning tasks that otherwise have flat scaling curves.

Broadening the range of reasoning tasks that language models can perform will hopefully inspire further work on language-based approaches to reasoning.”

This means Chain of Thought Prompting may give Google the ability to significantly improve its various language models, which in turn could lead to significant improvements in the kinds of things Google can do.


Read the Google AI article

Language Models Perform Reasoning via Chain of Thought

Download and read the research paper

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (PDF)
