Prompt LLMs for code, not arithmetic
It is now well known that arithmetic is a weak spot for LLMs. If you directly prompt an LLM with a calculation, or even a simple arithmetic word problem, it will often get it wrong.
But this paper argues that trying to get an LLM to do calculations misunderstands its strengths. LLMs are meant for understanding language, not calculation. The key idea is to separate the problem into two parts: understanding and computation (or solving). Let the LLM do what it is good at, understanding, and then offload the computation to an actual programming language.
Check out below the quick video summary I posted last week. Also see their GitHub for the full source prompts they use.
Let’s see a few examples of how that works.
But first, an example of where LLMs fail. (All examples in this post come from the text-davinci model. The LLM’s answers are demarcated with 🤖💬 and 🤖🛑.)
Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Answer: Roger started with 5 tennis balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6. The answer is 11.

Question: The bakers at the Beverly Hills Bakery baked 200 loaves of bread on Monday morning. They sold 93 loaves in the morning and 39 loaves in the afternoon. A grocery store returned 6 unsold loaves. How many loaves of bread did they have left?
Answer: 🤖💬 The bakers started with 200 loaves of bread. They sold 93 loaves in the morning and 39 loaves in the afternoon. 6 loaves were returned. 200 - (93 + 39 + 6) = 62 loaves of bread left. 🤖🛑
Instead, we prompt it to not compute the answer, but to provide Python code to compute the answer.
Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Solution in Python:
def solution():
    """Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"""
    # Roger started with 5 tennis balls.
    tennis_balls = 5
    # 2 cans of 3 tennis balls each is
    bought_balls = 2 * 3
    # tennis balls. The answer is
    answer = tennis_balls + bought_balls
    return answer

Q: The bakers at the Beverly Hills Bakery baked 200 loaves of bread on Monday morning. They sold 93 loaves in the morning and 39 loaves in the afternoon. A grocery store returned 6 unsold loaves. How many loaves of bread did they have left?
Solution in Python:
🤖💬
def solution():
    """The bakers at the Beverly Hills Bakery baked 200 loaves of bread on Monday morning. They sold 93 loaves in the morning and 39 loaves in the afternoon. A grocery store returned 6 unsold loaves. How many loaves of bread did they have left?"""
    # The bakers started with 200 loaves of bread.
    bread = 200
    # They sold 93 loaves in the morning and 39 loaves in the afternoon.
    sold_bread = 93 + 39
    # A grocery store returned 6 unsold loaves.
    returned_bread = 6
    # The answer is
    answer = bread - sold_bread + returned_bread
    return answer
🤖🛑
You can check that running the Python code gives the correct answer (74).
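Here is a minimal sketch of that check, assuming the model's completion arrives as a plain string. It runs the text with Python's built-in `exec` and then calls the `solution()` function it defines. The helper name `run_generated_solution` is hypothetical, chosen for illustration rather than taken from the paper, and running model output through `exec` is only reasonable inside a sandbox.

```python
# Completion text copied from the model's answer above (comments trimmed).
GENERATED_CODE = '''
def solution():
    """How many loaves of bread did they have left?"""
    # The bakers started with 200 loaves of bread.
    bread = 200
    # They sold 93 loaves in the morning and 39 loaves in the afternoon.
    sold_bread = 93 + 39
    # A grocery store returned 6 unsold loaves.
    returned_bread = 6
    answer = bread - sold_bread + returned_bread
    return answer
'''


def run_generated_solution(code: str):
    """Execute generated code in a fresh namespace and call its solution().

    Hypothetical helper: assumes the code defines a zero-argument solution().
    """
    namespace = {}
    exec(code, namespace)  # assumption: input is trusted or sandboxed
    return namespace["solution"]()


print(run_generated_solution(GENERATED_CODE))  # prints 74
```

The LLM only has to translate the word problem into variables and operations; the Python interpreter does the actual arithmetic.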
You can do the same thing with logical problems, not just arithmetic:
Question: say "java" twice and "data" once, and then repeat all of this three times.
Answer: java java data java java data java java data

Question: Repeat the word duck four times, but halfway through also say quack
Answer: 🤖💬 duck quack duck duck quack duck 🤖🛑
OK, it failed that one. Let’s try the same with code-style prompts.
Question: say java twice and data once, and then repeat all of this three times.
# solution using Python:
def solution():
    """Question: say java twice and data once, and then repeat all of this three times."""
    result = []
    tmp = ["java", "java", "data"]
    for i in range(3):
        result.extend(tmp)
    return " ".join(result)

Question: Repeat the word duck four times, but halfway through also say quack
# solution using Python:
🤖💬
def solution():
    """Question: Repeat the word duck four times, but halfway through also say quack"""
    result = []
    for i in range(4):
        if i == 2:
            result.append("quack")
        result.append("duck")
    return " ".join(result)
🤖🛑
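To confirm that this version gets it right, here is a quick sketch that copies the model's function from the completion above and runs it. Nothing is new except the `print` call.

```python
def solution():
    """Question: Repeat the word duck four times, but halfway through also say quack"""
    result = []
    for i in range(4):
        # After two ducks (i == 2) we are halfway through, so insert the quack.
        if i == 2:
            result.append("quack")
        result.append("duck")
    return " ".join(result)


print(solution())  # prints: duck duck quack duck duck
```

The output has exactly four ducks with a single quack in the middle, which is what the question asked for.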
tl;dr: use LLMs for what they were meant to do (understand and predict natural language), and then leverage that for computation.