Simple (and Lesser-Known) Strategies for Getting Better Answers from LLMs
Research-Backed Techniques and Insights
Large language models are powerful tools, but getting accurate and insightful answers from them requires more than naive prompting. The LLM Agents course, taught by leading researchers from OpenAI, DeepMind, Anthropic, Stanford, Google, and Meta AI, presents research-backed strategies for improving the quality of language-model responses. This article explores the most effective of these techniques.
Step-by-Step Reasoning
Breaking Problems into Steps
One of the key strategies for improving responses is encouraging the model to break down complex problems into sub-problems and solve them gradually.
There are two common ways to accomplish this:
Few-Shot Prompting: Provide the model with examples of solutions that include intermediate steps, prompting it to generate similarly structured responses.
Q: "Ruth has 8 apples. Yossi has 3 more apples than Ruth. How many apples do they have together?"
A: Ruth has 8 apples. Yossi has 3 more apples, so: 8 + 3 = 11. Together they have: 8 + 11 = 19 apples.
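A minimal sketch of how such a few-shot prompt can be assembled in Python; the new question and the surrounding wording are invented for illustration, and the resulting string would be sent to whichever model client you use:

```python
# Few-shot chain-of-thought: a worked example with intermediate steps is
# placed before the new question so the model imitates the same structure.
few_shot_example = (
    "Q: Ruth has 8 apples. Yossi has 3 more apples than Ruth. "
    "How many apples do they have together?\n"
    "A: Ruth has 8 apples. Yossi has 3 more apples, so: 8 + 3 = 11. "
    "Together they have: 8 + 11 = 19 apples.\n\n"
)

# The new question (made up for illustration) is appended after the example.
new_question = (
    "Q: Dana has 5 apples. Omer has 4 more apples than Dana. "
    "How many apples do they have together?\n"
    "A:"
)

prompt = few_shot_example + new_question  # send this string to your model
```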
Zero-Shot Prompting: Add the phrase "Let's think step by step" to the prompt to encourage the model to analyze the problem systematically.
Q: "Elsa has 3 apples. Anna has 2 more apples than Elsa. How many apples do they have together?"
A: "Let's think step by step…"
Analogical Reasoning
Drawing Conclusions via Analogies
Enhance response quality by instructing the model to recall a related problem and apply its solution method to the current one.
Q: "How many prime numbers exist between 1 and 200? Answer without direct counting. Recall a related problem and then solve."
A: Analogy:
- Range 1–100: Using the Sieve of Eratosthenes, we know there are 25 primes.
- Range 101–200: Using the same method, there are 21 primes.
Total: 25 + 21 = 46 prime numbers between 1 and 200.
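A sketch of an analogical-reasoning prompt; the exact instruction wording is an assumption rather than a prescribed formula:

```python
# Analogical prompting: ask the model to recall and solve a related problem
# first, then transfer that approach to the question at hand.
question = "How many prime numbers exist between 1 and 200?"
prompt = (
    f"Q: {question} Answer without direct counting.\n"
    "First recall a related problem you know how to solve and describe its "
    "solution, then apply the same approach to this problem.\n"
    "A:"
)
```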
Step-by-Step Reasoning + Self-Consistency
Select the Most Frequent Step-by-Step Response
Instead of relying on a single response, prompt the model to generate multiple step-by-step responses and select the final answer that appears most often. This technique improves accuracy on complex queries.
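A minimal sketch of self-consistency, assuming a caller-supplied `ask_llm(prompt, temperature)` helper (a hypothetical stand-in for whatever model client you use) and a naive rule that treats the last number in each response as its final answer:

```python
import re
from collections import Counter
from typing import Callable

def self_consistent_answer(
    ask_llm: Callable[[str, float], str],  # hypothetical model-client helper
    question: str,
    n_samples: int = 5,
) -> str:
    """Sample several step-by-step responses and return the most common answer."""
    prompt = f"Q: {question}\nA: Let's think step by step."
    final_answers = []
    for _ in range(n_samples):
        # Temperature > 0 so each sample can follow a different reasoning path.
        response = ask_llm(prompt, 0.7)
        numbers = re.findall(r"-?\d+", response)
        if numbers:
            final_answers.append(numbers[-1])  # naive: last number = final answer
    # Majority vote over the sampled reasoning paths.
    return Counter(final_answers).most_common(1)[0][0] if final_answers else ""
```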
Ignoring Irrelevant Information
Directing the model to disregard irrelevant information can enhance response accuracy by preventing unnecessary details from influencing the response.
Q: "... Ignore any information that is not relevant to solving the question."
Reflection
Self-Correcting Only Wrong Answers
This approach allows the model to review and revise its own answers. However, research has shown that it improves results only when applied to responses that are already known to be incorrect; this setting, in which an oracle identifies the wrong answers for the model to correct, is referred to as "Oracle" self-correction.
In software development, correctness can often be verified with automated tests, so self-correction can be restricted to solutions that are known to be wrong, which significantly improves accuracy.
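A minimal sketch of that loop, assuming hypothetical `ask_llm(prompt)` and `run_tests(code)` helpers supplied by the caller; `run_tests` returns `None` on success or an error message on failure:

```python
from typing import Callable, Optional

def solve_with_reflection(
    ask_llm: Callable[[str], str],              # hypothetical model-client helper
    run_tests: Callable[[str], Optional[str]],  # None = tests pass, else error text
    task: str,
    max_rounds: int = 3,
) -> str:
    """Ask the model to revise its code only when tests prove it is wrong."""
    code = ask_llm(f"Write a Python function for this task:\n{task}")
    for _ in range(max_rounds):
        error = run_tests(code)
        if error is None:
            return code  # verified correct: do not ask the model to "fix" it
        # Only known-wrong solutions are sent back for self-correction.
        code = ask_llm(
            f"Task:\n{task}\n\nYour previous solution:\n{code}\n\n"
            f"The tests failed with:\n{error}\n\nReturn a corrected solution."
        )
    return code
```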
Defining the Problem Clearly
Defining the problem correctly is a critical step in problem-solving. As a quote often attributed to Albert Einstein puts it:
"If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute resolving it."
Therefore, it is essential to phrase the question clearly and precisely to guide the model toward better responses.
Assorted Notes from the Lecture
Additional points mentioned in the lecture that are worth noting:
“You can think of training LLMs as training parrots to mimic human languages.”
This analogy emphasizes that language models are trained to predict and generate human-like text based on patterns learned from vast datasets. Similar to parrots mimicking human speech without understanding, LLMs generate language without genuine comprehension. They analyze input data and produce statistically probable responses, but lack consciousness or true understanding. This highlights the importance of recognizing the limitations of LLMs and not attributing human-like understanding to them.
"Under all circumstances, language model is always a predictive model; it's not a human. Remember that."
It's crucial to keep in mind that LLMs function as predictive models. They generate outputs based on learned data patterns but do not possess consciousness, emotions, or self-awareness. Treating AI as human can lead to overestimating its capabilities and misapplying its use. Always approach AI outputs critically and understand their inherent limitations.
Premise Order Matters in LLM Reasoning
The order in which premises are presented significantly affects how LLMs reason through information. Research indicates that presenting premises in a random order, rather than the order in which they are needed for the solution, causes a performance drop of over 30% across frontier LLMs. Structuring input logically and consistently is therefore crucial for obtaining accurate and meaningful responses.
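A small illustration of the effect; the rule set below is invented, but it shows the difference between presenting premises in the order the derivation needs and presenting them shuffled:

```python
# Premises listed in the order the reasoning chain uses them.
ordered = [
    "If it rains, the grass gets wet.",
    "If the grass gets wet, the picnic is cancelled.",
    "It is raining.",
]
# The same premises in a shuffled order.
shuffled = [
    "If the grass gets wet, the picnic is cancelled.",
    "It is raining.",
    "If it rains, the grass gets wet.",
]
question = "Is the picnic cancelled? Answer yes or no and explain."

ordered_prompt = "\n".join(ordered) + "\n" + question
shuffled_prompt = "\n".join(shuffled) + "\n" + question
# The cited research reports markedly lower accuracy on prompts like
# `shuffled_prompt`, even though both prompts carry the same information.
```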
The book “How to Solve It: A New Aspect of Mathematical Method” by George Pólya was recommended as a good reference on reasoning techniques.
First published in 1945, this seminal work delves into problem-solving strategies, particularly in mathematics. Pólya presents a systematic approach to tackling problems, emphasizing understanding the problem, devising a plan, carrying out the plan, and reviewing the process. The book offers heuristic techniques that are valuable not only in mathematics but also in various fields requiring structured reasoning. It's a timeless resource for developing analytical thinking and problem-solving skills.
Additional Reading
For those interested in exploring further, the following papers are recommended:
Chain-of-Thought Reasoning Without Prompting
This paper investigates whether LLMs can perform reasoning tasks without explicit prompting. The authors explore alternative decoding strategies to elicit reasoning paths inherent in pre-trained models, suggesting that certain reasoning capabilities can be activated through specific decoding methods.
Large Language Models Cannot Self-Correct Reasoning Yet
This study examines the limitations of LLMs in self-correcting their reasoning processes. Despite advancements, the research indicates that current models struggle with self-correction, highlighting areas for future improvement in AI reasoning capabilities.
Premise Order Matters in Reasoning with Large Language Models
The paper explores how the sequence in which information is presented affects the reasoning performance of LLMs. The findings suggest that the order of premises can significantly influence the model's conclusions, underscoring the importance of input structuring in AI applications.
Chain-of-Thought Empowers Transformers to Solve Inherently Serial Problems
This research delves into the "Chain-of-Thought" prompting technique, demonstrating how guiding models to generate intermediate reasoning steps enhances their ability to tackle complex, sequential problems. The study provides both theoretical and empirical evidence supporting the efficacy of CoT in improving model performance on tasks requiring serial reasoning.
Language models are powerful tools, but their output quality depends on how they are used. Implementing the above strategies can significantly improve the accuracy and usefulness of responses.