What You Must Know About LLM Agents
Large language models (LLMs), such as the GPT, DeepSeek, Gemini, and Claude families, are deep learning models with billions of parameters, pre-trained on vast amounts of data. This data, typically sourced from human-generated content, includes web pages, codebases, textbooks, and more. Through training on these corpora, LLMs develop the ability to understand and generate language.
The Principles of LLMs: Next-Token Prediction
The fundamental operation that drives all LLM usage and applications is next-token prediction. A token is a basic unit of text that the model processes; it can represent a word, a character, or a subword unit. Let x denote the textual input to the model, commonly referred to as the prompt. Given x, an LLM generates output by predicting one token at a time, ultimately producing a sequence of tokens \(y = (y_1, y_2, \ldots, y_T)\). Formally, this can be represented as \(p(y \mid x) = \prod_{t=1}^{T} p\big(y_t \mid x, y_1, y_2, \ldots, y_{t-1}\big)\). When predicting the next token \(y_t\), both the prompt x and the previously generated tokens \((y_1, y_2, \ldots, y_{t-1})\) must lie within a finite context window, which determines the maximum length of text the model can use at once.
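To make this concrete, the following sketch implements greedy next-token prediction with the Hugging Face transformers library (a minimal illustration; "gpt2" is just a small example model, and any causal language model follows the same loop):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # example model; any causal LM works
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"                 # the input x
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generate y one token at a time: each step conditions on the prompt x
# and all previously generated tokens y_1, ..., y_{t-1}.
for _ in range(10):
    with torch.no_grad():
        logits = model(input_ids).logits            # shape: (1, seq_len, vocab_size)
    next_token = logits[0, -1].argmax()             # greedy choice of y_t
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))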
LLMs are highly versatile. Due to the nature of next-token prediction, they can perform a wide range of tasks, as long as the task and the output can be expressed in text (input x and output y). Today, people can instruct LLMs using textual prompts to generate code, solve math problems, answer questions in healthcare, write essays, translate languages, create poetry, and much more. Given the current capabilities of LLMs, a prompt that clearly states its instructions or questions and provides the relevant context typically yields output of acceptable quality.
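For instance, a prompt in this spirit might read (an illustrative example of our own, not from any benchmark): "Summarize the quarterly sales figures below in two sentences, and flag any quarter where sales dropped by more than 10% from the previous quarter. Q1: 120, Q2: 135, Q3: 110, Q4: 140." The instruction, the expected output, and the context are all stated explicitly.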
What Makes an LLM Agent?
Since there is no unanimous agreement on the definition of agents, the line between an LLM and an LLM agent is often blurred. In contrast to a single prompt-response interaction with an LLM, LLM agents typically manage multi-stage tasks that unfold through a structured workflow rather than a one-shot model invocation. Moreover, LLM agents are often characterized by several additional capabilities: memory, role-play, planning, tool use, cooperation, and reflection. Note that these abilities can manifest in various forms and be applied in different ways, depending on the context. Below, we briefly outline each capability to provide a general sense of what it means. Concrete examples of these abilities will be the focus of the next section.
Memory
Memory is an agent's capacity to store, recall, and leverage information from prior interactions. Because LLMs have finite context windows and often struggle with very long inputs, simply retaining every detail may be neither practical nor effective. Memory allows an agent to selectively retain and retrieve information, merge similar information to reduce redundancy, and reflect on stored information to distill high-level thoughts that help it generalize. As a result, the agent can build a useful knowledge base, draw on past experience to inform future actions, and remain efficient even as interactions grow over time. Memory can reside at the prompt level or in external storage such as databases.
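As a minimal illustration, an agent's external memory can be as simple as a store of past interactions with similarity-based retrieval. The sketch below uses keyword overlap as a stand-in for the embedding-based retrieval a real system would use; all names here (AgentMemory, store, retrieve) are hypothetical:

class AgentMemory:
    """A toy external memory: stores past interactions, retrieves the most relevant."""

    def __init__(self):
        self.records = []  # each record is a plain-text summary of one interaction

    def store(self, text):
        self.records.append(text)

    def retrieve(self, query, k=3):
        # Rank records by word overlap with the query; a real agent would use
        # embedding similarity and merge near-duplicates to reduce redundancy.
        query_words = set(query.lower().split())
        scored = sorted(
            self.records,
            key=lambda r: len(query_words & set(r.lower().split())),
            reverse=True,
        )
        return scored[:k]

memory = AgentMemory()
memory.store("User prefers flights departing after 10 am.")
memory.store("User's home airport is SFO.")
print(memory.retrieve("book a flight from the user's home airport"))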
Role-play
Role-play denotes an agent's capacity to take high-level descriptions (such as demographic information, professional role, or personality traits) as input, and generate behaviors that consistently reflect those attributes. Studies show that prompting an agent to adopt the persona of "a conservative, white, male, strong Republican" versus "a liberal, white, female, strong Democrat" elicits sharply different political attitudes. Similarly, directing agents to "maximize self-interest" rather than "maximize collective benefit" produces pronounced variations in their willingness to cooperate.
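In practice, role-play is often implemented by placing the persona description in the system message, so that every subsequent response is conditioned on it. A minimal sketch, assuming the common chat-message convention; call_llm is a placeholder for whatever model API is used:

def call_llm(messages):
    # Placeholder: substitute a real chat-completion call here.
    return "(model response)"

persona = (
    "You are a liberal, white, female, strong Democrat. "
    "Answer survey questions in a way consistent with this persona."
)

messages = [
    {"role": "system", "content": persona},  # the persona conditions all replies
    {"role": "user", "content": "How do you feel about raising the federal minimum wage?"},
]

reply = call_llm(messages)
print(reply)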
Planning
Planning is an agent's capacity to decompose a complex task or goal into a coherent sequence of actionable sub-tasks or steps. LLMs still struggle with long-horizon tasks that demand multi-step reasoning and real-time adaptation. Take "book a flight" as an example: the agent must open an airline or aggregator site, enter origin, destination, and dates, compare fares, choose seats, and complete payment. A single error can unravel the entire process, and interface changes on the website require on-the-fly adaptation. Effective planning allows an agent to decompose the task into manageable steps, iteratively refine each step with fine-grained details, and reflect on progress, adapting as conditions change.
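One simple way to operationalize planning is to ask the model for a numbered list of sub-steps, parse it, and execute the steps one by one, re-planning on failure. A minimal sketch under these assumptions; call_llm and execute_step are hypothetical placeholders:

import re

def call_llm(prompt):
    # Placeholder: substitute a real model call. Canned output for illustration.
    return ("1. Open the airline website\n2. Enter origin, destination, and dates\n"
            "3. Compare fares\n4. Select seats\n5. Complete payment")

def execute_step(step):
    # Placeholder: a real agent would drive a browser or call an API here.
    print(f"executing: {step}")
    return True  # pretend the step succeeded

task = "Book a flight from SFO to JFK next Friday."
plan_text = call_llm(f"Decompose this task into numbered steps: {task}")
steps = [line.split(". ", 1)[1]
         for line in plan_text.splitlines() if re.match(r"^\d+\. ", line)]

for step in steps:
    if not execute_step(step):
        # On failure, reflect and re-plan from this point onward.
        plan_text = call_llm(f"Step failed: {step}. Revise the remaining plan for: {task}")
        break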
Tool use
Tool use refers to an agent's ability to select, compose, and invoke external functions, environments, or APIs. LLMs are limited by their static training data; they may be imprecise with factual information, unreliable in performing calculations, and unaware of events occurring after their training cutoff date. A common manifestation of tool use is code generation, where the agent produces executable code that can be run in an external environment (e.g., generating Python scripts to analyze data, simulate processes, or visualize results). Moreover, an agent can also generate structured function calls to trigger predefined external tools such as calculators, weather APIs, or web search engines. Both the generated code and function calls are executed outside the LLM, and the resulting outputs are fed back into the agent's reasoning loop. This external execution enables the agent to provide more accurate, verifiable, and up-to-date responses than the LLM alone could achieve.
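The sketch below shows the structured-function-call pattern in its simplest form: the model emits a JSON tool call, the agent executes it outside the model, and the result is fed back into the conversation. The JSON schema and call_llm are illustrative assumptions, not any particular vendor's API:

import json

def call_llm(prompt):
    # Placeholder: a real model, prompted with the tool schema, would emit this JSON.
    return '{"tool": "calculator", "args": {"expression": "23 * 47"}}'

TOOLS = {
    # Registry of predefined external tools the agent may invoke.
    # eval is restricted here and used only for this toy example.
    "calculator": lambda args: str(eval(args["expression"], {"__builtins__": {}})),
}

# 1. The model decides which tool to call and with what arguments.
call = json.loads(call_llm("What is 23 * 47? Respond with a JSON tool call."))

# 2. The call is executed outside the LLM.
result = TOOLS[call["tool"]](call["args"])

# 3. The result is fed back for the model to compose its final answer.
final = call_llm(f"The calculator returned {result}. Answer the user's question.")
print(result)  # 1081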
Cooperation
Cooperation is the ability to work with other agents or humans so that tasks too complex or time-consuming for a single agent can be accomplished effectively. This capability requires more than parallel effort. It includes inferring others' goals and intentions, communicating efficiently, dividing labor, aligning strategies, fostering trust, and resolving conflicts. Sometimes, it may also require detecting deception or betrayal and being prepared to respond appropriately to discourage such behavior and maintain group integrity.
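As a bare-bones illustration, cooperation between two agents can be orchestrated as a turn-taking message loop in which each agent responds to a shared transcript. The roles and call_llm below are hypothetical placeholders:

def call_llm(system, transcript):
    # Placeholder: substitute a real model call conditioned on the role and history.
    return f"({system.split(',')[0]} responds)"

roles = {
    "planner": "You are the planner, propose how to divide the task.",
    "coder": "You are the coder, implement the sub-task assigned to you.",
}

transcript = ["Task: build a script that downloads and plots weather data."]
for turn in range(4):
    speaker = "planner" if turn % 2 == 0 else "coder"  # alternate turns
    message = call_llm(roles[speaker], transcript)
    transcript.append(f"{speaker}: {message}")

print("\n".join(transcript))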
Reflection
Reflection mirrors the human ability to reconsider past behaviors and judgments to improve future decisions. It empowers an agent to critique its own performance, diagnose errors or gaps, and revise subsequent actions accordingly. Studies show that reflecting on past experience helps LLM agents distill prior interactions into more abstract and insightful thoughts, summarize overarching lessons from errors, and learn from mistakes to refine future behavior, thereby improving task performance, robustness, and long-term reliability in complex environments.
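Reflection is commonly implemented as a critique-and-retry loop: the agent attempts the task, critiques the attempt, and feeds the distilled lesson into the next attempt. A minimal sketch; call_llm and check are hypothetical placeholders:

def call_llm(prompt):
    # Placeholder: substitute a real model call.
    return "(attempt or critique)"

def check(answer):
    # Placeholder: a unit test, verifier, or human feedback signal.
    return False  # pretend early attempts fail, forcing reflection

task = "Write a function that parses ISO-8601 dates."
reflections = []

for attempt in range(3):
    context = "\n".join(reflections)
    answer = call_llm(f"{task}\nLessons from earlier attempts:\n{context}")
    if check(answer):
        break
    # Distill what went wrong into a reusable lesson for the next attempt.
    critique = call_llm(f"This attempt failed:\n{answer}\n"
                        "Summarize the mistake and how to avoid it.")
    reflections.append(critique)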
If you find this work helpful, please consider citing our paper:
@article{hu2025hands,
  title={Hands-on LLM-based Agents: A Tutorial for General Audiences},
  author={Hu, Shuyue and Ren, Siyue and Chen, Yang and Mu, Chunjiang and Liu, Jinyi and Cui, Zhiyao and Zhang, Yiqun and Li, Hao and Zhou, Dongzhan and Xu, Jia and others},
  journal={Hands-on},
  volume={21},
  pages={6},
  year={2025}
}