Internalizing Agent Abilities

As AI agents continue to advance, a recent trend is to internalize agent abilities directly within the model itself. This allows the model to perform agent-like behaviors (such as planning, tool use, and multi-step task execution) or to integrate more seamlessly with agent frameworks that extend these abilities. For example, Claude Code incorporates structured reasoning and tool-usage capabilities optimized for autonomous coding workflows: it can generate code, run tasks, analyze results, and iterate based on feedback. Similarly, OpenAI's Operator, which combines the multimodal perception of GPT-4o with advanced reasoning trained via reinforcement learning, includes a built-in browser that the model can observe and control: it can inspect webpages, type, click, and scroll, interacting with the environment through standard mouse and keyboard actions.

While internalizing agent abilities is fundamentally a matter of training the model, and many techniques used for training LLMs carry over, the key distinction lies in the data: agent-centric training requires constructing datasets that capture agent behaviors, i.e., demonstrations of actions, action sequences, and decision-making in interactive environments.
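To make the data requirement concrete, the following is a minimal sketch of how one agent trajectory (thought, tool call, observation, final answer) might be flattened into a prompt/completion pair for supervised training. The field names and the ReAct-style text layout are illustrative assumptions, not the format used by any particular system mentioned above.

```python
import json

def make_training_example(task, steps, final_answer):
    """Flatten an agent trajectory into a prompt/completion pair.

    `task` is the user instruction; `steps` is a list of
    (thought, action, observation) tuples, where `action` is a
    JSON-serializable tool call; `final_answer` is the agent's
    closing response. All field names here are hypothetical.
    """
    lines = [f"Task: {task}"]
    for thought, action, observation in steps:
        lines.append(f"Thought: {thought}")
        # Tool calls are serialized as JSON so the model learns a
        # parseable action format it can emit at inference time.
        lines.append(f"Action: {json.dumps(action)}")
        lines.append(f"Observation: {observation}")
    prompt = "\n".join(lines) + "\nThought:"
    completion = " " + final_answer
    return {"prompt": prompt, "completion": completion}

# One demonstration: a single calculator tool call, then the answer.
example = make_training_example(
    task="What is 17 * 23?",
    steps=[(
        "I should use the calculator tool.",
        {"tool": "calculator", "input": "17 * 23"},
        "391",
    )],
    final_answer="The calculator returned 391, so the answer is 391.",
)
```

A dataset of such examples, collected from human demonstrations or from stronger agents executing tasks in real environments, is what distinguishes agent-centric training from ordinary instruction tuning.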

If you find this work helpful, please consider citing our paper:

@article{hu2025hands,
  title={Hands-on LLM-based Agents: A Tutorial for General Audiences},
  author={Hu, Shuyue and Ren, Siyue and Chen, Yang and Mu, Chunjiang and Liu, Jinyi and Cui, Zhiyao and Zhang, Yiqun and Li, Hao and Zhou, Dongzhan and Xu, Jia and others},
  journal={Hands-on},
  volume={21},
  pages={6},
  year={2025}
}