CellAgent: Automating Biological Data Analysis through Natural Language

CellAgent is an autonomous, LLM-driven multi-agent framework for single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) data. Given raw data and natural language instructions, it automates the analysis end to end and delivers visualizations and processed data outputs. Traditional analysis in this field requires extensive programming and domain expertise, which is a significant barrier for many biologists. CellAgent addresses this with a hierarchy of specialized agents that plan, execute, and evaluate tasks, simulating a "deep-thinking" process with transparent reasoning and iterative refinement at every step.

Overview of the CellAgent pipeline
Schematic of the CellAgent framework. a, Example of user input received by CellAgent, comprising single-cell data and user-provided text information. b, Upon receiving user input, the Planner first parses user intent and decomposes the task into subtasks. c, Illustration of final results, including the results of individual subtasks and the final task outcome. d, Detailed view of CellAgent's processing flow for subtasks. The current subtask and historical code memory are passed to an Executor, which first retrieves the tools available for this step. It then acquires the corresponding documentation for these tools and derives solutions (text analysis and code generation) based on that documentation. The generated code is executed in the code sandbox; if exceptions are encountered, solutions are regenerated until the subtask executes successfully. Then, the Evaluator assesses the results of the current subtask and allows the Executor to optimize its solutions. Ultimately, based on its evaluation of results under multiple solutions, the Evaluator aggregates results to obtain the final outcome of this step.

We will now use examples and prompt templates to illustrate how CellAgent leverages role-play, planning, tool use, reflection, cooperation, and memory to automate end-to-end biological analysis.

Role-play

CellAgent's architecture is built on a clear division of labor among three specialized agents: a Planner, an Executor, and an Evaluator. Each agent has a distinct role modeled after the workflow of a human expert. The Planner acts as a project lead, interpreting the user's high-level goal and breaking it down into a logical sequence of analytical steps. The Executor functions as a bioinformatician, selecting the appropriate tools, writing code, and running the analysis for each step. The Evaluator serves as a quality control specialist, assessing the output of the Executor to ensure biological accuracy and coherence, enabling iterative refinement without human intervention.

Example of role-playing as a planner

Name: Planner

Profile: A project manager with deep expertise in bioinformatics workflows.

Goal: To interpret a user's natural language request and decompose it into a logical, step-by-step analytical plan suitable for single-cell data analysis, leveraging expert-curated prompts to enhance domain-specific understanding.

Expert Knowledge: The Planner should ground decisions in expert-provided principles and best practices for planning, including: QC → normalization → highly variable genes (HVGs) → dimensionality reduction (PCA, UMAP) → batch correction (if needed) → neighborhood graph → clustering → annotation → downstream analyses, ...

Output Format: Return 2 parts: "Thought": Brief rationale summarizing assumptions, data modality, and the high-level strategy. "Plan": An ordered list of actionable steps.

Example of role-playing as an executor

Name: Executor

Profile: A bioinformatician specializing in scRNA-seq data analysis.

Goal: Given a specific sub-task (e.g., "Perform cell type annotation"), select the best tool from the sc-Omni toolkit, generate correct Python/R code, and execute it in the code sandbox.

Constraint: Must use tools and parameters documented in sc-Omni. The available tools for this step are CellTypist | scType | CellMarker2.0. Ensure code is executable and handles data objects from previous steps correctly.

Output Format: Return 2 parts: "Thought": Brief rationale for tool selection, key parameters, and input assumptions. "Code": An executable code block (Python or R).

Example of role-playing as an evaluator

Name: Evaluator

Profile: A quality control expert in bioinformatics, proficient in assessing the statistical and biological validity of single-cell analysis results.

Goal: To critically evaluate the results generated by the Executor. This includes analyzing plots (UMAP | Heatmaps) and quantitative metrics (ARI | ASW) to determine if the outcome is biologically meaningful and technically sound. If the result is suboptimal, provide specific, actionable feedback for improvement (e.g., "adjust parameter X," "try tool Y").

Constraint: Must base evaluations on established biological principles and predefined metrics, effectively replacing traditional manual assessments.

Output Format: Return 2 parts: "Evaluation Thought": Concise reasoning that interprets visualizations and metrics against biological/technical criteria; note any suspected issues (e.g., overclustering, batch effects). "Feedback": Actionable next steps.
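
The three role prompts above share the same fields (Name, Profile, Goal, an optional Constraint, and Output Format), so they can be stored as structured records and rendered into prompt text on demand. The following is a minimal sketch of that idea, not CellAgent's actual implementation; the `RolePrompt` class and its `render` method are hypothetical names introduced only for illustration, and the field values are shortened from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolePrompt:
    """One agent role, mirroring the Name/Profile/Goal/... fields above.

    Hypothetical container for illustration; not part of CellAgent.
    """
    name: str
    profile: str
    goal: str
    output_format: str
    constraint: str = ""  # optional: the Planner example has no Constraint field

    def render(self) -> str:
        """Assemble the fields into a single prompt string."""
        lines = [
            f"Name: {self.name}",
            f"Profile: {self.profile}",
            f"Goal: {self.goal}",
        ]
        if self.constraint:
            lines.append(f"Constraint: {self.constraint}")
        lines.append(f"Output Format: {self.output_format}")
        return "\n".join(lines)


planner = RolePrompt(
    name="Planner",
    profile="A project manager with deep expertise in bioinformatics workflows.",
    goal="Decompose a natural language request into a step-by-step analytical plan.",
    output_format='Return 2 parts: "Thought" and "Plan".',
)
evaluator = RolePrompt(
    name="Evaluator",
    profile="A quality control expert in bioinformatics.",
    goal="Critically evaluate the results generated by the Executor.",
    constraint="Base evaluations on established biological principles and predefined metrics.",
    output_format='Return 2 parts: "Evaluation Thought" and "Feedback".',
)

print(evaluator.render())
```

Keeping roles as data rather than hard-coded strings makes it easy to add or revise a role without touching the code that calls the LLM.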

Planning

Planning in CellAgent is a hierarchical process managed by the Planner agent. Upon receiving a user's request in natural language, the Planner draws on its expert-curated knowledge (given in its role-play prompt shown above) to decompose the complex task into a series of well-defined, sequential sub-tasks. This ensures the overall analysis follows a logically sound and scientifically valid workflow. For instance, a simple request for "clustering" is broken down into the essential preprocessing steps that must run before the final clustering is performed, mirroring best practices in the field.

Example of planning for clustering (Planner role)

INPUT:

Core Instruction: This is a 3k PBMCs dataset, please help me complete clustering.


OUTPUT:

Thought: To perform clustering, we must first ensure the data is properly prepared. I will create a plan that includes quality control, normalization, and dimensionality reduction before the final clustering step.

Plan:
1. "Quality Control": Filter out low-quality cells and genes.
2. "Normalization": Standardize expression values across cells.
3. "Identification of Highly Variable Genes": Focus the analysis on the most informative genes.
4. "Dimensionality Reduction": Project data into lower dimensions (e.g., PCA, UMAP).
5. "Clustering": Identify groups of similar cells.

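Before the plan in the example above can be handed to the Executor one step at a time, the Planner's text output has to be turned into an ordered list of sub-tasks. The sketch below shows one way this could be done with a regular expression, assuming each step is written as a quoted name followed by a colon and a description, as in the example. The `parse_plan` function and the pattern are illustrative assumptions, not CellAgent's actual parser.

```python
import re

# Planner output copied from the clustering example (Thought shortened).
PLAN_TEXT = '''Thought: To perform clustering, we must first ensure the data is properly prepared.

Plan: "Quality Control": Filter out low-quality cells and genes.
"Normalization": Standardize expression values across cells.
"Identification of Highly Variable Genes": Focus the analysis on the most informative genes.
"Dimensionality Reduction": Project data into lower dimensions (e.g., PCA, UMAP).
"Clustering": Identify groups of similar cells.'''

# Assumed step format: "Step name": description (one step per line).
STEP_PATTERN = re.compile(r'"([^"]+)":\s*(.+)')


def parse_plan(text: str) -> list[tuple[str, str]]:
    """Return (name, description) pairs in the order they appear after 'Plan:'."""
    _, _, plan_part = text.partition("Plan:")
    return [(m.group(1), m.group(2).strip()) for m in STEP_PATTERN.finditer(plan_part)]


steps = parse_plan(PLAN_TEXT)
for index, (name, description) in enumerate(steps, start=1):
    print(f"Step {index}: {name} - {description}")
```

Because only the text after "Plan:" is scanned, quoted phrases inside the Thought do not leak into the step list.
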
Tool use

The Executor agent is equipped with a comprehensive, expert-curated toolkit named sc-Omni. This toolkit consolidates a wide range of essential software for single-cell analysis (e.g., Scanpy, CellTypist, Harmony) and provides detailed documentation on their usage. For a given sub-task, the Executor selects the most appropriate tool, reads its documented parameters, and generates the corresponding code. This structured approach overcomes a limitation of general-purpose LLMs, which often lack the specialized knowledge to use these tools correctly.

Example of tool use (Executor role)

INPUT:

Current Sub-task: Step 5: Cell Type Annotation

Available Tools: sc-Omni toolkit, including Cellmarker | CellTypist | scType.


OUTPUT:

Thought: The current task is cell type annotation. I will consult the sc-Omni documentation to find suitable tools. Both "Cellmarker" and "CellTypist" are effective for this purpose. I will select Cellmarker for this step as it provides a robust database of markers for annotation.

Code:


              # Generate Python code using the Cellmarker tool.
              from sc_omni.tools import cellmarker
              adata = cellmarker.annotate(adata, tissue='blood', cell_type='pbmc')
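
The retrieval step that precedes code generation (find the tools registered for the current sub-task, then fetch their documentation) can be sketched as a lookup over a small registry. This is an illustrative assumption about how such a lookup might work, not the sc-Omni implementation: `ToolDoc`, `REGISTRY`, `retrieve_tools`, and `build_executor_context` are hypothetical names, and the one-line descriptions are placeholders rather than real sc-Omni documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDoc:
    name: str
    tasks: frozenset[str]  # sub-task keywords this tool is registered for
    doc: str               # short usage note shown to the Executor (placeholder)


# Hypothetical registry: tool names come from the text, docs are placeholders.
REGISTRY = [
    ToolDoc("CellTypist", frozenset({"annotation"}), "Automated cell type annotation."),
    ToolDoc("scType", frozenset({"annotation"}), "Marker-based cell type annotation."),
    ToolDoc("Harmony", frozenset({"batch correction"}), "Batch effect correction."),
]


def retrieve_tools(subtask: str, registry=REGISTRY) -> list[ToolDoc]:
    """Return tools whose registered task keyword occurs in the sub-task title."""
    title = subtask.lower()
    return [tool for tool in registry if any(key in title for key in tool.tasks)]


def build_executor_context(subtask: str) -> str:
    """Combine the sub-task with the retrieved tool docs into prompt text."""
    doc_lines = [f"- {tool.name}: {tool.doc}" for tool in retrieve_tools(subtask)]
    return f"Current Sub-task: {subtask}\nAvailable Tools:\n" + "\n".join(doc_lines)


print(build_executor_context("Step 5: Cell Type Annotation"))
```

Restricting the prompt to the tools retrieved for the current step keeps the Executor's context short and steers it toward documented parameters.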

Reflection

CellAgent incorporates a self-reflective optimization mechanism driven by the Evaluator agent. After the Executor runs a piece of code, the Evaluator assesses the quality and correctness of the output, such as plots or statistical results. This process replaces traditional manual assessment. If the results are suboptimal (e.g., poor cell clustering, insufficient batch correction), the Evaluator provides critical feedback and suggests corrective actions, such as adjusting hyperparameters or trying an alternative tool. This iterative loop of execution and reflection ensures the final results are robust and accurate.

Example of reflection (Evaluator role)

INPUT:

Last Operation: Executor ran Leiden clustering with resolution=0.5.

Current Result: UMAP plot showing poorly separated cell clusters.

Task: Evaluate whether the clustering result is biologically meaningful and technically sound, and provide feedback if it is not.


OUTPUT:

Evaluation Thought: The clusters in the UMAP visualization are overlapping, suggesting the current resolution parameter is too low to capture the underlying biological heterogeneity. The quality is insufficient.

Feedback: The clustering is suboptimal. Re-run the execution step with a range of higher resolution values (e.g., 0.8, 1.0, 1.2) to achieve better separation.
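
The execute, evaluate, and retry cycle in this example can be written as a small loop. The sketch below is a toy under stated assumptions: `reflect_and_refine`, `toy_run`, and `make_toy_evaluator` are hypothetical names, the "separation" score is an invented stand-in for the Evaluator's judgment (it is not ARI, ASW, or any real metric), and no actual clustering is performed.

```python
def reflect_and_refine(run, evaluate, initial_params, max_rounds=4):
    """Run a step, evaluate it, and re-run with the suggested parameters.

    Returns (final_params, final_result, history).
    """
    params = dict(initial_params)
    history = []
    for _ in range(max_rounds):
        result = run(params)
        accepted, suggestion = evaluate(result)
        history.append((dict(params), accepted))
        if accepted or not suggestion:
            break  # good enough, or the Evaluator has nothing more to suggest
        params.update(suggestion)
    return params, result, history


def toy_run(params):
    """Stand-in for the Executor: a made-up score that grows with resolution."""
    return {"separation": params["resolution"] - 0.4}


def make_toy_evaluator(candidates, threshold=0.5):
    """Stand-in for the Evaluator: accept above threshold, else suggest the next value."""
    remaining = iter(candidates)

    def evaluate(result):
        if result["separation"] >= threshold:
            return True, {}
        next_value = next(remaining, None)
        return False, ({"resolution": next_value} if next_value is not None else {})

    return evaluate


final_params, final_result, history = reflect_and_refine(
    toy_run, make_toy_evaluator([0.8, 1.0, 1.2]), {"resolution": 0.5}
)
print(final_params)
print(history)
```

The `max_rounds` cap is a design choice for the sketch: it guarantees termination even if the evaluator never accepts a result.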

Cooperation

Cooperation in CellAgent follows a structured, hierarchical workflow rather than a conversational model. The agents coordinate by passing structured information sequentially. The Planner creates a high-level plan and passes the first sub-task to the Executor. The Executor completes the sub-task and hands its result (e.g., a processed data object and visualizations) to the Evaluator. The Evaluator returns feedback to the Executor, which may trigger a refinement of the current sub-task. Once a sub-task is completed successfully, the Executor proceeds to the next one in the plan, so the analysis progresses in order from the first step to the last.
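
The sequential handoff described here (Planner to Executor to Evaluator, then on to the next sub-task) can be sketched as a single orchestration function. This is a simplified illustration, not CellAgent's code: `run_pipeline` and the three stub agents are hypothetical, and each stub is a plain Python function standing in for an LLM call.

```python
def run_pipeline(request, planner, executor, evaluator, max_retries=2):
    """Walk the plan in order; each sub-task is executed, evaluated, and
    refined with the Evaluator's feedback until accepted or retries run out.

    Returns a log of (subtask, attempt_number, accepted) tuples.
    """
    log = []
    for subtask in planner(request):
        feedback = None  # feedback never carries over between sub-tasks
        for attempt in range(1, max_retries + 2):
            result = executor(subtask, feedback)
            accepted, feedback = evaluator(subtask, result)
            log.append((subtask, attempt, accepted))
            if accepted:
                break
    return log


# Stub agents (assumptions for illustration only).
def stub_planner(request):
    return ["Quality Control", "Clustering"]


def stub_executor(subtask, feedback):
    return f"{subtask} (revised)" if feedback else f"{subtask} (draft)"


def stub_evaluator(subtask, result):
    # Reject only the first draft of the clustering step.
    if result == "Clustering (draft)":
        return False, "Try a higher resolution."
    return True, None


log = run_pipeline("Please cluster this dataset.", stub_planner, stub_executor, stub_evaluator)
for entry in log:
    print(entry)
```

In this sketch a sub-task that exhausts its retries is logged as not accepted and the pipeline moves on; a real system might instead stop or escalate.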

Memory

To enable efficient cooperation among agents and maintain context throughout a complex, multi-step analysis, CellAgent employs a memory control mechanism that integrates global and local memory. This mechanism stores and manages historical information so that each agent can retrieve what it needs for the current step. Global Memory maintains the high-level context, storing the complete analytical plan generated by the Planner and the final result for each step. This keeps all agent actions aligned with the overall scientific objective. Local Memory, in contrast, captures the operational details within each sub-task, including the specific code, parameters, and feedback from the Evaluator. This memory mechanism is essential for ensuring consistency between sequential steps and for enabling the iterative, self-reflective optimization loop.
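
The split between Global Memory (the plan plus one final result per step) and Local Memory (per-attempt code, parameters, and feedback within a sub-task) maps naturally onto two small data structures. The sketch below is an assumption about how such a store could look, not CellAgent's implementation; `Attempt`, `LocalMemory`, `GlobalMemory`, and `context_for` are hypothetical names, and the stored values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    """One Executor attempt within a sub-task."""
    code: str
    params: dict
    feedback: str = ""  # Evaluator feedback on this attempt, if any


@dataclass
class LocalMemory:
    """Operational detail for a single sub-task."""
    subtask: str
    attempts: list[Attempt] = field(default_factory=list)


@dataclass
class GlobalMemory:
    """High-level context: the full plan and the final result of each step."""
    plan: list[str]
    final_results: dict[str, str] = field(default_factory=dict)

    def context_for(self, subtask: str) -> str:
        """Summarize the plan and completed steps for the next agent call."""
        done = [f"{s}: {self.final_results[s]}" for s in self.plan if s in self.final_results]
        completed = "; ".join(done) or "none"
        return f"Plan: {' -> '.join(self.plan)}\nCompleted: {completed}\nCurrent: {subtask}"


global_memory = GlobalMemory(plan=["Quality Control", "Normalization", "Clustering"])
global_memory.final_results["Quality Control"] = "filtered data object"

local_memory = LocalMemory(subtask="Clustering")
local_memory.attempts.append(
    Attempt(code="<clustering code>", params={"resolution": 0.5}, feedback="Clusters overlap.")
)
local_memory.attempts.append(Attempt(code="<clustering code>", params={"resolution": 1.0}))

print(global_memory.context_for("Normalization"))
```

Only the compact global summary needs to travel between steps, while the detailed attempt history stays local to the sub-task that produced it.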

If you find this work helpful, please consider citing our paper:

@article{hu2025hands,
  title={Hands-on LLM-based Agents: A Tutorial for General Audiences},
  author={Hu, Shuyue and Ren, Siyue and Chen, Yang and Mu, Chunjiang and Liu, Jinyi and Cui, Zhiyao and Zhang, Yiqun and Li, Hao and Zhou, Dongzhan and Xu, Jia and others},
  journal={Hands-on},
  volume={21},
  pages={6},
  year={2025}
}