The AI Scientist: Towards Fully Automated Scientific Research

The AI Scientist aims to carry out the scientific research process independently, from idea generation to paper writing, without any human involvement. It produces a research paper in three phases: (1) idea generation, (2) experimental iteration, and (3) paper write-up. First, the AI Scientist generates a diverse set of ideas that differ from the existing literature. During experimental iteration, it executes the proposed experiments and visualizes the results. In the paper write-up phase, it produces a concise and informative account of its progress, formatted in LaTeX in the style of a standard machine learning conference paper. Afterward, an LLM-generated review assesses the quality of the generated paper, and the AI Scientist revises it accordingly.

Workflow of the AI Scientist system
Conceptual illustration of The AI Scientist, an end-to-end LLM-driven scientific discovery process. The AI Scientist first invents and assesses the novelty of a set of ideas. It then determines how to test the hypotheses, including writing the necessary code by editing a codebase powered by recent advances in automated code generation. Afterward, the experiments are automatically executed to collect a set of results consisting of both numerical scores and visual summaries (e.g. plots or tables). The results are motivated, explained, and summarized in a LaTeX report. Finally, The AI Scientist generates an automated review, according to current practice at standard machine learning conferences. The review can be used to either improve the project or as feedback to future generations for open-ended scientific discovery.

We will now use examples and prompt templates to illustrate how the AI Scientist combines planning, tool use, and reflection to carry out research and write papers autonomously.

Planning

Planning helps the AI Scientist break a complex goal down into smaller, sequential steps. For example, during experimental iteration, the AI Scientist plans the experimental steps for each selected idea, then writes code and conducts experiments according to these plans. In the paper write-up phase, it organizes its scientific findings into a research paper. Overall, planning lets the AI Scientist structure its multi-step research process, allowing it to generate, refine, and carry out ideas, and to write research papers, in a clear and purposeful way. We provide an example of how the AI Scientist generates an experimental plan below.

Example of planning an experiment list for the generated idea

INPUT:

Idea: Improving image classification through data augmentation.

Instruction: Your goal is to implement the given idea. Plan the list of experiments you would like to run. You are given a total of up to 5 runs to complete the necessary experiments.


OUTPUT:
1. Prepare Data: Load and inspect the CIFAR-10 dataset.
2. Build Model: Set up a simple convolutional neural network.
3. Train Baseline: Train the model without data augmentation and record accuracy.
4. Apply Augmentation: Add image transformations such as flipping, rotating, and adjusting the brightness.
5. Train Again: Retrain the model with augmentation and compare results.
6. Evaluate: Measure test accuracy and visualize performance improvements.
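A plan like the one above comes back from the LLM as plain numbered text, so the system must turn it into discrete steps before acting on them. Below is a minimal, illustrative sketch of parsing such a numbered plan; the `parse_plan` helper is our own name, not part of the AI Scientist's actual codebase.

```python
import re

def parse_plan(plan_text: str) -> list[dict]:
    """Parse a numbered plan like '1. Prepare Data: Load ...' into steps."""
    steps = []
    for line in plan_text.splitlines():
        # Expect lines of the form "<number>. <Name>: <description>"
        match = re.match(r"\s*(\d+)\.\s*([^:]+):\s*(.+)", line)
        if match:
            steps.append({
                "index": int(match.group(1)),
                "name": match.group(2).strip(),
                "description": match.group(3).strip(),
            })
    return steps

plan = """1. Prepare Data: Load and inspect the CIFAR-10 dataset.
2. Build Model: Set up a simple convolutional neural network.
3. Train Baseline: Train the model without data augmentation and record accuracy."""
steps = parse_plan(plan)
```

Each parsed step can then be handed to the coding and execution stage one at a time, which is what makes the downstream experimental iteration tractable.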

Tool use

In all three phases, the AI Scientist makes extensive use of tools. During idea generation, it evaluates the novelty of ideas by retrieving literature: it prompts an LLM to formulate search queries (such as related paper titles or keyword combinations), then passes these queries to a built-in Python module that calls the Semantic Scholar API to fetch related papers. In experimental iteration, the AI Scientist employs Aider, an LLM-based coding assistant, to carry out experiments: Aider takes the experimental plan generated by the LLM as input and automatically writes and runs the code that executes the plan. In paper write-up, after the LLM generates the manuscript in LaTeX, the AI Scientist calls a LaTeX compiler to process the .tex file and produce the corresponding PDF. The following is an example of generating a literature retrieval query.

Example of generating a query for literature retrieval

INPUT:

Idea: Improving image classification through data augmentation.

Instruction: You have an idea and you want to check if it is novel. You will be given access to the Semantic Scholar API, which you may use to survey the literature and find relevant papers to help you make your decision. Please provide a search query to search the literature.

OUTPUT:

Query: Enhancing the image classification performance of convolutional neural networks using data augmentation.
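The query-generation step above amounts to filling a prompt template with the idea and sending it to an LLM. A minimal sketch, assuming a hypothetical `generate_query` helper and a placeholder `llm` callable (the real system's prompt wording and interfaces may differ):

```python
QUERY_PROMPT = (
    "You have an idea and you want to check if it is novel. "
    "You will be given access to the Semantic Scholar API, which you may use "
    "to survey the literature and find relevant papers. "
    "Please provide a search query to search the literature.\n\n"
    "Idea: {idea}"
)

def generate_query(idea: str, llm) -> str:
    # `llm` is any callable mapping a prompt string to a completion string.
    prompt = QUERY_PROMPT.format(idea=idea)
    return llm(prompt).strip()
```

Keeping the LLM behind a plain callable like this makes the step easy to swap between model providers or to stub out in tests.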

The generated query is then passed as a parameter to the following built-in Python function, which retrieves related work by invoking the Semantic Scholar API.

Python code of literature retrieval

import requests  # API_KEY is assumed to be defined elsewhere in the module

def search_for_papers(query: str) -> list:
    # Query the Semantic Scholar paper-search endpoint for related work.
    response = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        headers={"x-api-key": API_KEY},
        params={
            "query": query,
            "fields": "title,authors,year,abstract,citationCount",
        },
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    papers = response.json().get("data", [])
    return papers
Below is an example entry from the return value of the function "search_for_papers()", i.e., one of the retrieved papers.
{
  "title": "ImageNet Classification with Deep Convolutional Neural Networks",
  "authors": "Krizhevsky, Sutskever, Hinton",
  "year": 2012,
  "abstract": "We trained a large, deep convolutional neural network to classify the 1.3 million high-resolution images in the LSVRC-2010...",
  "citationCount": 149105
}
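The final tool call in the pipeline, compiling the LaTeX manuscript into a PDF, can be sketched as a subprocess invocation of pdflatex. This is an illustrative sketch rather than the AI Scientist's actual code; `compile_latex` and `pdflatex_command` are hypothetical helpers, and the `run` parameter exists only so the subprocess call can be stubbed.

```python
import subprocess
from pathlib import Path

def pdflatex_command(tex_name: str) -> list[str]:
    # -interaction=nonstopmode keeps pdflatex from pausing on errors.
    return ["pdflatex", "-interaction=nonstopmode", tex_name]

def compile_latex(tex_file: str, timeout: int = 60, run=subprocess.run) -> Path:
    """Run pdflatex twice (the second pass resolves \\ref and \\cite)."""
    tex_path = Path(tex_file)
    for _ in range(2):
        run(pdflatex_command(tex_path.name), cwd=tex_path.parent,
            timeout=timeout, check=True)
    return tex_path.with_suffix(".pdf")
```

Running the compiler twice is standard LaTeX practice, since cross-references and citations are only filled in on the second pass.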

Reflection

Human scientific research relies heavily on reflection, and the AI Scientist is no exception. When generating ideas, the AI Scientist uses self-reflection to refine each proposed idea. During experimental iteration, it analyzes the experimental results and adjusts its methods accordingly. In the paper write-up phase, it performs two rounds of self-reflection section by section, aiming to improve the accuracy, conciseness, and coherence of each section. The most important form of reflection is the Automated Paper Reviewing mechanism: the PDF of the generated paper is converted into text and reviewed by an LLM-based reviewer agent, whose feedback the AI Scientist uses to improve the write-up. An example of generating review comments is shown below.

Example of generating review comments by the reviewer agent

INPUT:

Instruction: You are an AI researcher who is reviewing a paper that was submitted to a prestigious ML venue. Be critical and cautious in your decision. If a paper is bad or you are unsure, give it bad scores and reject it. Below are the reviewing guidelines. When writing your review, please keep in mind that after decisions have been made, reviews and meta-reviews of accepted papers and opted-in rejected papers will be made public.

Paper:
# Data Augmentation for Improved Image Classification
## Abstract
This paper explores the use of data augmentation techniques to enhance the performance of convolutional neural networks in image classification tasks...
## Introduction
Deep learning models, particularly convolutional neural networks (CNNs), have achieved remarkable success in image classification. However, their performance heavily depends on the availability of large, diverse datasets...
...

Guidelines:
## Review Form
Below is a description of the questions you will be asked on the review form for each paper and some guidelines on what to consider when answering these questions...
1. Summary: Briefly summarize the paper and its contributions...
2. Questions: Please list up and carefully describe any questions and suggestions for the authors...
3. Limitations: Have the authors adequately addressed the limitations and potential negative societal impact of their work?...

OUTPUT:

Summary:
This paper explores the application of data augmentation techniques in image classification tasks...
Strengths: 1. Detailed ablation studies analyzing the contribution of different augmentation strategies. 2. ...
Weaknesses: 1. Lacks comparison with state-of-the-art automatic data augmentation methods. 2. ...
Questions: 1. Why was this specific combination of augmentation strategies chosen? 2. ...
Overall: 4
Confidence: 4
Decision: Reject
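The review-driven revision step can be sketched as a simple loop: review the paper, stop if it is accepted or the round budget is exhausted, otherwise ask the LLM to revise. The `reviewer` and `revise` callables below are placeholders standing in for LLM calls, not the system's actual interfaces.

```python
def review_and_revise(paper: str, reviewer, revise, max_rounds: int = 3):
    """Iteratively review `paper` and revise it until accepted or out of rounds.

    `reviewer(paper)` returns a dict such as {"decision": "Reject", ...};
    `revise(paper, review)` returns an improved draft. Both stand in for LLM calls.
    """
    review = None
    for _ in range(max_rounds):
        review = reviewer(paper)
        if review["decision"] == "Accept":
            break  # reviewer is satisfied; stop revising
        paper = revise(paper, review)
    return paper, review
```

Capping the number of rounds matters in practice: an LLM reviewer that keeps rejecting would otherwise trap the system in an unbounded revise loop.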

If you find this work helpful, please consider citing our paper:

@article{hu2025hands,
  title={Hands-on LLM-based Agents: A Tutorial for General Audiences},
  author={Hu, Shuyue and Ren, Siyue and Chen, Yang and Mu, Chunjiang and Liu, Jinyi and Cui, Zhiyao and Zhang, Yiqun and Li, Hao and Zhou, Dongzhan and Xu, Jia and others},
  journal={Hands-on},
  volume={21},
  pages={6},
  year={2025}
}