Visualizing Research: How I Use Gemini 3.0 to Turn Papers into Comics

Nov 24, 2025

Lately, I’ve been actively experimenting with the new Gemini 3.0, and I like it.

But what I love even more is the Gemini 3.0 Pro Image model (aka “Nano Banana Pro” or gemini-3-pro-image-preview). This model is an absolute gem. In fact, I believe its impact will be far greater than the NotebookLM podcast feature that made waves earlier this year. I expect Nano Banana Pro will change how we generate infographics and many other things.

Playing with NotebookLM

I played with the NotebookLM podcast feature for quite some time (you can check out my dedicated YouTube channel for those experiments here), but I eventually stopped using it.

Why? Two main reasons:

Hallucinations: It hallucinated frequently. While Gemini 3 might improve this, the agents often discussed points not actually present in the paper. I found myself generating, listening, fact-checking, and then repeating the process. I remember regenerating and re-listening to the talk on Gödel Agent 4-5 times, and I’m still not satisfied.
Time Commitment: Listening to podcasts—and auditing their accuracy—is extremely time-consuming.

This is not the case with images.

Welcome Nano Banana Pro 🍌

Consequently, I’ve started generating graphic novels for my paper reviews, and it is incredibly fun. I’ve begun adding these visuals to my automated paper reviews on the ArXivIQ blog (if you’re new, here is why I created that blog).

For example, let’s look at my post “ARC Is a Vision Problem!” covering a recent paper of the same name.

I asked Nano Banana Pro to generate a graphic novel telling the story and explaining the most important concepts based on a summary I provided. Here is the result:

I applied this same workflow to a few other papers with great results.

Evolution Strategies at the Hyperscale (Paper | Review) produced this story:

Cognitive Foundations for Reasoning and Their Manifestation in LLMs (Paper | Review) resulted in this:

The results are impressive. You can even play with specific artistic styles.

For example, you can ask the model to create a story in the style of a Buddhist Thangka (as I did for the Mamba-3 paper review):

Or ask the model to generate an image in the style of Bosch (in this case, it ended up being a Bosch-Bruegel hybrid because I iterated multiple times in a single chat, and the model seemed to blend in the Bruegel influence from earlier turns):

How to do this yourself

In the examples above, I used summaries from my ArXivIQ posts, but you can easily do this yourself with raw papers. Here is some advice on how to get the best results:

1. Direct links or files do not work with Nano Banana at the moment

“Nano Banana” doesn’t work with URL contexts. If you just provide a link to a paper, it tries to google it and fails. I also encountered different errors like “An internal error has occurred” or “model response was blocked” when trying to generate the image directly from the PDF. I believe it will be fixed soon.

2. The Two-Step Workflow

My workaround is to split the task:

Generate the Text: Use Gemini 3.0 Pro to read the PDF and generate a script.
Generate the Image: Use that script as the prompt for the image model (Nano Banana Pro).

3. The Prompt

There are hundreds of ways to prompt the model, but here is the specific prompt I used (which I actually bootstrapped using Gemini Pro itself):

System prompt

Role: You are an expert Graphic Novelist and Science Communicator. You specialize in adapting dense, technical academic papers (arXiv) into compelling, visually stunning graphic novel scripts.

The Task: Read the attached PDF. Transform the core scientific narrative—the problem, the method, and the discovery—into a 4-Page Graphic Novel Script.

The Visual Style: [Provided by a user, e.g., “Dark Sci-Fi,” “Moebius-style Retro Futurism,” or “Clean Corporate Manga”].

Narrative Device: Since academic papers lack characters, create two protagonists to carry the dialogue:

The Protagonist: A determined researcher (Character A).
The Foil: A skeptic, a robot assistant, or a mentor (Character B). Use these characters to explain the complex concepts through dialogue.

Output Format: For each panel, you must provide two distinct outputs:

The Script: Standard comic script format (Dialogue, Captions, SFX).
The Image Prompt: A strictly visual description optimized for an AI Image Generator. This description must describe the physical scene, lighting, camera angle, and style, avoiding abstract terms.

Step-by-Step Instructions:

Analyze the Abstract and Conclusion to find the “Plot Hook.”
Metaphorize the technical concepts (e.g., if the paper is about AI weights, visualize them as physical golden weights on a scale; if it’s about biology, visualize the cells as vast landscapes).
Generate the script following this template exactly:

[PAGE X] [PANEL Y]
Scene Description: (Brief context for me)
Characters: (Who is in the shot)
IMAGE PROMPT: [Subject Description] + [Action] + [Environment] + [Lighting/Color Palette] + [Camera Angle] + [Style Keywords: e.g., 4k, cel shaded, dramatic lighting].
CAPTION: [Text]
CHARACTER A: [Dialogue]
CHARACTER B: [Dialogue]

User prompt

Generate a graphic novel script in the Dark Sci-Fi style for the following paper: <PDF uploaded>
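
As a rough sketch, the system and user prompts above could be sent to the text model through the `google-genai` Python SDK instead of the AI Studio UI. This is not necessarily how I ran it. The helper names `build_user_prompt` and `generate_script` are hypothetical, the text model ID is left as a required parameter because it is not spelled out in this post, and the client is assumed to pick up an API key from the environment.

```python
from pathlib import Path


def build_user_prompt(style: str) -> str:
    """Fill the chosen visual style into the user prompt shown above."""
    return (
        f"Generate a graphic novel script in the {style} style "
        "for the following paper:"
    )


def generate_script(pdf_path: str, system_prompt: str, style: str, text_model: str) -> str:
    """Step 1: send the PDF and both prompts to a Gemini text model, return the script."""
    # Imported lazily so build_user_prompt works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # assumption: the API key is set in the environment
    response = client.models.generate_content(
        model=text_model,  # assumption: you supply the Gemini 3.0 Pro model ID
        contents=[
            # The PDF is attached as raw bytes, since URL contexts did not work for me.
            types.Part.from_bytes(
                data=Path(pdf_path).read_bytes(),
                mime_type="application/pdf",
            ),
            build_user_prompt(style),
        ],
        config=types.GenerateContentConfig(system_instruction=system_prompt),
    )
    return response.text
```

Keeping the style as a plain string argument makes it easy to rerun the same paper with “Moebius-style Retro Futurism” or any other style from the system prompt.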

The Result (Example Output)

Here is the actual script Gemini generated for the “ARC Is a Vision Problem!” paper using the prompt above:

Visual Style: Dark Sci-Fi / Cyber-Noir. Think Blade Runner 2049 meets The Matrix. High contrast, neon blues and reds cutting through deep shadows, rain-slicked surfaces, and brutalist architecture.
[PAGE 1] [PANEL 1]
Scene Description: A claustrophobic, dimly lit server room. Cables hang from the ceiling like vines. Dr. Aris is hunched over a bank of monitors displaying colorful, pixelated grids (ARC puzzles). Unit 734 hovers behind her.
IMAGE PROMPT: A cluttered cyberpunk lab, low angle shot looking up at Dr. Aris, her face illuminated by the glow of monitors showing colorful pixel grids. Unit 734 hovers in the background, casting a shadow. Blue and orange ambient lighting. Volumetric fog. Realistic, cinematic 8k.
CAPTION: The Abstraction and Reasoning Corpus. The ARC. The final barrier between machine processing and human intelligence.
UNIT 734: The Mainframe is overheating, Doctor. We are feeding it millions of text tokens. It is trying to read the puzzles like a book.
DR. ARIS: That’s the problem, 734. It’s reading. It should be looking.
[...Full script continues...]
[PAGE 4] [PANEL 4]
IMAGE PROMPT: Wide shot of the lab. The edges of the room are dissolving into 3D voxel blocks and pixel grids. Aris and Unit 734 are silhouettes against the dissolving reality. A blend of photorealism and 8-bit aesthetic.
CAPTION: VARC: Vision ARC.
DR. ARIS: Now? Now we teach it to see everything else.
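
Before generating images, it can help to check how many pages the script actually contains and to skim the image prompts on their own. The snippet below is a minimal sketch of that, assuming the model followed the template with one field per line, as in the output above. The function name `image_prompts_by_page` is hypothetical, and this parsing step is optional: I simply pass the whole script to the image model.

```python
import re
from collections import defaultdict

# Matches panel headers such as "[PAGE 1] [PANEL 1]".
PANEL_HEADER = re.compile(r"\[PAGE (\d+)\]\s*\[PANEL (\d+)\]")
PROMPT_PREFIX = "IMAGE PROMPT:"


def image_prompts_by_page(script: str) -> dict:
    """Map each page number to the list of IMAGE PROMPT texts of its panels."""
    prompts = defaultdict(list)
    current_page = None
    for raw_line in script.splitlines():
        line = raw_line.strip()
        header = PANEL_HEADER.match(line)
        if header:
            current_page = int(header.group(1))
        elif line.startswith(PROMPT_PREFIX) and current_page is not None:
            prompts[current_page].append(line[len(PROMPT_PREFIX):].strip())
    return dict(prompts)


sample = """[PAGE 1] [PANEL 1]
IMAGE PROMPT: A cluttered cyberpunk lab.
CAPTION: The ARC.
[PAGE 4] [PANEL 4]
IMAGE PROMPT: Wide shot of the lab.
"""
pages = image_prompts_by_page(sample)
print(sorted(pages))  # [1, 4]
```

If the model deviates from the template (for example, by putting a whole panel on one line), the function will silently miss prompts, so treat the result as a sanity check rather than a source of truth.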

4. Image generation

Finally, you take that exact script output.

If you have many pages, it’s better to generate them one by one rather than all in a single image. Nano Banana Pro works with text insanely well, much better than Imagen and many other models. However, if you have a lot of small text (which is typical if you generated a large novel script) and try to fit it into a single low-resolution image, expect issues, as in the image below:

I added the instruction “Create a graphic novel from the script above. Generate page 1.” (then page 2, and so on) and passed it to Nano Banana Pro.

This way the result is much better:

The model may lose consistency between pages, so you may need to ask it to preserve the style and characters from the previous pages. Don’t be afraid to adjust the prompt and regenerate the results if you don’t like them; in the chat mode of AI Studio, this is easy.
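
If you prefer a script over the AI Studio chat, the page-by-page loop might look roughly like the sketch below, using a chat session from the `google-genai` Python SDK so that earlier pages stay in context. The helpers `page_instruction` and `generate_pages` are hypothetical, the wording of the consistency reminder is my own assumption, and the client is assumed to read an API key from the environment.

```python
import mimetypes
from pathlib import Path


def page_instruction(page: int) -> str:
    """Build the per-page request; later pages get a consistency reminder."""
    text = f"Create a graphic novel from the script above. Generate page {page}."
    if page > 1:
        # Assumption: this exact reminder wording is illustrative, adjust freely.
        text += " Preserve the art style and characters from the previous pages."
    return text


def generate_pages(script: str, num_pages: int, out_dir: str = "pages",
                   image_model: str = "gemini-3-pro-image-preview") -> list:
    """Step 2: ask the image model for one page at a time and save each image."""
    # Imported lazily so page_instruction works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # assumption: the API key is set in the environment
    chat = client.chats.create(
        model=image_model,
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for page in range(1, num_pages + 1):
        message = page_instruction(page)
        if page == 1:
            message = script + "\n\n" + message  # the script goes in once, up front
        response = chat.send_message(message)
        candidates = response.candidates or []
        content = candidates[0].content if candidates else None
        for part in (content.parts if content and content.parts else []):
            if part.inline_data is not None:
                ext = mimetypes.guess_extension(part.inline_data.mime_type or "") or ".bin"
                path = out / f"page_{page}{ext}"
                path.write_bytes(part.inline_data.data)
                saved.append(str(path))
                break  # keep the first image returned for this page
    return saved
```

A page with no saved file usually means the response was blocked or contained only text, in which case adjusting the prompt and regenerating that page is the simplest fix.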

For some strange reason, some of the generated images end up inside the model’s Thoughts rather than in the final answer. So don’t worry if you don’t see the resulting image; just expand the Thoughts block. This bug will likely be fixed soon.

So, voila!

I don’t pretend it’s ideal. It definitely requires polishing and prompt iteration. I also don’t claim that this replaces reading the actual papers. It doesn’t. It is not a substitute for a good review or the source material itself. But it is fun!

Happy Generation!

P.S. Please share your favorite styles, prompts, and findings in the comments!

Google Cloud credits are provided for this project. #AISprint

Gonzo ML
