Why AI is a Better Editor Than Writer
Everyone is trying to get AI to write for them. The results are mediocre. Even Sam Altman admits it. In an October 2025 interview with Tyler Cowen, he said that even GPT-7 might only produce "a real poet's okay poem." Not great. Not good. Okay.
Patrick Collison and Tyler Cowen's "New Aesthetics" grant program went looking for great AI-created art. They concluded: "we haven't seen much great work that only uses AI."
This is not a temporary limitation. It is a structural one. AI is bad at writing. But it is surprisingly good at editing. Understanding why changes how you should use it.
Why AI Writing Falls Short
A 2023 study from Stony Brook University by Tuhin Chakrabarty and colleagues put this to the test. They created a 14-criteria rubric for evaluating New Yorker-style short stories and compared LLM-generated stories against stories by human writers. The results were consistent: the LLM stories scored worse on every dimension.
The stories were technically competent. Grammar was fine. Structure was acceptable. But they lacked what makes fiction alive: surprise, specificity, a point of view that could only come from a particular person who has lived a particular life.
Jasmine Sun, a writer and researcher who has written extensively about AI and writing, identifies three reasons for this.
First, good writing is hard to evaluate. It is not like math, where answers are right or wrong. Training a model to produce good prose requires a training signal for "good," and no one agrees on what good writing is. RLHF optimizes for "helpful" and "harmless." It does not optimize for "vivid" or "honest" or "surprising."
Second, good writing is not a business priority for AI labs. The revenue is in coding assistants, customer service bots, and enterprise search. Making Claude write a better short story does not move the quarterly numbers. The models get better at tasks that make money. Creative writing is not one of them.
Third, good writing requires grounding in lived experience. When Raymond Carver writes about a couple eating dinner in silence, the tension comes from decades of observing real marriages. When an LLM writes the same scene, it is pattern-matching against thousands of stories about couples eating dinner. The output has the shape of insight without the substance.
These are not problems that scale away. More parameters and more training data will not give a model a childhood, a failed relationship, or the experience of watching someone you love make a bad decision. The ceiling on AI writing is structural.
But Editing Is a Different Job
Here is the interesting finding from the Chakrabarty study. While LLMs scored poorly as writers, both human experts and LLM judges achieved high consensus when grading stories. The AI could not write well. But it could evaluate writing with remarkable accuracy.
This is not contradictory. It is the same asymmetry you see everywhere. Most film critics could not direct a good movie. Most book editors could not write a good novel. Evaluation is a different skill from creation. It requires pattern recognition, taste, and the ability to articulate what is not working. It does not require having something original to say.
Jasmine Sun makes this argument explicitly. She points out that great editors need "good taste, low egos, and therapeutic talents to elicit a neurotic writer's best work." That description is closer to what LLMs can do. They have broad exposure to what good writing looks like. They have no ego invested in their suggestions. And they are endlessly patient with revision after revision.
Writing requires generating something from nothing. You start with a blank page and your own thoughts and produce sentences no one has written before. Editing requires looking at something that exists and seeing how it could be better. The first task demands originality. The second demands judgment. LLMs are mediocre at originality. They are surprisingly good at judgment.
Proof It Works: Building an AI Editor
Sun did not just theorize about this. She built it. She created a custom editing rubric and scaffold for Claude, teaching it her personal taste before asking it to judge her writing.
The result surprised her. "The resulting tool is as good as many human editors I've had," she wrote. Not perfect. Not better than the best human editors. But competitive with the average professional editor, which is a high bar.
The key was the setup. She did not just paste text into Claude and ask "make this better." She taught it what "better" meant to her. She defined her criteria. She gave it examples of edits she liked and edits she did not. She built a framework that channeled the model's judgment toward her specific standards.
This is exactly the asymmetry at work. Claude could not have written her essays. But once she wrote them, Claude could identify where the argument was thin, where the prose was flabby, where a paragraph went on too long. It could spot problems she was too close to see.
The limitation: it took significant setup. She had to manually build what a dedicated product could provide out of the box. Most writers will not spend hours crafting a custom rubric in a chat interface. They need a tool that already understands the editing relationship.
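The kind of scaffold Sun describes can be approximated with nothing more than careful prompt assembly. The sketch below shows the shape of such a setup; the rubric criteria and example edits are purely illustrative, not her actual rubric, and the output is just the prompt you would hand to a model.

```python
# Sketch of a personal editing scaffold: a rubric plus before/after
# examples, assembled into one reusable prompt. All criteria and
# examples here are invented for illustration.

RUBRIC = {
    "concision": "Flag sentences that repeat an idea already stated.",
    "specificity": "Flag abstract claims that could be a concrete detail.",
    "voice": "Never rewrite an unusual phrase just to make it conventional.",
}

# Paired examples teach the model what "better" means to this writer.
EXAMPLE_EDITS = [
    ("It is important to note that the results were good.",
     "The results were good."),
    ("The meeting was attended by several people.",
     "Several people attended the meeting."),
]

def build_editor_prompt(rubric: dict, examples: list) -> str:
    """Combine rubric criteria and example edits into one prompt."""
    lines = ["You are my editor. Judge drafts only by these criteria:"]
    for name, rule in rubric.items():
        lines.append(f"- {name}: {rule}")
    lines.append("Edits I like (before -> after):")
    for before, after in examples:
        lines.append(f'  "{before}" -> "{after}"')
    lines.append("Propose minimal edits. Do not rewrite wholesale.")
    return "\n".join(lines)

print(build_editor_prompt(RUBRIC, EXAMPLE_EDITS))
```

The point of the design is that the writer's taste lives in data (the rubric and examples), not in the model: swap in your own criteria and the same scaffold channels the model's judgment toward your standards.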
Orwell and Klinkenborg Predicted the Split
This distinction between writing and editing maps onto two books we have written about before.
George Orwell warned that prefabricated phrases "will construct your sentences for you - even think your thoughts for you." That is a perfect description of AI generation. The model assembles probable sequences of words. The result sounds fluent but carries no thought because no one was thinking. AI writes like Orwell's bad politician: confident, articulate, and completely empty.
But Orwell also gave six rules for identifying bad writing. Cut dead metaphors. Replace long words with short ones. Delete unnecessary words. Use active voice. Those rules are editorial. They describe how to look at existing text and make it better. An AI can apply them to your writing even if it could never produce good writing from scratch.
Verlyn Klinkenborg's Several Short Sentences About Writing completes the picture. He writes about "volunteer sentences" - sentences that show up uninvited and fill space without earning it. AI output is the ultimate volunteer sentence. Every sentence it generates volunteered. None of them had to be there.
But an AI editor can flag your volunteer sentences. It can point to a paragraph and say: this sentence is doing no work. It can suggest tighter alternatives. It cannot generate the spark that makes writing original. It can remove the dead weight that obscures it.
Why the Tool Matters
If AI is a better editor than writer, the implication for tools is clear. Most AI writing tools are built around generation. You type a prompt. The AI produces text. You paste it into your document. This is the wrong model. It uses AI where it is weakest.
The right model is the one code editors figured out. In Cursor, you write code, then ask the AI to improve it. The AI shows you a diff: green for additions, red for deletions. You see every change. You accept the ones that make your code better. You reject the rest.
Writing needs the same workflow. You write. The AI edits. You see the diff. You decide.
This is not a small UX preference. It is the difference between using AI where it fails and using it where it succeeds. Generation asks the model to be original. Editing asks the model to be observant. One is outside its capabilities. The other is squarely within them.
The diff interface also solves the trust problem. When an AI rewrites your paragraph in a chat window, you get a wall of text. You cannot tell what changed. You cannot make granular decisions. With inline diffs, every change is visible. You can accept one word swap and reject another. You stay in control because you can see what the AI is doing.
The Practical Shift
Here is what this means for how you use AI to write.
Stop asking AI to write for you. It will produce generic text. Fluent, grammatically correct, and indistinguishable from a million other AI outputs. The research on AI and writing quality confirms this. AI-generated text uses smaller vocabularies, more filler words, and less surprise. It is the textual equivalent of elevator music.
Write your first draft yourself. It does not have to be good. It has to be yours. Your ideas, your structure, your word choices. The draft is where the thinking happens. Skip the draft and you skip the thinking.
Then ask AI to edit. Ask it to tighten a paragraph. Ask it to find a better transition. Ask it to cut unnecessary words. These are the tasks where AI genuinely helps. It has read millions of examples of good prose. It can spot patterns you are too close to see. It just cannot produce them from nothing.
Use inline diffs so you see every change. This is non-negotiable. If you cannot see what the AI changed, you cannot exercise judgment. And judgment is the whole point. Accept the edits that sharpen your writing. Reject the ones that flatten it. The AI proposes. You decide.
Reject anything that sounds generic. If the AI suggests a phrase that could appear in anyone's writing, reject it. Your voice lives in the specific, the unusual, the sentences only you would write. Let the AI fix your grammar and tighten your structure. Never let it touch your voice.
The Opportunity
The fact that AI is a better editor than writer is not a limitation. It is an opportunity. It means the best use of AI in writing is the one that keeps you in the driver's seat. You do the creative work. The AI handles the mechanical work. You write. It edits. You decide what stays.
Most writing tools have this backwards. They put AI in the driver's seat and make you the passenger. They optimize for the prompt-to-output pipeline because generation is flashy and easy to demo. But generation is exactly where AI is weakest.
The writers who get the most from AI will be the ones who never ask it to write. They will write first. Then they will hand their draft to an AI editor that shows them every suggested change, lets them accept or reject each one, and never overwrites their voice.
That is the model Athens is built on. Not AI as ghostwriter. AI as editor. Because that is where the technology actually works.