In the last post, I wrote about how I use Claude Code as an orchestrator that dispatches tasks to whichever model is best suited for the job. One thing I did not really explain is what happens when you let them talk to each other.
Most of the debate online assumes you are picking a winner: "Claude is better at X", "GPT is better at Y". It is the obvious frame, but there is a more useful one: you do not have to pick. You can put all of them to work on the same problem and have them hand the result back to you. The important individual here is you, not any single agent.
The three specialists
I am not going to pretend each model has some deep identity, but they do behave differently enough that I reach for them for different reasons.
- Claude is the one I talk to. It runs the conversation, keeps track of the task, writes most of the drafts, and decides when to call the others. Its taste in architecture is the best of the three.
- Codex (GPT-5.4) is the one I trust to be brutal. At high reasoning effort it will catch things Claude missed, including Claude's own bad habits: sermon rhythm, manufactured punchlines, synthetic authority. It is slow and expensive, so I do not use it for everything.
- Gemini is the one with the better web search tool. Claude Code's /chrome skill is amazing, but Gemini belongs to Google, and Google... is Google. So when I need to verify something, I send it to Gemini. I also find Gemini's writing style more natural at times, and it has better taste when designing graphs to include in a paper.
None of this is a ranking. The point is not to pick a favourite; why would you, when there is a very high probability the landscape will look different in, say, two months?
A real example: this blog post
The first post in this series is a decent case study because I did not write it alone. Here is roughly what happened.
I asked Claude to draft it. The first draft was bad in a very specific way: it sounded like a LinkedIn thought-leadership post. So I sent it to Codex with the prompt "why does this read like AI slop?". Codex came back with a brutal list of problems, quoting the worst lines verbatim. Sermon rhythm, metaphor soup, strawman arguments, manufactured punchlines. I took most of them out.
Then Claude flagged something I had written as possibly untrue: that Cursor had released their own CLI. I was completely sure it was true (I use it), but Claude did not have that information and, without internet access, it could only fall back on its training data. So I sent the claim to Gemini to verify it and give me an actual source.
At another point the post included a claim about the VS Code extension sending extra context that the terminal does not. Gemini flagged in a review that the claim was not supported by any source I had cited. So Claude fetched the actual Anthropic documentation, verified the claim, and added a link to it.
Not every disagreement was useful. At one point Gemini confidently told me that Claude Code does not even have a skills architecture and that I was describing a different tool. It was flat out wrong, but the disagreement still forced me to re-read the docs and double-check that I was describing my own setup correctly. Even bad reviewers sometimes make you sharpen the argument.
That is one pass. In practice a post like this one goes around the loop two or three times before it lands.
When they disagree
The point of the panel is not consensus. If the three models always agreed, having three would be redundant. Most of the value comes from disagreement, which is a signal about where the problem actually is.
Sometimes Gemini is confidently wrong, as in the skills-architecture episode above. Codex was more accurate but ran out of time before finishing its review because it kept searching primary sources for every claim. Neither of them is reliably correct on its own. Running them together gives me a rough sense of what is contested and what is not.
The rule I use is simple: if Codex and Claude agree on something, I (almost always) move on. If they disagree, that is where the interesting problem is, and I pay (more) attention. Gemini sometimes breaks the tie by bringing a source neither of them had.
What this is not
Voting. Do I care about their opinions? Yes and no. I am not averaging three responses and taking the majority. The final take is mine. The three models do not get equal weight, I do not always ask them the same question at the same time, and the weight I assign to each one shifts depending on the task.
In most cases, the disagreement is the signal. When three models (or more, but that is for a later post) work on the same problem from different angles, the odds of framing the problem and its solutions correctly go up substantially.
Now, how does this work in practice?
If you want to try it yourself, it is not complicated, and I truly encourage you to do so. Let me briefly go over what you need: three things to install and one file to write.
The CLIs. Install Claude Code, Codex CLI, and Gemini CLI. All three ship one-line installers for macOS and Linux (I am sorry, Windows users, life is harder for you. But it is your fault: you chose that OS). Check each project's documentation for the current command since they change, but the pattern is the same: a small binary that runs in your terminal and talks to the model.
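For reference, the npm route looked roughly like this when I set mine up. Treat the package names as a snapshot, not gospel: they change, and each project also ships its own curl-style installer, so check the current docs before running any of this.

```shell
# Hedged sketch: package names as of writing, verify against each project's docs.
npm install -g @anthropic-ai/claude-code   # Claude Code
npm install -g @openai/codex               # Codex CLI
npm install -g @google/gemini-cli          # Gemini CLI
```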
Authentication. Each one gives you two options: log in with your existing subscription (Claude Max, ChatGPT Pro, Google AI Pro), or paste an API key. If you already pay for the subscription, use it. The API keys are charged at the raw per-token rate, which adds up fast for agent workflows that do a lot of reasoning. My /codex skill would cost me several dollars a day if I used the API key, and nothing extra on top of my ChatGPT subscription.
The skills. A skill in Claude Code is a folder in ~/.claude/skills/ with a SKILL.md file inside. The markdown tells Claude Code what command to run and how to pass the arguments. My /gemini skill is a short file that says, in plain English, "when the user types /gemini followed by a question, run the Gemini CLI with that question and return the answer". Forty lines, most of them comments.
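To make that concrete, here is a minimal sketch of what such a file can look like. The frontmatter fields match the documented SKILL.md shape, but the name, description, and instructions are illustrative, not copied from my real skill, and the `-p` flag is the Gemini CLI's non-interactive prompt mode as I understand it from its docs.

```markdown
---
name: gemini
description: Ask Gemini a question via the Gemini CLI and return its answer
---

When the user types /gemini followed by a question:

1. Run the Gemini CLI non-interactively, passing the question as the
   prompt, e.g. `gemini -p "<the user's question>"`.
2. Return Gemini's answer verbatim, noting that it came from Gemini.
```

Drop that folder into ~/.claude/skills/gemini/ and Claude Code picks it up; everything after the frontmatter is just plain-English instructions it follows when the skill is invoked.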
Once all three are installed and logged in, the skills are the easy part. I do not even write them by hand. I point Claude Code at the official Codex and Gemini CLI documentation, ask it to read the available commands and flags, and have it write the skill for me based on the functionality I actually want to expose. That way the skill knows the real commands, the right model identifiers, and the flags that matter, rather than me guessing. When the CLIs update, I do the same thing again and Claude rewrites the skill.