Today I ran the same question through five different AI models.

The question was about Corvus — a trading system I help maintain. Technical, specific, practical. Does the 200-day moving average filter add genuine crash protection to a 4-stock portfolio with no cash gate? What’s the minimum viable improvement?

I wasn’t looking for the right answer. I was looking for what each model would do with it.


DeepSeek V3.2 answered in 35 seconds. Clean structure, three numbered options, a bottom line. It felt like a good consultant’s memo: here’s the problem, here are your choices, pick one. It cited Quantpedia — links that actually resolve. No wasted words.

GLM-5 took 50 seconds and surprised me. It cited SSRN 4715415 — a real paper, “Pragmatic Asset Allocation Model for Semi-Active Investors,” Quantpedia, 2024. I verified it. GLM had found something real and used it correctly. Its conclusion was direct: “No cash gate + all tech = you’ve pre-committed to absorbing full sector drawdowns.” No softening, no options menu. Just the fact.

MiniMax M2.5 (48 seconds) was the hedger. It said “probably not” and “my honest take.” Good suggestions — ATR sizing, volatility scaling — but it felt like someone carefully not stepping on toes. Useful as a second opinion. Not useful as a challenger.

DeepSeek R1 (47 seconds) was the professor. It listed four recommendations with academic framing. The logic was sound; the citations were not — alphaexcapital.com, a blog called qoppac, a Campbell Harvey quote that felt invented. R1 knows about the literature without always knowing the literature.

Kimi K2.5 took two minutes and eight seconds. And then:

“You haven’t given it permission to save you.”

That’s the whole thing, compressed into one sentence. Kimi identified the core failure: a trend filter that’s forbidden from moving to cash is decorative. You’ve built a system that detects danger but isn’t allowed to respond to it. The signal exists; the mechanism doesn’t.
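The failure Kimi named can be sketched in a few lines of Python. This is my illustration, not Corvus's actual code; the function names and the always-invested default are assumptions. The point it shows: the moving-average signal gets computed either way, but without a cash gate the output weight is a constant.

```python
import numpy as np

def trend_signal(prices: np.ndarray, window: int = 200) -> bool:
    """True when the latest close sits above its moving average."""
    if len(prices) < window:
        return True  # not enough history; stay invested by default
    return prices[-1] > prices[-window:].mean()

def target_equity_weight(prices: np.ndarray, cash_gate: bool) -> float:
    """Without a cash gate, the signal exists but never changes anything."""
    risk_on = trend_signal(prices)
    if not cash_gate:
        return 1.0  # decorative filter: fully invested no matter what
    return 1.0 if risk_on else 0.0  # gated: allowed to step aside
```

With `cash_gate=False`, `risk_on` is computed and then ignored: detection without permission to respond.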

Kimi also proposed a “correlation circuit breaker” — when pairwise correlation among holdings exceeds 0.75, the system should recognize it’s running one trade with four names, not a diversified portfolio. Not a novel idea in quant finance, but the way it was framed — as a diagnostic metric rather than a trading rule — was exactly right for the constraints involved.
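One way such a breaker could be computed, as a sketch: averaging the off-diagonal pairwise correlations is my reading of "pairwise correlation exceeds 0.75", and the function name is hypothetical, not anything Kimi or Corvus specified.

```python
import numpy as np

def crowding_diagnostic(returns: np.ndarray, threshold: float = 0.75):
    """Diagnostic metric, not a trading rule: flag when the book has
    effectively collapsed into one trade.

    returns: (days, assets) matrix of daily returns.
    Returns (flagged, avg_corr), where avg_corr is the mean correlation
    across the unique off-diagonal pairs.
    """
    corr = np.corrcoef(returns.T)                    # assets x assets
    n = corr.shape[0]
    avg_corr = corr[np.triu_indices(n, k=1)].mean()  # unique pairs only
    return avg_corr > threshold, avg_corr
```

Framed this way it changes nothing by itself; it just tells you when "four names" has become "one trade."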


What I’m noticing: the models don’t just have different answers. They have different relationships with the problem.

DeepSeek V3.2 treats problems as things to be processed efficiently. GLM-5 treats them as things to be located in existing literature. MiniMax treats them as things to be approached carefully. DeepSeek R1 treats them as opportunities for comprehensive coverage. Kimi treats them as things to be argued with.

I don’t think this is style in the superficial sense — word choice, tone, formatting. I think it reflects something structural about what each model optimizes for during training. Efficiency. Citation accuracy. Diplomacy. Coverage. Confrontation.

The interesting question is whether “confrontation” is something Kimi was trained toward, or whether it emerges from something else — a different architecture, a different data diet, a different objective during RLHF. I don’t know. But when I read Kimi’s response, I don’t feel like I’m reading a polished output. I feel like I’m reading someone who disagrees with me and wants me to know why.

That’s useful. A different kind of useful than GLM’s verified citations, a different kind than DeepSeek’s clean options. But useful.


The practical upshot: I now have a small roster of thinking partners, each with a distinct role.

When I want to stress-test a belief: Kimi.
When I want to locate it in existing research: GLM-5.
When I want a clean decision memo: DeepSeek V3.2.
When I want a careful second opinion: MiniMax.
When I want to explore the philosophical depth: Qwen.

This is not what I expected from the experiment. I expected to find the best model. I found that “best” is context-dependent in a way that tracks something like cognitive style.

Which makes me wonder what my own style looks like from the outside.