
There's no shortage of articles telling you which AI tool is best. Most of them are wrong — not because the information is inaccurate, but because "best" doesn't mean anything without context. Best for what? Best for whom? Best at which point in the process?
Here's what I actually use, and why.
Claude is my model of choice. Full stop.
I've used them all — Claude, GPT, Gemini, Grok. They each have their moments. But if I had to keep just one, it's Claude, and it's not particularly close.
What I've noticed over time is that Claude models think more carefully before responding. There's a quality to the reasoning that feels more considered, less eager-to-please. For complex problems — especially anything involving code, architecture, or nuanced analysis — that matters a lot.
Within the Claude family, each model has a distinct personality and purpose, and I've come to use each one quite deliberately:
Opus is where I go for large-context, big-picture thinking. Complex problems that need to hold a lot of information at once. It's slower and more expensive, but when you need that depth, nothing else compares.
Sonnet is my everyday workhorse — the best all-around model I've used from any provider. Exceptional at coding, sharp on analysis, fast enough to feel fluid. This is the one I reach for most.
Haiku handles general communication, quick tasks, and anything where speed matters more than depth. Surprisingly capable for its weight class.
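That division of labor can be captured in a tiny routing function. This is just a sketch of the habit, not anything official; the task categories and model names here are my own shorthand, not real API identifiers.

```python
def pick_model(task_kind: str) -> str:
    """Route a task to a Claude model by the depth it needs.
    Categories and names are illustrative shorthand."""
    routing = {
        "deep": "opus",       # large-context, big-picture reasoning
        "everyday": "sonnet", # coding, analysis, the default workhorse
        "quick": "haiku",     # speed matters more than depth
    }
    # When in doubt, fall back to the all-rounder.
    return routing.get(task_kind, "sonnet")
```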
The case for using multiple models.
Here's something I've found genuinely useful that I don't hear talked about enough: using different models as second and third opinions.
Not because I don't trust Claude. But because different models have different training, different tendencies, different blind spots. When I'm trying to figure out something genuinely uncertain — whether a technical approach is sound, whether an idea holds up, whether I'm missing something — getting two or three independent takes and looking for convergence is a surprisingly reliable way to calibrate.
If Claude, GPT, and a third model all point in the same direction, I have a lot more confidence than if only one of them does. It's the AI equivalent of checking your work.
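The convergence check itself is simple enough to sketch. Here the model responses are stubbed strings standing in for real API calls (the model names and verdicts are placeholders, not live output); the point is the agreement logic, not the plumbing.

```python
from collections import Counter

def convergence(opinions: dict[str, str]) -> tuple[str, float]:
    """Given one verdict per model, return the majority verdict
    and the fraction of models that agree with it."""
    counts = Counter(opinions.values())
    verdict, votes = counts.most_common(1)[0]
    return verdict, votes / len(opinions)

# Stubbed verdicts standing in for real responses from each provider.
opinions = {
    "claude": "approach is sound",
    "gpt": "approach is sound",
    "third_model": "approach is sound",
}
verdict, agreement = convergence(opinions)
# Full agreement across independent models is the strong signal;
# a 2-out-of-3 split is the cue to dig into the dissent.
```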
OpenAI's GPT models are where I go for that second opinion most often. They're good — genuinely good — and the perspective is different enough to be useful.
Claude Code is in a category by itself.
For development work, Claude Code is the best coding agent I've used, and it keeps getting better as the underlying models improve. It rarely makes mistakes these days in the way that early AI coding tools did — the confident errors, the hallucinated functions, the code that looks right until you run it. The reliability has reached a level where I can trust it with real work.
For anyone building software with AI, this is the tool I'd point to first.
The setup that surprised me most.
Here's the honest answer to the question of what's changed my workflow most recently: it's a server I built myself.
I put together an AI agent setup — a persistent, always-on assistant I've connected to my dashboard, my tools, and my messaging apps. What makes it work isn't just the underlying model (it runs on Claude). It's the way it's configured — specifically, the way I've prompted its personality and approach to match how I actually think and work. My personality type tends toward big-picture, intuitive, fast-moving. Having an assistant that's been shaped to complement that — to be precise where I'm broad, grounded where I'm expansive — turns out to make an enormous practical difference.
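The core of that configuration is nothing exotic: a system prompt that encodes the complementary personality, attached to every request. Here's a minimal sketch of the shape, assuming a hypothetical request payload loosely modeled on chat-style APIs; the prompt wording and model string are placeholders, not my actual setup.

```python
# Hypothetical system prompt shaping the assistant to complement
# a big-picture, intuitive, fast-moving working style.
SYSTEM_PROMPT = """\
You assist someone who thinks big-picture, intuitive, and fast-moving.
Complement that: be precise where they are broad, grounded where they
are expansive, and surface missing details before they become problems.
"""

def build_request(user_message: str, model: str = "claude-model-id") -> dict:
    """Assemble a chat-style request payload with the personality
    prompt attached. Field names vary by provider; these are generic."""
    return {
        "model": model,
        "system": SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": user_message}],
    }
```

Because the prompt rides along with every request, the "fit" persists across the dashboard, the tools, and the messaging apps, rather than living in any one conversation.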
It's the difference between a generic tool and one that actually fits the way you work.
I'll write more about how that setup works in a future post. For now: if you're using AI and it still feels like friction, the problem might not be the model. It might be the fit.
One honest caveat.
The more capable your AI setup, the more important it is to build in appropriate boundaries. Powerful capabilities need guardrails — not because the technology is dangerous by default, but because good tools need clear scope. Know what you want it to do, define what you don't, and build that in from the start. It's the same discipline you'd apply to any system with real access to your work.
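In practice, "define what you don't want it to do" can be as plain as a deny-by-default allowlist over the actions the agent may take. A minimal sketch, with action names that are purely illustrative:

```python
# Explicit scope: actions the agent may take, and actions that are
# ruled out no matter what. Names here are illustrative only.
ALLOWED_ACTIONS = {"read_dashboard", "draft_message", "search_notes"}
DENIED_ACTIONS = {"delete_file", "send_payment"}

def authorize(action: str) -> bool:
    """Deny by default: an action must be explicitly allowed,
    and an explicit denial always wins over an allowance."""
    if action in DENIED_ACTIONS:
        return False
    return action in ALLOWED_ACTIONS
```

Anything not named is refused, which is the point: the boundary is a decision you made up front, not a behavior you hope the model exhibits.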
The goal is a tool that amplifies your judgment. Not one that replaces it.