

Quick Thoughts on AI

Short, unstructured notes on the current state of AI: what is real, what is hype, what to watch, and what to ignore.

A collection of short, unstructured thoughts on AI, as of mid-2026. Not a coherent essay. Just things I am thinking about, written down so I can come back to them in a year and see what I got right or wrong.

On capability

  • The pace of model improvement has slowed. The jump from GPT-4 to GPT-5 was meaningful but incremental, not a paradigm shift. The jump from GPT-3 to GPT-4 was a paradigm shift.
  • Reasoning models (o1, o3, Claude with extended thinking) have carved out a real niche. They are not better at everything, but they are dramatically better at math, code, and multi-step planning. For those tasks, the latency cost is worth it.
  • Multimodal is table stakes now. Every frontier model handles text, images, audio, and video. The interesting question is what to do with it, not whether it works.
  • Agents work, in narrow domains. The “general-purpose agent that can use a computer” demo is still a demo. Real agents are scoped: a coding agent that can read a repo, a research agent that can search and synthesise, a customer-service agent that can look up orders.

On the economy

  • The cost of inference is collapsing. Token prices have fallen by roughly 10x per year for two years. At this rate, what costs $1 today will cost $0.00001 in five years. This changes the economics of every product built on top of these models.
  • The training cost is exploding. The leading labs are spending $1–10 billion per training run. Only a handful of companies can afford to play at this level. The rest are fine-tuning existing models, which is good enough for most applications.
  • The “AI replaces white-collar work” narrative is half right. AI is changing the nature of white-collar work, not eliminating it. The lawyer still reviews the contract; the contract is just better drafted. The analyst still makes the call; the analysis is just faster.
  • The “AI does nothing useful” narrative is wrong. Every developer I know uses AI tools daily. Most writers, researchers, and designers I know do too. The productivity gains are real, even if they are hard to measure.
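To make the compounding in the inference-cost point concrete: a tenfold annual drop multiplies out to a factor of 100,000 over five years. Below is a minimal sketch, assuming the drop stays at a constant factor every year, which real price curves will not do exactly. The function name `projected_price` and its `annual_drop` parameter are hypothetical, chosen only for illustration.

```python
# A minimal sketch, assuming prices fall by a constant factor each year
# (the note's "roughly 10x per year"). Real price curves are lumpier;
# the point here is only how quickly the factor compounds.

def projected_price(price_today: float, annual_drop: float, years: int) -> float:
    """Price after `years` if it falls by a factor of `annual_drop` every year."""
    return price_today / (annual_drop ** years)


if __name__ == "__main__":
    # Print the projected price of $1 of tokens for years 0 through 5.
    for years in range(6):
        print(f"year {years}: ${projected_price(1.0, 10.0, years):.5f}")
```

Year 4 comes out at $0.00010 and year 5 at $0.00001. Changing `annual_drop` to 3.0 or 5.0 shows how sensitive the five-year figure is to the assumed rate.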

On open source

  • Open-weights models have closed most of the gap with closed models. Llama, Mistral, Qwen, and DeepSeek are all within 10–20% of frontier performance for most tasks. Adjusted for cost, the gap is essentially closed.
  • The fine-tuning ecosystem has matured. LoRA, QLoRA, and other parameter-efficient fine-tuning methods are accessible to anyone with a GPU and a few hours. The cost of customisation has collapsed.
  • The inference ecosystem has not matured. Running these models in production is still hard. vLLM, TGI, and other engines work, but they require expertise. The “AI cloud” market is real and growing.
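Why parameter-efficient fine-tuning is so cheap comes down to arithmetic. In the LoRA formulation, a frozen d × k weight matrix W gets a learned low-rank update B·A, with B of shape d × r and A of shape r × k, so only r(d + k) parameters are trained instead of d·k. The sketch below counts both and merges a toy update. The 4096 × 4096 shape, rank 8, and the helper names are illustrative assumptions, not figures from any particular model. It also covers only the low-rank part; QLoRA additionally quantises the frozen base weights, which this sketch does not model.

```python
# A minimal sketch of the LoRA idea: freeze W (d x k) and train a low-rank
# update B @ A, where B is d x r and A is r x k. The shapes used below are
# illustrative assumptions, not taken from a specific model.

def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when every entry of W is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for the low-rank factors B (d x r) and A (r x k)."""
    return r * (d + k)


def matmul(x, y):
    """Plain-Python matrix product, to keep the sketch dependency-free."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(W, B, A, alpha: float, r: int):
    """Return W + (alpha / r) * (B @ A), the merged weight used at inference."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * dv for w, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


if __name__ == "__main__":
    d, k, r = 4096, 4096, 8
    full = full_finetune_params(d, k)
    lora = lora_params(d, k, r)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")

    # Toy merge: 2x2 frozen identity weight plus a rank-1 update.
    W = [[1.0, 0.0], [0.0, 1.0]]
    B = [[1.0], [2.0]]   # d x r = 2 x 1
    A = [[0.5, 0.5]]     # r x k = 1 x 2
    print(merge_lora(W, B, A, alpha=1.0, r=1))
```

For a single 4096 × 4096 matrix at rank 8, that is 65,536 trainable parameters instead of 16,777,216, about 0.39%. The toy merge prints `[[1.5, 0.5], [1.0, 2.0]]`.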

On safety

  • The safety conversation has shifted. The early-2020s framing was “AI will kill us all.” The mid-2020s framing is “AI is being misused for fraud, harassment, and disinformation.” The first framing is not entirely wrong, but the second is more actionable.
  • Misuse is the dominant near-term risk. Voice cloning for scams. Deepfakes for non-consensual imagery. Disinformation at scale. Election interference. All of these are happening now, and the defences are weak.
  • Long-term risk is still a thing. The alignment problem is unsolved. The interpretability problem is unsolved. The governance problem is unsolved. The probability of a catastrophic outcome from advanced AI is low, but not zero, and the cost of being wrong is high.
  • The labs have responded to the safety concerns with more paperwork and more internal review. Whether this is genuine progress or compliance theatre is a matter of opinion. I lean towards “some of each.”

On products

  • The “AI wrapper” SaaS companies are struggling. The moat is thin, the differentiation is thin, and the foundation model can absorb any feature overnight. The companies that are doing well are the ones with proprietary data, deep workflow integration, or both.
  • Coding tools are the clearest success story. Cursor, Copilot, and similar tools have changed how professional developers work. The productivity gains are real, measurable, and durable.
  • Customer service is the next big category. Voice agents that sound human, can look up accounts, and can resolve common issues are rolling out at scale. The quality is uneven, but improving.
  • Education is a wild card. AI tutors are real, they are cheap, and they are accessible to anyone with a phone. The impact on educational outcomes is not yet clear, but the potential is enormous.

On what to watch

  • The next generation of multimodal models. Specifically, can they reason about video in a useful way? “Watch this 30-minute lecture and answer these questions” is a capability that does not exist yet.
  • The next generation of agents. Specifically, can they operate a computer end-to-end? “Log in to this SaaS, find this data, generate this report” is a capability that is partially there and will get better.
  • The next generation of hardware. Specifically, what does Nvidia look like in five years? The custom silicon from Google, Amazon, Microsoft, and Meta is catching up. The dependence on a single vendor is a real risk.
  • The next generation of policy. Specifically, what does the EU AI Act actually do in practice? What does the US executive order actually do? What does India’s approach end up being? The policy landscape is unsettled, and the next two years will matter.

On what to ignore

  • AGI timelines. Nobody knows. Anyone who claims to know is selling something.
  • Sentience debates. The models are not conscious. The question of whether they could be is interesting but not actionable.
  • AI vs. human creativity debates. The piano did not replace the composer. The camera did not replace the painter. The right framing is “what new things can we make?”
  • Twitter threads claiming that a single demo changes everything. The interesting questions are about deployment, distribution, and durability. A demo is the start of the conversation, not the end.

A closing note

I am not an AI researcher. I am a software engineer who uses these tools daily, reads the papers when I can, and is trying to figure out which parts of the hype are real.

My current best guess: AI is a general-purpose technology, on a par with electricity or the internet. It will take 10–20 years to fully diffuse, and the impact will be uneven across industries and countries. The companies and individuals that thrive will be the ones that adopt early, build real workflows, and learn to ignore the noise.

The most useful thing I can say is: use the tools. Build with them. The future is being written now, and the best way to predict it is to participate.