
Intelligence Is All You Need

By Chad Atlas


The pace of progress in foundational AI is nothing short of explosive, and legal technology is feeling the impact in real time.  In just the past month or so, Google’s latest Gemini models jumped to the top of intelligence leaderboards for the first time.  OpenAI released o3 and then o3-pro, its most advanced reasoning models yet, to a broader range of users.  And Anthropic upgraded Claude from version 3.7 to 4.0.  Advances in core AI capabilities are directly reshaping what is possible in legal tech.  The speed of progress is now measured in weeks, not months.



This acceleration is not just hype.  Recently, prominent legal tech startup Harvey announced it would integrate multiple foundation models from Anthropic and Google, explaining that “general foundation models have improved at baseline legal reasoning” so dramatically that optimization can now focus on task execution rather than baseline reasoning.  Their pivot reflects a broader industry reality:  the intelligence itself has become so capable that traditional engineering approaches are being rendered obsolete, their work now handled by the models themselves.


For a sense of what this means, I recently tested one of these models on a complex antitrust law exam shared by a professor friend curious about the tech’s capabilities.  The model, as graded by the professor, earned at least a B+, possibly an A-, on a question that would challenge all but the most capable law students.  Three prompts, less than 15 minutes.  (My friend is not easily impressed; in this case, he was.)[1]


This validates something I’ve theorized for several years as a CLO and startup advisor (and written about before):  the most significant advances in legal AI come from improvements in the underlying models, not from specialized wrappers or specific legal adaptations.  Raw intelligence is what matters—and we are now seeing that play out in real time.


Yet most lawyers and legal leaders evaluating tech investments have not realized how quickly this shift is happening.  Many are still focused on products, feature lists, and workflow demos, rather than the real driver:  the intelligence powering it all.


Understanding the Architecture

So, what actually powers today’s legal AI?  Virtually every tool you see can be modeled as a simple three-layer architecture:


Layer 1: Intelligence.  The foundation model (ChatGPT, Claude, Gemini) that does the actual reasoning, analysis, and text generation.


Layer 2: Engineering.  The plumbing, piping, and orchestration that make the AI useful for specific legal tasks: retrieval systems that fetch documents, laws, and regulations; prompts that guide the AI’s behavior; workflows that chain together multiple steps to execute actual work; and various connections to legal databases or internal document systems.


Layer 3: Application. The user interface: a web app, a Word plugin, or whatever makes the system accessible to the end user.


That’s it.  Intelligence at the bottom, plumbing in the middle, interface on top.
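
To make the layers concrete, here is a minimal sketch of how a typical legal AI tool is wired together.  It is illustrative only: call_model and search_documents are hypothetical placeholders for a foundation model API and a retrieval step, not any vendor’s actual interface.

```python
# A minimal sketch of the three-layer architecture described above.
# Illustrative only: call_model and search_documents are hypothetical placeholders.

# Layer 1: Intelligence -- stand-in for a foundation model (GPT, Claude, Gemini).
# A real system would call the provider's API here; this stub just lets the sketch run.
def call_model(prompt: str) -> str:
    return f"[model answer, reasoning over a {len(prompt)}-character prompt]"

# Layer 2: Engineering -- the plumbing: retrieval plus prompt assembly.
def search_documents(query: str) -> list[str]:
    # A real retrieval system would search contracts, case law, or internal documents.
    return ["<relevant excerpt 1>", "<relevant excerpt 2>"]

def answer_legal_question(question: str) -> str:
    context = "\n\n".join(search_documents(question))
    prompt = (
        "You are assisting a lawyer. Using only the context below, answer the "
        "question and note which excerpt supports each point.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_model(prompt)

# Layer 3: Application -- whatever interface puts the answer in front of the user.
if __name__ == "__main__":
    print(answer_legal_question("Does this contract permit assignment without consent?"))
```

The point is the shape, not the code: most legal AI products are some version of this sandwich, and the question running through the rest of this piece is how much of that middle layer still earns its keep as the bottom layer improves.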


The Bitter Lesson:  Why General Intelligence Wins

Here is the crucial part:  that middle layer, the plumbing, has historically existed because the foundational intelligence had clear limits.  The AI was like a smart lawyer without any access to law books or a computer.  You could ask it questions, and it would respond from vague memories and experience, and that was that.  It often made things up or got things wrong.  Still, it was smart, so legal tech companies built elaborate engineering around these models to compensate.  Retrieval systems searched legal databases or documents and brought back snippets for the AI to review; workflow chains tried to mimic what lawyers do and allowed the lonely AI lawyer to execute on projects.


But this engineering was always a stopgap.  The retrieval systems were primitive and often returned irrelevant results, missed context that mattered, or broke down when matters took an unexpected turn.  Every additional question to the AI added cost, so vendors cut corners to keep prices down, minimizing engineering and workflow complexity as much as possible.  The demos looked slick, but the reality was fragile.


This pattern is not unique to law.  AI researcher Rich Sutton’s famous essay “The Bitter Lesson” described this trend precisely:  “The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.”  Sutton observed that “AI researchers have often tried to build knowledge into their agents, this always helps in the short term, and is personally satisfying to the researcher, but in the long run it plateaus and even inhibits further progress, and breakthrough progress eventually arrives by an opposing approach based on scaling computation.”  We have seen this pattern play out in chess, image recognition, language translation, and now in legal tech.


What is happening now is that models like OpenAI’s o3, Anthropic’s Claude 4.0, and Google’s Gemini 2.5 Pro are swallowing the stack—meaning the models themselves are now handling much of the retrieval, workflow, and even some application-level reasoning that used to require hand-coded intervention.  Leading legal AI companies have quietly pivoted to this reality, focusing less on custom engineering and more on harnessing the raw, ever-improving power of the latest models.
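
For a sense of what “swallowing the stack” means mechanically, here is a hedged sketch of the tool-use pattern that frontier models now support natively: instead of hand-building a retrieval pipeline, the application describes a tool to the model and lets the model decide when to call it.  The names below (call_model_with_tools, search_case_law, ToolRequest) are hypothetical placeholders rather than any provider’s real API, and increasingly the providers run this entire loop on their own side, which is exactly why the custom middle layer is shrinking.

```python
from dataclasses import dataclass

# The model either returns a final answer (a string) or asks the application to
# run a named tool.  Real provider APIs express this differently; the loop below
# is just the general shape.
@dataclass
class ToolRequest:
    name: str
    arguments: str

def search_case_law(query: str) -> str:
    # Hypothetical research tool; a real one would query a case-law database.
    return f"<excerpts of cases relevant to: {query}>"

TOOLS = {"search_case_law": search_case_law}

def call_model_with_tools(messages: list[dict]) -> ToolRequest | str:
    # Stub so the sketch runs end to end: ask for one search, then answer.
    if not any(m["role"] == "tool" for m in messages):
        return ToolRequest(name="search_case_law", arguments=messages[-1]["content"])
    return "Draft answer, grounded in the retrieved excerpts."

def run(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        step = call_model_with_tools(messages)
        if isinstance(step, str):                    # the model produced a final answer
            return step
        result = TOOLS[step.name](step.arguments)    # the model asked to run a tool
        messages.append({"role": "tool", "content": result})

print(run("Is a non-compete enforceable against a physician in this state?"))
```

The design question is who owns that loop: when the model provider runs it, the legal tech vendor’s hand-built retrieval and workflow code stops being the differentiator.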


The general approach wins, again.


The DIY Question: When to Build vs. Buy

This creates an interesting dilemma for legal teams:  How much should you rely on pre-built solutions versus working directly with the intelligence yourself?

My father was a perfectionist handyman who once spent many weekends meticulously working on the wood framing around our windows rather than hiring a contractor.  The results were superior, but the time investment was enormous—and one could argue that perfect became the enemy of good.  Different people have different value preferences, different skill levels, and different tolerance for complexity.


The same dynamic applies to legal AI.  If you are a lawyer who can find relevant cases and statutes, extract key facts, and clearly structure problems, you are already better at “wrapping” the intelligence than most hard-coded software solutions.  Working directly with advanced models (prompting, iterating, and fact-checking) often delivers results faster, more transparently, and at lower cost than pricey legal tech platforms.


But not everyone wants to be their own handyman.  There is legitimate value in platforms that handle infrastructure, route tasks to the best models, and provide peace of mind for privacy or compliance.  For a busy legal team, a trustworthy solution can mean less IT overhead, better integration, and more time focused on what matters.


The key is understanding what you are paying for.


Why Lawyering Is Still Different

Law remains different from other domains where AI has achieved breakthrough performance. Lawyers are not just looking for productivity—they need transparency, control, and the ability to interrogate every step.  Most lawyers I know, when handed a template, playbook, or automated workflow, want to see exactly what it is, how it works, and adapt it to their context.  Accepting someone else’s black box—especially in high-stakes matters—rarely feels right.  (Indeed, we have an ethical duty to ensure we reasonably understand the issues and applicable law.)


This is why working directly with the most capable models may be optimal for many legal workflows.  You retain control, can synthesize the best inputs, and understand the reasoning behind outputs.  You can see the model’s work, challenge its conclusions, and iterate until you are satisfied.  The interaction becomes a kind of productive intellectual dialogue rather than a passive consumption of pre-packaged results.


If you do adopt a legal tech platform or workflow solution, make sure you know what is inside the box:  whose judgment are you trusting?  Whose templates, whose playbooks, whose risk tolerances?  The transparency question is not just about understanding the technology—it is about understanding whose legal judgment is baked into the system.


Proof in Practice:  Where Wrappers Matter—and Where They Don’t

Some wrappers and orchestration layers genuinely add value, especially as intelligence gets cheaper and more accessible.  Software engineers have flocked to tools like Cursor, which offers a code editor built around AI.  It isn’t just about the underlying model’s intelligence—Cursor’s workflow, search, and integration features make it easier for users to harness that intelligence effectively.  OpenAI’s reported $3 billion deal to acquire a similar development tool, Windsurf, suggests that intelligence wrappers have real value.  These “application layers” matter when they truly enable new kinds of productivity and collaboration.


The same principle applies in legal.  For example, Harvey’s shift toward task execution rather than baseline reasoning represents an intelligent adaptation to the new reality.  There is real value in systems that understand legal context, maintain proper citation formats, integrate with existing workflows, and handle the mundane but critical details that practicing lawyers need.  (Bureaucracy solutions in a box; yes, please.)


But the bar for genuine value is getting higher.  Templates, basic prompt libraries, and simple workflow automation have limited long-term differentiation when the underlying intelligence can handle these tasks directly.


The Real Evaluation Question

When evaluating a legal AI solution, ask yourself:  what am I really paying for? Is there genuine value-add I cannot replicate?  Or am I paying for packaging around the same core intelligence I could access directly?


Consider these criteria:


●      Integration complexity:  Does the solution handle genuinely difficult technical integration, or could you achieve similar results with direct access?

●      Legal domain expertise:  Are the prompts, workflows, and guardrails meaningfully better than what you could develop yourself?

●      Transparency and control:  Can you understand and modify the system’s behavior, or are you locked into someone else’s judgment?

●      Cost structure:  Are you paying a reasonable premium for convenience, or multiples above the underlying intelligence cost?


The question isn’t whether wrappers are good or bad—it is whether you understand where the value comes from, what tradeoffs you are making, and how much transparency or control you're willing to give up for convenience.


Looking Forward

Rich Sutton’s “bitter lesson” from AI research is clear:  as intelligence gets stronger and cheaper, custom engineering layers get swallowed up.  But that does not mean all wrappers disappear; it only means the bar for real value is getting higher.  Legal tech companies that recognize this reality, like Harvey pivoting to task execution, are positioning themselves to add genuine value rather than just repackaging commodity intelligence.


For legal teams, this means being more sophisticated about what you are buying.  The most successful legal departments will likely combine direct use of frontier models for complex reasoning with specialized tools for specific workflows where the wrapper genuinely adds value.


Intelligence is all you need, provided you know how to wield it and critically assess the value of everything built on top.


About the Author

Chad Atlas is Chief Legal & Ethics Officer at an AI-first fintech startup and advisor to early- to late-stage biotech companies.  He has over two decades of legal experience spanning federal clerkships, BigLaw litigation, and executive leadership roles at a clinical-stage biotech company. 


His philosophy and computer science background from Duke initially fueled his interest in the intersection of law and emerging technology.  He recently launched No Vehicles in the Park, where he writes about legal AI, professional judgment, and the evolving legal landscape.


[1] Since submitting this article for publication, professors at the University of Maryland Law School released a paper stating that o3 (the same model I used) got three A+s, one A-, two B+s, and a B on exams they tested it on.  Link to the paper here.  
