Lean Harness, Fat Skill: The Real Source of 100x AI Productivity

By: blockbeats|2026/04/13 13:28:17

Original Article Title: Thin Harness, Fat Skills
Original Article Author: Garry Tan
Translation: Peggy, BlockBeats

Editor's Note: As "stronger models" become the default answer in the industry, this article provides a different perspective: what truly widens productivity gaps by 10x, 100x, or even 1000x is not the model itself, but a whole system design built around the model.

The author of this article, Garry Tan, current President and CEO of Y Combinator, has long been involved in AI and early-stage startup ecosystems. He introduces the "fat skills + thin harness" framework, breaking down AI applications into key components such as skills, runtime framework, context routing, task division, and knowledge compression.

In this system, the model is no longer the entire capability but merely an execution unit within the system. What truly determines output quality is how you organize context, solidify processes, and delineate the boundary between "inference" and "computation."

More importantly, this approach is not merely conceptual but has been validated in real scenarios: faced with data processing and matching tasks from thousands of entrepreneurs, the system achieves capabilities close to human analysts through a "read-summarize-infer-write back" loop, continuously self-optimizing without the need for code rewrites. This "learning system" transforms AI from a one-off tool to an infrastructure with a compounding effect.

Thus, the core reminder provided in the article becomes clear: in the AI era, efficiency gaps are no longer determined by whether you use the most advanced model but by whether you have built a system that can continuously accumulate capabilities and evolve automatically.

The following is the original text:

Steve Yegge said that people using AI programming agents are "10 to 100 times more efficient than engineers who code only with Cursor and chat tools, and roughly 1000 times more efficient than a 2005 Google engineer."

Note: Steve Yegge is a highly influential software engineer, technical blogger, and engineering culture commentator in Silicon Valley, known for his sharp, lengthy, and strongly opinionated technical articles. He served as a senior engineer at companies such as Amazon and Google, later joined Salesforce, then moved to startups in the AI space, and was one of the early advocates of the Dart project.

This is not an exaggeration. I have seen it with my own eyes and experienced it firsthand. However, when people hear about such a gap, they often attribute it to the wrong factors: a stronger model, a smarter Claude, more parameters.

In reality, the person who is twice as efficient and the one who is a hundred times more efficient are using the same model. The difference is not in "intelligence" but in "architecture," and this architecture is so simple that it can fit on a notecard.

The Harness (Execution Framework) is the Product Itself.

On March 31, 2026, Anthropic accidentally published the complete source code of Claude Code to npm: 512,000 lines in total. I read through all of it. It confirmed something I have long said at YC (Y Combinator): the real secret is not in the model but in the "layer that wraps the model."

Real-time codebase context, prompt caching, tools designed for specific tasks, aggressive compression of redundant context, structured session memory, parallel subagents: none of these make the model smarter. But they give the model the "right context" at the "right time" without burying it in irrelevant information.

This wrapping layer is called the harness (execution framework). And the real question all AI builders should ask is: What should go into the harness, and what should stay outside?

Interestingly, this question has a very specific answer—a thin harness, fat skills.

Five Definitions

The bottleneck has never been in the intelligence of the model. The model already knows how to reason, synthesize information, and write code.

Models fail because they do not understand your data: your schema, your conventions, the shape your particular problem takes. The following five definitions are designed to address exactly this.

1. Skill File

A skill file is a reusable markdown document that teaches the model "how to do something." Note that it does not tell it "what to do"—that part is provided by the user. The skill file provides the process.

The key point that most people overlook is this: a skill file is actually like a method call. It can take parameters. You can call it with different parameters. The same process, when called with different inputs, can demonstrate vastly different capabilities.

For example, take a skill called /investigate. It consists of seven steps, including: define the data scope, build a timeline, diarize each document, synthesize, argue from both sides, and cite sources. It takes three parameters: TARGET, QUESTION, and DATASET.

If you point it at a security scientist and 2.1 million forensic emails, it becomes a medical research analyst determining whether a whistleblower was suppressed.

If you point it at a shell company and Federal Election Commission (FEC) disclosure filings, it becomes a forensic investigator tracing coordinated political donations.

Same skill. Same seven steps. Same markdown file. The skill describes a decision-making process, and what actually brings it to life is the input parameters at runtime.

This is not prompt engineering but software design, with markdown as the programming language and human judgment as the runtime environment. In fact, markdown is better suited to encapsulating this than rigid source code, because it describes process, judgment, and context in the language the model understands best.
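The "skill as method call" idea can be sketched in a few lines of Python. This is only an illustration, not how Claude Code stores or invokes skills: the template text and the `invoke_skill` helper are hypothetical, and only the steps and parameter names mentioned above are taken from the article.

```python
from string import Template

# Hypothetical skill text. A real skill file would be a longer markdown
# document; only the steps named in the article are listed here.
INVESTIGATE_SKILL = Template("""\
# /investigate
Target: $TARGET
Question: $QUESTION
Dataset: $DATASET

- Define the data scope.
- Build a timeline.
- Diarize each document.
- Synthesize.
- Argue from both sides.
- Cite sources.
""")


def invoke_skill(skill: Template, **params: str) -> str:
    """Fill in the skill's parameters; raises KeyError if one is missing."""
    return skill.substitute(**params)


# Same process, two different "calls".
whistleblower = invoke_skill(
    INVESTIGATE_SKILL,
    TARGET="a security scientist",
    QUESTION="Was a whistleblower suppressed?",
    DATASET="2.1 million emails",
)
donations = invoke_skill(
    INVESTIGATE_SKILL,
    TARGET="a shell company",
    QUESTION="Were political donations coordinated?",
    DATASET="FEC disclosure filings",
)
```

The process text is identical in both results; only the three parameters differ, which is the point of treating the skill as a method rather than as a one-off prompt.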

2. Harness (Runtime Framework)

The harness is the layer of software that runs the LLM. It does only four things: run the model in a loop, read and write your files, manage context, and enforce safety constraints.

That's it. That's "thin."
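To make "thin" concrete, here is a minimal sketch of those four responsibilities. It rests on several assumptions: the model is any callable from text to text (a real harness would call an LLM API there), the `READ` / `WRITE` / `DONE` command protocol is invented for the example, and context management is reduced to crude truncation.

```python
from pathlib import Path
from typing import Callable

# Hypothetical stand-in: a "model" is any function from context text to a reply.
Model = Callable[[str], str]

MAX_CONTEXT_CHARS = 4_000  # assumed context budget
MAX_STEPS = 10             # assumed loop limit


def run(model: Model, task: str, workdir: Path, allow_writes: bool = False) -> str:
    root = workdir.resolve()
    header = f"TASK: {task}"
    history: list[str] = []
    for _ in range(MAX_STEPS):                      # 1. run the model in a loop
        recent = "\n".join(history)[-MAX_CONTEXT_CHARS:]
        reply = model(f"{header}\n{recent}")        # 3. manage context (keep the tail)
        verb, _, rest = reply.partition(" ")
        if verb == "DONE":
            return rest
        if verb not in ("READ", "WRITE"):
            history.append(f"ERROR: unknown command {verb!r}")
            continue
        name, _, body = rest.partition("\n")
        path = (root / name.strip()).resolve()
        if not path.is_relative_to(root):           # 4. enforce safety constraints
            history.append("ERROR: path escapes the working directory")
        elif verb == "READ":                        # 2. read files
            history.append(path.read_text() if path.is_file() else "ERROR: no such file")
        elif not allow_writes:
            history.append("ERROR: harness is read-only")
        else:                                       # 2. write files
            path.write_text(body)
            history.append(f"OK: wrote {name.strip()}")
    return "stopped: step limit reached"
```

Everything domain-specific (what to read, how to judge it, what to produce) is deliberately absent; under this framing it belongs in skills, not in the loop.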

The opposite pattern is: fat harness, thin skills.

You have probably seen this pattern: more than 40 tool definitions whose documentation alone eats half the context window; an all-powerful God-tool with a 2-to-5-second MCP round trip; or a wrapper that turns every endpoint of a REST API into a separate tool. The result is triple the token usage, triple the latency, and triple the failure rate.

The truly ideal approach is to use purpose-built tools that are fast and narrowly focused.

For example, a Playwright CLI that takes only 100 milliseconds per browser operation, rather than a Chrome MCP that takes 15 seconds to run screenshot → find → click → wait → read. The former is 75 times faster.

Modern software no longer needs to be "over-engineered." What you should do is: only build what you truly need and nothing more.

3. Resolver

A resolver is essentially a context routing table. When task type X occurs, document Y is loaded first. Skills tell the model "how to do it"; resolvers tell the model "what to load, and when."

For example, a developer changes a certain prompt. Without a resolver, they might just finish the change and release it right away. With a resolver, the model would first read docs/EVALS.md. This document would say: run the evaluation suite first, compare scores before and after; if accuracy drops by more than 2%, roll back and investigate the reason. This developer may not have even known about the existence of the evaluation suite. It is the resolver that loads the right context at the right time.
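A routing table of this kind can be sketched as a plain dictionary. The sketch below is an assumption-laden illustration: `docs/EVALS.md` comes from the example above, while the task-type names, the second route, and both helper functions are hypothetical. The rollback check also assumes that "drops by more than 2%" means an absolute drop of 0.02 in an accuracy score between 0 and 1.

```python
# Hypothetical routing table: task type -> documents to load first.
ROUTES: dict[str, list[str]] = {
    "prompt-change": ["docs/EVALS.md"],
    "schema-change": ["docs/MIGRATIONS.md"],  # invented entry for illustration
}
ALWAYS_LOADED = ["CLAUDE.md"]


def resolve(task_type: str) -> list[str]:
    """Return the documents to load for a task, most specific first."""
    return ROUTES.get(task_type, []) + ALWAYS_LOADED


def should_roll_back(accuracy_before: float, accuracy_after: float,
                     max_drop: float = 0.02) -> bool:
    """Deterministic gate for the rule docs/EVALS.md is described as stating."""
    return (accuracy_before - accuracy_after) > max_drop
```

For instance, `resolve("prompt-change")` puts `docs/EVALS.md` in front of the model before it touches the prompt, and `should_roll_back(0.90, 0.87)` is true under the stated assumption.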

Claude Code comes with a built-in resolver. Each skill has a description field, and the model automatically matches the user's intent to the skill's description. You don't even need to remember whether the /ship skill exists—the description itself is the resolver.

To be honest, my previous CLAUDE.md was a whopping 20,000 lines long. Every quirk, every pattern, every lesson I had learned was crammed into it. Utterly absurd. The model's attention quality significantly decreased. Claude Code even directly told me to get rid of it.

The fix ended up at roughly 200 lines: just a handful of pointers to documents. The resolver loads the right document at the moment it is needed. This way, 20,000 lines of knowledge remain accessible on demand without polluting the context window.

4. Latent and Deterministic

Every step in your system is either latent or deterministic, and confusing the two is the most common mistake in agent design.

· Latent space is where intelligence resides. The model reads, understands, judges, and decides here. It deals with: judgment, synthesis, pattern recognition.

· Deterministic is where trustworthiness resides. Same input, always the same output. SQL queries, compiled code, arithmetic operations all belong to this side.

A single LLM can help you seat 8 people for a dinner party, taking into account each person's personality and social dynamics. But if you ask it to seat 800 people, it will earnestly generate a "seemingly reasonable but actually completely wrong" seating chart. That is because seating 800 people is not a latent-space task: it is a deterministic problem, combinatorial optimization, that has been forced into the latent space.

The worst systems put work on the wrong side of this boundary. The best systems draw the boundary sharply.
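As a rough illustration of keeping the 800-person case on the deterministic side, the sketch below assumes the model has already attached a group label to each guest (the latent step) and leaves the actual assignment to ordinary code. The `seat` function and the round-robin strategy are assumptions chosen for simplicity, not the article's algorithm.

```python
from math import ceil


def seat(guests: list[tuple[str, str]], table_size: int) -> list[list[str]]:
    """Deal (name, group) guests round-robin so each group spreads across tables."""
    n_tables = ceil(len(guests) / table_size)
    tables: list[list[str]] = [[] for _ in range(n_tables)]
    # Sorting by group places members of a group next to each other in the
    # deal order, so consecutive members land on different tables.
    by_group = sorted(guests, key=lambda guest: guest[1])
    for i, (name, _group) in enumerate(by_group):
        tables[i % n_tables].append(name)
    return tables


# Synthetic data: 800 guests in 5 made-up groups.
guests = [(f"guest{i}", f"group{i % 5}") for i in range(800)]
tables = seat(guests, table_size=8)
```

Same input, same output, and the constraints (everyone seated exactly once, no table over capacity) can be checked mechanically instead of trusted.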

5. Diarization (Document Clustering / Topic Portraiture)

This diarization step is what truly gives AI the ability to produce value in working with real-world knowledge.

It means: the model reads through all materials related to a topic, then produces a structured portrait. It condenses judgments from dozens or even hundreds of documents onto a single page.

This is not something a SQL query can produce. Nor is it something a RAG pipeline can produce. The model must actually read, hold contradictory information in mind simultaneously, note what changed and when, and then synthesize all of it into structured intelligence.

This is the difference between a database query and an analyst briefing.

This Architecture

These five concepts can be combined into a very simple three-layer architecture.

· The top layer is Fat Skills: processes written in markdown, carrying judgments, methodologies, and domain knowledge. 90% of the value resides in this layer.
· The middle layer is a thin CLI harness: about 200 lines of code, taking JSON input, producing text output, defaulting to read-only.
· The bottom layer is your application system: QueryDB, ReadDoc, Search, Timeline—these are deterministic infrastructure.

The guiding principle is directional: push "intelligence" as high up as possible into skills; push "execution" as far down as possible into deterministic tools; keep the harness light.
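The middle layer's contract (JSON in, text out, read-only by default) might look like the following sketch. The tool bodies are hypothetical stubs standing in for real deterministic infrastructure, and `WriteDoc` is an invented tool added only to show the read-only guard.

```python
import json
from typing import Callable


# Hypothetical stubs; real tools would query a database or document store.
def query_db(sql: str) -> str:
    return f"rows for: {sql}"


def read_doc(path: str) -> str:
    return f"contents of {path}"


def write_doc(path: str, text: str) -> str:
    return f"wrote {len(text)} chars to {path}"


# name -> (function, mutates_state)
TOOLS: dict[str, tuple[Callable[..., str], bool]] = {
    "QueryDB": (query_db, False),
    "ReadDoc": (read_doc, False),
    "WriteDoc": (write_doc, True),  # invented for this example
}


def dispatch(request_json: str, read_only: bool = True) -> str:
    """Take a JSON request, call one deterministic tool, return plain text."""
    request = json.loads(request_json)
    entry = TOOLS.get(request.get("tool"))
    if entry is None:
        return f"error: unknown tool {request.get('tool')!r}"
    func, mutates = entry
    if mutates and read_only:
        return "error: tool not permitted in read-only mode"
    return func(**request.get("args", {}))
```

A call such as `dispatch('{"tool": "ReadDoc", "args": {"path": "a.md"}}')` returns text the model can read, while any mutating tool is refused unless the caller opts in.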

The result is: every time the model's capabilities improve, all skills automatically become stronger; while the foundational deterministic systems remain stable and reliable.

Learning Systems

Below, I will use a real system we are building at YC to show how these five definitions work together.

In July 2026, Chase Center. Startup School has 6000 founders in attendance. Everyone has structured application materials, questionnaire responses, transcripts of 1:1 mentor conversations, and public signals: posts on X, GitHub commit history, and usage of Claude Code (indicating their development speed).

The traditional approach is for a 15-person project team to read applications one by one, make intuitive judgments, and then update a spreadsheet.

This method can work with 200 people, but it breaks down completely at 6000. No human can hold that many profiles in mind and notice that the three strongest candidates in AI agent infrastructure are a developer-tools founder in Lagos, a compliance founder in Singapore, and a CLI tool developer in Brooklyn, each of whom described the same pain point in completely different words in separate 1:1 conversations.

The model can do it. Here's how:

Enrichment

There is a skill called /enrich-founder, which pulls from all data sources, enriches and diarizes each profile, and highlights the gap between "what the founder said" and "what they are actually doing."

The underlying deterministic system handles: SQL queries, GitHub data, browser tests of Demo URLs, social signal extraction, CrustData queries, etc. A scheduled task runs once a day. The profiles of 6000 founders are always up to date.

The output of diarization can capture information that keyword searches could never find:

Founder: Maria Santos
Company: Contrail (contrail.dev)
Self-description: "Datadog for AI agent"
Actual activity: 80% of code commits are focused on the billing module
→ Essentially building a FinOps tool disguised as an observability tool

This difference between "what is said and what is done" requires reading GitHub commit histories, application materials, and conversation records simultaneously and integrating them mentally. No embedding similarity search or keyword filtering can achieve this. The model must read in full and then make judgments. (This is exactly the kind of task that should be in the latent space!)
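The judgment itself is latent, but the number it leans on ("80% of commits touch billing") is the kind of thing a deterministic tool can hand to the model. A minimal sketch, assuming each commit is represented as a list of changed file paths and that the top-level directory is a fair proxy for the module (both are assumptions, and the sample data is made up):

```python
from collections import Counter


def commit_share(paths_per_commit: list[list[str]]) -> dict[str, float]:
    """Fraction of commits that touch each top-level directory."""
    counts: Counter = Counter()
    for paths in paths_per_commit:
        # A commit counts once per module, however many files it changes there.
        counts.update({path.split("/", 1)[0] for path in paths})
    total = len(paths_per_commit)
    return {module: n / total for module, n in counts.items()} if total else {}


# Made-up history: 8 commits in billing/, 2 in tracing/.
history = [["billing/invoice.py"]] * 8 + [["tracing/span.py"]] * 2
shares = commit_share(history)
```

The tool reports that billing accounts for 0.8 of commits; deciding that this contradicts a "Datadog for AI agents" pitch is left to the model reading the full profile.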

Matching

This is where "skill = method invocation" shines.

With the same matching skill, calling it three times can result in completely different strategies:

/match-breakout: Handle 1200 people, cluster by domain, group of 30 each (embedding + deterministic assignment)

/match-lunch: Handle 600 people, cross-domain "randomized matching," 8 people per table without repetition — LLM first generates topics, then a deterministic algorithm arranges the seats

/match-live: Handle live on-site participants, based on nearest neighbor embedding, complete 1-on-1 matching within 200ms, and exclude people who have already met
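The deterministic core of something like /match-live can be sketched as a nearest-neighbor lookup with an exclusion set. This is a brute-force illustration under stated assumptions: the two-dimensional embeddings are invented, the function name is hypothetical, and a real system aiming for a 200ms budget at scale would more likely use a vector index.

```python
from math import sqrt
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_live(person: str,
               embeddings: dict[str, list[float]],
               already_met: set[frozenset]) -> Optional[str]:
    """Return the most similar attendee the person has not met yet."""
    best, best_score = None, -2.0  # cosine similarity is never below -1
    for other, vector in embeddings.items():
        if other == person or frozenset((person, other)) in already_met:
            continue
        score = cosine(embeddings[person], vector)
        if score > best_score:
            best, best_score = other, score
    return best


# Invented embeddings for three attendees.
embeddings = {"santos": [1.0, 0.0], "oram": [0.9, 0.1], "kim": [0.0, 1.0]}
first = match_live("santos", embeddings, already_met=set())
second = match_live("santos", embeddings, already_met={frozenset(("santos", "oram"))})
```

Here `first` is `"oram"`, and once that pair is recorded as having met, `second` falls back to `"kim"`.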

The model can also make judgments that traditional clustering algorithms cannot achieve:

"Both Santos and Oram fall under AI infrastructure, but they are not in a competitive relationship — Santos does cost attribution, Oram does orchestration. They should be placed in the same group."
"Kim's application stated developer tools, but the 1:1 conversation revealed they are working on SOC2 compliance automation. Should be reclassified under FinTech / RegTech."

This kind of reclassification is completely missed by embeddings. The model must read the entire profile.

Learning Loop

After the event, an /improve skill reads the NPS survey results, diarizes the responses rated "okay, but could be better" (not the negative reviews, but the ones that were almost there), and extracts patterns.

It then proposes new rules and writes them back into the matching skill:

When a participant mentions "AI infrastructure," but over 80% of their code is for billing:
→ Categorized as FinTech, not AI Infra

When two people in the same group already know each other:
→ Reduce matching weight; prioritize introducing new relationships

These rules are written back to the skill file. They take effect automatically on the next run. Skills are "self-editing." In the July event, "okay, but could be better" ratings accounted for 12%; in the next event, it dropped to 4%.

The skill file learns what "okay" means, and the system gets better without anyone rewriting the code.
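The write-back step is the deterministic half of this loop and is easy to sketch. The version below assumes the skill is a markdown file whose last section holds the learned rules; the section heading, file name, and `add_rule` helper are hypothetical.

```python
from pathlib import Path

RULES_HEADING = "## Learned rules"  # hypothetical section name


def add_rule(skill_path: Path, condition: str, action: str) -> bool:
    """Append a rule to the skill file; return False if it is already there."""
    rule = f"- When {condition}: {action}"
    text = skill_path.read_text()
    if rule in text.splitlines():
        return False
    if RULES_HEADING not in text:
        # Assumes the rules section, once created, stays at the end of the file.
        text = text.rstrip("\n") + f"\n\n{RULES_HEADING}\n"
    skill_path.write_text(text.rstrip("\n") + f"\n{rule}\n")
    return True
```

Usage might look like `add_rule(Path("skills/match-breakout.md"), "a participant mentions AI infrastructure but over 80% of their code is billing", "categorize as FinTech, not AI Infra")`. The next run of the skill reads the updated file, so the rule takes effect without touching any code.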

This pattern can be migrated to any field:

Retrieve → Read → Diarize → Count → Synthesize

Then: Research → Investigate → Diarize → Rewrite skill

If you were to ask what the most valuable loop of 2026 is, it's this one. It can be applied to almost any knowledge work scenario.

Skill is a Permanent Upgrade

I recently posted on X an instruction I had given to OpenClaw, and it received a much bigger response than expected:

Prompt: You are not allowed to do one-off work. If I ask you to do something that will recur, you must first do it manually on 3 to 10 samples and show me the results. If I approve, turn it into a skill file. If it should run automatically, add it to the scheduled tasks. The criterion: if I have to ask a second time, you have failed.

This content received thousands of likes and over two thousand bookmarks. Many people thought this was a prompt engineering technique.

Actually, it's not. It's the architecture mentioned above. Every skill you write is a permanent upgrade to the system. It won't degrade, won't be forgotten. It will run automatically at three in the morning. And when the next generation model is released, all skills will instantly become stronger—the judgment ability of the latent part improves, while the deterministic part remains stable and reliable.

This is where Yegge's 100x efficiency comes from.

Not from smarter models, but from fat skills, a thin harness, and the discipline of solidifying everything into capabilities.

The system compounds. Build once, run long-term.

[Original Article Link]
