Persona Skills for Claude Code: Why Naming an Expert Beats Role Prompting
Role prompting treats a persona as a one-line costume. This guide shows why naming an expert only pays off inside a Claude Code skill, backed by distilled references and a loading map that routes the model into its densest cluster.
Role prompting treats a persona as a costume. A persona skill treats it as a coordinate: a pointer to the densest thing the model already knows about your problem.
This is the second piece in a series.
The first covers SKILL.md, references, the loading map, and when a skill is worth building at all. If you have not shipped a skill yet, start with the complete guide to creating Claude Code skills.
This one is about a single decision inside that architecture: whether to name your skill after a person. Done casually, it is theater. Done right, it is the highest-leverage move in the whole design. The reason is mechanical, not stylistic.
Role prompting has a bad reputation, and the research backs it up
Search “role prompting” and you land in a debate the researchers mostly settled by changing their own minds mid-study.
The most cited paper on it is Zheng et al. (arXiv:2311.10054), later accepted to Findings of EMNLP 2024. Watch what happened to its own conclusion between drafts.
The first version, November 2023:
Through extensive analysis of 3 popular LLMs and 2457 questions, we show that adding interpersonal roles in prompts consistently improves the models’ performance over a range of questions.
(“Is ‘A Helpful Assistant’ the Best Role for Large Language Models?”, v1)
The final version, October 2024, after widening the test from three models to four full model families:
Through extensive analysis of 4 popular families of LLMs and 2,410 factual questions, we demonstrate that adding personas in system prompts does not improve model performance across a range of questions compared to the control setting where no persona is added.
(“When ‘A Helpful Assistant’ Is Not Really Helpful”, v3)
Same authors, same 162 personas, same dataset. They flipped the finding from “consistently improves” to “does not improve” and rewrote the title from a hopeful question into a flat verdict. Their own analysis adds the knife: even the single best persona per question cannot be picked automatically any better than “random selection.”
So the skeptics are right, and I agree with them. As a one-line label, role prompting is a costume. It changes tone, not knowledge. The model dresses up as “a marketing expert” and hands you what any LinkedIn post could say.
But look at what that study, and every study like it, actually tested: a role label, alone, in the system prompt, with nothing behind it. None of them tested a label attached to distilled references, decision frameworks, and a routing map. Because that is not a prompt. That is a skill.
A persona inside a skill is a different machine.
A persona is an activation mechanism, not a costume
Naming an expert does three things a role line cannot. Each one is a discipline, not a vibe:
- It forces you to study. To personify Hormozi, you have to extract his method. The costume needs no homework; the skill does.
- It aims at a denser cluster in the model. “Alex Hormozi” points at a specific corpus in the weights: books, talks, transcripts, phrases. “A marketing expert” points at an average.
- It raises the density of every answer, because the skill routes to that cluster instead of hoping the model wanders into it.
The rest of this article is those three, in order. The running example is an Alex Hormozi skill I use to evaluate offers, funnels, pricing, guarantees, and retention.
Force 1: personifying forces you to study
Here is the part nobody markets, because it sounds like work.
To personify someone well, you have to learn their method well enough to encode it. A role line lets you skip that step. “Act like Hormozi” costs nothing and teaches nothing. A persona skill won’t let you cheat: the references are empty until you fill them, and you can only fill them by reading, watching, and distilling until you can state each decision rule in your own words.
That constraint is the feature. The skill is downstream of your study. Building it is how you study.
Distill, don’t paste
The corpus is not the skill. Dumping three books and forty transcripts into references/ gives you a bigger costume, not a method.
Distillation means pulling out, in your own words:
- principles: the load-bearing beliefs
- frameworks: the repeatable structures (Value Equation, Grand Slam Offer, RAISE)
- decision criteria: what makes an offer pass or fail
- examples and anti-patterns: what good and bad look like, concretely
- calibration phrases: the lines that expose drift (more on those later)
Distillation is completion, not just compression.
The model already knows the public Hormozi from training data: interviews, books, videos. What it does not know is your organized version: the Delivery Cube you separated out, the decision rules you sharpened, the internal examples you use to judge a real offer. You read widely and keep only what makes a decision.
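The five things to pull out can double as a template for each reference file. Here is a minimal Python sketch of that idea, assuming one record per decision domain; the class name, field names, and `empty_sections` helper are hypothetical, not part of any Claude Code tooling.

```python
from dataclasses import dataclass, field


@dataclass
class DistilledReference:
    """One reference file, organized by what distillation pulls out."""
    decision: str  # the decision this file supports
    principles: list[str] = field(default_factory=list)
    frameworks: list[str] = field(default_factory=list)
    decision_criteria: list[str] = field(default_factory=list)
    examples_and_anti_patterns: list[str] = field(default_factory=list)
    calibration_phrases: list[str] = field(default_factory=list)

    def empty_sections(self) -> list[str]:
        """Names of sections that are still empty: study not yet done."""
        sections = {
            "principles": self.principles,
            "frameworks": self.frameworks,
            "decision_criteria": self.decision_criteria,
            "examples_and_anti_patterns": self.examples_and_anti_patterns,
            "calibration_phrases": self.calibration_phrases,
        }
        return [name for name, items in sections.items() if not items]


ref = DistilledReference(
    decision="Price, guarantee, and risk reversal",
    frameworks=["RAISE"],
    calibration_phrases=["Simple scales. Fancy fails."],
)
print(ref.empty_sections())
```

An empty section is not a formatting problem. It is a part of the method you have not yet stated in your own words.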
Skill anatomy
- 1 SKILL.md router
- 13 references/: one file per decision domain
- 80KB distilled method: books, videos, playbooks
- 0 prompts to repeat: the study lives in the skill
Split those references by decision, not by source. The user’s question never arrives as “Book 2, chapter 4.” It arrives as a problem:
| File | Decision it supports |
|---|---|
| 00-canon.md | Core offer principles |
| 01-value-equation.md | Perceived value and pricing |
| 02-grand-slam-offer.md | Irresistible offer construction |
| 04-pricing-and-guarantees.md | Price, guarantee, and risk reversal |
| 07-closing-and-sales.md | Objections and closing |
| 08-retention-and-ltv.md | Retention and LTV |
| 11-decision-checklist.md | Verdict on a specific offer |
Splitting by source (book-1.md, podcast-3.md) is good for archiving and bad for execution. Split by the decision the file has to make.
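A small lint can keep that rule honest as the references/ folder grows. This is a minimal Python sketch, assuming source-style names look like a medium word plus a number (book-1.md, podcast-3.md); the pattern and the function name are illustrative guesses, so adjust them to your own naming.

```python
import re

# Assumption: source-style names are a medium word plus a number, e.g. book-1.md.
SOURCE_STYLE = re.compile(r"^(book|podcast|video|interview|talk)-?\d+\.md$")


def source_named(filenames):
    """Return the filenames that look split by source instead of by decision."""
    return [name for name in filenames if SOURCE_STYLE.match(name)]


files = ["00-canon.md", "04-pricing-and-guarantees.md", "book-1.md", "podcast-3.md"]
print(source_named(files))  # ['book-1.md', 'podcast-3.md']
```

Anything it flags is a file that archives a source rather than supporting a decision.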
Force 2: the name aims at a denser cluster
An LLM’s knowledge is not evenly distributed.
Around a named expert with a real public body of work, it is dense: books, talks, transcripts, the phrases they repeat, the examples they reach for. Around an abstract category like “a marketing expert,” it is an average of everything and no one.
Naming is how you choose which region the model reaches into first. Treat that as a working model, not a documented switch. In my use it predicts behavior well: a specific name pulls a specific method, a generic one pulls the average.
Naming is activation
| Weak name | Sharper name |
|---|---|
| marketing | alex-hormozi-offer-design |
| writing | arthur-miller-script-review |
| seo | neil-patel-seo-strategy |
| teaching | feynman-explanation-review |
| funnels | russell-brunson-funnel-architecture |
The sharper name pulls specific frameworks and a specific voice. The generic name pulls the average.
But the name only opens the door. Walk through it with nothing in your hands and you hit the trap: the voice shows up and the method does not.
That is why Force 1 exists.
The name activates the cluster. Your references make the model use your version instead of the public one. Persona is not density. Activation is the persona’s job. References carry the method. Skip them and the skill is cosplay: a confident voice with nothing under it.
ExpertPrompting (Xu et al., 2023) points the same way: a generic “imagine you are an expert” line buys you almost nothing. The references you distilled are the part that pays.
Force 3: routing turns the cluster into density
Activation gets you into a good region of the weights. It does not guarantee the model reads your references. Left alone, the agent takes the shortcut and answers with whatever is easiest in the weights.
The loading map closes that gap. It is the explicit bridge between the user’s intent and the reference that should answer it.
| Reference | When to load |
| --------------------------------------- | ------------------------------------------ |
| references/00-canon.md | Any offer decision |
| references/01-value-equation.md | Questions about value, price or conversion |
| references/02-grand-slam-offer.md | Building a new offer |
| references/04-pricing-and-guarantees.md | Pricing, guarantee or risk reversal |
| references/07-closing-and-sales.md | Sales calls, objections, close rate |
Anthropic calls this progressive disclosure: the metadata (name and description) sits in context always, the SKILL.md body loads when the skill triggers, and the files under references/ load only when the model is told to reach for them. That last part is the catch. Nothing auto-loads the right reference. The model decides, from the instructions you wrote. The loading map is how you make that decision non-optional.
Without the map, the agent guesses, and guessing favors the weights over your files. With it, a price question loads pricing, a retention question loads retention, and the persona’s cluster arrives already narrowed to the decision at hand.
That is where density comes from. Not from the name. From the routing.
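To make the bridge concrete, here is the loading map above as a lookup, in a minimal Python sketch. In a real skill the map is a table in SKILL.md and the model does the matching; the keyword tuples, `LOADING_MAP`, and `references_for` are hypothetical and exist only to show the intent-to-reference step.

```python
# Hypothetical keyword routing that mirrors the loading map table.
# Assumption: plain substring matching stands in for the model's judgment.
LOADING_MAP = {
    "references/01-value-equation.md": ("value", "price", "conversion"),
    "references/02-grand-slam-offer.md": ("new offer",),
    "references/04-pricing-and-guarantees.md": ("pricing", "guarantee", "risk reversal"),
    "references/07-closing-and-sales.md": ("sales call", "objection", "close rate"),
}
ALWAYS_LOAD = ["references/00-canon.md"]  # the "Any offer decision" row


def references_for(question):
    """Return the canon plus every reference whose keywords appear in the question."""
    text = question.lower()
    matched = [
        path
        for path, keywords in LOADING_MAP.items()
        if any(keyword in text for keyword in keywords)
    ]
    return ALWAYS_LOAD + matched


print(references_for("How do I handle this objection?"))
```

The point of the sketch is the shape: every kind of question has a named destination, and no question is left to find its own way.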
Forcing the method into the answer
“Please consult the references” is a suggestion, and the model treats it like one. You force density structurally, in layers that stack, each removing a way for the agent to answer from the weights instead of your references.
Four layers that force density:
- L1, explicit routing. The skill says which references to load for each type of question. Do not let the agent decide everything by itself. It will try to save effort.
- L2, framework-dependent output. If the answer must apply the Value Equation, RAISE, or Grand Slam Offer, the model has less room to answer generically. The framework becomes a lock: to satisfy the output mode, the agent has to use the right reference. An instruction like this does the work: “Every recommendation must map to the relevant framework. If pricing is discussed, use RAISE. If offer value is discussed, use the Value Equation.”
- L3, decision checklist. A reference like 11-decision-checklist.md forces the model to check the offer against explicit criteria before giving a verdict. Open advice becomes a lecture; a checklist becomes a decision.
- L4, canonical phrases as calibration. They anchor behavior and expose drift. If the answer sounds like any AI assistant could have written it, the persona did not anchor. The phrases are the tell: “Simple scales. Fancy fails.” “Volume negates luck.” “Revenue is vanity. Profit is sanity. Cash is reality.”
Persona or function? The boundary of the technique
Not every skill should wear a face.
Use a persona when there is a strong corpus behind it: books, talks, interviews, published frameworks, playbooks, case studies. Alex Hormozi for offers, Russell Brunson for funnels, Neil Patel for SEO, Richard Feynman for clarity.
For internal technical work, a function is usually better:
- security-reviewer
- database-migration-planner
- api-contract-auditor
- react-performance-reviewer
- technical-seo-auditor
The rule is simple. If there is a person with a strong public method, a persona can help. If the work depends more on a technical checklist than a cognitive style, use a function. Forcing a persona where it does not belong makes the skill theatrical and reintroduces the costume you were trying to escape.
Persona skills are composable
I do not run one persona. I run several, each an independent module with its own contract, references, and expected output.
| Skill | Domain |
|---|---|
| Alex Hormozi | Offers, pricing, guarantees, value stack |
| Russell Brunson | Funnels, value ladder, conversion |
| Gary Vaynerchuk | Content, distribution, social |
| Neil Patel | SEO, keywords, organic content |
| Arthur Miller | Script and narrative |
| Richard Feynman | Didactic clarity |
I do not try to make two personas argue inside one call. If I need Hormozi and Brunson on the same strategy, I run one, take the output, and feed the other. Composition happens in orchestration, not inside the skill.
Think microservices. Each has a clear boundary, does one thing well, and returns something the next one can consume. A good persona skill has the same property. It does not try to solve the world.
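Orchestration of this kind is just a loop. Here is a minimal Python sketch, assuming a hypothetical `run_skill` callable that takes a skill name and input text and returns output text. How a skill is actually invoked depends on your agent setup and is not shown.

```python
from typing import Callable

# Hypothetical runner type: (skill_name, input_text) -> output_text.
SkillRunner = Callable[[str, str], str]


def run_chain(skills: list[str], brief: str, run_skill: SkillRunner) -> str:
    """Run persona skills one at a time, feeding each output to the next."""
    result = brief
    for skill in skills:
        result = run_skill(skill, result)
    return result


# Stub runner for illustration only: it tags the text instead of calling a model.
def fake_runner(skill: str, text: str) -> str:
    return f"{text} -> reviewed by {skill}"


print(run_chain(
    ["alex-hormozi-offer-design", "russell-brunson-funnel-architecture"],
    "Draft offer",
    fake_runner,
))
```

Each skill sees only the previous output, so the boundary between personas stays in the orchestrator where you can inspect it.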
How I build the first version of a persona skill
The process starts with the job. One sentence. If it does not fit in a sentence, the persona is too broad.
Building the first version
Input: a one-sentence job.
- 01 Name the corpus. Pick a person the model has real material on: books, talks, playbooks. No corpus, no persona.
- 02 Distill, don't paste. Pull principles, frameworks, decision criteria, examples, anti-patterns, and calibration phrases in your own words.
- 03 Split by decision. pricing-and-guarantees.md and closing-and-sales.md, not book-1.md and video-2.md.
- 04 Write the loading map. Price question loads pricing. Retention loads retention. Obvious, and almost nobody does it.
- 05 Test on a real case. A real offer, funnel, price, objection, competitor. A persona that only works on a toy example is not production ready.
Output: a persona skill you can version, test, and reuse: infrastructure, not a costume.
For the Hormozi example, the corpus is concrete: $100M Offers, $100M Leads, $100M Money Models, the Acquisition.com playbooks, videos, interviews, and the canonical phrases. For an internal skill, the corpus can be ADRs, old pull requests, runbooks, postmortems, or approved specs. Same distillation, different face.
Checklist for a persona skill
Before I consider a persona skill ready, I run this:
- Required: The name points at a real, recognizable corpus.
- Required: The description makes activation clear.
- Required: References are distilled in your own words, not pasted.
- Required: References are split by decision, not by source.
- Required: There is an explicit loading map.
- Required: The output mode forces the relevant framework.
- Required: Canonical phrases are present as calibration.
- Required: There is at least one real input and output example.
- Required: The skill is versioned in git.
- Required: You can test whether it consulted the references.
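One item on that list can be checked mechanically: whether the loading map covers every reference. Here is a minimal Python sketch, assuming the layout described in this article (a SKILL.md next to a references/ folder) and that the map cites files as references/<name>. The function names are hypothetical, and the check only proves a file is mentioned, not that the model loads it.

```python
from pathlib import Path


def unrouted(skill_text, reference_names):
    """Return reference files that the SKILL.md text never mentions."""
    return sorted(
        name for name in reference_names
        if f"references/{name}" not in skill_text
    )


def unrouted_in_dir(skill_dir):
    """Same check against a skill folder on disk (assumed layout, see above)."""
    root = Path(skill_dir)
    names = [path.name for path in (root / "references").glob("*.md")]
    skill_text = (root / "SKILL.md").read_text(encoding="utf-8")
    return unrouted(skill_text, names)


skill_md = "| references/00-canon.md | Any offer decision |"
print(unrouted(skill_md, ["00-canon.md", "08-retention-and-ltv.md"]))
```

A file that shows up here is a reference nothing routes to. It is study you did that the skill gives the agent no instruction to read.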
The mistake that fools everyone: persona for density
The most common persona failure is trusting the voice.
You anchor “Alex Hormozi,” the tone lands, the energy is right, and you assume the method came with it. It did not. The voice is free. The method is the 80KB you had to distill.
Persona improves activation. Density comes from references. Without references, the skill is cosplay.
So test it. Give the same real input before and after the skill, and compare: did it use the framework, load the right reference, drop the generic hedging, and stay consistent across two sessions? If you do not measure that, you are not running a persona skill. You are just believing in one.
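The comparison does not need to be elaborate to be useful. Here is a minimal Python sketch of a before-and-after score, assuming you have saved the answers as plain text. The hedge phrases, the framework list, and the function names are illustrative assumptions, and naming a framework is only a cheap proxy for applying it.

```python
# Assumption: these phrases and framework names are examples; tune them to your skill.
HEDGES = ("it depends", "generally speaking", "there are many factors")
FRAMEWORKS = ("Value Equation", "Grand Slam Offer", "RAISE")


def score(answer):
    """Count generic hedges and collect the frameworks an answer names."""
    lowered = answer.lower()
    return {
        "hedges": sum(lowered.count(hedge) for hedge in HEDGES),
        # Case-sensitive on purpose, so "raise the price" does not count as RAISE.
        "frameworks": {name for name in FRAMEWORKS if name in answer},
    }


def compare(before, after):
    """Summarize what changed between the no-skill and with-skill answers."""
    b, a = score(before), score(after)
    return {
        "hedges_dropped": b["hedges"] - a["hedges"],
        "frameworks_gained": sorted(a["frameworks"] - b["frameworks"]),
    }


def consistent(first_session, second_session):
    """True when two sessions name the same set of frameworks."""
    return score(first_session)["frameworks"] == score(second_session)["frameworks"]


before = "It depends. Generally speaking, there are many factors."
after = "Apply the Value Equation first, then RAISE for the price change."
print(compare(before, after))
```

Run it on the same real input across two sessions. If `frameworks_gained` is empty or `consistent` returns False, the references probably did not load.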
FAQ
Does role prompting actually work?
As a prompt, barely. The most cited study on it (Zheng et al., arXiv:2311.10054, Findings of EMNLP 2024) tested 162 personas across four model families and found a bare persona label does not improve factual accuracy. It changes tone, not knowledge.
As a skill, yes. The difference is not the persona; it is the distilled references and the loading map behind it. Naming an expert with nothing under it is a costume. Naming one with a method under it is activation plus retrieval.
What does the research say about persona prompting and accuracy?
That the label alone is not enough. Zheng et al. found no reliable accuracy gain from personas in system prompts, and ExpertPrompting found that “imagine you are an expert” scores about the same as no persona at all.
This article agrees with that finding and goes past it: the studies tested identity without knowledge. A persona skill supplies the missing half: the references and frameworks the expert would actually use.
Is a persona skill the same as a persona prompt?
No.
A prompt is a session instruction that decays. A persona skill is a persistent module: name, distilled references, loading map, output mode. It is built to behave the same way every session, and you can test that it does.
Does a skill need a persona at all?
No.
A persona helps when there is a recognizable body of public work. For internal technical tasks, a function is often better: reviewer, architect, security auditor, migration planner.
How much do I actually need to distill?
Enough to separate decision domains and make a real verdict.
My offer skill has 13 references and around 80KB. A tighter persona can start with 3 to 5 well-distilled references, as long as each one makes a decision.
Should Claude Code load every reference automatically?
No.
Write a loading map in SKILL.md that tells the agent which files to load for each type of question. That is what routes the persona's cluster to the decision at hand.
Can I use this outside Claude Code?
Yes.
The architecture is portable. SKILL.md is one implementation, but naming a corpus, distilling references, and routing to them works for any agent that supports persistent context and reference files.
Closing
Role prompting earned its bad reputation honestly. As a one-line costume, it changes tone, not knowledge.
A persona skill is built differently. The name is a coordinate, not decoration. It forces you to study the method, aims the model at its densest cluster, and routes answers through your references instead of the internet’s average.
A persona prompt lasts one session.
A persona skill is the study, the corpus, and the routing. Run it on a real problem. Either the references loaded and the answer is different, or you still have a costume.