Beginner
Anthropic Ā· 3/26/2026

Key Summary

  • This report studies how people used Claude during one week in February 2026 to see what kinds of jobs and tasks AI is helping with across the economy.
  • Use cases on Claude.ai became more diverse, with the top ten tasks shrinking from 24% to 19% of all use (Example: out of 1,000 chats, that’s 240 dropping to 190).
  • Coding work is moving from the chat website to automated API workflows, where many small code steps are split into separate tasks.
  • The average wage-equivalent value of tasks on Claude.ai dipped from $49.30 to $47.90 per hour (Example: for 10 hours of such work, that’s $493 vs. $479).
  • Across countries, usage stayed unequal and even became slightly more concentrated, while US states kept slowly catching up with each other.
  • People pick stronger models like Opus more often for tougher, higher-paid tasks, and this effect is even bigger in API workflows.
  • Experienced users (6+ months) do more work-like and higher-education tasks, collaborate more with Claude, and have higher success rates in conversations.
  • Statistical checks show higher-tenure users are still about +4 percentage points more successful even after controlling for task types, models, and countries (Example: if others succeed 70% of the time, they succeed about 74%).
  • Automation patterns are rising in some API areas like sales outreach and market operations, while chat usage shows slightly more learning and validation.
  • These patterns hint that early, skilled adopters may gain more from AI first, which could contribute to skill-biased changes in the job market.

Why This Research Matters

This report shows that real value from AI depends on people choosing the right model for the right task and getting better at using it over time. That means training, onboarding, and smart defaults can help more people benefit, not just early experts. It also highlights that some kinds of work, like sales outreach and market monitoring, are moving toward automation faster than others. Policymakers can use these insights to target support where adoption lags and where jobs may change quickly. Businesses can reorganize workflows to boost augmentation, reserving automation for reliable, well-scoped steps. Educators can help students climb the learning curve sooner with good prompts and model selection habits. Overall, thoughtful practice plus access can turn AI into a broader opportunity engine rather than a driver of inequality.


Detailed Explanation


01 Background & Problem Definition

šŸž Hook: Imagine your school gets a new super calculator. At first, only a few math whizzes try it for big problems. Later, everyone starts using it for lots of different homework—and even for planning a bake sale.

🄬 The Concept: What this report is about in one sentence: it’s a careful, privacy-preserving look at how people actually use Claude across many kinds of tasks, how that’s changing over time, and what it might mean for jobs and the economy. How it works (story of the world before):

  1. Before tools like Claude, people did everything themselves or with simple software, so complex writing, coding, and analysis took longer.
  2. When early AI models arrived, usage clustered around a few high-value tasks—think coding help or drafting complex memos.
  3. Policymakers and researchers wanted to know: who is using AI, for what, and is it making work faster or better? Why it matters: Without tracking real use, we’d be guessing about AI’s impact on jobs, education, and fairness.

šŸž Anchor: Picture a teacher asking, ā€œAre my students using the calculator for tough problems or just for times tables?ā€ This report is that check-up—for the whole economy using Claude.

—

šŸž Hook: You know how when a new game comes out, first a few super fans play it nonstop, and then slowly the whole school gives it a try?

🄬 Adoption Curve: What it is: it’s a pattern showing how new tech spreads—from early experts to everyone else. How it works:

  1. Early adopters pick a few high-value uses (like coding).
  2. As more people join, uses diversify (sports facts, product comparisons, home fixes).
  3. Average task value can dip as simpler, personal questions rise. Why it matters: Without understanding adoption curves, we might misread a dip in average value as ā€œAI is less usefulā€ when it could just be broader, more casual use.

šŸž Anchor: First, the chess club adopts the new game for tournaments; months later, everyone plays at recess, including quick, casual rounds.

—

šŸž Hook: Think of a giant scoreboard that shows not just who scored, but how they scored, from which spots, and against which teams.

🄬 Economic Index Framework: What it is: a measurement system that turns many messy, real-world AI uses into clear, comparable numbers about tasks, places, and trends. How it works:

  1. Collect a big, privacy-safe sample of Claude conversations.
  2. Sort them into task types using a job-task map (like O*NET).
  3. Track primitives such as education level needed, time, autonomy, and success.
  4. Compare across time, models, places, and platforms (chat vs. API). Why it matters: Without a shared scoreboard, different groups would argue about AI’s effects using guesses instead of consistent measurements.

šŸž Anchor: It’s like grading many classes with the same fair rubric, so you can see real changes—not just noisy differences.

—

šŸž Hook: Some puzzles are quick jigsaws; others are 1,000-piece monsters.

🄬 Task Complexity: What it is: how difficult a task is, based on skills, time, and knowledge needed. How it works:

  1. Estimate how long a human would take.
  2. Note the education/skills expected.
  3. See how much autonomy the AI is given.
  4. Use these clues to gauge difficulty and value. Why it matters: Without tracking complexity, we can’t tell if stronger models are being saved for tougher jobs.

šŸž Anchor: Solving a 5-piece puzzle vs. a 1,000-piece puzzle calls for different tools—and maybe a teammate.

—

šŸž Hook: When you fix a bike, you pick a wrench for bolts and a pump for tires—you don’t use one tool for everything.

🄬 Model Selection: What it is: choosing between models like Haiku, Sonnet, and Opus based on cost, speed, and power. How it works:

  1. Look at the task’s difficulty and importance.
  2. If it’s complex and valuable, pick a stronger model like Opus.
  3. If it’s simple, pick a faster, cheaper one. Why it matters: Without smart model choice, you either waste resources on easy tasks or underpower hard ones and fail.

šŸž Anchor: People used Opus more for higher-paid tasks and less for tutoring; API users adjusted models even more sharply.

—

šŸž Hook: Think about two ways to work with a helpful robot: tell it exactly what to do and walk away, or sit together and refine the work step by step.

🄬 Automation vs. Augmentation: What it is: automation is the AI doing the task with little human help; augmentation is humans and AI teaming up. How it works:

  1. Automation: directive, hands-off instructions.
  2. Augmentation: back-and-forth learning, validation, and iteration.
  3. Track which pattern shows up in real usage. Why it matters: Without this, we can’t see whether AI is replacing parts of work or boosting people’s abilities.

šŸž Anchor: The report finds slightly more augmentation in Claude.ai chats, while APIs show clearer automation for things like sales outreach and market ops.

—

šŸž Hook: Practice makes progress—like getting better at piano the more you play.

🄬 Learning Curves: What it is: how users improve at getting value from Claude over time. How it works:

  1. Group users by how long they’ve used Claude (tenure).
  2. Compare their task choices, collaboration patterns, and success.
  3. Control for differences like task type, model, language, and country. Why it matters: Without measuring learning curves, we miss that experience can unlock more value from the same AI.

šŸž Anchor: Users with 6+ months of practice have more work-like tasks and higher success—even after careful statistical controls.

—

šŸž Hook: If the library adds high-tech tools, kids who already read well might benefit first, widening gaps until others catch up.

🄬 Skill-Biased Technological Change: What it is: when new tech helps skilled workers more at first, it can widen pay and opportunity gaps. How it works:

  1. Early adopters bring complex, high-value tasks.
  2. They learn fast and use AI more effectively.
  3. Their success can compound, leaving others behind unless access and training spread. Why it matters: Without watching this, AI could deepen inequalities rather than lift everyone.

šŸž Anchor: The report shows higher-tenure, often technical users get more success from Claude now; that advantage may grow unless we support broader learning.

02 Core Idea

šŸž Hook: You know how a magnifying glass doesn’t create sunlight—it focuses it? The clearer you aim it, the brighter the spot.

🄬 The Aha Moment (one sentence): The value people get from Claude depends not just on the model’s power, but on how wisely they match models to tasks and how much they learn by using it.

Multiple analogies (3 ways):

  1. Toolbox: Picking Opus vs. Sonnet is like grabbing a power drill vs. a screwdriver; the right choice makes the job smooth, the wrong one strips the screw.
  2. Cooking: A great oven helps, but a practiced chef who knows when to broil, bake, or simmer gets the best meal.
  3. Sports: Even with the best shoes, a runner improves most by training; smart pacing (task-model matching) plus practice (learning curves) wins races.

Before vs. After:

  • Before: People assumed impact came mostly from the AI’s raw capability and that adoption was similar across uses.
  • After: We see real-world patterns where users actively route harder, higher-value tasks to stronger models, coding migrates to APIs, and experience strongly correlates with success and collaboration.
  • Change: Impact is co-created by the user and the model; skill in using AI is becoming its own superpower.

Why it works (intuition, no equations):

  • Task difficulty is uneven. Stronger models pay off on tougher tasks but are overkill for easy ones. Users who spot this tradeoff use resources better.
  • Feedback loops matter. With augmentation, humans and AI refine answers, catching mistakes and improving quality.
  • Practice builds playbooks. Over time, users learn prompts, workflows, and model-switching habits that raise success.
  • Data confirms it: higher-tenure users choose more complex, work-like tasks and, even controlling for many factors, succeed more often.

Building blocks:

  • Privacy-preserving measurement: Aggregate signals without exposing personal content.
  • Task mapping: Use O*NET-style categories to compare apples to apples across jobs.
  • Economic primitives: Track education level, time, autonomy, and success to gauge difficulty and value.
  • Collaboration modes: Distinguish automation (directive) from augmentation (learning, validation, iteration).
  • Model classes: Haiku/Sonnet/Opus with tradeoffs in speed, cost, and performance.
  • Tenure and success: Compare newer vs. seasoned users with controls for tasks, models, and geography.

šŸž Anchor: Just like a well-trained team picks the right players for each play and improves with practice, users who choose the right model for the right job and keep practicing get the most from Claude.

03 Methodology

At a high level: Input (anonymized conversation samples) → Privacy-preserving task and pattern labeling → Economic and behavioral metrics (task value, collaboration mode, model choice, tenure, success) → Comparisons across time, platforms, places, and users → Insights about learning curves and model-task matching.

Step-by-step details:

  1. Sampling conversations safely
  • What happens: The team samples about one million conversations from Claude.ai and the first-party API over a specific week in February 2026, using systems designed not to reveal individual content.
  • Why it exists: We need broad coverage, but privacy comes first; otherwise, we can’t responsibly study the economy-wide effects.
  • Example: Like counting how many students use the library during a week without looking at their personal notes.
  2. Mapping to tasks (O*NET framework)
  • What happens: Each conversation is placed into task categories tied to real-world occupations (e.g., Software Developers, Tutors) to estimate the kind of work being done.
  • Why it exists: Without a shared map, comparing tasks across time or platforms becomes apples vs. oranges.
  • Example: If a chat asks for Python debugging, it maps to Computer and Mathematical tasks; if it’s lesson planning, it maps to Education.
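A toy keyword mapper in the spirit of this step. The real system uses model-based classification against the full O*NET taxonomy; these two categories and keyword lists are invented.

```python
# Toy mapping from request text to O*NET-style occupational groups.
# Real labeling is model-based and far richer; this only shows the idea.
KEYWORDS = {
    "Computer and Mathematical": ["python", "debug", "refactor", "sql"],
    "Educational Instruction": ["lesson plan", "tutor", "quiz"],
}

def map_to_task_group(text: str) -> str:
    lowered = text.lower()
    for group, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return group
    return "Other"

print(map_to_task_group("Help me debug this Python traceback"))
# -> "Computer and Mathematical"
```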
  3. Measuring economic primitives
  • What happens: For each labeled task, the system estimates features such as human education years needed, time a human would take, autonomy given to AI, and success.
  • Why it exists: These provide standardized signals about complexity and value.
  • Example: If a task likely needs 4-year college knowledge and would take 30 minutes for a human, it scores higher on complexity than a quick sports-fact lookup.
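A sketch of what a per-task record of these primitives could look like; the field set mirrors the report's description, while the units, scales, and example values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TaskPrimitives:
    """One labeled task's standardized signals (units/scales assumed)."""
    education_years: float  # schooling a human doing this would typically have
    human_minutes: float    # time a human would need for the task
    autonomy: float         # 0 = closely supervised AI, 1 = fully delegated
    success: bool           # did the conversation accomplish the task?

report_draft = TaskPrimitives(education_years=16, human_minutes=30, autonomy=0.4, success=True)
sports_fact  = TaskPrimitives(education_years=10, human_minutes=1,  autonomy=0.9, success=True)

# The 4-year-college, 30-minute task dwarfs the quick lookup on complexity.
print(report_draft.human_minutes / sports_fact.human_minutes)  # 30.0x the human time
```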
  4. Collaboration mode detection (automation vs. augmentation)
  • What happens: Conversations are sorted into interaction types (directive, validation, learning, iteration, feedback loop) and then grouped as automation or augmentation.
  • Why it exists: This shows whether AI is substituting for steps or teaming up with people.
  • Example: A one-shot ā€œWrite this email, send itā€ looks directive (automation); a back-and-forth draft review looks iterative (augmentation).
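A deliberately crude stand-in for that sorting step, using only turn counts; the real classifier reads conversation content, so treat every rule here as an assumption.

```python
def classify_interaction(user_turns: int, revisions_requested: int) -> str:
    """Crude heuristic: one-shot commands look directive (automation);
    multi-turn refinement looks iterative (augmentation). Illustration only."""
    if user_turns <= 1 and revisions_requested == 0:
        return "directive"  # e.g., "Write this email, send it"
    return "iteration"      # e.g., a back-and-forth draft review

print(classify_interaction(user_turns=1, revisions_requested=0))  # directive
print(classify_interaction(user_turns=5, revisions_requested=3))  # iteration
```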
  5. Model class identification
  • What happens: Each conversation notes which model was used (Haiku, Sonnet, Opus) and in what context (chat vs. API), and tracks when users switch models for different tasks.
  • Why it exists: This reveals task-model matching: are stronger models used for tougher, higher-value tasks?
  • Example: A user might pick Sonnet for simple Q&A, then switch to Opus for detailed code refactoring.
  6. Estimating task value
  • What happens: The report links each task to a US hourly wage estimate for related occupations to approximate economic value.
  • Why it exists: Wages act as a common yardstick for how valuable a task is in the labor market.
  • Example: If software developer tasks average $60 per hour and tutoring averages $25 per hour, choosing Opus more often for the former would show smart model matching (Example math: a $35 gap implies stronger-model use will likely be higher on the $60 task side).
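The wage-weighting itself is simple arithmetic. A sketch with an invented usage mix and invented wage proxies:

```python
# Invented usage mix: share of conversations per occupation group and a
# US hourly wage proxy for each. Real shares/wages come from the report's data.
mix = {
    "Software Developers": {"share": 0.30, "wage": 60.0},
    "Tutors":              {"share": 0.20, "wage": 25.0},
    "Personal/other":      {"share": 0.50, "wage": 40.0},
}

avg_value = sum(v["share"] * v["wage"] for v in mix.values())
print(f"${avg_value:.2f}/hour equivalent")  # $43.00 with these invented numbers
```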
  7. Tracking diversification and migration
  • What happens: The team tracks how concentrated usage is (e.g., what share the top 10 tasks hold) and how coding shifts from chat to API.
  • Why it exists: This indicates whether AI is spreading into many use cases and whether some areas are moving toward more automated pipelines.
  • Example: Top-10 share dropping from 24% to 19% means usage is spreading out (Example: out of 1,000 chats, from 240 to 190 in the top ten categories).
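The concentration measure here is just the top-k share of task counts; a minimal sketch with invented counts chosen to reproduce the 19% example:

```python
from collections import Counter

# Invented task counts for 1,000 chats: three leading tasks plus a long tail.
counts = Counter({"coding": 90, "writing": 60, "tutoring": 40})
counts.update({f"long_tail_{i}": 30 for i in range(27)})  # 27 * 30 = 810 more chats

def top_k_share(counts: Counter, k: int) -> float:
    """Share of all usage held by the k most common task categories."""
    return sum(n for _, n in counts.most_common(k)) / sum(counts.values())

print(f"{top_k_share(counts, 3):.0%}")  # 19% -> (90 + 60 + 40) / 1000
```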
  8. Tenure and learning curves
  • What happens: Users are grouped by how long they’ve been using Claude (e.g., 6+ months vs. newer), and their patterns and success are compared.
  • Why it exists: To see if practice correlates with better outcomes and different habits.
  • Example: High-tenure users show fewer personal-use chats and more work-like, higher-education tasks, plus higher success rates.
  9. Statistical controls for fairness
  • What happens: The report runs regressions that compare high- vs. low-tenure users within the same fine-grained task clusters and control for model, use case, and country.
  • Why it exists: Without controls, we could mistake task mix or language effects for true learning.
  • Example: The raw success gap shrinks from about +5 percentage points to about +3 with task fixed effects, and is about +4 with full controls (Example: if baseline is 70%, full controls suggest about 74% for high tenure, i.e., 74 out of 100).
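A minimal sketch of such a fixed-effects check using the statsmodels library; the data frame, its column names, and all values are invented, and a linear probability model stands in for whatever specification the report actually uses.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented per-conversation data; column names are assumptions.
df = pd.DataFrame({
    "success":     [1, 1, 0, 1, 0, 1, 1, 0],
    "high_tenure": [1, 0, 0, 1, 0, 1, 1, 0],
    "task":        ["code", "code", "write", "write", "code", "write", "code", "write"],
    "model":       ["opus", "sonnet", "sonnet", "opus", "haiku", "opus", "sonnet", "haiku"],
    "country":     ["US", "US", "DE", "DE", "US", "US", "DE", "DE"],
})

# Linear probability model; C() adds task/model/country fixed effects as dummies.
fit = smf.ols("success ~ high_tenure + C(task) + C(model) + C(country)", data=df).fit()
print(fit.params["high_tenure"])  # tenure gap after controls (a fraction, not pp)
```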
  10. Geography and inequality measures
  • What happens: Usage is adjusted by population to create per-capita measures across US states and countries; concentration metrics like the Gini and top-share are tracked over time.
  • Why it exists: This shows who’s getting access and benefit—and whether gaps are closing or widening.
  • Example: Top 5 US states’ share dropping from 30% to 24% suggests catch-up (Example: out of 100 total per-capita usage units, those states’ share falls from 30 to 24), while top 20 countries rising from 45% to 48% shows growing concentration globally (Example: out of 100 usage units, from 45 to 48).
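For reference, a small sketch of the Gini coefficient on per-capita usage values; the five country values are invented.

```python
def gini(values):
    """Gini coefficient: 0 = usage spread perfectly evenly, 1 = all usage
    concentrated in one place. Standard sorted-values formula."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Invented per-capita usage for five countries.
print(gini([1, 1, 2, 4, 12]))  # 0.5 -> fairly concentrated
```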

The secret sauce:

  • Combining privacy-preserving measurement, a standardized job-task map, and careful statistical controls lets the team separate real learning effects and smart model selection from simple compositional changes. In short, it’s a fair, apples-to-apples way to watch how people and AI grow more effective together.

04 Experiments & Results

The test: The report measures how Claude is used (task mix, model choice, collaboration style), how valuable those tasks are (wage proxies), and how success changes with user experience and over time.

The competition (baselines): Results are compared against earlier Economic Index snapshots (e.g., November 2025) to see what’s changing and how fast.

The scoreboard with context:

  • Diversification: The top 10 tasks’ share in Claude.ai fell from 24% to 19% (Example: for 1,000 chats, that’s 240 dropping to 190), meaning usage spread out across more kinds of tasks.
  • Task value in chat: The average estimated wage value dipped from $49.30 to $47.90 (Example: 10 hours of such tasks would total $493 vs. $479), reflecting more simple, personal questions and the migration of coding to the API.
  • Coding migration: Coding remains huge overall but has moved into API workflows that split actions into many specialized calls, making API traffic look broad even as coding grows there.
  • Collaboration patterns: In Claude.ai, augmentation (validation and learning) ticked up slightly; in the API, directive automation patterns remained strong in areas like customer service and grew in sales outreach and market ops.
  • Geography: Within the US, lower-usage states kept catching up (top 5 states’ share from 30% to 24%; Example: out of 100 usage units per person, drop from 30 to 24), but across countries usage concentration grew (top 20 countries from 45% to 48%; Example: from 45 to 48 out of 100), signaling uneven global access.
  • Model choice and task value: Users select Opus more for higher-wage tasks. In Claude.ai, each $10 increase in task wage links to about +1.5 percentage points higher Opus use; in the API, about +2.8 points (Example: a $30 wage gap implies roughly +4.5 points in chat and about +8.4 points in API; see the sketch after this list).
  • Learning curves: High-tenure users (6+ months) have about a +10% higher raw success rate, and, even after controls for task, model, use case, and country, show about +4 percentage points higher success (Example: from 70% to 74% success, i.e., 70 vs. 74 out of 100 attempts).
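The model-choice bullet's arithmetic, written out (slopes from the report; the $30 wage gap is the example's):

```python
# Extrapolating the report's slopes: percentage points of extra Opus use
# per $10 of task wage, applied to the example's $30 wage gap.
CHAT_SLOPE_PER_10 = 1.5   # Claude.ai: +1.5 pp per $10
API_SLOPE_PER_10  = 2.8   # API: +2.8 pp per $10

wage_gap = 30.0  # dollars per hour between the two tasks
print(f"chat: +{wage_gap / 10 * CHAT_SLOPE_PER_10:.1f} pp")  # +4.5 pp
print(f"api:  +{wage_gap / 10 * API_SLOPE_PER_10:.1f} pp")   # +8.4 pp
```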

Surprising findings:

  • Despite more complex coding in the ecosystem, chat prompts in Claude.ai slightly simplified on average (e.g., lower education years, shorter human-only times), likely due to more casual users joining.
  • Experienced users collaborate more (augment) instead of automating more, counter to the guess that advanced users would go fully hands-off.
  • API workflows showed strong growth in specific automation niches (sales outreach, market operations), hinting at early, task-focused transformation rather than blanket automation.

Context matters: These results align with an adoption-curve story—experts arrive early with high-value tasks; broader usage follows, diversifying use cases and slightly lowering average task value in chat while API automations mature in the background.

05 Discussion & Limitations

Limitations:

  • Tenure vs. talent: Early adopters may simply be more technical; survivorship bias means we don’t see those who tried Claude long ago and left. Even with controls, fully separating learning-by-doing from who sticks around is hard.
  • Time window: The sample covers a specific week in February 2026, which may reflect seasonal patterns (e.g., academic breaks) rather than permanent shifts.
  • Measurement noise: Mapping chats to O*NET tasks and wage proxies can misestimate value for some tasks or regions.
  • Platform differences: Chat vs. API show different slice sizes and labeling quirks (e.g., agentic coding split), which can blur comparisons.

Required resources to replicate or extend:

  • Privacy-preserving data pipelines at scale, standardized task taxonomies (like O*NET), robust labeling and auditing, and statistical tooling for fixed-effect comparisons.

When not to use these measures alone:

  • Evaluating individuals’ productivity, pay, or hiring decisions; these are aggregate, anonymized signals, not performance reviews.
  • Predicting long-run displacement in a specific job from a short-term API spike—task redesign and human workflows matter.

Open questions:

  • Causality of learning: How much of the success gap is true learning vs. selection? A cohort study over time could help.
  • Diffusion: What training or interface changes help newer users climb the learning curve faster?
  • Equity: Which policies, pricing, and access models reduce global concentration and support lower-income regions and non-English users?
  • Dynamics: Will augmentation stay dominant in knowledge work, or will more tasks move to safe, reliable automation as tools mature?
  • Value capture: How do firms reorganize around API automations (e.g., sales ops, market monitoring) and what new jobs emerge?

06 Conclusion & Future Work

Three-sentence summary: This report shows that Claude’s impact depends on what tasks people bring to it, which model they choose, and how much practice they have using it. Usage in chat is diversifying and getting slightly more casual, while coding and certain business workflows are migrating toward API automations. Experienced users collaborate more and achieve higher success, suggesting learning-by-doing and smart task-model matching drive real value.

Main achievement: Turning messy, real-world AI usage into clear, privacy-preserving measures that reveal learning curves and model-task matching as key drivers of outcomes.

Future directions: Track cohorts over longer periods to isolate learning from selection; design onboarding, prompts, and tooling that shorten the learning curve; support equitable access across countries; and monitor where augmentation gives way to safe, beneficial automation.

Why remember this: AI’s power isn’t just in the model—it’s in the match between the right model, the right task, and a user who knows how to work with it. That trio explains who benefits first, where work is changing now, and how we can help more people share in the gains.

Practical Applications

  • Create onboarding checklists that teach when to use Haiku, Sonnet, or Opus based on task difficulty and stakes.
  • Build workflow templates that start with augmentation (draft → review → refine) before attempting automation.
  • Add model-switch prompts (e.g., ā€œThis looks complex—switch to Opus?ā€) to help newer users choose wisely.
  • Use API pipelines for repeatable tasks like sales outreach or market monitoring, with human review gates.
  • Track team success rates and share winning prompts and patterns to accelerate learning curves.
  • Segment tasks by estimated value and route high-value ones to stronger models with clearer instructions.
  • Offer short trainings on prompt clarity, validation steps, and iterative feedback loops.
  • Monitor geographic adoption metrics and provide language/localization support where usage lags.
  • Adopt privacy-preserving analytics to measure impact without exposing sensitive content.
  • Pilot augmentation first in complex roles (analysis, coding reviews) while automating narrowly defined, low-variance tasks.
#AI adoption Ā· #learning curves Ā· #model selection Ā· #task complexity Ā· #augmentation vs automation Ā· #O*NET task mapping Ā· #economic index Ā· #usage diversification Ā· #API workflows Ā· #skill-biased technological change Ā· #privacy-preserving analytics Ā· #per-capita usage Ā· #geographic inequality Ā· #Opus Sonnet Haiku Ā· #success rate