Products | How I Study AI

Beginner
Anthropic Ā· 2/12/2026

Key Summary

  • This study asked whether using AI to help write code makes you learn less while you work.
  • In a fair experiment with 52 Python users learning a new library (Trio), the AI group scored 17 percentage points lower on a quiz than the no-AI group.
  • The biggest drop was in debugging, the skill of finding and fixing mistakes.
  • Using AI made people finish a bit faster (about 2 minutes), but the time savings wasn’t statistically reliable.
  • How you use AI matters: treating AI as a teacher (asking why, getting explanations) led to much better learning than treating it as a code vending machine.
  • Heavy delegation to AI produced the fastest completions but the lowest understanding.
  • Conceptual questions plus explanations gave stronger mastery and still decent speed.
  • The results fit a bigger picture: AI boosts speed on familiar tasks but can slow skill growth on new ones unless designed and used for learning.
  • Managers and tool designers should build in modes and policies that encourage explanation, reflection, and debugging practice.
  • This was a small, short-term study, so we need more research on long-term effects and on jobs beyond coding.

Why This Research Matters

Software now touches money, health, and safety, so people need strong skills to check AI-written code before it ships. This study shows that using AI the wrong way can quietly weaken debugging and understanding—the very muscles needed for safe oversight. It also offers a fix: treat AI like a coach by asking for explanations and concepts, not just code. Managers can craft policies and code reviews that protect learning while still gaining speed. Tool makers can add learning modes, reflection prompts, and built-in quizzes to turn help into teaching. Educators can teach students AI habits that build mastery, not just shortcuts. With better habits and designs, we can be faster today and safer tomorrow.

Detailed Explanation


01 Background & Problem Definition

šŸž Hook: You know how using a calculator makes arithmetic faster, but if you only press buttons and never think, you might forget how to do math on your own?

🄬 The Concept (Programming Knowledge): Programming knowledge is knowing how to tell a computer what to do step by step so it solves problems. How it works:

  1. You learn rules (syntax) and ideas (how to break a big problem into smaller parts).
  2. You practice reading, writing, and fixing code.
  3. Your brain builds patterns so you can spot mistakes and design good solutions. Why it matters: Without core programming knowledge, you can’t judge whether code is correct, safe, or the best way to solve a problem. šŸž Anchor: Like writing a recipe that a robot chef follows to bake cookies—if the steps are unclear, you get salty brownies!

šŸž Hook: Imagine learning a new language like Spanish: you recognize words (Python keywords), learn grammar (syntax), and then write your own sentences (programs).

🄬 The Concept (Python): Python is a popular programming language that’s simple to read and powerful for real-world tasks. How it works:

  1. You write Python code using clear words and structures.
  2. Python runs your code and shows results or errors.
  3. You tweak the code to fix mistakes or add features. Why it matters: Python’s simplicity helps people learn fast and build real apps quickly. šŸž Anchor: It’s like using LEGO bricks with clear shapes that click together easily to build all kinds of models.
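The write-run-tweak loop above can be sketched in a few lines. This is a hypothetical example, not code from the study: `average` is an invented function used only to show Python’s readable style and how running code surfaces errors to fix.

```python
def average(numbers):
    """Return the mean of a list of numbers."""
    # A first draft divided by len(numbers) unconditionally and
    # crashed on an empty list (ZeroDivisionError); running the
    # code surfaced the error, and this tweak handles that case.
    if not numbers:
        return 0.0
    return sum(numbers) / len(numbers)

print(average([80, 90, 100]))  # 90.0
print(average([]))             # 0.0
```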

šŸž Hook: Think of running a science fair test: to know if a fertilizer helps plants grow, you split plants into two groups and change just one thing.

🄬 The Concept (Experimental Design): Experimental design is the plan for testing a question in a fair way so results can be trusted. How it works:

  1. Decide the question (Does AI help or hurt learning?).
  2. Hold most things the same between groups.
  3. Change only one thing (AI vs no AI) and measure outcomes. Why it matters: Without a fair plan, you can’t tell what really caused the results. šŸž Anchor: It’s like testing which paper towel is strongest by soaking equal pieces in the same amount of water and comparing tears.

šŸž Hook: You know how a helpful friend can suggest answers during homework? That’s nice—but if you copy them, you might not learn how to solve it yourself.

🄬 The Concept (AI Assistance): AI assistance is using a smart computer helper to suggest, write, or explain code. How it works:

  1. You ask a question or show your code.
  2. The AI suggests code or explanations.
  3. You choose to use, edit, or ignore the help. Why it matters: AI can speed you up, but it can also take over thinking you need to grow your skills. šŸž Anchor: Like a GPS that tells you exactly where to turn—great for speed, but if it stops working, can you still find your way?

šŸž Hook: Picture writing down phone numbers instead of memorizing them; over time, you remember fewer by heart.

🄬 The Concept (Cognitive Offloading): Cognitive offloading is when you let tools do thinking that your brain could practice. How it works:

  1. A task feels hard, so you lean on a tool.
  2. The tool gives the result fast.
  3. Your brain practices less, so you retain less. Why it matters: Without practice, deep understanding and quick problem‑solving fade. šŸž Anchor: Using a calculator for every problem—even easy ones—can make mental math rusty over time.

The world before: Programmers wrote, read, and debugged code mostly by themselves or with human teammates. They learned deeply because they wrestled with errors, read documentation, and built mental models. Then AI coding tools arrived. These tools can autocomplete code, explain errors, and even write whole functions. Productivity jumped, especially on tasks people already knew how to do.

The problem: If AI takes over too much thinking, do people still build the core skills—like understanding and debugging—that keep software safe and reliable?

Failed attempts: Many early studies measured how fast people finished tasks with AI but didn’t check what people actually learned, especially right after learning something new.

The gap: We needed a careful, classroom‑style test that asked, ā€œWhen learning a new tool, does AI help or hurt mastery?ā€

Real stakes: In the real world, humans still must catch AI’s mistakes, guide designs, and ensure safety. If newcomers learn faster but shallower, bugs can slip through—affecting apps we use for money, health, and transportation. This paper looks squarely at that trade‑off: speed now versus mastery later, and how usage style can make the difference.

02 Core Idea

šŸž Hook: Imagine two kids learning to ride a bike. One uses training wheels forever and zips around right away. The other wobbles, falls a bit, but learns balance. Who will ride better when the training wheels come off?

🄬 The Concept (Debugging Skills): Debugging skills are the detective abilities to find and fix mistakes in code. How it works:

  1. Notice something’s wrong (a crash, wrong number, or hang).
  2. Ask why by checking error messages and logic.
  3. Change the code, test again, and repeat. Why it matters: Without debugging skills, you can’t trust or repair code—especially AI‑generated code that might be confidently wrong. šŸž Anchor: Like fixing a leaky faucet: you look for the drip, find the bad washer, replace it, and test the sink.
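The three detective steps can be seen in a tiny worked example. The function (`count_evens`) and its bug are invented for illustration; the comments narrate the notice, ask-why, fix-and-retest loop.

```python
def count_evens(numbers):
    """Count how many values in numbers are even."""
    count = 0
    for n in numbers:
        # Step 1 (notice): an earlier draft wrote `if n % 2:` and
        # count_evens([1, 2, 3, 4, 5]) returned 3 instead of 2.
        # Step 2 (ask why): tracing showed n % 2 is 1 (truthy) for
        # odd n, so the draft was counting the odd numbers.
        # Step 3 (fix and retest): compare the remainder to zero.
        if n % 2 == 0:
            count += 1
    return count

print(count_evens([1, 2, 3, 4, 5]))  # 2
```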

šŸž Hook: You know how you read a recipe to see what you’ll cook before you start mixing? Code works the same way.

🄬 The Concept (Code Comprehension): Code comprehension is understanding what code does and why, just by reading it. How it works:

  1. Read the structure (functions, loops, variables).
  2. Trace how data moves step by step.
  3. Predict the output and edge cases. Why it matters: Without comprehension, you can’t safely reuse or check code—yours or an AI’s. šŸž Anchor: Like reading map directions before you drive so you know where each turn leads.
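A short invented snippet shows that trace-then-predict habit in practice; `total_with_bonus` is hypothetical, not from the study.

```python
def total_with_bonus(scores, bonus):
    """Sum the scores, adding a bonus to each one."""
    total = 0
    for s in scores:
        total += s + bonus
    return total

# Trace by hand before running, with scores=[10, 20], bonus=5:
#   s=10 -> total = 0 + (10 + 5) = 15
#   s=20 -> total = 15 + (20 + 5) = 40
# Prediction: 40, because the bonus is applied PER score, an easy
# detail to misread as a single +5 at the end.
print(total_with_bonus([10, 20], 5))  # 40
```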

šŸž Hook: Think about not just how to bake a cake, but why baking powder makes it rise. That deeper idea helps you improvise new recipes.

🄬 The Concept (Conceptual Understanding): Conceptual understanding is grasping the big ideas underneath tools and code patterns. How it works:

  1. Learn principles (e.g., how async tasks run without blocking each other).
  2. See how different examples share the same pattern.
  3. Apply ideas to new problems—not just copy‑paste steps. Why it matters: Without concepts, you can’t design robust systems or spot when code looks right but is wrong for the situation. šŸž Anchor: Knowing why seatbelts save lives helps you wear one in every car, not just the brand you learned with.

The ā€œaha!ā€ moment (one sentence): Using AI can make coding tasks feel easier right now but can quietly steal the practice your brain needs to build lasting debugging, comprehension, and conceptual skills—unless you use AI as a tutor, not just a typer.

Three analogies:

  1. GPS vs. Map Reading: GPS gets you there faster; map reading teaches you geography. If the GPS fails, the map reader isn’t lost.
  2. Training Wheels vs. Balance: Training wheels speed up early rides, but balance comes from wobbling and correcting—essential for real biking.
  3. Calculator vs. Mental Math: Calculators speed up answers; mental math builds number sense that helps estimate and catch errors.

Before vs. After:

  • Before: AI tools were praised mainly for speed. Few studies tested immediate learning when people faced brand‑new tools.
  • After: This study shows a real trade‑off—slight speed‑ups with AI can come with a big drop in immediate mastery, especially debugging—yet the trade‑off shrinks when people ask AI for explanations and concepts.

Why it works (intuition): Your brain strengthens what it practices. If AI writes or fixes code for you, your brain does less pattern‑spotting and error‑reasoning. That means fewer mental ā€œhooksā€ to hang new knowledge on. But if you make AI explain, compare options, and answer your ā€œwhyā€ questions, you keep the cognitive workout while still benefiting from speed.

Building blocks of the idea:

  • Practice builds mastery: Time spent thinking through errors and designs creates durable skills.
  • Offloading can undercut practice: Automatic fixes remove the struggle that teaches you where and why code fails.
  • Explanation re‑adds thinking: Using AI to explain, quiz you, or contrast designs forces your brain to build models.
  • Oversight is non‑negotiable: In high‑stakes code, humans must detect AI mistakes; that demands strong debugging and concepts.
  • Choice matters: The same AI can be a shortcut or a teacher—user behavior is the lever that changes outcomes.

03 Methodology

At a high level: Input (52 Python users new to Trio) → Randomly assign AI vs No-AI → Learn and build two features with Trio → Take a mastery quiz (debugging, reading/comprehension, concepts) → Compare time and scores.

šŸž Hook: Imagine fairly testing whether a new study trick helps—so you flip a coin to decide who uses it and then give everyone the same test.

🄬 The Concept (Randomized Controlled Trial): A randomized controlled trial is a careful study where people are randomly put into groups to test an effect fairly. How it works:

  1. Randomly assign participants to AI or No-AI groups.
  2. Keep tasks, time limits, and materials the same for both.
  3. Measure outcomes (time, quiz scores) and compare. Why it matters: Without random assignment, differences might be due to who was in each group, not the AI. šŸž Anchor: Like splitting a class by coin toss to test whether flashcards beat rereading before a quiz.
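Random assignment itself takes only a few lines of code. A minimal sketch, assuming a simple shuffle-and-split design: the 52-participant count comes from the study, but the function and seed are illustrative.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into two equal-sized arms."""
    rng = random.Random(seed)        # seeded only so the demo is repeatable
    pool = list(participants)
    rng.shuffle(pool)                # the "coin toss" for everyone at once
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (AI arm, No-AI arm)

ai_arm, no_ai_arm = assign_groups(range(52), seed=0)
print(len(ai_arm), len(no_ai_arm))  # 26 26
```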

šŸž Hook: Think of a special toolbox for a tricky job—if you’ve never used it, you must learn its tools and rules.

🄬 The Concept (Python Library: Trio): Trio is a Python library that helps run many tasks at once without blocking each other (asynchronous programming). How it works:

  1. You start and coordinate tasks (like timers or network calls) that can pause and resume.
  2. The library manages when each task runs so programs stay responsive.
  3. You write code that follows Trio’s patterns (e.g., nurseries) to handle concurrency safely. Why it matters: Without understanding Trio’s patterns, code can look fine but behave incorrectly under real‑world timing. šŸž Anchor: Like running a kitchen where multiple dishes cook at once—you need a system so nothing burns while something else rests.

šŸž Hook: Picture a playground where kids take turns on swings; the next kid moves when one hops off. That’s like sharing time so everyone gets a turn.

🄬 The Concept (Asynchronous Programming): Asynchronous programming lets different tasks take turns using the computer so the app stays responsive. How it works:

  1. Start tasks that can pause (waiting for a file or network).
  2. When one pauses, another runs.
  3. A scheduler coordinates turns. Why it matters: Without async, apps can freeze while waiting; with it, you need the right patterns to avoid tangled bugs. šŸž Anchor: It’s like a single cashier serving many shoppers quickly by switching to the next person while one customer looks for their wallet.
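That turn-taking can be demonstrated with Python’s standard-library asyncio, used here instead of Trio so the sketch runs anywhere; Trio’s nurseries organize the same idea with stricter structure. The task names and delays are invented.

```python
import asyncio

finish_order = []

async def worker(name, delay):
    # `await` is the pause point: while this task waits, the event
    # loop hands the turn to whichever task is ready to run.
    await asyncio.sleep(delay)
    finish_order.append(name)

async def main():
    # Start both tasks concurrently; the scheduler interleaves them,
    # much like Trio running tasks inside a nursery.
    await asyncio.gather(worker("slow", 0.02), worker("fast", 0.01))

asyncio.run(main())
print(finish_order)  # ['fast', 'slow']: the shorter wait finished first
```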

Step-by-step recipe:

  1. Recruit participants: 52 mostly junior engineers who used Python weekly for at least a year, were familiar with AI coding help, and were new to Trio.
  • Why: Ensures basic Python skill but a fresh learning target, so we test learning, not old habits.
  • Example: A participant can write loops and functions but has never used Trio’s nurseries.
  2. Warm-up: Everyone gets a small practice task and a short explanation of Trio concepts needed later.
  • Why: Levels the playing field and reduces start-up confusion.
  • Example: A quick demo that shows starting and cancelling a Trio task.
  3. Main tasks (two features): Each person receives a problem description, starter code, and the same brief Trio guide.
  • Why: Mimics real learning-by-doing (like a tutorial) and keeps materials consistent.
  • Example: Implement timers or concurrent tasks using Trio patterns.
  4. AI condition vs No-AI condition:
  • AI group uses an assistant in a sidebar that can read their code and, if asked, produce correct code or explanations.
  • No-AI group codes by hand using the same platform and materials.
  • Why: Isolates the effect of AI assistance on speed and learning.
  • Example: An AI user asks, ā€œWhy does my child task exit early?ā€ or ā€œWrite a Trio nursery example,ā€ while a No-AI user searches their own code and docs.
  5. Timeboxing and incentives: Participants know a quiz is coming and are encouraged to work quickly.
  • Why: Mirrors workplace pressure to move fast while still needing understanding.
  • Example: A participant weighs whether to ask AI for full code (fast) or request an explanation (slower but clearer).
  6. Mastery quiz: Right after tasks, everyone takes a quiz that emphasizes debugging, reading/comprehension, and conceptual questions (plus some code writing decisions).
  • Why: Tests near-term learning of oversight skills needed when code may be AI-generated.
  • Example: Identify why a Trio task deadlocks, explain what a code snippet does, or choose the correct Trio pattern for a scenario.
  7. Measurements collected:
  • Completion time for coding tasks.
  • Quiz score overall and by category.
  • Qualitative annotations: how people interacted with AI (number and type of queries, time spent, kinds of errors).
  8. Secret sauce of the method:
  • The study targets new learning (a fresh library) instead of familiar tasks, so we see effects on skill formation, not just speed.
  • It separates ways of using AI (delegation vs explanation vs conceptual questioning), revealing behavior patterns linked to learning.
  • It focuses on oversight-relevant skills (debugging and comprehension), which are exactly what humans must bring to AI-written code.

What breaks without each step:

  • Without random assignment: We might just be comparing faster learners to slower ones.
  • Without a new library: We’d measure productivity on known skills, not learning.
  • Without a post-task quiz: We’d miss what people actually understood.
  • Without interaction analysis: We’d miss that ā€œhowā€ you use AI is as important as ā€œwhetherā€ you use AI.

04 Experiments & Results

The test and why it matters:

  • The experiment measured two things: speed (how fast people completed tasks) and mastery (how well they understood right after). Mastery was judged by a quiz focusing on debugging, reading/comprehension, and concepts—the very skills humans need to oversee AI code.

The competition (baselines):

  • Two groups: AI assistance vs No AI. Everyone used Python weekly; everyone was new to Trio. Both had the same problems, time pressure, and materials. Only the presence of AI differed.

The scoreboard with context:

  • Time: The AI group finished about two minutes faster on average. But statistically, that small lead wasn’t strong enough to count as a clear win. Think: someone finishing a 1-mile run 10 seconds faster when the timing system can’t be sure it’s a real difference.
  • Mastery: The AI group averaged 50% on the quiz; the No-AI group averaged 67%. That’s a 17-point gap—nearly two letter grades. In school terms, it’s like a C vs a B+/A-, which is a meaningful difference, not just a sliver.
  • Biggest gap: Debugging questions showed the largest difference. Translation: the skill most needed to catch AI’s mistakes suffered the most when people leaned on AI.

Surprising findings:

  • AI users sometimes spent a lot of time chatting with the assistant—up to 11 minutes (about 30% of the total time)—composing as many as 15 queries. That helps explain why overall speed gains were small in this new-learning setting. AI shines more on repetitive or familiar work.
  • The way people used AI strongly correlated with outcomes:
    • Low-scoring patterns (average < 40%):
      • AI delegation: Asked AI to write almost everything; finished fastest; few errors; learned the least.
      • Progressive AI reliance: Started independent, then slid into full delegation; weak mastery on later concepts.
      • Iterative AI debugging: Asked AI to fix problems step by step; slower and still low learning because they didn’t build their own explanations.
    • High-scoring patterns (average ≄ 65%):
      • Generation-then-comprehension: Got code, then asked follow-ups to understand; not fast, but strong mastery.
      • Hybrid code-explanation: Requested code plus explanations in the same prompt; slower reading, better understanding.
      • Conceptual inquiry: Asked only concept questions, coded themselves; made more mistakes, but fixed them independently; surprisingly quick among high scorers.

Make the numbers meaningful:

  • A 17-point mastery drop is like a sports team losing by two goals, not a coin flip. The effect size (Cohen’s d ā‰ˆ 0.74) lands in the medium-to-large range—noticeable in practice. The non-significant speed boost says, ā€œDon’t count on AI to save time when you’re learning brand-new tools—unless you use it to teach you, not to replace your thinking.ā€
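For the curious, Cohen’s d is just the gap between group means measured in standard-deviation units. The means (67% and 50%) and d ā‰ˆ 0.74 come from the study; the pooled standard deviation used below (~23 points) is back-calculated from those numbers, not reported directly.

```python
def cohens_d(mean_a, mean_b, pooled_sd):
    """Effect size: difference between means in standard-deviation units."""
    return (mean_a - mean_b) / pooled_sd

# 67 (No-AI) vs 50 (AI), with an assumed pooled SD of ~23 points:
d = cohens_d(67, 50, 23)
print(round(d, 2))  # 0.74, in the medium-to-large range
```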

What this means in plain terms:

  • AI can be both a booster and a crutch. If you use it to do the thinking for you, you likely learn less, especially how to find and fix bugs. If you use it to explain, quiz, and clarify, you keep your brain in the driver’s seat and learn more—sometimes nearly as fast.

05 Discussion & Limitations

Limitations (honest caveats):

  • Small sample (52 participants) and mostly junior engineers. We need larger, more diverse groups to generalize.
  • Short-term measure: The quiz happened right after coding. We don’t know how differences look after days or weeks of practice.
  • Learning curve with AI: As people gain AI fluency, they may adopt better learning behaviors; this study captured a snapshot, not the whole journey.
  • Narrow domain: Python + Trio (async). Other libraries, languages, or tasks may differ.
  • Tool differences: The sidebar assistant here isn’t the same as fully agentic coding tools; impacts could be stronger with agents that do even more for you.

Required resources to use this approach:

  • A coding platform that can host tasks and, for the AI group, provide integrated assistance with code context.
  • Well-designed problems that require new concepts (like async patterns) but are solvable within a session.
  • Clear rubrics for debugging, comprehension, and conceptual questions.
  • Logging or recordings to analyze interaction styles.

When not to use AI (in this way):

  • When learning a brand-new concept and you’re tempted to delegate whole solutions. That blocks the struggle that builds skill.
  • When safety or correctness is paramount and you lack strong debugging skills. Over-reliance can hide subtle bugs.
  • When the goal is long-term mastery rather than quick delivery.

Open questions:

  • Long-term effects: Do explanation-focused AI habits eventually erase the mastery gap? Does the delegation group catch up later, or do gaps widen?
  • Beyond coding: Do we see the same learning trade-offs in writing, data analysis, design, or medicine?
  • Human vs AI help: Is a patient mentor different from an AI in how much thinking the learner keeps?
  • Tool design: Which features (auto-questions, checkpoints, reflection prompts) best turn AI into a tutor by default?
  • Team dynamics: How should code review and pairing adapt when much of the code is AI-written, to preserve debugging skill growth?

06 Conclusion & Future Work

Three-sentence summary: In a randomized trial of 52 Python users learning the Trio library, AI assistance produced a small, unreliable speed gain but a clear 17-point drop in immediate mastery—especially in debugging. How people used AI made the difference: heavy delegation hurt learning, while asking for explanations and concepts preserved it. The study suggests AI can both accelerate familiar work and slow new skill growth unless we design and use it as a teacher, not just a typist.

Main achievement: The paper separates speed from learning on brand-new material and shows that interaction patterns with AI strongly relate to mastery, spotlighting debugging as the most at-risk skill.

Future directions: Run longer studies to track retention over weeks or months; test diverse languages, tools, and fully agentic systems; compare AI help to human mentoring; and build product features that default to explanation, conceptual prompts, and reflection. Explore workplace policies that schedule deliberate no-delegation practice and embed oversight drills to keep debugging sharp.

Why remember this: In an AI-augmented world, speed is visible but mastery is vital—and easier to lose. Treat AI like a coach that asks you why, not a vending machine that hands you answers. The habits you choose—explain, compare, reflect—decide whether AI makes you faster today and stronger tomorrow.

Practical Applications

  • Enable AI ā€œexplain my codeā€ by default and require a short summary before inserting generated code.
  • Add periodic ā€œno-delegationā€ sprints where developers must solve and debug without AI to keep skills sharp.
  • Design AI prompts that pair code with step-by-step reasoning and concept checks.
  • Include debugging drills in code reviews: reviewers ask for the likely failure modes of any AI-written snippet.
  • Track interaction styles (delegation vs explanation) in dashboards to encourage learning-focused usage.
  • Offer AI learning modes (study cards, self-quizzes, contrastive examples) during onboarding to new libraries.
  • Gate merges on comprehension: require a brief written explanation of how a change works and why it’s correct.
  • Run brown-bag sessions showing how to turn AI into a tutor (follow-up why-questions, conceptual prompts).
  • Create internal policies that match task type to AI usage: heavy AI on routine tasks; explanation-focused AI on new concepts.
  • Embed reflection breaks in IDEs (e.g., ā€œPredict output,ā€ ā€œWhere could this fail?ā€) before accepting AI suggestions.
#AI assistance Ā· #cognitive offloading Ā· #debugging skills Ā· #code comprehension Ā· #conceptual understanding Ā· #randomized controlled trial Ā· #Python Ā· #Trio library Ā· #asynchronous programming Ā· #developer productivity Ā· #skill formation Ā· #learning with AI Ā· #human-AI collaboration Ā· #software engineering education Ā· #code generation